Without metrics there are no insights when you need them, which results in confusion, loss of control, and misplaced blame. Good pervasive telemetry practices are:
- Make metrics easy
- Writing to and reading from metrics must be easy
- Access to metrics must be easy and highly visible
- For example via a status page, so everyone can see and radiate all progress and change information transparently
- Monitor every source
- Err on the side of too much monitoring
- Collect metrics holistically (e.g. no separation between application and operations)
- Extract metrics from logs to create statistics
- Write all appropriate logs & use appropriate log levels
- If it’s worth implementing, it’s worth monitoring (and conversely: if it’s not worth monitoring, it’s not worth implementing).
- Any new business functionality must result in appropriate new business metrics
- Every Deployment Stage must have metrics
- Monitor every layer (4-Layer Architecture + Deployment)
- Application layer metrics must contain resource use, auth, session, timing, ...
- Business/Domain layer metrics must always map to a business goal (otherwise they are vanity metrics: superfluous rather than useful, because they are not actionable)
- Infra layer metrics must be relatable to services (so devs can understand them)
- Deploy layer metrics, when related to the other metrics, put those metrics into the context of code deploys, which lets devs debug and work
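
As an illustration of extracting metrics from logs to create statistics, here is a minimal Python sketch. The log format, the `extract_metrics` function, and the sample service names (`checkout`, `search`) are all assumptions made for this example, not something these notes prescribe. A real setup would more likely ship logs to a dedicated metrics pipeline.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical log format (an assumption for this sketch):
#   "<ISO timestamp> <LEVEL> <service> duration_ms=<int> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<service>\S+) duration_ms=(?P<ms>\d+)"
)


def extract_metrics(lines):
    """Turn raw log lines into per-service statistics."""
    levels = Counter()   # (service, level) -> number of log lines
    durations = {}       # service -> list of request durations in ms
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        service = match["service"]
        levels[(service, match["level"])] += 1
        durations.setdefault(service, []).append(int(match["ms"]))
    return {
        service: {
            "requests": len(values),
            "errors": levels[(service, "ERROR")],
            "mean_ms": mean(values),
            "max_ms": max(values),
        }
        for service, values in durations.items()
    }


# Made-up sample data, only to show the shape of the input.
SAMPLE_LOGS = [
    "2024-01-01T10:00:00Z INFO checkout duration_ms=120 order placed",
    "2024-01-01T10:00:01Z ERROR checkout duration_ms=480 payment failed",
    "2024-01-01T10:00:02Z INFO search duration_ms=30 query served",
    "not a structured line",
]

metrics = extract_metrics(SAMPLE_LOGS)
```

Keying the statistics by service name keeps them relatable to services, and counting by log level only works if appropriate log levels are used in the first place.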