This page is part of my personal knowledge database, which helps me store and navigate my learnings.
Read on here for details


Having Alarming in place means having automated systems that are connected to telemetry and trigger actions, such as notifying operators when an incident cannot be resolved automatically.
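To make the idea concrete, here is a minimal sketch of such a rule in Python. It is not taken from any particular monitoring product: `AlarmRule`, `evaluate`, the `remediate` hook, and the `notify` callback are hypothetical names chosen for illustration, and a simple "value above threshold" check stands in for whatever condition real telemetry would be evaluated against.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlarmRule:
    """Hypothetical alarm rule: fires when a metric exceeds a threshold."""
    name: str
    threshold: float
    # Optional automatic fix; returns True if the incident was resolved.
    remediate: Optional[Callable[[], bool]] = None


def evaluate(rule: AlarmRule, value: float, notify: Callable[[str], None]) -> str:
    """Check one telemetry value against a rule and act on the result."""
    if value <= rule.threshold:
        return "ok"
    # Try the automated resolution first, if the rule defines one.
    if rule.remediate is not None and rule.remediate():
        return "auto-resolved"
    # Only incidents that could not be resolved automatically reach an operator.
    notify(f"{rule.name}: value {value} exceeds threshold {rule.threshold}")
    return "escalated"
```

The ordering is the point of the sketch: the operator is only notified after the automated path has been tried and has failed, so every notification represents work that the system could not do on its own.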

Such tooling is required within Complex Systems, so that small teams can operate large-scale services within Programmable Infrastructure with a huge number of components. If overused, however, it is also an indicator of Technical Debt.

Predictive Alarming, that is, notifying operators before an incident happens based on probability predictors, can be a good tool for optimizing an already reliable system, but should not be used during the early transformation phases.
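As a rough illustration of what "notifying before the incident" could look like, the sketch below fits a least-squares line through recent samples and estimates when the metric would cross a threshold. This is a deliberately simple stand-in for a real probability predictor, and all names (`predict_time_to_threshold`, `should_warn`, `interval_s`, `horizon_s`) are assumptions made for this example.

```python
from typing import Optional, Sequence


def predict_time_to_threshold(
    samples: Sequence[float], threshold: float, interval_s: float
) -> Optional[float]:
    """Estimate seconds until the linear trend of `samples` reaches `threshold`.

    Samples are assumed to be evenly spaced, `interval_s` seconds apart.
    Returns None if there is too little data or the trend is not rising.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept over sample indices 0..n-1.
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    current = intercept + slope * (n - 1)  # fitted value at the latest sample
    if current >= threshold:
        return 0.0
    return (threshold - current) / slope * interval_s


def should_warn(
    samples: Sequence[float], threshold: float, interval_s: float, horizon_s: float
) -> bool:
    """Warn if the threshold is predicted to be reached within `horizon_s` seconds."""
    eta = predict_time_to_threshold(samples, threshold, interval_s)
    return eta is not None and eta <= horizon_s
```

For example, `should_warn([50, 60, 70, 80], threshold=100, interval_s=60, horizon_s=300)` returns `True`, because the fitted trend reaches 100 in about 120 seconds. A predictor like this is only as good as the stability of the underlying signal, which is one reason it suits an already reliable system better than one that is still in transformation.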