Better living through reliability.
As we migrated from host-based alerts to application alerts with Prometheus, devs asked which alerts they should implement. A reasonable question. The common and (in my opinion) not super useful answer is the Four Golden Signals. The golden signals, while technically correct, remain open to interpretation, and that ambiguity gets in the way of understanding.
So instead I'll give you the list of alerts you should have for every production service (in alphabetical order):
These come out of experience running distributed systems at several prior companies. Now, one might argue that some of these are redundant or not applicable in certain circumstances, and you'd be right! But if you're arguing with me about nuance, then I've done my job: you have enough of a framework to build on and mold to your specific needs.
In future posts, we'll expand on each of these, explaining why each is useful, what each intends to catch, what types of services might not need these alerts, etc.