SRE Blog

Better living through reliability.

Tips to Prevent Spammy Alerts

2020-08-02

There are many things to consider when adding new alerts: whether they are implemented correctly, how to test them (both before committing them to code and with integration testing after they're pushed to production), which thresholds and time intervals to pick, etc.

A critical thing to check before adding any new alert is that it's not going to be spammy. The last thing you want to do is wake up a teammate because of poorly chosen thresholds or spam the team mailing list with hourly email alerts. Avoiding alert fatigue is CRITICAL CRITICAL CRITICAL!

There are a couple of easy ways to prevent spammy alerts and thus head off alert fatigue:

The first is to ensure that the alert would NOT have fired recently. I tend to do this check both visually and programmatically. Visually, you can graph the metric to see what the daily peak values have been over the last two weeks, then add some buffer on top of the highest peak to get a reasonable threshold. Programmatically, I like to evaluate my alert expressions in my monitoring system against historical values to double-check that the alert wouldn't have fired.
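As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already exported the metric's history from your monitoring system as (timestamp, value) pairs. The function names, the 20% buffer, and the sample data are all made up for this example and are not taken from any particular monitoring tool.

```python
from datetime import datetime, timedelta


def daily_peaks(samples):
    """Return {date: highest value seen that day} for (timestamp, value) samples."""
    peaks = {}
    for ts, value in samples:
        day = ts.date()
        peaks[day] = max(value, peaks.get(day, value))
    return peaks


def suggest_threshold(samples, buffer_fraction=0.2):
    """Highest daily peak plus a buffer. The 20% default is an assumption."""
    return max(daily_peaks(samples).values()) * (1 + buffer_fraction)


def would_have_fired(samples, threshold, min_consecutive=1):
    """True if `min_consecutive` samples in a row exceeded `threshold`."""
    run = 0
    for _, value in sorted(samples, key=lambda s: s[0]):
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Two weeks of made-up hourly samples, for illustration only.
start = datetime(2020, 7, 19)
history = [(start + timedelta(hours=h), 50 + (h % 24)) for h in range(14 * 24)]

threshold = suggest_threshold(history)
print("suggested threshold:", threshold)
print("would have fired:", would_have_fired(history, threshold))
```

The min_consecutive parameter stands in for the time interval on the alert: a larger value means the metric has to stay above the threshold for longer before the alert counts as firing.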

The second is to initially configure notifications to go to alternate locations. To test email alerts, I like to set up a "spammy" alerts mailing list and send notifications there, or you can email them only to yourself. That way either you're the only one getting alerted, or folks have opted in to the spammy emails. After a week or so of spam-less alerts, the configuration should be updated to the usual notification channels.
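One way to decide when the trial is over is sketched below in Python. It assumes you can list the times at which the alert emailed the trial list. The function name, the seven-day trial, and the "at most one email per day" definition of spam-less are assumptions chosen for this example, so adjust them to whatever your team considers noisy.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def is_ready_to_promote(notifications, trial_start, today,
                        trial_days=7, max_per_day=1):
    """Decide whether a trial alert can move to the usual channels.

    notifications: datetimes at which the alert emailed the trial list.
    trial_start, today: dates bounding the trial so far.
    """
    # The trial has to run its full length before we judge it.
    if today - trial_start < timedelta(days=trial_days):
        return False
    # Count emails per day and reject the alert if any day was too noisy.
    per_day = Counter(
        ts.date() for ts in notifications if ts.date() >= trial_start
    )
    return all(count <= max_per_day for count in per_day.values())


trial_start = date(2020, 8, 2)
hourly_spam = [datetime(2020, 8, 4, hour) for hour in range(24)]
print(is_ready_to_promote(hourly_spam, trial_start, date(2020, 8, 10)))
print(is_ready_to_promote([], trial_start, date(2020, 8, 10)))
```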

Having these alternate channels is also useful for paging notifications. You can initially add an alert to send non-paging notifications and only promote it to paging later once you know it's tuned appropriately.
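The non-paging-first approach can be sketched in a few lines of Python. This is only a model of the idea, not the configuration format of any real paging system: AlertRule, the "pager" and "ticket" channel names, and the false-positive count are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlertRule:
    name: str
    paging: bool = False  # new alerts start out non-paging


def channel_for(rule):
    """Pick the notification channel for a rule (channel names are made up)."""
    return "pager" if rule.paging else "ticket"


def promote(rule, false_positives_in_trial):
    """Return a paging copy of the rule, but only if the trial was clean."""
    if false_positives_in_trial > 0:
        return rule  # still needs tuning, keep it non-paging
    return replace(rule, paging=True)


new_rule = AlertRule(name="disk_almost_full")
print(channel_for(new_rule))
print(channel_for(promote(new_rule, false_positives_in_trial=0)))
```

Keeping the paging flag as an explicit field that defaults to off means a newly added alert cannot page anyone until somebody deliberately promotes it.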