SRE Blog

Better living through reliability.

ServerLatencyTooHigh Alert


The next alert (alphabetically) in our series on base alerts is ServerLatencyTooHigh. It fires when the average (mean) server response latency is higher than expected.

Server response latency to clients should stay under a known or desired amount of time. It's usually easiest to monitor the response latency as measured by the server itself, even though this latency may not be representative of the total latency clients experience (such as when reverse proxies sitting in front of the service buffer responses).

In certain (most?) monitoring systems it may be easiest to measure the average (mean) latency of responses rather than tail percentile latencies. Measuring the distribution of latency is preferred, because tail latency is often much worse than average latency, but many monitoring systems don't implement pulling the data together from many server instances to compute that distribution.
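To make the gap between average and tail latency concrete, here is a minimal Python sketch. The sample latencies are made up for illustration, and `nearest_rank_percentile` is a hypothetical helper using the simple nearest-rank method, not the percentile estimator of any particular monitoring system.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) by the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in seconds: 98 fast responses and 2 very slow ones.
latencies = [0.1] * 98 + [5.0] * 2

mean_latency = statistics.fmean(latencies)            # about 0.198 s
p50_latency = nearest_rank_percentile(latencies, 50)  # 0.1 s
p99_latency = nearest_rank_percentile(latencies, 99)  # 5.0 s
```

With these numbers the mean stays just under 0.2 seconds while the 99th percentile is 5 seconds, so a mean-based alert with a one second threshold would stay quiet even though some clients are waiting far longer.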

Response latency MAY be measured both at an individual server level and in aggregate across all server instances. It's usually easier, less noisy, and a more accurate measure of customer impact to measure in aggregate across all instances.
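As a rough sketch of aggregate measurement, assume each server instance exports two counters over the same time window: the total seconds spent serving requests and the number of requests served. The function name and the sample numbers below are assumptions for illustration. Dividing the summed totals weights every request equally, whereas averaging the per-instance means lets a nearly idle instance skew the result.

```python
def aggregate_mean_latency(instances):
    """Mean latency across all instances, weighted by request count.

    instances: iterable of (total_latency_seconds, request_count) pairs,
    one pair per server instance for the same time window.
    Returns None when no requests were served.
    """
    total_seconds = 0.0
    total_requests = 0
    for latency_sum, request_count in instances:
        total_seconds += latency_sum
        total_requests += request_count
    if total_requests == 0:
        return None
    return total_seconds / total_requests


# Hypothetical window: a busy instance (1000 requests, mean 0.2 s)
# and a nearly idle one (10 requests, mean 3.0 s).
instances = [(200.0, 1000), (30.0, 10)]

fleet_mean = aggregate_mean_latency(instances)  # 230 / 1010, about 0.23 s
mean_of_means = sum(s / c for s, c in instances) / len(instances)  # 1.6 s
```

Here the request-weighted mean (about 0.23 seconds) reflects what the typical request saw, while the naive mean of means (1.6 seconds) would cross a one second threshold because of ten slow requests on one instance.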

Alert on a fixed threshold based on a latency that is meaningful to your customers. Also, you probably want to alert on the latency across all (non-healthcheck) URLs (as opposed to alerting on every URL). For URLs important to customer experience (such as login), additional individual alerts SHOULD be added to ensure that latency on critical URLs isn't lost in the noise of other traffic.
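The following Python sketch shows one way the split between an overall alert and per-URL alerts might look. The paths `/healthz`, `/login`, and `/search`, the function name, and the traffic mix are all hypothetical.

```python
from collections import defaultdict

HEALTHCHECK_PATHS = {"/healthz"}  # hypothetical healthcheck URL, excluded entirely
CRITICAL_PATHS = {"/login"}       # hypothetical URL that gets its own alert


def mean_latencies(requests):
    """Return (overall_mean, {critical_path: mean}) for (path, seconds) pairs.

    Healthcheck requests are skipped; critical paths count toward the
    overall mean and are also tracked individually.
    """
    overall_sum, overall_count = 0.0, 0
    per_path = defaultdict(lambda: [0.0, 0])  # path -> [latency sum, count]
    for path, latency in requests:
        if path in HEALTHCHECK_PATHS:
            continue
        overall_sum += latency
        overall_count += 1
        if path in CRITICAL_PATHS:
            per_path[path][0] += latency
            per_path[path][1] += 1
    overall = overall_sum / overall_count if overall_count else None
    critical = {path: s / c for path, (s, c) in per_path.items()}
    return overall, critical


# Hypothetical traffic: fast healthchecks, mostly fast searches, a few slow logins.
requests = [("/healthz", 0.001)] * 50 + [("/search", 0.2)] * 95 + [("/login", 3.0)] * 5
overall, critical = mean_latencies(requests)
# overall is about 0.34 s, critical["/login"] is 3.0 s
```

In this example the overall mean (about 0.34 seconds) would not trip a one second threshold, but the separate `/login` mean (3 seconds) would, which is the case the individual alerts are meant to catch.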

The recommended threshold for a paging alert is a latency above one or two seconds. The value should stay above the threshold for between 5 and 20 minutes before the alert fires.
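The "above the threshold for N minutes" rule can be sketched in Python as follows. This is only an illustration of the logic, assuming latency samples arrive as time-ordered `(timestamp_seconds, latency_seconds)` pairs; `should_fire` is a hypothetical function, not the evaluation logic of any specific monitoring system.

```python
def should_fire(samples, threshold_seconds, for_seconds):
    """Return True if latency has stayed above the threshold long enough.

    samples: list of (timestamp_seconds, latency_seconds), oldest first.
    The alert fires only when every sample since the start of the current
    breach is above threshold_seconds and that breach has lasted at least
    for_seconds as of the latest sample.
    """
    breach_start = None
    for timestamp, latency in samples:
        if latency > threshold_seconds:
            if breach_start is None:
                breach_start = timestamp  # a new breach begins here
        else:
            breach_start = None  # any recovery resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= for_seconds


# Hypothetical one-minute samples, 1 s threshold, 5 minute hold time.
brief_spike = [(0, 0.2), (60, 1.5), (120, 0.3), (180, 0.2)]
sustained = [(minute * 60, 1.5) for minute in range(6)]  # breached from 0 s to 300 s

spike_fires = should_fire(brief_spike, 1.0, 300)    # False
sustained_fires = should_fire(sustained, 1.0, 300)  # True
```

The hold time is what keeps a single slow minute from paging anyone: the brief spike recovers and resets the timer, while the sustained breach reaches the five minute mark and fires.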