SRE Blog

Better living through reliability.

Postmortem Tip of the Day: Summaries

2020-08-07

The summary is the first thing your stakeholders read. Most of them are busy and may not read much beyond it, so make it count.

A summary should be high level and at most a few sentences long. It should clearly establish the primary effect (or impact), the primary cause, and the fix (or resolution). Other sections of the postmortem cover each of these in more depth, so keep the summary terse and dense.

A basic formula is something like:

  1. A sentence succinctly describing the primary/most important thing that broke (e.g. logins, database, pipelines). Include the high-level impact on that thing (e.g. logins were failing, all microservices were crashlooping) and specific numbers that quantify it (e.g. number of users impacted, how long it was broken).
  2. A sentence or two that makes it clear what the root cause was (e.g. ran out of database connections, typo in config).
  3. A very brief sentence outlining the fix/resolution (e.g. rolled back a release, resized a disk).
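
If you like to keep yourself honest with tooling, the formula above can be sketched as a tiny Python check. This is only an illustration, not an established tool: the names `Summary`, `lint`, and `count_sentences` are made up here, the five-sentence cap is my own reading of "a few sentences", and the incident in the example is invented.

```python
import re
from dataclasses import dataclass

# Assumption: "at most a few sentences" is interpreted as five or fewer.
MAX_SENTENCES = 5


@dataclass
class Summary:
    impact: str      # what broke, how badly, with specific numbers
    cause: str       # the root cause, in a sentence or two
    resolution: str  # the fix, in one brief sentence

    def render(self) -> str:
        """Join the three parts into the summary paragraph."""
        parts = [self.impact, self.cause, self.resolution]
        return " ".join(p.strip() for p in parts if p.strip())


def count_sentences(text: str) -> int:
    # Naive split on ., ! or ? followed by whitespace. Abbreviations such
    # as "e.g. " will be over-counted, which is acceptable for a rough lint.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def lint(summary: Summary) -> list[str]:
    """Return a list of problems; an empty list means the summary passes."""
    problems = []
    for name in ("impact", "cause", "resolution"):
        if not getattr(summary, name).strip():
            problems.append(f"missing {name}")
    if not re.search(r"\d", summary.impact):
        problems.append("impact has no numbers")
    if count_sentences(summary.render()) > MAX_SENTENCES:
        problems.append("summary is too long")
    return problems


# Hypothetical incident, invented purely for illustration.
example = Summary(
    impact="Logins failed for 42 minutes, affecting roughly 1,200 users.",
    cause="A config typo exhausted the database connection pool.",
    resolution="We rolled back the release.",
)

if __name__ == "__main__":
    print(example.render())
    print(lint(example) or "looks good")
```

A check like this cannot tell you whether the summary is accurate or well written. It only catches the mechanical misses: a part left out, an impact statement with no numbers, or a summary that has grown past a few sentences.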