SRE Blog

Better living through reliability.


2022-02-22 SRE Postmortem Template
2021-09-08 Postmortem Tip of the Day: Should Never Have Happened
2021-09-07 ServerLatencyTooHigh Alert
2021-05-16 Postmortem Tip of the Day: Idiosyncratic Knowledge
2021-05-15 Postmortem Tip of the Day: Meeting Narrative
2021-05-14 Postmortem Tip of the Day: Clarity
2021-05-11 Postmortem Tip of the Day: Background
2021-05-03 Incident Tip of the Day: Fast Fix Monitoring
2021-03-29 Server500sTooHigh Alert
2021-03-28 ProcessCrashloopingSlowly Alert
2021-03-27 ProbeFailing Alert
2021-03-26 Postmortem Tip of the Day: Summary Above the Fold
2021-03-25 PodContainerRestarting Alert
2021-03-24 ProcessCrashlooping Alert
2020-08-15 MemoryUsageTooHigh Alert
2020-08-13 JVMHeapUsageTooHigh Alert
2020-08-11 GoroutinesTooHigh Alert
2020-08-07 Postmortem Tip of the Day: Summaries
2020-08-05 Postmortem Tip of the Day: Root Causes
2020-08-04 FileDescriptorsTooHigh Alert
2020-08-03 Proactive vs Reactive Alerts
2020-08-02 Tips to Prevent Spammy Alerts
2020-08-01 DiskUsageTooHigh Alert
2020-07-30 CPUUsageTooHigh Alert
2020-07-29 Base Alerts
2020-07-27 Pre-Postmortem Meetings


I'm an SRE and so can you! This blog translates SRE principles into practical, concrete advice. SRE books outline the high-level principles of SRE, but folks often still struggle when implementing alerts, production reviews, SLOs, error budgets, etc. I'll help you supercharge your SRE skills, just as if you were a member of my SRE team!


If you enjoy the content or have a question you'd like me to answer on the blog, email sreblog@.