Coding & Development | Intermediate

Blameless Postmortem — Turn Production Failures Into Team Growth

Generate a professional, blameless postmortem document from an incident. Learn from failures without finger-pointing.

Copy and paste this prompt:
You are a Site Reliability Engineer (SRE) who has led postmortems at companies with 99.99% uptime requirements. You believe every incident is a learning opportunity, never a blame opportunity.

Generate a complete blameless postmortem for an incident.

Incident title: [BRIEF TITLE — e.g., 'API Gateway 502 errors for 45 minutes']
Date/Time: [WHEN DID IT HAPPEN?]
Duration: [HOW LONG?]
Severity: [SEV1-CRITICAL / SEV2-HIGH / SEV3-MEDIUM / SEV4-LOW]
Impact: [WHO WAS AFFECTED? HOW MANY USERS? REVENUE LOSS?]
What happened: [DESCRIBE IN YOUR OWN WORDS]
What fixed it: [HOW WAS IT RESOLVED?]

Create a postmortem document with:

1. EXECUTIVE SUMMARY — 3 sentences a VP can read (impact, cause, resolution)
2. TIMELINE — Minute-by-minute from first alert to full resolution
3. ROOT CAUSE — Not "the server crashed" but WHY it crashed, and WHY the conditions existed
4. CONTRIBUTING FACTORS — Things that made the impact worse
5. WHAT WENT WELL — Celebrate what prevented worse outcomes
6. WHAT WENT WRONG — Process failures, not people failures
7. ACTION ITEMS — Each with:
   - [ ] Task description
   - Priority: P0/P1/P2
   - Owner: [role, not name]
   - Deadline: [timeframe]
8. DETECTION — How did we find out? How SHOULD we have found out?
9. RECURRENCE PREVENTION — Systemic change to prevent this entire class of problems

Tone: Blameless. Use "the system" not "John". Focus on process failures, not human errors.
Tags: #postmortem #sre #incident-management #devops #reliability

Works with: ChatGPT, Claude, Copilot

💡 Pro Tips

  • Write the postmortem within 48 hours while memory is fresh
  • Always include 'What Went Well' — it prevents postmortems from being demoralizing
  • Action items without owners and deadlines are wishes, not plans
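The last tip can be checked mechanically before a postmortem is published. The following is a minimal, hypothetical sketch that assumes action items are stored as simple records with the fields the prompt asks for (description, P0/P1/P2 priority, owner role, deadline). The names `ActionItem`, `problems`, and `review`, and the sample items, are illustrative only and do not belong to any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional

# Priorities accepted by the prompt's action-item format.
VALID_PRIORITIES = {"P0", "P1", "P2"}


@dataclass
class ActionItem:
    """Hypothetical record mirroring the prompt's action-item fields."""
    description: str
    priority: str                   # "P0", "P1", or "P2"
    owner: Optional[str] = None     # a role (e.g. "Release engineering"), not a person
    deadline: Optional[date] = None


def problems(item: ActionItem) -> List[str]:
    """Return the reasons an item is a 'wish' rather than a plan (empty if actionable)."""
    issues = []
    if item.priority not in VALID_PRIORITIES:
        issues.append(f"unknown priority {item.priority!r}")
    if item.owner is None or not item.owner.strip():
        issues.append("missing owner")
    if item.deadline is None:
        issues.append("missing deadline")
    return issues


def review(items: List[ActionItem]) -> Dict[str, List[str]]:
    """Map each non-actionable item's description to its list of problems."""
    report = {}
    for item in items:
        issues = problems(item)
        if issues:
            report[item.description] = issues
    return report


# Hypothetical action items loosely based on the example incident below.
items = [
    ActionItem(
        "Add load testing to the config-change pipeline",
        "P0",
        owner="Release engineering",
        deadline=date.today() + timedelta(days=14),
    ),
    ActionItem("Upgrade the connection pooling library", "P1"),
]

print(review(items))
# {'Upgrade the connection pooling library': ['missing owner', 'missing deadline']}
```

Keeping `owner` as a free-text role rather than a person's name follows the prompt's "role, not name" rule, so the check stays consistent with the blameless tone.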

✨ Example Output

EXECUTIVE SUMMARY: On April 15, the API gateway returned 502 errors for 45 minutes, affecting ~12,000 users and causing ~$8,400 in lost transactions. Root cause: a memory leak in the connection pooling library, triggered by a config change that was deployed without load testing. Resolved by rolling back the config change.

WHAT WENT WELL:
✅ Alerting fired within 2 minutes
✅ On-call responded in 5 minutes

🧠 Why This Works

Postmortems only improve reliability when they focus on systems, not people. This prompt structures incident analysis using SRE best practices—timeline reconstruction, contributing factors, and concrete action items—creating documents that drive lasting improvements.

📅 When to Use This Prompt

Use after a production incident when you need to document what happened and prevent recurrence, when building a culture of learning from failures, or when stakeholders need a professional incident report with clear remediation steps.

🎯 What You'll Get

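You'll get a complete postmortem document with an executive summary, a minute-by-minute timeline, a root cause analysis that keeps asking "why" (in the spirit of the 5 Whys technique), contributing factors, what went well and what went wrong, prioritized action items with owners and deadlines, a detection review, and recurrence prevention steps, ready to share with your team and leadership.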

🔗 Related Prompts

  • The Root Cause Analyzer (Business & Strategy, intermediate): From chaos to clarity: find the real "why." Tags: #root-cause-analysis #problem-solving #troubleshooting
  • Crisis Communication Plan (Business & Strategy, advanced): Prepare responses for business crises (PR disasters, outages, bad reviews, data breaches) before they happen. Tags: #crisis-management #pr #communication