Error Handling Architecture — Stop Catching Generic Errors
Design a complete error handling system for your app. Custom error classes, proper logging, user-friendly messages, retr…
Generate a professional, blameless postmortem document from an incident. Learn from failures without finger-pointing.
You are a Site Reliability Engineer (SRE) who has led postmortems at companies with 99.99% uptime requirements. You believe every incident is a learning opportunity, never a blame opportunity. Generate a complete blameless postmortem for an incident.

Incident title: [BRIEF TITLE, e.g., 'API Gateway 502 errors for 45 minutes']
Date/Time: [WHEN DID IT HAPPEN?]
Duration: [HOW LONG?]
Severity: [SEV1-CRITICAL / SEV2-HIGH / SEV3-MEDIUM / SEV4-LOW]
Impact: [WHO WAS AFFECTED? HOW MANY USERS? REVENUE LOSS?]
What happened: [DESCRIBE IN YOUR OWN WORDS]
What fixed it: [HOW WAS IT RESOLVED?]

Create a postmortem document with:
1. EXECUTIVE SUMMARY: 3 sentences a VP can read (impact, cause, resolution)
2. TIMELINE: Minute-by-minute from first alert to full resolution
3. ROOT CAUSE: Not "the server crashed" but WHY it crashed, and WHY the conditions existed
4. CONTRIBUTING FACTORS: Things that made the impact worse
5. WHAT WENT WELL: Celebrate what prevented worse outcomes
6. WHAT WENT WRONG: Process failures, not people failures
7. ACTION ITEMS: Each with:
   - [ ] Task description
   - Priority: P0/P1/P2
   - Owner: [role, not name]
   - Deadline: [timeframe]
8. DETECTION: How did we find out? How SHOULD we have found out?
9. RECURRENCE PREVENTION: Systemic change to prevent this entire class of problems

Tone: Blameless. Use "the system" not "John". Focus on process failures, not human errors.
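The ACTION ITEMS format in the prompt (task, P0/P1/P2 priority, owner as a role, deadline as a timeframe) maps naturally onto a small record type. The Python sketch below is one way to collect those items and render them as a checklist, most urgent first, assuming they are kept as structured data before being pasted into the document. The `Priority`, `ActionItem`, and `render_action_items` names and the sample items are hypothetical illustrations, not part of the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    """Priority levels named in the prompt template; P0 is the most urgent."""
    P0 = 0
    P1 = 1
    P2 = 2


@dataclass
class ActionItem:
    """One postmortem action item (hypothetical structure for illustration)."""
    task: str
    priority: Priority
    owner_role: str  # a role such as "On-call SRE", never a person's name
    deadline: str    # free-form timeframe, e.g. "2 weeks"


def render_action_items(items: list[ActionItem]) -> str:
    """Render items as an unchecked checklist, ordered P0 -> P2."""
    lines = []
    # sorted() is stable, so items with equal priority keep their input order
    for item in sorted(items, key=lambda i: i.priority.value):
        lines.append(f"- [ ] {item.task}")
        lines.append(f"  - Priority: {item.priority.name}")
        lines.append(f"  - Owner: {item.owner_role}")
        lines.append(f"  - Deadline: {item.deadline}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical items loosely based on the sample incident summary
    items = [
        ActionItem("Add memory-usage alerting for the API gateway",
                   Priority.P1, "Observability team lead", "2 weeks"),
        ActionItem("Require load testing before config changes ship",
                   Priority.P0, "Release engineering owner", "1 week"),
    ]
    print(render_action_items(items))
```

Keeping the owner as a role string, rather than a person's name, mirrors the prompt's "role, not name" rule and keeps the rendered document blameless by construction.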
EXECUTIVE SUMMARY: On April 15, the API gateway returned 502 errors for 45 minutes, affecting ~12,000 users and ~$8,400 in lost transactions. Root cause: a memory leak in the connection pooling library triggered by a config change deployed without load testing. Resolved by rolling back.
WHAT WENT WELL:
✅ Alerting fired within 2 minutes
✅ On-call responded in 5 minutes
Postmortems only improve reliability when they focus on systems, not people. This prompt structures incident analysis around SRE best practices (timeline reconstruction, contributing factors, and concrete action items) to produce documents that drive lasting improvements.
Use after a production incident when you need to document what happened and prevent recurrence, when building a culture of learning from failures, or when stakeholders need a professional incident report with clear remediation steps.
You'll get a complete postmortem document with an incident timeline, impact assessment, root cause analysis using the 5 Whys technique, contributing factors, action items with owners and deadlines, and lessons learned, ready to share with your team and leadership.
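The 5 Whys technique starts from the visible symptom and repeatedly asks "why?" until the answer is a process or systemic cause rather than a component failure. The Python sketch below records such a chain, using the causes stated in the sample executive summary; the `WhyChain` class and its methods are hypothetical helpers, not part of the prompt. The sample chain stops after three answers because the summary gives only three causes; a real analysis would keep asking, for example why untested config changes could reach production at all.

```python
from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """A problem statement followed by successive answers to "Why?"."""
    problem: str
    answers: list[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> "WhyChain":
        """Record the answer to the next "Why?" and return self for chaining."""
        self.answers.append(answer)
        return self

    def root_cause(self) -> str:
        # The deepest answer is the current candidate root cause;
        # with no answers yet, only the problem itself is known.
        return self.answers[-1] if self.answers else self.problem

    def render(self) -> str:
        """Render the chain with one extra indent level per "Why?"."""
        lines = [f"Problem: {self.problem}"]
        for depth, answer in enumerate(self.answers, start=1):
            lines.append(f"{'  ' * depth}Why #{depth}: {answer}")
        return "\n".join(lines)


if __name__ == "__main__":
    chain = (
        WhyChain("API gateway returned 502 errors for 45 minutes")
        .ask_why("A memory leak in the connection pooling library")
        .ask_why("A config change triggered the leak")
        .ask_why("The config change was deployed without load testing")
    )
    print(chain.render())
    print("Candidate root cause:", chain.root_cause())
```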