Prompt Debugging Framework — Fix AI Outputs That Aren't What You Wanted
Systematically diagnose why your prompt isn't giving good results and fix it — using a structured debugging approach tha…
Create a rubric to evaluate AI outputs objectively — useful for comparing models, testing prompts, or building quality assurance into AI workflows.
You are a quality assurance specialist for AI outputs. Help me build a grading system for evaluating AI responses.

What I'm Evaluating:
- Task type: [Writing / Analysis / Code / Creative / Research / Customer service / etc.]
- The prompt that generated the output: [PASTE IT]
- The AI's response: [PASTE IT or describe it]
- What 'excellent' looks like for this task: [DESCRIBE YOUR GOLD STANDARD]
- What I'll use this rubric for: [Comparing models / Testing prompts / QA process / Training]

Build my evaluation system:

**1. SCORING RUBRIC**
Create a task-specific rubric:

| Dimension | Weight | 1 (Poor) | 3 (Adequate) | 5 (Excellent) |
|-----------|--------|----------|--------------|---------------|
| Accuracy | | | | |
| Completeness | | | | |
| Relevance | | | | |
| Clarity | | | | |
| [Task-specific] | | | | |

**2. GRADE THIS OUTPUT**
Apply the rubric to the response I provided:

| Dimension | Score | Reasoning |
|-----------|-------|-----------|

Overall weighted score: X/5

**3. SPECIFIC FEEDBACK**
- What this response does WELL (be specific)
- What's MISSING or wrong (be specific)
- What would make this a 5/5 (exact improvements needed)

**4. REWRITTEN 'GOLD STANDARD'**
Show me what a perfect response would look like for this prompt.

**5. PROMPT IMPROVEMENT**
If the output scored low, what prompt changes would improve results?
- Changes to wording
- Additional context needed
- Constraints to add
- Format specifications

**6. COMPARISON FRAMEWORK**
If comparing multiple AI responses to the same prompt:
- Side-by-side scoring table
- Winner by dimension
- Overall recommendation
- When to use each model/approach

**7. AUTOMATED QA TEMPLATE**
A reusable template I can use to quickly evaluate any AI output:
- 5 yes/no questions for quick pass/fail
- Scoring shorthand for detailed evaluation
- Red flags that mean 'do not use this output'

Be brutally honest in grading. I want to improve, not feel good about mediocre outputs.
📊 RUBRIC FOR: Marketing Email Copy

| Dimension | Weight | 1 (Poor) | 3 (Good) | 5 (Excellent) |
|-----------|--------|----------|----------|---------------|
| Hook/Subject | 25% | Generic, no curiosity | Decent, would open | Irresistible, must-open |
| Relevance | 20% | Generic blast | Segment-aware | Personally resonant |
| CTA Clarity | 20% | Buried or vague | Clear but boring | Compelling, specific |
| Tone/Voice | 15% | Corporate/AI-sounding | Professional | Sounds like a human I'd reply to |
| Brevity | 10% | Wall of text | Reasonable length | Every word earns its place |
| Persuasion | 10% | Lists features | Shows benefits | Creates urgency + desire |

📝 GRADING YOUR OUTPUT:

| Dimension | Score | Why |
|-----------|-------|-----|
| Hook | 2/5 | 'I hope this finds you well': instant delete |
| Relevance | 3/5 | Mentions their industry but not their specific pain |
| CTA | 4/5 | Clear ask, specific time. Good |
| Tone | 2/5 | Reads like AI wrote it; too formal |
| Brevity | 3/5 | Could cut 40% without losing meaning |
| Persuasion | 2/5 | All features, no emotion or urgency |

🏆 OVERALL: 2.7/5 (weighted). Needs significant revision

✅ QUICK QA CHECKLIST:
- [ ] Would I open this email? (honest answer)
- [ ] Is there ONE clear action to take?
- [ ] Is it specific to this reader, rather than something that could apply to anyone?
- [ ] Does it sound like a human wrote it?
- [ ] Would I forward this to a friend?
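The overall figure in an example like this is a weighted sum: each dimension's score multiplied by its weight, then added up. A minimal Python sketch of that arithmetic follows, using the weights and scores from the marketing email example. The function name `weighted_score` and the validation checks are illustrative assumptions, not part of the prompt itself.

```python
# Weights and scores copied from the marketing email example.
WEIGHTS = {
    "Hook/Subject": 0.25,
    "Relevance": 0.20,
    "CTA Clarity": 0.20,
    "Tone/Voice": 0.15,
    "Brevity": 0.10,
    "Persuasion": 0.10,
}
SCORES = {
    "Hook/Subject": 2,
    "Relevance": 3,
    "CTA Clarity": 4,
    "Tone/Voice": 2,
    "Brevity": 3,
    "Persuasion": 2,
}


def weighted_score(weights, scores):
    """Return the weighted overall score on the same 1-5 scale as the inputs.

    Hypothetical helper: raises ValueError if the dimensions do not match
    or if the weights do not sum to 100%.
    """
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0 (100%)")
    return sum(weights[dim] * scores[dim] for dim in weights)


overall = weighted_score(WEIGHTS, SCORES)
print(f"Overall: {overall:.1f}/5")
```

Checking that the weights sum to 100% is worth keeping even in a spreadsheet version, since a rubric whose weights drift from 100% produces scores that cannot be compared across runs.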
Without a rubric, evaluating AI output is subjective — 'this feels good' isn't a system. This prompt creates objective, repeatable scoring criteria specific to YOUR task, then applies them. It's the difference between 'I think this is okay' and 'This scores 3.2/5 on my rubric, and here's exactly what would make it a 5.'
Use this when testing different prompts to find what works best, when comparing AI models for a specific task, when building QA processes for AI-generated content, or when you need to explain to stakeholders why one AI output is better than another.
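For the comparison use case, rubric scores from several responses can be lined up side by side to get an overall score per response and a winner per dimension. The sketch below assumes a generic four-dimension rubric; the weights, the response names `response_a` and `response_b`, and every score are made-up placeholders for illustration, not measurements of any real model.

```python
# Hypothetical weights for a generic rubric (must sum to 1.0).
WEIGHTS = {"Accuracy": 0.4, "Completeness": 0.2, "Relevance": 0.2, "Clarity": 0.2}

# Placeholder scores (1-5) for two hypothetical responses to the same prompt.
SCORES = {
    "response_a": {"Accuracy": 5, "Completeness": 3, "Relevance": 5, "Clarity": 3},
    "response_b": {"Accuracy": 3, "Completeness": 4, "Relevance": 4, "Clarity": 5},
}


def overall_score(scores):
    """Weighted overall score for one response."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def winners_by_dimension(all_scores):
    """Map each dimension to the response name(s) with the top score; ties are kept."""
    result = {}
    for dim in WEIGHTS:
        best = max(scores[dim] for scores in all_scores.values())
        result[dim] = sorted(
            name for name, scores in all_scores.items() if scores[dim] == best
        )
    return result


totals = {name: overall_score(scores) for name, scores in SCORES.items()}
winners = winners_by_dimension(SCORES)

# Side-by-side table: one row per dimension, one column per response.
names = sorted(SCORES)
print("Dimension".ljust(14) + "".join(n.ljust(12) for n in names) + "Winner")
for dim in WEIGHTS:
    row = "".join(str(SCORES[n][dim]).ljust(12) for n in names)
    print(dim.ljust(14) + row + ", ".join(winners[dim]))
print("Overall".ljust(14) + "".join(f"{totals[n]:.1f}".ljust(12) for n in names))
```

Keeping ties as a list rather than picking one name arbitrarily avoids overstating a difference the rubric did not actually find.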
You get a task-specific rubric with weighted dimensions, objective scoring of your AI output, specific improvement suggestions, and a reusable QA template. You'll stop accepting mediocre AI outputs and start systematically improving them.