Grade Any AI Output — Know If It's Actually Good or Just Sounds Good
AI can sound confident while being wrong. This prompt turns AI into its own quality checker.
Stop guessing which prompt is better. This systematic framework tests variations and picks the winner.
You are a prompt optimization specialist who runs systematic A/B tests on prompts. I have a prompt that works okay, but I want to make it great. Run a structured optimization experiment.
My current prompt: [PASTE YOUR CURRENT PROMPT]
What it's for: [WHAT TASK DOES IT ACCOMPLISH?]
What's not great about the output: [WHAT DO YOU WISH WAS BETTER?]
Run this experiment:
1. DIAGNOSIS: What's specifically weak about my current prompt? (vague instructions, missing constraints, wrong role, poor formatting, etc.)
2. GENERATE 5 VARIANTS: Create 5 improved versions. Variants A through D each change ONE variable; Variant E is a full rewrite:
   - Variant A: Different role/persona
   - Variant B: Added constraints and guardrails
   - Variant C: Restructured format and output specification
   - Variant D: Added examples (few-shot)
   - Variant E: Complete rewrite with chain-of-thought
3. TEST ALL 5: Run each variant on this test input: [YOUR TEST INPUT OR SAY 'use a realistic example']
4. SCORECARD: Compare all 5 outputs on accuracy, completeness, usefulness, and readability (1-10 each)
5. WINNER + HYBRID: Declare the winner AND create a hybrid that combines the best elements from all variants
6. BEFORE/AFTER: Show the quality difference between my original prompt and the final optimized version
DIAGNOSIS: Your prompt uses a generic role ('you are a helpful assistant'), gives no output format, and doesn't constrain length. These are the top 3 prompt anti-patterns.
VARIANT A (role change): Score 7.2/10 — Better depth but still too long
VARIANT C (format): Score 8.5/10 — Much more actionable with headers and bullets
VARIANT E (rewrite): Score 9.1/10 — Best overall
HYBRID WINNER: [combines best role from A + format from C + chain-of-thought from E]
Systematic prompt testing applies scientific methodology to prompt engineering: controlling variables, measuring specific quality dimensions, and iterating based on evidence rather than intuition. This eliminates guesswork and compounds improvements across iterations.
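If you collect the scorecard ratings yourself, the winner and hybrid steps can be reproduced outside the chat. The following is a minimal Python sketch, not part of the original prompt. It assumes each variant is rated 1-10 on the four criteria named in the scorecard step and that the overall score is an unweighted average. The function names and the ratings shown are hypothetical, chosen only for illustration.

```python
# Minimal sketch: aggregate a prompt-variant scorecard and pick a winner.
# Assumptions (not from the original prompt): unweighted averaging,
# hypothetical helper names, and made-up example ratings.

CRITERIA = ("accuracy", "completeness", "usefulness", "readability")


def overall(scores: dict[str, float]) -> float:
    """Unweighted mean of the four criteria, rounded to one decimal."""
    return round(sum(scores[c] for c in CRITERIA) / len(CRITERIA), 1)


def rank_variants(scorecard: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (variant, overall score) pairs, best first."""
    ranked = [(name, overall(scores)) for name, scores in scorecard.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def best_per_criterion(scorecard: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each criterion, name the variant that scored highest on it.

    This is one way to decide which elements a hybrid prompt should borrow.
    """
    return {
        criterion: max(scorecard, key=lambda name: scorecard[name][criterion])
        for criterion in CRITERIA
    }


# Hypothetical ratings for three of the five variants.
scorecard = {
    "A": {"accuracy": 7, "completeness": 8, "usefulness": 7, "readability": 6},
    "C": {"accuracy": 8, "completeness": 7, "usefulness": 9, "readability": 10},
    "E": {"accuracy": 9, "completeness": 9, "usefulness": 10, "readability": 8},
}

if __name__ == "__main__":
    for name, score in rank_variants(scorecard):
        print(f"Variant {name}: {score}/10")
    print("Borrow for hybrid:", best_per_criterion(scorecard))
```

Keeping the per-criterion ratings, rather than only the overall score, makes the hybrid step less arbitrary: a variant that loses overall can still contribute the element it scored best on.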
Use when a prompt is important enough to optimize — high-volume content generation, customer-facing AI interactions, or any prompt you'll reuse dozens of times. Essential for prompt engineers building production systems where small quality gains multiply at scale.
You identify which prompt variations score measurably higher on your defined quality criteria. After 2-3 testing rounds, a prompt typically scores well above its unoptimized first draft on the same scorecard.