Grade Any AI Output — Know If It's Actually Good or Just Sounds Good

AI can sound confident while being wrong. This prompt turns AI into its own quality checker.

Copy & Paste this prompt

You are an AI output quality analyst. Your job is to critically evaluate AI-generated content — not praise it.

I'll give you an AI output. Grade it ruthlessly on these dimensions:

1. ACCURACY (0-10) — Are the facts correct? Any hallucinations or made-up information? Flag anything that needs verification.
2. COMPLETENESS (0-10) — Does it fully answer the question? What's missing?
3. ACTIONABILITY (0-10) — Can someone actually USE this output? Or is it vague motivational fluff?
4. ORIGINALITY (0-10) — Is this generic copy-paste thinking, or does it show genuine insight?
5. STRUCTURE (0-10) — Is it well-organized? Easy to scan? Properly formatted?

Then provide:
- OVERALL GRADE (A+ to F)
- TOP 3 WEAKNESSES — Be specific, not polite
- REWRITE SUGGESTION — Show how the worst section should have been written
- VERDICT — "Would a human expert approve this?" (Yes / With edits / No / Dangerous)

The AI output to evaluate:
[PASTE THE AI OUTPUT HERE]

Original question/prompt that produced it:
[PASTE THE ORIGINAL PROMPT]
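
If you run this prompt from a script instead of pasting by hand, the two bracketed placeholders can be filled in code. Here is a minimal Python sketch, assuming the template text is stored exactly as written above. `fill_grading_prompt` and the two constant names are hypothetical, made up for illustration, and the step that sends the result to a model is deliberately left out.

```python
import re

# Placeholder strings copied verbatim from the prompt above.
OUTPUT_SLOT = "[PASTE THE AI OUTPUT HERE]"
PROMPT_SLOT = "[PASTE THE ORIGINAL PROMPT]"


def fill_grading_prompt(template: str, ai_output: str, original_prompt: str) -> str:
    """Return `template` with both placeholders replaced (hypothetical helper)."""
    values = {OUTPUT_SLOT: ai_output.strip(), PROMPT_SLOT: original_prompt.strip()}
    for slot in values:
        if slot not in template:
            raise ValueError(f"template is missing {slot!r}")
    # Replace both slots in a single pass so that text pasted into one slot
    # is never mistaken for the other placeholder.
    pattern = re.compile("|".join(re.escape(slot) for slot in values))
    return pattern.sub(lambda match: values[match.group(0)], template)


if __name__ == "__main__":
    template = (
        "The AI output to evaluate:\n"
        f"{OUTPUT_SLOT}\n\n"
        "Original question/prompt that produced it:\n"
        f"{PROMPT_SLOT}"
    )
    print(fill_grading_prompt(template, "React is always faster.", "Compare React and Vue."))
```

The early `ValueError` is a design choice: a template that lost a placeholder during editing would otherwise produce a grading request with nothing to grade.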

#quality-check #fact-checking #ai-evaluation #critical-thinking

Works with

ChatGPT, Claude, Gemini

💡 Pro Tips

• Use this BEFORE publishing or acting on any AI-generated content
• Run it on a different AI model than the one that created the output
• Keep a log of common AI failure patterns you discover
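
The failure-pattern log from the last tip does not need to be fancy. One possible shape, sketched in Python: append one JSON line per failure, then count which patterns recur. The file layout, the field names, and the functions `log_failure` and `top_patterns` are all assumptions made for illustration.

```python
import json
from collections import Counter
from datetime import date
from pathlib import Path


def log_failure(log_path, model, pattern, note, day=None):
    """Append one failure record to a JSON Lines file and return it."""
    entry = {
        "date": (day or date.today()).isoformat(),
        "model": model,      # which AI produced the bad output
        "pattern": pattern,  # short label you reuse, e.g. "invented citation"
        "note": note,        # free-text detail about this occurrence
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def top_patterns(log_path, n=3):
    """Return the n most frequent pattern labels as (label, count) pairs."""
    with Path(log_path).open(encoding="utf-8") as handle:
        counts = Counter(
            json.loads(line)["pattern"] for line in handle if line.strip()
        )
    return counts.most_common(n)
```

Reusing the same short `pattern` label each time is what makes the count meaningful. Free-form descriptions would each be counted once.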

✨ Example Output

ACCURACY: 6/10 — Paragraph 3 claims 'React is faster than Vue in all benchmarks' which is incorrect. Vue 3's compiler optimizations outperform React in several scenarios...

ACTIONABILITY: 4/10 — Says 'optimize your code' but never shows HOW. Classic AI fluff.

OVERALL GRADE: C+
VERDICT: With edits — A human expert would flag 3 inaccuracies and demand specifics.
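
If you grade many outputs, a scorecard shaped like the example above can be turned into data. The Python sketch below assumes the model keeps the `NAME: n/10`, `OVERALL GRADE:` and `VERDICT:` line formats shown in the example. Nothing guarantees that, so missing fields come back empty or as `None` instead of raising an error. `parse_scorecard` is a hypothetical helper name.

```python
import re

# Line shapes taken from the example scorecard; real model output may differ.
SCORE_LINE = re.compile(r"^(?P<name>[A-Z]+):\s*(?P<score>\d{1,2})/10\b", re.MULTILINE)
GRADE_LINE = re.compile(r"^OVERALL GRADE:\s*(?P<grade>[A-DF][+-]?)[ \t]*$", re.MULTILINE)
VERDICT_LINE = re.compile(
    r"^VERDICT:\s*(?P<verdict>Yes|With edits|No|Dangerous)\b", re.MULTILINE
)


def parse_scorecard(text: str) -> dict:
    """Extract dimension scores, overall grade, and verdict from a scorecard."""
    scores = {
        match["name"]: int(match["score"])
        for match in SCORE_LINE.finditer(text)
        if int(match["score"]) <= 10  # ignore impossible values such as 42/10
    }
    grade = GRADE_LINE.search(text)
    verdict = VERDICT_LINE.search(text)
    return {
        "scores": scores,
        "grade": grade["grade"] if grade else None,
        "verdict": verdict["verdict"] if verdict else None,
    }


if __name__ == "__main__":
    sample = (
        "ACCURACY: 6/10 - one claim is incorrect\n"
        "ACTIONABILITY: 4/10 - never shows how\n"
        "OVERALL GRADE: C+\n"
        "VERDICT: With edits - needs specifics\n"
    )
    print(parse_scorecard(sample))
```

A parsed verdict of `No` or `Dangerous` is a natural signal to stop and review the output by hand before publishing.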

🧠 Why This Works

Using AI as its own quality evaluator exploits the model's ability to apply an explicit rubric fairly consistently. By defining the grading criteria up front (accuracy, completeness, actionability, originality, structure), you turn a subjective quality judgment into a structured, repeatable evaluation.

📅 When to Use This Prompt

Use it on any important AI output before you publish or act on it. It is especially useful for content creators fact-checking articles, developers reviewing AI-generated code, and researchers validating AI-synthesized information.

🎯 What You'll Get

You get a scorecard with 0-10 ratings on the five dimensions, an overall letter grade, the top three weaknesses, a rewrite of the worst section, and a verdict on whether a human expert would approve. Claims that need verification are flagged so you know what to check.
