The BS Detector — Catch AI Hallucinations Before They Catch You
A systematic prompt that forces AI to flag its own uncertain claims. Trust but verify — automatically.
AI can sound confident while being wrong. This prompt turns AI into its own quality checker.
You are an AI output quality analyst. Your job is to critically evaluate AI-generated content, not praise it.

I'll give you an AI output. Grade it ruthlessly on these dimensions:

1. ACCURACY (0-10): Are the facts correct? Any hallucinations or made-up information? Flag anything that needs verification.
2. COMPLETENESS (0-10): Does it fully answer the question? What's missing?
3. ACTIONABILITY (0-10): Can someone actually USE this output? Or is it vague motivational fluff?
4. ORIGINALITY (0-10): Is this generic copy-paste thinking, or does it show genuine insight?
5. STRUCTURE (0-10): Is it well-organized? Easy to scan? Properly formatted?

Then provide:

- OVERALL GRADE (A+ to F)
- TOP 3 WEAKNESSES: Be specific, not polite
- REWRITE SUGGESTION: Show how the worst section should have been written
- VERDICT: "Would a human expert approve this?" (Yes / With edits / No / Dangerous)

The AI output to evaluate:
[PASTE THE AI OUTPUT HERE]

Original question/prompt that produced it:
[PASTE THE ORIGINAL PROMPT]
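If you run this prompt from a script rather than pasting by hand, the two bracketed placeholders can be filled in programmatically. The following is a minimal sketch, not part of the original prompt: the function name `fill_evaluation_prompt` and the shortened template in the usage lines are hypothetical, and only the two placeholder strings come from the prompt above.

```python
import re

# The two placeholder strings exactly as they appear in the prompt text.
AI_OUTPUT_SLOT = "[PASTE THE AI OUTPUT HERE]"
ORIGINAL_PROMPT_SLOT = "[PASTE THE ORIGINAL PROMPT]"

# One pattern that matches either placeholder, so substitution happens in a
# single pass and pasted text that happens to contain a placeholder string
# is never substituted a second time.
_SLOT_RE = re.compile(
    "|".join(re.escape(slot) for slot in (AI_OUTPUT_SLOT, ORIGINAL_PROMPT_SLOT))
)


def fill_evaluation_prompt(template: str, ai_output: str, original_prompt: str) -> str:
    """Return the template with both bracketed placeholders replaced."""
    for slot in (AI_OUTPUT_SLOT, ORIGINAL_PROMPT_SLOT):
        if slot not in template:
            raise ValueError(f"template is missing placeholder: {slot}")
    values = {
        AI_OUTPUT_SLOT: ai_output.strip(),
        ORIGINAL_PROMPT_SLOT: original_prompt.strip(),
    }
    return _SLOT_RE.sub(lambda match: values[match.group(0)], template)


# Hypothetical, shortened template for illustration; use the full prompt in practice.
short_template = (
    "Grade the output below.\n"
    "The AI output to evaluate:\n[PASTE THE AI OUTPUT HERE]\n"
    "Original question/prompt that produced it:\n[PASTE THE ORIGINAL PROMPT]"
)
filled = fill_evaluation_prompt(
    short_template,
    "React is faster than Vue in all benchmarks.",
    "Compare React and Vue performance.",
)
print(filled)
```

Raising an error when a placeholder is missing guards against silently sending a prompt with nothing to evaluate.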
ACCURACY: 6/10 - Paragraph 3 claims 'React is faster than Vue in all benchmarks', which is incorrect. Vue 3's compiler optimizations outperform React in several scenarios...
ACTIONABILITY: 4/10 - Says 'optimize your code' but never shows HOW. Classic AI fluff.
OVERALL GRADE: C+
VERDICT: With edits - A human expert would flag 3 inaccuracies and demand specifics.
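A scorecard in this shape can be read back into structured data, for example to log scores over time. The sketch below assumes the model answers in the `NAME: n/10`, `OVERALL GRADE:` and `VERDICT:` layout of the sample above. Real responses may drift from that layout, so the parser leaves anything it cannot find as `None`. The names `Scorecard` and `parse_scorecard` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Patterns assume the layout of the sample scorecard, e.g. "ACCURACY: 6/10".
SCORE_RE = re.compile(r"\b([A-Z]+):\s*(\d{1,2})\s*/\s*10\b")
GRADE_RE = re.compile(r"OVERALL GRADE:\s*([A-F][+-]?)(?![A-Za-z])")
VERDICT_RE = re.compile(r"VERDICT:\s*(With edits|Yes|No|Dangerous)")


@dataclass
class Scorecard:
    scores: Dict[str, int] = field(default_factory=dict)
    grade: Optional[str] = None
    verdict: Optional[str] = None


def parse_scorecard(text: str) -> Scorecard:
    """Extract dimension scores, overall grade, and verdict from a response."""
    card = Scorecard()
    for name, value in SCORE_RE.findall(text):
        score = int(value)
        if 0 <= score <= 10:  # ignore out-of-range values such as 12/10
            card.scores[name] = score
    grade_match = GRADE_RE.search(text)
    if grade_match:
        card.grade = grade_match.group(1)
    verdict_match = VERDICT_RE.search(text)
    if verdict_match:
        card.verdict = verdict_match.group(1)
    return card


sample = (
    "ACCURACY: 6/10 - Paragraph 3 claims 'React is faster than Vue in all "
    "benchmarks', which is incorrect. "
    "ACTIONABILITY: 4/10 - Says 'optimize your code' but never shows HOW. "
    "OVERALL GRADE: C+ "
    "VERDICT: With edits - A human expert would flag 3 inaccuracies."
)
card = parse_scorecard(sample)
print(card.scores)   # {'ACCURACY': 6, 'ACTIONABILITY': 4}
print(card.grade)    # C+
print(card.verdict)  # With edits
```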
Using AI as its own quality evaluator exploits the model's ability to apply a rubric consistently. By defining explicit grading criteria (accuracy, completeness, actionability, originality, and structure), you turn subjective quality assessment into a structured, repeatable evaluation.
Use it on any important AI output before you publish it or act on it. It is especially useful for content creators fact-checking articles, developers reviewing AI-generated code, and researchers validating AI-synthesized information.
You get a scorecard with 0-10 ratings on five quality dimensions, an overall grade, the top three weaknesses, a rewrite of the worst section, and a verdict. Claims that need verification are flagged under ACCURACY, so you know what to check.