The Plausibility Trap
AI is trained to produce plausible text. Plausibility and accuracy are related but not identical. A response can be entirely plausible — using the right vocabulary, following the right structure, citing the right type of sources — while containing claims that are factually false.
The plausibility trap: you evaluate AI output by asking "does this seem reasonable?" instead of "is this true?" The first question is answerable by reading. The second requires verification.
Expert AI users shift their default question from "does this seem right?" to "what would need to be true for this to be wrong?" This inversion surfaces specific claims to verify rather than general impressions to confirm.
The Five Verification Triggers
You don't need to verify everything. You need to verify outputs that trigger any of these:
- Specific numbers: Any statistic, percentage, or quantitative claim. Specific figures are especially prone to hallucination.
- Citations and references: Always check that cited sources exist and say what the AI claims they say.
- Claims about specific individuals or organizations: AI frequently confabulates details about real entities.
- Confident statements that contradict what you know in your own domain of expertise: When AI contradicts your expertise, you're probably right, so check the claim instead of deferring to it.
- Anything that would be embarrassing if wrong: Apply proportional scrutiny to consequential outputs.
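The first two triggers are mechanical enough to pre-screen in code. The sketch below is a rough illustration, not a complete checker: the function name `flag_for_verification` and both regular expressions are hypothetical, the citation pattern assumes an author-year style such as "(Smith et al., 2021)", and the remaining three triggers still require human judgment. It only points at text to check by hand; it does not check anything itself.

```python
import re

# Hypothetical heuristic patterns covering only two of the five triggers.
# They over-match on purpose: a false flag costs a glance, a miss costs an error.
TRIGGER_PATTERNS = {
    "specific number": re.compile(r"\d+(?:[.,]\d+)*\s?%?"),
    "citation": re.compile(r"\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)"),
}


def flag_for_verification(text):
    """Return (trigger, matched_text) pairs for claims worth checking by hand."""
    flags = []
    for trigger, pattern in TRIGGER_PATTERNS.items():
        for match in pattern.finditer(text):
            flags.append((trigger, match.group().strip()))
    return flags


sample = "Adoption rose 37% last year (Smith et al., 2021)."
for trigger, snippet in flag_for_verification(sample):
    print(f"{trigger}: {snippet}")
```

Note that the year inside the citation is also flagged as a number. That noise is acceptable here, since the goal is a list of places to look, not a verdict.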
Productive Pushback Techniques
Critical thinking with AI isn't just verification — it's active dialogue. When an AI response seems incomplete or incorrect, productive pushback gets better outputs:
"That doesn't match what I know about [X]. Can you explain your reasoning?" — Forces the model to expose its chain of reasoning, which often reveals where it went wrong or why there's apparent disagreement.
"What would need to be true for the opposite position to be correct?" — Forces a more balanced analysis and often surfaces important caveats the first response omitted.
"What's the strongest evidence against your recommendation?" — Steelmanning the alternative reveals whether the AI actually analyzed the question or just generated the most statistically common answer.
"You said [claim]. What's your source, and how confident are you in this?" — Explicit confidence calibration. AI often reduces its certainty when directly asked, revealing where uncertainty was hidden in confident-sounding prose.
The Expert Standard
The best test for critical evaluation: would a genuine expert in this domain find this useful, or would they find it superficially plausible but missing important nuance?
If you're not that expert, find one. The most valuable thing you can do with an important AI-generated analysis is ask someone with deep domain expertise to read it critically. What do they agree with? What do they see as wrong or oversimplified?