Module 16 · Tier 6 — Critical Evaluation

Critical Thinking with AI

When to trust AI, when to push back, and how to spot the errors that look like good answers.

13 min

Why This Matters

The most dangerous AI outputs are not the obviously wrong ones — those you catch immediately. The dangerous ones are the ones that look right: well-structured, confident, plausible, and wrong in ways that are hard to detect without domain expertise or primary-source verification. This module is about developing the critical faculties to catch what casual review misses.

AI is trained to produce plausible text. Plausibility and accuracy are related but not identical. A response can be entirely plausible (using the right vocabulary, following the right structure, citing the right kind of sources) while containing claims that are factually false.

Expert AI users shift their default question from "does this seem right?" to "what would need to be true for this to be wrong?" This inversion surfaces specific claims to verify rather than general impressions to confirm.

The Concept

The Plausibility Trap

The plausibility trap: you evaluate AI output by asking "does this seem reasonable?" instead of "is this true?" The first question is answerable by reading. The second requires verification.

The Five Verification Triggers

You don't need to verify everything. You need to verify outputs that trigger any of these:

Specific numbers: Any statistic, percentage, or other quantitative claim. These are especially prone to hallucination.
Citations and references: Always check that cited sources exist and say what the AI claims they say.
Claims about specific individuals or organizations: AI frequently confabulates details about real entities.
Confident statements in your domain of expertise that contradict your knowledge: Trust your expertise when the AI contradicts it. You are probably right, and the conflict marks exactly the claim to check.
Anything that would be embarrassing if wrong: Scale your scrutiny to the consequences of an error.
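
As a rough illustration, the first three triggers can be approximated mechanically. The sketch below is a minimal Python example, not a real verification tool: the names `TRIGGER_PATTERNS`, `flag_triggers`, and `flag_claims` are hypothetical, and the regular expressions are crude heuristics chosen for this example. It assumes you have already split the AI response into individual claims. The last two triggers (contradicting your expertise, and being embarrassing if wrong) depend on human judgment and are deliberately left out.

```python
import re

# Hypothetical heuristic patterns, one per mechanically detectable trigger.
# They will miss some cases and over-flag others; they only suggest where to look.
TRIGGER_PATTERNS = {
    # Any digit: statistics, percentages, years, counts.
    "specific number": re.compile(r"\d"),
    # Common citation markers: "et al.", "(2019)", "doi", or a URL.
    "citation or reference": re.compile(
        r"\bet al\.|\(\d{4}\)|\bdoi\b|https?://", re.IGNORECASE
    ),
    # Two or more consecutive capitalized words, a rough proxy for a proper name.
    "named individual or organization": re.compile(
        r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"
    ),
}


def flag_triggers(claim):
    """Return the names of the triggers that a single claim appears to hit."""
    return [name for name, pattern in TRIGGER_PATTERNS.items() if pattern.search(claim)]


def flag_claims(claims):
    """Map each claim that hits at least one trigger to its list of triggers."""
    flagged = {}
    for claim in claims:
        triggers = flag_triggers(claim)
        if triggers:
            flagged[claim] = triggers
    return flagged


if __name__ == "__main__":
    # Made-up example claims, for illustration only.
    example_claims = [
        "Revenue grew 12% in 2021.",
        "Smith et al. (2019) report the effect.",
        "Verification takes effort.",
    ]
    for claim, triggers in flag_claims(example_claims).items():
        print(f"VERIFY: {claim} -> {', '.join(triggers)}")
```

A flag here only means "check this against a primary source." An unflagged claim is not thereby trustworthy, so the two judgment-based triggers still have to be applied by hand.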

Productive Pushback Techniques

Critical thinking with AI isn't just verification — it's active dialogue. When an AI response seems incomplete or incorrect, productive pushback gets better outputs:

"That doesn't match what I know about [X]. Can you explain your reasoning?" — Forces the model to expose its chain of reasoning, which often reveals where it went wrong or why there's apparent disagreement.

"What would need to be true for the opposite position to be correct?" — Forces a more balanced analysis and often surfaces important caveats the first response omitted.

"What's the strongest evidence against your recommendation?" — Steelmanning the alternative reveals whether the AI actually analyzed the question or just generated the most statistically common answer.

"You said [claim]. What's your source, and how confident are you in this?" — Explicit confidence calibration. AI often reduces its certainty when directly asked, revealing where uncertainty was hidden in confident-sounding prose.

The Expert Standard

The best critical evaluation framework: would a genuine expert in this domain find this useful, or would they find it superficially plausible but missing important nuance?

If you're not that expert, find one. The most valuable thing you can do with an important AI-generated analysis is ask someone with deep domain expertise to read it critically. What do they agree with? What do they see as wrong or oversimplified?

Catching plausible fabrication

An example of plausible fabrication that passes casual review:

A student asks AI to support their thesis with citations. AI provides five citations — author, title, journal, year, page numbers. All formatted correctly. The student uses them.

Two of the citations don't exist. The journals are real, the authors are real researchers, but those specific papers were never written. The fabrication is plausible because all the components are real; only the specific combination is invented.

How to catch it: a 5-minute Google Scholar search on each citation before use. Treat this as a universal rule for AI-generated citations: verify every single one before citing it. No exceptions.

Hands-On Exercise

Apply critical thinking to a complex AI output

Ask an AI a complex question in your professional domain, something where you have genuine expertise and could identify errors. When you get the response:

1. List every specific claim that could be verified independently.
2. For each claim, mark it: probably right (matches your expertise), uncertain (outside your expertise), or suspicious (contradicts what you know).
3. Verify the two claims you marked as most suspicious.
4. Apply one productive pushback technique to an aspect of the response you found incomplete.

Write up: How accurate was the response overall? What type of error did it make most? What did the pushback technique surface that the initial response missed?

Choose a domain where you actually have expertise. You can't judge the accuracy of an answer on a topic you know nothing about.

Active Recall

Before moving on — close this lesson and answer these from memory. Then come back and check. Testing yourself (not re-reading) is how this sticks.

1. What is the plausibility trap in AI evaluation? How do you change your default evaluation question to avoid it?

2. Name the five verification triggers. Give a real example from your work where each one would apply.

Reflection

What is the biggest critical thinking gap in your current AI use? Where are you most likely to accept plausible-sounding output without sufficient verification?

Key Takeaway

The most dangerous AI errors are plausible ones. Shift from "does this seem right?" to "what would need to be true for this to be wrong?" Apply the five verification triggers. Use productive pushback to expose reasoning and surface hidden caveats.