Why This Matters

You don't need to understand the math to work effectively with AI. But you do need to understand the three mechanics that explain most of the weird things AI does — including why it forgets earlier parts of a conversation, why the same prompt can give wildly different results, and why AI sometimes "goes off the rails." Once you understand tokens, context windows, and temperature, most AI surprises stop being mysterious.

The Concept
Core Reading
Tier 1: Orientation (11 min)

🔤 Tokens: The Unit of Thought

AI doesn't read text the way you do. It breaks language into tokens: chunks that are often whole words but can also be fragments of words. Common words are usually one token. Rare or long words may be several. Punctuation is often its own token.

💡 Why tokens matter in practice:

  • Context window limits (like "128K context") are in tokens, not words
  • Every generated token is influenced by every previous token still in the context window
  • Unusual words or names get split unexpectedly, making the model less reliable on them
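To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical and invented for illustration; real models use much larger learned vocabularies and different algorithms, so treat this only as a picture of the idea that common words stay whole while unfamiliar ones get chopped up.

```python
# Toy vocabulary: purely illustrative, not taken from any real model.
TOY_VOCAB = {"the", "cat", "sat", "un", "believ", "able", "token", "s", ".", ","}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Split text into vocabulary pieces using greedy longest-match."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # No known piece matched: fall back to a single character.
                tokens.append(word[start])
                start += 1
    return tokens


print(toy_tokenize("The cat sat."))         # common words: one token each
print(toy_tokenize("unbelievable tokens"))  # longer words: several pieces
print(toy_tokenize("Zyx"))                  # unknown name: split into characters
```

In this toy setup a three-word sentence costs four tokens, while a single unfamiliar name costs three, which is why token counts and word counts can diverge.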

🧠 The Context Window: Your AI's Working Memory

An AI model only "knows" what's currently in its context window — your system prompt, the conversation history, and the current message. Think of it as working memory, not long-term memory.

Each conversation with an AI is a fresh piece of paper. Everything on that paper is available to it. Anything not on that paper doesn't exist from the model's perspective.

What this means in practice:

  • In very long conversations, earlier messages may be dropped as the context fills up
  • The AI can't access previous conversations unless the platform stores and re-injects them
  • To make AI "remember" something important, keep it in the current context
  • Context windows are large (100K+ tokens for recent Claude and ChatGPT models, with the exact limit depending on the model) but not infinite
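One way a platform might handle a full context window is to keep the system prompt and drop the oldest messages first. The sketch below assumes exactly that policy; `estimate_tokens` and `fit_to_window` are hypothetical helpers, and counting one token per word is a deliberate simplification. Actual platforms may summarize, truncate differently, or refuse the request instead.

```python
def estimate_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(system_prompt: str, history: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus the most recent messages that fit the budget."""
    budget = max_tokens - estimate_tokens(system_prompt)
    kept = []
    # Walk backwards so the newest messages are considered first.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost
    # Restore chronological order for the messages that survived.
    return [system_prompt] + kept[::-1]


history = ["My name is Ada.", "Tell me about rivers.", "What about lakes?"]
print(fit_to_window("Be brief.", history, max_tokens=10))
```

With a budget of 10 toy tokens, the opening message containing the name no longer fits, so from the model's perspective it never existed.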

🎲 Temperature: Why the Same Prompt Gets Different Results

AI doesn't always pick the single most probable next token — that would produce repetitive output. Instead, it samples from a probability distribution. Temperature controls how this sampling works:

  • Low (0.0–0.3): Picks highest-probability tokens. Consistent, predictable. Best for code and factual tasks.
  • Medium (0.5–0.7): Balances likely and interesting. Most conversational AI defaults here.
  • High (0.8–1.0+): Wider range of tokens selected. More creative and varied, but more likely to go off track.

Running the same prompt twice and getting different results isn't a bug. It's how sampling works. For creative tasks, variation is a feature. For consistency, state the exact format and constraints you want in the prompt, or lower the temperature where the platform exposes that setting.
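The effect of temperature can be sketched with the standard softmax-with-temperature formula: divide each token's raw score by the temperature, then convert the scores to probabilities. The candidate tokens and scores below are made up for illustration, and the function names are hypothetical. Real systems often combine temperature with other sampling settings, so this is a simplified picture rather than any product's actual implementation.

```python
import math
import random


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive here; 0 is handled as greedy choice")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens: list[str], scores: list[float],
                 temperature: float, rng: random.Random) -> str:
    """Pick one token: greedy at temperature 0, weighted random choice otherwise."""
    if temperature == 0:
        return tokens[scores.index(max(scores))]
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up candidates for the next word, with made-up scores.
tokens = ["door", "letter", "dragon"]
scores = [2.0, 1.0, 0.0]

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random()
print([sample_token(tokens, scores, 1.0, rng) for _ in range(5)])
```

At low temperature nearly all the probability sits on the top-scoring token, so repeated runs agree. As temperature rises, the less likely tokens get a real chance of being picked, which is where run-to-run variation comes from.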

🌀 Why AI "Hallucinates"

The model is always generating a statistically plausible continuation of the text it's seen. When you ask about a specific fact, it doesn't retrieve it from a database. It generates a response that looks like a factual answer, in the style factual answers are written.

If the fact was well-represented in training data, the output is often accurate. If it was rare, obscure, or recent, the model still generates something that looks like a confident factual claim — because that's what confident factual claims look like in text.
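A toy next-word model makes this visible. The sketch below counts which word follows which in four training sentences, then completes a prompt with the most common follower. It is vastly simpler than a real language model, and the function names are hypothetical, but it shows the same failure: asked about someone it has never seen, it still produces a confident-looking year.

```python
from collections import Counter

# Tiny illustrative "training data". Turing appears twice, so 1912 is the
# most common word after "in".
TRAINING_SENTENCES = [
    "ada lovelace was born in 1815",
    "alan turing was born in 1912",
    "grace hopper was born in 1906",
    "alan turing was born in 1912",
]


def build_next_word_counts(sentences: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training sentences."""
    counts: dict[str, Counter] = {}
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts.setdefault(prev, Counter())[nxt] += 1
    return counts


def complete(counts: dict[str, Counter], prompt: str) -> str:
    """Append the most frequent follower of the prompt's last word."""
    last_word = prompt.split()[-1]
    next_word = counts[last_word].most_common(1)[0][0]
    return prompt + " " + next_word


counts = build_next_word_counts(TRAINING_SENTENCES)
# Margaret Hamilton is not in the training data (she was born in 1936),
# but the model answers anyway, with the year it has seen most often.
print(complete(counts, "margaret hamilton was born in"))
```

Nothing in `complete` checks whether the answer is true. It only checks what usually comes next, which is the core of why fluent output and accurate output are different things.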
⚠️ The practical rule: Any specific claim (a date, a statistic, a quote, a citation) should be independently verified before you use it. Use AI for synthesis, structure, and generation. Use primary sources for facts.

The context window in real life

Two experiments you can run right now to see context and temperature in action:

1. Test the context window

Start a conversation with an AI. Tell it your name and one unusual fact about yourself. Have a normal 10-message conversation on a completely different topic. Then ask: "What do you remember about me from the start of this conversation?"

Notice whether it remembered, what it recalled accurately, and what it got wrong or fabricated.

2. Test temperature variation

Ask the same creative prompt three times in fresh conversations: "Write the opening sentence of a short story about someone discovering something unexpected."

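Compare the three sentences. They'll likely differ in structure, tone, and word choice, even though the instruction was identical. This is sampling in action: at a nonzero temperature, the model chooses among several likely tokens rather than always taking the top one.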

💡 These aren't tests to catch AI doing something wrong. They're calibration exercises: they build your intuition for how AI memory and variation actually work so you can design better prompts.

Hands-On Exercise

Test the boundaries of AI memory


This exercise has two parts. Both are quick and revealing.

Part 1 — Test the context window

Start a new conversation. Tell the AI your name and one unusual specific fact about yourself. Have a normal 10-message conversation on a completely different topic. Then ask: "What do you remember about me from the start of this conversation?"

Notice whether it remembered, what it recalled accurately, and what it got wrong or invented.

Part 2 — Test temperature

Ask the AI: "Write the opening sentence of a short story about someone discovering something unexpected." Run this exact prompt three times (start fresh each time). Compare the three results.

Are they similar? Different? What varies — word choices, structure, tone?

The goal is to build intuition through experience, not to catch the AI doing something wrong.
Active Recall

Before moving on — close this lesson and answer these from memory. Then come back and check. Testing yourself (not re-reading) is how this sticks.

1. What is a context window, and what happens to earlier parts of a conversation when the context window fills up?
2. Why does the same prompt sometimes produce different results when you run it twice? What mechanism is responsible?
3. A colleague sends you an AI-generated research summary with five specific statistics cited. What should you do before using those statistics in a presentation, and why?
Reflection
💭 Think of a time AI surprised you, either with something impressively good or unexpectedly wrong. Now that you understand tokens, context windows, and temperature, can you explain what probably happened? How does having a mechanical explanation for AI behavior change how you relate to it?

Key Takeaway

AI processes text in tokens, operates only within the current context window, and samples outputs probabilistically. There is no verification step — the model generates plausible-sounding text regardless of accuracy. Understanding these mechanics turns AI surprises into predictable patterns.