Why This Matters

You don't need to understand the math to work effectively with AI. But you do need to understand the three mechanics that explain most of the weird things AI does — including why it forgets earlier parts of a conversation, why the same prompt can give wildly different results, and why AI sometimes "goes off the rails." Once you understand tokens, context windows, and temperature, most AI surprises stop being mysterious.

The Concept

Tokens: The Unit of Thought

AI doesn't read text the way you do. It breaks language into tokens — chunks of text that are roughly, but not exactly, words. Common words are usually single tokens. Rare or long words might be multiple tokens. Punctuation is often its own token.

This matters because:

  • AI counts tokens, not words. When you see a "context window" limit (like "128K context"), that's tokens, not words or characters.
  • AI processes each token in relation to all previous tokens. Every word it generates is influenced by every token that came before it in your conversation.
  • Tokenization is why AI sometimes mangles unusual words or names. A word it has rarely seen gets split into unexpected token chunks, and the prediction process becomes less reliable. (The sketch after this list shows the splitting directly.)
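
You can see tokenization for yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library; other models use different tokenizers, so the exact counts vary, but the pattern (common words stay whole, rare words get split) holds broadly. The sample words are arbitrary.

```python
# Minimal tokenization sketch using OpenAI's tiktoken library
# (pip install tiktoken). Counts are specific to this tokenizer;
# other models split text differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["the", "hello world", "antidisestablishmentarianism", "Zyqlor"]:
    token_ids = enc.encode(text)            # text -> list of token IDs
    pieces = [enc.decode([t]) for t in token_ids]  # each ID back to its chunk
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```

With this tokenizer, "the" comes back as a single token, while the long word and the made-up name split into several chunks. That splitting is exactly where predictions get shakier.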

The Context Window: Your AI's Working Memory

An AI model only "knows" what's currently in its context window. The context window is everything the model can see at once — your system prompt, the conversation history, and the current message.

Think of it as working memory, not long-term memory. Once something leaves the context window, the model genuinely doesn't have access to it. It didn't "forget" the way a person forgets; from the model's side, the information simply isn't there anymore.

What this means in practice:

  • In a very long conversation, earlier messages may get dropped or summarized as the context fills up
  • The AI can't access your previous conversations (unless the platform stores and re-injects them)
  • If you want the AI to "remember" something important, you need to keep it in the current context — either repeat it or keep the conversation window fresh
  • Context windows have gotten very large (Claude and ChatGPT both support 100K+ tokens), but they still have limits

A useful mental model: each conversation with an AI is a fresh piece of paper. Everything on that piece of paper is available to it. Anything not on that piece of paper doesn't exist from the model's perspective.
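
To make the "piece of paper" concrete, here's a hypothetical sketch of the budgeting problem every chat platform has to solve: the conversation grows, the window doesn't. The count_tokens helper and the drop-oldest strategy are illustrative assumptions; real platforms use real tokenizers and often summarize older messages instead of dropping them.

```python
# Hypothetical sketch: keeping a conversation inside a fixed token
# budget by dropping the oldest messages first. Real platforms use
# real tokenizers and smarter strategies (summarization, pinned
# system prompts), but the underlying constraint is the same.

def count_tokens(message: str) -> int:
    # Crude stand-in: roughly one token per four characters of English.
    return max(1, len(message) // 4)

def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the total fits the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the earliest message falls off the "piece of paper"
    return kept

history = [f"message {i}: " + "conversation text " * 10 for i in range(50)]
visible = trim_to_budget(history, budget=500)
print(f"{len(visible)} of {len(history)} messages still fit in context")
```

Everything in `visible` is on the piece of paper; everything trimmed away no longer exists from the model's perspective.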

Temperature: Why the Same Prompt Gets Different Results

When an AI model is generating text, it doesn't always pick the single most probable next token. That would produce repetitive, boring output. Instead, it samples from a probability distribution: a ranked list of possible next tokens, each with an associated probability.

Temperature is the dial that controls how this sampling works:

  • Low temperature (0.0-0.3): Almost always picks the highest-probability token. Outputs are consistent, predictable, and sometimes repetitive. Good for factual or code tasks.
  • Medium temperature (0.5-0.7): Balances between likely and interesting. Most conversational AI defaults to something in this range.
  • High temperature (0.8-1.0+): Picks from a wider range of possible tokens, including lower-probability options. Outputs are more creative and varied, but also more likely to go off track. (The sketch after this list shows how the dial reshapes the probabilities.)
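
Here's a minimal sketch of what the dial actually does, assuming a made-up set of candidate tokens and scores. The logits are invented for illustration; in a real model there are tens of thousands of candidates at every step.

```python
# Minimal sketch of temperature-scaled sampling. A model assigns a
# score (logit) to every candidate next token; temperature rescales
# those scores before they become probabilities.
import numpy as np

rng = np.random.default_rng(seed=0)
tokens = ["the", "a", "quantum", "banana"]
logits = np.array([3.0, 2.5, 0.5, -1.0])  # invented scores for illustration

def sample(logits, temperature):
    scaled = logits / temperature            # low T sharpens, high T flattens
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

for t in (0.2, 0.7, 1.5):
    picks = [tokens[sample(logits, t)] for _ in range(8)]
    print(f"T={t}: {picks}")
```

At T=0.2 the samples are almost all "the"; at T=1.5 the low-probability options start showing up. Temperature 0 is the limiting case of always taking the top token (the division above would need a special case for it).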

This explains why you can run the same prompt twice and get meaningfully different results. It isn't a bug; it's how the sampling process works. For tasks where consistency matters (code, factual summaries), lower the temperature if your tool or API exposes the setting, or explicitly ask for a consistent output style. For creative tasks, the variation is a feature.

Why AI "Hallucinates" (The Technical Explanation)

You now have enough background to understand hallucinations mechanically.

The model is always generating the most statistically plausible continuation of the text it's seen. When you ask it about a specific fact — say, the date of a particular event, or the exact content of a specific document — the model doesn't retrieve that fact from a database. It generates a response that looks like a factual answer, in the style that factual answers are written.

If the correct fact was well-represented in the training data, the most probable continuation will often be accurate. If the fact was rare, obscure, recent, or simply not well-represented in training, the model will still generate something that looks like a confident factual claim — because that's what a confident factual claim looks like in text.

The model has no internal flag that says "I'm not sure about this." It just keeps predicting. The confidence of the output is a stylistic feature of the text it generates, not a signal about the accuracy of the content.

The practical rule: Any specific claim — a date, a statistic, a quote, a citation — should be independently verified before you use it. Use AI for synthesis, structure, and generation. Use primary sources for facts.

The Context Window in Real Life

Here's a demonstration you can try: start a very long conversation with an AI, or paste in a very long document, and then ask about something from the very beginning of the conversation. Pay attention to whether the AI accurately recalls it or seems to have lost track.

A simpler version: check whether your platform shows context usage directly in the interface; some do. (Asking the model itself how many tokens it has used is less reliable, since models can't accurately count their own context.)

Another revealing experiment: ask the same creative prompt twice in a row in a fresh conversation. Notice that the outputs differ in specific ways — word choices, structure, examples — even though the instruction was identical. This is temperature in action.

Hands-On Exercise

Test the Boundaries of AI Memory

This exercise has two parts.

**Part 1: Test the context.** Start a new conversation with any AI tool. Tell it your name and one unusual, specific fact about yourself (make one up if you prefer). Have a normal 10-message conversation about a different topic entirely. Then, at the end, ask: "What do you remember about me from the start of this conversation?" Notice whether it remembered, what it recalled accurately, and what (if anything) it got wrong.

**Part 2: Test temperature.** Ask the AI to write the opening sentence of a short story about someone discovering something unexpected. Run the exact same prompt three times (start fresh each time or ask again immediately). Compare the three opening sentences. Are they similar? Different? What varies? Jot down what you notice.

These aren't tests with right answers; they're building your intuition for how AI memory and variation actually work.
The goal is to build intuition through experience, not to catch the AI doing something wrong.
Active Recall

Before moving on — close this lesson and answer these from memory. Then come back and check. Testing yourself (not re-reading) is how this sticks.

1. What is a context window, and what happens to earlier parts of a conversation when the context window fills up?
2. Why does the same prompt sometimes produce different results when you run it twice? What mechanism is responsible?
3. A colleague sends you an AI-generated research summary with five specific statistics cited. What should you do before using those statistics in a presentation, and why?
Reflection

Think of a time AI surprised you — either with something impressively good or unexpectedly wrong. Now that you understand tokens, context windows, and temperature: can you explain what probably happened? How does having a mechanical explanation for AI behavior change how you relate to it?

Key Takeaway

AI processes text in tokens, operates only within the current context window, and samples outputs probabilistically. There is no verification step — the model generates plausible-sounding text regardless of accuracy. Understanding these mechanics turns AI surprises into predictable patterns.