Tokens: The Unit of Thought
AI doesn't read text the way you do. It breaks language into tokens — chunks of text that are roughly, but not exactly, words. Common words are usually single tokens. Rare or long words might be multiple tokens. Punctuation is often its own token.
This matters because:
- AI counts tokens, not words. When you see a "context window" limit (like "128K context"), that's tokens, not words or characters.
- AI processes each token in relation to all previous tokens. Every word it generates is influenced by every token that came before it in your conversation.
- Tokenization is why AI sometimes mangles unusual words or names. A word it has rarely seen gets split into unexpected token chunks, and the prediction process becomes less reliable.
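To make the splitting behavior concrete, here is a toy greedy longest-match tokenizer over a tiny hand-made vocabulary. Real tokenizers (such as BPE) learn much larger vocabularies from data, and this vocabulary is invented for illustration — but the effect it demonstrates is the same: common words stay whole, while rare words fall apart into subword chunks.

```python
# Hand-made vocabulary for illustration only; real tokenizers learn
# tens of thousands of entries from training data.
VOCAB = {"the", "cat", "sat", "un", "believ", "able", "token", "ization", "!", " "}

def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i;
        # fall back to a single character if nothing matches.
        match = next(
            (text[i:i + n] for n in range(len(text) - i, 0, -1)
             if text[i:i + n] in VOCAB),
            text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("the cat sat"))                  # common words: one token each
print(tokenize("unbelievable tokenization!"))   # rare words: several chunks
```

"unbelievable" comes out as three chunks (`un`, `believ`, `able`) — the model never sees the whole word as one unit, which is exactly why unusual words and names are handled less reliably.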
The Context Window: Your AI's Working Memory
An AI model only "knows" what's currently in its context window. The context window is everything the model can see at once — your system prompt, the conversation history, and the current message.
Think of it as working memory, not long-term memory. Once something leaves the context window, the model genuinely doesn't have access to it. It isn't "forgetting" the way a person forgets — the information is simply no longer there to be seen.
What this means in practice:
- In a very long conversation, earlier messages may get dropped or summarized as the context fills up
- The AI can't access your previous conversations (unless the platform stores and re-injects them)
- If you want the AI to "remember" something important, you need to keep it in the current context — repeat it periodically, or start a fresh conversation that restates it up front
- Context windows have gotten very large (Claude and ChatGPT both support 100K+ tokens), but they still have limits
A useful mental model: each conversation with an AI is a fresh piece of paper. Everything on that piece of paper is available to it. Anything not on that piece of paper doesn't exist from the model's perspective.
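The "piece of paper" model can be sketched in a few lines of code. This is not how any particular platform manages context — the token budget and the word-count "tokenizer" below are illustrative stand-ins — but it shows the basic mechanic: when the running total exceeds the budget, the oldest turns are dropped and are simply gone.

```python
# Illustrative sketch, not a real API: word count stands in for a real
# tokenizer, and the budget is arbitrary.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for real token counting

def trim_context(messages: list[tuple[str, str]], budget: int) -> list[tuple[str, str]]:
    # messages: (role, text) pairs; keep the system prompt, drop oldest turns.
    system, history = messages[0], list(messages[1:])
    def total() -> int:
        return sum(count_tokens(text) for _, text in [system] + history)
    while history and total() > budget:
        history.pop(0)  # this turn has left the model's "working memory"
    return [system] + history

chat = [
    ("system", "You are a helpful assistant"),
    ("user", "Here is some important background about my project"),
    ("assistant", "Got it, thanks for the context"),
    ("user", "Now answer a question"),
]
print(trim_context(chat, 12))
```

With a budget of 12, the "important background" message is among the first to go — which is why restating key facts in long conversations matters.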
Temperature: Why the Same Prompt Gets Different Results
When an AI model generates text, it doesn't simply pick the single most probable next token every time. That would produce repetitive, boring output. Instead, it samples from a probability distribution — a ranked list of possible next tokens with associated probabilities.
Temperature is the dial that controls how this sampling works:
- Low temperature (0.0-0.3): Almost always picks the highest-probability token. Outputs are consistent, predictable, and sometimes repetitive. Good for factual or code tasks.
- Medium temperature (0.5-0.7): Balances between likely and interesting. Most conversational AI defaults to something in this range.
- High temperature (0.8-1.0+): Picks from a wider range of possible tokens, including lower-probability options. Outputs are more creative and varied, but also more likely to go off track.
This explains why you can run the same prompt twice and get meaningfully different results — it's not a bug, it's how the sampling process works. For tasks where consistency matters (code, factual summaries), lower the temperature through an API or platform setting where one is available. For creative tasks, the variation is a feature.
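The temperature dial is easy to demonstrate directly: divide the model's scores (logits) by the temperature before converting them to probabilities, then sample. The three-token "vocabulary" and its scores below are made up for illustration; real models do the same thing over tens of thousands of tokens.

```python
import math
import random

def sample(logits: list[float], temperature: float, rng: random.Random) -> int:
    # Dividing by temperature sharpens the distribution when T is low
    # and flattens it when T is high.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r <= acc:
            return i
    return len(logits) - 1

logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate "next tokens"
rng = random.Random(0)
low = [sample(logits, 0.1, rng) for _ in range(1000)]   # near-greedy
high = [sample(logits, 2.0, rng) for _ in range(1000)]  # more varied
print(set(low), set(high))
```

At temperature 0.1 essentially every draw is the top token; at 2.0 the lower-probability tokens show up regularly. That gap between the two runs is the entire consistency-versus-creativity trade-off in miniature.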
Why AI "Hallucinates" (The Technical Explanation)
You now have enough background to understand hallucinations mechanically.
The model is always generating the most statistically plausible continuation of the text it's seen. When you ask it about a specific fact — say, the date of a particular event, or the exact content of a specific document — the model doesn't retrieve that fact from a database. It generates a response that looks like a factual answer, in the style that factual answers are written.
If the correct fact was well-represented in the training data, the most probable continuation will often be accurate. If the fact was rare, obscure, recent, or simply not well-represented in training, the model will still generate something that looks like a confident factual claim — because that's what a confident factual claim looks like in text.
The model has no internal flag that says "I'm not sure about this." It just keeps predicting. The confidence of the output is a stylistic feature of the text it generates, not a signal about the accuracy of the content.
The practical rule: Any specific claim — a date, a statistic, a quote, a citation — should be independently verified before you use it. Use AI for synthesis, structure, and generation. Use primary sources for facts.