You type a prompt into ChatGPT and hit send. It feels like the model is holding the whole conversation in its head. It is not.
Instead, it reads a slice of text called the context window. That window is its working memory, and it is measured in tokens instead of words. Every message you send and every response it generates has to fit inside that window.
This article shows the full path from prompt to tokens to predictions, with small demos you can play with.
From prompt to tokens
Start with the thing you already know: a normal chat prompt. The model does not see letters or words. It sees tokens, which are small pieces of text.
Tokens are not always whole words. A long word might become two or three tokens. Punctuation and spaces count too.
If you want the exact token splits for real models, try the tiktokenizer demo.
Notice the tokens that begin with a blank space. Real tokenizers often include that space inside the token instead of splitting it out. The chat role wrappers (the system and user markers) are added by the API, not typed by you.
Behind the scenes, every token is an integer ID. The model never sees the letters, only a long list of numbers. That is what a context window really is: a sequence of integers that map back to word fragments.
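To see this concretely, here is a minimal sketch using the open-source tiktoken library (the same family of tokenizers the tiktokenizer demo visualizes). The encoding name cl100k_base is just one common choice; different models ship different encodings.

```python
import tiktoken

# One common encoding; the exact splits depend on which model's tokenizer you load.
enc = tiktoken.get_encoding("cl100k_base")

text = "Context windows are measured in tokens, not words."
ids = enc.encode(text)          # the integer IDs the model actually sees
print(ids)

# Map each ID back to its text fragment; note the fragments that start with a space.
for token_id in ids:
    print(token_id, repr(enc.decode([token_id])))
```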
How the model predicts the next token
Inference is the step where the model predicts the next token. It looks at the whole sequence so far and assigns a probability to every token in its vocabulary.
In this demo, the model always picks the highest-probability token so the behavior is easy to see. Real chat models usually sample from that distribution instead, which is why regenerating a reply gives different wording.
Every new token becomes part of the next input. That is why the model feels consistent as it talks: it is always re-reading the full context window before predicting the next token.
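If you want to see that loop in code, here is a minimal greedy-decoding sketch using Hugging Face transformers with GPT-2. GPT-2 stands in only because it is small and downloadable; the models behind ChatGPT are not, but the loop has the same shape: score every candidate token, pick one, append it, repeat.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The context window is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):
        logits = model(ids).logits        # one score per vocabulary token, per position
        next_id = logits[0, -1].argmax()  # greedy: take the highest-probability token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # the new token joins the context

print(tokenizer.decode(ids[0]))
```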
Every message adds to the window
The context window is not just your prompt. It includes system instructions, your messages, the assistant’s replies, and any tool outputs. It all becomes one long sequence of tokens.
Click through the turns below and watch the window fill up.
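As a rough stand-in for the demo, the sketch below counts tokens per turn with tiktoken. Real chat APIs wrap each message in a few extra role tokens, so the true totals run slightly higher, but the shape is the same: every turn makes the sequence longer.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# A made-up three-turn conversation; system, user, and assistant turns all count.
messages = [
    {"role": "system",    "content": "You are a concise assistant."},
    {"role": "user",      "content": "Explain context windows in one line."},
    {"role": "assistant", "content": "The model reads one fixed-size sequence of tokens."},
]

total = 0
for message in messages:
    count = len(enc.encode(message["content"]))
    total += count
    print(f"{message['role']:>9}: {count} tokens (window so far: {total})")
```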
Tool calls add tokens too
When you ask for a web search, the assistant pauses and calls a tool. The tool returns text, and that text becomes tokens inside the same window. Then the assistant continues predicting the next tokens.
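Here is a small sketch of that accounting, with a made-up search result standing in for the tool output. The point is only that the tool's text is tokenized like any other message before the assistant continues.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# A hypothetical web-search result returned by a tool call.
tool_output = (
    "Search results:\n"
    "1. Context windows explained: tokens, not words\n"
    "2. Why long chats get summarized or truncated\n"
)

print("tool output adds", len(enc.encode(tool_output)), "tokens to the window")
```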
Other modalities still become tokens
PDFs, images, and voice are not magic inputs. They get converted into text or multimodal tokens before the model can reason about them. Those tokens count toward the same context window.
What this means for your prompts
Once you understand the window, you can work with it instead of fighting it.
- Keep your main goal in one short sentence near the top.
- Use bullet points for constraints so the model can re-read them quickly.
- Summarize long threads before asking for new work.
- Ask for compact outputs when the conversation gets long.
The practical takeaway
You do not need to count tokens by hand. Just use these three habits:
- Start with a short goal sentence.
- Put constraints in a clean list.
- If the chat gets long, ask for a short summary and continue from that summary.