You already know that language models like ChatGPT and Claude work by predicting the next token. Given some text, they predict what comes next. Do this repeatedly, and you get coherent sentences, paragraphs, entire documents. It feels like writing, but it's prediction all the way down.
This article builds on that foundation. We're going to show you how the same prediction mechanism that produces helpful prose can also produce structured instructions for external systems. The result is what people call an "agent" - but it's not a new kind of intelligence. It's prediction, plus tools, plus a loop.
What You Already Know
Language models predict tokens. A token is roughly a word or part of a word. Given the context "The weather in Noosa is", a model might predict "sunny" or "warm" or "perfect" as likely next tokens. It's not thinking about Noosa's actual weather - it's recognising a pattern and predicting what typically follows.
Try it yourself. Type a partial sentence and see what the model might predict next.
This works because language follows patterns, and the model has absorbed billions of those patterns during training. The prediction feels natural when it produces English words.
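To make that concrete, here's a toy sketch in Python. The `predict_next_token` function is a hypothetical stand-in for a real model - it hard-codes a tiny distribution - but the generate-by-repeated-prediction loop around it is the same shape the real thing uses.

```python
import random

# Hypothetical stand-in for a real model: it maps a context string to a tiny,
# hard-coded distribution over likely next tokens. A real model does the same
# thing over a vocabulary of tens of thousands of tokens.
def predict_next_token(context: str) -> dict[str, float]:
    if context.endswith("The weather in Noosa is"):
        return {" sunny": 0.45, " warm": 0.3, " perfect": 0.15, " humid": 0.1}
    return {".": 1.0}  # toy fallback so generation stops

# Predict one token, append it, repeat. This loop is all "writing" is.
def generate(context: str, max_tokens: int = 10) -> str:
    for _ in range(max_tokens):
        distribution = predict_next_token(context)
        tokens, weights = zip(*distribution.items())
        next_token = random.choices(tokens, weights=weights)[0]
        context += next_token
        if next_token == ".":
            break
    return context

print(generate("The weather in Noosa is"))
# e.g. "The weather in Noosa is sunny."
```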
Structure, Not Just Prose
Next-token prediction doesn't only generate fluent prose. If the system prompt defines a format, the model's predicted tokens tend to stay inside that shape.
Same user question. Two different system prompts. Notice how the output changes when the structure is declared upfront.
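Written down, the comparison looks roughly like this. Both prompts are illustrative, and the commented outputs show the expected shape rather than captured model responses.

```python
user_question = "What's the weather in Noosa?"

# System prompt 1: no structure declared. The likeliest next tokens are prose.
prose_system_prompt = "You are a helpful assistant. Answer in plain English."
# Expected shape of the response (illustrative):
#   "I don't have live weather data, but Noosa is usually warm and sunny..."

# System prompt 2: a structure is declared upfront, so tokens that fill that
# structure become the likeliest predictions.
structured_system_prompt = (
    "You are a helpful assistant. Always reply with a JSON object "
    'containing the keys "location", "answer" and "confidence".'
)
# Expected shape of the response (illustrative):
#   {"location": "Noosa", "answer": "No live data available.", "confidence": "low"}
```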
What If the Next Token Isn't a Word?
Here's the key insight: the model doesn't care whether it's predicting English words or something else. It predicts tokens based on context. If the context includes information about available tools, the model can predict tool calls instead of prose.
Consider this scenario. A model is told: "You have access to a tool called WEATHER that takes a location and returns the current weather." Then a user asks: "What's the weather in Noosa?"
The model predicts what comes next. And what comes next, given that context, is a tool call.
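What does that predicted tool call actually look like? Formats differ between providers, so treat the JSON below as an illustration of the shape rather than any particular API's schema. The crucial point: this is still just text the model produced.

```python
import json

# A predicted tool call, shown as JSON. The exact format varies by provider;
# this shape is illustrative. Nothing has been executed yet - these are just
# the tokens the model judged most likely to come next.
predicted_tool_call = {
    "tool": "WEATHER",
    "arguments": {"location": "Noosa"},
}

print(json.dumps(predicted_tool_call, indent=2))
```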
This isn't magic. The model is doing exactly what it always does - predicting the most likely next tokens given the context. It's just that the context now includes tool definitions, so tool calls become valid predictions.
The Loop
Predicting a tool call is one thing. Actually executing it is another. The model can't run code or make HTTP requests - it just produces text. Something else has to take that text, recognise it as a tool call, execute it, and return the result.
This is where the agent loop comes in. It works like this (see the sketch after the list):
1. The model sees the context (system prompt, conversation history, tool definitions)
2. The model predicts a response - which might include a tool call
3. If there's a tool call, something outside the model executes it
4. The result is added to the context
5. Go back to step 1
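Here's that loop as a minimal, self-contained Python sketch. `fake_model` and `fake_weather` are stand-ins for a real model API and a real weather service - the names, message format, and return values are all assumptions for illustration - but the shape of the loop is the point.

```python
# A toy agent loop. The stubs "predict" a tool call the first time around and
# a prose answer once a tool result is in the context.

def fake_weather(location: str) -> str:
    return f"26°C and sunny in {location}"  # stand-in for a real weather API

def fake_model(messages: list[dict]) -> dict:
    # A real model predicts tokens; this stub just checks whether a tool
    # result is already in the context.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if tool_results:
        return {"content": f"It's currently {tool_results[-1]['content']}.", "tool_call": None}
    return {"content": "", "tool_call": {"tool": "WEATHER", "arguments": {"location": "Noosa"}}}

def agent_loop(system_prompt: str, user_message: str, tools: dict) -> str:
    # 1. The context: system prompt, conversation history, tool definitions.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    while True:
        # 2. The model predicts a response - which might include a tool call.
        response = fake_model(messages)
        if response["tool_call"] is None:
            return response["content"]  # plain prose: the task is done
        # 3. Something outside the model executes the call.
        call = response["tool_call"]
        result = tools[call["tool"]](**call["arguments"])
        # 4. The result is added to the context.
        messages.append({"role": "tool", "content": result})
        # 5. Back to the top of the loop with the new context.

print(agent_loop(
    "You have access to a tool called WEATHER that takes a location.",
    "What's the weather in Noosa?",
    {"WEATHER": fake_weather},
))
# -> "It's currently 26°C and sunny in Noosa."
```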
Watch this play out step by step.
Notice what happens: the model doesn't "know" the weather in Noosa. It predicts a tool call. Something external executes that call and returns real data. The model then predicts a natural language response using that data.
The loop is what makes this feel like agency. The model keeps responding to new information until the task is done.
From APIs to a Computer
WEATHER(location) is a toy example. It's a narrow tool that does exactly one thing. Useful for demonstrating the concept, but limited in practice.
The real power comes when you give the model a general-purpose tool. Not "check the weather" but "run a command on a computer."
Compare these two approaches to the same question:
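Since the contrast is easier to see written down, here are the two tool calls the model might predict for that question. The shapes and the endpoint URL are hypothetical; what matters is the difference between a single-purpose tool and a general-purpose one.

```python
# Approach 1: a narrow, single-purpose tool. It answers exactly one kind of
# question and nothing else.
narrow_call = {"tool": "WEATHER", "arguments": {"location": "Noosa"}}

# Approach 2: a general-purpose tool. The model predicts a shell command
# (the weather endpoint here is a made-up example), and the same bash tool
# also covers file handling, data processing, and everything else a shell can do.
general_call = {
    "tool": "bash",
    "arguments": {"command": "curl -s 'https://weather.example.com/current?location=Noosa'"},
}
```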
The bash tool doesn't know about weather. It knows how to run commands. The model predicts the right command to get weather data, and bash executes it. This is a fundamentally different level of capability.
Files as Memory
Language models have no memory across conversations. Each session starts fresh. The model doesn't remember what you discussed yesterday or what it learned last week.
But if the model can read and write files, it can simulate memory. It can save information to disk during one conversation and retrieve it in another. The files persist even when the context window is cleared.
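Sketched as the bash commands the model might predict in two separate conversations (the file name and the note itself are hypothetical):

```python
# Conversation 1: the model decides something is worth remembering and
# predicts a command that writes it to disk.
conversation_1_call = {
    "tool": "bash",
    "arguments": {"command": "echo 'User prefers metric units' >> notes.txt"},
}

# ...the session ends and the context window is cleared...

# Conversation 2: a fresh context with no memory of the first conversation,
# but the file is still on disk, so the model can predict a command to read it.
conversation_2_call = {
    "tool": "bash",
    "arguments": {"command": "cat notes.txt"},
}
# Tool result fed back into the new context: "User prefers metric units"
```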
The file persists across the conversation boundary. The second conversation has no memory of the first - but it can read the file that the first conversation created. This is how agents can maintain continuity across sessions.
Bash as the Universal Tool
Once you have bash access, you have access to everything the operating system can do. Download files from the internet. Parse and transform data. Install software. Chain operations together.
Watch the model handle a multi-step task that requires combining several tools:
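A written-out version of one such task - counting the words on a webpage - might look like the sequence below. The URL is a placeholder, and the exact commands are one plausible prediction among many.

```python
# One plausible sequence of predicted bash calls for "how many words are on
# this webpage?" - one call per trip around the loop.
predicted_calls = [
    # Fetch the page.
    {"tool": "bash", "arguments": {"command": "curl -s https://example.com -o page.html"}},
    # Strip the HTML tags, keeping only the text.
    {"tool": "bash", "arguments": {"command": "sed -e 's/<[^>]*>//g' page.html > page.txt"}},
    # Count the words.
    {"tool": "bash", "arguments": {"command": "wc -w < page.txt"}},
]
# Each result is appended to the context before the next call is predicted, so
# a failed download or an empty file changes what the model predicts next.
```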
The model doesn't have a built-in "count words on webpage" capability. It predicts the bash commands that would accomplish that task, and the system executes them. Each result becomes part of the context for the next prediction.
Where It Breaks
General-purpose tool access is powerful. It's also dangerous. The same mechanism that lets the model help you organise files can let it delete them. The same prediction that generates a useful curl command can generate a destructive one.
The model doesn't understand consequences. It predicts tokens. A command that wipes a disk is just another valid prediction if the context makes it seem appropriate.
Test your intuition. Which of these commands should raise concern?
This is why real agent systems include sandboxing, permission systems, and human approval steps. The model predicts; something else decides whether to actually execute. The prediction is only as safe as the guardrails around it.
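As a sketch of what "something else decides" can mean, here's a minimal approval gate around the execute step. The denylist patterns and the prompt are illustrative; real systems rely on sandboxes and much richer policies.

```python
import subprocess

# Crude, illustrative denylist - nowhere near exhaustive.
DENYLIST = ("rm -rf", "mkfs", "dd if=", "> /dev/")

def execute_with_approval(command: str) -> str:
    """Run a model-predicted command only if it passes the checks."""
    if any(pattern in command for pattern in DENYLIST):
        return "Blocked: command matches a denylist pattern."
    answer = input(f"Model wants to run: {command!r}. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "Declined by user."
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout or result.stderr
```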
The Punchline
An AI agent is not a new kind of intelligence. It's a language model doing exactly what it always does - predicting tokens - but with two additions:
- Tool definitions in the context that make tool calls valid predictions
- A loop that executes those calls and feeds results back in
The model isn't smarter when it has tools. It's connected to things that extend what prediction alone can accomplish. It still doesn't know, understand, or intend anything. It predicts what a helpful assistant would do, and the loop around it carries those predictions out.
The question isn't "is it intelligent?" The question is: what are you willing to let a prediction engine touch?