You already know that language models like ChatGPT and Claude work by predicting the next token. Given some text, they predict what comes next. Do this repeatedly, and you get coherent sentences, paragraphs, entire documents. It feels like writing, but it's prediction all the way down.
This article builds on that foundation. We're going to show you how the same prediction mechanism that produces helpful prose can also produce structured instructions for external systems. The result is what people call an "agent" - but it's not a new kind of intelligence. It's prediction, plus tools, plus a loop.
What You Already Know
Language models predict tokens. A token is roughly a word or part of a word. Given the context "The weather in Noosa is", a model might predict "sunny" or "warm" or "perfect" as likely next tokens. It's not thinking about Noosa's actual weather - it's recognising a pattern and predicting what typically follows.
Try it yourself. Type a partial sentence and see what the model might predict next.
This works because language follows patterns, and the model has absorbed billions of those patterns during training. The prediction feels natural when it produces English words.
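To make that concrete, here's a toy sketch in Python. The `predict_next_token` function is a hypothetical stand-in for a real model - it hard-codes a tiny distribution - but the generate-by-repeated-prediction loop around it is the same shape the real thing uses.

```python
import random

# Hypothetical stand-in for a real model: it maps a context string to a tiny,
# hard-coded distribution over likely next tokens. A real model does the same
# thing over a vocabulary of tens of thousands of tokens.
def predict_next_token(context: str) -> dict[str, float]:
    if context.endswith("The weather in Noosa is"):
        return {" sunny": 0.45, " warm": 0.3, " perfect": 0.15, " humid": 0.1}
    return {".": 1.0}  # toy fallback so generation stops

# Predict one token, append it, repeat. This loop is all "writing" is.
def generate(context: str, max_tokens: int = 10) -> str:
    for _ in range(max_tokens):
        distribution = predict_next_token(context)
        tokens, weights = zip(*distribution.items())
        next_token = random.choices(tokens, weights=weights)[0]
        context += next_token
        if next_token == ".":
            break
    return context

print(generate("The weather in Noosa is"))
# e.g. "The weather in Noosa is sunny."
```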
Structure, Not Just Prose
Next-token prediction doesn't only generate fluent prose. If the system prompt defines a format, the model's predicted tokens tend to stay inside that shape.
Same user question. Two different system prompts. Notice how the output changes when the structure is declared upfront.
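Written down, the comparison looks roughly like this. Both prompts are illustrative, and the commented outputs show the expected shape rather than captured model responses.

```python
user_question = "What's the weather in Noosa?"

# System prompt 1: no structure declared. The likeliest next tokens are prose.
prose_system_prompt = "You are a helpful assistant. Answer in plain English."
# Expected shape of the response (illustrative):
#   "I don't have live weather data, but Noosa is usually warm and sunny..."

# System prompt 2: a structure is declared upfront, so tokens that fill that
# structure become the likeliest predictions.
structured_system_prompt = (
    "You are a helpful assistant. Always reply with a JSON object "
    'containing the keys "location", "answer" and "confidence".'
)
# Expected shape of the response (illustrative):
#   {"location": "Noosa", "answer": "No live data available.", "confidence": "low"}
```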
What If the Next Token Isn't a Word?
Here's the key insight: the model doesn't care whether it's predicting English words or something else. It predicts tokens based on context. If the context includes information about available tools, the model can predict tool calls instead of prose.
Consider this scenario. A model is told: "You have access to a tool called WEATHER that takes a location and returns the current weather." Then a user asks: "What's the weather in Noosa?"
The model predicts what comes next. And what comes next, given that context, is a tool call.
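What does that predicted tool call actually look like? Formats differ between providers, so treat the JSON below as an illustration of the shape rather than any particular API's schema. The crucial point: this is still just text the model produced.

```python
import json

# A predicted tool call, shown as JSON. The exact format varies by provider;
# this shape is illustrative. Nothing has been executed yet - these are just
# the tokens the model judged most likely to come next.
predicted_tool_call = {
    "tool": "WEATHER",
    "arguments": {"location": "Noosa"},
}

print(json.dumps(predicted_tool_call, indent=2))
```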
This isn't magic. The model is doing exactly what it always does - predicting the most likely next tokens given the context. It's just that the context now includes tool definitions, so tool calls become valid predictions.
The Loop
Predicting a tool call is one thing. Actually executing it is another. The model can't run code or make HTTP requests - it just produces text. Something else has to take that text, recognise it as a tool call, execute it, and return the result.
This is where the agent loop comes in. It works like this (see the sketch after the list):
1. The model sees the context (system prompt, conversation history, tool definitions)
2. The model predicts a response - which might include a tool call
3. If there's a tool call, something outside the model executes it
4. The result is added to the context
5. Go back to step 1
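Here's that loop as a minimal, self-contained Python sketch. `fake_model` and `fake_weather` are stand-ins for a real model API and a real weather service - the names, message format, and return values are all assumptions for illustration - but the shape of the loop is the point.

```python
# A toy agent loop. The stubs "predict" a tool call the first time around and
# a prose answer once a tool result is in the context.

def fake_weather(location: str) -> str:
    return f"26°C and sunny in {location}"  # stand-in for a real weather API

def fake_model(messages: list[dict]) -> dict:
    # A real model predicts tokens; this stub just checks whether a tool
    # result is already in the context.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if tool_results:
        return {"content": f"It's currently {tool_results[-1]['content']}.", "tool_call": None}
    return {"content": "", "tool_call": {"tool": "WEATHER", "arguments": {"location": "Noosa"}}}

def agent_loop(system_prompt: str, user_message: str, tools: dict) -> str:
    # 1. The context: system prompt, conversation history, tool definitions.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    while True:
        # 2. The model predicts a response - which might include a tool call.
        response = fake_model(messages)
        if response["tool_call"] is None:
            return response["content"]  # plain prose: the task is done
        # 3. Something outside the model executes the call.
        call = response["tool_call"]
        result = tools[call["tool"]](**call["arguments"])
        # 4. The result is added to the context.
        messages.append({"role": "tool", "content": result})
        # 5. Back to the top of the loop with the new context.

print(agent_loop(
    "You have access to a tool called WEATHER that takes a location.",
    "What's the weather in Noosa?",
    {"WEATHER": fake_weather},
))
# -> "It's currently 26°C and sunny in Noosa."
```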
Watch this play out step by step.
Notice what happens: the model doesn't "know" the weather in Noosa. It predicts a tool call. Something external executes that call and returns real data. The model then predicts a natural language response using that data.
The loop is what makes this feel like agency. The model keeps responding to new information until the task is done.
From APIs to a Computer
WEATHER(location) is a toy example. It's a narrow tool that does exactly one thing. Useful for demonstrating the concept, but limited in practice.
The real power comes when you give the model a general-purpose tool. Not "check the weather" but "run a command on a computer."
Compare these two approaches to the same question:
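Since the contrast is easier to see written down, here are the two tool calls the model might predict for that question. The shapes and the endpoint URL are hypothetical; what matters is the difference between a single-purpose tool and a general-purpose one.

```python
# Approach 1: a narrow, single-purpose tool. It answers exactly one kind of
# question and nothing else.
narrow_call = {"tool": "WEATHER", "arguments": {"location": "Noosa"}}

# Approach 2: a general-purpose tool. The model predicts a shell command
# (the weather endpoint here is a made-up example), and the same bash tool
# also covers file handling, data processing, and everything else a shell can do.
general_call = {
    "tool": "bash",
    "arguments": {"command": "curl -s 'https://weather.example.com/current?location=Noosa'"},
}
```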
The bash tool doesn't know about weather. It knows how to run commands. The model predicts the right command to get weather data, and bash executes it. This is a fundamentally different level of capability.
Files as Memory
Language models have no memory across conversations. Each session starts fresh. The model doesn't remember what you discussed yesterday or what it learned last week.
But if the model can read and write files, it can simulate memory. It can save information to disk during one conversation and retrieve it in another. The files persist even when the context window is cleared.
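Sketched as the bash commands the model might predict in two separate conversations (the file name and the note itself are hypothetical):

```python
# Conversation 1: the model decides something is worth remembering and
# predicts a command that writes it to disk.
conversation_1_call = {
    "tool": "bash",
    "arguments": {"command": "echo 'User prefers metric units' >> notes.txt"},
}

# ...the session ends and the context window is cleared...

# Conversation 2: a fresh context with no memory of the first conversation,
# but the file is still on disk, so the model can predict a command to read it.
conversation_2_call = {
    "tool": "bash",
    "arguments": {"command": "cat notes.txt"},
}
# Tool result fed back into the new context: "User prefers metric units"
```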
The file persists across the conversation boundary. The second conversation has no memory of the first - but it can read the file that the first conversation created. This is how agents can maintain continuity across sessions.
Bash as the Universal Tool
Once you have bash access, you have access to everything the operating system can do. Download files from the internet. Parse and transform data. Install software. Chain operations together.
Watch the model handle a multi-step task that requires combining several tools:
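A written-out version of one such task - counting the words on a webpage - might look like the sequence below. The URL is a placeholder, and the exact commands are one plausible prediction among many.

```python
# One plausible sequence of predicted bash calls for "how many words are on
# this webpage?" - one call per trip around the loop.
predicted_calls = [
    # Fetch the page.
    {"tool": "bash", "arguments": {"command": "curl -s https://example.com -o page.html"}},
    # Strip the HTML tags, keeping only the text.
    {"tool": "bash", "arguments": {"command": "sed -e 's/<[^>]*>//g' page.html > page.txt"}},
    # Count the words.
    {"tool": "bash", "arguments": {"command": "wc -w < page.txt"}},
]
# Each result is appended to the context before the next call is predicted, so
# a failed download or an empty file changes what the model predicts next.
```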
The model doesn't have a built-in "count words on webpage" capability. It predicts the bash commands that would accomplish that task, and the system executes them. Each result becomes part of the context for the next prediction.
Where It Breaks
General-purpose tool access is powerful. It's also dangerous. The same mechanism that lets the model help you organise files can let it delete them. The same prediction that generates a useful curl command can generate a destructive one.
The model doesn't understand consequences. It predicts tokens. A command that wipes a disk is just another valid prediction if the context makes it seem appropriate.
Test your intuition. Which of these commands should raise concern?
This is why real agent systems include sandboxing, permission systems, and human approval steps. The model predicts; something else decides whether to actually execute. The prediction is only as safe as the guardrails around it.
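As a sketch of what "something else decides" can mean, here's a minimal approval gate around the execute step. The denylist patterns and the prompt are illustrative; real systems rely on sandboxes and much richer policies.

```python
import subprocess

# Crude, illustrative denylist - nowhere near exhaustive.
DENYLIST = ("rm -rf", "mkfs", "dd if=", "> /dev/")

def execute_with_approval(command: str) -> str:
    """Run a model-predicted command only if it passes the checks."""
    if any(pattern in command for pattern in DENYLIST):
        return "Blocked: command matches a denylist pattern."
    answer = input(f"Model wants to run: {command!r}. Allow? [y/N] ")
    if answer.strip().lower() != "y":
        return "Declined by user."
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout or result.stderr
```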
The Punchline
An AI agent is not a new kind of intelligence. It's a language model doing exactly what it always does - predicting tokens - but with two additions:
- Tool definitions in the context that make tool calls valid predictions
- A loop that executes those calls and feeds results back in
The model isn't smarter when it has tools. It's connected to things that extend what prediction alone can accomplish. It still doesn't know, understand, or intend anything. It predicts what a helpful assistant would do, and the loop around it carries those predictions out.
The question isn't "is it intelligent?" The question is: what are you willing to let a prediction engine touch?