Understanding LLMs
Before you build on top of an LLM-powered platform, you need a working mental model of what the LLM actually does. Not the full transformer architecture. Just enough to understand why certain design decisions produce better results than others.
Next-Token Prediction
At its core, every LLM does one thing: read all the tokens it’s been given and predict the next one.
That’s it. Tokens in, prediction out. When a user says “Book a meeting with Sarah,” the model isn’t “understanding” the request in a human sense. It’s processing tokens and predicting what should come next based on everything it can see.
What counts as a token? Roughly, a token is a word or word fragment. “Meeting” is one token. “Unsubscribe” might be split into two (un + subscribe). Numbers, punctuation, whitespace: all tokens. Every character costs something.
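To see how word fragments arise, here is a toy subword tokenizer. Everything in it is invented for illustration: real tokenizers (BPE, WordPiece) learn their vocabularies from data, and this tiny hand-written vocabulary stands in for one.

```python
# Toy subword tokenizer (illustrative only). Words in the vocabulary
# stay whole; unknown words are greedily split into known fragments.
VOCAB = {"meeting", "un", "subscribe", "book", "a", "with", "sarah"}

def tokenize(text):
    tokens = []
    for word in text.lower().split():
        if word in VOCAB:
            tokens.append(word)
            continue
        # Greedy longest-prefix split into vocabulary fragments.
        i = 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # fall back to single characters
                i += 1
    return tokens

print(tokenize("Book a meeting"))  # ['book', 'a', 'meeting']
print(tokenize("Unsubscribe"))     # ['un', 'subscribe']
```

The point of the sketch: "meeting" survives as one token, while "unsubscribe" becomes two, which is why token counts rarely match word counts.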
This is next-token prediction, and it’s the only operation the model performs. It reads a sequence of tokens, calculates probabilities for what comes next, picks one, appends it, and repeats. The entire output, whether it’s a sentence, a plan, or a function call, gets generated one token at a time.
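The predict-append-repeat loop can be sketched in a few lines. The probability table here is hypothetical, standing in for a real model's output, and greedy decoding (always taking the most likely token) is just one sampling strategy.

```python
# Toy generation loop: look up next-token probabilities, pick one,
# append it, repeat until an end-of-sequence token appears.
NEXT_TOKEN_PROBS = {
    ("Book",): {"a": 0.9, "the": 0.1},
    ("Book", "a"): {"meeting": 0.8, "flight": 0.2},
    ("Book", "a", "meeting"): {"<end>": 1.0},
}

def generate(prompt_tokens):
    tokens = list(prompt_tokens)
    while True:
        probs = NEXT_TOKEN_PROBS[tuple(tokens)]
        # Greedy decoding: take the highest-probability next token.
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            return tokens
        tokens.append(next_token)

print(generate(["Book"]))  # ['Book', 'a', 'meeting']
```

Note that every prediction is conditioned on the entire sequence so far; nothing outside those tokens influences the next one.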
The model’s prediction quality depends entirely on what’s in its input. Better input produces better predictions. This single fact drives almost every best practice in this documentation.
The Context Window
The input that the model reads before generating a response is called the context window. Think of it as a container with a hard capacity limit. Everything the model reasons about must fit inside.
In an agentic AI system, the context window holds everything: system instructions, tool descriptions, user messages, API responses, and conversation history. All serialized into tokens and fed to the model as a single input. The model doesn’t get to pick which parts to read. It reads all of it, every time, on every turn.
For a detailed breakdown of each component and how the window fills up during a conversation, see Context Windows.
Every token you add to a description, every field in an action output, every line in a system prompt competes for the model’s attention. You’re not “giving the model more information.” You’re adding signal that must compete with everything else in the window.
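A minimal sketch of that assembly step, to make the hard capacity limit concrete. The function names, the field layout, and the whitespace-based token count are all assumptions invented for this example; real systems use a proper tokenizer and a far larger budget.

```python
# Sketch: serialize every component into one input and enforce a budget.
MAX_TOKENS = 50  # toy capacity limit

def count_tokens(text):
    return len(text.split())  # crude whitespace approximation

def assemble_context(system_prompt, tool_descriptions, history):
    parts = [system_prompt, *tool_descriptions, *history]
    context = "\n".join(parts)
    used = count_tokens(context)
    if used > MAX_TOKENS:
        raise ValueError(f"context overflow: {used} > {MAX_TOKENS} tokens")
    return context, used

context, used = assemble_context(
    system_prompt="You are a scheduling assistant.",
    tool_descriptions=["book_meeting(attendee, time): books a meeting"],
    history=["User: Book a meeting with Sarah"],
)
print(used)  # 16
```

Every new tool description or history entry raises `used`, which is exactly the budget pressure the surrounding text describes.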
Why This Matters for AI Engineers
Context management is the skill that separates AI engineers who build reliable systems from those who build fragile ones. Understanding next-token prediction and the context window changes how you approach everything on an LLM-powered platform.
The model has finite space and finite attention. The context window isn’t just a size constraint. Models pay uneven attention across their input: content in the middle of a long context tends to get less focus than content at the beginning or end, a pattern sometimes called “lost in the middle.” Even if your input fits, the model might not weigh every part equally.
Better context is relevant and focused context, not simply more of it. The instinct is to give the model as much information as possible. But every token you add dilutes the attention paid to the tokens already there. A concise, well-structured input will outperform a verbose one that technically contains more facts.
Design decisions flow from these constraints. When you’re writing a slot description, deciding how much data an action returns, or choosing between chaining actions versus combining them, you’re spending a limited budget of tokens and attention.
Moveworks handles a lot of this for you. The reasoning engine manages how the context window is assembled, and platform features like validators, resolvers, and compound actions move logic out of the window entirely. You don’t need to manually manage token placement.
But understanding why those features exist makes you a better engineer. When you know that every token in a slot description competes for attention, you write tighter descriptions. When you know chained action outputs push data into the attention dead zone, you reach for compound actions instead. The platform gives you the tools; these fundamentals tell you when to use them.
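A rough back-of-the-envelope comparison shows why moving intermediate steps out of the window helps. Everything below is illustrative: the payloads, the whitespace-based token estimate, and the variable names are invented for this sketch and are not Moveworks APIs.

```python
def tokens(obj):
    return len(str(obj).split())  # crude whitespace token estimate

# Hypothetical intermediate outputs from a chained workflow.
lookup_result = {"user_id": "u123", "email": "sarah@example.com",
                 "department": "Sales", "manager": "m456"}
calendar_result = {"free_slots": ["Mon 10:00", "Tue 14:00"]}
final_result = {"booked": "Tue 14:00"}

# Chained actions: the model sees every step's full output in context.
chained_cost = tokens(lookup_result) + tokens(calendar_result) + tokens(final_result)

# Compound action: intermediate steps run outside the window;
# only the final result enters the context.
compound_cost = tokens(final_result)

print(chained_cost, compound_cost)
```

The absolute numbers are meaningless; the ratio is the point. The chained version pays for two payloads the model never needed to reason about.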
The next article covers Context Windows in depth, including how the window fills up during a conversation and practical strategies for managing its contents.