Context Windows


Every large language model has a context window: a fixed-size container that holds everything the model reads before generating a response. Think of it as the model’s working memory. Whatever fits inside the window is what the model reasons about. Anything outside it doesn’t exist.

The window has a hard capacity limit measured in tokens; a token is roughly ¾ of an English word. A model with a 128K-token context window can process about 96,000 words in a single pass. That sounds like a lot, but it fills up faster than you’d expect.
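The conversion above can be sketched as a one-line estimate, assuming the rough heuristic of ~0.75 English words per token (the exact ratio varies by tokenizer and by the text itself):

```python
# Back-of-the-envelope sketch: estimate word capacity from a token
# budget, using the ~0.75 words-per-token heuristic cited above.
# Actual ratios depend on the tokenizer and the content.

def approx_words(tokens: int) -> int:
    """Approximate word capacity for a given token budget."""
    return int(tokens * 0.75)

print(approx_words(128_000))  # → 96000 words for a 128K-token window
```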

For information about managing your Moveworks Assistant’s conversation context window, see Conversational Context.

What Goes Into the Context Window

At the start of a conversation, the context window is mostly empty. A baseline set of instructions and configuration takes up the first chunk of space. Then, as the conversation progresses, more content gets added.

Here’s what a typical context window contains:

| Component | Description | Typical Size |
| --- | --- | --- |
| System prompt | Base instructions that shape the model’s behavior | Small to moderate |
| Plugin configuration | Descriptions of available tools, their parameters, and schemas | Moderate |
| Slot descriptions | Definitions, validation rules, and resolver configs for each input the model collects | Varies by complexity |
| User messages | What the end user actually typed | Small per message |
| Action outputs | Full API responses from every action that has fired | Can be large (1–5 KB+ of JSON each) |
| Conversation history | All previous turns in the conversation | Grows with every exchange |

The first few items are always present. They’re the cost of doing business. User messages tend to be small. The real space consumers are action outputs and conversation history, which accumulate over the life of a conversation and never leave the window.

How the Window Fills Up

Here’s a step-by-step look at context consumption during a multi-step interaction, as each component enters the window:

  1. Plugin config — slot: meeting_time | description: "The date and time..." | data_type: string
  2. User message — "Book a meeting with Sarah tomorrow at 2pm in the big conference room"
  3. Action #1: lookup_calendar — { events: [...], free_slots: [...], timezone: "PST" }
  4. Action #2: fetch_rooms — { rooms: [...], capacity: [...], building: "HQ-3" }
  5. Action #3: check_availability — { conflicts: [...], suggestions: [...], attendees: [...] }

Each action response stays in the window for the rest of the conversation. The model re-reads all of it on every subsequent turn to decide what to say and do next.
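This accumulation can be sketched with a toy simulation. All token counts here are hypothetical estimates chosen to mirror the figures in this article, not measurements from a real deployment:

```python
# Sketch (hypothetical sizes): how the context grows turn over turn
# when every action output stays in the window. On each turn, the
# model re-reads the full running total, not just the newest message.

BASE = 700  # assumed: system prompt + plugin config + slot descriptions

context = BASE
turns = [
    (20, 1_500),  # (user message tokens, action output tokens)
    (15, 1_500),
    (25, 1_500),
]
for turn, (msg_tokens, action_tokens) in enumerate(turns, start=1):
    context += msg_tokens + action_tokens
    print(f"Turn {turn}: model re-reads {context} tokens")
```

Notice that the per-turn cost is dominated by the action outputs, not the user messages, which matches the table above.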

The Math of Context Consumption

Consider a practical example. A configuration with verbose slot descriptions (500 tokens), a system prompt (200 tokens), and three actions each returning roughly 2 KB of data (~1,500 tokens per response, 4,500 tokens total) is already at 5,200 tokens before the user has said anything meaningful. That’s a significant chunk of the window consumed by intermediate data.
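The arithmetic from this example, spelled out (the token counts are the article’s rough estimates):

```python
# Token budget for the worked example above, using the article's
# rough per-component estimates.

slot_descriptions = 500       # verbose descriptions
system_prompt = 200
action_outputs = 3 * 1_500    # three ~2 KB JSON responses

total = slot_descriptions + system_prompt + action_outputs
print(total)  # → 5200 tokens before any meaningful user input
```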

Every token in the context window has a cost. That cost shows up in three ways:

  • Degraded accuracy: More noise in the window means worse predictions. The model has to sort through irrelevant data to find what matters.
  • Slower responses: More tokens to process means longer inference times.
  • Higher compute cost: Token count directly drives the cost of each model call.

If the model doesn’t need a piece of data to make its next decision, that data is noise. Noise degrades performance.
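One common mitigation is to trim each action output to only the fields the model needs before it enters the window. The sketch below is illustrative: the helper and the field names are hypothetical, loosely mirroring the lookup_calendar example above:

```python
# Illustrative sketch: drop response fields the model won't use for
# its next decision before adding the output to the context.
# Function and field names are hypothetical, not a real API.

def trim_action_output(raw: dict, keep: set[str]) -> dict:
    """Keep only the whitelisted fields of an action's response."""
    return {k: v for k, v in raw.items() if k in keep}

raw = {
    "events": ["standup", "design review"],
    "free_slots": ["2pm", "4pm"],
    "timezone": "PST",
    "debug_info": {"latency_ms": 412, "cache": "miss"},  # pure noise
}

trimmed = trim_action_output(raw, keep={"free_slots", "timezone"})
print(trimmed)  # only free_slots and timezone survive
```

Every field dropped here is context the model never has to re-read on subsequent turns.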

Why This Matters

The context window is a shared, finite resource. Every component competing for space inside it, from system instructions to API responses to conversation history, reduces the room available for everything else. Understanding this constraint is the foundation for building reliable AI applications.

The key tradeoff: you want the model to have enough context to make good decisions, but not so much that it drowns in irrelevant information. Striking that balance is what separates well-designed AI workflows from brittle ones.

To learn how models process the content inside the window (and where they struggle), continue to Attention & Limitations.

For an overview of all LLM concepts covered in this section, see Understanding LLMs.