---
title: Context Windows
position: 1
excerpt: ''
deprecated: false
hidden: false
metadata:
  title: ''
  description: ''
  robots: index
---

Every large language model has a **context window**: a fixed-size container that holds everything the model reads before generating a response. Think of it as the model's working memory. Whatever fits inside the window is what the model reasons about. Anything outside it doesn't exist.

The window has a hard capacity limit measured in **tokens** (a token is roughly ¾ of a word). A model with a 128K-token context window can process about 96,000 words in a single pass. That sounds like a lot, but it fills up faster than you'd expect.

For information about managing your Moveworks Assistant's conversation context window, see [Conversational Context](/agent-studio/agentic-ai/assistant-behavior-framework).

## What Goes Into the Context Window

At the start of a conversation, the context window is mostly empty. A baseline set of instructions and configuration takes up the first chunk of space. Then, as the conversation progresses, more content gets added.
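The capacity figure quoted earlier follows directly from the ¾-word rule of thumb. A quick sketch (the ratio is approximate and varies by tokenizer and language):

```python
# Rough capacity estimate for a 128K-token context window.
# The 0.75 words-per-token ratio is a rule of thumb; the actual
# ratio depends on the tokenizer and the language of the text.
CONTEXT_WINDOW_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75

approx_words = int(CONTEXT_WINDOW_TOKENS * WORDS_PER_TOKEN)
print(f"{CONTEXT_WINDOW_TOKENS:,} tokens ≈ {approx_words:,} words")
# 128,000 tokens ≈ 96,000 words
```

The same arithmetic works in reverse: divide a word count by 0.75 to estimate how many tokens a document will consume.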
Here's what a typical context window contains:

| Component | Description | Typical Size |
| --- | --- | --- |
| **System prompt** | Base instructions that shape the model's behavior | Small to moderate |
| **Plugin configuration** | Descriptions of available tools, their parameters, and schemas | Moderate |
| **Slot descriptions** | Definitions, validation rules, and resolver configs for each input the model collects | Varies by complexity |
| **User messages** | What the end user actually typed | Small per message |
| **Action outputs** | Full API responses from every action that has fired | Can be large (1-5KB+ of JSON each) |
| **Conversation history** | All previous turns in the conversation | Grows with every exchange |

The first few items are always present; they're the cost of doing business. User messages tend to be small. The real space consumers are **action outputs** and **conversation history**, which accumulate over the life of a conversation and never leave the window.

## How the Window Fills Up

Here's a step-by-step look at context consumption during a multi-step interaction. Each action response stays in the window for the rest of the conversation, and the model re-reads all of it on every subsequent turn to decide what to say and do next.

## The Math of Context Consumption

Consider a practical example. A configuration with verbose descriptions (500 tokens), a system prompt (200 tokens), and three actions each returning roughly 2KB of data (\~1,500 tokens per response) is already at 5,000+ tokens before the user has said anything meaningful. That's a significant chunk of the window consumed by intermediate data.
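The example above can be written out as a quick budget check. The figures are the illustrative estimates from the text, not fixed costs:

```python
# Illustrative token budget, using the estimates from the example above.
SYSTEM_PROMPT = 200        # base instructions
CONFIG_DESCRIPTIONS = 500  # verbose plugin/slot descriptions
ACTION_OUTPUT = 1_500      # ~2KB of JSON per API response
NUM_ACTIONS = 3

consumed = SYSTEM_PROMPT + CONFIG_DESCRIPTIONS + NUM_ACTIONS * ACTION_OUTPUT
print(f"{consumed:,} tokens consumed before the user says anything meaningful")
# 5,200 tokens consumed before the user says anything meaningful

# As a share of a 128K-token window:
print(f"{consumed / 128_000:.1%} of the window")
# 4.1% of the window
```

Even a single-digit share of a large window adds up, because this is intermediate data the model must re-read on every subsequent turn, and chattier actions or smaller windows push the share up quickly.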
**Every token in the context window has a cost.** That cost shows up in three ways: * **Degraded accuracy**: More noise in the window means worse predictions. The model has to sort through irrelevant data to find what matters. * **Slower responses**: More tokens to process means longer inference times. * **Higher compute cost**: Token count directly drives the cost of each model call. If the model doesn't need a piece of data to make its next decision, that data is noise. Noise degrades performance. ## Why This Matters The context window is a shared, finite resource. Every component competing for space inside it, from system instructions to API responses to conversation history, reduces the room available for everything else. Understanding this constraint is the foundation for building reliable AI applications. The key tradeoff: you want the model to have enough context to make good decisions, but not so much that it drowns in irrelevant information. Striking that balance is what separates well-designed AI workflows from brittle ones. To learn how models process the content inside the window (and where they struggle), continue to [Attention & Limitations](/agent-studio/agentic-ai/llm-fundamentals/attention-and-limitations). For an overview of all LLM concepts covered in this section, see [Understanding LLMs](/agent-studio/agentic-ai/llm-fundamentals).