---
title: How the Reasoning Engine Works
position: 1
excerpt: ''
deprecated: false
hidden: false
metadata:
title: ''
description: ''
robots: index
---
If you've read through [LLM Fundamentals](/agent-studio/agentic-ai/llm-fundamentals), you know the basics: LLMs are next-token predictors, context windows have hard limits, and attention follows a U-shaped curve where the middle gets lost. Here's how Moveworks built its reasoning engine to work within those constraints, and why the platform features you'll use in Agent Studio exist.
## What Goes Into the Context Window
Every time the reasoning engine makes a prediction, it reads a single assembled input. That input contains everything the engine needs to decide what to do next: system instructions, your plugin's configuration, slot descriptions, user messages, action outputs, and the full conversation history. (For the detailed breakdown, see [Context Windows](/agent-studio/agentic-ai/llm-fundamentals/context-windows).)
The platform serializes all of this into tokens and feeds it as a single sequence. The engine doesn't pick which parts to read. It reads everything, every turn, every time.
The context window is assembled fresh on each turn. As conversations progress and actions fire, the window fills with more data. Plugin config and slot descriptions are always present as a baseline cost, and each action response that comes back stays in the window for the rest of the conversation.
This is why what you put into descriptions, how many actions you chain, and how much data your APIs return all directly affect reasoning quality. Every token competes for the engine's finite attention budget.
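As a rough illustration, the per-turn assembly can be sketched as concatenating every part and paying its token cost (a minimal sketch; the part names and the 4-characters-per-token estimate are assumptions for illustration, not platform internals):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def assemble_context(system: str, plugin_config: str, slot_descriptions: list[str],
                     history: list[str], action_outputs: list[str]) -> tuple[str, int]:
    """Serialize every part into one sequence, as the engine reads it each turn."""
    parts = [system, plugin_config, *slot_descriptions, *history, *action_outputs]
    context = "\n".join(parts)
    return context, estimate_tokens(context)

# Each turn rebuilds the window; each retained action output adds a permanent cost.
_, turn1 = assemble_context("sys", "config", ["slot: date"], ["user: hi"], [])
_, turn2 = assemble_context("sys", "config", ["slot: date"],
                            ["user: hi", "assistant: ..."], ['{"tickets": [...]}'])
assert turn2 > turn1  # the window only grows as the conversation progresses
```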
## How the Engine Reasons
The reasoning engine doesn't just predict once and stop. It runs a loop: propose a plan, check if the plan makes sense, execute it, look at what happened, then decide whether to keep going or respond to the user. This loop can run up to 10 iterations per request.
Here's what a single iteration actually looks like:
1. **Select a policy.** The engine picks from a priority-ordered list of reasoning strategies. The primary one is LLM-based — it reads the full context window and predicts the next action (which plugin to call, which slot to fill, what to say). But other policies can preempt it for specific situations: error recovery, keyword matching, or safety checks.
2. **Propose a plan.** The selected policy generates a candidate plan — one or more steps the engine wants to take next. This could be a single plugin call, a response to the user, or multiple actions to execute in parallel.
3. **Evaluate the plan.** Before anything runs, multiple evaluators check the proposed plan. Some are heuristic (structural validation), some are LLM-based (does this plan actually make sense given the context?). If the plan fails evaluation, the engine retries with a different approach. This is why you might see the engine "change its mind" mid-reasoning — it proposed something, the evaluators rejected it, and it tried again.
4. **Execute.** The accepted plan runs. Plugin actions fire, slots get filled, responses get generated. When multiple actions are independent of each other, the engine can execute them in parallel rather than sequentially.
5. **Process the results.** If a plugin returned a small response, it goes straight into the context window. If it returned a large response (thousands of tokens), the platform stores it as a named variable and gives the engine a truncated preview instead. (More on this in [The Observation Interpreter](#the-observation-interpreter) below.)
6. **Update and decide.** The executed step and any interpretation get added to the conversation history. The engine then decides: are there more steps to take, or is it time to respond? If there's more work, the loop repeats with the updated context.
The loop exits when the engine produces a response to the user with no further actions to take.
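The control flow described above can be sketched as a toy loop (hypothetical function and class names; a minimal model of the propose-evaluate-execute-observe cycle, not the platform's implementation):

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 10

@dataclass
class Plan:
    steps: list
    is_final_response: bool = False
    response: str = ""

@dataclass
class Context:
    history: list = field(default_factory=list)
    def extend(self, plan, results):
        self.history.append((plan, results))
        return self

def reasoning_loop(context, propose, evaluators):
    """Propose -> evaluate -> execute -> observe, up to 10 iterations."""
    for _ in range(MAX_ITERATIONS):
        plan = propose(context)                      # 1-2. a policy proposes a plan
        if not all(ev(plan, context) for ev in evaluators):
            continue                                 # 3. rejected: "change its mind"
        results = [step() for step in plan.steps]    # 4. execute the steps
        context.extend(plan, results)                # 5-6. record the observation
        if plan.is_final_response:                   # exit: respond to the user
            return plan.response
    return "fallback"                                # iteration budget exhausted

# Toy run: one action call on the first pass, then a final response.
def propose(ctx):
    if not ctx.history:
        return Plan(steps=[lambda: {"ticket": 42}])
    return Plan(steps=[], is_final_response=True, response="Ticket 42 found")

print(reasoning_loop(Context(), propose, [lambda p, c: True]))  # prints "Ticket 42 found"
```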
The engine is a next-token predictor, not a rule executor. Every decision in the loop above — which policy to use, what plan to propose, whether to keep going — comes from the LLM reading the full context window and generating the most probable continuation. Natural language instructions in descriptions are processed as probabilistic signals, not deterministic commands. The engine might follow them 9 out of 10 times and hallucinate on the 10th.
For how these individual cycles chain together into multi-turn planning, execution, and user feedback, see [Reasoning Loops](/agent-studio/agentic-ai/agentic-reasoning-engine/reasoning-loops).
## Why Attention Matters for Your Plugins
You already know from [LLM Fundamentals](/agent-studio/agentic-ai/llm-fundamentals) that attention is finite and U-shaped. Here's what that means in practice when you're building plugins.
Every token you add to a slot description, every field in an action response, every line in a system prompt competes for the same fixed attention budget. As the context window fills, each individual token gets less focus. And the tokens in the middle of the window (between the plugin config at the start and the most recent user message at the end) sit in the attention dead zone.
This creates three specific problems that show up repeatedly in real plugins:

* **Bloated descriptions.** A 150-word slot description generates 60+ tokens of instructions competing with everything else in the window. The engine might follow all of it, half of it, or none of it. You have no guarantee.
* **Lost-middle data.** Every API response from an action gets added to the context window. Chain three actions and you've got three payloads stacked up. The middle payload sits in the attention dead zone. The engine reads the first action's output (strong attention), skims the second (weak attention), and reads the third (strong attention). If the second action returned the critical data, it's lost in the middle.
* **Unenforced rules.** Natural language instructions compete with everything else for attention. "Always format currency as \$X,XXX.XX" in a description is a hope, not a rule. The engine processes it as a suggestion. It'll work in testing and break in production on the edge case you didn't anticipate.
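A toy model makes the middle-payload problem concrete. The weighting function below is illustrative only, not the real attention curve; it just encodes "ends strong, middle weak":

```python
def u_shaped_attention(num_payloads: int) -> list[float]:
    # Toy U-curve: positions at either end get full weight, the middle decays.
    weights = []
    for i in range(num_payloads):
        # Distance from the nearest end, normalized to [0, 1].
        edge_distance = min(i, num_payloads - 1 - i) / max(1, num_payloads - 1)
        weights.append(round(1.0 - 0.6 * edge_distance, 2))
    return weights

# Three chained action payloads: the second one sits in the dead zone.
print(u_shaped_attention(3))  # prints [1.0, 0.7, 1.0]
```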
## The Observation Interpreter
When a plugin action fires, its response goes into the context window for the engine to reason about on the next iteration. But what happens when that response is huge — thousands of tokens of JSON from a search query or a database lookup?
The platform runs an **observation interpreter** between action execution and the next reasoning cycle. It checks the size of each plugin response:
* **Small responses** (under the token threshold) pass through unchanged. They go into the context window as-is.
* **Large responses** get stored as a named variable with a JSON schema describing its structure. The engine receives a truncated preview of the data plus an instruction that the full dataset is available through the Code Interpreter plugin. This keeps the context window from being flooded with raw API data while still giving the engine enough information to reason about what it got back.
* **Image responses** get compressed and re-uploaded in a format the engine can process.
Both the original action result and the interpreter's summary get appended to the conversation history. The engine sees enough to decide what to do next without burning its entire context budget on one large payload.
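A minimal sketch of that size check, assuming a hypothetical token threshold, preview length, and output field names (the platform's actual values and schema are not documented here):

```python
import json

TOKEN_THRESHOLD = 500   # assumed threshold, not the platform's actual value
PREVIEW_CHARS = 200

def interpret_observation(name: str, response: dict) -> dict:
    """Pass small responses through; replace large ones with a preview + variable."""
    raw = json.dumps(response)
    if len(raw) // 4 <= TOKEN_THRESHOLD:   # rough 4-chars-per-token estimate
        return {"type": "inline", "data": response}
    return {
        "type": "variable",
        "variable_name": name,
        "schema": {k: type(v).__name__ for k, v in response.items()},
        "preview": raw[:PREVIEW_CHARS] + "...",
        "note": "Full dataset available via the Code Interpreter plugin.",
    }

small = interpret_observation("lookup_result", {"status": "open"})
large = interpret_observation("search_result", {"rows": list(range(5000))})
assert small["type"] == "inline" and large["type"] == "variable"
```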
If your plugin's action returns large JSON responses and you notice the engine isn't referencing specific fields from the output, this is likely why. The engine is working from a truncated preview, not the full response. Design your action outputs to put the most important data first, or use compound actions to pre-filter the response before it enters the context window.
## Moving Logic Out of the LLM
The problems above — bloated descriptions, lost-middle data, unenforced rules — all share a root cause: you're asking the LLM to do work that doesn't require intelligence. Parsing a date, validating an email format, combining three API calls — none of that needs probabilistic reasoning. It needs code.
Moveworks' platform features let you move that logic from the LLM (probabilistic, attention-limited) into deterministic execution (runs every time, no token cost for the logic itself). The data still flows through the context window, but the *decision-making* about how to process it doesn't.
**Resolvers: run validation and transformation in code.** The slot description still sits in the context window — the engine needs it to understand *what* to collect. But instead of writing a paragraph telling the engine *how* to parse dates or validate email formats (instructions it might ignore), you configure a resolver that executes deterministically. The engine collects the raw input; the resolver handles the rest. Keep descriptions focused on what a slot captures, not processing rules.
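For example, a date-parsing resolver can be sketched as ordinary deterministic code (the accepted formats below are assumptions for illustration, not a platform API):

```python
from datetime import date, datetime

def resolve_due_date(raw_input: str) -> date:
    """Deterministic slot resolver: parse and validate instead of instructing the LLM."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(raw_input.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Could not parse date: {raw_input!r}")

# The engine collects the raw text; the resolver runs identically every time.
assert resolve_due_date("2025-03-01") == resolve_due_date("03/01/2025")
```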
**Compound actions: combine multiple API calls and return a single result.** The intermediate results never enter the context window — only the final output does. Instead of three separate payloads stacking up (with the middle one in the attention dead zone), the engine sees one clean response. Fewer payloads means less total context consumption and no lost-middle problem.
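The pattern can be sketched as ordinary code, with hypothetical fetchers standing in for the three API calls:

```python
def compound_lookup(user_id: str, fetch_user, fetch_tickets, fetch_sla) -> dict:
    """Chain three calls in code; only the merged result enters the context window."""
    user = fetch_user(user_id)           # intermediate: never shown to the engine
    tickets = fetch_tickets(user["id"])  # intermediate: never shown to the engine
    sla = fetch_sla(user["tier"])        # intermediate: never shown to the engine
    return {                             # single clean payload, no lost middle
        "name": user["name"],
        "open_tickets": [t for t in tickets if t["status"] == "open"],
        "sla_hours": sla["response_hours"],
    }

result = compound_lookup(
    "u1",
    lambda uid: {"id": uid, "name": "Ada", "tier": "gold"},
    lambda uid: [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}],
    lambda tier: {"response_hours": 4},
)
assert result["open_tickets"] == [{"id": 1, "status": "open"}]
```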
**Output mappers: control the shape of data the engine sees.** You define a JSON structure inside the output mapper, and that structure is what the observation interpreter and reasoning engine work with. This is the real enforcement mechanism — the engine receives data in the exact shape you specified, not whatever raw format the API returned. Design your data mappers carefully, because that mapped output is exactly what enters the context window on every subsequent turn.
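The projection can be sketched as a simple mapping function (the raw field names below are hypothetical, loosely styled after a ticketing API):

```python
def map_output(raw_api_response: dict) -> dict:
    """Project a raw API payload onto the exact shape the engine should see."""
    return {
        "ticket_id": raw_api_response["sys_id"],
        "summary": raw_api_response["short_description"],
        "state": raw_api_response["state"],
        # Every other raw field is deliberately dropped here.
    }

raw = {"sys_id": "INC001", "short_description": "VPN down", "state": "open",
       "audit_log": ["..."] * 300, "internal_notes": "..."}
mapped = map_output(raw)
assert set(mapped) == {"ticket_id", "summary", "state"}
```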
**Descriptions inform understanding; platform features enforce behavior.** Use descriptions to help the engine understand intent and context. Use resolvers and validators for input processing, compound actions to reduce context pressure, and structured outputs in output mappers to control what data shape the engine sees. If you find yourself writing instructions in a description that say "always do X," that's a signal to use a platform feature instead.
## What to Read Next
* How individual prediction cycles chain into planning, execution, and user feedback loops: see [Reasoning Loops](/agent-studio/agentic-ai/agentic-reasoning-engine/reasoning-loops).
* How the platform manages semantic memory, episodic memory, and conversation context across turns.
* Governance mechanisms that ensure safety and accuracy of assistant interactions.
* *Coming soon:* a deep dive into how the platform handles large action outputs, variable passing, and image processing between reasoning cycles.