Structured Data Analysis


The Problem: LLMs Can’t Do Math at Scale

LLMs are powerful reasoners, but they have a fundamental limitation: they struggle with mathematical and analytical tasks. A classic example is asking a model “How many r’s are in strawberry?” - it confidently answers two, when the correct answer is three. Without code execution, the model is guessing at character counts rather than actually counting. Ask it to verify with code, and it gets it right every time.
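The verification step is trivial once the model writes code instead of guessing:

```python
# Counting characters is deterministic in code.
word = "strawberry"
r_count = word.count("r")
print(r_count)  # → 3
```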

Now scale that problem up. If an LLM can’t reliably count letters in a single word, imagine asking it to compute an NPS score across 2,000 survey responses, or total ARR by industry across 500 accounts. It will summarize from a tiny sample, hallucinate totals, or silently truncate the data. The results look plausible but aren’t trustworthy. The Reasoning Engine needs a different tool for the job.

The Solution: Code Interpreter

The answer is to let the Reasoning Engine do what it’s good at: understanding the question and writing code, then executing that code deterministically against the full dataset. Instead of trying to reason over 500 ARR values, the Reasoning Engine writes a Python script that computes the exact total and hands the result back.

This is called the Code Interpreter, and it’s a built-in skill of the Reasoning Engine. In the Moveworks platform, this capability is called Structured Data Analysis (SDA).

What is SDA?

SDA is the Reasoning Engine’s Code Interpreter capability. When a plugin returns structured data that’s too large for inline summarization, the Reasoning Engine automatically writes and executes Python code against the full dataset - performing aggregations, groupings, filters, charts, and more - then synthesizes the result into a final user-facing message.

SDA activates automatically. After your plugin’s final action completes, the output is checked against the 7K token limit. If the output exceeds that limit and is valid structured JSON, SDA takes over. There’s nothing to enable, wire up, or configure.

| Without SDA | With SDA |
| --- | --- |
| Limited to ~7K tokens of inline data | Handles thousands of rows via code execution |
| Reasoning Engine attempts to summarize, often truncating | Writes targeted Python - aggregations, filters, charts |
| Users hit “too much data” errors on large datasets | Transparent to the user - they just get an answer |

Consider a plugin that returns 2,247 NPS survey responses. Without SDA, the Reasoning Engine sees only a handful of records that fit within its context window. It produces a surface-level summary with no actual NPS calculation, no regional breakdown, no theme frequency counts. With SDA, the same plugin and same dataset produces a complete statistical analysis: overall NPS score, promoter/passive/detractor splits by region, comment theme extraction with frequencies, and actionable signals - all computed from the full 2,247 records.
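For illustration, here is a minimal sketch of the kind of Python SDA might generate for such a question, assuming the standard NPS definition (% promoters scoring 9-10 minus % detractors scoring 0-6) and illustrative field names (`score`, `region`):

```python
from collections import defaultdict

def nps(scores):
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# Illustrative sample; SDA would operate on the full plugin output.
responses = [
    {"score": 9, "region": "EMEA"},
    {"score": 10, "region": "AMER"},
    {"score": 6, "region": "EMEA"},
    {"score": 8, "region": "APAC"},
]

overall = nps([r["score"] for r in responses])

# Regional breakdown: group scores by region, then score each group.
by_region = defaultdict(list)
for r in responses:
    by_region[r["region"]].append(r["score"])
regional = {region: nps(s) for region, s in by_region.items()}
```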

SDA triggers based on output size alone. There’s no configuration needed. Your only job is to return well-structured JSON. This guide covers how to do that effectively.

How SDA Works

Data Flow Architecture

Every plugin response follows one of three paths based on its size:

  1. ≤ 7K tokens: The Reasoning Engine summarizes the data. No code execution involved.
  2. > 7K tokens: The output routes to SDA. SDA samples a snippet of the dataset to understand its structure, writes Python code against the full data, executes it, and hands the result back to the Reasoning Engine for the final user-facing message.
  3. > 100MB: Hard failure - the entire plugin response is rejected and the user sees an error.

The key implication: your plugin doesn’t decide whether SDA runs. The size of the final output decides. Two conditions must both be true for SDA to activate:

  1. Output exceeds 7K tokens. Below this threshold, the Reasoning Engine handles everything with reasoning alone.
  2. Output is valid structured JSON. SDA will not activate on raw text, semi-structured content, or data embedded inside a string field (e.g., "data": "{\"accounts\": [...]}" ). If your source API returns stringified JSON, parse it in your output mapper before returning it.
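A minimal Python sketch of the difference, using a hypothetical stringified payload:

```python
import json

# Hypothetical response where the source API wraps real JSON in a string field.
raw_output = {"data": "{\"accounts\": [{\"account_name\": \"Acme Corp\"}]}"}

# SDA cannot activate on the stringified form; parse it before returning so
# the final output contains real objects and arrays.
parsed_output = json.loads(raw_output["data"])
```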

Critical Limits

| Threshold | Value | What Happens | Why It Matters |
| --- | --- | --- | --- |
| Token limit | 7K tokens (~28KB) | SDA activates | This is the routing threshold. Below it, the Reasoning Engine handles your data with reasoning alone. Above it, the Code Interpreter takes over. |
| Response size | 100MB | Hard failure - the plugin returns nothing | The entire response is rejected. |
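A rough pre-flight check of which path a payload will take can be sketched in Python, using the ~4 bytes-per-token equivalence implied above (7K tokens ≈ 28KB). Actual tokenization is done by the platform, so treat this as an estimate only:

```python
import json

TOKEN_LIMIT = 7_000       # routing threshold from the table above
BYTES_PER_TOKEN = 4       # rough heuristic implied by 7K tokens ~= 28KB
MAX_BYTES = 100 * 1024 * 1024  # hard failure ceiling

def routing_estimate(payload) -> str:
    """Estimate which of the three paths a serialized payload takes."""
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_BYTES:
        return "rejected (>100MB)"
    if size / BYTES_PER_TOKEN > TOKEN_LIMIT:
        return "SDA (code execution)"
    return "inline summarization"

small = {"accounts": [{"name": "Acme"}]}
large = {"accounts": [{"name": f"Account {i}", "notes": "x" * 100}
                      for i in range(500)]}
```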

Designing Your Output

SDA can only work with what it receives. It doesn’t know what your plugin could have returned; it only sees what it did return, plus any metadata you attached. That makes output design the highest-leverage area for improving SDA quality.

Include All Data SDA Needs in the Final Output

SDA receives only the output of your plugin’s action activities. It can’t see intermediate results inside compound actions, it can’t call APIs, and it can’t ask for more data. If a field isn’t in the final output, SDA will either skip it or produce an incorrect analysis.

This matters most for multi-step plugins. If your process fetches accounts from one API and enrichment data from another, make sure the final action merges both into a single output. Use MAP() to combine data from multiple actions:

```yaml
return:
  output_mapper:
    processed_employee_list:
      MAP():
        # Iterate over the output of Action A (list of users)
        items: data.raw_user_list
        converter:
          # Map fields from the current item
          id: item.user_id
          display_name: item.name.$TITLECASE()

          # Merge data from Action B (company context) into every item
          office_location: data.company_context.primary_hq_address

          # Create a derived field combining both sources
          email_link: $CONCAT(["mailto:", item.username, "@",
            data.company_context.email_domain], "")
```

In this example, data.raw_user_list comes from one action and data.company_context comes from another. The final output merges both so SDA has everything it needs in a single flat list.
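A plain-Python sketch of the same merge, with made-up sample data, shows what the final flat list looks like: every row carries the per-user fields from Action A plus the shared fields from Action B.

```python
# Illustrative stand-ins for the two action outputs.
raw_user_list = [
    {"user_id": "u1", "name": "jane doe", "username": "jdoe"},
    {"user_id": "u2", "name": "sam lee", "username": "slee"},
]
company_context = {"primary_hq_address": "New York",
                   "email_domain": "example.com"}

# Merge both sources into a single flat list, as the MAP() above does.
processed_employee_list = [
    {
        "id": u["user_id"],
        "display_name": u["name"].title(),
        "office_location": company_context["primary_hq_address"],
        "email_link": f"mailto:{u['username']}@{company_context['email_domain']}",
    }
    for u in raw_user_list
]
```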

Use Friendly Field Names

SDA uses field names as its primary signal for understanding what each column represents. A name like churn_rate is self-documenting; ChurnRate__c forces the model to infer meaning from a system-generated suffix. Friendly names also improve the user experience - these names surface in messages and charts.

| System Name | Friendly Name |
| --- | --- |
| ARR__c | annual_recurring_revenue |
| MRR__c | monthly_recurring_revenue |
| NPS_Score__c | net_promoter_score |
| ChurnRate__c | churn_rate |
| Territory__c | territory |
| NumberOfEmployees | employees |
| LastModifiedDate | date_last_modified |

Flatten Nested Structures

SDA works best with tabular data - flat rows and columns that map cleanly to a dataframe. Deeply nested JSON forces SDA to write extra code to unpack the structure before it can even start the analysis, and that extra complexity increases the chance of errors.

Bad - Nested with cryptic API field names:

```json
{
  "d": {
    "results": [
      {
        "rec_id": "001A00000",
        "attr": {
          "n1": "Acme Corp",
          "loc": {
            "line1": "New York",
            "geo_cd": "US-NY"
          }
        },
        "fin": {
          "amt1": 250000,
          "amt2": 20833,
          "pct_r": 8.2
        },
        "_metadata": {
          "sys_mod": "2024-01-15",
          "rec_type": "enterprise"
        }
      }
    ]
  }
}
```

Good - Flat with friendly names:

```json
{
  "accounts": [
    {
      "account_name": "Acme Corp",
      "city": "New York",
      "region": "US-NY",
      "annual_recurring_revenue": 250000,
      "monthly_recurring_revenue": 20833,
      "churn_rate": 8.2
    }
  ]
}
```

Complete Output Mapper Example

Here’s a production-ready output mapper that ties together everything above - friendly names, flat structure, and display_instructions_for_model for baseline analysis rules:

```yaml
MAP():
  items: response.accounts
  converter:
    account_name: item.Name
    annual_recurring_revenue: item.ARR__c
    monthly_recurring_revenue: item.MRR__c
    net_promoter_score: item.NPS_Score__c
    churn_rate: item.ChurnRate__c
    territory: item.Territory__c
    industry: item.Industry
    employees: item.NumberOfEmployees
    next_renewal_date: item.NextRenewalDate__c
    open_opportunities: item.OpenOpportunities__c
    closed_won_opportunities: item.ClosedWonOpportunities__c
display_instructions_for_model: >
  CRM accounts. annual_recurring_revenue is USD. monthly_recurring_revenue is USD.
  churn_rate is 0-100%. Group by territory when comparing regions.
  Flag churn_rate > 15 with renewal in 90 days as at-risk.
```

Steering SDA with Instructions

You steer SDA’s behavior at two levels: the plugin’s name and description, which give SDA domain context, and the display_instructions_for_model field, which tells SDA how to interpret the data once it has it.

Name and Describe Your Plugin Effectively

SDA uses your plugin’s name and description as context when writing code against your data. A descriptive name like NPS_Score_Analysis immediately tells SDA what domain it’s working in and what kinds of calculations make sense. A vague name like excel_data_plugin gives it nothing to work with.

The description matters even more. When your description lists the fields your plugin returns, SDA can map the user’s question to the right columns and choose appropriate analysis methods. If your description says “Returns insights on NPS score with account owner, internal scoring, customer, and region,” SDA knows to look for score fields, group by region, and segment by customer. If your description just says “Returns data from excel,” SDA lacks critical context.

| Quality | Plugin Name | Plugin Description |
| --- | --- | --- |
| Good | NPS_Score_Analysis | “Returns insights on NPS score with account owner, internal scoring, customer, and region” |
| Bad | excel_data_plugin | “Returns data from excel” |

The bad example may fail a question like “Which accounts have the highest churn rate?” - nothing in the description signals that churn data is available.

Use display_instructions_for_model for Baseline Rules

Once the Reasoning Engine routes to your plugin, SDA decides what code to write based on two inputs: the user’s question and the display_instructions_for_model field attached to your data. Think of the user’s question as what to compute and the instructions as how to interpret the data while computing it.

The distinction matters. If you put "Calculate total ARR by industry" in your instructions, SDA might execute that calculation even when the user asks a completely different question. Instructions should define baseline rules that apply regardless of the query, not specific analyses.

Stick to rules that should hold true for every query against this dataset:

  • Units and formats - "annual_recurring_revenue is USD", "churn_rate is 0-100%"
  • Default groupings - "Group by territory when comparing regions"
  • Business rules - "Flag churn_rate > 15 with renewal in 90 days as at-risk"

```yaml
display_instructions_for_model: >
  CRM accounts. annual_recurring_revenue is USD. churn_rate is 0-100%.
  Group by territory. Flag churn_rate > 15 with renewal in 90 days as at-risk.
```

What NOT to Put in Instructions

Don’t use instructions to dictate specific analyses. That’s the user’s job - their question tells SDA what to compute. Instructions that overlap with user intent create conflicts, and SDA may ignore one or both.

| Quality | Instruction Example | Why |
| --- | --- | --- |
| Good | "annual_recurring_revenue is USD" | Tells SDA the unit - applies to any query involving revenue |
| Good | "Flag churn_rate > 15 as at-risk" | Business rule that should surface in every analysis touching churn |
| Bad | "Calculate total ARR by industry" | If the user asks about churn by territory, SDA now has conflicting directives |
| Bad | "Please fuzzy match Telecam to Telecom" | One-off data quality issues belong in your data pipeline, not in model instructions |

Analysis Instructions Are Not Guaranteed

Instructions that tell the model how to analyze (e.g., "fuzzy match Telecam to Telecom") are unreliable. SDA may skip them entirely, or worse, apply them in contexts where they don’t make sense. Fix data quality issues upstream in your output mapper or API call, not in instructions.
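A sketch of what fixing such an issue upstream might look like; the "Telecam" misspelling and the `industry` field name are purely illustrative:

```python
# Canonicalize known bad values in the pipeline, before SDA ever sees them.
# The mapping below is a hypothetical example of a one-off data fix.
CANONICAL_INDUSTRY = {"Telecam": "Telecom"}

def normalize(record: dict) -> dict:
    industry = record.get("industry", "")
    record["industry"] = CANONICAL_INDUSTRY.get(industry, industry)
    return record

rows = [{"industry": "Telecam"}, {"industry": "Retail"}]
cleaned = [normalize(r) for r in rows]
```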

Size Optimization

Your plugin’s response has a hard ceiling (100MB), and every field you return eats into that budget. But size isn’t just about avoiding failures. Smaller, focused datasets also produce better analysis. When SDA has 15 fields to work with instead of 50, it writes simpler code with fewer opportunities for error.

Optimize at three levels: pre-filter at the API, prune the schema, and post-filter with DSL.

A 500-row dataset with 30 fields is dramatically larger than the same 500 rows with 10 fields. Pruning aggressively can be the difference between a dataset that fits within the size limits and one that fails, and it can shave seconds off your plugin's run time.

Pre-filtering at the API Level

The cheapest data is data you never fetch. Use plugin inputs to narrow the query before hitting the external API. You can define multiple filter inputs on the same plugin and mark them as optional, so the plugin works broadly but can be scoped when the user provides constraints:

  • Date ranges - start_date and end_date to bound time-series queries
  • Current user - assigned_to the requesting user, so each person sees only their records
  • Numeric thresholds - e.g., only accounts with more than 500 employees
  • Optional inputs - Mark all of the above as optional so the plugin still works when the user doesn’t specify a filter
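The pattern can be sketched as building query parameters only from the filters the user actually provided; parameter names here are illustrative:

```python
# Optional plugin inputs become API query parameters only when supplied,
# so the query narrows when the user gives constraints and stays broad
# otherwise. The parameter names are hypothetical.
def build_query(start_date=None, end_date=None,
                assigned_to=None, min_employees=None) -> dict:
    params = {
        "start_date": start_date,
        "end_date": end_date,
        "assigned_to": assigned_to,
        "min_employees": min_employees,
    }
    # Drop filters the user didn't specify.
    return {k: v for k, v in params.items() if v is not None}
```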

Schema Pruning with MAP()

MAP() lets you project only the fields SDA needs from the raw API response. Drop system metadata, internal IDs, audit timestamps, and long text blobs. If SDA won’t reference a field in a calculation or summary, it shouldn’t be in the output.

```yaml
MAP():
  items: response.accounts
  converter:
    account_name: item.Name
    annual_recurring_revenue: item.ARR__c
    territory: item.Territory__c
    churn_rate: item.ChurnRate__c
    # Omitted: Id, SystemModstamp, CreatedById,
    # LastModifiedById, Notes__c, Description, etc.
```

Post-filtering with FILTER() and other Data Mapper Syntax

When the source API doesn’t support server-side filtering, use DSL and Data Mapper expressions to remove records inside your plugin’s process before the output reaches SDA:

```yaml
FILTER():
  items: data.accounts
  condition: item.annual_recurring_revenue > 100000
```
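Semantically, FILTER() behaves like a list comprehension over the records:

```python
# Keep only records matching the condition before the output reaches SDA.
accounts = [
    {"account_name": "Acme Corp", "annual_recurring_revenue": 250000},
    {"account_name": "Smallco", "annual_recurring_revenue": 40000},
]
filtered = [a for a in accounts if a["annual_recurring_revenue"] > 100000]
```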

Debugging and Validation

After launching your plugin, verify SDA is working correctly:

  1. Look for the “Running analysis” indicator. When SDA activates, the chat shows a progress indicator during code execution. If you don’t see it, your output is likely under the 7K token threshold or isn’t valid structured JSON.
  2. Check the code citation. SDA generates a reference labeled “Analysis Code” alongside its response. Expand it to see the exact Python code that ran - including how it parsed your fields, what aggregations it performed, and whether it interpreted your data correctly. This is your primary debugging tool.
  3. Test across question types. The same dataset should support different analyses depending on the user’s question. Ask for totals, groupings, trends, and filters to confirm SDA handles your schema correctly across all of them.

Common Pitfalls

| Pitfall | What Goes Wrong | Fix |
| --- | --- | --- |
| Incomplete final output | Your plugin fetches accounts in step 1 and enrichment data in step 2, but only step 2’s output reaches SDA - so SDA can’t answer questions about the account fields. | Merge all data into the final action’s output. Verify it contains every field the user might ask about. May need a compound action |
| System field names (ARR__c) | SDA sees ARR__c and doesn’t know if it’s annual revenue, an account type, or a status code. It may write code that treats it as a string or ignores it entirely. | Use MAP() to rename: annual_recurring_revenue: item.ARR__c |
| Deeply nested JSON | SDA has to write extra code to unpack response._sfdcMeta.rec_obj.info before it can start the actual analysis. More code means more room for bugs. | Flatten to { "city": "NY" }. Keep nesting only where it’s semantically meaningful. |
| Data serialized as a string | Your API returns data inside a string field ("data": "{\"accounts\": [...]}") instead of structured JSON. SDA can’t parse or operate on it. | Return real JSON objects and arrays. If the source API returns stringified data, parse it in your output mapper. |
| Analysis logic in instructions | You add "Calculate total ARR by industry" to instructions. A user asks about churn by territory, and SDA now has two conflicting directives. | Instructions are for baseline rules (units, formats, thresholds). Let the user’s question drive the analysis. |
| Multiple overlapping plugins | Three plugins cover the same Salesforce accounts with different filters. The Reasoning Engine picks GetEnterpriseAccountsPlugin when the user wanted all accounts. | One plugin per data source. Let SDA and user queries handle the filtering. |