Data Retrieval Cookbook

Pattern Overview

The Data Retrieval pattern is how Agent Studio plugins connect a natural language request to the right query against an external system like ServiceNow, Workday, or a data warehouse. Instead of dumping all available data, the plugin carefully pulls back the specific records or fields needed to answer the user’s request, while applying business logic, filters, and performance safeguards along the way.

In practice, this pattern can power both simple lookups and complex analytics.

For example, a simple lookup plugin might fetch a clean list of current applicants for an open job requisition from Workday, showing just the names, stages, and last updated dates.

[
  {
    "name": "Alice Chen",
    "stage": "Phone Screen",
    "last_updated": "2025-09-15"
  },
  {
    "name": "Brian Lopez",
    "stage": "Onsite Interview",
    "last_updated": "2025-09-18"
  },
  {
    "name": "Sofia Patel",
    "stage": "Offer Extended",
    "last_updated": "2025-09-20"
  }
]

Conversational LLM output

“You currently have three applicants in the pipeline.

Alice Chen is at the phone screen stage, last updated on September 15.

Brian Lopez is further along and has an onsite interview scheduled, updated on September 18.

And Sofia Patel already has an offer extended, with the latest update on September 20.”

You could scale up to a heavy analytics scenario, like analyzing thousands of rows in a data warehouse to surface insights into Net Promoter Score (NPS) trends—summarizing the overall score, highlighting changes across regions, and extracting themes from customer comments. Below is an example of how a heavy analytics scenario can work well conversationally.

Given a snippet of a large dataset returned by an API

Respondent	Region	Score	Comment
001	Americas	9	“Love the product, great support.”
002	EMEA	3	“Hard to integrate with our stack.”
003	APAC	7	“Good overall, but slow response times.”
004	Americas	10	“Fantastic experience.”
005	EMEA	4	“Too expensive for the value.”

Conversational summarization

(Reasoning engine tries to summarize large dataset without structured analysis)

Data-aware explanation (with SDA)

(Reasoning Engine powered by the Data Retrieval pattern with Structured Data Analysis)

Whether the task is small or large, the principle is the same: translate user intent into a precise, efficient query, return only what’s relevant, and present results in a structured way that the model and the end user can trust.
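To make the contrast between free-form summarization and structured analysis concrete, here is a minimal Python sketch of the kind of deterministic calculation a structured pass could run over the sample rows above. It is an illustration only, not the Structured Data Analysis implementation; the function names are hypothetical, and it uses the standard NPS definition (percentage of promoters scoring 9 to 10 minus percentage of detractors scoring 0 to 6).

```python
from collections import defaultdict

# Sample rows copied from the table above: (respondent, region, score).
ROWS = [
    ("001", "Americas", 9),
    ("002", "EMEA", 3),
    ("003", "APAC", 7),
    ("004", "Americas", 10),
    ("005", "EMEA", 4),
]


def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return None
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores), 1)


def nps_by_region(rows):
    """Group scores by region and compute NPS for each group."""
    grouped = defaultdict(list)
    for _respondent, region, score in rows:
        grouped[region].append(score)
    return {region: nps(scores) for region, scores in grouped.items()}


overall = nps([score for _, _, score in ROWS])
by_region = nps_by_region(ROWS)
```

With only five rows the numbers are extreme, but the point carries over to thousands of rows: the score is computed, not guessed, and the model only has to explain it.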

Query Language Generation

When building retrieval plugins, we combine slots and static typed query components for reliability and precision.

Static components: define the non-negotiable parts of a query (e.g., object type, mandatory fields for downstream reasoning).
Slots: dynamically filled from user input (e.g., dates, account names, owner emails).

This blend is expressed in the Moveworks data mapping language and DSL, which let developers define reusable templates with placeholders that the reasoning engine populates at runtime.

Plugin consumers are the business owners of the plugin. They supply the business logic by deciding which fields must always be included and which can remain optional. Encoding these rules up front ensures queries are accurate and efficient, returning only the necessary data, reducing payload size, minimizing latency, and avoiding irrelevant attributes.

This manipulation of the query logic should happen as the query is passed into the action in your conversational process, as an input mapping written with the Moveworks Data Mapper, as shown below:

query: $CONCAT(["SELECT", data.account_name, "FROM Opportunity"], " ")

The following examples show how each approach can work in actions:

Hardcoded query

The entire query is hardcoded in the action, so it always returns the same set of data.

Slot-filled component

Single component in the query supplied by the conversational/compound action

Fully dynamic query

Entire statement supplied by the conversational/compound action

Simple Queries

For simple use cases, the query can be fully static or involve only a single slot. A fully static query might always return the same fixed set of fields, such as fetching the list of all open requisitions or returning the current user’s profile record. A single-slot query adds just one dynamic element, like filtering opportunities by a user-provided close date or pulling applicants for a specific role. These are straightforward to configure because the business logic is minimal: you define the core fields once, ensure they’re always included, and allow one slot to vary based on user input. This keeps the template lightweight, predictable, and fast to execute.
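The single-slot idea can be sketched in a few lines of Python. This is not Data Mapper syntax; it only illustrates the shape of the template, with a fixed field list and one value that varies. The field list and the function name are assumptions chosen for the example.

```python
from datetime import date

# Static component: fields the business owner requires on every result
# (hypothetical list, chosen for illustration).
STATIC_FIELDS = ["Id", "Name", "StageName", "CloseDate"]


def build_simple_query(close_date_before: date) -> str:
    """Render a SOQL string with fixed fields and exactly one slot-driven filter."""
    fields = ", ".join(STATIC_FIELDS)
    # SOQL date values are written unquoted in YYYY-MM-DD form.
    return (
        f"SELECT {fields} FROM Opportunity "
        f"WHERE CloseDate <= {close_date_before.isoformat()}"
    )


query = build_simple_query(date(2025, 9, 30))
```

Accepting a typed date rather than raw text means the only thing the slot can change is the date value itself.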

Example simple query input mapping (SOQL)

Complex Queries

For advanced scenarios, queries often combine multiple slots with conditional logic. A Salesforce opportunity query, for instance, might include close_date_range, owner_email, and include_notes. If the user specifies a date filter, it’s applied; if they also request account team details, fields like Owner.Name and CSM__r.Name are added dynamically.

The Moveworks Data Mapper handles these optional “property slots” and enforces business rules such as which attributes are safe to fetch or which filters must be hardcoded. The goal: return only what’s needed to answer the request, avoiding overfetching while still letting users add conditions in natural language.

Example advanced input mapping (SOQL)

query:
  EVAL():
    args:
      select_sql:
        RENDER():
          template:
            - "SELECT {{ field_list }} FROM Opportunity"
          args:
            field_list:
              CONCAT():
                items:
                  - Id
                  - Name
                separator: ", "
      filters:
        FILTER():
          items:
            - CONDITIONAL():
                # close_date_before is an upper bound on CloseDate
                condition: data.close_date_before != "ANY"
                on_pass: |
                  $CONCAT(["CloseDate <= ", data.close_date_before])
                on_fail: '""'
            - CONDITIONAL():
                # close_date_after is a lower bound on CloseDate
                condition: data.close_date_after != "ANY"
                on_pass: |
                  $CONCAT(["CloseDate >= ", data.close_date_after])
                on_fail: '""'

    expression: |
      IF $LENGTH(filters) > 0 THEN
        $CONCAT([select_sql, " WHERE ", filters.$CONCAT(" AND ", true)])
      ELSE
        select_sql
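For readers who find the control flow easier to follow in a general-purpose language, the same idea can be sketched in Python: optional slots contribute a predicate only when they are set, and the WHERE clause is omitted entirely when no filter applies. The function name and the include_team flag are hypothetical; close_date_before is treated as an upper bound and close_date_after as a lower bound.

```python
ANY = "ANY"  # sentinel meaning "slot not provided"


def build_opportunity_query(close_date_before=ANY, close_date_after=ANY,
                            include_team=False):
    """Assemble a SOQL string from static fields plus optional slot values."""
    fields = ["Id", "Name"]
    if include_team:
        # Optional property group, added only when the user asks for it.
        fields += ["Owner.Name", "CSM__r.Name"]

    filters = []
    if close_date_before != ANY:
        filters.append(f"CloseDate <= {close_date_before}")
    if close_date_after != ANY:
        filters.append(f"CloseDate >= {close_date_after}")

    query = f"SELECT {', '.join(fields)} FROM Opportunity"
    if filters:
        query += " WHERE " + " AND ".join(filters)
    return query
```

Calling build_opportunity_query() with no arguments yields the bare SELECT, which mirrors the ELSE branch of the mapping.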

Property Selection In Action Input Arguments

There are two patterns to choose between: a single plugin per business use case, or one plugin that serves many business use cases.

A. One Plugin per Attribute Set

This approach creates a single, purpose-built plugin for a defined set of attributes. Each plugin has a fixed property list, which means queries are straightforward, with little to no conditional logic. Because the schema is predictable, these plugins are easy for the reasoning engine to handle, making them highly reliable and fast to execute. The simplicity keeps complexity low and ensures consistent results. However, the trade-off is scalability: every new attribute set requires its own dedicated plugin, which can quickly lead to a proliferation of many small, narrowly scoped plugins.

Input arguments for a plugin that lets AEs perform a lookup

query: '"SELECT StageName, Amount, CloseDate, Owner.Name, Champions__c, Notes__c
  FROM Opportunity
  WHERE CloseDate = THIS_QUARTER"'

Input arguments for a plugin that lets SDRs perform a lookup

query: '"SELECT Account.Owner.Name, StageName, Tech_Stack_Flags__c, Notes__c
  FROM Opportunity
  WHERE IsActive__c = true"'

B. Property Group Slots (Recommended for Breadth)

Instead of creating a separate plugin for each attribute set, this pattern uses conditional logic to insert groups of semantically related fields, called property group slots, based on the persona or the user’s utterance. This approach allows a single plugin to flex across multiple use cases without exploding into dozens of narrow plugins. It gives breadth and flexibility: account executives, SDRs, renewals managers, and executives can all query the same plugin, but each gets back the fields relevant to their role. The key is balance: keeping the slot count low improves performance and reasoning quality.

Example SOQL groupings:

AE: SELECT StageName, Amount, CloseDate, Owner.Name, Champions__c, Notes__c FROM Opportunity WHERE CloseDate = THIS_QUARTER
SDR: SELECT Account.Owner.Name, StageName, Tech_Stack_Flags__c, Notes__c FROM Opportunity WHERE IsActive__c = true
Renewal: SELECT Renewal_Date__c, Contract_End_Date__c, Growth_ARR__c, User_Count__c FROM Opportunity WHERE Renewal_Date__c < NEXT_90_DAYS
Exec: SELECT Team__c, Key_Touch_Fields__c, Growth_ARR__c, Exec_Notes__c FROM Opportunity WHERE Amount > 100000

Input arguments in the action for a single plugin, keyed on the department field of the user record:

query:
  RENDER():
    template: "{{ selected_sql }}"
    args:
      selected_sql:
        LOOKUP():
          key: meta_info.user.department.$LOWERCASE()
          mapping:
            ae: >
              "SELECT StageName, Amount, CloseDate, Owner.Name, Champions__c, Notes__c
                 FROM Opportunity WHERE CloseDate = THIS_QUARTER"
            sdr: >
              "SELECT Account.Owner.Name, StageName, Tech_Stack_Flags__c, Notes__c
                  FROM Opportunity WHERE IsActive__c = true"
            renewal: >
              "SELECT Renewal_Date__c, Contract_End_Date__c, Growth_ARR__c, User_Count__c
                  FROM Opportunity WHERE Renewal_Date__c < NEXT_90_DAYS"
            exec: >
              "SELECT Team__c, Key_Touch_Fields__c, Growth_ARR__c, Exec_Notes__c
                  FROM Opportunity WHERE Amount > 100000"
          default: >
            "SELECT Name, Title, Department, Manager.Name, Email, Phone
                  FROM User WHERE IsActive = true"

This could also be done by adding a slot to the conversational process that captures the type of lookup the user is attempting. For example, create a slot named “query_intent” with a description like: “The intent of the user’s lookup, expressed as the persona whose fields are being requested. The value must be one of ‘ae’ (account executive), ‘sdr’ (sales development rep), ‘renewal’, ‘exec’ (executive), or ‘generic’. Use ‘generic’ if no persona can be identified from the query.”

query:
  RENDER():
    template: "{{ selected_sql }}"
    args:
      selected_sql:
        LOOKUP():
          key: data.query_intent.$LOWERCASE()
          mapping:
            ae: >
              "SELECT StageName, Amount, CloseDate, Owner.Name, Champions__c, Notes__c
                 FROM Opportunity WHERE CloseDate = THIS_QUARTER"
            sdr: >
              "SELECT Account.Owner.Name, StageName, Tech_Stack_Flags__c, Notes__c
                  FROM Opportunity WHERE IsActive__c = true"
            renewal: >
              "SELECT Renewal_Date__c, Contract_End_Date__c, Growth_ARR__c, User_Count__c
                  FROM Opportunity WHERE Renewal_Date__c < NEXT_90_DAYS"
            exec: >
              "SELECT Team__c, Key_Touch_Fields__c, Growth_ARR__c, Exec_Notes__c
                  FROM Opportunity WHERE Amount > 100000"
            generic: >
              "SELECT Name, Title, Department, Manager.Name, Email, Phone
                  FROM User WHERE IsActive = true"
          default: >
            "SELECT Name, Title, Department, Manager.Name, Email, Phone
                  FROM User WHERE IsActive = true"

The logic you choose should depend on which aspects you want to be deterministic and which probabilistic. In some scenarios you will want guardrails on the query and will always enforce it based on user profile attributes; in others, the fields can be chosen by the user and the reasoning engine within a more limited scope. It is not one size fits all. Also be cognizant of over-asking for slots: in many cases it provides a better user experience to set slot values statically or infer them rather than require the user to provide them.
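The deterministic versus probabilistic trade-off can be sketched as a lookup whose key comes either from the user profile (always enforced) or from an inferred slot (only when explicitly allowed). The Python below is an illustration under those assumptions; the function name and the allow_slot_override flag are hypothetical, and only two of the persona queries are reproduced to keep it short.

```python
DEFAULT_QUERY = (
    "SELECT Name, Title, Department, Manager.Name, Email, Phone "
    "FROM User WHERE IsActive = true"
)

PERSONA_QUERIES = {
    "ae": "SELECT StageName, Amount, CloseDate, Owner.Name, Champions__c, Notes__c "
          "FROM Opportunity WHERE CloseDate = THIS_QUARTER",
    "sdr": "SELECT Account.Owner.Name, StageName, Tech_Stack_Flags__c, Notes__c "
           "FROM Opportunity WHERE IsActive__c = true",
}


def select_query(user_department, query_intent=None, allow_slot_override=False):
    """Pick a persona query; the profile attribute wins unless override is allowed."""
    key = user_department
    if allow_slot_override and query_intent:
        # Probabilistic path: trust the value inferred by the reasoning engine.
        key = query_intent
    # Unknown or missing keys fall back to the default query.
    return PERSONA_QUERIES.get((key or "").strip().lower(), DEFAULT_QUERY)
```

Leaving allow_slot_override off keeps the field scope tied to the user record, so no slot has to be collected at all.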

Slots, Validation, and Resolvers

1. Slots: The Foundation

If you think of conversational processes as programs, then you can think of Slots as your program’s variables. The Moveworks Reasoning Engine’s job is to navigate the fluidity and ambiguity of a natural language conversation and fill appropriate (often structured) values for Slots so that the conversational process can run smoothly and operate on the right information. For data retrieval processes, Slots can be a powerful mechanism for collecting the critical information that is needed to execute a query.

The value for a Slot can involve pulling information directly from the user or even retrieving dynamic objects from other systems (e.g., using a Resolver to get a ServiceNow ticket). To get a value for a Slot, the reasoning engine relies heavily on the slot name, description, and data type that you provide in the configuration:

Good slot names are expressive and unambiguous in name, and often allude to the slot’s broader purpose in one go (close_date, customer_name, contract_term_months). This is a primary indicator to the reasoning engine of what the slot truly represents and why it matters. (Tip: programming conventions for “good variable naming” often apply here).
Clear descriptions eliminate ambiguity and tell the remaining story of the slot. You want to set up the reasoning engine for success: instead of saying “The date”, say “The target due date for the project task, to be extracted in RFC 3339 timestamp format (e.g., 2025-09-22T00:00:00Z)”. This tells the Moveworks Reasoning Engine not just the type of date that the slot represents, but also additional key formatting information so the value can be appropriately expressed when it’s later used as input in a downstream API.
Explicit data types (e.g., string, number, boolean, array) indicate to the reasoning engine how the ultimate value for the slot should be shaped, and ensures the value is compatible for downstream uses. Based on the data type (especially if the type is complex), the reasoning engine can also lean on special mechanisms (Resolvers) attached to the type to optimize the value retrieval for the Slot.

Example:

- name: filtering_date
  description: "The date the user provided for filtering the response, always in RFC 3339 timestamp format (e.g., 2020-06-18T17:24:53Z)"
  data_type: string

If the user says “tomorrow” (and today is September 22, 2025), the LLM will output 2025-09-23T00:00:00Z in the specified format rather than the raw text.
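The reasoning engine performs this conversion itself; the Python sketch below only shows the output shape a downstream API can expect. It assumes UTC and midnight as the time of day, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def tomorrow_rfc3339(now: datetime) -> str:
    """Midnight UTC of the day after `now`, as an RFC 3339 timestamp."""
    next_day = (now.astimezone(timezone.utc) + timedelta(days=1)).date()
    return f"{next_day.isoformat()}T00:00:00Z"


# "tomorrow", evaluated on the afternoon of 2025-09-22 (UTC)
example = tomorrow_rfc3339(datetime(2025, 9, 22, 15, 30, tzinfo=timezone.utc))
```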

2. Slot Validation: Guardrails for Business Logic

Slots need to be appropriately precise and adherent to constraints in order to meet the structured needs of data retrieval processes. The reasoning engine is adept at correlating fluid, complex conversational context with a variety of configuration info (e.g., slot name, description, and type metadata) to generate a formal slot value. However, due to its probabilistic nature and varying precision, it is less suited to precise operations or exact computations on its own.

To enforce rules or compute results over multiple slots, we need to lean on Slot Validation Policies, powered by Moveworks DSL. Encoding a formal policy will enforce rules deterministically and perform more reliably than overprompting slot descriptions with logical instructions (e.g. “Make sure the date is in the future and after the contract start date”). An explicit validation policy ensures that slots produce values that are not just valid in format, but also compliant with business rules.

Example:

- name: start_date
  description: "The date the user provided, in RFC 3339 format (e.g., 2025-09-22T00:00:00Z)"
  data_type: string
  slot_validation_policy: value.$PARSE_TIME() > $TIME()

Here the slot captures any user-provided date, and the validation policy enforces that it must not be in the past. Using DSL, we take the collected slot value (value), convert the string to Unix time with $PARSE_TIME(), and compare it against $TIME() (the current time in Unix format). This pattern also works for cross-field checks (end_date > start_date) or for validating against values in the data bank (like comparing to a previously retrieved contract date).

Important: if you compare two slots against each other in this way, you must ensure the other slot has already been collected.
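The two checks described above (a future-date rule and a cross-field rule) can be mirrored in Python to show exactly what is being compared. This is a sketch of the logic, not of the DSL; the helper names are hypothetical, and the cross-field check deliberately fails when start_date has not been collected.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing 'Z' is normalized to '+00:00'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_future(value: str, now: datetime) -> bool:
    """Equivalent in spirit to: value.$PARSE_TIME() > $TIME()."""
    return parse_rfc3339(value) > now


def end_after_start(start_date, end_date) -> bool:
    """Cross-field check; rejects the value if start_date is not yet collected."""
    if start_date is None:
        return False
    return parse_rfc3339(end_date) > parse_rfc3339(start_date)
```

Passing the current time in as an argument keeps the check easy to test.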

3. Resolver Strategies: Extending Slots

Resolvers extend the behavior of slots by offering an explicit mechanism to determine the value of a slot. Instead of leaving everything to inference, a resolver can either force a value to be selected from a fixed list (static resolver) or retrieved from a live dataset (dynamic resolver).

Static Resolvers: Used when the valid values are known and stable. You define a list of allowed options, and the user is presented the list to select from.
- Example: priority = low, medium, high.
- Great for categories, statuses, and enums.

Dynamic Resolvers: Used when the slot value should originate from an external system. The resolver makes an action call (e.g., “list Jira issues assigned to the user”), retrieves structured options, and then will fill the slot with the desired option, working with the user as necessary (e.g. match to user’s intent, ask the user to select from the retrieved options).
- Example: User says “my highest priority bug” → resolver looks up their Jira issues, compares them, and selects the right record.
- Perfect for dynamic objects like tickets, accounts, applicants, or documents. Strongly recommend this for strong types such as a User, serviceNowTicket, etc.
- Dynamic resolvers will call the action configured in the resolver when the slot collection happens.

Note: the output mapper must always point to an array if the output cardinality is set to interpret as a list of candidate values.

Given an action with an output like:

{
  "hasErrors": false,
  "results": [
    {
      "referenceId": "ref1",
      "id": "0015g00000N1ABC",
      "success": true,
      "errors": [],
      "fields": {
        "Name": "Acme Corporation",
        "Industry": "Technology",
        "BillingCity": "San Francisco",
        "Phone": "+1-415-555-1234",
        "Website": "https://www.acme.com"
      }
    },
    {
      "referenceId": "ref2",
      "id": "0035g00000M9XYZ",
      "success": true,
      "errors": [],
      "fields": {
        "FirstName": "John",
        "LastName": "Doe",
        "Title": "CTO",
        "Email": "john.doe@example.com",
        "Phone": "+1-415-555-1111"
      }
    }
  ]
}

The output mapping would be response.results.

Resolvers will prompt the user for a selection during the conversation if the user has not provided enough context for the resolver to disambiguate on its own.
Collection for either resolver type happens in conversation at the point where the slot is set as required inside an action or decision policy in a conversational process.

4. Putting It All Together

Slots, validation, and resolvers are all tools in the same toolbox, but each is suited to a different layer of the problem.

When To Use Just A Basic Slot (name + description + simple type)

Use a plain slot when:

The user’s input can be inferred directly from natural language without any business rules or external lookups.
Formatting and type expectations are enough to make the value useful (e.g., converting “tomorrow” into an RFC 3339 timestamp).
There are no strict constraints on the value to deterministically enforce.

Example: Capturing an email address, a date, or a free-text search term.

When To Add Validation

Add a slot validation policy when:

You need to enforce rules about the captured value (e.g., must be a future date, must be greater than zero, must fall within a certain range).
You want to achieve consistent slot values to meet strict business constraints, but different users might express their information in varying ways.
You’re validating against other fields or previously collected data.

Example: Ensuring end_date is later than start_date.

When to Use a Resolver

Introduce a resolver when:

The value cannot just be inferred by the reasoning engine; it must come from a constrained set of valid options.
For static resolvers, the list of valid values is known and unchanging (e.g., priority = low, medium, high).
For dynamic resolvers, the valid values depend on external systems and change frequently (e.g., retrieving the list of current Jira issues, Salesforce opportunities, or Workday applicants).

Example: Resolving “my open tickets” into a specific JiraIssue object returned by a lookup action.

Filter Semantics

How filters are built

Static components: non-negotiable predicates.
Slots: user-driven values resolved at runtime.
Composition: STATIC AND SAFETY AND (CONTROLLED_VOCAB) AND (IDENTITY_OR_OWNERSHIP) AND (USER_SLOTS)
Defaults: treat missing/unknown as ANY → filter omitted.
Order of ops: resolve vocab → apply safety → add identity → then user slots.
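The composition rule can be sketched as a small Python function that ANDs the groups together in a fixed order and silently drops any group that resolved to ANY. It is an illustration only; the function name and the sample predicates in the usage note are assumptions, and the identity group is parenthesized because it is an OR of several fields.

```python
def compose_where(static, safety, vocab=None, identity=None, user_slots=()):
    """AND filter groups together in a fixed order, omitting empty (ANY) groups."""
    groups = [*static, *safety]
    if vocab:
        groups.append(vocab)
    if identity:
        # The identity group is an OR of ownership fields, so wrap it.
        groups.append(f"({identity})")
    # User-driven predicates come last; empty strings mean "no filter".
    groups.extend(predicate for predicate in user_slots if predicate)
    return " AND ".join(groups)


where = compose_where(
    static=["IsDeleted = false"],
    safety=["IsArchived__c = false"],
    vocab="Type IN ('Partner','Reseller')",
    identity="Owner.Email = 'a@example.com' OR CSM__r.Email = 'a@example.com'",
    user_slots=["", "Amount > 100000"],
)
```

Because the static and safety groups are positional and always included, a user-supplied slot can narrow the result set but cannot remove a guardrail.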

A. Controlled Vocab Mappings

Goal: Normalize messy natural language to canonical filters.

Pattern

Maintain a lookup table from slot values → predicate fragments.
Accept synonyms and fall back to ANY (omit filter) if unmapped.

Examples (SOQL)

“customers” → Type IN ('Customer','Customer via MSP','Customer via Parent')
“prospects” → Type IN ('Prospect','Trial','POC')
“partners” → Type IN ('Partner','Reseller')

Example mapping (YAML / Data Mapper)

filters:
  FILTER():
    items:
      - CONDITIONAL():
          condition: data.account_classification != "ANY"
          on_pass:
            LOOKUP():
              key: data.account_classification.$LOWERCASE()
              mapping:
                customer: >
                  "Type IN ('Customer','Customer via MSP','Customer via Parent')"
                prospect: >
                  "Type IN ('Prospect','Trial','POC')"
                partner: >
                  "Type IN ('Partner','Reseller')"
                any: '""'  # omit
              default: '""'  # unmapped values fall back to ANY (omit)
          on_fail: '""'
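The lookup-table pattern, including synonym handling and the fall-back to ANY, can be sketched in Python as follows. The predicates are the ones listed above; the synonym table and the function name are hypothetical and would be replaced by whatever vocabulary your users actually use.

```python
PREDICATES = {
    "customer": "Type IN ('Customer','Customer via MSP','Customer via Parent')",
    "prospect": "Type IN ('Prospect','Trial','POC')",
    "partner": "Type IN ('Partner','Reseller')",
}

# Hypothetical synonym table mapping user phrasing to canonical keys.
SYNONYMS = {
    "customers": "customer",
    "clients": "customer",
    "prospects": "prospect",
    "leads": "prospect",
    "partners": "partner",
    "resellers": "partner",
}


def vocab_predicate(slot_value):
    """Return the predicate for a slot value, or '' (omit filter) if unmapped."""
    key = (slot_value or "").strip().lower()
    key = SYNONYMS.get(key, key)
    return PREDICATES.get(key, "")
```

Returning an empty string for unknown values keeps the query valid: a missing filter widens the result set but never produces a malformed WHERE clause.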

B. Safety / Trust Filters

Goal: Hard guarantees that results are fresh, relevant, and policy-safe.

Pattern

Always-on predicates; not controlled by the user.
Enforce recency, status, visibility, and tenancy.

Examples (SOQL)

Active/current records: CurrentOrUpcoming_Active__c = TRUE
Hide archived: IsArchived__c = FALSE
Tenant scope: OrgId__c = {{TENANT_ID}}
Recency guard: LastModifiedDate = THIS_YEAR (or parametric window)

Example (YAML / Data Mapper)

safety_filters:
  FILTER():
    items:
      - '"CurrentOrUpcoming_Active__c = true"'
      - '"IsArchived__c = false"'
      - $CONCAT(["OrgId__c = '", data.tenant_id, "'"])
      - CONDITIONAL():
          condition: data.recency_window_days != "ANY"
          # LAST_N_DAYS:n is the SOQL date literal for a rolling window of n days
          on_pass: $CONCAT(["LastModifiedDate = LAST_N_DAYS:", data.recency_window_days])
          on_fail: '""'

C. Identity / Ownership Filters

Goal: Resolve “my X” to all relevant ownership fields, not just Owner.

Pattern

Register all fields that imply ownership or responsibility.
OR them together; keep each predicate selective.

Common fields

Owner.Email, CSM__r.Email, SE__r.Email, Implementation_Lead__r.Email, Renewal_Manager__r.Email, AE__r.Email, BDR__r.Email

Examples (SOQL)

“my accounts”

(Owner.Email = meta_info.user.email_addr
 OR CSM__r.Email = meta_info.user.email_addr
 OR SE__r.Email = meta_info.user.email_addr
 OR Implementation_Lead__r.Email = meta_info.user.email_addr
 OR Renewal_Manager__r.Email = meta_info.user.email_addr)

Combine with safety + classification:

WHERE IsArchived__c = false
  AND CurrentOrUpcoming_Active__c = true
  AND Type IN ('Customer','Customer via MSP','Customer via Parent')
  AND (
    Owner.Email = meta_info.user.email_addr OR
    CSM__r.Email = meta_info.user.email_addr OR
    SE__r.Email = meta_info.user.email_addr OR
    Implementation_Lead__r.Email = meta_info.user.email_addr OR
    Renewal_Manager__r.Email = meta_info.user.email_addr
  )

Example (YAML / Data Mapper)

identity_filter:
  RENDER():
    # The email is a string value, so it is wrapped in single quotes for SOQL.
    template: >
      WHERE IsArchived__c = false
        AND CurrentOrUpcoming_Active__c = true
        AND Type IN ('Customer','Customer via MSP','Customer via Parent')
        AND (
          Owner.Email = '{{ user_email }}' OR
          CSM__r.Email = '{{ user_email }}' OR
          SE__r.Email = '{{ user_email }}' OR
          Implementation_Lead__r.Email = '{{ user_email }}' OR
          Renewal_Manager__r.Email = '{{ user_email }}'
        )
    args:
      user_email: meta_info.user.email_addr
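Keeping the ownership fields in one registry and generating the OR group from it avoids hand-editing every query when a new role field is added. The Python below is a sketch of that idea; the function names are hypothetical, and the quoting helper escapes backslashes and single quotes the way SOQL string literals expect.

```python
# Registry of fields that imply ownership or responsibility (from the list above).
OWNERSHIP_FIELDS = [
    "Owner.Email",
    "CSM__r.Email",
    "SE__r.Email",
    "Implementation_Lead__r.Email",
    "Renewal_Manager__r.Email",
]


def soql_quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def identity_filter(user_email: str, fields=OWNERSHIP_FIELDS) -> str:
    """Build a parenthesized OR group matching the user on every ownership field."""
    quoted = soql_quote(user_email)
    return "(" + " OR ".join(f"{field} = {quoted}" for field in fields) + ")"
```

The result can be dropped straight into an AND chain because the OR group is already parenthesized.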

Limitations & Pagination

General guidance

Prefer one call: Retrieve all necessary data in a single query when possible, then filter or manipulate results after.
Guardrails: Apply selective filters (date ranges, ownership, classification) and return only required fields to keep payloads lightweight.
No while support: Compound actions do not support looping until a condition is met (e.g., “keep fetching until next_page token is null”).

“For Each” in a compound action

When you need to make multiple calls (e.g., enriching a list of users or account IDs with detailed lookups), use the for construct inside a compound action. This lets you iterate deterministically over a known list and collect results.

for:
  each: acct_id                 # current item variable for each iteration
  index: acct_index             # optional index, useful for ordering
  in: data.account_ids          # reference to an array produced by a prior step
  output_key: accounts_detailed # aggregated results will collect here
  steps:
    - action:
        name: fetch_account
        output_key: accounts
        args:
          query:
            RENDER():
              # Salesforce IDs are strings, so the value is quoted in the WHERE clause
              template: >
                SELECT Id, Name, Owner.Name, ARR__c, Renewal_Date__c
                FROM Account
                WHERE Id = '{{ account_id }}'
              args:
                account_id: acct_id
    - return:
        output_mapper:
          account_list: data.accounts

Static arrays for known limits

If your dataset is static with a known upper bound, you can create a fixed array for pagination. For example, if the dataset is always 1,000 records and the API limit is 200 per call, you can predefine a 5-element array:

for:
  each: page_marker
  index: page_index              # use index to calculate offset
  in: ["dummy","dummy","dummy","dummy","dummy"]   # static array of 5 items
  output_key: paged_results
  steps:
    - action:
        name: fetch_accounts
        output_key: accounts
        args:
          query:
            RENDER():
              template: >
                SELECT Id, Name
                FROM Account
                ORDER BY Id
                LIMIT {{ limit }}
                OFFSET {{ offset }}
              args:
                limit: 200
                offset: page_index * 200   # 0, 200, 400, 600, 800
    - return:
        output_mapper:
          account_list: data.accounts

This approach works when:

The dataset size is stable and predictable.
You can calculate offsets deterministically.
You don’t need dynamic “until empty” behavior.
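The arithmetic behind the static array is simple enough to check in a few lines of Python: the number of pages is the record count divided by the page size, rounded up, and each offset is the page index times the page size. The function names are hypothetical; merge_pages stands in for whatever step combines the per-page results afterwards.

```python
import math


def page_offsets(total_records: int, page_size: int):
    """Offsets for fixed-size pages; the list length is the static array size."""
    pages = math.ceil(total_records / page_size)
    return [index * page_size for index in range(pages)]


def merge_pages(pages):
    """Flatten per-page result lists into one list, preserving order."""
    return [record for page in pages for record in page]


offsets = page_offsets(1000, 200)
```

If the dataset can grow past the assumed bound, the last records are silently missed, which is why this approach only fits stable, predictable sizes.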

Property Matching (Fuzzy, Aliases, Acronyms)

What it is

Property matching is about converting what a user says in natural language—like “T Mobile”, “T-mobile”, or “SNow”—into the canonical identifiers that an external system expects. This is a constant challenge because users often rely on nicknames, acronyms, or misspellings that don’t match the exact stored values. Without reliable matching, queries risk returning empty sets or irrelevant data.

How to Approach It

Approach 1: At Query time (in input arguments)

Substring matching

If fuzzy matching is unavailable on the system side, you can issue targeted substring queries (e.g., LIKE '%SNow%', LIKE '%T-Mobile%'). This can work for substring matches but will often fail on acronyms or abbreviations like “SNow” for ServiceNow. It works best for commonly typed fields where users are likely to provide the proper values.
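A quick Python sketch shows why substring matching handles partial names but not acronyms or spelling variants. The account list and function name are made up for the example; the function is a case-insensitive stand-in for a LIKE '%term%' filter.

```python
ACCOUNTS = ["ServiceNow", "T-Mobile US", "Salesforce"]  # hypothetical stored values


def substring_matches(term, candidates):
    """Case-insensitive equivalent of SQL LIKE '%term%'."""
    needle = term.lower()
    return [c for c in candidates if needle in c.lower()]


partial = substring_matches("t-mobile", ACCOUNTS)   # partial name: matches
acronym = substring_matches("SNow", ACCOUNTS)       # acronym: no match
variant = substring_matches("T Mobile", ACCOUNTS)   # spacing variant: no match
```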

Approach 2: At response time in the payload

Simple / Small Datasets (< ~7k tokens)

For smaller sets of candidate values (e.g., names, project codes, accounts), you can pull the full list from the system and let the LLM evaluate directly. This “enumerate & select” method works well within token limits, as the reasoning engine can compare user input against all possible options to identify the best match.

Larger or Complex Datasets (> ~7k tokens)

When the candidate list is too large, rely on Structured Data Analysis (SDA) to chunk, rank, and filter values before final selection. SDA can handle thousands of rows, applying heuristics and embeddings to propose likely matches, then confirming with the model.

Guardrails & Best Practices

Prefer fuzzy matching at the API level whenever possible. This ensures the dataset is narrowed before it ever reaches the model, which saves tokens and improves precision.
Always return top-k candidates when ambiguity exists, and let the reasoning layer, user, or generative action resolve conflicts.
Never overfetch: only bring back the candidate values truly needed to disambiguate, rather than entire records with unnecessary fields.
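When the source system offers no fuzzy search, a top-k shortlist can still be produced on a small candidate list before handing off to the reasoning layer. The sketch below uses Python's standard difflib module as a stand-in for whatever matcher you have available; the alias table, the cutoff value, and the function name are assumptions for illustration, and this is not how SDA itself ranks candidates.

```python
import difflib

# Hypothetical alias table for acronyms that string similarity cannot recover.
ALIASES = {"snow": "ServiceNow"}


def top_k_candidates(user_text, candidates, k=3, cutoff=0.6):
    """Return up to k likely canonical values for a user-supplied name."""
    text = user_text.strip().lower()
    alias = ALIASES.get(text)
    if alias in candidates:
        return [alias]
    # Compare case-insensitively, then map back to the stored spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(text, list(lowered), n=k, cutoff=cutoff)
    return [lowered[h] for h in hits]


names = ["ServiceNow", "T-Mobile", "Salesforce"]
```

Returning a short list rather than a single guess leaves the final choice to the reasoning layer or the user when more than one candidate clears the cutoff.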

Response Formatting

The goal is to deliver just enough, well-typed data for the Reasoning Engine to reason, summarize, or write SDA code. In addition to returning trimmed fields, you can append an extra key such as display_instructions_for_model to explicitly guide the LLM on how to use the data. This can include instructions for analysis, how verbose or concise to be, or even what text formatting to apply.

Principles

Minimize payloads. Use Moveworks data mapper + DSL to project only the fields needed for the user’s ask (and for downstream summarization). Drop raw IDs, internal keys, verbose metadata, timestamps, and blobs unless they’re essential to the answer.
Use well-typed keys. Field names should be descriptive, consistent, and self-explanatory. This gives the reasoning engine clear signals about how to interpret the data—whether for summarization, comparison, or code generation. For example, account_name, nps_score, and close_date are immediately meaningful, whereas generic keys like field1 or opaque IDs are not.

Stabilize shape. Keep a consistent, compact schema across responses so follow-up turns can be chained reliably. Flatten nested structures as much as possible while maintaining logical groupings (e.g., return contract_start_date and contract_end_date as separate top-level fields rather than embedding a nested contract object).

Example: over-nested

{
  "meta": {
    "status": {
      "ok": true,
      "request": {
        "id": "req_01J9ZABC12345",
        "operation": "create_customer_with_orders"
      }
    }
  },
  "data": {
    "customer": {
      "profile": {
        "details": {
          "id": "cust_5001",
          "name": { "first": "Alice", "last": "Johnson" },
          "contact": {
            "email": { "primary": "alice.j@example.com" },
            "phone": { "mobile": "+1-202-555-7890" }
          }
        }
      },
      "orders": {
        "list": [
          {
            "order": {
              "info": {
                "id": "ord_1001",
                "product": { "name": "Laptop Pro 15" },
                "quantity": { "value": 1 },
                "price": { "amount": 1499.99 },
                "date": { "placed": "2025-09-01" }
              }
            }
          },
          {
            "order": {
              "info": {
                "id": "ord_1002",
                "product": { "name": "Wireless Mouse" },
                "quantity": { "value": 2 },
                "price": { "amount": 39.99 },
                "date": { "placed": "2025-09-02" }
              }
            }
          }
        ]
      }
    }
  },
  "metadata": {
    "timestamps": { "created_at": "2025-09-25T14:03:12Z" },
    "warnings": [],
    "errors": []
  }
}

Example: flattened and well structured

{
  "ok": true,
  "request_id": "req_01J9ZABC12345",
  "operation": "create_customer_with_orders",
  "created_at": "2025-09-25T14:03:12Z",
  "customer": {
    "id": "cust_5001",
    "name": "Alice Johnson",
    "email": "alice.j@example.com",
    "phone": "+1-202-555-7890"
  },
  "orders": [
    {
      "id": "ord_1001",
      "product_name": "Laptop Pro 15",
      "quantity": 1,
      "price": 1499.99,
      "order_date": "2025-09-01"
    },
    {
      "id": "ord_1002",
      "product_name": "Wireless Mouse",
      "quantity": 2,
      "price": 39.99,
      "order_date": "2025-09-02"
    }
  ],
  "warnings": [],
  "errors": []
}
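
The flattening idea can also be illustrated outside the data mapper. The following is a minimal Python sketch, not Moveworks DSL: the helper name flatten and the underscore separator are assumptions made for illustration, and the sample input is a fragment of the overnested example.

```python
from typing import Any


def flatten(obj: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining key names with `sep`.

    Hypothetical helper for illustration only; it is not part of any
    Moveworks API.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[name] = value
    return flat


# Fragment of the overnested customer profile shown earlier.
nested_contact = {
    "name": {"first": "Alice", "last": "Johnson"},
    "contact": {
        "email": {"primary": "alice.j@example.com"},
        "phone": {"mobile": "+1-202-555-7890"},
    },
}

flat_contact = flatten(nested_contact)
print(flat_contact)
```

Mechanical flattening produces long keys such as contact_email_primary. A hand-written mapping that chooses short, descriptive names (email, phone) will usually read better, which is what the flattened example does.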

Summarize at the edge. Where possible, use DSL to perform deterministic calculations (counts, aggregations, percentages, thresholds) before returning results. This ensures the model sees concise, ready-to-use summaries instead of raw tables or unstructured data.
Append instructions for display. Add a dedicated key like display_instructions_for_model to explicitly tell the LLM how to handle the data. For example: “Summarize accounts grouped by region with concise bullet points” or “Render results in a table with columns for account, NPS score, and trend.” This reduces ambiguity and ensures outputs follow business or stakeholder preferences.
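
To make the last two points concrete, here is a rough Python sketch of the kind of deterministic summary that could be computed before the model sees the data. In a plugin this logic would live in the DSL and data mapper; Python is used only to show the arithmetic. The function name summarize, the output key names other than display_instructions_for_model, and the sample rows are assumptions for illustration.

```python
from typing import Any, Optional


def summarize(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Compute counts, an average, and a percentage from already-mapped rows."""
    total = len(rows)
    promoters = sum(1 for r in rows if r["nps_bucket"] == "Promoter")

    average: Optional[float] = None
    promoter_pct: Optional[float] = None
    if total:  # guard against division by zero on empty results
        average = round(sum(r["nps_score"] for r in rows) / total, 1)
        promoter_pct = round(100 * promoters / total, 1)

    return {
        "total_accounts": total,
        "average_nps_score": average,
        "promoter_pct": promoter_pct,
        # Explicit formatting guidance travels with the data.
        "display_instructions_for_model": (
            "Report the totals in one or two sentences. "
            "Do not list individual accounts unless asked."
        ),
    }


# Illustrative rows using well-typed keys.
rows = [
    {"account_name": "T-Mobile US, Inc.", "nps_score": 72, "nps_bucket": "Promoter"},
    {"account_name": "Acme Corp", "nps_score": 45, "nps_bucket": "Neutral"},
]

summary = summarize(rows)
print(summary)
```

The model then receives three numbers and an instruction rather than a table it has to count and average itself.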

Given the example action output below, a good output mapper would:

Drop unnecessary system fields (Id, LastModifiedDate, SystemModstamp, nested Owner.Id).
Flatten nested objects like Owner and Contract__c.
Convert fields into well-typed keys such as account_name, region, nps_score, delta_qoq, account_owner_email, contract_start_date, etc.
Optionally append display_instructions_for_model to guide the LLM on formatting (e.g., show results in bullet points with account, region, NPS score, and trend delta).

Example Action output data:

[
  {
    "Id": "001xx000003DyzQAA0",
    "Name": "T-Mobile US, Inc.",
    "Region__c": "North America",
    "ARR__c": 1450000,
    "NPS_Score__c": 72,
    "NPS_Bucket__c": "Promoter",
    "Delta_QoQ__c": 0.12,
    "Owner": {
      "Id": "005xx000001AbcFAAS",
      "Name": "Jane Doe",
      "Email": "jane.doe@moveworks.ai"
    },
    "Contract__c": {
      "StartDate": "2024-01-15",
      "EndDate": "2025-01-14",
      "TermMonths": 12
    },
    "LastModifiedDate": "2025-09-15T12:34:56.000Z",
    "SystemModstamp": "2025-09-15T12:34:56.000Z"
  },
  {
    "Id": "001xx000004AbcFAA0",
    "Name": "Acme Corp",
    "Region__c": "North America",
    "ARR__c": 980000,
    "NPS_Score__c": 45,
    "NPS_Bucket__c": "Neutral",
    "Delta_QoQ__c": -0.03,
    "Owner": {
      "Id": "005xx000002XyzFBB0",
      "Name": "John Smith",
      "Email": "john.smith@moveworks.ai"
    },
    "Contract__c": {
      "StartDate": "2023-11-01",
      "EndDate": "2024-10-31",
      "TermMonths": 12
    },
    "LastModifiedDate": "2025-09-10T08:15:00.000Z",
    "SystemModstamp": "2025-09-10T08:15:00.000Z"
  }
]

Example output mapping:

total: response.$LENGTH()
accounts:
  MAP():
    items: response
    converter:
      account_name: item.Name
      region: item.Region__c
      nps_score: item.NPS_Score__c
      nps_bucket: item.NPS_Bucket__c
      delta_qoq: item.Delta_QoQ__c
      account_owner_email: item.Owner.Email
      contract_start_date: item.Contract__c.StartDate
      contract_end_date: item.Contract__c.EndDate
display_instructions_for_model: >
  "Summarize the top 20 accounts in bullet points.
  Include account_name, region, nps_score, and delta_qoq.
  Be concise and avoid unnecessary commentary."
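
For readers who want to see the shape this mapping produces, the sketch below reproduces the same projection in plain Python. It is an illustration only, not how the data mapper is implemented: the function name map_accounts is hypothetical, and the input is the first record from the example action output.

```python
from typing import Any


def map_accounts(response: list[dict[str, Any]]) -> dict[str, Any]:
    """Project and rename fields the same way the example mapping does."""
    return {
        "total": len(response),
        "accounts": [
            {
                "account_name": item["Name"],
                "region": item["Region__c"],
                "nps_score": item["NPS_Score__c"],
                "nps_bucket": item["NPS_Bucket__c"],
                "delta_qoq": item["Delta_QoQ__c"],
                # Nested objects are flattened into top-level keys.
                "account_owner_email": item["Owner"]["Email"],
                "contract_start_date": item["Contract__c"]["StartDate"],
                "contract_end_date": item["Contract__c"]["EndDate"],
            }
            for item in response
        ],
        "display_instructions_for_model": (
            "Summarize the top 20 accounts in bullet points. "
            "Include account_name, region, nps_score, and delta_qoq. "
            "Be concise and avoid unnecessary commentary."
        ),
    }


# First record of the example action output.
response = [
    {
        "Id": "001xx000003DyzQAA0",
        "Name": "T-Mobile US, Inc.",
        "Region__c": "North America",
        "ARR__c": 1450000,
        "NPS_Score__c": 72,
        "NPS_Bucket__c": "Promoter",
        "Delta_QoQ__c": 0.12,
        "Owner": {
            "Id": "005xx000001AbcFAAS",
            "Name": "Jane Doe",
            "Email": "jane.doe@moveworks.ai",
        },
        "Contract__c": {
            "StartDate": "2024-01-15",
            "EndDate": "2025-01-14",
            "TermMonths": 12,
        },
        "LastModifiedDate": "2025-09-15T12:34:56.000Z",
        "SystemModstamp": "2025-09-15T12:34:56.000Z",
    }
]

mapped = map_accounts(response)
print(mapped["accounts"][0])
```

System fields such as Id, LastModifiedDate, and SystemModstamp never reach the output because the projection only copies the keys it names.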

Moveworks Reasoning Engine (plugin responses)

Token Constraints

The reasoning engine operates under a 7,000-token limit: the combined output of any action call, plus any instructions or metadata, must fit within it. If a response exceeds that size, the reasoning engine alone cannot handle it effectively, and Structured Data Analysis (SDA) runs its analysis on the data before the reasoning engine summarizes the plugin output.
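
While developing a plugin, it can help to estimate roughly how close a payload is to this limit. The Python sketch below is a back-of-the-envelope check under a loud assumption: it uses a rule of thumb of about four characters per token, which is not the engine's actual tokenizer, so treat the result as an approximation. The function names are hypothetical.

```python
import json
import math

TOKEN_LIMIT = 7_000    # limit stated in this section
CHARS_PER_TOKEN = 4    # ASSUMPTION: rough rule of thumb, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_reasoning_engine(payload, instructions: str = "") -> bool:
    """Estimate whether action output plus instructions stay under the limit."""
    # Compact separators mirror a payload without pretty-printing whitespace.
    serialized = json.dumps(payload, separators=(",", ":"))
    return estimate_tokens(serialized) + estimate_tokens(instructions) <= TOKEN_LIMIT


small_payload = [{"account_name": "Acme Corp", "nps_score": 45}]
large_payload = small_payload * 5000  # simulate thousands of rows

print(fits_reasoning_engine(small_payload, "Summarize in bullet points."))
print(fits_reasoning_engine(large_payload))
```

Because the estimate is coarse, leave generous headroom rather than designing payloads that sit just under the threshold.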

How the Engine Builds Responses

When generating a response, the reasoning engine takes into account:

Action title (what the action is called)
Action description (what it’s designed to do)
Structured data returned (fields and values mapped in your output)
Additional instructions (e.g., display_instructions_for_model)

It then summarizes or reformats this data in whatever way it deems most useful to the user. The cleaner and more structured your data, the better the summarization and follow-up responses will be.

Designing for Conversation

Because the reasoning engine is tuned for a conversational experience, long, exhaustive outputs are discouraged. Instead:

Aim to return a summary or top K results (usually 5–10 items) by default.
When a list is longer, the output itself will usually include a prompt to the user such as “show more” or “show condensed details”. This lets the user decide whether they want additional items or a different view.
Avoid sending extremely verbose payloads, as they are difficult for users to parse in chat and may push you past the token limit.
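
A top K default can be sketched as follows. This Python snippet is illustrative only: the function name top_k, the has_more flag, and the choice to rank by nps_score in descending order are assumptions, and the sample accounts are made up for the example.

```python
from typing import Any


def top_k(rows: list[dict[str, Any]], k: int = 5, sort_key: str = "nps_score") -> dict[str, Any]:
    """Return the k highest-ranked rows plus enough context to offer more."""
    ranked = sorted(rows, key=lambda r: r[sort_key], reverse=True)
    return {
        "total": len(rows),              # lets the model say "5 of 12"
        "returned": min(k, len(rows)),
        "has_more": len(rows) > k,       # signals that a follow-up is possible
        "results": ranked[:k],
    }


# Made-up sample data for illustration.
scores = [72, 45, 60, 81, 33, 90, 55, 67, 48, 76, 29, 64]
rows = [
    {"account_name": f"Account {i}", "nps_score": score}
    for i, score in enumerate(scores, start=1)
]

page = top_k(rows, k=5)
print([r["nps_score"] for r in page["results"]])
print(page["total"], page["has_more"])
```

Returning the total alongside the truncated list gives the conversation enough context to describe what was left out without shipping every row.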

Handling Large Outputs

If a dataset exceeds the 7,000 token threshold, the reasoning engine will not be able to process it effectively. In these cases, the system automatically falls back to Structured Data Analysis (SDA). SDA is designed to handle very large inputs—potentially thousands of rows—and can run deeper analyses or answer follow-up questions without attempting to display everything inline.

Best Practices

Keep action responses small and focused.
Use display_instructions_for_model to steer how results should be summarized or formatted.
Default to summaries and top K results rather than full datasets.
Design queries so the initial response is immediately useful, with the option to drill down further.
Expect the reasoning engine to prompt users automatically with “show more” or “show condensed details” when longer lists are available.
Let SDA handle cases where the dataset is too large for the reasoning engine to handle.