Connect to other AI Agents & Applications


This cookbook helps you connect your Moveworks Assistant to external AI agents, LLMs, and AI-powered applications. It covers choosing the right integration approach, managing conversation context across turns, and handling asynchronous APIs that require polling. Before diving in, it’s important to understand the trade-offs of different integration approaches so you can choose the right architecture for your use case.

Choosing the Right Integration Approach

Not all integration patterns are created equal. When connecting to external systems, the approach you choose has a significant impact on reliability, controllability, and user experience. Below is a stack-ranked guide from most to least recommended. Before building an agent-to-agent plugin, make sure you understand these trade-offs.

Approach 1: Moveworks Plugins with HTTP Actions (Recommended)

Use Moveworks plugins with HTTP Actions to call external APIs directly.

This is the most robust approach. By wrapping an external API call inside a Moveworks plugin, you retain full control over:

  • Tool selection: The Agentic Reasoning Engine uses your plugin’s name, description, and utterances to decide when to invoke it. You can fine-tune triggering behavior with specific utterances, trigger keywords, and clear descriptions.
  • Input control via slots: You define exactly what information is collected from the user and how it’s validated before being sent to the external system.
  • Response handling: You control how the external system’s output is summarized and presented, so your AI Assistant always has the context it needs to give a great response.
  • Security & governance: Data flows through your configured connectors with enterprise grade authentication and audit trails.

If you’re trying to connect to a Foundation Model like GPT, try our built-in plugin: QuickGPT.

Approach 2: MCP (Model Context Protocol)

MCP allows external tools and data sources to be exposed to an AI agent through a standardized protocol.

While functional, MCP introduces trade-offs compared to native API integration:

  • Loss of tool selection control: MCP exposes a wide surface area of tools to the reasoning engine. Unlike a focused Moveworks plugin with curated utterances and descriptions, MCP tools arrive as a broad catalog. The reasoning engine must choose among many options without the fine-tuned triggering signals that plugins provide.
  • Reduced controllability: You have less ability to shape how inputs are collected (no slot validation, inference policies, or custom data types) and less control over how outputs are presented.
  • Wider surface area: More available tools means more ambiguity for the reasoning engine when deciding which tool to invoke for a given request.

MCP can be appropriate when a vendor only exposes their capabilities through MCP and does not offer a REST API.

Approach 3: Agent-to-Agent Communication (Use with Caution)

Direct agent-to-agent communication, where your Moveworks agent delegates work to another autonomous agent, is the least recommended approach.

The core issue is that the Moveworks reasoning engine has no view into the other agent’s working memory. When it delegates to another agent, it has no visibility into that agent’s internal capabilities, tool inventory, or decision-making logic. It’s sending a request into a black box and hoping for the best.

Think of it this way: imagine you need to ask a colleague for help, but you have no idea what they’re actually capable of. They have dozens of specialized skills and tools, but none of those are explained to you up front. You just send a message and hope they figure out which of their many capabilities to apply. That’s the experience from the reasoning engine’s perspective — it can’t make an informed decision about what to delegate because it doesn’t understand the other agent’s strengths, limitations, or how it will process the request.

This makes reliably selecting the right agent for a given request extremely difficult.

If you must connect agent-to-agent, we outline the recommended approaches below.

Summary

| Approach | Tool Selection | Context Control | Controllability | Recommended For |
| --- | --- | --- | --- | --- |
| API via Moveworks Primitives | Full control via plugin triggers & utterances | Full: slots, validation, inference policies | High: end-to-end | Production use cases |
| MCP | Limited: wide tool surface area | Partial | Moderate | Rapid prototyping, vendor-only MCP |
| Agent-to-Agent | None: remote agent decides | None: no shared working memory | Low: opaque execution | Last resort only |

Architecture Decisions

Context Engineering

There are three ways to manage context, each with its own pros and cons.

| Strategy | Description | Pros | Cons |
| --- | --- | --- | --- |
| Slots | Let the Agentic Reasoning Engine decide what conversation history to provide your model. | Can intelligently combine context with your org-specific knowledge (e.g. via the Search plugin). | Context is lossy. The reasoning engine won’t provide ALL the detail for your external API to use. |
| API-Managed Threads | If the external API keeps track of the thread for you (e.g., returns a thread_id that you pass back on subsequent calls), generate a thread_id and collect it as a slot. | All of your conversation context is preserved between turns; the external API manages the full thread history on its side. | Limited availability across AI vendors. More complex setup. |
| Custom Database | Store user & system messages in a custom database. | Full control over the context engineering approach. | Increases the number of systems touching your personal data. Databases will need to be secured. |

Example 1: Reasoning Engine Context via Slots (Simplest)

This approach lets the Agentic Reasoning Engine manage conversation context for you. The reasoning engine tracks the conversation history and decides what context to pass to your external API on each turn. This is the fastest way to get started: no thread tracking or database is needed.

For the easiest implementation, we recommend the following high-level approach.

Step 1: Create a Conversation Process with an action activity

Create a Conversation Process with an action activity for your agent’s API. This is the core of your plugin — it defines the flow that calls the external API and returns the response. Start by creating the process and adding an action activity that points to the HTTP Action you’ll configure in the next step.

Step 2: Set up an HTTP action

Set up an HTTP action to call the external agent’s API. Here’s an example using the Anthropic API:

```shell
curl https://api.anthropic.com/v1/messages \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H 'anthropic-version: 2023-06-01' \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "{{user_query}}" }
    ]
  }'
```
Step 3: Create two slots

Create two slots to capture the user’s query and conversation context.

| Slot Name | Data Type | Slot Description |
| --- | --- | --- |
| query | string | The query a user has input to you |
| conversation_context | object | Capture the immediate conversational context by recording the last user message and the last bot response. This object should NEVER be requested from the user; it should be populated automatically based on the conversation history to maintain relevance and continuity for subsequent turns. Properties: last_user_message (string): the literal text of the last relevant message the user sent. Make it exact, do not summarize. last_bot_message (string): the literal text of the last relevant message you sent. Focus on the content replied with, not progress updates. Make it exact, do not summarize. |
Step 4: Map the slots to the action activity

Map the slots to the action activity in your conversation process. Pass the slots into the API call using DSL:

```yaml
user_query: |
  $CONCAT([
    "'UserInput:'", data.query,
    "'PreviousBotMessage:'", $TEXT(data.conversation_context)
  ])
```
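For intuition, here is a rough Python equivalent of what the $CONCAT expression produces, with `json.dumps` standing in for the DSL’s $TEXT serialization:

```python
import json

def build_user_query(query, conversation_context):
    # Mirror the $CONCAT DSL: prefix the new query with the prior turn so a
    # stateless external API still sees the immediate context.
    return "".join([
        "'UserInput:'", query,
        "'PreviousBotMessage:'", json.dumps(conversation_context),
    ])
```

The resulting single string is what gets sent as the external API’s user message.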
Step 5: Add a content activity

Add a content activity to help the AI assistant select your plugin on subsequent turns.

Step 6: Choose an invocation phrase

Choose an invocation phrase for your LLM. Here we are using “Hey Claude”.

Example 2: API-Managed Threads

Some external APIs keep track of the conversation thread for you — you send a thread_id with each request and the API maintains the full message history on its side. Examples include OpenAI’s Assistants API, where the API stores all messages in a thread and you simply reference the thread ID on subsequent calls.

The key design pattern is to make the thread_id slot optional so that it sends null on the first turn (when no thread exists yet) and carries the returned thread ID forward on subsequent turns.
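The turn-by-turn mechanics of this pattern can be sketched as follows; `send` is a hypothetical transport standing in for the HTTP Action, and the fake transport below only exists to demonstrate the flow:

```python
def chat_turn(send, message, thread_id=None):
    """One turn against a thread-managed API. thread_id is None on the
    first turn; the API's returned thread_id is carried forward after."""
    response = send({"thread_id": thread_id, "message": message})
    return response["reply"], response["thread_id"]

# Demo transport: allocates a thread when none is supplied, reuses it otherwise.
def fake_send(payload):
    tid = payload["thread_id"] or "thr_001"
    return {"reply": "ack: " + payload["message"], "thread_id": tid}
```

The first call passes no thread_id (the null first turn); every later call passes the ID returned previously, which is exactly what the slot inference does for you in the Conversation Process.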

Slot Configuration

Create a slot for the thread ID with the following configuration:

| Slot Name | Data Type | Inference Policy | Slot Description |
| --- | --- | --- | --- |
| thread_id | string | Always Infer | NEVER ask the user for this value. This is the thread_id returned by the external API from a previous turn in this conversation. If no thread_id exists in the conversation context, set this to null. This value is used to maintain conversation continuity with the external system. |

The inference policy set to “Always Infer” means:

  • First turn: No thread ID exists in context, so the reasoning engine infers null.
  • Subsequent turns: The thread ID was returned in the previous response and exists in context, so the reasoning engine infers it automatically.

Conversation Process Implementation

This can be handled directly in a Conversation Process; no compound action or switch is needed. Because the thread_id slot is empty (falsy) on the first turn and populated afterward, you can use DSL to pass it to the external API conditionally.

HTTP Action

Set up a single HTTP Action that accepts both the thread_id and the user message. The API should always return a thread_id in the response so it can be carried forward.

If the external API doesn’t automatically generate a new thread when thread_id is null, add a preliminary action step to create a new thread first, then pass the resulting ID to the main API call.

```shell
curl https://api.example.com/v1/chat \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "thread_id": "{{{thread_id}}}",
    "message": "{{{user_query}}}"
  }'
```
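If the API requires an explicit thread-creation call, the fallback logic reduces to a small sketch; both callables are hypothetical wrappers around the vendor’s endpoints:

```python
def chat_with_explicit_threads(create_thread, post_message, message, thread_id=None):
    # When the API won't create a thread implicitly for a null thread_id,
    # create one first, then post the message into it.
    if not thread_id:
        thread_id = create_thread()
    return post_message(thread_id, message), thread_id
```

On the first turn the creation endpoint runs once; on later turns the carried-forward ID short-circuits it.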

Conversation Process Setup

In your Conversation Process, wire the slots directly to the action activity. The input mapping uses DSL — since thread_id is falsy on the first turn, you can pass it as-is:

```yaml
user_query: data.query
thread_id: data.thread_id
```

The action activity returns the response to the user. The thread_id from the API response is now part of the conversation context, so on the next turn the reasoning engine will automatically infer it into the slot.

Important: Make sure the thread_id is visible in the response output shown to the conversation. This is what allows the reasoning engine to pick it up as context on the next turn and infer it into the slot automatically. If the API returns it but it’s not surfaced in the process output, the reasoning engine won’t have it available to infer.

Example 3: Build Your Own Thread Store (ServiceNow Table or Custom Database)

Many external agents and LLMs don’t offer an API that keeps track of the thread for you, which means every API call is stateless — the external system has no memory of prior turns. You can solve this by creating your own thread tracking mechanism using a ServiceNow table (or any database accessible via API).

Step 1: Create the ServiceNow Table

In your ServiceNow instance, navigate to System Definition > Tables and create a new custom table. A recommended setup:

| Column Name | Type | Max Length | Description |
| --- | --- | --- | --- |
| u_thread_id | String | 64 | Unique identifier for the conversation thread. Auto-populated via business rule (see below). |
| u_user_id | String | 128 | The email or sys_id of the user who initiated the conversation. Used for lookups on subsequent turns. |
| u_external_session_id | String | 256 | Optional. If the external API returns its own session or conversation ID, store it here for correlation. |
| u_conversation_history | String | 65000 | A JSON string storing the array of message pairs (user + assistant). Set the max length high to accommodate multi-turn conversations. |
| u_created_at | Date/Time | | Timestamp of when the thread was created. Useful for cleanup and TTL policies. |
| u_updated_at | Date/Time | | Timestamp of the last update. Useful for identifying stale threads. |

Table name example: u_agent_thread_log | Label: Agent Thread Log

Set the u_conversation_history column to a max length of 65000 (the ServiceNow string max) or use a multi-line text field. For very long conversations, consider a strategy to trim older messages and keep only the most recent N turns.
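A trimming strategy like the one suggested might look like this sketch; the turn and length limits are illustrative:

```python
import json

def trim_history(history_json, max_turns=10, max_chars=65000):
    """Keep only the most recent turns and stay under the column's max length.
    Assumes history is a JSON array of {"role", "content"} messages."""
    messages = json.loads(history_json)
    messages = messages[-max_turns * 2:]      # a "turn" = user + assistant pair
    out = json.dumps(messages)
    while len(out) > max_chars and messages:
        messages = messages[2:]               # drop the oldest pair
        out = json.dumps(messages)
    return out
```

Run this before each PATCH so the stored history never exceeds the column limit.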

Step 2: Add a Business Rule for Auto-generating Thread IDs

Create a Before Insert business rule on u_agent_thread_log to automatically generate a unique u_thread_id when a new record is created. This way, your compound action only needs to POST the u_user_id and the first message — the thread ID is generated server-side.

```javascript
// Business Rule: Generate Thread ID
// Table: u_agent_thread_log
// When: Before Insert
(function executeRule(current, previous) {
    current.u_thread_id = gs.generateGUID();
    current.u_updated_at = new GlideDateTime();
})(current, previous);
```
Step 3: Create the ServiceNow REST APIs

You need three operations, which you can accomplish via the standard Table API or a Scripted REST API:

Option A: Use the standard Table API

| Operation | Method | Endpoint | Purpose |
| --- | --- | --- | --- |
| Create thread | POST | /api/now/table/u_agent_thread_log | Create a new record with u_user_id and initial u_conversation_history |
| Get thread | GET | /api/now/table/u_agent_thread_log?sysparm_query=u_thread_id={thread_id} | Retrieve conversation history for an existing thread |
| Update thread | PATCH | /api/now/table/u_agent_thread_log/{sys_id} | Append the latest message pair to u_conversation_history |
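If you drive the Table API from scripts or tests, the three operations reduce to request builders like these; the instance URL is a placeholder:

```python
import json

BASE = "https://YOUR_INSTANCE.service-now.com"  # placeholder instance URL

def create_thread_request(user_id, first_message):
    # POST that creates a new thread record via the Table API.
    return ("POST", BASE + "/api/now/table/u_agent_thread_log",
            {"u_user_id": user_id,
             "u_conversation_history": json.dumps(
                 [{"role": "user", "content": first_message}])})

def get_thread_request(thread_id):
    # GET that looks up a thread by u_thread_id.
    return ("GET", BASE + "/api/now/table/u_agent_thread_log"
            "?sysparm_query=u_thread_id=" + thread_id, None)

def update_thread_request(sys_id, history_json):
    # PATCH that writes the appended history back to the record.
    return ("PATCH", BASE + "/api/now/table/u_agent_thread_log/" + sys_id,
            {"u_conversation_history": history_json})
```

Each tuple (method, URL, body) maps one-to-one onto an HTTP Action in Agent Studio.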

Option B: Create a Scripted REST API for cleaner endpoints and built-in logic (e.g., auto-trimming old messages, validating JSON structure). This is recommended if you want to encapsulate the history-append logic server-side rather than in your compound action.

Step 4: Build HTTP Actions in Agent Studio

Create three HTTP Actions in Agent Studio, one for each operation:

  1. Create_Thread_Action: POST to create a new record. Send the user’s first message as the initial u_conversation_history value (e.g., [{"role": "user", "content": "..."}]). Returns the sys_id and u_thread_id.
  2. Get_Thread_Action: GET to retrieve the conversation history by u_thread_id. Returns the u_conversation_history JSON string.
  3. Update_Thread_Action: PATCH to update the record with the latest user message and assistant response appended to the history.
Step 5: Wire It Together in Your Compound Action

How the flow works:

  1. On the first turn, the compound action calls Create_Thread_Action, then calls the external LLM API with the user’s message, then calls Update_Thread_Action to store both the user message and the LLM response. The u_thread_id is returned to the reasoning engine.
  2. On subsequent turns, the reasoning engine passes the u_thread_id (collected as a slot with inference policy set to auto-infer). The compound action calls Get_Thread_Action to retrieve history, constructs the full message array, calls the external LLM API, then calls Update_Thread_Action to append the new exchange.
  3. Collect the thread_id as a slot with an inference policy set to automatically infer from context — the reasoning engine will carry it forward across turns without asking the user.
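The flow above can be sketched in plain Python, with the three HTTP actions and the LLM call stubbed out as callables (all names hypothetical):

```python
import json

def run_turn(actions, llm, user_message, thread_id=None):
    """One turn of the thread-store flow. `actions` bundles the three HTTP
    actions; `llm` calls the external model with a message array."""
    if thread_id is None:
        # First turn: create the thread seeded with the user's message.
        record = actions["create"](user_message)
        thread_id, sys_id = record["u_thread_id"], record["sys_id"]
        history = [{"role": "user", "content": user_message}]
    else:
        # Subsequent turn: rebuild the full message array from stored history.
        record = actions["get"](thread_id)
        sys_id = record["sys_id"]
        history = json.loads(record["u_conversation_history"])
        history.append({"role": "user", "content": user_message})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    actions["update"](sys_id, json.dumps(history))  # persist the new exchange
    return reply, thread_id
```

The returned thread_id is what the reasoning engine carries forward into the slot on the next turn.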
Step 6: Add Housekeeping

Consider adding a Scheduled Job in ServiceNow to clean up stale threads (e.g., delete records where u_updated_at is older than 24 hours). This prevents the table from growing indefinitely and avoids surfacing outdated context.
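The selection logic such a cleanup job needs is simple; a sketch with the field names from the table above and a hypothetical 24-hour TTL:

```python
from datetime import datetime, timedelta, timezone

def stale_thread_ids(threads, ttl_hours=24, now=None):
    # Return the sys_ids of threads not updated within the TTL --
    # the set a scheduled job would delete.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=ttl_hours)
    return [t["sys_id"] for t in threads if t["u_updated_at"] < cutoff]
```

In ServiceNow itself you would express the same cutoff as an encoded query on u_updated_at inside a Scheduled Job.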

This approach gives you full context continuity with any stateless API, and the conversation history lives in a system you control.

Handling Asynchronous APIs

Some external agents and APIs don’t return results immediately. Instead, they accept a request, return a job or task ID, and require you to poll for the result. You can handle this pattern in Agent Studio using a compound action with chained action steps and delay_config to space out polling attempts.

The pattern: Submit, wait, and poll with stacking intervals

Rather than polling aggressively (which wastes API calls and may hit rate limits) or waiting too long (which degrades user experience), use a stacking wait strategy that starts short and gets progressively longer:

  1. Submit the request — Call the external API to kick off the async job. Capture the job_id or task_id from the response.
  2. Wait 15 seconds, then poll — Use delay_config on the next action step to pause, then call the status endpoint.
  3. If not ready, wait 1 minute, then poll again — Use a switch to check the status. If still processing, hit a second polling step with a longer delay.
  4. If still not ready, wait 5 minutes, then poll a final time — A last attempt with a longer window for slow-running jobs.
  5. Return the result — If the job completes at any polling step, return the result.
```yaml
# Example: Async API polling with stacking wait times
steps:
  # Step 1: Submit the async request
  - action:
      action_name: Submit_Async_Job_Action
      output_key: job_submission
      input_args:
        prompt: data.user_query

  # Step 2: Wait 15 seconds, then poll
  - action:
      action_name: Poll_Job_Status_Action
      output_key: poll_1
      delay_config:
        seconds: "15"
      input_args:
        job_id: data.job_submission.job_id

  # Step 3: Check result - if done, return; otherwise keep polling
  - switch:
      cases:
        - condition: data.poll_1.status == "completed"
          steps:
            - return:
                output_mapper:
                  result: data.poll_1.result

        - condition: data.poll_1.status != "completed"
          steps:
            # Step 4: Wait 1 minute, then poll again
            - action:
                action_name: Poll_Job_Status_Action
                output_key: poll_2
                delay_config:
                  minutes: "1"
                input_args:
                  job_id: data.job_submission.job_id

            - switch:
                cases:
                  - condition: data.poll_2.status == "completed"
                    steps:
                      - return:
                          output_mapper:
                            result: data.poll_2.result

                  - condition: data.poll_2.status != "completed"
                    steps:
                      # Step 5: Wait 5 minutes, final poll
                      - action:
                          action_name: Poll_Job_Status_Action
                          output_key: poll_3
                          delay_config:
                            minutes: "5"
                          input_args:
                            job_id: data.job_submission.job_id

                      - switch:
                          cases:
                            - condition: data.poll_3.status == "completed"
                              steps:
                                - return:
                                    output_mapper:
                                      result: data.poll_3.result
                            - condition: data.poll_3.status != "completed"
                              steps:
                                - return:
                                    output_mapper:
                                      result: '''The request is still processing. There may be an error'''
```

Adjust the polling intervals based on the expected response time of your external system. For APIs that typically respond in under a minute, you might use 5s -> 15s -> 1m. For long-running jobs, consider 1m -> 5m -> 15m. Set the final poll interval to the upper bound of the system you are connecting to.
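Outside Agent Studio, the same submit-then-poll pattern with stacking waits looks like this in plain Python; the `submit` and `get_status` wrappers are hypothetical:

```python
import time

def poll_with_backoff(submit, get_status, waits=(15, 60, 300), sleep=time.sleep):
    """Submit once, then poll with stacking wait times, mirroring the
    compound action above. Returns the result, or None if still processing
    after the final poll."""
    job_id = submit()
    for wait in waits:
        sleep(wait)                      # 15s, then 1m, then 5m by default
        status = get_status(job_id)
        if status["status"] == "completed":
            return status["result"]
    return None
```

Injecting `sleep` keeps the function testable without real delays.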

Token Consumption and Cost

LLM providers charge based on the number of tokens processed (both input prompt and output generation). Long conversations or large documents can become expensive quickly.

Best Practices:

  1. Set Limits: Always use the max_tokens parameter in your API calls to cap the length of the response and prevent unexpectedly large (and expensive) outputs.
  2. Be Concise: Encourage users and design system prompts to be as concise as possible.
  3. Monitor Usage: Regularly check your API usage and cost dashboards on the LLM provider’s platform.
  4. Choose the Right Model: For simpler tasks, consider using smaller, faster, and cheaper models instead of the most powerful (and most expensive) ones.
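To reason about the first and fourth practices, a rough cost estimator helps; per-1k-token prices are parameters because they vary by provider and model, and the values in the test are hypothetical:

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    # Rough spend estimate: most providers bill input and output tokens
    # at different per-1k rates. Check your vendor's pricing page for rates.
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k
```

Capping max_tokens bounds the output term of this formula, which is typically the more expensive one.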

Data Security & Privacy

Standard public LLM APIs may use your prompt data to train their models. Sending Personally Identifiable Information (PII) or sensitive company data is a significant risk.

Best Practices:

  • Consult Your Security Team: Always review the data privacy and terms of service for any LLM provider.
  • Prefer Enterprise Offerings: Whenever possible, use enterprise-grade services like Azure OpenAI or an OpenAI Enterprise agreement, which typically guarantee that your data will not be used for model training.
  • Anonymize Data: If you must send potentially sensitive information, build steps in your workflow to find and replace sensitive data with placeholders before sending it to the LLM. You can use our LLM Actions to do this.
  • Educate users: Inform users about what data is being sent to a third-party service and advise them against submitting sensitive information. You can do this through a Content Activity & enabling the Activity Confirmation Policy on your API call.
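The “Anonymize Data” practice above can be sketched as a simple pre-processing step; the regexes are illustrative, not exhaustive, and a production redactor would cover many more PII categories:

```python
import re

# Illustrative patterns only -- real PII detection needs broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    # Replace obvious PII with labeled placeholders before the text
    # is sent to an external LLM.
    for label, pattern in PATTERNS.items():
        text = pattern.sub("[" + label + "]", text)
    return text
```

Run this in a script step before the HTTP Action that calls the external model.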

Plugin Selection

Triggering reliability can vary with the use case and the breadth of subject matter your positive utterances cover. Below are some options to optimize your LLM plugins:

  1. Define Diverse But Specific Utterances: In your plugin’s trigger configuration, provide a wide range of example phrases. For a summarization plugin, this could include:
    • “summarize this document”
    • “give me the tl;dr”
    • “what are the key points of this?”
    • “can you create an executive summary?”
  2. Define a trigger keyword: Assign a deterministic triggering phrase to your plugin so that users can trigger the plugin on command — this will help ensure the agent is always called.
  3. Use a System Prompt: Instead of relying on the user to frame their entire request, use the system message (or an equivalent field) in your API request body. This pre-prompts the LLM with its role or instructions (e.g., “You are an expert at rewriting text to be more professional”). The user then only needs to provide the core input, making the interaction much smoother.
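As a sketch of the third option, a request builder that keeps role instructions in the system field; the top-level `system` parameter follows Anthropic’s Messages API, while other vendors accept a system-role message in the messages array instead:

```python
def build_request(user_input, model="claude-3-5-sonnet-20241022"):
    # Pre-prompt the model with its role so the user only supplies the
    # core input. The instruction text here is just an example.
    return {
        "model": model,
        "max_tokens": 1024,
        "system": "You are an expert at rewriting text to be more professional.",
        "messages": [{"role": "user", "content": user_input}],
    }
```

With the role fixed server-side, a user utterance like “make this sound professional: …” needs no extra framing.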

Check out our demo!