Connect to other AI Agents & Applications
This cookbook helps you connect your Moveworks Assistant to external AI agents, LLMs, and AI-powered applications. It covers choosing the right integration approach, managing conversation context across turns, and handling asynchronous APIs that require polling. Before diving in, it’s important to understand the trade-offs of different integration approaches so you can choose the right architecture for your use case.
Choosing the Right Integration Approach
Not all integration patterns are created equal. When connecting to external systems, the approach you choose has a significant impact on reliability, controllability, and user experience. Below is a stack-ranked guide from most to least recommended. Before building an agent-to-agent plugin, make sure you understand these trade-offs.
Approach 1: API Integration via Moveworks Primitives (Recommended)
Use Moveworks plugins with HTTP Actions to call external APIs directly.
This is the most robust approach. By wrapping an external API call inside a Moveworks plugin, you retain full control over:
- Tool selection: The Agentic Reasoning Engine uses your plugin’s name, description, and utterances to decide when to invoke it. You can fine-tune triggering behavior with specific utterances, trigger keywords, and clear descriptions.
- Input control via slots: You define exactly what information is collected from the user and how it’s validated before being sent to the external system.
- Response handling: Your assistant receives the full API response, so it has the context it needs to compose an accurate, well-grounded reply.
- Security & governance: Data flows through your configured connectors with enterprise-grade authentication and audit trails.
If you’re trying to connect to a Foundation Model like GPT, try our built-in plugin: QuickGPT.
Approach 2: MCP (Model Context Protocol)
MCP allows external tools and data sources to be exposed to an AI agent through a standardized protocol.
While functional, MCP introduces trade-offs compared to native API integration:
- Loss of tool selection control: MCP exposes a wide surface area of tools to the reasoning engine. Unlike a focused Moveworks plugin with curated utterances and descriptions, MCP tools arrive as a broad catalog. The reasoning engine must choose among many options without the fine-tuned triggering signals that plugins provide.
- Reduced controllability: You have less ability to shape how inputs are collected (no slot validation, inference policies, or custom data types) and less control over how outputs are presented.
- Wider surface area: More available tools means more ambiguity for the reasoning engine when deciding which tool to invoke for a given request.
MCP can be appropriate when a vendor only exposes their capabilities through MCP and does not offer a REST API.
Approach 3: Agent-to-Agent Communication (Use with Caution)
Direct agent-to-agent communication, where your Moveworks agent delegates work to another autonomous agent, is the least recommended approach.
The core issue is that our reasoning engine has no context of the other agent’s working memory. When Moveworks’ reasoning engine delegates to another agent, it has no visibility into that agent’s internal capabilities, tool inventory, or decision making logic. It’s sending a request into a black box and hoping for the best.
Think of it this way: imagine you need to ask a colleague for help, but you have no idea what they’re actually capable of. They have dozens of specialized skills and tools, but none of those are explained to you up front. You just send a message and hope they figure out which of their many capabilities to apply. That’s the experience from the reasoning engine’s perspective — it can’t make an informed decision about what to delegate because it doesn’t understand the other agent’s strengths, limitations, or how it will process the request.
This makes it extremely difficult for the reasoning engine to reliably select that agent’s plugin at the right time.
If you must connect agent-to-agent, we outline the recommended approaches below.
Summary
Architecture Decisions
Context Engineering
There are three ways you can manage context, each with its own pros and cons.
Example 1: Reasoning Engine Context via Slots (Simplest)
This approach lets the Agentic Reasoning Engine manage conversation context for you. The reasoning engine tracks the conversation history and decides what context to pass to your external API on each turn. This is the fastest way to get started: no thread tracking or database needed.
For the easiest implementation, we recommend the following high-level approach.
Create a Conversation Process with an action activity
Create a Conversation Process with an action activity for your agent’s API. This is the core of your plugin — it defines the flow that calls the external API and returns the response. Start by creating the process and adding an action activity that points to the HTTP Action you’ll configure in the next step.

Set up an HTTP action
Set up an HTTP action to call the external agent’s API. Here’s an example using the Anthropic API:
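As a hedged sketch of what that HTTP Action sends, the request to Anthropic’s Messages API can be assembled like this in Python (the model ID is a placeholder; substitute one your account has access to, and map the slot values into `user_message` and `history`):

```python
def build_anthropic_request(user_message, history, api_key, max_tokens=1024):
    """Assemble the HTTP request for Anthropic's Messages API.
    `history` is a list of {"role": ..., "content": ...} dicts for prior
    turns that the reasoning engine supplies via slots."""
    return {
        "url": "https://api.anthropic.com/v1/messages",
        "headers": {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        "json": {
            "model": "claude-sonnet",   # placeholder: use a real model ID
            "max_tokens": max_tokens,   # cap output length (and cost)
            "messages": history + [{"role": "user", "content": user_message}],
        },
    }

# To actually send it (requires the `requests` package):
# import requests
# req = build_anthropic_request("Summarize this doc", [], "sk-ant-...")
# resp = requests.post(req["url"], headers=req["headers"], json=req["json"])
# reply = resp.json()["content"][0]["text"]
```

The same shape applies to other providers; only the URL, headers, and body schema change.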
Map the slots to the action activity
Map the slots to the action activity in your conversation process. Pass the slots into the API call using DSL:
Example 2: API-Managed Thread with an Optional Thread ID Slot (recommended if available)
Some external APIs keep track of the conversation thread for you — you send a thread_id with each request and the API maintains the full message history on its side. One example is OpenAI’s Assistants API, which stores all messages in a thread so you simply reference the thread ID on subsequent calls.
The key design pattern is to make the thread_id slot optional so that it sends null on the first turn (when no thread exists yet) and carries the returned thread ID forward on subsequent turns.
Slot Configuration
Create a slot for the thread ID with the following configuration:
The inference policy set to “Always Infer” means:
- First turn: No thread ID exists in context, so the reasoning engine infers null.
- Subsequent turns: The thread ID was returned in the previous response and exists in context, so the reasoning engine infers it automatically.
Conversation Process Implementation
This can be handled directly in a Conversation Process — no compound action or switch needed. The thread_id slot is truthy/falsy, so you can use DSL to conditionally pass it to the external API.
HTTP Action
Set up a single HTTP Action that accepts both the thread_id and the user message. The API should always return a thread_id in the response so it can be carried forward.
If the external API doesn’t automatically generate a new thread when thread_id is null, add an action step or logic in your compound action to create a new thread first, then pass the resulting ID to the main API call.
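To make the first-turn branch concrete, here is a minimal sketch in Python (the endpoint paths are hypothetical placeholders; your vendor’s actual routes will differ):

```python
def build_turn_request(base_url, user_message, thread_id=None):
    """Route the turn based on whether a thread already exists.
    Endpoint paths are hypothetical stand-ins for the vendor's API."""
    if thread_id is None:
        # First turn: no thread yet, so hit the thread-creation endpoint.
        # The response is expected to include the new thread_id.
        return {"method": "POST", "url": f"{base_url}/threads",
                "json": {"message": user_message}}
    # Subsequent turns: reference the existing thread; the API replays
    # the stored history on its side.
    return {"method": "POST", "url": f"{base_url}/threads/{thread_id}/messages",
            "json": {"message": user_message}}
```

In Agent Studio this branch is expressed with DSL over the optional slot rather than Python, but the routing logic is the same.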
Conversation Process Setup
In your Conversation Process, wire the slots directly to the action activity. The input mapping uses DSL — since thread_id is falsy on the first turn, you can pass it as-is:
The action activity returns the response to the user. The thread_id from the API response is now part of the conversation context, so on the next turn the reasoning engine will automatically infer it into the slot.
Important: Make sure the thread_id is visible in the response output shown to the conversation. This is what allows the reasoning engine to pick it up as context on the next turn and infer it into the slot automatically. If the API returns it but it’s not surfaced in the process output, the reasoning engine won’t have it available to infer.
Example 3: Build Your Own Thread Store (ServiceNow Table or Custom Database)
Many external agents and LLMs don’t offer an API that keeps track of the thread for you, which means every API call is stateless — the external system has no memory of prior turns. You can solve this by creating your own thread tracking mechanism using a ServiceNow table (or any database accessible via API).
Create the ServiceNow Table
In your ServiceNow instance, navigate to System Definition > Tables and create a new custom table. A recommended setup:
Table name example: u_agent_thread_log | Label: Agent Thread Log. Suggested columns: u_thread_id (string), u_user_id (string), and u_conversation_history (string, storing the message history as JSON).
Set the u_conversation_history column to a max length of 65000 (the ServiceNow string max) or use a multi-line text field. For very long conversations, consider a strategy to trim older messages and keep only the most recent N turns.
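One simple trimming strategy, sketched in Python (the character cap and turn count here are illustrative defaults, not requirements):

```python
import json

def trim_history(history_json, max_turns=20, max_chars=65000):
    """Keep only the most recent messages so the serialized history
    fits within the column's length limit."""
    messages = json.loads(history_json)
    trimmed = messages[-max_turns:]
    # If the last N turns are still too long, drop oldest until it fits.
    while len(json.dumps(trimmed)) > max_chars and len(trimmed) > 1:
        trimmed = trimmed[1:]
    return json.dumps(trimmed)
```

Run this before each PATCH so the stored history never exceeds the field limit.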
Add a Business Rule for Auto-generating Thread IDs
Create a Before Insert business rule on u_agent_thread_log to automatically generate a unique u_thread_id when a new record is created. This way, your compound action only needs to POST the u_user_id and the first message — the thread ID is generated server-side.
Create the ServiceNow REST APIs
You need three operations, which you can accomplish via the standard Table API or a Scripted REST API:
Option A: Use the standard Table API
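As a sketch of the three Table API calls (Python; the instance name, credentials handling, and field names are assumptions based on the table above; each function returns the request shape you would send with your HTTP client):

```python
TABLE_URL = "https://{instance}.service-now.com/api/now/table/u_agent_thread_log"

def create_thread(instance, user_id, first_message_json):
    """POST a new record; the business rule assigns u_thread_id server-side."""
    return {"method": "POST", "url": TABLE_URL.format(instance=instance),
            "json": {"u_user_id": user_id,
                     "u_conversation_history": first_message_json}}

def get_thread(instance, thread_id):
    """GET the record matching the thread id via an encoded query."""
    return {"method": "GET", "url": TABLE_URL.format(instance=instance),
            "params": {"sysparm_query": f"u_thread_id={thread_id}",
                       "sysparm_limit": "1"}}

def update_thread(instance, sys_id, history_json):
    """PATCH the record with the appended conversation history."""
    return {"method": "PATCH",
            "url": TABLE_URL.format(instance=instance) + f"/{sys_id}",
            "json": {"u_conversation_history": history_json}}
```

These map one-to-one onto the three HTTP Actions described in the next step.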
Option B: Create a Scripted REST API for cleaner endpoints and built-in logic (e.g., auto-trimming old messages, validating JSON structure). This is recommended if you want to encapsulate the history-append logic server-side rather than in your compound action.
Build HTTP Actions in Agent Studio
Create three HTTP Actions in Agent Studio, one for each operation:
- Create_Thread_Action — POST to create a new record. Send the user’s first message as the initial u_conversation_history value (e.g., [{"role": "user", "content": "..."}]). Returns the sys_id and u_thread_id.
- Get_Thread_Action — GET to retrieve the conversation history by u_thread_id. Returns the u_conversation_history JSON string.
- Update_Thread_Action — PATCH to update the record with the latest user message and assistant response appended to the history.
Wire It Together in Your Compound Action
How the flow works:
- On the first turn, the compound action calls Create_Thread_Action, then calls the external LLM API with the user’s message, then calls Update_Thread_Action to store both the user message and the LLM response. The u_thread_id is returned to the reasoning engine.
- On subsequent turns, the reasoning engine passes the u_thread_id (collected as a slot with inference policy set to auto-infer). The compound action calls Get_Thread_Action to retrieve history, constructs the full message array, calls the external LLM API, then calls Update_Thread_Action to append the new exchange.
- Collect the thread_id as a slot with an inference policy set to automatically infer from context — the reasoning engine will carry it forward across turns without asking the user.
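The per-turn orchestration the compound action performs can be sketched as follows (Python; the four callables stand in for the three HTTP Actions plus the external LLM call, so this is a structural sketch rather than runnable Agent Studio configuration):

```python
def run_turn(user_message, thread_id, actions):
    """One conversational turn against a stateless LLM API.
    `actions` maps names to callables standing in for the HTTP Actions:
      create_thread(msg) -> thread_id
      get_thread(tid)    -> history list
      call_llm(history)  -> reply text
      update_thread(tid, history) -> persists the history"""
    if thread_id is None:
        # First turn: create the thread record and seed the history.
        history = [{"role": "user", "content": user_message}]
        thread_id = actions["create_thread"](user_message)
    else:
        # Subsequent turns: fetch stored history, append the new message.
        history = actions["get_thread"](thread_id)
        history.append({"role": "user", "content": user_message})
    reply = actions["call_llm"](history)
    history.append({"role": "assistant", "content": reply})
    actions["update_thread"](thread_id, history)  # persist the exchange
    return thread_id, reply
```

The returned thread_id must be surfaced in the process output so the reasoning engine can infer it into the slot on the next turn.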
This approach gives you full context continuity with any stateless API, and the conversation history lives in a system you control.
Handling Asynchronous APIs
Some external agents and APIs don’t return results immediately. Instead, they accept a request, return a job or task ID, and require you to poll for the result. You can handle this pattern in Agent Studio using a compound action with chained action steps and delay_config to space out polling attempts.
The pattern: Submit, wait, and poll with stacking intervals
Rather than polling aggressively (which wastes API calls and may hit rate limits) or waiting too long (which degrades user experience), use a stacking wait strategy that starts short and gets progressively longer:
- Submit the request — Call the external API to kick off the async job. Capture the job_id or task_id from the response.
- Wait 15 seconds, then poll — Use delay_config on the next action step to pause, then call the status endpoint.
- If not ready, wait 1 minute, then poll again — Use a switch to check the status. If still processing, hit a second polling step with a longer delay.
- If still not ready, wait 5 minutes, then poll a final time — A last attempt with a longer window for slow-running jobs.
- Return the result — If the job completes at any polling step, return the result.
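The steps above can be sketched in Python (here `submit` and `get_status` stand in for your HTTP Actions, and `sleep` is injectable so the intervals can be tuned or tested; the status field names are assumptions about the external API):

```python
import time

def submit_and_poll(submit, get_status, waits=(15, 60, 300), sleep=time.sleep):
    """Submit an async job, then poll with stacking intervals.
    Returns the job output, or raises if it never completes."""
    job_id = submit()  # kick off the job; capture job_id/task_id
    for wait in waits:
        sleep(wait)  # delay_config equivalent: pause before polling
        status = get_status(job_id)
        if status.get("status") == "complete":
            return status["output"]
    raise TimeoutError(f"Job {job_id} still not ready after {len(waits)} polls")
```

Swap the `waits` tuple for the intervals that match your system’s typical response time.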
Adjust the polling intervals based on the expected response time of your external system. For APIs that typically respond in under a minute, you might use 5s -> 15s -> 1m. For long-running jobs, consider 1m -> 5m -> 15m. Set the last poll for the upper bound of the system you are connecting to.
Token Consumption and Cost
LLM providers charge based on the number of tokens processed (both input prompt and output generation). Long conversations or large documents can become expensive quickly.
Best Practices:
- Set Limits: Always use the max_tokens parameter in your API calls to cap the length of the response and prevent unexpectedly large (and expensive) outputs.
- Be Concise: Encourage users and design system prompts to be as concise as possible.
- Monitor Usage: Regularly check your API usage and cost dashboards on the LLM provider’s platform.
- Choose the Right Model: For simpler tasks, consider using smaller, faster, and cheaper models instead of the most powerful (and most expensive) ones.
Data Security & Privacy
Standard public LLM APIs may use your prompt data to train their models. Sending Personally Identifiable Information (PII) or sensitive company data is a significant risk.
Best Practices:
- Consult Your Security Team: Always review the data privacy and terms of service for any LLM provider.
- Prefer Enterprise Offerings: Whenever possible, use enterprise-grade services like Azure OpenAI or an OpenAI Enterprise agreement, which typically guarantee that your data will not be used for model training.
- Anonymize Data: If you must send potentially sensitive information, build steps in your workflow to find and replace sensitive data with placeholders before sending it to the LLM. You can use our LLM Actions to do this.
- Educate users: Inform users about what data is being sent to a third-party service and advise them against submitting sensitive information. You can do this through a Content Activity & enabling the Activity Confirmation Policy on your API call.
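A minimal anonymization pass might look like this in Python (the patterns are illustrative, not exhaustive; production PII detection needs much broader coverage, or an LLM Action as noted above):

```python
import re

# Illustrative patterns only; extend for phone numbers, names, IDs, etc.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace matches with labeled placeholders before sending to the LLM."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Run this on user input before the HTTP Action that calls the external LLM.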
Plugin Selection
Triggering reliability can vary depending on the use case and the breadth of subject matter your positive utterances cover. Below are some options to optimize your LLM plugins:
- Define Diverse But Specific Utterances: In your plugin’s trigger configuration, provide a wide range of example phrases. For a summarization plugin, this could include:
- “summarize this document”
- “give me the tl;dr”
- “what are the key points of this?”
- “can you create an executive summary”
- Define a trigger keyword: Assign a deterministic triggering phrase to your plugin so that users can trigger the plugin on command — this will help ensure the agent is always called.
- Use a System Prompt: Instead of relying on the user to frame their entire request, use the system message (or an equivalent field) in your API request body. This pre-prompts the LLM with its role or instructions (e.g., “You are an expert at rewriting text to be more professional”). The user then only needs to provide the core input, making the interaction much smoother.
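For example, with Anthropic’s API the system prompt is a top-level field (OpenAI-style chat APIs use a {"role": "system"} message instead); a sketch of the request body, with a placeholder model ID:

```python
def build_rewrite_request(user_input):
    """Request body with the plugin's role baked into the system prompt,
    so the user only supplies the text to transform."""
    return {
        "model": "claude-sonnet",  # placeholder: use a real model ID
        "max_tokens": 1024,
        "system": "You are an expert at rewriting text to be more professional.",
        "messages": [{"role": "user", "content": user_input}],
    }
```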

