---
title: Connect to other AI Agents & Applications
position: 3
deprecated: false
hidden: false
---

This cookbook helps you connect your Moveworks Assistant to external AI agents, LLMs, and AI-powered applications. It covers choosing the right integration approach, managing conversation context across turns, and handling asynchronous APIs that require polling. Before diving in, it's important to understand the trade-offs of different integration approaches so you can choose the right architecture for your use case.

# Choosing the Right Integration Approach

Not all integration patterns are created equal. When connecting to external systems, the approach you choose has a significant impact on **reliability**, **controllability**, and **user experience**. Below is a stack-ranked guide from most to least recommended. Before building an agent-to-agent plugin, make sure you understand these trade-offs.

## Approach 1: API Integration via Moveworks Primitives (Recommended)

**Use Moveworks [plugins](/agent-studio/core-concepts/assistants-agents-plugins#/) with [HTTP Actions](/agent-studio/actions/http-actions) to call external APIs directly.**

This is the most robust approach. By wrapping an external API call inside a Moveworks plugin, you retain full control over:

* **Tool selection**: The [Agentic Reasoning Engine](/agent-studio/agentic-ai/the-agentic-reasoning-engine) uses your plugin's name, description, and utterances to decide when to invoke it. You can fine-tune triggering behavior with specific utterances, trigger keywords, and clear descriptions.
* **Input control via [slots](/agent-studio/conversation-process/slots)**: You define exactly what information is collected from the user and how it's validated before being sent to the external system.
* **Response handling**: Your AI Assistant receives the full API response, giving it the context it needs to compose a useful reply every time.
* **Security & governance**: Data flows through your configured [connectors](/agent-studio/connectors) with enterprise-grade authentication and audit trails.

<Callout intent="info">
  If you're trying to connect to a Foundation Model like GPT, try our built-in plugin: [QuickGPT](https://www.moveworks.com/us/en/platform/quick-gpt).
</Callout>

## Approach 2: MCP (Model Context Protocol)

MCP allows external tools and data sources to be exposed to an AI agent through a standardized protocol.

While functional, MCP introduces trade-offs compared to native API integration:

* **Loss of tool selection control**: MCP exposes a wide surface area of tools to the reasoning engine. Unlike a focused Moveworks plugin with curated utterances and descriptions, MCP tools arrive as a broad catalog. The reasoning engine must choose among many options without the fine-tuned triggering signals that plugins provide.
* **Reduced controllability**: You have less ability to shape how inputs are collected (no slot validation, inference policies, or custom data types) and less control over how outputs are presented.
* **Wider surface area**: More available tools means more ambiguity for the reasoning engine when deciding which tool to invoke for a given request.

MCP can be appropriate when a vendor only exposes their capabilities through MCP and does not offer a REST API.

## Approach 3: Agent-to-Agent Communication (Use with Caution)

Direct agent-to-agent communication, where your Moveworks agent delegates work to another autonomous agent, is the least recommended approach.

The core issue is that **our reasoning engine has no context of the other agent's working memory**. When Moveworks' reasoning engine delegates to another agent, it has no visibility into that agent's internal capabilities, tool inventory, or decision-making logic. It's sending a request into a black box and hoping for the best.

<Callout intent="tip">
  Think of it this way: imagine you need to ask a colleague for help, but you have no idea what they're actually capable of. They have dozens of specialized skills and tools, but none of those are explained to you up front. You just send a message and hope they figure out which of their many capabilities to apply. That's the experience from the reasoning engine's perspective -- it can't make an informed decision about what to delegate because it doesn't understand the other agent's strengths, limitations, or how it will process the request.
</Callout>

This lack of visibility makes it extremely difficult for the reasoning engine to reliably select the right agent or plugin for a given request.

If you must connect agent to agent, we outline the recommended approaches below.

## Summary

| Approach                         | Tool Selection                                | Context Control                             | Controllability       | Recommended For                    |
| :------------------------------- | :-------------------------------------------- | :------------------------------------------ | :-------------------- | :--------------------------------- |
| **API via Moveworks Primitives** | Full control via plugin triggers & utterances | Full: slots, validation, inference policies | High: end-to-end      | Production use cases               |
| **MCP**                          | Limited: wide tool surface area               | Partial                                     | Moderate              | Rapid prototyping, vendor-only MCP |
| **Agent-to-Agent**               | None: remote agent decides                    | None: no shared working memory              | Low: opaque execution | Last resort only                   |

***

# Architecture Decisions

## Context Engineering

There are three ways to manage context, each with its own pros and cons.

| Strategy            | Description                                                                                                                                                                  | Pros                                                                                                                              | Cons                                                                                          |
| :------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------- |
| Slots               | Let the Agentic Reasoning Engine decide what conversation history to provide your model.                                                                                     | Can intelligently combine context with your org-specific knowledge (e.g. via the Search plugin).                                  | Context is lossy. Reasoning engine won't provide ALL the detail for your external API to use. |
| API-Managed Threads | If the external API keeps track of the thread for you (e.g., returns a `thread_id` that you pass back on subsequent calls), generate a `thread_id` and collect it as a slot. | All of your conversation context will be preserved between turns -- the external API manages the full thread history on its side. | Limited availability across AI vendors. More complex setup.                                   |
| Custom Database     | Store user & system messages in a custom database                                                                                                                            | Full control over the context engineering approach.                                                                               | Increases the # of systems touching your personal data. Databases will need to be secured.    |

### Example 1: Reasoning Engine Context via Slots (Simplest)

This approach lets the [Agentic Reasoning Engine](/agent-studio/agentic-ai/the-agentic-reasoning-engine) manage conversation context for you. The reasoning engine tracks the conversation history and decides what context to pass to your external API on each turn. This is the fastest way to get started -- no thread tracking or database needed.

Your plugin will look something like this:

```mermaid
sequenceDiagram
    participant User
    participant AI_Assistant as AI Assistant
    participant Claude_Plugin as Claude Plugin
    participant Anthropic_API as Anthropic API
    participant Search_Plugin as Search Plugin

    %% First Turn
    User->>AI_Assistant: "Get competitive intel on enterprise chat platforms"
    AI_Assistant->>AI_Assistant: Select Claude_Plugin
    AI_Assistant->>Claude_Plugin: Send prompt
    Claude_Plugin->>Anthropic_API: Call Anthropic API
    Anthropic_API-->>Claude_Plugin: Return competitive summary
    Claude_Plugin-->>AI_Assistant: Return Claude's output
    AI_Assistant-->>User: Deliver competitive intel summary

    %% Follow-up Turn
    User->>AI_Assistant: "Now contrast that with our native chat capabilities"
    AI_Assistant->>Search_Plugin: Retrieve product docs, internal feature specs, etc.
    Search_Plugin-->>AI_Assistant: Return relevant internal context
    AI_Assistant->>AI_Assistant: Combine Claude's competitive intel with org context
    AI_Assistant-->>User: Share analysis
```

For the easiest implementation, we recommend the following high-level approach.

<Steps>
  ### Create a Conversation Process with an action activity

  Create a [Conversation Process](/agent-studio/conversation-process#/) with an action activity for your agent's API. This is the core of your plugin -- it defines the flow that calls the external API and returns the response. Start by creating the process and adding an action activity that points to the [HTTP Action](/agent-studio/actions/http-actions) you'll configure in the next step.

  ![](https://files.readme.io/168841f3f179eff4f2384eae02a97d5f05961bac53bc3d07c8ab1c56e0f98679-CleanShot_2025-10-05_at_08.05.282x.png)

  ### Set up an HTTP action

  Set up an [HTTP action](/agent-studio/actions/http-actions) to call the external agent's API. Here's an example using the Anthropic API:

  ```bash
  curl https://api.anthropic.com/v1/messages \
    -X POST \
    -H 'Content-Type: application/json' \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H 'anthropic-version: 2023-06-01' \
    -d '{
      "model": "claude-3-5-sonnet-20241022",
      "max_tokens": 1024,
      "messages": [
        { "role": "user", "content": "{{user_query}}" }
      ]
    }'
  ```

  ### Create two slots

  Create two slots to capture the user's query and conversation context.

  | Slot Name              | Data Type | Slot Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
  | :--------------------- | :-------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
  | `query`                | string    | The query a user has input to you                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
  | `conversation_context` | object    | Capture the immediate conversational context by recording the last user message and the last bot response. This object should NEVER be requested from the user; it should be populated automatically based on the conversation history to maintain relevance and continuity for subsequent turns. **Properties:** `last_user_message` (string) -- the literal message of the last relevant message the user sent. Make it exact, do not summarize. `last_bot_message` (string) -- the literal message of the last relevant message you sent. Focus on the content replied with, not progress updates. Make it exact, do not summarize. |

  ### Map the slots to the action activity

  Map the slots to the action activity in your conversation process. Pass the slots into the API call using DSL:

  ```yaml
  user_query: |
      $CONCAT([
          "'UserInput:'",data.query,
          "'PreviousBotMessage:'",$TEXT(data.conversation_context)
      ])
  ```

  ### Add a content activity

  Add a content activity to help the AI assistant select your plugin on subsequent turns.

  ![](https://files.readme.io/b466d94f156552b061ee309690d542fc287431df63e0779c80fd58acdb971e4c-CleanShot_2025-10-05_at_08.07.252x.png)

  ### Choose an invocation phrase

  Choose an invocation phrase for your LLM. Here we are using "Hey Claude":

  ![](https://files.readme.io/d9ece8680a569e9b714e36cc4165c841676988218f67c4967de8efb1d5316c18-CleanShot_2025-10-05_at_07.52.582x.png)
</Steps>

### Example 2: API-Managed Thread with an Optional Thread ID Slot (Recommended if Available)

Some external APIs keep track of the conversation thread for you -- you send a `thread_id` with each request and the API maintains the full message history on its side. A good example is OpenAI's Assistants API, where the API stores all messages in a thread and you simply reference the thread ID on subsequent calls.

The key design pattern is to make the `thread_id` slot **optional** so that it sends `null` on the first turn (when no thread exists yet) and carries the returned thread ID forward on subsequent turns.

#### Slot Configuration

Create a slot for the thread ID with the following configuration:

| Slot Name   | Data Type | Inference Policy | Slot Description                                                                                                                                                                                                                                                                          |
| :---------- | :-------- | :--------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `thread_id` | string    | Always Infer     | NEVER ask the user for this value. This is the thread\_id returned by the external API from a previous turn in this conversation. If no thread\_id exists in the conversation context, set this to null. This value is used to maintain conversation continuity with the external system. |

Setting the inference policy to **"Always Infer"** means:

* **First turn**: No thread ID exists in context, so the reasoning engine infers `null`.
* **Subsequent turns**: The thread ID was returned in the previous response and exists in context, so the reasoning engine infers it automatically.

#### Conversation Process Implementation

This can be handled directly in a [Conversation Process](/agent-studio/conversation-process#/) -- no compound action or `switch` needed in the simple case. Because the `thread_id` slot is either populated or `null`, you can pass it through as-is or use DSL to branch on it.

#### HTTP Action

Set up a single [HTTP Action](/agent-studio/actions/http-actions) that accepts both the `thread_id` and the user message. The API should always return a `thread_id` in the response so it can be carried forward.

If the external API doesn't automatically generate a new thread when `thread_id` is null, add an action step or logic in your compound action to create a new thread first, then pass the resulting ID to the main API call.

```bash
curl https://api.example.com/v1/chat \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "thread_id": "{{{thread_id}}}",
    "message": "{{{user_query}}}"
  }'
```
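
If you do need the conditional thread-creation step described above, a minimal compound action sketch might look like the following. The action names (`Create_Thread_Action`, `Chat_Action`), response fields, and condition syntax are hypothetical -- adapt them to your actual actions:

```yaml
# Sketch: create a thread only when none exists yet
# (action names, field names, and condition syntax are illustrative)
steps:
  - switch:
      cases:
        # First turn: no thread yet, so create one before chatting
        - condition: data.thread_id == null
          steps:
            - action:
                action_name: Create_Thread_Action
                output_key: new_thread
            - action:
                action_name: Chat_Action
                output_key: chat_response
                input_args:
                  thread_id: data.new_thread.thread_id
                  message: data.user_query
        # Subsequent turns: reuse the existing thread
        - condition: data.thread_id != null
          steps:
            - action:
                action_name: Chat_Action
                output_key: chat_response
                input_args:
                  thread_id: data.thread_id
                  message: data.user_query
  # Surface the thread_id so the reasoning engine can infer it next turn
  - return:
      output_mapper:
        thread_id: data.chat_response.thread_id
        response: data.chat_response.response
```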

#### Conversation Process Setup

In your [Conversation Process](/agent-studio/conversation-process#/), wire the slots directly to the action activity. The input mapping uses DSL -- since `thread_id` is `null` on the first turn, you can pass it as-is:

```yaml
user_query: data.query
thread_id: data.thread_id
```

The action activity returns the response to the user. The `thread_id` from the API response is now part of the conversation context, so on the next turn the reasoning engine will automatically infer it into the slot.

<Callout intent="info">
  **Important:** Make sure the `thread_id` is visible in the response output shown to the conversation. This is what allows the reasoning engine to pick it up as context on the next turn and infer it into the slot automatically. If the API returns it but it's not surfaced in the process output, the reasoning engine won't have it available to infer.
</Callout>

```mermaid
sequenceDiagram
    participant User
    participant AI_Assistant as AI Assistant
    participant Plugin as LLM Plugin
    participant External_API as External LLM API

    %% First Turn (thread_id = null)
    User->>AI_Assistant: "What were our top products last quarter?"
    AI_Assistant->>Plugin: query + thread_id=null
    Plugin->>External_API: POST /chat (thread_id=null, creates new thread)
    External_API-->>Plugin: { thread_id: "thread_abc123", response: "..." }
    Plugin-->>AI_Assistant: { thread_id: "thread_abc123", response: "..." }
    AI_Assistant-->>User: "Your top products last quarter were..."

    %% Second Turn (thread_id inferred from context)
    User->>AI_Assistant: "How does that compare to the previous quarter?"
    AI_Assistant->>Plugin: query + thread_id="thread_abc123"
    Plugin->>External_API: POST /chat (thread_id="thread_abc123")
    External_API-->>Plugin: { thread_id: "thread_abc123", response: "..." }
    Plugin-->>AI_Assistant: { thread_id: "thread_abc123", response: "..." }
    AI_Assistant-->>User: "Compared to the previous quarter..."
```

### Example 3: Build Your Own Thread Store (ServiceNow Table or Custom Database)

Many external agents and LLMs don't offer an API that keeps track of the thread for you, which means every API call is stateless -- the external system has no memory of prior turns. You can solve this by creating your own thread tracking mechanism using a ServiceNow table (or any database accessible via API).

<Steps>
  ### Create the ServiceNow Table

  In your ServiceNow instance, navigate to **System Definition > Tables** and create a new custom table. A recommended setup:

  | Column Name              | Type      | Max Length | Description                                                                                                                           |
  | :----------------------- | :-------- | :--------- | :------------------------------------------------------------------------------------------------------------------------------------ |
  | `u_thread_id`            | String    | 64         | Unique identifier for the conversation thread. Auto-populated via business rule (see below).                                          |
  | `u_user_id`              | String    | 128        | The email or sys\_id of the user who initiated the conversation. Used for lookups on subsequent turns.                                |
  | `u_external_session_id`  | String    | 256        | Optional. If the external API returns its own session or conversation ID, store it here for correlation.                              |
  | `u_conversation_history` | String    | 65000      | A JSON string storing the array of message pairs (user + assistant). Set the max length high to accommodate multi-turn conversations. |
  | `u_created_at`           | Date/Time | --         | Timestamp of when the thread was created. Useful for cleanup and TTL policies.                                                        |
  | `u_updated_at`           | Date/Time | --         | Timestamp of the last update. Useful for identifying stale threads.                                                                   |

  **Table name example:** `u_agent_thread_log` | **Label:** `Agent Thread Log`

  <Callout intent="info">
    Set the `u_conversation_history` column to a max length of 65000 (the ServiceNow string max) or use a multi-line text field. For very long conversations, consider a strategy to trim older messages and keep only the most recent N turns.
  </Callout>

  ### Add a Business Rule for Auto-generating Thread IDs

  Create a **Before Insert** business rule on `u_agent_thread_log` to automatically generate a unique `u_thread_id` when a new record is created. This way, your compound action only needs to POST the `u_user_id` and the first message -- the thread ID is generated server-side.

  ```javascript
  // Business Rule: Generate Thread ID
  // Table: u_agent_thread_log
  // When: Before Insert
  (function executeRule(current, previous) {
      current.u_thread_id = gs.generateGUID();
      current.u_updated_at = new GlideDateTime();
  })(current, previous);
  ```

  ### Create the ServiceNow REST APIs

  You need three operations, which you can accomplish via the standard **Table API** or a **Scripted REST API**:

  **Option A: Use the standard Table API**

  | Operation     | Method | Endpoint                                                                  | Purpose                                                                   |
  | :------------ | :----- | :------------------------------------------------------------------------ | :------------------------------------------------------------------------ |
  | Create thread | POST   | `/api/now/table/u_agent_thread_log`                                       | Create a new record with `u_user_id` and initial `u_conversation_history` |
  | Get thread    | GET    | `/api/now/table/u_agent_thread_log?sysparm_query=u_thread_id={thread_id}` | Retrieve conversation history for an existing thread                      |
  | Update thread | PATCH  | `/api/now/table/u_agent_thread_log/{sys_id}`                              | Append the latest message pair to `u_conversation_history`                |
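
  For example, appending the latest exchange via the Table API might look like the sketch below. Note that the Table API replaces the field value outright, so your compound action should GET the existing history, append the new message pair, and PATCH the full string back (instance URL, `sys_id`, and payload are illustrative):

  ```bash
  # Sketch: overwrite u_conversation_history with the updated JSON string
  # (instance URL, sys_id, and payload values are illustrative)
  curl "https://YOUR_INSTANCE.service-now.com/api/now/table/u_agent_thread_log/{sys_id}" \
    -X PATCH \
    -H 'Content-Type: application/json' \
    -H "Authorization: Basic $SN_CREDENTIALS" \
    -d '{
      "u_conversation_history": "[{\"role\": \"user\", \"content\": \"...\"}, {\"role\": \"assistant\", \"content\": \"...\"}]"
    }'
  ```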

  **Option B: Create a Scripted REST API** for cleaner endpoints and built-in logic (e.g., auto-trimming old messages, validating JSON structure). This is recommended if you want to encapsulate the history-append logic server-side rather than in your compound action.

  ### Build HTTP Actions in Agent Studio

  Create three [HTTP Actions](/agent-studio/actions/http-actions) in Agent Studio, one for each operation:

  1. **Create\_Thread\_Action** -- `POST` to create a new record. Send the user's first message as the initial `u_conversation_history` value (e.g., `[{"role": "user", "content": "..."}]`). Returns the `sys_id` and `u_thread_id`.
  2. **Get\_Thread\_Action** -- `GET` to retrieve the conversation history by `u_thread_id`. Returns the `u_conversation_history` JSON string.
  3. **Update\_Thread\_Action** -- `PATCH` to update the record with the latest user message and assistant response appended to the history.

  ### Wire It Together in Your Compound Action

  **How the flow works** (a condensed sketch follows this list):

  1. **On the first turn**, the compound action calls **Create\_Thread\_Action**, then calls the external LLM API with the user's message, then calls **Update\_Thread\_Action** to store both the user message and the LLM response. The `u_thread_id` is returned to the reasoning engine.
  2. **On subsequent turns**, the reasoning engine passes the `u_thread_id` (collected as a slot with inference policy set to auto-infer). The compound action calls **Get\_Thread\_Action** to retrieve history, constructs the full message array, calls the external LLM API, then calls **Update\_Thread\_Action** to append the new exchange.
  3. **Collect the `thread_id` as a slot** with an inference policy set to automatically infer from context -- the reasoning engine will carry it forward across turns without asking the user.
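
  A condensed compound action sketch of this flow, assuming the three HTTP actions above plus a hypothetical `Call_LLM_Action` that wraps the external LLM API (condition syntax and all field names are illustrative):

  ```yaml
  # Sketch of the thread-store flow; action and field names are hypothetical
  steps:
    - switch:
        cases:
          # First turn: create the thread, call the LLM, store the exchange
          - condition: data.thread_id == null
            steps:
              - action:
                  action_name: Create_Thread_Action
                  output_key: new_thread
                  input_args:
                    first_message: data.user_query
              - action:
                  action_name: Call_LLM_Action
                  output_key: llm_result
                  input_args:
                    message: data.user_query
              - action:
                  action_name: Update_Thread_Action
                  output_key: update_result
                  input_args:
                    sys_id: data.new_thread.sys_id
                    user_message: data.user_query
                    assistant_message: data.llm_result.response
              - return:
                  output_mapper:
                    thread_id: data.new_thread.u_thread_id
                    response: data.llm_result.response

          # Subsequent turns: load history, call the LLM with it, append
          - condition: data.thread_id != null
            steps:
              - action:
                  action_name: Get_Thread_Action
                  output_key: thread
                  input_args:
                    thread_id: data.thread_id
              - action:
                  action_name: Call_LLM_Action
                  output_key: llm_result
                  input_args:
                    history: data.thread.u_conversation_history
                    message: data.user_query
              - action:
                  action_name: Update_Thread_Action
                  output_key: update_result
                  input_args:
                    sys_id: data.thread.sys_id
                    user_message: data.user_query
                    assistant_message: data.llm_result.response
              - return:
                  output_mapper:
                    thread_id: data.thread_id
                    response: data.llm_result.response
  ```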

  ### Add Housekeeping

  Consider adding a **Scheduled Job** in ServiceNow to clean up stale threads (e.g., delete records where `u_updated_at` is older than 24 hours). This prevents the table from growing indefinitely and avoids surfacing outdated context.
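
  A minimal sketch of such a scheduled job script (table and column names match the setup above; the 24-hour window is an example):

  ```javascript
  // Scheduled Job: delete thread records not updated in the last 24 hours
  var gr = new GlideRecord('u_agent_thread_log');
  gr.addQuery('u_updated_at', '<', gs.hoursAgo(24));
  gr.deleteMultiple();
  ```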
</Steps>

```mermaid
sequenceDiagram
    participant User
    participant AI_Assistant as AI Assistant
    participant Plugin as LLM Plugin
    participant SN_Table as ServiceNow Table
    participant External_API as External LLM API

    %% First Turn
    User->>AI_Assistant: "Summarize our Q3 revenue trends"
    AI_Assistant->>Plugin: Send prompt
    Plugin->>SN_Table: POST - Create new thread record
    SN_Table-->>Plugin: Return thread_id
    Plugin->>External_API: Call API with user message
    External_API-->>Plugin: Return response
    Plugin->>SN_Table: PATCH - Store message pair in conversation_history
    Plugin-->>AI_Assistant: Return response + thread_id

    %% Second Turn
    User->>AI_Assistant: "Now compare that to Q2"
    AI_Assistant->>Plugin: Send prompt + thread_id
    Plugin->>SN_Table: GET - Retrieve conversation_history by thread_id
    SN_Table-->>Plugin: Return prior messages
    Plugin->>External_API: Call API with full history + new message
    External_API-->>Plugin: Return response
    Plugin->>SN_Table: PATCH - Append latest exchange
    Plugin-->>AI_Assistant: Return response
```

This approach gives you full context continuity with any stateless API, and the conversation history lives in a system you control.

## Handling Asynchronous APIs

Some external agents and APIs don't return results immediately. Instead, they accept a request, return a job or task ID, and require you to poll for the result. You can handle this pattern in Agent Studio using a [compound action](/agent-studio/actions/compound-actions) with chained [action steps](/agent-studio/actions/compound-actions/action) and [`delay_config`](/agent-studio/actions/compound-actions/action#/syntax-reference) to space out polling attempts.

**The pattern: Submit, wait, and poll with stacking intervals**

Rather than polling aggressively (which wastes API calls and may hit rate limits) or waiting too long (which degrades user experience), use a stacking wait strategy that starts short and gets progressively longer:

1. **Submit the request** -- Call the external API to kick off the async job. Capture the `job_id` or `task_id` from the response.
2. **Wait 15 seconds, then poll** -- Use `delay_config` on the next action step to pause, then call the status endpoint.
3. **If not ready, wait 1 minute, then poll again** -- Use a `switch` to check the status. If still processing, hit a second polling step with a longer delay.
4. **If still not ready, wait 5 minutes, then poll a final time** -- A last attempt with a longer window for slow-running jobs.
5. **Return the result** -- If the job completes at any polling step, return the result.

```yaml
# Example: Async API polling with stacking wait times
steps:
  # Step 1: Submit the async request
  - action:
      action_name: Submit_Async_Job_Action
      output_key: job_submission
      input_args:
        prompt: data.user_query

  # Step 2: Wait 15 seconds, then poll
  - action:
      action_name: Poll_Job_Status_Action
      output_key: poll_1
      delay_config:
        seconds: "15"
      input_args:
        job_id: data.job_submission.job_id

  # Step 3: Check result - if done, return; otherwise keep polling
  - switch:
      cases:
        - condition: data.poll_1.status == "completed"
          steps:
            - return:
                output_mapper:
                  result: data.poll_1.result

        - condition: data.poll_1.status != "completed"
          steps:
            # Step 4: Wait 1 minute, then poll again
            - action:
                action_name: Poll_Job_Status_Action
                output_key: poll_2
                delay_config:
                  minutes: "1"
                input_args:
                  job_id: data.job_submission.job_id

            - switch:
                cases:
                  - condition: data.poll_2.status == "completed"
                    steps:
                      - return:
                          output_mapper:
                            result: data.poll_2.result

                  - condition: data.poll_2.status != "completed"
                    steps:
                      # Step 5: Wait 5 minutes, final poll
                      - action:
                          action_name: Poll_Job_Status_Action
                          output_key: poll_3
                          delay_config:
                            minutes: "5"
                          input_args:
                            job_id: data.job_submission.job_id

                      - switch:
                          cases:
                            - condition: data.poll_3.status == "completed"
                              steps:
                                - return:
                                    output_mapper:
                                      result: data.poll_3.result
                            - condition: data.poll_3.status != "completed"
                              steps:
                                - return:
                                    output_mapper:
                                      result: '''The request is still processing; it may have encountered an error.'''
```
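
With the intervals above, the compound action waits at most about 6 minutes and 15 seconds across the three delays (15s + 1m + 5m) before returning the fallback message.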

```mermaid
sequenceDiagram
    participant User
    participant AI_Assistant as AI Assistant
    participant Plugin as Async Plugin (CA)
    participant External_API as External API

    User->>AI_Assistant: "Run the quarterly forecast model"
    AI_Assistant->>Plugin: Execute compound action
    Plugin->>External_API: POST /jobs (submit request)
    External_API-->>Plugin: { job_id: "abc123", status: "processing" }

    Note over Plugin: Wait 15 seconds (delay_config)
    Plugin->>External_API: GET /jobs/abc123 (poll #1)
    External_API-->>Plugin: { status: "processing" }

    Note over Plugin: Wait 1 minute (delay_config)
    Plugin->>External_API: GET /jobs/abc123 (poll #2)
    External_API-->>Plugin: { status: "processing" }

    Note over Plugin: Wait 5 minutes (delay_config)
    Plugin->>External_API: GET /jobs/abc123 (poll #3)
    External_API-->>Plugin: { status: "completed", result: { ... } }

    Plugin-->>AI_Assistant: Return forecast results
    AI_Assistant-->>User: "Here are your Q3 forecast results..."
```

<Callout intent="info">
  Adjust the polling intervals based on the expected response time of your external system. For APIs that typically respond in under a minute, you might use 5s -> 15s -> 1m. For long-running jobs, consider 1m -> 5m -> 15m. Set the last poll for the upper bound of the system you are connecting to.
</Callout>

## Token Consumption and Cost

LLM providers charge based on the number of tokens processed (both input prompt and output generation). Long conversations or large documents can become expensive quickly.

**Best Practices:**

1. **Set Limits**: Always use the `max_tokens` parameter in your API calls to cap the length of the response and prevent unexpectedly large (and expensive) outputs.
2. **Be Concise**: Encourage users and design system prompts to be as concise as possible.
3. **Monitor Usage**: Regularly check your API usage and cost dashboards on the LLM provider's platform.
4. **Choose the Right Model**: For simpler tasks, consider using smaller, faster, and cheaper models instead of the most powerful (and most expensive) ones.

## Data Security & Privacy

Standard public LLM APIs may use your prompt data to train their models. Sending Personally Identifiable Information (PII) or sensitive company data is a significant risk.

**Best Practices:**

* **Consult Your Security Team**: Always review the data privacy and terms of service for any LLM provider.
* **Prefer Enterprise Offerings**: Whenever possible, use enterprise-grade services like Azure OpenAI or an OpenAI Enterprise agreement, which typically guarantee that your data will not be used for model training.
* **Anonymize Data**: If you must send potentially sensitive information, build steps in your workflow to find and replace sensitive data with placeholders before sending it to the LLM. You can use [our LLM Actions](/agent-studio/actions/llm-actions#/) to do this.
* **Educate Users**: Inform users about what data is being sent to a third-party service and advise them against submitting sensitive information. You can do this through a **Content Activity** and by enabling the **Activity Confirmation Policy** on your API call.

## Plugin Selection

Triggering reliability can vary with the use case and the breadth of subject matter covered by your positive utterances. Below are some options for optimizing your LLM plugins:

1. **Define Diverse But Specific Utterances**: In your plugin's trigger configuration, provide a wide range of example phrases. For a summarization plugin, this could include:
   * "summarize this document"
   * "give me the tl;dr"
   * "what are the key points of this?"
   * "can you create an executive summary"
2. **Define a trigger keyword**: Assign a deterministic triggering phrase to your plugin so that users can trigger the plugin on command -- this will help ensure the agent is always called.
3. **Use a System Prompt**: Instead of relying on the user to frame their entire request, use the `system` message (or an equivalent field) in your API request body. This pre-prompts the LLM with its role or instructions (e.g., "You are an expert at rewriting text to be more professional"). The user then only needs to provide the core input, making the interaction much smoother.
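
For example, extending the earlier Anthropic call with a `system` field (the prompt text is illustrative):

```bash
curl https://api.anthropic.com/v1/messages \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H 'anthropic-version: 2023-06-01' \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "system": "You are an expert at rewriting text to be more professional.",
    "messages": [
      { "role": "user", "content": "{{user_query}}" }
    ]
  }'
```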

# Check out our demo!

<iframe src="https://player.vimeo.com/video/1125946343?h=aa563f08f9&badge=0&autopause=0&player_id=0&app_id=58479" frameBorder="0" allowFullScreen />