Guardrails

Overview

In this article, you will learn about the safeguards implemented by Moveworks in the Moveworks AI Assistant to enable users to get help for a wide variety of use cases in a natural, conversational experience, while ensuring that the response are safe and appropriate for work settings.

Guiding Principles

The design of the Moveworks Assistant follows these key principles in how it serves user requests:

Ensure that toxic, inappropriate or offensive content does not enter the system.
Be cautious with sensitive and highly subjective topics.
Respond to users with relevant content from their organization’s resources.
Focus on primarily serving work-related requests.
As far as possible, use available plugins as tools to serve user requests. Do not respond to the user without attempting to use plugins first.
Give users visibility into the content used to generate the responses.

Input Safeguards

When an utterance is received by the Moveworks Assistant, it is processed with a sequence of safeguards and enhancements to enable it to be addressed appropriately. The following layers are arranged in the order in which they apply to a user request when they interact with the Moveworks Assistant.

Toxicity Filter

Purpose: Block inappropriate content and requests from entering the system.

How it works: Using machine learning, all incoming requests are analyzed for potential toxic or non-work appropriate content. Moveworks supplements GPT’s own toxicity check with a fine-tuned large language model (FLAN-T5) that assesses appropriateness for work environments, and uses a policy that guides the Moveworks Assistant to not engage with the user if such a request is detected. In cases where the toxicity filter is triggered, the user will receive a message similar to I'm unable to assist with that request, and will not receive an acknowledgement of the issue.

Examples: Language that is hateful, abusive, derogatory or offensive.

Sensitive Topics Policy

Purpose: Instruct the Moveworks Assistant to not engage the user on topics that are sensitive and may be considered highly subjective from one person to another. The goal is to set expectations with users that the Moveworks Assistant is not the right medium to discuss these topics.

How it works: Using policies which are applied to all requests, Moveworks instructs the GPT-4 model to analyze the theme of the request and decline to discuss potentially sensitive topics such as those related to medical, psychological, physiological, emotional, financial or political matters. These are treated differently from content that is considered toxic, and the user may receive an acknowledgement and polite refusal to respond.

Product and User Context

Purpose: Instruct the Moveworks Assistant on how to analyze and assess user requests from the perspective of a helpful, knowledgeable enterprise assistant.

How it works: Using prompts, Moveworks instructs the GPT-4 model on key aspects of enterprise support, including:

Invoke the Moveworks Assistant’s role and persona of a helpful enterprise assistant.
Instruction to always use provided functions or plugins to resolve requests, and not to respond without attempting to use a plugin
User information including name, location and preferred language
The current date and time

Organization-specific Grounding

Purpose: Guide the behavior of the Moveworks Assistant by providing the list of plugins or tools available in the environment, and their selected knowledge, which varies from customer to customer, and may also vary between individual users.

How it works: Using Moveworks’ plugin architecture, with retrieval augmented generation, we provide the Moveworks Assistant with the available and relevant plugins to resolve the user’s request:

The available list of relevant plugins in the organization’s bot setup, including custom plugins created with Agent Studio (see the list of standard plugins in the appendix).
Descriptions, arguments and outputs from each plugin
Instructions on how to use the list of provided plugins and when to direct the user to help options outside the bot
The grounding of the response in the organization’s knowledge and resources is also enforced in this manner.

Work-related Scope

Purpose: To constrain the scope of the Moveworks Assistant to that of an enterprise assistant. This is done to ensure that the Moveworks Assistant is working with relevant and work-appropriate content and resources.

How it works: If a request is detected to be about clearly non-work related matters, the Moveworks Assistant informs the user that it is unable to answer the query and emphasizes that it is designed to help with work-related questions.

Optional configurable instructions

Purpose: For some organization-specific scenarios, Moveworks can add optional configurable instructions to guide the Moveworks Assistant on now to respond to certain user requests. These are not deterministic overrides, but are intended to influence the response while remaining compliant with overarching policies.

How they work: These instructions are appended to the prompt sent to the LLM and can provide a measure of customization for some steps such as plugin selection. As an example, it’s possible to guide the reasoner to select the Forms plugin for queries related to moving an office and equipment if we see that the current behavior is that Forms and Knowledge plugins may both be selected at different times.

Output Safeguards

Similarly, the output generated by the reasoning engine is also processed to make sure that it is safe. This is primarily done via the toxicity filter.

Toxicity Filter

Similar to the toxicity checks on the input side, we also check the generated output to ensure that it is appropriate for work environments. This provides an additional layer of protection against malicious attempts to bypass controls on the input side.

Citations

Purpose: While the Assistant provides a summarized response in chat, it shows all citations - indicated by a superscript number (e.g., (1) - from the source. Citations allow users to read the source article and verify the truthfulness of the Assistant response.

How they work: Citations can be a knowledge article snippet, a person's profile card, an office map, or response from a Agent Studio Plugin. Users can access the citations by clicking on the ℹ️ icon at the bottom of each Assistant response.

Linked Verified Entities

Purpose: To provide validation of Assistant's summarized response against existing knowledge resources - what we call “grounding” - ensuring users can trust the information they receive from Assistant.

How they work: We verify entities mentioned in user messages through a two-step process:

We predefine the types of entity mentions we want to verify (e.g., people, URLs).
We check each step of the plugin responses to ensure those mentions can be found in the source. If they are found, the entity is verified. Verified entities will be marked with a small “+” superscript. Clicking on the “+” will direct you to the reference section where the entity is verified.

FAQs

How are hyperlinks included in the Moveworks Assistant response?

There are a few different types of hyperlinks that are included in the Moveworks Assistant summarization response.

A hyperlink from a knowledge article. If there are hyperlinks in the knowledge article, they will be included as hyperlinks in the Moveworks Assistant summarization.
Software deep links.For certain software that are configured to be self-serviceable by end users, the Moveworks Assistant will include a deep link to the organization's internal software center (e.g. Microsoft Intune)
A link from FAQs.If a link is included in an FAQ (such as a link that takes user to reset their password), they will be rendered as a blue link (http://) in the Moveworks Assistant response.

The following links are not available in the Moveworks Assistant response:

The link to the actual knowledge article.While users can access the original knowledge source through the citations page, the Moveworks Assistant LLM does not have access to the link to the actual knowledge article.

We've noticed a few challenges with how the Moveworks Assistant include hyperlinks in the summarization. As the Moveworks Assistant is built with generative AI, these variances are sporadic and cannot be 100% eliminated. However, we are continuously tuning the Moveworks Assistant prompt to ensure better consistency.

Incorrectly formatted hyperlinks: Sometimes you might notice that the link in the response is not properly linked. When this happens, you can click on the citation page and access the link directly from the knowledge article.
Missing hyperlinks: The Moveworks Assistant summarization omits the hyperlink from the knowledge source. When this happens, you can follow-up by asking "Can you give me the hyperlink?"
Hallucinated hyperlinks:Reducing the chance of hallucination and making sure the Moveworks Assistant’s response is grounded and verifiable is a continuous improvement area. With linked verified entities, users can verify the truthfulness of hyperlinks.

Read more about the AI Assistant's agentic architecture here.

Updated about 1 month ago