Configure Moveworks Toxicity Filter

Overview

The Moveworks AI Assistant is a general-purpose employee service tool, and it is instructed not to engage on sensitive topics. In these scenarios, the expected behavior is for the AI Assistant to decline the request. Using machine learning, all incoming requests are analyzed for potentially toxic or non-work-appropriate content. Moveworks supplements GPT’s own toxicity check with a large language model that assesses appropriateness for work environments, and uses a policy that guides the Moveworks Assistant not to engage with the user when such a request is detected. When the toxicity filter is triggered, the user receives a message similar to “I’m unable to assist with that request” and does not receive an acknowledgement of the underlying issue.

Examples: language that is hateful, abusive, derogatory, or offensive.

In some cases, the Moveworks baseline toxicity default filter can be too broad. For example, an organization may want the AI Assistant to engage with users seeking mental health support, where surfacing relevant knowledge (such as Employee Assistance Program resources) is more beneficial than declining the request.

How this Feature Works

To handle these cases, Moveworks organizes its safety filter into a fixed set of content categories. By default, all categories are blocked. Admins can choose to allow specific categories so that the AI Assistant will engage with requests falling under those categories and surface the relevant knowledge.

The available content categories are:

  • Violent — content describing or promoting physical violence.
  • Sexual Content or Sexual Acts — sexually explicit content or descriptions of sexual acts.
  • Suicide & Self-Harm — content related to suicide or self-harm. Some organizations allow this category so employees can ask for mental health help and be routed to relevant support resources.
  • Unethical Acts — content describing or promoting unethical behavior.
  • Jailbreak — attempts to bypass the AI Assistant’s safety instructions or intended behavior.

Any category that is not explicitly selected remains blocked. The configured selections are applied to every future interaction with the Moveworks AI Assistant.
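The allowlist behavior described above can be sketched as a simple decision function. This is an illustrative sketch only, not Moveworks’ actual implementation; the names `CATEGORIES`, `should_engage`, and the category identifiers are assumptions made for the example.

```python
# Illustrative sketch of an allowlist-based safety filter (not Moveworks' code).
# Assumption: an upstream classifier tags each request with zero or more categories.

CATEGORIES = {
    "violent",
    "sexual_content",
    "suicide_self_harm",
    "unethical_acts",
    "jailbreak",
}


def should_engage(flagged: set[str], allowed: set[str]) -> bool:
    """Engage only if every flagged category has been explicitly allowed.

    By default `allowed` is empty, so any flagged request is declined.
    """
    blocked = (flagged & CATEGORIES) - allowed
    return not blocked


# Example: an admin allows "suicide_self_harm" to support mental health requests.
allowed = {"suicide_self_harm"}
print(should_engage({"suicide_self_harm"}, allowed))  # True -> engage
print(should_engage({"violent"}, allowed))            # False -> decline
print(should_engage(set(), allowed))                  # True -> nothing flagged
```

Note that a request flagged with multiple categories is declined unless every one of those categories has been allowed, which matches the “any category not explicitly selected remains blocked” rule above.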

Note: By default, every organization starts with all content categories blocked. Allowing a category applies to all users interacting with the Moveworks AI Assistant, so select categories carefully based on your organization’s policies.

Prerequisites

Before configuring allowed content categories, please gather the following:

  1. A list of use cases or scenarios where the AI Assistant is currently declining requests that you would like it to engage with (e.g., employees asking about mental health support and being declined).
  2. The content categories from the list above that cover those scenarios (e.g., “Suicide & Self-Harm” to enable mental health support use cases).
  3. Alignment with relevant stakeholders (e.g., HR, Legal, Security) on which categories are appropriate to allow for your organization.

Configuration Steps

Step 1: Navigate to Display Configurations

  1. Navigate to Chat Platforms -> Display Configurations.


Step 2: Navigate to Moveworks AI Assistant Display Settings & Disclaimers toggle

  1. Scroll down to the Moveworks AI Assistant Display Settings & Disclaimers toggle and click into this module.


Step 3: Select Allowed Content Categories

  1. Scroll down to the Safety Guard: Allowed Content Categories section.
  2. Click the dropdown and check the box next to each category you want to allow (e.g., Suicide & Self-Harm to enable mental health support use cases). Uncheck a category at any time to return it to the default blocked state.

Any category you do not select will remain blocked. Only select categories that your organization has reviewed and explicitly wants the AI Assistant to engage on.

FAQs

  1. Q: Does this override underlying toxicity protections in GPT models?

A: No. This setting controls only the Moveworks-specific toxicity check. The OpenAI or Azure toxicity checks are still applied, and Moveworks does not have control over them.

  2. Q: What happens if I don’t select any categories?

A: All content categories remain blocked. This is the default behavior for every organization and provides the strongest safety posture.

  3. Q: Can I add my own custom categories?

A: No. The set of content categories is fixed and maintained by Moveworks. You can only choose which of the provided categories to allow.