Ungrounded Responses in Moveworks Copilot

Overview

Sometimes, you may have questions about grounding and references in responses:

  • “Why did the Moveworks Copilot return information that is not in our knowledge base?”
  • “Why do I sometimes get responses without citations?”
  • “The Moveworks Copilot gave a response and I don’t know where it came from.”

It’s important to note that the Moveworks Copilot does not currently search the open internet. Therefore, if an answer was not sourced from the available references, it may be ungrounded. This article explains some patterns in which ungrounded responses can occur. First, two troubleshooting questions can give you more clarity within a few minutes:

  1. Were any Plugins called as part of the AI Reasoning? Check the AI Reasoning for the response. If no plugins were called, then the response was likely ungrounded.
  2. Are there citations in the citations panel? If there are none, this is the telltale sign of an ungrounded response. If there are citations, you can trace which pieces of content were used to generate the response.
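
The two troubleshooting questions above can be read as a small decision procedure. The sketch below is illustrative only: `ResponseTrace`, `triage` and the plugin name `knowledge_search` are hypothetical names invented for this example, not part of any Moveworks API, and the inputs are assumed to be copied by hand from the AI Reasoning view and the citations panel.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseTrace:
    """Hypothetical record of what a reviewer observed for one response."""
    plugins_called: list[str] = field(default_factory=list)  # from AI Reasoning
    citations: list[str] = field(default_factory=list)       # from the citations panel


def triage(trace: ResponseTrace) -> str:
    """Apply the two troubleshooting questions in order."""
    # Question 1: were any plugins called as part of the AI Reasoning?
    if not trace.plugins_called:
        return "likely ungrounded: no plugins were called"
    # Question 2: are there citations in the citations panel?
    if not trace.citations:
        return "likely ungrounded: no citations"
    # Citations exist, so the source content can be traced.
    return "cited: trace " + ", ".join(trace.citations)


# Example: a plugin ran and one knowledge article was cited.
print(triage(ResponseTrace(["knowledge_search"], ["kb-1"])))
```

The result is only a first-pass signal; a cited response still needs the citations to be read and checked against the answer.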

Ungrounded content in Moveworks Copilot responses

The Moveworks Copilot is geared towards producing responses that leverage the resources and content present in the customer’s environment. This process is called grounding.

State-of-the-art large language models such as GPT-4 Turbo have been trained on billions of diverse, publicly available documents. Combined with their strong ability to follow user instructions, this training enables them to generate natural, contextual responses. This “knowledge” drawn from the training data and memorized by the LLM is also called world knowledge.

There are certain scenarios in which the Moveworks Copilot may use content that is not derived from customer data. The following types of information drawn from this training data may be observed in Moveworks Copilot responses:

Understanding of definitions and acronyms

The LLM may use its world knowledge to produce answers with context that may not be present in customers’ data. For example:

  • GPT-4 Turbo may be aware of the full form of an acronym that is not explicitly stated in customer documents
  • It may be aware of the nature of entities and their relationship to other entities, e.g. how two versions of a mobile device are related

Helpfulness as an enterprise assistant

The Moveworks Copilot is instructed to behave as a responsible, helpful, knowledgeable and empathetic assistant, which is generally a very desirable experience for users. To embody these traits when generating a response, it may produce wording that is not explicitly instructed or programmed, but is generally aligned with that persona. This includes acknowledgement of the user’s issue and concerns, as well as high-level general suggestions.

Note that in the majority of cases, the Moveworks Copilot informs the user if it is unable to find specific relevant information in the customer’s content for the question, but may continue to offer such high-level suggestions. The Moveworks Copilot is instructed to not engage the user on certain topics, which limits the range of such suggestions and their frequency.

Content drafting capability

The Moveworks Copilot has a plugin which can be used to generate emails, chat messages, summaries and other artifacts leveraging previously generated grounded responses (using conversational context). Users can also give formatting and tone related guidance to the Moveworks Copilot when using this plugin. Due to the need to utilize more creative aspects of text generation, this plugin is not as constrained to produce grounded responses. However, it is intended to be aligned with the other policies listed above, including the policy on work-related requests.

Formally unsupported work-related use cases

LLMs are useful for a wide variety of work-related use cases. Moveworks is committed to adding support for as many of these as possible over time while remaining consistent with the Moveworks Copilot’s guiding principles.

At any given time, there may be some use cases that are possible to accomplish in the Moveworks Copilot, but are not formally supported. Formal support in this context generally means that Moveworks has tested those use cases with certain high priority utterances and has added guidelines or policies and prompt tuning to ensure that they work predictably and reliably.

A noteworthy example of a category of use cases that is currently possible but not formally supported is code-related tasks. The Moveworks Copilot is allowed to respond to user requests to analyze or generate code, but these applications are not currently rigorously tested by Moveworks.

Tone and formatting behavior

There are countless ways to prescribe response generation behavior, and at any given time only a comparatively small number of explicit instructions are provided. This means that several aspects of response generation, such as summarization style, tone, length and formatting, could vary over time.

Knowledge-seeking queries that produce ungrounded responses

While ungrounded response generation is rare, it is still possible. Moveworks has a continuous improvement effort to ensure that the GPT-powered Moveworks Copilot maintains fidelity to customers’ knowledge and files. The Moveworks Copilot is also instructed and tested to communicate to the user, if necessary, that it was unable to find any relevant knowledge, instead of leveraging the model’s memorized training data to produce a response. The majority of Moveworks customers have also expressed a preference for receiving only grounded answers.

As part of this effort, Moveworks human evaluators annotate usage data for quality, including flagging cases where ungrounded responses were produced by the Moveworks Copilot. Machine learning engineers then use this data to identify patterns in the cases where ungrounded responses occurred (fewer than 0.1% of all interactions). To mitigate such occurrences in the future, Moveworks continually adjusts the instructions to GPT-4 Turbo to maintain a high rate of grounded answers.

Knowledge-seeking queries that produce grounded but inaccurate responses

There is potential for misinterpretation, and hence inaccurate summarization, if the context is ambiguous or distributed across a large document. Moveworks breaks larger documents into smaller chunks called snippets, which enables us to more closely match user queries to specific parts of articles and files. However, this can sometimes lose context that is spread across multiple parts of the document, and the Moveworks Copilot may misinterpret one snippet because it is not aware of other related snippets.

For example, you might ask about an abbreviation that is defined in one part of a document and used in another. Another example is a response generated by combining multiple plugin responses when it is ambiguous how those responses fit together; this can happen when a query returns an API response that is missing data the Moveworks Copilot would need to determine how it relates to the rest of the content.
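
The abbreviation case can be reproduced with a toy splitter. This is a minimal sketch that assumes fixed-size, word-based chunking; the source does not describe how Moveworks actually builds snippets, and `split_into_snippets`, the sample document and the abbreviation "PRF" are all made up for illustration.

```python
def split_into_snippets(text: str, max_words: int) -> list[str]:
    """Naive splitter (an assumption): cut the text every max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Hypothetical document: the abbreviation is defined once, then reused later.
document = (
    "A Purchase Request Form (PRF) must be filed for every new laptop. "
    "Managers review each request weekly. "
    "Submit the PRF before the end of the quarter."
)

snippets = split_into_snippets(document, max_words=12)

for number, snippet in enumerate(snippets):
    defines = "Purchase Request Form" in snippet
    uses = "PRF" in snippet
    print(number, f"defines={defines}", f"uses={uses}", repr(snippet))
```

In this sketch the second snippet uses "PRF" without containing its definition, so a model that sees only that snippet has to guess what the abbreviation means.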

How does Moveworks minimize ungrounded responses in the Moveworks Copilot?

  1. We use an in-house fine-tuned toxicity check model that detects toxic, unprofessional or problematic inputs and ensures that the Moveworks Copilot does not engage the user when such an input is detected. This prevents scenarios where the topic is unsuitable for a work environment, and there is likely no relevant information present.
  2. At the heart of the Moveworks Copilot's Q&A capabilities is our search system that has been developed over several years, and is specialized to provide semantically relevant results from your multiple knowledge sources, such as KBs, forms, generally available user roster information and any custom queries developed using Creator Studio. This system ensures that the Moveworks Copilot has the most relevant resources to work with.
  3. We use retrieval-augmented generation to constrain the Moveworks Copilot to only generate a response using the customer's enterprise data. This minimizes the likelihood of ungrounded responses. If the Moveworks Copilot does not have relevant information, it declines to answer the question and offers to assist with filing a ticket to find the information.
  4. The Moveworks Copilot further provides the exact references it used, and embeds inline citations to enable the user to independently verify the accuracy of the response.
  5. To mitigate the Moveworks Copilot answering incorrectly or performing actions unrelated to your request, it may ask you clarifying, follow-up questions to fully understand what your request is.
  6. Customers can choose to add a disclaimer regarding the use of Generative AI in the Moveworks Copilot for their users.
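
The retrieve, answer with citations, or decline pattern from items 3 and 4 can be sketched in a few lines. This is not the Moveworks implementation: word overlap stands in for semantic search, returning the top snippet verbatim stands in for LLM generation, and every name here (`retrieve`, `answer`, `DECLINE`, the `kb-` identifiers, the `min_overlap` threshold) is hypothetical.

```python
import re

# Hypothetical decline message, modeled on the behavior described in item 3.
DECLINE = "I could not find relevant information. Would you like to file a ticket?"


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, snippets: dict[str, str], min_overlap: int = 2) -> list[str]:
    """Return snippet ids sharing at least min_overlap words with the query."""
    query_words = tokens(query)
    scored = [(len(query_words & tokens(body)), sid) for sid, body in snippets.items()]
    return [sid for score, sid in sorted(scored, reverse=True) if score >= min_overlap]


def answer(query: str, snippets: dict[str, str]) -> tuple[str, list[str]]:
    """Answer only from retrieved content, with an inline citation, or decline."""
    hits = retrieve(query, snippets)
    if not hits:
        return DECLINE, []  # nothing relevant: decline instead of guessing
    top = hits[0]
    # A real system would have an LLM summarize; this sketch quotes the snippet.
    return f"{snippets[top]} [{top}]", [top]


knowledge = {
    "kb-1": "Reset your VPN password from the self-service portal.",
    "kb-2": "Expense reports are due on the fifth business day.",
}
print(answer("How do I reset my VPN password?", knowledge))
print(answer("What is the capital of France?", knowledge))
```

The first query is answered from `kb-1` with its citation attached, while the second has no sufficiently relevant snippet and falls through to the decline message with an empty citation list.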

Continuous improvement for better grounding

We are actively researching and developing enhancements to the Moveworks Copilot’s ability to avoid ungrounded responses:

  1. Investing in our Search retrieval: We are improving our semantic search capabilities so that your Moveworks Copilot not only retrieves information relevant to your question (precision KPI), but also explores all the potentially relevant information out there (recall KPI).
  2. Building Specialized Fact-Checking Models: We are developing in-house Fact-Checker models to verify the accuracy of each of your Moveworks Copilot's responses. These models will ensure that every statement your Moveworks Copilot makes aligns closely with the source material and stays relevant to your original question.
  3. Specialized In-House Annotation Teams: Recognizing the importance of quality evaluation data, we're investing in expert in-house annotation teams. They specialize in labelling the data we use to improve the Moveworks Copilot, enabling us to identify issues quickly and address them.
  4. Moveworks Knowledge Studio: To further bridge knowledge gaps that could lead to ungrounded responses, we've introduced Moveworks’ Knowledge Studio. Now you can increase the coverage of your Knowledge Base more easily and reduce knowledge gaps.
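
Item 1 names precision and recall as the two retrieval KPIs. In their standard definitions, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for a single query; the function name and the `kb-` identifiers are illustrative, and the source does not say how Moveworks measures these KPIs in practice.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Standard set-based precision and recall for one query."""
    hits = retrieved & relevant
    # Guard against division by zero when either set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: search returned four articles, and annotators judged
# three articles relevant, two of which were among those returned.
retrieved = {"kb-1", "kb-2", "kb-3", "kb-4"}
relevant = {"kb-1", "kb-2", "kb-5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned articles are relevant (precision 0.50) and two of the three relevant articles were found (recall about 0.67), which shows why both numbers are tracked: returning fewer results can raise precision while lowering recall.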

Guidance for stakeholders and users on ungrounded responses

  • Moveworks recommends that customers treat ungrounded responses as subject to change and not depend heavily on this behavior continuing to be supported.
  • We also recommend that users look for citations in the generated response for the references used, so that they can always independently verify whether the response is based on their organization’s content.
  • Users can also ask follow-up questions to get more clarity from the Moveworks Copilot on the sources it has used.
  • Users are free to use permitted but formally unsupported use cases at their own discretion, while observing the same precautions to check the accuracy of the responses.