Sometimes, you may have questions about grounding and references in responses:
The Moveworks Assistant does not currently search the open internet. Therefore, if an answer has not been sourced from the available references, it may be ungrounded. This article explains some patterns in which ungrounded responses can appear. A couple of troubleshooting questions can also guide you towards more clarity within a few minutes:
We are actively researching and developing enhancements to Moveworks Assistant’s ability to reduce ungrounded responses:
The Moveworks Assistant is geared towards producing responses that leverage the resources and content present in the customer’s environment. This process is called grounding.
State-of-the-art large language models such as GPT-4 Turbo have been trained on billions of diverse, publicly available documents. Combined with their superior ability to follow user instructions, this training enables them to generate natural, contextual responses. This “knowledge”, drawn from the training data and memorized by the LLM, is also called world knowledge.
There are certain scenarios in which the Moveworks Assistant may use content that is not derived from customer data. The following types of information, drawn from this training data, may be observed in Moveworks Assistant responses:
The LLM may use its world knowledge to produce answers with context that may not be present in customers’ data. As an example,
The Moveworks Assistant is instructed to behave as a responsible, helpful, knowledgeable and empathetic assistant, which is generally a very desirable experience for users. To embody these traits, it may produce verbiage that is not instructed or programmed in any way, but is generally aligned with that persona. This includes acknowledging the user’s issue and concerns and offering high-level general suggestions.
Note that in the majority of cases, the Moveworks Assistant informs the user if it is unable to find specific relevant information in the customer’s content for the question, but it may still offer such high-level suggestions. The Moveworks Assistant is instructed not to engage the user on certain topics, which limits the range and frequency of such suggestions.
The Moveworks Assistant has a plugin that can generate emails, chat messages, summaries and other artifacts by leveraging previously generated grounded responses (using conversational context). Users can also give formatting- and tone-related guidance to the Moveworks Assistant when using this plugin. Because it needs to draw on the more creative aspects of text generation, this plugin is not as constrained to produce grounded responses. However, it is intended to be aligned with the other policies listed above, including the policy on work-related requests.
LLMs are useful for a wide variety of work-related use cases. Moveworks is committed to adding support for as many of these as possible over time while remaining consistent with the Moveworks Assistant’s guiding principles.
At any given time, there may be some use cases that are possible to accomplish in the Moveworks Assistant, but are not formally supported. Formal support in this context generally means that Moveworks has tested those use cases with certain high-priority utterances and has added guidelines or policies and prompt tuning to ensure that they work predictably and reliably.
A noteworthy example of a category of use cases that is currently possible but not formally supported is code-related tasks. The Moveworks Assistant is allowed to respond to user requests to analyze or generate code, but these applications are not currently rigorously tested by Moveworks.
There are infinitely many ways to prescribe response generation behavior, and at any given time only a comparatively small number of explicit instructions are provided. This means that several aspects of response generation, such as summarization style, tone, length and formatting, could vary over time.
While ungrounded response generation is rare, it is still possible. Moveworks has a continuous improvement effort to ensure that the GPT-powered Moveworks Assistant maintains fidelity to customers’ knowledge and files. The Moveworks Assistant is also instructed and tested to communicate to the user, if necessary, that it was unable to find any relevant knowledge, instead of leveraging the model’s memorized training data to produce a response. The majority of Moveworks customers have also expressed a preference for receiving only grounded answers.
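The fallback behavior described above can be illustrated with a minimal sketch. This is not Moveworks' implementation: the `Snippet` type, the `answer_from_snippets` function, and the message text are all hypothetical names chosen for illustration, and a real system would pass the snippets to an LLM rather than concatenate them.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """Hypothetical piece of customer content returned by a search step."""
    source: str
    text: str


# Hypothetical wording; the actual message shown to users is not specified here.
NO_RESULT_MESSAGE = "I could not find relevant information in your organization's content."


def answer_from_snippets(question: str, snippets: list[Snippet]) -> str:
    """Answer only from retrieved snippets; otherwise report that nothing was found."""
    if not snippets:
        # No customer content was retrieved, so say so instead of
        # answering from anything the model may have memorized.
        return NO_RESULT_MESSAGE
    # Attach the source of every statement so the answer stays traceable.
    cited = "; ".join(f"{s.text} [{s.source}]" for s in snippets)
    return f"Based on your organization's content: {cited}"


print(answer_from_snippets("How do I reset my VPN token?", []))
```

The design choice the sketch highlights is that an empty retrieval result produces an explicit "nothing found" message rather than a best-effort answer.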
As part of this effort, Moveworks human evaluators annotate usage data for quality, including flagging cases where ungrounded responses were produced by the Moveworks Assistant. Machine learning engineers then use this data to identify patterns in these ungrounded responses, which occur in fewer than 0.1% of all interactions. To mitigate such occurrences in the future, Moveworks continually adjusts the instructions to GPT-4 Turbo to maintain a high rate of grounded answers.
There is potential for misinterpretation, and hence inaccurate summarization, if the context is ambiguous or distributed across a large document. Moveworks breaks larger documents into smaller chunks called snippets, which enables us to more closely match user queries to specific parts of articles and files. However, this can sometimes cause a loss of context that is spread across multiple parts of the document, and may lead the Moveworks Assistant to misinterpret one snippet because it is not aware of other related snippets.
For example, you might ask about an abbreviation that is defined in one part of a document and used in another. Another example is when a response is generated by combining multiple plugin responses and it is ambiguous how those responses fit together; this is possible when a query returns an API response that is missing data the Moveworks Assistant would need to determine how it relates to the rest of the content.
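The abbreviation case can be made concrete with a small sketch. It assumes a naive fixed-size word splitter and a word-overlap matcher, neither of which is Moveworks' actual chunking or retrieval logic; the function names, the sample document, and the `max_words` value are hypothetical.

```python
def split_into_snippets(text: str, max_words: int = 12) -> list[str]:
    """Split text into consecutive snippets of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def best_snippet(query: str, snippets: list[str]) -> str:
    """Return the snippet that shares the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(
        snippets,
        key=lambda s: len(query_words & set(s.lower().split())),
    )


# Hypothetical document: the abbreviation is defined early and used later.
document = (
    "A Personal Time Request (PTR) must be approved by your manager. "
    "Requests are reviewed weekly by the people team. "
    "To submit a PTR, open the HR portal and choose the time off form."
)

snippets = split_into_snippets(document)
match = best_snippet("how do I submit a PTR", snippets)

# The matched snippet uses the abbreviation but does not contain its definition.
definition_in_match = "Personal Time Request" in match
print(match)
print(definition_in_match)
```

In this sketch the snippet that best matches the query mentions submitting a PTR, while the definition of PTR sits in a different snippet, so a reader (or a model) seeing only the matched snippet has no way to know what the abbreviation stands for.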