Moveworks Copilot Behavior Framework

Overview

The Moveworks Copilot Behavior Framework provides a baseline understanding of common Copilot responses in order to build a deeper understanding of an agentic AI conversational experience. It is also structured to impart key troubleshooting strategies, significantly decreasing the need for direct intervention on every encountered issue. This methodical approach ensures admins are well equipped to manage and resolve Copilot issues, enhancing the overall user experience.

The Moveworks Copilot Behavior Framework

The Moveworks Copilot is designed with a comprehensive behavior framework that includes multiple layers for interpreting user queries, providing context-aware responses, and maintaining effective performance. Below are the key components of the framework:

  • Utilizes chat context, such as chat history, for context-aware responses, adapting to the user's specific interactions.
  • Features dynamic response generation through AI reasoning and customizable plugins.
  • Employs conversation safeguards to ensure accurate information delivery.
  • Attempts to manage hallucinations with specific criteria for more accurate responses.
  • Maintains performance optimization, ensuring swift responses regardless of the number of concurrent active users on Moveworks Copilot.

Leveraging Chat Context to Contextualize Responses

Conversation context is the previous conversation history between the user and the Moveworks Copilot. The ability to use context is a major differentiator between Moveworks Classic and the Moveworks Copilot experiences.

Generally, it makes sense that incorporating context will lead to more conversational and helpful interactions. But what are the specific ways in which the use of context impacts the responses that users get?

Here are the ways in which it influences the Moveworks Copilot’s behavior and responses:

  1. It is used to determine whether the user is speaking about a new topic or continuing a previous one.
  2. The reasoning engine uses context to understand references such as pronouns and follow-up questions that may be vague when taken in isolation, but make sense when combined with previous context.
  3. Resources found in response to previous requests are also available to the reasoning engine for a few subsequent turns, and may be used to supplement any resources found for the current request.
    1. In a few cases, if the reasoning engine determines that it can answer the current request completely with the information it already has, it may not call a plugin and will summarize a response from the previously retrieved information.
  4. In cases where the Moveworks Copilot produces a structured card response, such as people lookups or ticket information, the card shows the fields that were configured to be displayed. However, any other fields provided by the underlying API are also available to be used for follow-up questions.
  5. User feedback or reactions to previously provided responses will influence the Moveworks Copilot’s decision to call the same plugin or other plugins.
    1. Particularly in the case of repeated requests for the same information, the reasoning engine may not provide the exact same answer again and again if the user’s requests indicate that the answer was not helpful.
  6. It may be used to provide more empathetic or situationally aware responses if the user has had trouble getting a successful response.

How long does this context window last? The Moveworks Copilot does not have a hard time-based cutoff for the context, but retains the last few interactions, allowing a user to ask contextual questions even if there is a gap after the last interaction. There is no ability for customers to change the context window.
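
To make the turn-based (rather than time-based) retention concrete, here is a minimal, purely illustrative Python sketch. The class, the field names, and the five-turn limit are assumptions made for illustration and do not reflect Moveworks' actual implementation.

```python
from collections import deque

# Hypothetical illustration: a context window bounded by turn count rather
# than elapsed time, so a follow-up still works after a long gap.
MAX_TURNS = 5  # assumed value; the real window size is not configurable by customers

class ConversationContext:
    def __init__(self, max_turns=MAX_TURNS):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add_turn(self, user_message, copilot_response, resources=None):
        # Resources retrieved for this turn stay available to later turns
        # for as long as the turn remains inside the window.
        self.turns.append({
            "user": user_message,
            "copilot": copilot_response,
            "resources": resources or [],
        })

    def recent_context(self):
        # Everything still in the window is handed to the reasoning engine,
        # regardless of how much wall-clock time has passed.
        return list(self.turns)
```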

Using AI Reasoning Step for Plugin Selection

An important part of planning that deserves to be called out separately is plugin selection. Here, the Moveworks Copilot tries to match the user request with the best plugin from the list of all available plugins in the customer’s environment (including custom plugins built using Creator Studio) by using their descriptions and the arguments they take as inputs.

Currently, the Moveworks Copilot tries to select the single most useful plugin for each request.

The key factors that determine which plugin is called by the reasoning engine are:

  1. Query details, including intent and whether a previous topic is being continued or not
  2. Plugin description, including positive/negative examples
  3. Prior conversation context, particularly if a plugin was called previously and did not appear to help the user

If multiple plugins could address the user’s request, the Moveworks Copilot selects one, and in case that does not produce a useful response, it may select a different plugin to try another approach.

Note that the reasoning engine is generally not aware of the resources that may be available from a plugin until it calls that plugin. For example, it does not know which KBAs or files the knowledge search plugin can serve until that plugin is called.

However, in some cases, a “lightweight” search call may be made to get an initial, rough signal for whether the knowledge search plugin or the forms plugin has any content that may be useful. This is not used to make a final plugin selection, but it can help influence whether these plugins get selected.
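
As a rough mental model of the selection factors above, the following sketch scores each plugin and picks a single winner. The keyword-overlap scoring is only a stand-in for the LLM-based reasoning, and every name in it (Plugin, select_plugin, the knowledge_search and forms identifiers) is hypothetical rather than a Moveworks API.

```python
from dataclasses import dataclass, field

@dataclass
class Plugin:
    name: str
    description: str                       # would include positive/negative examples
    arguments: list = field(default_factory=list)

def select_plugin(query, plugins, unhelpful_plugins=(), lightweight_hits=None):
    """Pick the single most useful plugin for this request (illustrative only)."""
    best, best_score = None, float("-inf")
    query_words = set(query.lower().split())
    for plugin in plugins:
        # Stand-in for reasoning over the plugin description and its arguments.
        desc_words = set(plugin.description.lower().split())
        score = len(query_words & desc_words)

        # A plugin that was tried earlier in the conversation and did not
        # appear to help the user is deprioritized.
        if plugin.name in unhelpful_plugins:
            score -= 2

        # Optional rough signal from a "lightweight" search call for the
        # knowledge search and forms plugins.
        if lightweight_hits is not None and plugin.name in lightweight_hits:
            score += lightweight_hits[plugin.name]

        if score > best_score:
            best, best_score = plugin, score
    return best  # only one plugin is called per request
```

If the selected plugin does not produce a useful response, the same routine can simply be re-run with that plugin added to unhelpful_plugins, mirroring the fallback behavior described above.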

Reducing Hallucinations to Improve Accuracy

Generative large language models such as GPT-4 and Llama 3 excel at producing natural, fluid text in response to requests. They owe both their linguistic abilities and their ability to use facts in responses to the huge amounts of data they have been trained on. This ability to generate high-quality text “from memory” is critical to enabling these LLMs to have rich conversations with users. Such text generation, which does not use any reference documents or data besides what the model has been trained on, is technically a hallucination.

This is a key challenge for all applications that require users to be served only relevant, accurate answers based on specific resources, such as an organization’s knowledge base, employee information, lookups from systems of record, and terminology or entities prevalent within the organization. In summary, not every hallucination is harmful, but constraining the output of LLMs used for search to only verified sources is key.

Moveworks has addressed hallucinations in the Moveworks Copilot in several ways:

  1. Moveworks uses an in-house fine-tuned toxicity check model that detects toxic, unprofessional, or problematic inputs and ensures that the Moveworks Copilot does not engage the user when such an input is detected. This prevents scenarios where the topic is unsuitable for a work environment and there is likely no relevant information present. 🛡️
  2. At the heart of the Moveworks Copilot's Q&A capabilities is our search system, which has been developed over several years and is specialized to provide semantically relevant results from your multiple knowledge sources, such as KBs, forms, generally available user roster information, and any custom queries developed using Creator Studio. This system ensures that the Copilot has the most relevant resources to work with. 📚
  3. We use retrieval-augmented generation (RAG) to constrain the Moveworks Copilot to only generate a response using the customer's enterprise data (examples listed above); a minimal sketch of this flow appears after this list. This minimizes the likelihood of hallucinations and makes responses grounded and factually accurate. If the Moveworks Copilot does not have relevant information, it declines to answer the question, offering to assist with filing a ticket to find the information. 🔍
  4. The Moveworks Copilot further provides the exact references it used, and embeds inline citations to enable the user to independently verify the accuracy of the response. 🗂️
  5. To mitigate the Moveworks Copilot answering incorrectly or performing actions unrelated to your request, it asks clarifying follow-up questions to fully understand what your request is. 💬
  6. You can choose to add a disclaimer regarding the use of generative AI in the Copilot for your users, but very few customers (one to date) have leveraged this approach. With the emergence of consumer AI tools on the market, the good news is that most end-user populations have more of an understanding of the AI landscape and understand that there is a margin of error.
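
Here is the minimal RAG sketch referenced in item 3 above. retrieve() and generate() are hypothetical stand-ins for the search system and the underlying LLM, and the relevance threshold is an invented parameter; none of this is a Moveworks API.

```python
def answer_with_rag(question, retrieve, generate, min_relevance=0.5):
    # Retrieve candidate passages from enterprise sources only
    # (KBs, forms, user roster data, Creator Studio queries).
    passages = retrieve(question)
    relevant = [p for p in passages if p["score"] >= min_relevance]

    if not relevant:
        # Nothing grounded to answer from: decline rather than guess,
        # and offer to file a ticket instead.
        return {
            "answer": ("I couldn't find relevant information. "
                       "Would you like me to file a ticket?"),
            "citations": [],
        }

    # Constrain generation to the retrieved passages.
    prompt = (
        "Answer using ONLY the passages below. If they do not contain the "
        "answer, say so.\n\n"
        + "\n\n".join(p["text"] for p in relevant)
        + f"\n\nQuestion: {question}"
    )
    return {
        "answer": generate(prompt),                    # grounded response
        "citations": [p["source"] for p in relevant],  # references shown to the user
    }
```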

FAQ

  1. Why do two users who ask the same question receive different answers?
  • The most common reason is that the two users' chat histories contained different context before the question was asked. The Moveworks Copilot takes chat history into account when developing answers, and even slight differences in how questions are asked and how answers are responded to may result in the Moveworks Copilot answering differently. This allows the Moveworks Copilot to adapt its response using the most recent information from the previous conversation and ensures that the user gets a conversational, situation-aware response.
  2. Why does the Moveworks Copilot's response vary for the same question from the same user?
  • The Moveworks Copilot has a flexible and very powerful reasoning engine that takes different dimensions of input into account before it returns a result to the user. Because of this, it is able to handle many complex requests, and even call multiple plugins in a row to fulfill a user’s request. This level of flexibility is made possible by the reasoning engine. Just as when talking to a human agent, we do not expect the answer to be exactly the same every time. You can try entering the exact same utterance into ChatGPT and notice that the response varies every time. Many factors can affect the Moveworks Copilot's response, such as context from previous turns, the user's permissions, and more. When a user asks a question repeatedly, the input utterance is exactly the same, but the context varies. Therefore, the Moveworks Copilot picks up new information every time and could return a different response each time.
  3. What steps should I take to understand why there are different responses?
  • There are a few things you can do to inspect where and why the Moveworks Copilot provided a different answer to the same question:
    • Look at the prior context of the two conversations. Inspecting the prior context can reveal likely points of difference in the interactions between the two users that could highlight why the answers are slightly different.
    • Look at AI Reasoning Steps. Click the ℹ️ icon to look up both sets of AI reasoning steps. By comparing the steps, you can see if the Moveworks Copilot called different plugins to answer the question. Although this may not directly answer why the Moveworks Copilot did so, you can usually combine looking at the AI reasoning steps with the context history to hypothesize what happened.
    • Look at the Citations in the Reference Page. The last place to check is what information the Moveworks Copilot cited. It is possible that, even when calling the same plugin, the Moveworks Copilot retrieved different pieces of information. If so, this could explain why the summary or answer is different. Again, this doesn’t explain why it did so, but just like with AI Reasoning Steps, you can probably hypothesize why when looking at context history.

🛠

Using the copilot clearhistory command

When testing different utterance variations, you may find that context causes different answers. The Moveworks Copilot often treats repeated questions of the same type as a sign that the previously provided option was not helpful. As a result, the Moveworks Copilot proactively changes its responses to try other solutions. If you are doing heavy testing, you may enable the copilot clearhistory command.

Reach out to Customer Success for more details and to get it enabled.

  4. Why didn't the Moveworks Copilot return the expected resource [document/file/query result]?
  • These situations are difficult to assess without more context; however, there are several troubleshooting questions that can guide you toward more clarity within a few minutes:
    • What specific resource were you expecting? This is always the most important place to start, as it grounds the conversation in the exact resource in question instead of a theoretical one. Once the resource is identified, you can check a few things:
      1. Check Moveworks Setup for Ingestion: See if that resource is ingested. The resource could be from a knowledge system that is not ingested, or from a repository of a currently ingested knowledge system that the Moveworks Copilot doesn’t have access to. If the resource isn’t ingested, that would explain why it is not serving.
      2. Check Moveworks Setup for ACLs: You can check ACLs as there could be an issue with the Moveworks Copilot ingesting updated ACLs or respecting them. If an issue is found, contact Moveworks Support.
      3. Review the contents of the document: Open the document and review its content to confirm whether it has the information that answers the question. Sometimes the content is not written clearly enough, or the content could be interpreted in different ways. If inspecting it reveals this could be the case, Moveworks suggests editing the article to make it clearer and then seeing if it serves when the question is asked again (keeping context in mind when re-testing, as explained above).
  • Did you check the citations panel for citations? Look at the other docs returned in citations to see whether it is reasonable for one to be in conflict and get used for summarization. If there are two or more competing articles that could answer the question, the Moveworks Copilot may not know which is the “preferred” one. In situations like this, it is best to consolidate answers into a single resource that can be cross-linked from other docs to avoid the conflict.
  • Is this issue present in Moveworks Classic as well? You can use the “Copilot Switch” command and see if the resource returns as expected when using Moveworks Classic. If it does, that indicates something is different with how the Moveworks Copilot is interpreting the resources. If it also fails to return, there may be an issue with annotations or relevance, which means Moveworks Support needs to get involved. This is especially true if the ingestion check in step 1 passes and the resource has been ingested properly.
  5. How can you determine if a Moveworks Copilot response is a hallucination?
  • You may ask this question in a few different ways:
    • “Why did the Moveworks Copilot return information that is not in our knowledge base?”
    • “Why do I sometimes get responses without citations?”
    • “The Moveworks Copilot gave a response and I don’t know where it came from.”
  • Before getting into the troubleshooting steps, it’s important to understand that hallucinations are generally difficult to troubleshoot, as there is no clear explanation for why the Moveworks Copilot hallucinated. However, there is one important characteristic: the Moveworks Copilot does not search the open internet.
  • Now, there are several troubleshooting questions that can guide you toward more clarity within a few minutes:
    • Were no plugins called? Check the AI Reasoning for the response. If no plugins were called, then the response was likely a hallucination.
    • Are there no citations in the citations panel? If there are no citations, that is the tell-tale sign of a hallucination. If there are citations, then the response is likely not a hallucination.

💡

Citations may not be provided in subsequent turns

While a lack of citations typically indicates a hallucination, if the user is continuing a previous conversation or summarizing a previous answer, the Moveworks Copilot may not provide citations the second time around.
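
Putting the two checks above and this caveat together, here is an illustrative heuristic. The function and field names are invented for this sketch and are not part of the Moveworks product; in practice you would perform these checks manually in the AI Reasoning view and the citations panel.

```python
def likely_hallucination(reasoning_steps, citations, is_follow_up_turn):
    # Illustrative heuristic only; not a Moveworks API.
    called_plugins = any(step.get("plugin") for step in reasoning_steps)

    if not called_plugins:
        return True   # no plugin was called, so nothing was retrieved

    if not citations and not is_follow_up_turn:
        return True   # a fresh answer with no citations is the tell-tale sign

    # Follow-up turns that continue or summarize a previous answer may
    # legitimately omit citations, so missing citations alone are not
    # conclusive there.
    return False
```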

  6. Are all hallucinations bad?
  • It depends on the application. The ability to create content with minimal effort is a very popular and useful application of LLMs. On the other hand, since the model is not using specific references, it may produce text that contains factually accurate information, but it may also mix up unrelated facts it has memorized, or even make up facts or elements such as phone numbers, names, and places to fill in the gaps in its generated text. This can produce output that is off-topic, self-contradictory, or factually incorrect. Note that, in some cases, the model can produce a faithful reproduction of its training data, such that a well-informed user can attest to its accuracy.
  7. Why is the Moveworks Copilot slow or experiencing latency issues?
  • If you are migrating from Moveworks Classic to the Moveworks Copilot, you will notice immediately that the Moveworks Copilot is slower. The slowness stems from several factors:
    • Generative AI bots are generally slower than those that leverage discriminative AI models.
    • The enterprise guardrails and conversation safeguards in the Moveworks Copilot design, such as toxicity filters, fact-checking models, readability checks, and citation checks, increase latency in order to ensure factually accurate responses.
    • Products like ChatGPT and Perplexity use a UI tactic of having their copilots type out responses as the text is generated. For some queries, this gives the perception that the answer is generated faster. Due to limitations with both Slack and Microsoft Teams, Moveworks is unable to mimic this behavior.
  • For Channel Resolver, there is higher latency between the user asking the question in the channel and the Moveworks Copilot reaching out via DM.
    • The vast majority of requests in channel take less than 30 seconds to fulfill. Users who post in the channel are not expecting instantaneous reach-outs from the Moveworks Copilot, since sometimes the Copilot does not respond at all when it cannot help the user. Furthermore, the Moveworks Copilot will still reach out faster, on average, than an agent monitoring the channel.
  8. Does adding more users create higher latency in Moveworks Copilot?
  • No, adding more users does not increase the latency of the Moveworks Copilot’s responses to users. More users could theoretically increase timeouts, but Moveworks protects against this by ensuring enough GPU capacity and infrastructure to handle large volumes of users. Moveworks has multiple customers with over 100,000 users on the Copilot platform, including one customer with over 500,000 users.
  9. How does the Moveworks Copilot use dates?
  • Sometimes the Moveworks Copilot may not be aware of a future date, or may reference a date that has already passed as the “next” date for something like a holiday (e.g., the Copilot responding that the next holiday is July 4 when today is August 4).
    • While the Moveworks Copilot is aware of today's date, it does not always use it reliably for answers about the “next” holiday.
  • There are other reasons that can cause the Moveworks Copilot to struggle with dates:
    • Multiple knowledge sources with conflicting dates: There can be conflicting information across multiple docs, knowledge articles, and FAQs referencing the same dates or holidays. Sometimes those docs are out of date (for example, both a 2023 and a 2024 company holidays article are ingested) or contain conflicting information. This may confuse the Moveworks Copilot and cause it to reference multiple sources to create a single summary. If those sources are in conflict, the summary may be wrong.
    • Dates formatted in tables: This often happens with KBAs that show a collection of dates (like company holidays) in a tabular format. Due to the way HTML table and cell contents render when ingested, the Moveworks Copilot may struggle to properly associate the columns and rows of the table. It is best not to list collections of dates in tables, but to list dates so that the holiday and the day are clearly associated with each other (e.g., “Independence Day, July 4, 2024”).
    • Inconsistent abbreviations: This sometimes results from abbreviating months, such as “Sep.” or “Sept.” for September. These abbreviations may make it difficult for the Moveworks Copilot to understand the exact dates. It is best to use the full name of the month in articles.
  • It is also worth noting that other copilots experience this same issue; it is not unique to the Moveworks Copilot.
  10. How does the Moveworks Copilot handle math / calculations?
  • Sometimes the Moveworks Copilot may not calculate an equation correctly or may struggle to do math. It is important to note that large-number calculation is not in the scope of the Moveworks Copilot.

Large-number calculation is not in the scope of the Moveworks Copilot. LLMs are generally not good at calculations; while they can sometimes handle simple calculations, they cannot be relied on to perform precise calculations like a calculator or computer. An LLM is a language model that is always just predicting the next word.

  • Can we do anything to turn off this capability in the Moveworks Copilot? We do not advertise a calculation capability and there is no designed support for it, so there is nothing to turn off. We can experiment with a guideline that asks the Moveworks Copilot to tell users it does not support doing math problems, but that will never be 100% reliable.