Learning Processes in Moveworks Copilot

Overview

The Moveworks Copilot enhances the user experience through a diverse array of machine learning models that address tasks like knowledge search, toxicity detection, and UX upgrades. Although it learns from immediate user feedback within a conversation, the Moveworks Copilot does not automatically adapt its long-term behavior from user interactions; instead, it relies on systematic improvements by the Moveworks Machine Learning (ML) team. To navigate privacy, performance, and quality challenges, Moveworks balances the Copilot's context-aware responses with ongoing efforts to refine its in-context learning capabilities, ensuring sensitive user data is handled securely while exploring safe, user-specific enhancements for a more personalized experience.

How does the Moveworks Copilot continuously learn and improve?

The Moveworks Copilot uses a suite of ML models to deliver a comprehensive user experience. These models perform a variety of tasks, such as reasoning and planning (e.g., GPT-4o), knowledge search, toxicity detection, language detection, resource translation, and many others.
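To make this concrete, below is a minimal, hypothetical sketch of how a single message might be routed through such a suite of models. Every function name, threshold, and placeholder body here is illustrative only, not the actual Moveworks pipeline:

```python
def toxicity_score(text: str) -> float:
    # Placeholder for a real toxicity classifier.
    return 0.0

def detect_language(text: str) -> str:
    # Placeholder for a real language-detection model.
    return "en"

def translate_to_english(text: str) -> str:
    # Placeholder for a real translation model.
    return text

def plan_and_reply(text: str) -> str:
    # Placeholder for the reasoning/planning LLM (e.g., GPT-4o).
    return f"Planned response to: {text}"

def handle_message(message: str) -> str:
    # Safety models gate the request before any planning happens.
    if toxicity_score(message) > 0.9:
        return "This message can't be processed."
    # Translate non-English input so downstream models see one language.
    if detect_language(message) != "en":
        message = translate_to_english(message)
    # The planner composes the final reply (plugin calls elided here).
    return plan_and_reply(message)

print(handle_message("How do I request access to Tableau?"))
```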

The Moveworks Copilot is continuously improving through regular updates not just to ML models, but also to user experience, architecture, and infrastructure. Continuous improvement happens mainly via two pathways:

  1. Review of subjective feedback: Moveworks annotation teams and ML engineers regularly review masked and anonymized usage data to identify themes for improvement. In addition, user feedback from MS Teams or Slack, as well as customer feedback from support tickets and the Moveworks Community, is used to identify improvements to specific use cases.
  2. Targeted improvements to key metrics: We closely monitor metrics such as latency, error rates, and response rates, and prioritize investments to improve the platform across all use cases.

The Moveworks Copilot is also capable of temporarily learning from user feedback and applying that feedback immediately—but only for that one user, and only for a limited number of turns of conversation.

Why does the Moveworks Copilot seem to understand my preferences at first, but later it seems like it's forgotten them?

When you interact with the Moveworks Copilot, it appears to learn from your inputs immediately because it uses context from your conversation to inform its responses. Conversation history context includes:

  • Messages from you
  • Messages from the Moveworks Copilot
  • Results returned by plugin calls made by the Moveworks Copilot on your behalf
    • Examples include articles retrieved by searches, information about coworkers from people lookups, and whether a request for software access made on your behalf succeeded or failed and why.
  • Feedback text you submit under “Additional Feedback” after rating a Copilot response with 👍🏼/👎🏼

All this information is part of the input context the Moveworks Copilot uses to generate appropriate and relevant replies. The Moveworks Copilot is capable of following instructions and feedback from across this input context, thanks to the “in-context learning” capabilities of its models.
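As a rough illustration of what “in-context learning” means in practice, here is a minimal sketch in which conversation history is assembled into the model's prompt rather than trained into its weights. The `Turn` structure and `build_prompt` function are hypothetical, not the Moveworks implementation:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str      # "user", "assistant", "plugin_result", or "feedback"
    content: str

def build_prompt(history: list[Turn], new_message: str) -> str:
    # Every prior turn is serialized into the prompt; nothing is trained.
    lines = [f"{turn.role}: {turn.content}" for turn in history]
    lines.append(f"user: {new_message}")
    lines.append("assistant:")
    return "\n".join(lines)

history = [
    Turn("user", "Keep answers to two sentences, please."),
    Turn("assistant", "Understood, I'll keep replies brief."),
]
print(build_prompt(history, "How do I reset my VPN password?"))
```

Because the earlier preference ("keep answers to two sentences") is inside the prompt, the model can follow it; once it falls out of the prompt, the model has no record of it.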

However, there are limitations to how much information can be included in this input context:

  1. Context Length Limits: Most models (including LLMs) have limits on how much text they can be provided as input. If the conversation becomes too long, older messages may be omitted to make room for newer ones.
  2. Performance and Latency: Including too much information can slow down the Moveworks Copilot's responses. Large input sizes require more computational resources, leading to longer wait times.
  3. Quality of Responses: Including too much irrelevant information in the input to a model can distract the model from the parts of the input text it should be attending to, decreasing the quality or accuracy of its outputs.

To maintain optimal performance and ensure high-quality interactions, Moveworks Copilot curates and filters the conversation history it uses. This means that earlier preferences or details you shared might not always be included in later interactions if they've fallen outside the input context limit.
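The sketch below illustrates the kind of trimming described above, under the simplifying assumption that a word count stands in for a real model-specific tokenizer; `trim_history` and its token budget are illustrative, not the actual Moveworks logic:

```python
def trim_history(history: list[str], max_tokens: int = 200) -> list[str]:
    """Keep the newest messages that fit in the budget; drop the rest."""
    kept: list[str] = []
    used = 0
    # Walk from the newest message backwards so recent turns survive.
    for message in reversed(history):
        cost = len(message.split())  # word count as a stand-in tokenizer
        if used + cost > max_tokens:
            break  # this message and everything older is omitted
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

Walking from the newest turn backwards is exactly why earlier preferences are the first to fall outside the input context.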

What This Means for You

  • If the Moveworks Copilot seems to "forget" a preference, it might be because that information is no longer within the input context it's currently processing.
  • Restating important preferences or information in your ongoing conversation can help the Moveworks Copilot better assist you.

We're Continuously Improving

We understand that having to repeat information can be inconvenient. Balancing the Moveworks Copilot's ability to remember details with performance and privacy considerations is an active area of development. We're working on enhancements to provide a more seamless experience while respecting context limits and ensuring your data remains secure.

Why can’t Moveworks Copilot learn from my conversations in real time?

Modern enterprise AI assistants don't “learn” in the way that many people imagine. Saying that an assistant “learns” is often misunderstood to mean that its models continually update their own parameters over time. In reality, most AI assistants (including ChatGPT, Microsoft Copilot, and the Moveworks Copilot) do not perform live, continual training of their LLMs during interactions.

There are several reasons not to do live, continual training of models on user interactions and feedback:

  1. Privacy and Confidentiality Risks: Live training on user interactions runs a risk of leaking sensitive or confidential information—such as personal information, proprietary business information, or trade secrets—into the model’s knowledge.
    1. More detail: If a model is trained on sensitive data inadvertently, it could lead to data leaks from one user to another—including between employees of the same Moveworks customer—or non-compliance with regulations like GDPR or HIPAA.
    2. Example: say a user asks for a summary of a company document, thinks the response is bad, and gives a 👎🏼. They then say “try again”, get a better summary, and give a 👍🏼. That user might expect Moveworks Copilot models to immediately learn from that feedback. But what if the summary contained information the user did not realize was private to their team, client, department, or organization? If the preferred summary were immediately trained into a model, that information would be at risk. Even if the user is only trying to teach it “how to handle data like this”, there is a risk of the model internalizing the contents of the data as well. This is too risky to leave to automation alone, particularly automation that would update model parameters in real time.
  2. Difficulty in Retracting Learned Information: Once private or confidential data is trained into a model, it becomes challenging* to have the model "unlearn" that information.
    1. More detail: The industry-standard way to retract learned data is to delete the model entirely and start over from a checkpoint saved before the sensitive example was trained into its weights. Until the model is retrained without the retracted data, we would face a choice between two bad options: deploy the older checkpoint immediately and regress the quality of the production model’s behavior, or keep the current model live and risk exposing the data that was requested to be retracted. To avoid this scenario, it is far better practice to have human review processes for training-data curation to ensure that the wrong data does not get trained into models in the first place.
  3. Increased Security Vulnerability to Data Poisoning: Live training on user inputs increases the attack surface for data-poisoning attacks. It is far easier for bad actors to manipulate the behavior of models when there are no humans in the loop curating and validating training datasets.
  4. Quality Control Issues: Without proper oversight, live training may incorporate erroneous, biased, or inappropriate content from user inputs. This can degrade the AI assistant's performance, leading to incorrect or unreliable responses that diminish user trust.
  5. Regulatory Compliance Challenges: Many of our customer agreements are subject to strict regulations regarding data handling and retention. Live training can conflict with legal requirements, such as GDPR’s “right to be forgotten”, resulting in compliance violations and potential legal repercussions.
  6. Model Stability and Consistency: Continual, automatic updates to the AI model can introduce unexpected changes in behavior. It is better practice to leave a production model alone until extensive evaluation verifies that a further-trained version is ready to deploy and that any notable changes are known and can be communicated to stakeholders.
  7. Compute Constraints: Live training demands significant computational resources that would be cost-prohibitive for any vendor.

*Footnote on why unlearning is challenging: the field of machine unlearning, which develops methods to efficiently remove specific data from trained models, is an open research area with no definitive solutions yet (see Cao & Yang, 2015; Bourtoule et al., 2019). This makes it difficult to guarantee that sensitive information can be effectively and reliably erased once learned, short of throwing out the model entirely.

How does the Moveworks Copilot learn and persist preferences in the long-term?

There are no automatic long-term changes in behavior that occur solely from providing feedback or from interactions between the user and the Copilot. This means that the Copilot does not automatically adapt its behavior based on previous examples of similar interactions outside of a given user’s recent conversation history.

More generally, we do not have an automated feedback loop in the Moveworks Copilot for long-term learning because of the challenges of doing real-time learning safely. However, we do review randomly sampled usage data to identify key patterns and make continuous improvements for high-frequency issues.

Is Moveworks experimenting with safe methods for automated, real-time, long-term learning from user feedback?

Yes! If you’ve tried ChatGPT’s “Memories” feature, Moveworks is designing something similar for the Moveworks Copilot. Rather than training on user preferences, the approach under development stores user-specific learning in a data store: relevant user-preference text is added to the input context of that user’s future Copilot interactions.
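As a rough sketch of that pattern (all names hypothetical, not the Moveworks implementation), preferences would be written to a per-user store and prepended to the input context of later conversations, with no model weights ever updated:

```python
preference_store: dict[str, list[str]] = {}

def remember(user_id: str, preference: str) -> None:
    # Persist the preference outside the model; no weights change.
    preference_store.setdefault(user_id, []).append(preference)

def build_context(user_id: str, conversation: str) -> str:
    # Prepend saved preferences to the input context for this user.
    prefs = preference_store.get(user_id, [])
    header = "\n".join(f"User preference: {p}" for p in prefs)
    return f"{header}\n{conversation}" if header else conversation

remember("u123", "Reply in French whenever possible.")
print(build_context("u123", "user: What is the office Wi-Fi name?"))
```

Because the preference lives in a data store rather than in model weights, it can be deleted on request without retraining, sidestepping the unlearning problem described earlier.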