Learning Processes in Moveworks Copilot

Overview

The Moveworks Copilot enhances the user experience through a diverse array of machine learning models that address tasks like knowledge search, toxicity detection, and UX upgrades. Although it learns from immediate user feedback within a conversation, the Moveworks Copilot does not automatically adapt its long-term behavior from user interactions; instead, it relies on systematic improvements by the Moveworks Machine Learning (ML) team. To navigate privacy, performance, and quality challenges, Moveworks balances the Copilot's context-aware responses with ongoing efforts to refine its in-context learning capabilities, ensuring sensitive user data is handled securely while exploring safe, user-specific enhancements for a more personalized experience.

How does the Moveworks Copilot continuously learn and improve?

The Moveworks Copilot uses a suite of ML models to deliver a comprehensive user experience. These models perform a variety of tasks, such as reasoning and planning (e.g., GPT-4o), knowledge search, toxicity detection, language detection, resource translation, and many others.
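To make this concrete, below is a minimal, hypothetical sketch of how a single message might be routed through such a suite of models. Every function name, threshold, and placeholder body here is illustrative only, not the actual Moveworks pipeline:

```python
def toxicity_score(text: str) -> float:
    # Placeholder for a real toxicity classifier.
    return 0.0

def detect_language(text: str) -> str:
    # Placeholder for a real language-detection model.
    return "en"

def translate_to_english(text: str) -> str:
    # Placeholder for a real translation model.
    return text

def plan_and_reply(text: str) -> str:
    # Placeholder for the reasoning/planning LLM (e.g., GPT-4o).
    return f"Planned response to: {text}"

def handle_message(message: str) -> str:
    # Safety models gate the request before any planning happens.
    if toxicity_score(message) > 0.9:
        return "This message can't be processed."
    # Translate non-English input so downstream models see one language.
    if detect_language(message) != "en":
        message = translate_to_english(message)
    # The planner composes the final reply (plugin calls elided here).
    return plan_and_reply(message)

print(handle_message("How do I request access to Tableau?"))
```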

The Moveworks Copilot is continuously improving through regular updates not just to ML models, but also to user experience, architecture, and infrastructure. Continuous improvement happens mainly via two pathways:

  1. Review of subjective feedback: Moveworks annotation teams and ML engineers regularly review masked and anonymized usage data to identify themes for improvement. In addition, user feedback from MS Teams or Slack, as well as customer feedback from support tickets and the Moveworks Community, is used to identify improvements to specific use cases.
  2. Targeted improvements to key metrics: We closely monitor metrics such as latency, error rates, and response rates, and prioritize investments to improve the platform across all use cases.

The Moveworks Copilot is also capable of temporarily learning from user feedback and applying that feedback immediately—but only for that one user, and only for a limited number of turns of conversation.

Why does the Moveworks Copilot seem to understand my preferences at first, but later it seems like it's forgotten them?

When you interact with the Moveworks Copilot, it appears to learn from your inputs immediately because it uses context from your conversation to inform its responses. Conversation history context includes:

  • Messages from you
  • Messages from the Moveworks Copilot
  • Results returned by plugin calls made by the Moveworks Copilot on your behalf
    • Examples include articles retrieved by searches, information about coworkers from people lookups, and whether a request for software access made on your behalf succeeded or failed and why.
  • Feedback text you submit under “Additional Feedback” after rating a Copilot response with 👍🏼/👎🏼

All this information is part of the input context the Moveworks Copilot uses to generate appropriate and relevant replies. The Moveworks Copilot is capable of following instructions and feedback from across this input context, thanks to the “in-context learning” capabilities of its models.
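As a rough illustration of what “in-context learning” means in practice, here is a minimal sketch in which conversation history is assembled into the model's prompt rather than trained into its weights. The `Turn` structure and `build_prompt` function are hypothetical, not the Moveworks implementation:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str      # "user", "assistant", "plugin_result", or "feedback"
    content: str

def build_prompt(history: list[Turn], new_message: str) -> str:
    # Every prior turn is serialized into the prompt; nothing is trained.
    lines = [f"{turn.role}: {turn.content}" for turn in history]
    lines.append(f"user: {new_message}")
    lines.append("assistant:")
    return "\n".join(lines)

history = [
    Turn("user", "Keep answers to two sentences, please."),
    Turn("assistant", "Understood, I'll keep replies brief."),
]
print(build_prompt(history, "How do I reset my VPN password?"))
```

Because the earlier preference ("keep answers to two sentences") is inside the prompt, the model can follow it; once it falls out of the prompt, the model has no record of it.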

However, there are limitations to how much information can be included in this input context:

  1. Context Length Limits: Most models (including LLMs) have limits on how much text they can be provided as input. If the conversation becomes too long, older messages may be omitted to make room for newer ones.
  2. Performance and Latency: Including too much information can slow down the Moveworks Copilot's responses. Large input sizes require more computational resources, leading to longer wait times.
  3. Quality of Responses: Including too much irrelevant information in the input to a model can distract the model from the parts of the input text it should be attending to, decreasing the quality or accuracy of its outputs.

To maintain optimal performance and ensure high-quality interactions, Moveworks Copilot curates and filters the conversation history it uses. This means that earlier preferences or details you shared might not always be included in later interactions if they've fallen outside the input context limit.
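The sketch below illustrates the kind of trimming described above, under the simplifying assumption that a word count stands in for a real model-specific tokenizer; `trim_history` and its token budget are illustrative, not the actual Moveworks logic:

```python
def trim_history(history: list[str], max_tokens: int = 200) -> list[str]:
    """Keep the newest messages that fit in the budget; drop the rest."""
    kept: list[str] = []
    used = 0
    # Walk from the newest message backwards so recent turns survive.
    for message in reversed(history):
        cost = len(message.split())  # word count as a stand-in tokenizer
        if used + cost > max_tokens:
            break  # this message and everything older is omitted
        kept.append(message)
        used += cost
    return list(reversed(kept))
```

Walking from the newest turn backwards is exactly why earlier preferences are the first to fall outside the input context.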

What This Means for You

  • If the Moveworks Copilot seems to "forget" a preference, it might be because that information is no longer within the input context it's currently processing.
  • Restating important preferences or information in your ongoing conversation can help the Moveworks Copilot better assist you.

We're Continuously Improving

We understand that having to repeat information can be inconvenient. Balancing the Moveworks Copilot's ability to remember details with performance and privacy considerations is an active area of development. We're working on enhancements to provide a more seamless experience while respecting context limits and ensuring your data remains secure.

Why can’t Moveworks Copilot learn from my conversations in real time?

Modern enterprise AI assistants don't “learn” in the way that many people imagine. Saying that an assistant “learns” is often misunderstood to mean that its models continually update their own parameters over time. In reality, most AI assistants (including ChatGPT, Microsoft Copilot, and the Moveworks Copilot) do not perform live, continual training of their LLMs during interactions.

There are several reasons not to do live, continual training of models on user interactions and feedback:

  1. Privacy and Confidentiality Risks: Live training on user interactions runs a risk of leaking sensitive or confidential information—such as personal information, proprietary business information, or trade secrets—into the model’s knowledge.
    1. More detail: If a model is trained on sensitive data inadvertently, it could lead to data leaks from one user to another—including between employees of the same Moveworks customer—or non-compliance with regulations like GDPR or HIPAA.
    2. Example: say a user asks for a summary of a company document, thinks the response is bad, and gives a 👎🏼. They then say “try again”, get a better summary, and give a 👍🏼. That user might expect Moveworks Copilot models to immediately learn from that feedback. But what if the summary contained information the user did not realize was private to their team, client, department, or organization? If the preferred summary were immediately trained into a model, that information would be at risk. Even if the user is only trying to teach it “how to handle data like this”, there is a risk of the model internalizing the contents of the data as well. This is too risky to leave to automation alone, particularly automation that would update model parameters in real time.
  2. Difficulty in Retracting Learned Information: Once private or confidential data is trained into a model, it becomes challenging* to have the model "unlearn" that information.
    1. More detail: The industry-standard way to retract learned data is to delete the model entirely and start over from a checkpoint saved before the sensitive example was trained into its weights. Until the model is retrained without the retracted data, we would face a choice between two bad options: deploy the older checkpoint immediately and regress the quality of the production model’s behavior, or keep the current model live and risk exposing the data that was requested to be retracted. To avoid this scenario, it is far better practice to have human review processes for training-data curation to ensure that the wrong data does not get trained into models in the first place.
  3. Increased Security Vulnerability to Data Poisoning: Live training on user inputs increases the attack surface for data-poisoning attacks. It is far easier for bad actors to manipulate the behavior of models when there are no humans in the loop curating and validating training datasets.
  4. Quality Control Issues: Without proper oversight, live training may incorporate erroneous, biased, or inappropriate content from user inputs. This can degrade the AI assistant's performance, leading to incorrect or unreliable responses that diminish user trust.
  5. Regulatory Compliance Challenges: Many of our customer agreements are subject to strict regulations regarding data handling and retention. Live training can conflict with legal requirements, such as GDPR’s “right to be forgotten”, resulting in compliance violations and potential legal repercussions.
  6. Model Stability and Consistency: Continual, automatic updates to the AI model can introduce unexpected changes in behavior. It is better practice to leave a production model alone until extensive evaluation verifies that a further-trained version is ready to deploy and that any notable changes are known and can be communicated to stakeholders.
  7. Compute Constraints: Live training demands significant computational resources that would be cost-prohibitive for any vendor.

*Footnote on why unlearning is challenging: the field of machine unlearning, which develops methods to efficiently remove specific data from trained models, is an open research area with no definitive solutions yet (see Cao & Yang, 2015; Bourtoule et al., 2019). This makes it difficult to guarantee that sensitive information can be effectively and reliably erased once learned, short of throwing out the model entirely.

How does the Moveworks Copilot learn and persist preferences in the long-term?

There are no automatic long-term changes in behavior that occur solely from providing feedback or from interactions between the user and the Copilot. This means that the Copilot does not automatically adapt its behavior based on previous examples of similar interactions outside of a given user’s recent conversation history.

More generally, we do not have an automated feedback loop in the Moveworks Copilot for long-term learning because of the challenges of doing real-time learning safely. However, we do review randomly sampled usage data to identify key patterns and make continuous improvements for high-frequency issues.

Is Moveworks experimenting with safe methods for automated, real-time, long-term learning from user feedback?

Yes! If you’ve tried ChatGPT’s “Memories” feature, Moveworks is designing something similar for the Moveworks Copilot. Rather than training on user preferences, the approach under development stores user-specific learning in a data store: relevant user-preference text is added to the input context of that user’s future Copilot interactions.
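As a rough sketch of that pattern (all names hypothetical, not the Moveworks implementation), preferences would be written to a per-user store and prepended to the input context of later conversations, with no model weights ever updated:

```python
preference_store: dict[str, list[str]] = {}

def remember(user_id: str, preference: str) -> None:
    # Persist the preference outside the model; no weights change.
    preference_store.setdefault(user_id, []).append(preference)

def build_context(user_id: str, conversation: str) -> str:
    # Prepend saved preferences to the input context for this user.
    prefs = preference_store.get(user_id, [])
    header = "\n".join(f"User preference: {p}" for p in prefs)
    return f"{header}\n{conversation}" if header else conversation

remember("u123", "Reply in French whenever possible.")
print(build_context("u123", "user: What is the office Wi-Fi name?"))
```

Because the preference lives in a data store rather than in model weights, it can be deleted on request without retraining, sidestepping the unlearning problem described earlier.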