How Moveworks selects LLMs (and SLMs!)
An autonomous and intelligent employee support system needs to solve a variety of problems to deliver impact. The Moveworks AI Assistant brings together a collection of diverse large and small machine learning models - each carefully evaluated and selected for the specific problem it's meant to solve. Some examples of the models we use for these tasks are:
- Foundation Models - GPT-4o / GPT-4o Mini
  - Reasoning and action planning based on the user's utterance
  - Executing “plugins” to fulfill the user's query
  - Summarizing results for the user
- Task-specific discriminative models
  - FT-LangDetection - low-latency detection of the user's language
  - FT-Roberta - handoff classification and entity recognition
- Task-specific generative models
  - FT-Flan-T5 - toxicity judgement for both user input and bot output
  - FT-Flan-T5 - file search relevance judgement
  - FT-M2M100 - translation of resources (KBs, forms, etc.)
And that's just the tip of the iceberg.
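To make the division of labor concrete, here is a minimal sketch of how small task-specific models and a foundation model could be composed in a single request flow. The class, the function names, and the routing logic are illustrative stand-ins under our assumptions, not Moveworks' actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AssistantPipeline:
    """Illustrative composition of task-specific models around a foundation model."""
    detect_language: Callable[[str], str]     # e.g. FT-LangDetection
    judge_toxicity: Callable[[str], bool]     # e.g. FT-Flan-T5 toxicity judgement
    classify_handoff: Callable[[str], bool]   # e.g. FT-Roberta handoff classification
    plan_and_execute: Callable[[str], str]    # e.g. GPT-4o reasoning + plugin execution
    summarize: Callable[[str], str]           # e.g. GPT-4o / GPT-4o Mini summarization

    def handle(self, utterance: str) -> str:
        # Small, low-latency models gate the request before any foundation model call.
        if self.judge_toxicity(utterance):
            return "Sorry, I can't help with that."
        if self.classify_handoff(utterance):
            return "Connecting you with a live agent."
        language = self.detect_language(utterance)
        result = self.plan_and_execute(utterance)
        summary = self.summarize(result)
        # A translation model (e.g. FT-M2M100) would localize the reply here.
        return f"[{language}] {summary}"


# Toy stand-ins so the sketch runs end to end.
pipeline = AssistantPipeline(
    detect_language=lambda text: "en",
    judge_toxicity=lambda text: False,
    classify_handoff=lambda text: "agent" in text.lower(),
    plan_and_execute=lambda text: "Password reset link sent to your inbox.",
    summarize=lambda text: text,
)
print(pipeline.handle("I forgot my password"))
```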
Foundation models vs. task-specific models
Foundation Models have revolutionized the field of AI over the last few years. They are
- The largest models in the world by training data and parameter count
- Generalized to perform any language task, from coding and reasoning to summarization
As a result, they are extremely capable, but they come with higher latency and cost, and they are not always the most reliable choice for narrow tasks that demand high precision.
Task-specific models, on the other hand, are:
- Smaller models that are cheaper, faster, and more controllable
- Trained to perform specific language tasks like entity recognition, coding, etc.
Task-specific models cover only a subset of what foundation models can do, but often with higher quality, lower latency, or lower cost. Therefore, you can see why you might want to pick the most appropriate model for the task at hand.
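One way to frame that choice is as a constrained optimization: among the models that clear a quality bar for a given task, pick the cheapest (then fastest) one. The sketch below captures that idea; the `ModelProfile` fields and all the numbers are hypothetical placeholders, not measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    quality: float      # task-specific eval score in [0, 1]
    latency_ms: float   # p95 latency
    cost_per_1k: float  # dollars per 1k requests


def pick_model(candidates: list[ModelProfile], min_quality: float) -> ModelProfile:
    """Pick the cheapest (then fastest) model that clears the quality bar for a task."""
    eligible = [m for m in candidates if m.quality >= min_quality]
    if not eligible:
        raise ValueError("No candidate meets the quality bar for this task.")
    return min(eligible, key=lambda m: (m.cost_per_1k, m.latency_ms))


# Hypothetical numbers: for a narrow task like language detection, a small
# fine-tuned model clears the bar at a fraction of the latency and cost.
candidates = [
    ModelProfile("gpt-4o", quality=0.99, latency_ms=900, cost_per_1k=5.00),
    ModelProfile("ft-langdetection", quality=0.98, latency_ms=15, cost_per_1k=0.02),
]
print(pick_model(candidates, min_quality=0.97).name)  # -> ft-langdetection
```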
Evaluating and selecting models
Moveworks relies on a robust and rigorous evaluation process for models to drive continuous innovation.
- Leverage comprehensive Evaluation Datasets
  - Curated to cover a broad range of AI Assistant use cases.
  - Ensures the model's effectiveness across expected scenarios and identifies areas for improvement.
- Diverse types of evaluations that test performance from multiple angles to make sure it is constantly improving
  - End-to-End Evaluation: Tests the overall experience from start to finish.
  - Component Evaluation: Focuses on specific parts like plugin filtering, selection, and argument filling (a minimal harness is sketched after this list).
  - Human Annotator Evaluation: Involves human annotators reviewing outputs and interactions, providing nuanced insights and greater confidence in evaluation results.
- Prompt Tuning for LLM optimization
  - Employed to address any degradations observed in evaluation and to keep improving the AI Assistant experience.
  - Our infrastructure allows for extensive prompt-tuning experiments to refine and enhance interaction quality.
    - In-bot Testing with New Prompts: Real-time feedback on adjusted prompts.
    - Large-Volume Evaluation with Comprehensive Datasets: Robust validation across a wide array of scenarios, affirming improvements and pinpointing further optimization opportunities.
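As a rough illustration of what a component evaluation combined with prompt-variant comparison could look like, here is a minimal harness. The dataset contents, the `select_plugin` signature, and the toy implementation are assumptions made for the sketch, not Moveworks' actual tooling.

```python
from statistics import mean
from typing import Callable

EvalExample = tuple[str, str]  # (user utterance, expected plugin)


def evaluate_plugin_selection(
    select_plugin: Callable[[str, str], str],  # (prompt_template, utterance) -> plugin name
    prompt_template: str,
    dataset: list[EvalExample],
) -> float:
    """Plugin-selection accuracy for one prompt variant over an evaluation dataset."""
    return mean(select_plugin(prompt_template, utterance) == expected
                for utterance, expected in dataset)


def pick_best_prompt(select_plugin: Callable[[str, str], str],
                     variants: list[str],
                     dataset: list[EvalExample]) -> str:
    """Score each prompt variant on the full dataset and keep the best one."""
    scores = {v: evaluate_plugin_selection(select_plugin, v, dataset) for v in variants}
    return max(scores, key=scores.get)


# Toy stand-in so the sketch runs: the prompt is ignored and selection is a keyword match.
def toy_select_plugin(prompt_template: str, utterance: str) -> str:
    return "reset_password" if "password" in utterance.lower() else "search_kb"


dataset = [
    ("I forgot my password", "reset_password"),
    ("How do I request a new laptop?", "search_kb"),
]
print(pick_best_prompt(toy_select_plugin, ["variant_a", "variant_b"], dataset))
```

A variant that wins here would then go through in-bot testing for real-time feedback before rollout, mirroring the two evaluation stages listed above.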