Triage

📘

This applies to Moveworks AI Assistant and Moveworks Classic.

Triage is a backend service that works independently of the frontend experience.

Overview

For issues that cannot be resolved instantly, Moveworks Triage categorizes and routes tickets so that they reach the service desk team best equipped to resolve them. Triage intercepts newly created tickets that pass a filter configured by the customer. It then attempts to classify the configured ticket fields (for example, assignment group, category, subcategory, or cmdb_ci) by retrieving the 10 most similar historical tickets and selecting the most common field value (for example, assignment group = WW Manufacturing). If the prediction confidence meets the threshold, Triage automatically updates the ticket, all within 1 minute of the ticket being created!

By doing this, Triage improves the rate at which tickets are resolved: categorization becomes autonomous, and fewer tickets are categorized incorrectly.

How does Triage work?

Intercept

Triage intercepts and analyzes new tickets coming into your ticketing system. Triage is configured to intercept tickets that meet specific filter criteria, e.g. the initial assignment group. Once a newly created ticket passes the criteria, Triage analyzes it and attempts to predict the ticket fields.

📘

Triage works best when intercepted tickets are created from digital channels

Note: Triage only intercepts tickets coming from the self service portal, email, or the AI Assistant. Phone or walk-up tickets are not eligible for Triage.
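As a rough illustration of the intercept step, the filter can be thought of as a simple yes/no check on each new ticket. The sketch below is not the Moveworks implementation: the `Ticket` class, its field names, the channel labels, and `should_intercept` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Channels the text above lists as eligible (labels are illustrative).
ELIGIBLE_CHANNELS = {"self_service_portal", "email", "ai_assistant"}


@dataclass
class Ticket:
    ticket_id: str
    channel: str           # hypothetical field: where the ticket came from
    assignment_group: str  # initial assignment group at creation time


def should_intercept(ticket: Ticket, initial_groups: set[str]) -> bool:
    """Return True when a ticket passes this hypothetical Triage filter."""
    from_digital_channel = ticket.channel in ELIGIBLE_CHANNELS
    in_filtered_group = ticket.assignment_group in initial_groups
    return from_digital_channel and in_filtered_group


ticket = Ticket("INC12345", "email", "Service Desk")
print(should_intercept(ticket, {"Service Desk"}))  # True
```

A phone or walk-up ticket would fail the channel check, and a ticket created in a group outside the configured filter would fail the group check.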

Predict

Triage uses a customer-scoped, ticket-field-specific index to analyze the ticket data and determine whether it can make a prediction. Triage's confidence must equal or exceed the configured confidence threshold for it to make changes to the ticket. For example, if you’ve set the confidence threshold to 67%, Triage only updates the ticket if it is at least 67% confident in its prediction.

If Triage is not confident enough in its predictions, it leaves the ticket as is and lets the service desk agent update the ticket fields.
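The threshold check can be pictured as follows. This is a minimal sketch, assuming that each field carries its own confidence score and is checked independently; the function name `fields_to_update` and the sample values are hypothetical.

```python
def fields_to_update(
    predictions: dict[str, tuple[str, float]], threshold: float
) -> dict[str, str]:
    """Keep only predicted values whose confidence meets the threshold."""
    return {
        field: value
        for field, (value, confidence) in predictions.items()
        if confidence >= threshold  # "equal or exceed", as described above
    }


# Hypothetical predictions: field -> (predicted value, confidence)
predictions = {
    "assignment_group": ("WW Manufacturing", 0.80),
    "category": ("Hardware", 0.67),
    "subcategory": ("Laptop", 0.50),
}
print(fields_to_update(predictions, threshold=0.67))
# {'assignment_group': 'WW Manufacturing', 'category': 'Hardware'}
```

In this example the subcategory prediction falls below 67%, so that field is left for the service desk agent to fill in.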

Read more about the technology: https://help.moveworks.com/docs/triage-2-pt-0-technology-overview#/

Update

If Triage is confident in its predictions, then it will automatically update each of the ticket fields it has been trained to predict. If Triage has been trained on assignment group and category models, meaning it has learned from two different datasets, then the model will be able to update both of those fields simultaneously.

The most common fields Triage can predict are:

  • assignment group
  • category
  • subcategory
  • cmdb_ci
  • component (Jira)
  • hr service

How does Triage make predictions?

To make predictions, Triage requires historical ticketing data, including all fields with populated values. With this data, Moveworks builds a customer-owned, ticket-field-specific index.

🚧

Good Data In = Great Data Out

Note: Because Triage relies on the data its model is trained on, it’s important to provide the best data available to build an index.

Building an Index

Historical tickets are stored as embeddings in an index. When a newly created ticket is analyzed, Triage converts it to an embedding and searches the index for the most relevant historical tickets using Embedding Nearest Neighbor Search.

Embedding Nearest Neighbor Search

Embedding Nearest Neighbor Search first converts queries and documents into dense vector embeddings that capture their semantic meaning. It then measures the cosine similarity between the query embedding and each document embedding to identify the closest matches. In Triage, we establish a threshold so that only tickets whose cosine similarity exceeds a minimum value are considered. This approach enables Triage to surface semantically relevant tickets even when there are no shared keywords, and to select the right ticket field value, e.g. assignment group = WW Manufacturing.
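To make the search concrete, here is a small sketch of nearest neighbor search followed by a majority vote over the closest tickets. It is illustrative only: the toy two-dimensional vectors stand in for real embeddings, the names `cosine_similarity` and `predict_field` are hypothetical, the 0.5 similarity cutoff is an arbitrary example value, and treating the vote share as the prediction confidence is an assumption rather than a documented formula.

```python
import math
from collections import Counter
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def predict_field(
    query: list[float],
    index: list[tuple[list[float], str]],
    k: int = 10,
    min_similarity: float = 0.5,
) -> Optional[tuple[str, float]]:
    """Return (most common value, vote share) among the top-k neighbors."""
    scored = [(cosine_similarity(query, emb), value) for emb, value in index]
    # Only tickets above the similarity cutoff are considered.
    eligible = [pair for pair in scored if pair[0] > min_similarity]
    neighbors = sorted(eligible, key=lambda pair: pair[0], reverse=True)[:k]
    if not neighbors:
        return None  # nothing similar enough, so no prediction
    votes = Counter(value for _, value in neighbors)
    value, count = votes.most_common(1)[0]
    return value, count / len(neighbors)


# Toy index: (embedding, assignment group of the historical ticket)
index = [
    ([1.0, 0.0], "WW Manufacturing"),
    ([0.9, 0.1], "WW Manufacturing"),
    ([0.8, 0.6], "VPN Helpdesk"),
    ([0.0, 1.0], "VPN Helpdesk"),
]
print(predict_field([1.0, 0.1], index))
```

Here three historical tickets clear the similarity cutoff, and two of the three belong to WW Manufacturing, so that value wins the vote with a share of about 0.67.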

Triage Performance

After the Triage model has been trained, it’s important to establish a framework for monitoring performance. The evaluation of Triage's performance involves the assessment of two primary metrics:

Coverage

In essence, coverage refers to the proportion of tickets on which Triage has made predictions compared to the total number of tickets received.

For instance, consider Company.com, which received 1000 tickets in the previous month. Out of these 1000 tickets, Triage made predictions for 300. This means that Triage covered 300 out of 1000 tickets, a coverage rate of 30% for Company.com's tickets.

Mathematically, coverage (C) can be calculated as:

C = Number of tickets covered / Total number of tickets = 300 / 1000 = 0.30

It is important to note that at this point, we have solely focused on Triage's prediction coverage and have not yet evaluated its accuracy.

Precision

Precision refers to the percentage of correct predictions made by Triage among all the tickets on which predictions were made.

How do we define "correct" or "incorrect" predictions?

To establish the criteria for determining whether a prediction is correct or incorrect, we consider the following:

  • A prediction is considered correct if Triage predicts that the assignment group for a ticket, such as INC12345, is "VPN Helpdesk," and the ticket is indeed closed or resolved within the "VPN Helpdesk" group. In this case, our prediction is accurate.
    • Prediction: "VPN Helpdesk"; ticket closed/resolved in "VPN Helpdesk"
  • On the other hand, a prediction is considered incorrect if Triage predicts that the assignment group for a ticket, such as INC12345, is "VPN Helpdesk," but the ticket is actually closed or resolved within the "SAP Helpdesk" group. In this scenario, our prediction is deemed inaccurate.
    • Prediction: "VPN Helpdesk"; ticket closed/resolved in "SAP Helpdesk" (≠ "VPN Helpdesk")

Returning to the Company.com example: of the 1000 tickets received in the past month, Triage made predictions for 300, so it "covered" 300 out of 1000 tickets, a coverage rate of 30%.

Furthermore, out of the 300 tickets covered, Triage's predictions were correct 150 times, indicating a precision rate of 50%.

P = 150/300 = 0.50
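Both metrics can be computed from a simple list of ticket outcomes. The sketch below reproduces the Company.com numbers; the function name `triage_metrics` and the record layout (predicted group, group where the ticket was resolved) are hypothetical and chosen only for this example.

```python
from typing import Optional


def triage_metrics(
    tickets: list[tuple[Optional[str], str]]
) -> tuple[float, float]:
    """Return (coverage, precision) for (predicted, resolved_in) records.

    A predicted value of None means Triage made no prediction.
    """
    total = len(tickets)
    covered = [(pred, actual) for pred, actual in tickets if pred is not None]
    correct = sum(1 for pred, actual in covered if pred == actual)
    coverage = len(covered) / total if total else 0.0
    precision = correct / len(covered) if covered else 0.0
    return coverage, precision


# 1000 tickets: 150 correct predictions, 150 incorrect, 700 not predicted.
tickets = (
    [("VPN Helpdesk", "VPN Helpdesk")] * 150
    + [("VPN Helpdesk", "SAP Helpdesk")] * 150
    + [(None, "SAP Helpdesk")] * 700
)
print(triage_metrics(tickets))  # (0.3, 0.5)
```

Note that precision is measured only over the 300 covered tickets, not over all 1000.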

How to enable Triage?

While Moveworks Triage offers powerful capabilities, it is not a one-size-fits-all solution. To determine whether your organization’s system is eligible for Triage, please reach out to your Customer Success team. They will collect the information needed to assess whether this skill is suitable for your specific environment and needs.

Once the relevant information is collected, Moveworks Professional Services will ingest your ticketing data to build indexes tailored to your specific requirements. This build process ensures that Triage is optimized to provide accurate predictions and effective ticket categorization based on the unique characteristics and patterns within your organization’s ticketing system.

By following this careful evaluation and training process, Moveworks ensures that Triage is deployed appropriately and delivers the desired benefits.

Configuration options

Confidence Threshold

Adjusting the Confidence Threshold allows for customizing the level of certainty required for Triage to predict and route tickets. However, it's important to consider the tradeoff between coverage and precision when modifying this threshold.

Lowering the Confidence Threshold increases coverage by enabling the bot to make predictions and route a larger number of tickets. However, this can potentially decrease the accuracy of the bot's predictions, leading to more incorrect assignments.

On the other hand, raising the Confidence Threshold improves the precision of the bot's predictions. By setting a higher threshold, the bot only predicts when it is more confident, resulting in more accurate routing and reduced misclassification. However, this may reduce coverage, as fewer tickets meet the threshold for automatic routing.
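The tradeoff can be seen by replaying a set of past predictions at different thresholds. The following sketch uses made-up confidence scores and outcomes purely for illustration; `sweep` is a hypothetical helper, not a Moveworks tool.

```python
def sweep(
    scored: list[tuple[float, bool]], thresholds: list[float]
) -> list[tuple[float, float, float]]:
    """Return (threshold, coverage, precision) rows for each threshold.

    scored holds (confidence, was_correct) for each past prediction.
    """
    rows = []
    total = len(scored)
    for t in thresholds:
        acted = [ok for conf, ok in scored if conf >= t]
        coverage = len(acted) / total if total else 0.0
        precision = sum(acted) / len(acted) if acted else 0.0
        rows.append((t, coverage, precision))
    return rows


# Made-up data: (confidence, whether the prediction turned out correct)
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.65, True),
    (0.55, False), (0.45, False), (0.40, True), (0.30, False), (0.20, False),
]
for threshold, coverage, precision in sweep(scored, [0.50, 0.67, 0.90]):
    print(f"threshold={threshold:.2f} coverage={coverage:.2f} precision={precision:.2f}")
```

With this toy data, raising the threshold from 0.50 to 0.90 drops coverage from 0.60 to 0.20 while precision rises from about 0.67 to 1.00. Real data will not always move this cleanly, but the general direction is the same.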

Blocklist and Allowlist

Moveworks provides the flexibility to implement field blocklists and allowlists, enabling selective prediction capabilities for Triage.

Blocklist

The blocklist feature allows the specification of certain fields for which Triage will abstain from making predictions. By including fields in the blocklist, organizations can define which fields should not be subject to prediction by the bot.

Allowlist

Conversely, the allowlist feature allows organizations to identify a specific set of fields for which Triage is exclusively permitted to make predictions. Any field not included in the allowlist will be restricted from receiving predictions.
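The effect of the two lists can be sketched as a filter over the fields Triage would otherwise predict. This is an assumption-laden illustration: `predictable_fields` is a hypothetical name, and the behavior when both lists are supplied at once (apply both) is a choice made for this example, not documented behavior.

```python
from typing import Optional


def predictable_fields(
    candidate_fields: list[str],
    allowlist: Optional[set[str]] = None,
    blocklist: Optional[set[str]] = None,
) -> list[str]:
    """Return the fields that remain eligible for prediction."""
    eligible = []
    for field in candidate_fields:
        if allowlist is not None and field not in allowlist:
            continue  # not explicitly allowed
        if blocklist is not None and field in blocklist:
            continue  # explicitly blocked
        eligible.append(field)
    return eligible


fields = ["assignment_group", "category", "cmdb_ci"]
print(predictable_fields(fields, blocklist={"cmdb_ci"}))
# ['assignment_group', 'category']
print(predictable_fields(fields, allowlist={"category"}))
# ['category']
```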

AutoML

AutoML is an optional configuration step in the process that enables automated retrains every month.

Automatic Retrains

At the beginning of each month, AutoML checks whether the minimum ticket volume threshold of at least 5,000 polled tickets (i.e. tickets that pass the Triage ticket filter) has been met since the launch of Triage.

If so, AutoML aggregates all polled tickets that pass the Triage ticket filter, up to the last 12 months, and trains a new model. To validate the new model, AutoML samples 10% of the polled tickets and compares the predictions of the current and new models in a "side-by-side" evaluation.

If there is an improvement in key Triage metrics, e.g. Precision and Coverage, the new model is promoted into production. This entire process is done without customer intervention.
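The monthly decision flow can be outlined roughly as below. The 5,000-ticket minimum and the 10% sample come from the description above; everything else is hypothetical, including the function names and, in particular, the promotion rule (no metric gets worse and at least one improves), which is only one plausible reading of "an improvement in key Triage metrics".

```python
import random

MIN_POLLED_TICKETS = 5_000  # minimum volume stated above
EVAL_SAMPLE_DIVISOR = 10    # evaluate on 10% of polled tickets


def has_enough_volume(polled_count: int) -> bool:
    """Check the minimum ticket volume threshold."""
    return polled_count >= MIN_POLLED_TICKETS


def sample_for_evaluation(tickets: list, seed: int = 0) -> list:
    """Draw a 10% random sample for the side-by-side evaluation."""
    size = len(tickets) // EVAL_SAMPLE_DIVISOR
    return random.Random(seed).sample(tickets, size)


def should_promote(current: dict[str, float], new: dict[str, float]) -> bool:
    """Assumed rule: promote if nothing regresses and something improves."""
    metrics = ("precision", "coverage")
    no_regression = all(new[m] >= current[m] for m in metrics)
    some_gain = any(new[m] > current[m] for m in metrics)
    return no_regression and some_gain


polled = list(range(6_000))  # stand-in for polled ticket records
if has_enough_volume(len(polled)):
    sample = sample_for_evaluation(polled)
    print(len(sample))  # 600
    print(should_promote(
        {"precision": 0.50, "coverage": 0.30},
        {"precision": 0.55, "coverage": 0.30},
    ))  # True
```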

Manual Retrain

For organizations whose ticket volume falls below the minimum ticket volume threshold, Moveworks can manually complete the retrain described above. Given the work required, Moveworks only recommends doing this 6 months after launch.

Metadata Cache Service

The Metadata Cache Service is a crucial component that prevents tickets from being updated with retired or outdated sys_ids and ensures that the same sys_id is used even when a display name changes. At its core, it's a key-value lookup store, powered by DynamoDB, designed to quickly retrieve essential information. This is only configurable for ServiceNow.

What Information Does it Store?

This service stores vital metadata for various fields, including:

  • Field name: Such as assignment_group or cmdb_ci.
  • System ID (sys_id): A unique identifier of the record.
  • Display name: The user-friendly name associated with the sys_id, e.g. the display value in ServiceNow.

The cache can hold a maximum of 1 million records. By default, it retrieves data in batches of 100,000 entries, a setting that can be adjusted if needed. The data within the cache is refreshed once every day to ensure it's up-to-date.

Understanding Potential Issues

The Metadata Cache Service is designed to emit errors when certain criteria are met:

  • Exceeding Record Limit: If the number of records being pulled into the cache goes above 1 million, the data ingestion process will fail, and you'll receive a notification.
  • Missing System ID: If a sys_id does not exist in the cache when the Triage system attempts to update a ticket, the ticket will not be updated, and an error will be logged.
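The two error conditions above can be mimicked with a small in-memory stand-in. This sketch uses a plain Python dictionary in place of DynamoDB, and the class, method, and exception names are hypothetical; it only illustrates the lookup and the two failure cases, not the actual service.

```python
import logging

logger = logging.getLogger("metadata_cache_sketch")


class RecordLimitExceeded(Exception):
    """Raised when a refresh would exceed the cache's record limit."""


class MetadataCache:
    def __init__(self, max_records: int = 1_000_000) -> None:
        self.max_records = max_records
        self._display_names: dict[tuple[str, str], str] = {}

    def refresh(self, records: list[tuple[str, str, str]]) -> None:
        """Replace the cache with (field_name, sys_id, display_name) rows."""
        if len(records) > self.max_records:
            raise RecordLimitExceeded(
                f"{len(records)} records exceeds limit of {self.max_records}"
            )
        self._display_names = {
            (field, sys_id): name for field, sys_id, name in records
        }

    def can_update(self, field: str, sys_id: str) -> bool:
        """Return False and log an error when the sys_id is not cached."""
        if (field, sys_id) not in self._display_names:
            logger.error("sys_id %s not found for field %s", sys_id, field)
            return False
        return True


cache = MetadataCache(max_records=2)  # tiny limit for demonstration
cache.refresh([("assignment_group", "abc123", "VPN Helpdesk")])
print(cache.can_update("assignment_group", "abc123"))   # True
print(cache.can_update("assignment_group", "retired"))  # False
```

Keying the lookup on the field name and sys_id, rather than on the display name, is what lets a renamed group keep resolving to the same record.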

FAQ

Q: Which ITSM does Triage support?

A: Triage is supported by all ITSM integrations. See our integration documentation for the full list of supported ITSM integrations.

Q: What does Triage use to make predictions?

A: By default, Triage uses description and short description as the inputs for building the index and making predictions. Based on your organization's business processes, additional metadata fields such as location can also be taken into account.

Q: How long does it take for Triage to route tickets?

A: Triage routes tickets between 30 seconds and 1 minute after they are created. Triage can also be configured to route the ticket at creation time, so that the ticket has the correct values from the moment it is created.

Q: How many tickets does it take to train a Triage model?

A: It depends on the ticket distribution. A model can be trained on as few as 1,000 tickets, although this is not recommended; the recommended minimum ticket volume is 2,000.

Q: How does the model “learn”?

A: Triage updates the index for continuous learning and improvement. To continue enhancing its performance, it's important for Service Desk Agents to reroute any incorrectly predicted tickets. When Triage makes incorrect predictions and tickets are rerouted by agents, this feedback is automatically collected and used for future retraining. Retraining the model with new data helps it "relearn" and improve its accuracy for future predictions.