How to build an integration with the API

Overview

Data APIs are a set of APIs exposed by Moveworks, allowing you to export all data points related to Moveworks AI Assistant analytics. These APIs can be consumed to build integrations with data lakes like Snowflake, SQL Server, and others, enabling the ingested data to be used for custom reporting and analytics.

This document outlines a process to develop an integration pipeline using a set of tools that can be customized or changed to fit your tech stack. In this exercise, we have used Snowflake to store the data, Workato to build ingestion pipelines & PowerBI to build custom dashboards.

Prerequisites

Please ensure you have access according to the checklist below and are ready to set up the API integration:

  1. Access to data lake: Ensure you have a role that allows the creation of tables and upsert of data into tables.
  2. Automation capabilities: Ensure you have the ability to create workflows within an automation tool or enable REST API calls through a script.
  3. BI tools permissions: Ensure you have the correct permissions to connect BI tools with your data lake and build metrics on top of the ingested data.


Building tables in the data lake

Each API endpoint provides data for a base object such as conversations, interactions, plugin calls, plugin resources, and users. These objects also need to be replicated in your data lake. For this exercise we are creating our tables in Snowflake.

Before we create a new table, we need to create a database in Snowflake and a schema that allows us to store the different tables:

  1. We have created a new database called Data_API
  2. We have created a PROD schema

Navigate to Databases and click on “+ Database” to add a new database, then select the database to add a new schema.


To create the tables, open a new worksheet and run the SQL queries below. Please ensure you replace the database name and schema name accordingly before executing them.

  1. Conversations

    create or replace TABLE DATA_API.PROD.DATA_API_COPILOT_CONVERSATIONS (
    	TS_LAST_UPDATED VARCHAR(16777216) COMMENT 'Last updated timestamp of the record',
    	CONVERSATION_ID VARCHAR(16777216) COMMENT 'Id identifying the unique conversation',
    	USER_RECORD_ID VARCHAR(16777216) COMMENT 'Unique id of the user',
    	TS_CREATED VARCHAR(16777216) COMMENT 'The timestamp of the first interaction in the conversation',
    	CONVERSATION_ORIGIN VARCHAR(16777216) COMMENT 'Route/origin of the conversation - DM, Ticket, Channel, Notification',
    	PRIMARY_DOMAIN VARCHAR(16777216) COMMENT 'The main detected domain in the conversation'
    );
    
  2. Interactions

    create or replace TABLE DATA_API.PROD.DATA_API_INTERACTIONS (
    	TS_LAST_UPDATED VARCHAR(16777216) COMMENT 'Last updated timestamp of the record',
    	CONVERSATION_ID VARCHAR(16777216) COMMENT 'Id identifying the unique conversation',
    	USER_RECORD_ID VARCHAR(16777216) COMMENT 'Unique id of the user',
    	INTERACTION_ID VARCHAR(16777216) COMMENT 'Id identifying the unique interaction',
    	TS_INTERACTION VARCHAR(16777216) COMMENT 'Timestamp of the interaction',
    	INTERACTION_PLATFORM VARCHAR(16777216) COMMENT 'Platform of the interaction - Slack/msteams etc.',
    	INTERACTION_TYPE VARCHAR(16777216) COMMENT 'Type of the interaction - INTERACTION_TYPE_UTTERANCE, INTERACTION_TYPE_BOT_MESSAGE etc.',
    	INTERACTION_LABEL VARCHAR(16777216) COMMENT 'Label for the interaction type UI form submission',
    	PARENT_INTERACTION_ID VARCHAR(16777216) COMMENT 'Parent interaction id for UI Action link click and button click interactions',
    	DATA_INTERACTION_DETAILS VARIANT COMMENT 'Details of the interaction (Json col)- content, type, domain, detail, entity etc.',
    	ACTOR VARCHAR(16777216) COMMENT 'Actor - User/Bot'
    );
    
  3. Plugin calls

    create or replace TABLE DATA_API.PROD.DATA_API_PLUGIN_CALLS (
    	TS_LAST_UPDATED VARCHAR(16777216) COMMENT 'Last updated timestamp of the record',
    	CONVERSATION_ID VARCHAR(16777216) COMMENT 'Id identifying the unique conversation related to the plugin call',
    	USER_RECORD_ID VARCHAR(16777216) COMMENT 'Unique id of the user',
    	INTERACTION_ID VARCHAR(16777216) COMMENT 'Id identifying the unique interaction related to the plugin call',
    	PLUGIN_CALL_ID VARCHAR(16777216) COMMENT 'Unique Id identifying the plugin call',
    	TS_CREATED VARCHAR(16777216) COMMENT 'Timestamp of the plugin call',
    	TS_PLUGIN_UPDATED VARCHAR(16777216) COMMENT 'Timestamp of the latest plugin call update',
    	PLUGIN_NAME VARCHAR(16777216) COMMENT 'Name of the executed Plugin - Product display name for the Native plugin and Config name for the Custom plugin',
    	PLUGIN_STATUS VARCHAR(16777216) COMMENT 'Semantic Plugin status for each of the executed Plugin call',
    	IS_PLUGIN_SERVED BOOLEAN COMMENT 'Whether or not plugin is served to the user - Plugin served or plugin waiting for user input',
    	IS_PLUGIN_USED BOOLEAN COMMENT 'Whether or not plugin is successfully executed'
    );
    
  4. Plugin resources

    create or replace TABLE DATA_API.PROD.DATA_API_PLUGIN_RESOURCES_V2 (
    	PLUGIN_RESOURCE_ID VARCHAR(16777216) COMMENT 'Plugin resource item id as the hash of system_action_id, resource_id and resource_type',
    	CONVERSATION_ID VARCHAR(16777216) COMMENT 'conversation_id identifying unique conversation (series_id)',
    	INTERACTION_ID VARCHAR(16777216) COMMENT 'interaction_id identifying interaction corresponding to the system action',
    	PLUGIN_CALL_ID VARCHAR(16777216) COMMENT 'Id for the plugin call',
    	USER_RECORD_ID VARCHAR(16777216) COMMENT 'Id for the user record',
    	RESOURCE_TYPE VARCHAR(16777216) COMMENT 'Type of Resource - FILE, FORM, KNOWLEDGE, Ticket etc.',
    	RESOURCE_ID VARCHAR(16777216) COMMENT 'Resource ID - doc_id, ticket_id, etc.',
    	DATA_RESOURCE_DETAILS VARIANT COMMENT 'Details of resources searched in the plugin - e.g., json of {name, url, domain, title, snippet}',
    	IS_CITED BOOLEAN COMMENT 'If the resource is cited in the AI Assistant response',
    	TS_LAST_UPDATED VARCHAR(16777216) COMMENT 'Last updated timestamp of the record',
    	TS_CREATED VARCHAR(16777216) COMMENT 'timestamp of plugin call'
    );
    
  5. Users

    create or replace TABLE DATA_API.PROD.DATA_API_COPILOT_USERS (
    	USER_RECORD_ID VARCHAR(16777216) COMMENT 'Unique id of the user',
    	FIRST_NAME VARCHAR(16777216) COMMENT 'First name of the user',
    	LAST_NAME VARCHAR(16777216) COMMENT 'Last name of the user',
    	EMAIL_ADDR VARCHAR(16777216) COMMENT 'Email address of the user',
    	USER_PREFERRED_LANGUAGE VARCHAR(16777216) COMMENT 'User preferred language configured in AI Assistant',
    	HAS_ACCESS_TO_BOT BOOLEAN COMMENT 'Whether or not the user has access to AI Assistant',
    	DATA_EXTERNAL_SYSTEM_ID_DETAILS VARCHAR(16777216) COMMENT 'All external system identities from which user data is ingested',
    	TS_LAST_UPDATED_AT VARCHAR(16777216) COMMENT 'Timestamp at which user was last updated in Moveworks',
    	TS_FIRST_INTERACTION_AT VARCHAR(16777216) COMMENT 'Timestamp at which the users first interaction was recorded',
    	TS_LAST_INTERACTION_AT VARCHAR(16777216) COMMENT 'Timestamp at which the users latest interaction was recorded'
    );
    

Once you have successfully executed these queries, you should see all the tables created under your schema. Congratulations 🎉 on completing the first step towards setting up the integration.
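
If you prefer to verify the setup programmatically rather than in the Snowflake UI, a minimal sketch using the snowflake-connector-python package could look like the following (the account and login values are placeholders for your own credentials):

    import snowflake.connector

    # Connect to the Data_API database and list the tables in the PROD schema.
    # The account, user, and password values below are placeholders.
    conn = snowflake.connector.connect(
        account="<your_account_identifier>",
        user="<your_user>",
        password="<your_password>",
        database="DATA_API",
        schema="PROD",
    )

    try:
        cur = conn.cursor()
        cur.execute("SHOW TABLES IN SCHEMA DATA_API.PROD")
        for row in cur.fetchall():
            print(row[1])  # the second column of SHOW TABLES output is the table name
    finally:
        conn.close()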


Building integration pipelines

Once the tables are created in your data lake, we need to upsert data into them so insights can be built on top of it. How you build the pipelines depends on your tech stack. In this exercise we have used Workato to build the data pipelines so the ingestion steps can be explained visually; you can also take the traditional route of writing code and connecting to your data lake programmatically.

In Workato, each endpoint needs a separate workflow to ingest its data. This allows you to ingest data in parallel and handle ingestion per endpoint.

Two types of ingestion jobs are required:

  1. Historical ingestion: This allows you to fetch all of the historical data the API makes available to you, up to the present date.
  2. Daily ingestion: This allows you to fetch new data on a daily basis and keep your metrics updated.

We will use Workato recipes to build these ingestion jobs. Start by setting up the historical ingestion job, which can then be replicated to create the daily ingestion job.

You can start creating a new recipe by clicking the “Create” button and providing the recipe name “Data API conversations workflow”.

Historical ingestion job

Select the trigger point as “Trigger from app” while creating the historical ingestion job.


To start the ingestion, we need to make the first API call for the data provided by the API. The data available to you will span from the current date (when you are building this integration) back to Jan 1st (data back to Aug 1st will be made available at a later point in time).

To backfill your data, begin by fetching responses from the current date and continue backwards until all available data has been obtained from the API. Note that in the OData API, the @odata.nextLink attribute will be absent from the response when there is no more data to retrieve.
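
If you are writing the ingestion in code rather than Workato, the backfill loop can be sketched as below. This is a minimal sketch that assumes bearer-token authentication and the standard OData response shape (records under "value", next page under "@odata.nextLink"); confirm both against the Data API documentation.

    import requests

    BASE_URL = "https://api.moveworks.ai/export/v1beta2/records/conversations"
    HEADERS = {"Authorization": "Bearer <your_api_token>"}  # placeholder token

    def backfill_conversations():
        """Yield one page of conversation records at a time until none remain."""
        url = BASE_URL
        while url:
            response = requests.get(url, headers=HEADERS, timeout=60)
            response.raise_for_status()
            page = response.json()
            yield page.get("value", [])        # assumed OData key for the records
            # The next-page link is absent once all available data has been retrieved.
            url = page.get("@odata.nextLink")

    for records in backfill_conversations():
        # Map and upsert each page into Snowflake (see the upsert sketch below).
        pass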

In Workato, select the HTTP app and set up the connection with the Data API. Once this is done, set up the API call so all the fields can be read through the API call.

[Screenshot: Setting up the connection with the Data API]

[Screenshot: Adding the URL to call in the HTTP app]

Once the response schema is defined and identified in Workato, we also need to set up the Snowflake connection and map the data from the API to the Snowflake tables.

  • Select the Snowflake upsert action and set up the Snowflake connection.
  • Once the connection is set up, select the appropriate table. In this case we will select the conversations table ‘DATA_API_COPILOT_CONVERSATIONS’.
  • Map each field from the API response to the table columns (a code-based equivalent is sketched after the screenshots below).
[Screenshot: Setting up the connection to Snowflake]

[Screenshot: Mapping fields from the API to Snowflake columns]
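
If you are building the pipeline in code instead of Workato, the same mapping and upsert can be expressed as a Snowflake MERGE. Below is a minimal sketch for the conversations table; the API field names used as bind parameters (conversation_id, last_updated_time, and so on) are assumptions and should be mapped to the actual response schema you identified earlier.

    import snowflake.connector

    # Upsert one page of conversation records into the conversations table.
    # The %(...)s parameter names are assumed API field names -- adjust them to
    # match the actual response schema.
    MERGE_SQL = """
    MERGE INTO DATA_API.PROD.DATA_API_COPILOT_CONVERSATIONS AS t
    USING (SELECT %(conversation_id)s     AS CONVERSATION_ID,
                  %(last_updated_time)s   AS TS_LAST_UPDATED,
                  %(user_record_id)s      AS USER_RECORD_ID,
                  %(created_time)s        AS TS_CREATED,
                  %(conversation_origin)s AS CONVERSATION_ORIGIN,
                  %(primary_domain)s      AS PRIMARY_DOMAIN) AS s
    ON t.CONVERSATION_ID = s.CONVERSATION_ID
    WHEN MATCHED THEN UPDATE SET
        TS_LAST_UPDATED     = s.TS_LAST_UPDATED,
        USER_RECORD_ID      = s.USER_RECORD_ID,
        TS_CREATED          = s.TS_CREATED,
        CONVERSATION_ORIGIN = s.CONVERSATION_ORIGIN,
        PRIMARY_DOMAIN      = s.PRIMARY_DOMAIN
    WHEN NOT MATCHED THEN INSERT
        (CONVERSATION_ID, TS_LAST_UPDATED, USER_RECORD_ID,
         TS_CREATED, CONVERSATION_ORIGIN, PRIMARY_DOMAIN)
        VALUES (s.CONVERSATION_ID, s.TS_LAST_UPDATED, s.USER_RECORD_ID,
                s.TS_CREATED, s.CONVERSATION_ORIGIN, s.PRIMARY_DOMAIN)
    """

    def upsert_conversations(records, conn):
        """Merge a list of record dicts into Snowflake; re-runs stay idempotent."""
        with conn.cursor() as cur:
            for record in records:
                cur.execute(MERGE_SQL, record)
        conn.commit()

Because the upsert is a MERGE keyed on the conversation id, re-processing overlapping date ranges only updates existing rows rather than duplicating them.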

With the base components in place, you can complete the historical ingestion pipeline by using an IF statement and a REPEAT WHILE loop to iterate over all API responses.


Ensure you handle rate limiting and other error codes while building the connection to the API. We also need to add retries when an API call fails (any response other than 200 OK). The current rate limit on the API is around 60 requests per minute, which is sufficient to handle large volumes of data through an endpoint.
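
If you are making the calls from code, one simple way to respect the rate limit and recover from transient failures is a retry wrapper with exponential backoff. This is a sketch only: it assumes rate limiting is signaled with a standard 429 status, and the retry count and sleep durations are illustrative choices, not values prescribed by the API.

    import time
    import requests

    def get_with_retries(url, headers, max_retries=5):
        """Call the API, retrying non-200 responses with exponential backoff."""
        for attempt in range(max_retries):
            response = requests.get(url, headers=headers, timeout=60)
            if response.status_code == 200:
                return response
            if response.status_code == 429:
                # Rate limited (~60 requests per minute): honor Retry-After when present.
                delay = int(response.headers.get("Retry-After", 2 ** attempt))
            else:
                delay = 2 ** attempt  # back off on other error codes as well
            time.sleep(delay)
        raise RuntimeError(f"Request to {url} failed after {max_retries} retries")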

With that, your historical data pipeline is set up. Run it to backfill all historical data. Let’s now set up the daily ingestion pipeline in a similar fashion.

Daily ingestion pipeline

The daily ingestion pipeline behaves very similarly to the historical pipeline in fetching the data and storing it in Snowflake. The primary difference is that it fetches data on a daily basis. The Data API SLA allows up to 48 hours before new data becomes available; read more about this here.

To set up a daily ingestion pipeline in Workato, set the trigger to “Run on a schedule”.


To call the API on a daily basis, we need to fetch the data filtered by the last updated time field. This allows you to fetch recent data as well as update historical records with new information.

Append the $filter query parameter to the API URL and use the condition below to fetch records from a certain date.

https://api.moveworks.ai/export/v1beta2/records/conversations?$filter=last_updated_time eq '2025-02-17T05:09:09.028Z' (Replace the date with the current date)
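
For a code-based daily job, the same filter can be passed as a query parameter. The sketch below uses a gt comparison so records updated after the given timestamp are fetched; that operator is an assumption on our part (the example above shows the filter syntax), so confirm the supported $filter operators in the Data API documentation. It also looks back past the 48-hour SLA so late-arriving records are still picked up.

    from datetime import datetime, timedelta, timezone
    import requests

    BASE_URL = "https://api.moveworks.ai/export/v1beta2/records/conversations"
    HEADERS = {"Authorization": "Bearer <your_api_token>"}  # placeholder token

    def fetch_updated_conversations(lookback_hours=72):
        """Fetch conversations updated within the lookback window."""
        since = (datetime.now(timezone.utc) - timedelta(hours=lookback_hours)
                 ).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        # 'gt' is assumed here; check the supported $filter operators in the API docs.
        params = {"$filter": f"last_updated_time gt '{since}'"}

        records, url = [], BASE_URL
        while url:
            response = requests.get(url, headers=HEADERS, params=params, timeout=60)
            response.raise_for_status()
            page = response.json()
            records.extend(page.get("value", []))   # assumed OData record key
            url = page.get("@odata.nextLink")       # absent once all data is fetched
            params = None                           # the next link already embeds the query
        return records  # upsert these with the same MERGE logic as the historical job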

The other components of the pipeline remain the same. Once the API calls have been modified, your daily ingestion job is ready. Run it on a daily basis to fetch new data into your data lake.

The process we followed above was for the conversations endpoint. The Data API also exposes 4 more endpoints: Interactions, Plugin calls, Plugin resources, and Users. Replicate the same process for them to ingest data into the corresponding tables.

Connecting BI tools with the data lake

Integrating your BI tool with the data lake is the final step of the integration. In this exercise we are using PowerBI along with Snowflake to build metrics on top of the ingested data. All standard BI tools support direct connectors to Snowflake or any other SQL-based data storage system.

To connect PowerBI with Snowflake, select the Snowflake connector available in PowerBI and set up a live connection with the database.


Add the appropriate credentials while setting up the connector. Please take note of your auth credentials and ensure the account has access to fetch data from the data lake and import it into PowerBI.

Once the data is imported, the process to build visualizations is straightforward: select the type of metric and visualization, and power it from an attribute of the ingested data.


To go through different ideas on what can be built using the ingested data, read more here.