# Operational Guide

> Sync pattern, capacity planning, rate limits, file size, and other operational details for running a Moveworks Content Gateway in production

## Overview

This guide covers how Moveworks calls your gateway in production: the sync cadence, what gets re-fetched versus cached, how to bound your load via rate-limit signals, and the file size cap. Read it before sizing your gateway's hosting environment or wiring up rate-limit headers.

***

## How do I throttle Moveworks' calls to my gateway?

Moveworks honors two complementary rate-limit mechanisms. Use either or both:

* **Proactive: rate-limit headers.** Return `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` on every response. Moveworks reads them and adjusts its call rate to fit your advertised capacity, slowing down before you have to fail anything. Common header-name variants (`X-Rate-Limit-*`, and `RateLimit-*` as described in the IETF HTTPAPI draft on RateLimit header fields) are also recognized.
* **Reactive: 429 + Retry-After.** When you're at capacity, return `429 Too Many Requests` with a `Retry-After` header. Moveworks honors the wait value and retries.
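
As a rough illustration of how the two mechanisms can share one piece of state, here is a minimal fixed-window sketch in Python. The `FixedWindowLimiter` class is hypothetical and not part of the Moveworks spec. It also assumes `X-RateLimit-Reset` carries seconds remaining in the window rather than an epoch timestamp, so confirm the expected unit on the Errors page before relying on it.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical fixed-window limiter; the class and its names are illustrative."""

    limit: int                 # requests allowed per window
    window_seconds: int = 60   # window length in seconds
    _window_start: float = field(default=0.0, init=False)
    _used: int = field(default=0, init=False)

    def check(self, now=None):
        """Return (status_code, headers) for one incoming request."""
        now = time.time() if now is None else now
        # Start a fresh window once the current one has expired.
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._used = 0
        # Seconds until the window resets (assumed unit for X-RateLimit-Reset).
        reset_in = max(0, math.ceil(self._window_start + self.window_seconds - now))
        allowed = self._used < self.limit
        if allowed:
            self._used += 1
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - self._used),
            "X-RateLimit-Reset": str(reset_in),
        }
        if not allowed:
            # Reactive signal: tell the caller how long to wait before retrying.
            headers["Retry-After"] = str(reset_in)
            return 429, headers
        return 200, headers
```

Attaching the returned headers to every response, including successful ones, is what gives the caller a chance to slow down before the 429 branch is ever reached.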

Returning fewer items per response than the requested `$top` (with `@odata.nextLink` for the rest) is also a clean way to bound per-request work. This is useful when your backend can't sustain large response payloads.
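
A minimal sketch of capping the page size on the server side might look like the following. The `MAX_PAGE_SIZE` value, the `build_page` helper, the `value` wrapper key, and the use of `$skip` inside the next link are all assumptions made for illustration; the gateway spec defines the actual response shape.

```python
from urllib.parse import urlencode

MAX_PAGE_SIZE = 100  # hypothetical cap chosen by the gateway operator


def build_page(items, base_url, top, skip=0):
    """Return one page of results, capped at MAX_PAGE_SIZE regardless of $top."""
    page_size = min(top, MAX_PAGE_SIZE)
    page = items[skip:skip + page_size]
    body = {"value": page}
    next_skip = skip + len(page)
    if next_skip < len(items):
        # More items remain: point the caller at the next page.
        query = urlencode({"$top": top, "$skip": next_skip}, safe="$")
        body["@odata.nextLink"] = f"{base_url}?{query}"
    return body
```

With 250 items and a requested `$top` of 500, the first call returns 100 items plus a next link, and the caller keeps following links until a page arrives without one.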

See [Errors](/api-reference/content-gateway/errors) for the expected error response format.

***

## What does the sync pattern look like, and how should I size my gateway?

Moveworks performs scheduled full sync runs against your gateway. Each run walks the complete inventory of files, file metadata, file permissions, groups, group memberships, and users. There isn't an "incremental diff" mode that only fetches changes since the previous run.

The main cost savings between syncs come from the **file binary cache**. If you return an accurate `last_modified_datetime` on every file, Moveworks skips re-downloading binaries whose timestamp hasn't changed since the previous sync. File metadata, permissions, group memberships, and users are re-walked each run, but the binary downloads (typically the largest payloads) are skipped for unchanged files.

For capacity planning, here is what gets called on every sync versus what gets skipped via the file binary cache:

| Endpoint                                          | Re-called every sync?                                              |
| ------------------------------------------------- | ------------------------------------------------------------------ |
| `GET /files` (list)                               | Yes                                                                |
| `GET /files/{id}` (metadata, including HTML body) | Yes                                                                |
| `GET /files/{id}/download` (binary)               | **Skipped when `last_modified_datetime` matches the cached value** |
| `GET /files/{id}/permissions`                     | Yes                                                                |
| `GET /files/permissions/metadata`                 | Yes                                                                |
| `GET /groups`                                     | Yes                                                                |
| `GET /groups/{groupId}/members`                   | Yes                                                                |
| `GET /users`                                      | Yes                                                                |

Concurrent scheduled syncs are **skipped, not stacked**: if a sync is still running when the next scheduled run would fire, the next run is skipped until the current one completes. You won't see overlapping load even if a sync (typically the first) takes longer than your scheduled interval.

If your backend has limited capacity, use the rate-limit mechanisms above to bound the call rate. Total wall-clock time per sync extends to fit your advertised capacity; calls per second stay within what you allow.
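
For a back-of-the-envelope estimate, the table above can be turned into a request count. The sketch below is only a rough model under stated assumptions: one metadata call and one permissions call per file, one members call per group (no member paging), a single `/files/permissions/metadata` call, and list endpoints paged at `page_size`. The function names and parameters are hypothetical; actual call patterns may differ.

```python
import math


def estimate_sync_requests(files, changed_binaries, groups, users, page_size):
    """Rough per-sync request counts; every multiplier here is an assumption."""

    def pages(count):
        # A list endpoint is called at least once, even when it is empty.
        return max(1, math.ceil(count / page_size))

    return {
        "GET /files": pages(files),
        "GET /files/{id}": files,
        "GET /files/{id}/download": changed_binaries,  # unchanged binaries are skipped
        "GET /files/{id}/permissions": files,
        "GET /files/permissions/metadata": 1,
        "GET /groups": pages(groups),
        "GET /groups/{groupId}/members": groups,
        "GET /users": pages(users),
    }


def min_sync_seconds(total_requests, allowed_requests_per_second):
    """Lower bound on wall-clock time if calls are held to the advertised rate."""
    return total_requests / allowed_requests_per_second
```

Under these assumptions, a corpus of 10,000 files (500 changed binaries), 200 groups, and 5,000 users at a page size of 100 comes to 20,853 requests per sync, which at 10 requests per second takes at least about 35 minutes.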

***

## Is there a maximum file size?

Yes. Moveworks caps individual file binary content at **25 MB**. Files larger than this are downloaded by Moveworks, then rejected by the indexing pipeline with status `FILE_SIZE_LIMIT_EXCEEDED`. **They are not indexed and will not appear in search results.** The error is non-retryable; Moveworks does not retry oversize files on subsequent syncs.

A few implications worth being deliberate about:

* **Files over 25 MB consume bandwidth on both sides every sync.** They are downloaded in full before the size check happens. If you have a known set of oversize files, filter them out at the source (don't include them in `/files`) rather than letting Moveworks re-download them each cycle.
* **Returning an accurate `content.size` field on file metadata is recommended.** Moveworks doesn't currently pre-check size before downloading, but a future optimization will, and accurate `size` lets that short-circuit work.
* The cap applies to binary file content (`/files/{id}/download` payload). HTML body returned inline via `content.body` is not subject to the same numeric cap, though oversized HTML payloads will still hit response-time limits.
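
One way to filter oversize files at the source is to drop them before building the `/files` listing. This is a sketch, not a prescribed implementation: the `split_by_size` helper and the `id` key are hypothetical, and the threshold assumes "25 MB" means 25,000,000 bytes, which is the conservative reading. Adjust it if the cap turns out to be defined in binary units.

```python
MAX_BINARY_BYTES = 25_000_000  # assumption: decimal megabytes (conservative reading)


def split_by_size(files):
    """Separate files that can be listed from oversize binaries to leave out."""
    kept, skipped_ids = [], []
    for f in files:
        content = f.get("content", {})
        size = content.get("size")
        is_inline_html = content.get("mime_type") == "text/html"
        # The cap applies to binary downloads, not to inline HTML bodies.
        if not is_inline_html and size is not None and size > MAX_BINARY_BYTES:
            skipped_ids.append(f["id"])
        else:
            kept.append(f)
    return kept, skipped_ids
```

Logging the skipped IDs gives you a record of what is intentionally absent from search results.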

***

## Why is `last_modified_datetime` important on file responses?

It's the cache fingerprint Moveworks uses to decide whether to re-download a file's binary content. When the timestamp on a file matches what was returned in a previous sync, the `/files/{id}/download` call is skipped and the cached binary is reused. This is the primary mechanism that makes subsequent syncs cheaper than the first sync.

Returning an accurate, monotonically-updated timestamp on every file is the single most effective way to reduce ingestion load over time, especially for corpora with large attachments. Files with stale, always-current, or missing `last_modified_datetime` values will be re-downloaded every sync regardless of whether their content actually changed.

**One caveat:** the cache applies only to binary file content (PDF, DOCX, PPTX, plain text). HTML content (files where `content.mime_type == "text/html"` and the body is returned inline via `content.body`) is re-fetched every sync regardless of `last_modified_datetime`. If your corpus is mostly HTML, ongoing sync load will be roughly the same as first-sync load.
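
The behavior described above can be modeled in a few lines. This is a simplified illustration of the documented skip rule, not Moveworks' actual implementation; the `plan_downloads` function, the `id` key, and the in-memory `cache` dictionary are hypothetical.

```python
def plan_downloads(files, cache):
    """Return the IDs whose binary would be downloaded in this sync.

    `cache` maps file ID to the last_modified_datetime seen in a prior sync
    and is updated in place.
    """
    to_download = []
    for f in files:
        if f.get("content", {}).get("mime_type") == "text/html":
            # Inline HTML has no binary download; its body is re-fetched
            # with the metadata on every sync regardless of timestamp.
            continue
        ts = f.get("last_modified_datetime")
        # A missing or changed timestamp means the cached binary can't be reused.
        if ts is None or cache.get(f["id"]) != ts:
            to_download.append(f["id"])
            if ts is not None:
                cache[f["id"]] = ts
    return to_download
```

Running two consecutive syncs through this model shows the effect: a file with a stable timestamp is downloaded once, while a file whose timestamp is regenerated on every response, or omitted, is downloaded every time.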

***

## Can I build my own APIs that have custom endpoints and responses?

No. The Content Gateway approach relies on you to follow the Moveworks Gateway spec. We've designed this spec based on [OData and API design best practices](https://www.odata.org/documentation/).

***

## What if I have multiple backend systems?

You can create as many gateways as you want, and Moveworks will integrate with all of them. We recommend one gateway per instance to avoid stability issues.

***

## What if I already built a gateway for one system and now want to build another? What can I reuse?

Each content source system should typically be connected to a dedicated Gateway connector with its own URLs and authorization. Most of your previous gateway setup should be reusable, and you can duplicate it as a starting point. The only change is the content source: when fetching content for the new gateway, make sure to retrieve it from the new source system.

***

## What are legacy gateways?

These are older gateways built on the previous Moveworks search infrastructure. They may have performance issues, are harder to troubleshoot, and do not support permission ingestion. While they continue to be supported, we strongly recommend using Content Gateway when possible. Post in [Moveworks Community](https://community.moveworks.com/) if you have additional requirements.

***

## Related

* [How Permissions Work](/api-reference/content-gateway/how-permissions-work): the ReBAC model
* [Common Pitfalls](/api-reference/content-gateway/common-pitfalls): the most frequent integration mistakes
* [Errors](/api-reference/content-gateway/errors): error response format and rate-limit header details
* [Supported MIME Types](/api-reference/content-gateway/supported-mime-types): file formats and the 25 MB cap