Common Pitfalls

Overview

These are the mistakes we most often see when reviewing partner-built Content Gateway integrations. Most of them silently overstate access, inflate sync cost, or break search relevance without raising an error from the protocol layer. Scan this list before your first production sync.


Returning every member of a large group in one response

/groups/{groupId}/members is the highest-fan-out endpoint during sync. For groups with many thousands of members, return a page (e.g., 1,000-5,000 members) plus @odata.nextLink and let Moveworks follow the chain.
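
As a rough illustration of the paging described above, the sketch below slices a member list and attaches @odata.nextLink only when more members remain. The `value` envelope key, the `skip` query parameter, the `{"type": "USER", ...}` entry shape, and the function name are assumptions made for this example, not part of the documented contract.

```python
from typing import Any

PAGE_SIZE = 1000  # assumed default; the guidance above suggests 1,000-5,000


def members_page(base_url: str, group_id: str, members: list[dict[str, Any]],
                 skip: int = 0, page_size: int = PAGE_SIZE) -> dict[str, Any]:
    """Return one page of direct members, plus a next link if more remain."""
    page = members[skip:skip + page_size]
    body: dict[str, Any] = {"value": page}  # "value" is an assumed envelope key
    next_skip = skip + page_size
    if next_skip < len(members):
        # Hypothetical continuation scheme: an offset in a `skip` parameter.
        body["@odata.nextLink"] = (
            f"{base_url}/groups/{group_id}/members?skip={next_skip}"
        )
    return body
```

The last page omits @odata.nextLink entirely, which is how the caller knows the chain has ended.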


Using {"type": "USER", "id": "*"} as a wildcard

Only {"type": "GROUP", "id": "*", "action": "VIEW"} is interpreted as “any user.” The USER variant looks similar but does not grant public access. See How Permissions Work for the full wildcard semantics.
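
One way to catch this before it ships is a small check over the permission entries your gateway is about to return. The helper below is a hypothetical lint, not part of the starter code; it assumes entries are dicts with `type` and `id` keys as in the examples above.

```python
def find_user_wildcards(permissions: list[dict]) -> list[dict]:
    """Return entries that look like a wildcard but use the USER type."""
    return [
        entry for entry in permissions
        if entry.get("type") == "USER" and entry.get("id") == "*"
    ]
```

Any entry this returns was probably meant to be the GROUP wildcard and currently grants nothing.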


Stale, always-current, or missing last_modified_datetime on file responses

This timestamp is the cache key for the file binary cache. If it’s wrong, every sync re-downloads every binary file regardless of whether it changed. See the Operational Guide for the full cache behavior.
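
A minimal sketch of deriving the field from the source system's own modification time, rather than from the clock at request time. The ISO 8601 UTC string format and the function name are assumptions for illustration; check the API reference for the exact format expected.

```python
from datetime import datetime, timezone


def last_modified_field(source_mtime: datetime) -> str:
    """Format the source system's modification time as a UTC timestamp.

    The value depends only on the source's mtime, never on datetime.now(),
    so repeated syncs of an unchanged file yield the same string.
    """
    if source_mtime.tzinfo is None:
        # Refuse naive datetimes rather than guess a timezone.
        raise ValueError("source_mtime must be timezone-aware")
    return source_mtime.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```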


Walking permission inheritance live for every file

If /files/{id}/permissions makes 2-4 live calls to your source per file, your first-sync load is multiplied by that factor. Pre-fetch permissions in bulk during server startup or on the first /files request, then serve /files/{id}/permissions from an in-memory map. The starter code’s fetch_permissions_for_file docstring shows the recommended bulk-cache pattern.
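
The bulk pre-fetch could look roughly like the sketch below. `PermissionCache` and `bulk_loader` are hypothetical names; the starter code's own pattern in the fetch_permissions_for_file docstring takes precedence where it differs.

```python
import threading
from typing import Callable, Optional

Permission = dict[str, str]


class PermissionCache:
    """Load all permissions once, then answer per-file lookups from memory."""

    def __init__(self, bulk_loader: Callable[[], dict[str, list[Permission]]]):
        # bulk_loader is a hypothetical callable that fetches every file's
        # permissions from the source in as few calls as possible.
        self._bulk_loader = bulk_loader
        self._lock = threading.Lock()
        self._by_file: Optional[dict[str, list[Permission]]] = None

    def ensure_loaded(self) -> None:
        # The lock keeps concurrent first requests from loading twice.
        with self._lock:
            if self._by_file is None:
                self._by_file = self._bulk_loader()

    def get(self, file_id: str) -> Optional[list[Permission]]:
        """Return cached permissions, or None if the file has no entry."""
        self.ensure_loaded()
        assert self._by_file is not None
        return self._by_file.get(file_id)
```

Call `ensure_loaded()` at server startup or on the first /files request, and have the /files/{id}/permissions handler call `get()` only.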


Recursing into nested groups when returning /groups/{groupId}/members

Return only the direct members of a group, including any sub-groups as {"type": "GROUP", "id": "..."} entries. Moveworks follows the chain itself. Returning the flattened user list will silently overstate access.
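
To make the distinction concrete, here is a sketch against a hypothetical in-memory source. `SOURCE_GROUPS`, its layout, and the `{"type": "USER", ...}` entry shape are invented for the example.

```python
# Hypothetical source data: "all-eng" contains one user and one sub-group.
SOURCE_GROUPS = {
    "all-eng": {"users": ["alice"], "groups": ["backend"]},
    "backend": {"users": ["bob", "carol"], "groups": []},
}


def direct_members(group_id: str) -> list[dict[str, str]]:
    """Return direct members only; sub-groups stay as GROUP entries."""
    group = SOURCE_GROUPS[group_id]
    users = [{"type": "USER", "id": user_id} for user_id in group["users"]]
    # Do not recurse here: each sub-group is returned as a reference.
    subgroups = [{"type": "GROUP", "id": sub_id} for sub_id in group["groups"]]
    return users + subgroups
```

For "all-eng" this returns alice and a GROUP entry for "backend", not bob and carol.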


Including HTML body in the /files list response

Moveworks reads HTML body from /files/{id}, not from the list response. Including body in every list response is wasted payload.
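
If your internal file records carry the HTML inline, one option is to strip it when building list items. This sketch assumes the field is named `body`; the helper name is hypothetical.

```python
def to_list_item(file_record: dict) -> dict:
    """Copy a file record for the /files list response, without its body."""
    return {key: value for key, value in file_record.items() if key != "body"}
```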


Defaulting to a public wildcard when permissions can’t be resolved

A common defensive pattern is to return {"type": "GROUP", "id": "*"} whenever fetch_permissions_for_file can’t find a matching entry: unknown file IDs, cache misses during a bulk pre-fetch, transient errors after retries, or newly created documents whose ACL hasn’t been loaded yet. This silently grants public access to any file the caller asks about.

Fail closed instead: return 404 for files that genuinely don’t exist, or [] (empty permissions list) when the file exists but its permissions couldn’t be resolved.
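
A fail-closed handler might be sketched as follows. The function name, the `(status, body)` return shape, and the error payload are assumptions for illustration; adapt them to whatever your web framework expects.

```python
from typing import Any


def permissions_response(file_id: str, known_file_ids: set[str],
                         resolved: dict[str, list[dict]]) -> tuple[int, Any]:
    """Return (status, body) for a permissions lookup, failing closed."""
    if file_id not in known_file_ids:
        # The file genuinely does not exist: 404, never a wildcard.
        return 404, {"error": "file not found"}
    # The file exists. If its permissions were not resolved, return an empty
    # list so that no access is granted until they are.
    return 200, resolved.get(file_id, [])
```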


Single-threaded production deployment

Running python content_gateway.py directly in production serializes concurrent Moveworks requests. Configure your hosting platform for concurrent request handling. See the starter code concurrency note.


Exposing files over 25 MB

Files larger than 25 MB are downloaded by Moveworks but rejected before indexing, so they will silently never appear in search. If you knowingly have oversize content, filter it out at the source rather than letting Moveworks re-download it on every sync. See Supported MIME Types for the full behavior.
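
Filtering at the source can be as simple as the sketch below. It assumes each file record carries a `size_bytes` field (a hypothetical name) and reads 25 MB as 25,000,000 bytes, the more conservative interpretation. Confirm the exact threshold in Supported MIME Types.

```python
# Assumption: 25 MB means 25 * 1000 * 1000 bytes (the stricter reading).
MAX_INDEXABLE_BYTES = 25 * 1000 * 1000


def indexable_files(files: list[dict]) -> list[dict]:
    """Drop records whose known size exceeds the indexing limit.

    Records without a size are kept, since their size is unknown.
    """
    return [
        record for record in files
        if record.get("size_bytes", 0) <= MAX_INDEXABLE_BYTES
    ]
```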