File Search FAQ | Moveworks

Overview

File Search extends the Answers capability of the Moveworks bot beyond Knowledge Base articles: you can now ingest, index, and serve content from .pdf, .txt, .doc, .docx, and .pptx files in response to your employees’ search queries.

This guide answers questions about the capabilities of File Search today.

Q: Which external systems can I integrate with File Search?

A: File Search supports integrations with SharePoint Online, Box, Google Drive, and Oracle.

Q: Which file formats does Moveworks File Search support?

A: File Search today supports PDF, Word (.doc, .docx), TXT, and PowerPoint (.ppt, .pptx) file types. For Google Drive, native Google Docs and Google Slides are also supported.

Note: The following file types are not supported:

Image-based files, e.g. PDFs of scanned documents
Images, such as .PNG and .JPEG
Pre-2007 Word documents with the .doc extension
Password-protected files, such as Adobe-encrypted PDFs or Word documents protected with passwords

Q: How many files can be ingested in File Search?

A: Please see the page for the latest.

Q: Are there any limits on file length?

A: For files of 100 pages or more, please reach out to your Customer Success team to learn more about serving larger files with File Search. For snippetization of files, the following size limits apply:

PDF, DOC, PPT: 25 MB
TXT: 3 MB

If a file exceeds these size limits, Moveworks creates a single file-title snippet (instead of many snippets from the file itself).
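The size rule above can be sketched as a small check. This is an illustration only, not Moveworks code: the helper name `snippet_mode`, the mapping of "DOC" and "PPT" to both the legacy and the XML-based extensions, and the reading of "MB" as 1024 * 1024 bytes are all assumptions.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: "MB" means 1024 * 1024 bytes

# Limits taken from the answer above; the extension mapping is an assumption.
SNIPPET_SIZE_LIMITS = {
    ".pdf": 25 * MB,
    ".doc": 25 * MB,
    ".docx": 25 * MB,
    ".ppt": 25 * MB,
    ".pptx": 25 * MB,
    ".txt": 3 * MB,
}


def snippet_mode(file_name: str, size_bytes: int) -> str:
    """Hypothetical helper: classify how a file would be snippetized.

    Returns "full" when the file is within its limit, "title_only" when it
    exceeds the limit, and "unknown_type" when the extension has no listed limit.
    """
    ext = PurePath(file_name).suffix.lower()
    limit = SNIPPET_SIZE_LIMITS.get(ext)
    if limit is None:
        return "unknown_type"
    return "full" if size_bytes <= limit else "title_only"
```

For instance, under these assumptions a 4 MB .txt file would fall into the "title_only" case, while a 10 MB PDF would be snippetized in full.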

Q: How does Moveworks File Search respect my source’s file-level ACL permissions?

A: Moveworks strictly enforces the file-level ACL permissions from your source system. This means that an employee will never access or see search results from a file that they cannot access in your SharePoint, Google Drive, etc. For more on how File Search respects your ACL permissions, see: /ai-assistant/moveworks-classic/search/file-search/file-search-respecting-file-permissions
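Conceptually, this kind of enforcement amounts to filtering candidate results against the permissions ingested from the source system. The sketch below is a simplified illustration under assumed names (`FileResult`, `visible_results`, and the user and group identifiers are hypothetical); it does not describe how Moveworks implements the check internally.

```python
from dataclasses import dataclass, field


@dataclass
class FileResult:
    """Hypothetical search result carrying the ACL copied from the source system."""
    title: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def visible_results(results, user_id, user_groups):
    """Keep only results the user can access directly or through a group."""
    groups = set(user_groups)
    return [
        r for r in results
        if user_id in r.allowed_users or groups & r.allowed_groups
    ]


results = [
    FileResult("PTO Policy.pdf", allowed_groups={"all-employees"}),
    FileResult("Exec Comp Plan.docx", allowed_users={"ceo"}),
]
# A regular employee only sees the file shared with their group.
print([r.title for r in visible_results(results, "jdoe", ["all-employees"])])
```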

Q: What happens when a file’s source ACL permissions change?

A: Your source system’s file-level ACL permissions are re-ingested hourly, although the time each ingestion takes depends on the volume of user and group permissions in your system. File access changes made in your source system are always honored once ingestion completes.

Q: How are my files stored?

A: Moveworks uses AWS S3 buckets as the main customer data store. A dedicated bucket is allocated for each customer and encrypted with a unique per-customer encryption key generated via the AWS KMS service. All data is encrypted at rest using AES-256. File Search follows all of the same security protocols as our Answers skill today.

Q: How does Moveworks extract and store embeddings from Files?

A: Moveworks employs a chunk-based architecture to process files, dividing your documents into smaller pieces, or “chunks,” to efficiently create embeddings. This architecture especially benefits Moveworks’ retrieval-augmented generation (RAG) system, allowing the AI Assistant to generate helpful answers using more contextual, relevant text.
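To make the idea of chunking concrete, here is a minimal sketch of word-based chunking with overlap. The function name `chunk_text`, the chunk size, and the overlap are illustrative assumptions; the actual chunking strategy and embedding model used by Moveworks are not described in this FAQ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some surrounding context.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk would then be passed to an embedding model and stored for retrieval, so that a query is matched against focused passages rather than whole files.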

Q: Can File Search comprehend tables?

A: File Search supports text-only table extraction. This means it can perform semantic search over the text in individual cells, but it cannot comprehend row/column relationships.

For example, given the query “What was the population of California in the year 2000?”, File Search cannot return “31,880,000” from the table below, because doing so requires comprehension of rows and columns.

	2000	2001	2002
California	31,880,000	33,990,000	36,490,000
Washington	23,000,000	23,000,000	23,000,000
Arizona	43,000,000	43,000,000	43,000,000

By contrast, the same query, “What was the population of California in the year 2000?”, can work against the table below, because each cell contains enough context on its own.

	2000	2001	2002
California	In 2000, there was a cumulative population of 31,880,000	In 2001, there was a cumulative population of 31,880,000	In 2002, there was a cumulative population of 31,880,000
Washington	In 2000, there was a cumulative population of 23,000,000	In 2001, there was a cumulative population of 23,000,000	In 2002, there was a cumulative population of 23,000,000
Arizona	In 2000, there was a cumulative population of 23,000,000	In 2001, there was a cumulative population of 23,000,000	In 2002, there was a cumulative population of 23,000,000
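The difference between the two tables can be illustrated by treating every cell as an independent piece of text. In the sketch below, plain keyword matching stands in for semantic search, which is a deliberate simplification; the helper names `cells` and `matching_cells` are hypothetical.

```python
def cells(table_text: str) -> list[str]:
    """Flatten a tab-separated table into independent, non-empty cell strings."""
    return [
        cell.strip()
        for row in table_text.splitlines()
        for cell in row.split("\t")
        if cell.strip()
    ]


def matching_cells(table_text: str, required_terms: list[str]) -> list[str]:
    """Return the cells that contain every required term on their own."""
    return [
        cell for cell in cells(table_text)
        if all(term.lower() in cell.lower() for term in required_terms)
    ]


# Excerpts of the two example tables above.
bare = "\t2000\t2001\nCalifornia\t31,880,000\t33,990,000"
verbose = "\t2000\nCalifornia\tIn 2000, there was a cumulative population of 31,880,000"

print(matching_cells(bare, ["2000", "population"]))
print(matching_cells(verbose, ["2000", "population"]))
```

In the first table, the cell holding the figure carries no mention of the year or of what the number measures, so nothing matches once rows and columns are discarded. In the second, the cell itself states the year and the quantity, so it can be retrieved without any table structure.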

Q: Does File Search leverage Location-based personalization (geocode-boosting)?

A: Yes. File Search can currently prioritize results based on a match between a location in the file name, e.g. PTO Policies USA.pdf, and the location of the searching user.
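One way to picture this kind of boosting is a re-ranking step that raises the score of files whose names mention the user's location. The sketch below is an assumption-laden illustration: the function `boost_by_location`, the multiplicative boost factor, and the token-matching rule are hypothetical and are not taken from the Moveworks implementation.

```python
import re


def boost_by_location(results, user_location, boost=1.5):
    """Re-rank (file_name, score) pairs, boosting names that mention the location.

    A file name matches when the user's location appears as a whole token,
    e.g. "USA" in "PTO Policies USA.pdf".
    """
    location = user_location.lower()
    rescored = []
    for name, score in results:
        tokens = re.split(r"[^a-z0-9]+", name.lower())
        factor = boost if location in tokens else 1.0
        rescored.append((name, score * factor))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [("PTO Policies UK.pdf", 0.8), ("PTO Policies USA.pdf", 0.7)]
# For a user located in the USA, the USA policy moves to the top.
print(boost_by_location(results, "USA")[0][0])
```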

Q: Can File Search perform Optical Character Recognition (OCR)?

A: Optical Character Recognition (OCR) is not supported today. As a result, PDFs created from image scans of physical documents are not recommended, as results will not be optimal.

Q: Does File Search currently support non-English files?

A: Yes, learn more here.