File Search FAQ
Overview
File Search extends the Answers capability in the Moveworks bot: beyond Knowledge Base articles, you can now ingest, index, and serve content from .pdf, .docx, and .pptx files in response to your employees’ search queries.
This guide answers questions about the capabilities of File Search today.
Q: Which external systems can I integrate with File Search?
A: File Search supports integrations with SharePoint Online, Box, Google Drive, and Oracle.
Q: Which file formats does Moveworks File Search support?
A: File Search currently supports PDF, Word documents (.docx), and PowerPoint (.ppt/.pptx) files. For Google Drive, native Google Docs and Google Slides are also supported.
Note: The following file types are not supported:
- Image-based files, e.g. PDFs of scanned documents
- Image files, such as .png and .jpeg
- Pre-2007 Word documents, with the .doc extension
- Password-protected files, such as Adobe-encrypted PDFs or password-protected Word documents
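As a rough illustration of the file-type rules above, the sketch below pre-screens file names by extension before you point File Search at a repository. The helper name `is_candidate_for_file_search` is hypothetical and not part of any Moveworks API; only the extension list comes from this FAQ.

```python
from pathlib import Path

# Extensions taken from the answer above; the helper itself is hypothetical.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".ppt", ".pptx"}


def is_candidate_for_file_search(filename: str) -> bool:
    """Return True if the file extension is one of the supported types.

    This only inspects the extension. It cannot detect scanned (image-based)
    PDFs or password-protected files, which are also unsupported.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


files = ["PTO Policies USA.pdf", "handbook.doc", "logo.PNG", "Onboarding.PPTX"]
eligible = [name for name in files if is_candidate_for_file_search(name)]
print(eligible)
```

An extension check like this is only a first pass: a .pdf that passes may still be a scanned image or encrypted, so those cases need a separate check.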
Q: How many files can be served in File Search?
A: Limits start at 50,000 files. Reach out to your Customer Success team to learn more about higher limits.
Q: Are there any limits on file length?
A: Limits start at 100 pages per file. Reach out to your Customer Success team to learn more about serving larger files with File Search.
Q: How often are my files ingested?
A: All changes to files in your source system will be reflected in your Moveworks bot within a 24-hour SLA. Under the hood, our system crawls the repositories in your application (as specified in Moveworks Setup), downloads the files and relevant metadata, extracts (parses) the text, and then indexes and stores the parsed content in our hybrid-search indexes.
We are actively developing search infrastructure to support incremental ingestion, where our system will scan for changes in your file repositories and propagate them quickly into our search index. Once this is available, the SLA for mirroring changes to your source files into your Moveworks search experience will be on the order of minutes.
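The crawl, download, parse, and index steps described above can be pictured with a toy in-memory pipeline. This is a minimal sketch, not Moveworks' implementation: `SourceFile`, `crawl`, `download`, `parse`, and `ingest` are hypothetical names, a dictionary stands in for the source repository, and another dictionary stands in for the search index.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    file_id: str
    name: str
    raw_text: str  # stand-in for the downloaded file contents


def crawl(repository: dict) -> list:
    """List the IDs of every file in the (hypothetical) repository."""
    return sorted(repository)


def download(repository: dict, file_id: str) -> SourceFile:
    """Fetch one file and its metadata."""
    return repository[file_id]


def parse(source_file: SourceFile) -> str:
    """Extract text; here we only normalize whitespace."""
    return " ".join(source_file.raw_text.split())


def ingest(repository: dict) -> dict:
    """Run crawl -> download -> parse -> index for the whole repository."""
    index = {}
    for file_id in crawl(repository):
        source_file = download(repository, file_id)
        index[file_id] = {"name": source_file.name, "text": parse(source_file)}
    return index


repository = {
    "f1": SourceFile("f1", "PTO Policies USA.pdf", "PTO  policy\nfor US employees"),
    "f2": SourceFile("f2", "Onboarding.pptx", "Welcome   to the team"),
}
index = ingest(repository)
print(index["f1"]["text"])
```

A full pass like `ingest` reprocesses every file each time it runs, which is why incremental ingestion, which touches only changed files, can shorten the time before changes appear.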
Q: How does Moveworks File Search respect my source’s file-level ACL permissions?
A: Moveworks strictly enforces the file-level ACL permissions from your source system. This means an employee will never access or see search results from a file that they cannot access in your SharePoint, Google Drive, etc.
For more on how File Search respects your ACL permissions, see: https://help.moveworks.com/docs/file-search-respecting-file-permissions
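To make the permission rule concrete, here is a minimal sketch of filtering search results against file-level ACLs. The `SearchResult` type, the `filter_by_acl` function, and the idea of representing a user as a set of principals (user ID plus group names) are assumptions for illustration only; the linked page describes how File Search actually handles permissions.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    file_name: str
    # Users or groups allowed to read the file in the source system.
    allowed_principals: set = field(default_factory=set)


def filter_by_acl(results: list, user_principals: set) -> list:
    """Keep only results the user is allowed to read in the source system."""
    return [r for r in results if r.allowed_principals & user_principals]


results = [
    SearchResult("PTO Policies USA.pdf", {"group:all-employees"}),
    SearchResult("Exec Compensation.docx", {"group:hr-leadership"}),
]
visible = filter_by_acl(results, {"user:jdoe", "group:all-employees"})
print([r.file_name for r in visible])
```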
Q: What happens when a file’s source ACL permissions change?
A: Your source system’s file-level ACL permissions are re-ingested hourly, although the exact timing depends on the volume of user and group permissions in your system. File access changes made in your source system are always honored once ingestion completes.
Q: How do I purchase File Search?
A: Reach out to your sales representative from Moveworks to discuss bringing File Search to your employees.
Q: How are my files stored?
A: Moveworks uses AWS S3 buckets as the main customer data store. Dedicated buckets are allocated for each customer and encrypted with unique per-customer encryption keys generated via AWS KMS. All data is encrypted at rest using AES-256.
File Search follows all of the same security protocols as our Answers skill today.
Q: How does Moveworks extract and store embeddings from Files?
A: Moveworks employs a chunk-based architecture to process files, dividing your documents into smaller pieces, or “chunks,” to efficiently create embeddings. This architecture especially benefits Moveworks Copilot's retrieval-augmented generation (RAG) system, allowing Copilot to generate helpful answers using more contextual, relevant text.
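For readers unfamiliar with chunking, the sketch below splits a document's text into fixed-size word windows. The chunk size, the optional overlap between neighboring chunks, and the function name `chunk_words` are all assumptions chosen for illustration; this FAQ does not specify how Moveworks sizes or overlaps its chunks, and the embedding step is omitted.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list:
    """Split text into word windows of `chunk_size`, sharing `overlap` words.

    Both parameters are illustrative defaults, not documented values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Each chunk would then be embedded and indexed separately, so a query can retrieve the specific passage it matches instead of an entire document.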
Q: Can File Search comprehend tables?
A: We support text-only table extraction. This means we can perform semantic search over the text in individual cells, but we cannot comprehend row and column relationships.
- e.g. For “What was the population of California in the year 2000?” with the table below, we will not be able to return “31,880,000”, since that requires comprehension of rows and columns.
State | 2000 | 2001 | 2002 |
---|---|---|---|
California | 31,880,000 | 33,990,000 | 36,490,000 |
Washington | 23,000,000 | 23,000,000 | 23,000,000 |
Arizona | 43,000,000 | 43,000,000 | 43,000,000 |
- e.g. “What was the population of California in the year 2000?” can work with the table below, since each cell contains enough context on its own.
State | 2000 | 2001 | 2002 |
---|---|---|---|
California | In 2000, there was a cumulative population of 31,880,000 | In 2001, there was a cumulative population of 31,880,000 | In 2002, there was a cumulative population of 31,880,000 |
Washington | In 2000, there was a cumulative population of 23,000,000 | In 2001, there was a cumulative population of 23,000,000 | In 2002, there was a cumulative population of 23,000,000 |
Arizona | In 2000, there was a cumulative population of 23,000,000 | In 2001, there was a cumulative population of 23,000,000 | In 2002, there was a cumulative population of 23,000,000 |
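The difference between the two tables can be sketched by treating every cell as an independent piece of text. In the toy example below, a simple keyword match stands in for semantic search, and the function names `flatten_cells` and `search_cells` are hypothetical. It uses a two-row, two-year excerpt of each table above.

```python
def flatten_cells(rows: list) -> list:
    """Treat each cell as standalone text, discarding row/column position."""
    return [cell for row in rows for cell in row]


def search_cells(cells: list, terms: list) -> list:
    """Return cells that contain every term (case-insensitive keyword match)."""
    lowered = [t.lower() for t in terms]
    return [c for c in cells if all(t in c.lower() for t in lowered)]


bare_rows = [
    ["California", "31,880,000", "33,990,000"],
    ["Washington", "23,000,000", "23,000,000"],
]
descriptive_rows = [
    ["California",
     "In 2000, there was a cumulative population of 31,880,000",
     "In 2001, there was a cumulative population of 31,880,000"],
    ["Washington",
     "In 2000, there was a cumulative population of 23,000,000",
     "In 2001, there was a cumulative population of 23,000,000"],
]

query_terms = ["2000", "population"]
bare_hits = search_cells(flatten_cells(bare_rows), query_terms)
descriptive_hits = search_cells(flatten_cells(descriptive_rows), query_terms)
print(bare_hits)
print(descriptive_hits)
```

With bare numeric cells nothing matches, because the year lives only in the column header. With descriptive cells the year and the word "population" travel with the value, so the cell is findable on its own.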
Q: Does File Search leverage Location-based personalization (geocode-boosting)?
A: Yes. We can currently prioritize File Search results when a location in the file name, e.g. PTO Policies USA.pdf, matches the location of the searching user.
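One way to picture location-based boosting is a small score bump for files whose names contain the user's location. This is a sketch under stated assumptions: the function `boost_by_location`, the additive boost of 0.2, the relevance scores, and the "PTO Policies UK.pdf" file are invented for illustration and do not describe how Moveworks ranks results.

```python
import re


def boost_by_location(results: list, user_location: str, boost: float = 0.2) -> list:
    """Re-rank (file_name, score) pairs, boosting names that mention the location.

    The additive `boost` is a hypothetical value, not a documented parameter.
    """
    location = user_location.lower()
    boosted = []
    for file_name, score in results:
        # Split the file name into lowercase alphanumeric tokens.
        tokens = re.findall(r"[a-z0-9]+", file_name.lower())
        if location in tokens:
            score += boost
        boosted.append((file_name, score))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(boosted, key=lambda item: item[1], reverse=True)


results = [("PTO Policies UK.pdf", 0.80), ("PTO Policies USA.pdf", 0.75)]
ranked = boost_by_location(results, "USA")
print([name for name, _ in ranked])
```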
Q: Can File Search perform Optical Character Recognition (OCR)?
A: Optical Character Recognition (OCR) is not supported today. As a result, PDFs created from image scans of physical documents are not recommended, as results will not be optimal.
Q: Does File Search currently support non-English files?
A: Non-English File Search support is now in Limited Preview! Reach out to your Customer Success partner to learn more.