File Search extends the Answers capability in the Moveworks bot: beyond Knowledge Base Articles, you can now ingest, index, and serve content from .pdf, .txt, .doc, .docx, and .pptx files in response to your employees’ search queries.
This guide answers questions about the capabilities of File Search today.
A: File Search supports integrations with SharePoint Online, Box, Google Drive, and Oracle.
A: File Search today supports PDF, Word documents (.doc, .docx), TXT, and PowerPoint (.ppt, .pptx) file types. For Google Drive, native Google Docs and Google Slides are also supported as part of File Search.
Note: For more details on unsupported file types:
A: Please see the page for the latest.
A: For files of 100 pages or more, please reach out to your Customer Success team to learn more about serving larger files with File Search. For snippetization of files, we have certain limits:
If a file exceeds this size, Moveworks simply creates a single file-title snippet rather than many snippets from the file's contents.
A: Moveworks strictly enforces the file-level ACL permissions from your source system. This means an employee will never see search results from a file that they cannot access in your source system (SharePoint, Google Drive, etc.). For more on how File Search respects your ACL permissions, see: /ai-assistant/moveworks-classic/search/file-search/file-search-respecting-file-permissions
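As a rough illustration of file-level permission enforcement, the sketch below drops any search hit the user is not entitled to read before results are returned. The FileHit type, its allowed_principals field, and the filter_by_acl function are hypothetical names invented for this example; they are not Moveworks APIs, and the real enforcement logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FileHit:
    """A search hit plus the principals (users or groups) allowed to read the file.

    Hypothetical structure for illustration only.
    """
    file_name: str
    allowed_principals: set = field(default_factory=set)


def filter_by_acl(hits, user_id, user_groups):
    """Keep only hits whose ACL includes the user or one of the user's groups."""
    principals = {user_id} | set(user_groups)
    return [hit for hit in hits if hit.allowed_principals & principals]


hits = [
    FileHit("PTO Policies USA.pdf", {"group:all-employees"}),
    FileHit("Executive Compensation.docx", {"group:executives"}),
]

# Alice belongs only to the all-employees group, so the executive file is hidden.
visible = filter_by_acl(hits, "user:alice", ["group:all-employees"])
print([hit.file_name for hit in visible])
```

A user with no matching user or group entry receives no hits at all, which mirrors the "never see results from a file they cannot access" behavior described above.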
A: Your source system’s file-level ACL permissions are re-ingested hourly, although the time each ingestion takes depends on the volume of user and group permissions in your system. File access changes made in your source system will always be honored once ingestion completes.
A: Moveworks uses AWS S3 buckets as the main customer data store. Dedicated buckets are allocated for each customer and encrypted with unique per-customer encryption keys generated via AWS KMS. All data is encrypted at rest using AES-256. File Search follows all of the same security protocols as our Answers skill today.
A: Moveworks employs a chunk-based architecture to process files, dividing your documents into smaller pieces, or “chunks,” to efficiently create embeddings. This architecture especially benefits Moveworks’ retrieval-augmented generation (RAG) system, allowing the AI Assistant to generate helpful answers using more contextual, relevant text.
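To make the idea of chunking concrete, here is a minimal sketch that splits a document into overlapping word-based chunks, each of which could then be embedded separately. The chunk_text function, the word-based splitting, and the chunk_size and overlap values are all assumptions made for illustration; the source does not describe how Moveworks sizes or overlaps its chunks.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    Sizes here are illustrative assumptions, not Moveworks settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


# A synthetic 120-word document: w0 w1 ... w119
document = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(document, chunk_size=50, overlap=10)
print(len(chunks))
print(chunks[1].split()[0])
```

Each chunk is small enough to embed on its own, and a retrieval step can then return only the chunks most relevant to a query rather than the whole file.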
A: We support text-only table extraction. This means we can perform semantic search over the text in individual cells, but we cannot interpret row and column relationships.
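The practical effect of text-only table extraction can be sketched as follows: cell text is kept and remains searchable, but the position of each cell is discarded. The flatten_table_cells function and the sample table are hypothetical and serve only to illustrate the limitation; they do not represent the actual extraction pipeline.

```python
def flatten_table_cells(table):
    """Return non-empty cell text as a flat list, discarding row and column positions."""
    return [cell.strip() for row in table for cell in row if cell and cell.strip()]


table = [
    ["Region", "PTO days"],
    ["USA", "20"],
    ["Germany", "30"],
]

cells = flatten_table_cells(table)
print(cells)
```

A search for "Germany" can still match the cell containing that word, but nothing in the flattened output records that "30" belongs to the Germany row or the "PTO days" column.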
A: Yes. We can currently prioritize File Search results when a location in the file name (e.g., PTO Policies USA.pdf) matches the location of the searching user.
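One way to picture location-based prioritization is as a score boost applied when the user's location appears as a word in the file name. The sketch below assumes a simple multiplicative boost and whole-word matching on single-word locations; the function names, the boost value, and the base scores are hypothetical and are not taken from the Moveworks ranking system.

```python
import re


def filename_tokens(file_name):
    """Lowercased alphanumeric words in a file name, e.g. {'pto', 'policies', 'usa', 'pdf'}."""
    return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", file_name)}


def rank_by_location(results, user_location, boost=1.5):
    """Sort (file_name, base_score) pairs, boosting files whose name contains the user's location.

    Matching is on whole words, so 'USA' does not match 'Thousand'.
    The boost factor is an illustrative assumption.
    """
    location = user_location.lower()

    def score(result):
        file_name, base_score = result
        return base_score * (boost if location in filename_tokens(file_name) else 1.0)

    return sorted(results, key=score, reverse=True)


results = [
    ("PTO Policies Canada.pdf", 0.80),
    ("PTO Policies USA.pdf", 0.75),
    ("Thousand Oaks Parking.pdf", 0.70),
]

ranked = rank_by_location(results, "USA")
print([file_name for file_name, _ in ranked])
```

Here the USA policy file overtakes the Canada file for a user located in the USA, even though its base relevance score is slightly lower.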
A: Optical Character Recognition (OCR) is not supported today. As a result, PDFs created from image scans of physical documents are not recommended, because their text cannot be extracted and search results will be poor.
A: No, File Search does not support non-English files.