File Search FAQ

Overview

File Search extends the Answers capability in the Moveworks bot beyond Knowledge Base articles: you can now ingest, index, and serve content from .pdf, .txt, .doc, .docx, and .pptx files in response to your employees’ search queries.

This guide answers questions about the capabilities of File Search today.

Q: Which external systems can I integrate with File Search?

A: File Search supports integrations with SharePoint Online, Box, Google Drive, and Oracle.

Q: Which file formats does Moveworks File Search support?

A: File Search currently supports PDF, Word (.doc, .docx), TXT, and PowerPoint (.ppt, .pptx) files. For Google Drive, native Google Docs and Google Slides are also supported.

Note: The following file types are not supported:

  • Image-based files, e.g. PDFs of scanned documents
  • Images, such as .PNG and .JPEG
  • Pre-2007 Word documents with the .doc extension
  • Password-protected files, such as Adobe-encrypted PDFs or Word documents protected with passwords

Q: How many files can be ingested in File Search?

A: Please see the page for the latest.

Q: Are there any limits on file length?

A: For files of 100 pages or more, please reach out to your Customer Success team to learn more about serving larger files with File Search. Snippetization is also subject to file size limits:

  • PDF, DOC, PPT: 25 MB
  • TXT: 3 MB

If a file exceeds these limits, Moveworks creates a single snippet from the file title instead of many snippets from the file contents.
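
The size rule above can be sketched as a small lookup. This is an illustrative sketch only, not Moveworks code: the function name `snippet_mode`, the return labels, the grouping of .docx and .pptx under the 25 MB limit, and the use of binary megabytes are all assumptions made for the example.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: binary megabytes; the FAQ does not specify

# Size limits quoted in the FAQ, keyed by lowercase extension.
# Treating .docx and .pptx like DOC and PPT is an assumption.
SNIPPET_SIZE_LIMITS = {
    ".pdf": 25 * MB,
    ".doc": 25 * MB,
    ".docx": 25 * MB,
    ".ppt": 25 * MB,
    ".pptx": 25 * MB,
    ".txt": 3 * MB,
}


def snippet_mode(filename: str, size_bytes: int) -> str:
    """Return a hypothetical label for how a file would be snippetized."""
    ext = PurePath(filename).suffix.lower()
    limit = SNIPPET_SIZE_LIMITS.get(ext)
    if limit is None:
        return "unsupported"  # extension is not in the table above
    if size_bytes <= limit:
        return "content_snippets"  # many snippets from the file contents
    return "title_only"  # one snippet built from the file title


if __name__ == "__main__":
    for name, size in [("handbook.pdf", 10 * MB), ("notes.txt", 4 * MB)]:
        print(name, snippet_mode(name, size))
```

A check like this can help you spot, before ingestion, which files are likely to be represented only by their title.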

Q: How does Moveworks File Search respect my source’s file-level ACL permissions?

A: Moveworks strictly enforces the file-level ACL permissions from your source system. This means an employee will never see or access search results from a file that they cannot access in your SharePoint, Google Drive, etc. For more on how File Search respects your ACL permissions, see: /ai-assistant/moveworks-classic/search/file-search/file-search-respecting-file-permissions
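
As a rough mental model of the ACL enforcement described above, the sketch below filters candidate results so that a user only sees files they are allowed to access. It does not describe how Moveworks implements permission checks: the `FileResult` type, its fields, and `visible_results` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileResult:
    """Hypothetical search result carrying ACL data copied from the source system."""
    title: str
    allowed_users: frozenset = frozenset()
    allowed_groups: frozenset = frozenset()


def visible_results(results, user, user_groups):
    """Keep only results the user can access directly or through a group."""
    groups = set(user_groups)
    return [
        r for r in results
        if user in r.allowed_users or groups & r.allowed_groups
    ]


if __name__ == "__main__":
    results = [
        FileResult("PTO Policies USA.pdf", allowed_groups=frozenset({"all-employees"})),
        FileResult("Exec Compensation.docx", allowed_users=frozenset({"cfo@example.com"})),
    ]
    for r in visible_results(results, "jane@example.com", ["all-employees"]):
        print(r.title)
```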

Q: What happens when a file’s source ACL permissions change?

A: Your source system’s file-level ACL permissions are re-ingested hourly, although the time each ingestion takes depends on the volume of user and group permissions in your system. File access changes made in your source system are always honored once ingestion completes.

Q: How are my files stored?

A: Moveworks uses AWS S3 buckets as the main customer data store. Dedicated buckets are allocated for each customer and encrypted with unique per-customer encryption keys generated via AWS KMS. All data is encrypted at rest using AES-256. File Search follows all of the same security protocols as our Answers skill today.

Q: How does Moveworks extract and store embeddings from Files?

A: Moveworks employs a chunk-based architecture to process files, dividing your documents into smaller pieces, or “chunks,” to efficiently create embeddings. This architecture especially benefits Moveworks’ retrieval-augmented generation system (RAG), allowing the AI Assistant to generate helpful answers using more contextual, relevant text.
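
To make the idea of chunking concrete, here is a minimal sketch that splits text into overlapping word windows. It is a generic illustration, not the Moveworks chunking algorithm: the function name `chunk_text`, the word-based windows, and the default sizes are assumptions. In a real pipeline each chunk would then be passed to an embedding model.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a little shared context between neighboring chunks, so a sentence that straddles a boundary still appears whole in at least one of them.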

Q: Can File Search comprehend tables?

A: File Search supports text-only table extraction. This means it can perform semantic search over the text in individual cells, but it cannot comprehend row and column relationships.

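  • e.g. for the query “What was the population of California in the year 2000?”, File Search will not be able to return “31,880,000” from the table below, since the answer requires comprehension of rows and columns.

|            | 2000       | 2001       | 2002       |
|------------|------------|------------|------------|
| California | 31,880,000 | 33,990,000 | 36,490,000 |
| Washington | 23,000,000 | 23,000,000 | 23,000,000 |
| Arizona    | 43,000,000 | 43,000,000 | 43,000,000 |
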
  • e.g. the same query, “What was the population of California in the year 2000?”, can work against the table below, since each cell contains enough context on its own.

|            | 2000 | 2001 | 2002 |
|------------|------|------|------|
| California | In 2000, there was a cumulative population of 31,880,000 | In 2001, there was a cumulative population of 31,880,000 | In 2002, there was a cumulative population of 31,880,000 |
| Washington | In 2000, there was a cumulative population of 23,000,000 | In 2001, there was a cumulative population of 23,000,000 | In 2002, there was a cumulative population of 23,000,000 |
| Arizona    | In 2000, there was a cumulative population of 23,000,000 | In 2001, there was a cumulative population of 23,000,000 | In 2002, there was a cumulative population of 23,000,000 |
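
If you maintain tables of bare values, one way to prepare them for cell-level search is to rewrite each cell as a self-contained sentence before publishing the file. The sketch below shows one possible author-side approach. It is not a Moveworks feature, and the function name `contextualize_cells` and the sentence template are assumptions chosen for the example.

```python
def contextualize_cells(
    column_headers,
    rows,
    template="In {column}, {row} had a population of {value}",
):
    """Rewrite bare cell values as sentences that name their row and column.

    rows is a list of (row_label, values) pairs, one value per column header.
    """
    table = []
    for row_label, values in rows:
        if len(values) != len(column_headers):
            raise ValueError(f"row {row_label!r} does not match the header length")
        cells = [
            template.format(column=column, row=row_label, value=value)
            for column, value in zip(column_headers, values)
        ]
        table.append((row_label, cells))
    return table


if __name__ == "__main__":
    headers = ["2000", "2001", "2002"]
    rows = [("California", ["31,880,000", "33,990,000", "36,490,000"])]
    for label, cells in contextualize_cells(headers, rows):
        for cell in cells:
            print(cell)
```

Because every generated cell names both its row and its column, a semantic search over a single cell has the context it needs to match the query.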

Q: Does File Search leverage Location-based personalization (geocode-boosting)?

A: Yes. File Search can currently prioritize results when a location in the file name, e.g. PTO Policies USA.pdf, matches the location of the searching user.
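
The sketch below illustrates the general idea of location-based boosting: results whose file name contains the user’s location are ranked higher. The actual Moveworks ranking logic is not documented here, so the function name `boost_by_location`, the exact-token matching, and the multiplicative boost factor are all assumptions made for illustration.

```python
import re


def boost_by_location(results, user_location, boost=1.5):
    """Re-rank (file_name, score) pairs, boosting names that mention the location.

    A file name matches when the user's location appears as a whole token,
    compared case-insensitively (an assumed, simplified matching rule).
    """
    location = user_location.lower()
    rescored = []
    for file_name, score in results:
        tokens = re.findall(r"[a-z0-9]+", file_name.lower())
        if location in tokens:
            score = score * boost  # hypothetical multiplicative boost
        rescored.append((file_name, score))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = [("PTO Policies UK.pdf", 0.8), ("PTO Policies USA.pdf", 0.7)]
    for file_name, score in boost_by_location(results, "USA"):
        print(file_name, round(score, 2))
```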

Q: Can File Search perform Optical Character Recognition (OCR)?

A: Optical Character Recognition (OCR) is not supported today. As a result, PDFs created from image scans of physical documents are not recommended, as results will not be optimal.

Q: Does File Search currently support non-English files?

A: No. File Search does not currently support non-English files.