User Ingestion Processors Guide
Processors are data transformation tools that help clean, filter, and enrich user data during identity ingestion.
Overview
Processors are optional data transformation steps in your User Identity Flow. They clean, filter, and enhance user data during ingestion from source systems (Okta, Active Directory, Workday, etc.).
Table of Contents
- Quick Reference
- How to Configure
- Available Processors
- Best Practices
- Common Scenarios
- Troubleshooting
- Rule Syntax Reference
- Limitations
Quick Reference
| Processor | Use Case |
|---|---|
| User Filter Processor | Remove users matching specific field values |
| Filter Rule Post Processor | Complex filtering with multiple conditions |
| User Timezone Processor | Auto-populate timezone from location |
| User Password Meta Info Processor | Fill missing password expiration dates |
| User Geocode Processor | Add coordinates for analytics and location-based content retrieval |
| DSL First Match Dedupe Processor | Deduplicate users across sources |
| Unified Resolve Manager Processor | Link manager-employee hierarchy |
How to Configure
Navigation
- Navigate to Import Users
- Select your source(s) on the Connectors page
- Proceed to Configure Selected Sources
- Click Advanced Mode
In Advanced Mode
Processors to Apply: Add transformation processors that run during ingestion. Processors execute in the order listed.
Filter and Attribute List: Control which records and fields are imported at the source level before any processors run.
Available Processors
Filter Users by Field Value
Processor Name: User Filter Processor
Excludes users from ingestion when a specified field matches any value in your exclusion list. This is a simple, single-field filter that performs exact matching.
Use Cases: Exclude terminated employees, contractors, test accounts, or specific departments based on a single field value.
Configuration:
| Field | Description | Example |
|---|---|---|
| Filter Key | User field to check | employment_status |
| Filter List | Values to exclude (comma-separated) | Terminated, Inactive |
Examples:
Exclude inactive employees:
Filter Key: employment_status
Filter List: Terminated, Inactive, On Leave
Exclude non-employees:
Filter Key: user_type
Filter List: Contractor, Temp, External
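The exact-match behavior described above can be sketched in Python. This is a minimal illustration, not the processor's actual implementation: the function name apply_user_filter and the dictionary-based user records are assumptions made for the example.

```python
def apply_user_filter(users, filter_key, filter_list):
    """Drop users whose filter_key value exactly matches an excluded value.

    Hypothetical helper: mirrors the Filter Key / Filter List settings.
    """
    excluded = set(filter_list)
    return [u for u in users if u.get(filter_key) not in excluded]


users = [
    {"email_addr": "a@company.com", "employment_status": "Active"},
    {"email_addr": "b@company.com", "employment_status": "Terminated"},
    {"email_addr": "c@company.com", "employment_status": "On Leave"},
]

# "On Leave" is not in the exclusion list, so that user is kept.
kept = apply_user_filter(users, "employment_status", ["Terminated", "Inactive"])
print([u["email_addr"] for u in kept])
```

Because matching is exact, a value that differs from a Filter List entry in any way is not excluded in this sketch.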
Filter Users by Rule
Processor Name: Filter Rule Post Processor
Filters users during ingestion based on complex conditional logic: users that satisfy the Filter Condition are kept, and all others are excluded. Unlike the simple Field Value filter, this processor allows you to combine multiple field conditions using AND/OR logic, perform date comparisons, and apply sophisticated filtering rules.
Use Cases: Apply multi-condition filtering (e.g., "active AND hired after date"), date-based filtering, or any logic requiring multiple field comparisons.
Configuration:
| Field | Description |
|---|---|
| Filter Condition (DSL) | Rule determining which users to keep |
Examples:
Keep only active employees:
employment_status == "Active"
Active employees in specific departments:
employment_status == "Active" AND department IN ["Engineering", "Sales"]
Exclude users without company email:
email_addr CONTAINS "@company.com"
Keep only users with employee IDs:
employee_id != ""
💡 Tip: See Rule Syntax Reference for complete syntax.
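To make the keep semantics concrete, here is a rough Python equivalent of the "active employees in specific departments" rule. It is only a sketch of how such a rule reads as a predicate; the function name keep_user and the sample records are hypothetical, and the real processor evaluates the DSL text directly.

```python
def keep_user(user):
    # Python reading of:
    # employment_status == "Active" AND department IN ["Engineering", "Sales"]
    return (
        user.get("employment_status") == "Active"
        and user.get("department") in ["Engineering", "Sales"]
    )


users = [
    {"email_addr": "a@company.com", "employment_status": "Active", "department": "Sales"},
    {"email_addr": "b@company.com", "employment_status": "Active", "department": "Legal"},
    {"email_addr": "c@company.com", "employment_status": "Terminated", "department": "Sales"},
]

# Only users for which the rule is true remain in the roster.
kept = [u for u in users if keep_user(u)]
print([u["email_addr"] for u in kept])
```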
Remove Duplicate Users
Processor Name: DSL First Match Dedupe Processor
When the same user appears multiple times (identified by your Index Key, typically email), this processor evaluates all duplicate records and keeps only the first one that matches your filter condition. All other duplicates are discarded. This operates across all sources after they're merged together.
Use Cases: Multiple integrations provide overlapping users, need to select which source's data to prioritize, ensure each user appears only once in final roster.
⚠️ Note: Can be attached to any source - operates on merged data from all sources after ingestion.
Configuration:
| Field | Description | Example |
|---|---|---|
| Index Key | Field to identify duplicates | email_addr |
| Filter Condition (DSL) | Rule to select which duplicate to keep | record.employee_id != "" |
| Lowercase | Convert index key to lowercase | true (recommended for emails) |
Common Rules:
Prefer active users:
record.employment_status == "Active"
Prefer records with employee ID:
record.employee_id != ""
⚠️ Important: Always set Lowercase to true when using email_addr as Index Key.
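The first-match behavior can be sketched as follows. This is an illustrative approximation under stated assumptions: the function name first_match_dedupe, the source field, and the fallback to the first record when no duplicate satisfies the condition are all hypothetical, since the guide does not describe what happens when nothing matches.

```python
def first_match_dedupe(records, index_key, condition, lowercase=True):
    """Group records by index_key and keep the first one matching condition."""
    groups = {}
    for record in records:
        key = record[index_key]
        if lowercase:
            key = key.lower()  # "Jane@Company.com" and "jane@company.com" collide
        groups.setdefault(key, []).append(record)

    result = []
    for group in groups.values():
        match = next((r for r in group if condition(r)), None)
        # Assumption: keep the first record if no duplicate matches the rule.
        result.append(match if match is not None else group[0])
    return result


records = [
    {"source": "okta", "email_addr": "Jane@Company.com", "employee_id": ""},
    {"source": "workday", "email_addr": "jane@company.com", "employee_id": "E123"},
    {"source": "okta", "email_addr": "bob@company.com", "employee_id": ""},
]

# Python reading of the rule: record.employee_id != ""
deduped = first_match_dedupe(records, "email_addr", lambda r: r["employee_id"] != "")
print([(r["source"], r["email_addr"]) for r in deduped])
```

With lowercase set to False in this sketch, the two Jane records would land in different groups and both would survive, which is why lowercasing matters for email keys.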
Set User Timezone
Processor Name: User Timezone Processor
Automatically infers and populates the user's timezone field by analyzing their location information (city, state, country). The processor uses geographic data to determine the most likely timezone for each user's location.
Use Cases: Source system doesn't provide timezone field, need consistent timezone data for time-based notifications and scheduling.
Configuration: No configuration needed - just add the processor. It automatically reads from standard location fields.
Calculate Password Expiration
Processor Name: User Password Meta Info Processor
Fills in missing password date information using your organization's password policy configuration. This processor operates on two fields in the user record: password_last_changed and password_expires.
What Fields It Uses:
- Input fields: password_last_changed (date), password_expires (date)
- Password policy: Uses your org's configured password_expiry_in_days setting
- Output: Populates whichever field is missing
How It Works:
| Scenario | What It Does | Calculation |
|---|---|---|
| password_last_changed exists, password_expires empty | Calculates expiry date | password_expires = password_last_changed + password_expiry_in_days |
| password_expires exists, password_last_changed empty | Calculates last changed date | password_last_changed = password_expires - password_expiry_in_days |
| Both fields populated | No action taken | (already complete) |
| Both fields empty | No action taken | (insufficient data) |
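The four scenarios in the table map onto a short Python sketch. It assumes user records are dictionaries holding datetime.date values and uses a hypothetical function name, fill_password_dates; it is not the processor's actual code.

```python
from datetime import date, timedelta


def fill_password_dates(user, password_expiry_in_days):
    """Populate whichever password date is missing, if exactly one is present."""
    window = timedelta(days=password_expiry_in_days)
    changed = user.get("password_last_changed")
    expires = user.get("password_expires")

    if changed and not expires:
        user["password_expires"] = changed + window
    elif expires and not changed:
        user["password_last_changed"] = expires - window
    # Both populated or both empty: no action taken.
    return user


only_changed = fill_password_dates({"password_last_changed": date(2024, 1, 1)}, 90)
only_expires = fill_password_dates({"password_expires": date(2024, 3, 31)}, 90)
neither = fill_password_dates({}, 90)

print(only_changed["password_expires"])       # 2024-03-31
print(only_expires["password_last_changed"])  # 2024-01-01
```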
Configuration:
| Field | Description | Default | Use Case |
|---|---|---|---|
| Offset Days | Adjustment to password policy duration | 0 | Use if source system's policy differs from org config (e.g., +5 or -5 days) |
Use Cases:
- Source provides only one of the two password fields
- Need complete password data for expiry notifications and password reset workflows
- Source system password policy differs slightly from Moveworks org configuration
Add Location Coordinates
Processor Name: User Geocode Processor
Enriches user records with geographic coordinates (latitude/longitude) by geocoding their location information. The processor constructs a location query from specified fields, sends it to a geocoding service, and adds the resulting coordinates to the user's geocodes field.
What Fields It Uses:
- Input: Any combination of location fields you specify (typically country_code, state, city)
- Output: Populates the geocodes field with latitude/longitude data
Use Cases:
- Enable location-based analytics and reporting
- Support features that require geographic coordinates
- Enrich user profiles with precise location data
⚠️ Important: Attach to the source that contains the location fields you want to geocode.
Performance Note: Makes external API calls for geocoding - may slow ingestion for large user sets.
Configuration:
| Field | Description | Example |
|---|---|---|
| Location Fields | Fields for geocoding | country_code, state, city |
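As a rough illustration of how a location query might be assembled from the configured Location Fields, consider the sketch below. The function name build_location_query, the comma separator, and the choice to skip empty fields are assumptions; the guide does not specify the query format or the geocoding service, so no service call is shown.

```python
def build_location_query(user, location_fields):
    """Join the non-empty configured fields into a single query string."""
    parts = [str(user[f]).strip() for f in location_fields if user.get(f)]
    return ", ".join(parts)


fields = ["country_code", "state", "city"]

full = build_location_query(
    {"country_code": "US", "state": "CA", "city": "Mountain View"}, fields
)
partial = build_location_query({"country_code": "US", "city": "Austin"}, fields)

print(full)     # US, CA, Mountain View
print(partial)  # US, Austin
```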
Resolve Manager Relationships
Processor Name: Unified Resolve Manager Processor
Establishes manager-employee relationships by resolving manager email addresses to internal user IDs. This processor builds an index of all users (email → ID), then replaces each user's manager_email field value with the corresponding manager's internal ID, enabling proper organizational hierarchy.
What Fields It Uses:
- Input: manager_email (manager's email address)
- Index built from: email_addr (all users' emails)
- Output: Replaces manager_email value with manager's internal identifier
How It Works:
- Builds an index mapping every user's email address to their internal ID
- For each user record, looks up their manager_email in the index
- Replaces the email with the manager's internal ID
- Result: Proper manager-employee links throughout the organization
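The steps above can be sketched in Python. The id field name, the function name resolve_managers, and the choice to leave an unresolved manager_email untouched are assumptions for illustration; the guide does not name the internal ID field or describe the unresolved case.

```python
def resolve_managers(users):
    """Replace each manager_email with the manager's internal ID when known."""
    # Step 1: index every user's email address to their internal ID.
    email_to_id = {u["email_addr"]: u["id"] for u in users}

    # Steps 2-3: look up each manager_email and swap in the manager's ID.
    for user in users:
        manager_email = user.get("manager_email")
        if manager_email in email_to_id:
            user["manager_email"] = email_to_id[manager_email]
        # Assumption: emails with no matching user are left as-is.
    return users


users = [
    {"id": "u-1", "email_addr": "ceo@company.com", "manager_email": None},
    {"id": "u-2", "email_addr": "vp@company.com", "manager_email": "ceo@company.com"},
    {"id": "u-3", "email_addr": "ic@company.com", "manager_email": "gone@company.com"},
]

resolved = resolve_managers(users)
print([u["manager_email"] for u in resolved])
```

The sketch also shows why deduplication should run first: if two records share an email, the index keeps only one ID for it, and which one wins depends on record order.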
Use Cases:
- Source provides manager email instead of manager ID
- Need to build organizational reporting hierarchy
- Manager data comes from different source than employee data
⚠️ Note: Can be attached to any source - operates on all users after merge. Add AFTER deduplication to ensure manager links resolve correctly.
Configuration: No configuration needed - just add the processor.
Best Practices
1. Filter Early
Add filter processors before enrichment (like geocoding) to reduce processing time.
✅ Good Order:
1. Filter Users by Field Value (remove terminated)
2. Set User Timezone
3. Add Location Coordinates
❌ Bad Order:
1. Add Location Coordinates (slow)
2. Filter Users by Field Value (wastes processing)
2. Deduplicate Before Manager Resolution
If using both processors, always apply deduplication first.
✅ Correct Order:
1. Remove Duplicate Users
2. Resolve Manager Relationships
❌ Incorrect Order:
1. Resolve Manager Relationships
2. Remove Duplicate Users
3. Use Lowercase for Email Deduplication
When deduplicating by email, always set Lowercase to true.
✅ Correct:
Index Key: email_addr
Lowercase: true
4. Attach Geocode to Source with Location Data
Add the geocode processor to the source that has location fields (country_code, state, city).
5. Test with Sample Data First
- Configure processor on test integration
- Run ingestion with small sample
- Verify results match expectations
- Apply to production
Common Scenarios
Scenario 1: Basic Filtering
Goal: Exclude terminated and inactive users from Okta.
Steps:
- Import Users → Select Okta → Advanced Mode
- In Processors to Apply, add: Filter Users by Field Value
- Configure: Filter Key: employment_status, Filter List: Terminated, Inactive
Scenario 2: Multi-Source with Deduplication
Goal: Use both Okta and Workday, preferring records with employee IDs.
Okta Source:
- Import Users → Select Okta → Advanced Mode
- Add: Set User Timezone
Workday Source:
- Import Users → Select Workday → Advanced Mode
- Add: Filter Users by Field Value
- Configure: Filter Key: worker_type, Filter List: Contractor, Temp
Either Source (Deduplication):
- Add: Remove Duplicate Users
- Index Key: email_addr
- Filter Condition: record.employee_id != ""
- Lowercase: true
Scenario 3: Complex Filtering
Goal: Keep only active, full-time employees with company email addresses.
Steps:
- Import Users → Select source → Advanced Mode
- Add: Filter Users by Rule
- Configure Filter Condition:
employment_status == "Active" AND employment_type == "Full-time" AND email_addr CONTAINS "@company.com"
Scenario 4: Manager Hierarchy
Goal: Establish manager relationships when source provides manager emails.
Steps:
- Import Users → Select any source → Advanced Mode
- Add: Resolve Manager Relationships (no configuration needed)
Note: Add AFTER any deduplication processors.
Troubleshooting
❌ Too many users filtered out
Solution:
- Review filter conditions and test with small sample
- Verify field names match source data exactly (case-sensitive)
- Check logical operators match intent (AND vs OR)
❌ Duplicate users still appearing
Check:
- ✓ Lowercase set to true for email-based deduplication
- ✓ Index Key matches field name exactly (case-sensitive)
- ✓ Filter condition correctly identifies preferred record
❌ Manager relationships not working
Check:
- ✓ Manager processor added AFTER deduplication
- ✓ Manager emails exist in ingested user data
- ✓ Manager email field populated in source data
❌ Rule syntax error
Check:
- ✓ Field names match exactly (case-sensitive)
- ✓ Strings in quotes: "value" not value
- ✓ Lists use brackets: ["value1", "value2"]
Rule Syntax Reference
Filter Users by Rule
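Use field names directly; no prefix is needed. (The Remove Duplicate Users examples in this guide, by contrast, use the record. prefix, as in record.employee_id.)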
Basic Comparisons
field_name == "value" # Equal to
field_name != "value" # Not equal to
field_name > 100 # Greater than
field_name >= 100 # Greater than or equal
field_name < 100 # Less than
field_name <= 100 # Less than or equal
List Operations
field_name IN ["value1", "value2"] # Field is in list
field_name NOT IN ["value1", "value2"] # Field is not in list
Text Matching
field_name CONTAINS "text" # Contains substring
field_name STARTS_WITH "text" # Starts with text
field_name ENDS_WITH "text" # Ends with text
Combining Conditions
condition1 AND condition2 # Both must be true
condition1 OR condition2 # Either must be true
NOT condition # Opposite/negation
Examples
# Keep active employees
employment_status == "Active"
# Active employees in specific departments
employment_status == "Active" AND department IN ["Engineering", "Sales"]
# Users with company email
email_addr CONTAINS "@company.com"
Limitations
Processor Limits:
- Maximum 20 processors per integration source
- Processors run in configured order
- No processor loops or conditional execution
Rule Constraints:
- Field names are case-sensitive
- Changes require running ingestion to take effect
Performance Considerations:
- Geocoding processors make external API calls (slower)
- Large filter lists may impact performance
- Test with sample data before full ingestion