User Ingestion Processors Guide

Processors are data transformation tools that help clean, filter, and enrich user data during identity ingestion.

Overview

Processors are optional data transformation steps in your User Identity Flow. They clean, filter, and enhance user data during ingestion from source systems (Okta, Active Directory, Workday, etc.).


Table of Contents

  1. Quick Reference
  2. How to Configure
  3. Available Processors
  4. Best Practices
  5. Common Scenarios
  6. Troubleshooting
  7. Rule Syntax Reference
  8. Limitations

Quick Reference

  • User Filter Processor: Remove users matching specific field values
  • Filter Rule Post Processor: Complex filtering with multiple conditions
  • User Timezone Processor: Auto-populate timezone from location
  • User Password Meta Info Processor: Fill missing password expiration dates
  • User Geocode Processor: Add coordinates for analytics and location-based content retrieval
  • DSL First Match Dedupe Processor: Deduplicate users across sources
  • Unified Resolve Manager Processor: Link manager-employee hierarchy

How to Configure

Navigation

  1. Navigate to Import Users
  2. Select your source(s) on the Connectors page
  3. Proceed to Configure Selected Sources
  4. Click Advanced Mode

In Advanced Mode

Processors to Apply: Add transformation processors that run during ingestion. Processors execute in the order listed.

Filter and Attribute List: Control which records and fields are imported at the source level before any processors run.


Available Processors

Filter Users by Field Value

Processor Name: User Filter Processor

Excludes users from ingestion when a specified field matches any value in your exclusion list. This is a simple, single-field filter that performs exact matching.

Use Cases: Exclude terminated employees, contractors, test accounts, or specific departments based on a single field value.

Configuration:

  • Filter Key: User field to check (example: employment_status)
  • Filter List: Values to exclude, comma-separated (example: Terminated, Inactive)

Examples:

Exclude inactive employees:
  Filter Key: employment_status
  Filter List: Terminated, Inactive, On Leave

Exclude non-employees:
  Filter Key: user_type
  Filter List: Contractor, Temp, External
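
The exclusion check is a simple exact match against the Filter List. The following is a minimal Python sketch of the behavior only (the list-of-dicts record format is an assumption for illustration, not the actual ingestion data model):

  def apply_user_filter(records, filter_key, filter_list):
      # Drop any record whose filter_key value exactly matches an excluded value.
      excluded = set(filter_list)
      return [r for r in records if r.get(filter_key) not in excluded]

  users = [
      {"email_addr": "a@company.com", "employment_status": "Active"},
      {"email_addr": "b@company.com", "employment_status": "Terminated"},
  ]
  active_users = apply_user_filter(users, "employment_status", ["Terminated", "Inactive", "On Leave"])
  # active_users now contains only the first record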

Filter Users by Rule

Processor Name: Filter Rule Post Processor

Excludes users from ingestion based on complex conditional logic. Unlike the simple Field Value filter, this processor allows you to combine multiple field conditions using AND/OR logic, perform date comparisons, and apply sophisticated filtering rules.

Use Cases: Apply multi-condition filtering (e.g., "active AND hired after date"), date-based filtering, or any logic requiring multiple field comparisons.

Configuration:

  • Filter Condition (DSL): Rule determining which users to keep

Examples:

Keep only active employees:
  employment_status == "Active"

Active employees in specific departments:
  employment_status == "Active" AND department IN ["Engineering", "Sales"]

Keep only users with a company email address:
  email_addr CONTAINS "@company.com"

Keep only users with employee IDs:
  employee_id != ""

💡 Tip: See Rule Syntax Reference for complete syntax.
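
Conceptually, the filter condition is evaluated per record and only matching records are kept. As a rough illustration, the second example above behaves like the Python predicate below (a sketch of the semantics only, not the DSL engine itself):

  users = [
      {"employment_status": "Active", "department": "Engineering"},
      {"employment_status": "Active", "department": "Finance"},
      {"employment_status": "Terminated", "department": "Sales"},
  ]

  def keep(record):
      # employment_status == "Active" AND department IN ["Engineering", "Sales"]
      return (record.get("employment_status") == "Active"
              and record.get("department") in ["Engineering", "Sales"])

  kept = [u for u in users if keep(u)]   # keeps only the first record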


Remove Duplicate Users

Processor Name: DSL First Match Dedupe Processor

When the same user appears multiple times (identified by your Index Key, typically email), this processor evaluates all duplicate records and keeps only the first one that matches your filter condition. All other duplicates are discarded. This operates across all sources after they're merged together.

Use Cases: Multiple integrations provide overlapping users, you need to choose which source's data to prioritize, or you need each user to appear only once in the final roster.

⚠️ Note: Can be attached to any source - operates on merged data from all sources after ingestion.

Configuration:

  • Index Key: Field used to identify duplicates (example: email_addr)
  • Filter Condition (DSL): Rule to select which duplicate to keep (example: record.employee_id != "")
  • Lowercase: Convert the index key value to lowercase (example: true; recommended for emails)

Common Rules:

Prefer active users:
  record.employment_status == "Active"

Prefer records with employee ID:
  record.employee_id != ""

⚠️ Important: Always set Lowercase to true when using email_addr as Index Key.
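
The selection logic can be pictured as: group records by the (optionally lowercased) Index Key, then keep the first record in each group that satisfies the Filter Condition. Below is a hedged Python sketch of that behavior; what happens when no duplicate matches is not documented here, so falling back to the first record is an assumption:

  def dedupe_first_match(records, index_key, predicate, lowercase=True):
      groups = {}
      for record in records:
          key = record.get(index_key, "")
          if lowercase:
              key = key.lower()
          groups.setdefault(key, []).append(record)
      kept = []
      for duplicates in groups.values():
          # First duplicate matching the filter condition wins; fall back to the first record (assumption).
          kept.append(next((r for r in duplicates if predicate(r)), duplicates[0]))
      return kept

  users = [
      {"email_addr": "Jane@Company.com", "employee_id": ""},
      {"email_addr": "jane@company.com", "employee_id": "E123"},
  ]
  # Equivalent of Filter Condition: record.employee_id != ""
  deduped = dedupe_first_match(users, "email_addr", lambda r: r.get("employee_id", "") != "")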


Set User Timezone

Processor Name: User Timezone Processor

Automatically infers and populates the user's timezone field by analyzing their location information (city, state, country). The processor uses geographic data to determine the most likely timezone for each user's location.

Use Cases: The source system doesn't provide a timezone field, or you need consistent timezone data for time-based notifications and scheduling.

Configuration: No configuration needed - just add the processor. It automatically reads from standard location fields.
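
Behaviorally, the processor maps a user's location fields to a timezone, roughly as in the toy sketch below; the lookup table is purely hypothetical, and the real processor's geographic data and precedence rules are not documented here:

  # Hypothetical lookup table; the real processor uses richer geographic data.
  LOCATION_TO_TZ = {
      ("US", "San Francisco"): "America/Los_Angeles",
      ("DE", "Berlin"): "Europe/Berlin",
  }

  def set_timezone(record):
      key = (record.get("country_code"), record.get("city"))
      if not record.get("timezone") and key in LOCATION_TO_TZ:
          record["timezone"] = LOCATION_TO_TZ[key]
      return record

  set_timezone({"country_code": "DE", "city": "Berlin", "timezone": ""})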


Calculate Password Expiration

Processor Name: User Password Meta Info Processor

Fills in missing password date information using your organization's password policy configuration. This processor operates on two fields in the user record: password_last_changed and password_expires.

What Fields It Uses:

  • Input fields: password_last_changed (date), password_expires (date)
  • Password policy: Uses your org's configured password_expiry_in_days setting
  • Output: Populates whichever field is missing

How It Works:

  • password_last_changed exists, password_expires empty: calculates the expiry date (password_expires = password_last_changed + password_expiry_in_days)
  • password_expires exists, password_last_changed empty: calculates the last-changed date (password_last_changed = password_expires - password_expiry_in_days)
  • Both fields populated: no action taken (already complete)
  • Both fields empty: no action taken (insufficient data)
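
The calculation itself is a simple date offset. The Python snippet below is a minimal illustration of the scenarios above (password_expiry_in_days comes from your org's password policy; the exact field formats are an assumption):

  from datetime import date, timedelta

  def fill_password_dates(record, password_expiry_in_days, offset_days=0):
      window = timedelta(days=password_expiry_in_days + offset_days)
      last_changed = record.get("password_last_changed")
      expires = record.get("password_expires")
      if last_changed and not expires:
          record["password_expires"] = last_changed + window
      elif expires and not last_changed:
          record["password_last_changed"] = expires - window
      # Both present or both missing: no action taken.
      return record

  user = {"password_last_changed": date(2024, 1, 15), "password_expires": None}
  fill_password_dates(user, password_expiry_in_days=90)   # password_expires -> 2024-04-14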

Configuration:

  • Offset Days: Adjustment to the password policy duration (default: 0). Use if the source system's policy differs from the org config (e.g., +5 or -5 days).

Use Cases:

  • Source provides only one of the two password fields
  • Need complete password data for expiry notifications and password reset workflows
  • Source system password policy differs slightly from Moveworks org configuration

Add Location Coordinates

Processor Name: User Geocode Processor

Enriches user records with geographic coordinates (latitude/longitude) by geocoding their location information. The processor constructs a location query from specified fields, sends it to a geocoding service, and adds the resulting coordinates to the user's geocodes field.

What Fields It Uses:

  • Input: Any combination of location fields you specify (typically country_code, state, city)
  • Output: Populates geocodes field with latitude/longitude data

Use Cases:

  • Enable location-based analytics and reporting
  • Support features that require geographic coordinates
  • Enrich user profiles with precise location data

⚠️ Important: Attach to the source that contains the location fields you want to geocode.

Performance Note: Makes external API calls for geocoding - may slow ingestion for large user sets.

Configuration:

  • Location Fields: Fields used to build the geocoding query (example: country_code, state, city)
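
In effect, the processor joins the configured fields into a query string, calls a geocoding service, and writes the result into geocodes. The sketch below is only illustrative; geocode_lookup is a placeholder, and the actual service, query format, and geocodes schema are not documented here:

  def geocode_lookup(query):
      # Stand-in for the external geocoding API call.
      return (37.7749, -122.4194)

  def geocode_user(record, location_fields):
      # Build a query such as "US, CA, San Francisco" from the configured fields.
      parts = [record.get(f, "") for f in location_fields]
      query = ", ".join(p for p in parts if p)
      if query:
          lat, lon = geocode_lookup(query)
          record["geocodes"] = {"latitude": lat, "longitude": lon}
      return record

  geocode_user({"country_code": "US", "state": "CA", "city": "San Francisco"},
               ["country_code", "state", "city"])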

Resolve Manager Relationships

Processor Name: Unified Resolve Manager Processor

Establishes manager-employee relationships by resolving manager email addresses to internal user IDs. This processor builds an index of all users (email → ID), then replaces each user's manager_email field value with the corresponding manager's internal ID, enabling proper organizational hierarchy.

What Fields It Uses:

  • Input: manager_email (manager's email address)
  • Index built from: email_addr (all users' emails)
  • Output: Replaces manager_email value with manager's internal identifier

How It Works:

  1. Builds an index mapping every user's email address to their internal ID
  2. For each user record, looks up their manager_email in the index
  3. Replaces the email with the manager's internal ID
  4. Result: Proper manager-employee links throughout the organization
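
A compact Python sketch of that two-pass approach follows (the id field name, lowercase matching, and leaving unresolved manager emails untouched are assumptions):

  def resolve_managers(users):
      # Pass 1: index every user's email to their internal ID.
      email_to_id = {u["email_addr"].lower(): u["id"] for u in users if u.get("email_addr")}
      # Pass 2: replace manager_email with the manager's internal ID when it resolves.
      for u in users:
          manager_email = (u.get("manager_email") or "").lower()
          if manager_email in email_to_id:
              u["manager_email"] = email_to_id[manager_email]
      return users

  users = [
      {"id": "u1", "email_addr": "ceo@company.com", "manager_email": ""},
      {"id": "u2", "email_addr": "eng@company.com", "manager_email": "CEO@company.com"},
  ]
  resolve_managers(users)   # u2's manager_email becomes "u1"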

Use Cases:

  • Source provides manager email instead of manager ID
  • Need to build organizational reporting hierarchy
  • Manager data comes from different source than employee data

⚠️ Note: Can be attached to any source - operates on all users after merge. Add AFTER deduplication to ensure manager links resolve correctly.

Configuration: No configuration needed - just add the processor.


Best Practices

1. Filter Early

Add filter processors before enrichment (like geocoding) to reduce processing time.

✅ Good Order:
  1. Filter Users by Field Value (remove terminated)
  2. Set User Timezone
  3. Add Location Coordinates

❌ Bad Order:
  1. Add Location Coordinates (slow)
  2. Filter Users by Field Value (wastes processing)

2. Deduplicate Before Manager Resolution

If using both processors, always apply deduplication first.

✅ Correct Order:
  1. Remove Duplicate Users
  2. Resolve Manager Relationships

❌ Incorrect Order:
  1. Resolve Manager Relationships
  2. Remove Duplicate Users

3. Use Lowercase for Email Deduplication

When deduplicating by email, always set Lowercase to true.

✅ Correct:
  Index Key: email_addr
  Lowercase: true

4. Attach Geocode to Source with Location Data

Add the geocode processor to the source that has location fields (country_code, state, city).

5. Test with Sample Data First

  1. Configure processor on test integration
  2. Run ingestion with small sample
  3. Verify results match expectations
  4. Apply to production

Common Scenarios

Scenario 1: Basic Filtering

Goal: Exclude terminated and inactive users from Okta.

Steps:

  1. Import Users → Select Okta → Advanced Mode
  2. In Processors to Apply, add: Filter Users by Field Value
  3. Configure: Filter Key: employment_status, Filter List: Terminated, Inactive

Scenario 2: Multi-Source with Deduplication

Goal: Use both Okta and Workday, preferring records with employee IDs.

Okta Source:

  1. Import Users → Select Okta → Advanced Mode
  2. Add: Set User Timezone

Workday Source:

  1. Import Users → Select Workday → Advanced Mode
  2. Add: Filter Users by Field Value
    • Filter Key: worker_type, Filter List: Contractor, Temp

Either Source (Deduplication):

  1. Add: Remove Duplicate Users
    • Index Key: email_addr
    • Filter Condition: record.employee_id != ""
    • Lowercase: true

Scenario 3: Complex Filtering

Goal: Keep only active, full-time employees with company email addresses.

Steps:

  1. Import Users → Select source → Advanced Mode
  2. Add: Filter Users by Rule
  3. Configure Filter Condition:
    employment_status == "Active" AND employment_type == "Full-time" AND email_addr CONTAINS "@company.com"

Scenario 4: Manager Hierarchy

Goal: Establish manager relationships when source provides manager emails.

Steps:

  1. Import Users → Select any source → Advanced Mode
  2. Add: Resolve Manager Relationships (no configuration needed)

Note: Add AFTER any deduplication processors.


Troubleshooting

❌ Too many users filtered out

Solution:

  • Review filter conditions and test with small sample
  • Verify field names match source data exactly (case-sensitive)
  • Check logical operators match intent (AND vs OR)

❌ Duplicate users still appearing

Check:

  • ✓ Lowercase set to true for email-based deduplication
  • ✓ Index Key matches field name exactly (case-sensitive)
  • ✓ Filter condition correctly identifies preferred record

❌ Manager relationships not working

Check:

  • ✓ Manager processor added AFTER deduplication
  • ✓ Manager emails exist in ingested user data
  • ✓ Manager email field populated in source data

❌ Rule syntax error

Check:

  • ✓ Field names match exactly (case-sensitive)
  • ✓ Strings in quotes: "value" not value
  • ✓ Lists use brackets: ["value1", "value2"]

Rule Syntax Reference

Filter Users by Rule

Use field names directly; no prefix is needed (unlike the Remove Duplicate Users filter condition, which references fields with the record. prefix).

Basic Comparisons

field_name == "value"         # Equal to
field_name != "value"         # Not equal to
field_name > 100              # Greater than
field_name >= 100             # Greater than or equal
field_name < 100              # Less than
field_name <= 100             # Less than or equal

List Operations

field_name IN ["value1", "value2"]          # Field is in list
field_name NOT IN ["value1", "value2"]      # Field is not in list

Text Matching

field_name CONTAINS "text"         # Contains substring
field_name STARTS_WITH "text"      # Starts with text
field_name ENDS_WITH "text"        # Ends with text

Combining Conditions

condition1 AND condition2          # Both must be true
condition1 OR condition2           # Either must be true
NOT condition                      # Opposite/negation

Examples

# Keep active employees
employment_status == "Active"

# Active employees in specific departments
employment_status == "Active" AND department IN ["Engineering", "Sales"]

# Users with company email
email_addr CONTAINS "@company.com"
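
To sanity-check a rule against a sample record before running ingestion, the same logic can be mirrored as ordinary Python expressions. This is only a way to reason about a rule locally; the DSL itself is evaluated by the ingestion pipeline:

  sample = {"employment_status": "Active", "department": "Sales",
            "email_addr": "jane@company.com"}

  # employment_status == "Active" AND department IN ["Engineering", "Sales"]
  rule_1 = sample["employment_status"] == "Active" and sample["department"] in ["Engineering", "Sales"]

  # email_addr CONTAINS "@company.com"
  rule_2 = "@company.com" in sample["email_addr"]

  # NOT (department IN ["Interns", "Vendors"])
  rule_3 = sample["department"] not in ["Interns", "Vendors"]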

Limitations

Processor Limits:

  • Maximum 20 processors per integration source
  • Processors run in configured order
  • No processor loops or conditional execution

Rule Constraints:

  • Field names are case-sensitive
  • Rule changes take effect only on the next ingestion run

Performance Considerations:

  • Geocoding processors make external API calls (slower)
  • Large filter lists may impact performance
  • Test with sample data before full ingestion