Filtering

What This Does

Filtering is the second pipeline stage. It takes raw scraped contacts and applies workspace-specific filter rules to decide which contacts move forward to verification. This reduces volume and ensures only eligible contacts are emailed.

How It Works

After scraping completes, filtering automatically starts. The system:

Loads filter rules configured for the workspace
Applies each rule (age range, coverage type, geography, etc.)
Uses deterministic hash-based sampling for reproducibility — the same contact always gets the same sampling decision
Outputs to the filtered_contacts table
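
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the contact dictionary shape, the `filter_contacts` and `sampling_value` helpers, and the "keep when the hash value is below the sampling rate" comparison are all assumptions made for the example, and the returned list stands in for the filtered_contacts table.

```python
from hashlib import md5
from typing import Callable, Iterable

Contact = dict  # hypothetical shape: {"contact_id": ..., "age": ..., "state": ...}
Rule = Callable[[Contact], bool]


def sampling_value(contact_id) -> float:
    """Map a contact ID to a stable value in [0, 1) using an MD5 hash."""
    hash_input = f"routing_sample_{contact_id}"
    return (int(md5(hash_input.encode()).hexdigest(), 16) % 10000) / 10000.0


def filter_contacts(contacts: Iterable[Contact], rules: list[Rule],
                    sampling_rate: float = 1.0) -> list[Contact]:
    """Return the contacts that pass every rule and the sampling step."""
    kept = []
    for contact in contacts:
        # all() over an empty rule list is True, so no rules means pass-through.
        if not all(rule(contact) for rule in rules):
            continue
        # Assumption: a contact is kept when its hash value is below the rate.
        if sampling_value(contact["contact_id"]) < sampling_rate:
            kept.append(contact)
    return kept


contacts = [
    {"contact_id": 1, "age": 70, "state": "TX"},
    {"contact_id": 2, "age": 52, "state": "TX"},
    {"contact_id": 3, "age": 66, "state": "NY"},
]
rules = [
    lambda c: 65 <= c["age"] <= 80,  # age range rule
    lambda c: c["state"] == "TX",    # geography rule
]
filtered = filter_contacts(contacts, rules)
print([c["contact_id"] for c in filtered])  # [1]
```

Modeling each rule as a predicate keeps the stage easy to extend: adding a rule type means adding one more function to the list.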

Deterministic Sampling

Filtering uses hash-based sampling so results are reproducible:

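from hashlib import md5
hash_input = f"routing_sample_{contact_id}"
# md5() requires bytes, so encode the string; the result falls in [0, 1)
hash_val = (int(md5(hash_input.encode()).hexdigest(), 16) % 10000) / 10000.0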

The hash value depends only on the contact ID, so the same contact always gets the same sampling decision.
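
To make the reproducibility property concrete, here is a small sketch that wraps the hash computation in a function and turns it into a keep/drop decision. The function names and the comparison against the sampling rate (`value < rate`) are assumptions for illustration; the page does not specify how the hash value is compared to the rate.

```python
from hashlib import md5


def sampling_value(contact_id) -> float:
    """Return a stable value in [0, 1) derived from the contact ID."""
    hash_input = f"routing_sample_{contact_id}"
    digest = md5(hash_input.encode("utf-8")).hexdigest()
    return (int(digest, 16) % 10000) / 10000.0


def is_sampled(contact_id, sampling_rate: float) -> bool:
    """Assumed decision rule: keep the contact when its value is below the rate."""
    return sampling_value(contact_id) < sampling_rate


first = sampling_value("c-1001")   # "c-1001" is a made-up contact ID
second = sampling_value("c-1001")
print(first == second)             # True: same ID, same value on every call
print(0.0 <= first < 1.0)          # True: values never reach 1.0
print(is_sampled("c-1001", 1.0))   # True: a 100% rate keeps every contact
```

Because the decision depends only on the ID and the rate, re-running filtering over the same contacts selects the same subset without storing any random state.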

How To Use It

Viewing Filtered Contacts

Go to Contacts in the sidebar
Select the workspace and month
The contacts table shows filtering status for each contact
Use the status filter dropdown to see only filtered contacts

Filter Rules

Filter rules are configured per workspace in Settings → Filter Configuration. Common rules:

Age range (e.g., 65-80)
Coverage type (Medicare Supplement, etc.)
State inclusion/exclusion
Sampling rate (e.g., 50% of eligible contacts)
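
As an illustration of how such a rule set might be represented and evaluated, the sketch below uses a hypothetical `FilterConfig` dataclass. The field names, defaults, and the `is_eligible` helper are assumptions for this example and do not reflect the actual schema behind Settings → Filter Configuration.

```python
from dataclasses import dataclass, field


@dataclass
class FilterConfig:
    """Hypothetical per-workspace rule set; field names are illustrative."""
    age_min: int = 65
    age_max: int = 80
    coverage_types: set = field(default_factory=lambda: {"Medicare Supplement"})
    excluded_states: set = field(default_factory=set)
    sampling_rate: float = 0.5  # share of eligible contacts to keep


def is_eligible(contact: dict, config: FilterConfig) -> bool:
    """Check the age, coverage type, and state rules for one contact."""
    if not (config.age_min <= contact["age"] <= config.age_max):
        return False
    if contact["coverage_type"] not in config.coverage_types:
        return False
    return contact["state"] not in config.excluded_states


config = FilterConfig(excluded_states={"NY"})
in_range = {"age": 70, "coverage_type": "Medicare Supplement", "state": "TX"}
excluded = {"age": 70, "coverage_type": "Medicare Supplement", "state": "NY"}
too_young = {"age": 60, "coverage_type": "Medicare Supplement", "state": "TX"}

print(is_eligible(in_range, config))   # True
print(is_eligible(excluded, config))   # False: state is excluded
print(is_eligible(too_young, config))  # False: outside the age range
```

The sampling rate is kept separate from the eligibility check because it applies only to contacts that have already passed the other rules.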

Common Issues

Symptom	Cause	Fix
0 contacts after filtering	Filter rules too restrictive	Check the rules in Settings → Filter Configuration and loosen them. Missing rules are not the cause: if no rules exist, filtering passes everything through
Filtering takes too long	Large batch (>50K contacts)	Normal for large batches. Check queue depth on the Status Page; the filtering queue runs with a concurrency of 2
Different results after re-run	Contact IDs changed, typically because the contacts were re-scraped	Expected for new scrapes: the hash is derived from the contact ID, so a new ID gets a new sampling decision. With unchanged IDs, sampling is deterministic and results should not differ

Related Alerts

PipelineSuccessRateSLOBreach: Filtering failures count toward the overall pipeline SLO
CeleryHighFailureRate: Check whether filtering tasks are failing
