Filtering
How filter rules work, sampling logic, and reviewing filtered contacts
What This Does
Filtering is the second pipeline stage. It takes raw scraped contacts and applies workspace-specific filter rules to decide which contacts move forward to verification. This reduces volume and ensures only eligible contacts are emailed.
How It Works
After scraping completes, filtering automatically starts. The system:
- Loads filter rules configured for the workspace
- Applies each rule (age range, coverage type, geography, etc.)
- Uses deterministic hash-based sampling for reproducibility — the same contact always gets the same sampling decision
- Outputs to the `filtered_contacts` table
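The steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Contact` type, the rule names, and `run_filtering` are hypothetical, and the sampling step is passed in as a plain function so the sketch makes no claim about how the real hash is computed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Contact:
    # Hypothetical shape of a scraped contact; real field names may differ.
    contact_id: str
    age: int
    state: str


Rule = Callable[[Contact], bool]


def run_filtering(
    contacts: Iterable[Contact],
    rules: dict[str, Rule],
    keep_in_sample: Callable[[Contact], bool],
) -> tuple[list[Contact], Counter]:
    """Apply every rule, then sampling; return survivors and rejection counts."""
    passed: list[Contact] = []
    rejected_by: Counter = Counter()
    for contact in contacts:
        # Find the first rule that rejects this contact, if any.
        failed = next(
            (name for name, rule in rules.items() if not rule(contact)), None
        )
        if failed is not None:
            rejected_by[failed] += 1
        elif not keep_in_sample(contact):
            rejected_by["sampling"] += 1
        else:
            passed.append(contact)  # these rows would go to filtered_contacts
    return passed, rejected_by


# Illustrative rules only; real rules come from the workspace configuration.
rules = {
    "age_range": lambda c: 65 <= c.age <= 80,
    "state": lambda c: c.state in {"TX", "FL"},
}
contacts = [
    Contact("a1", 70, "TX"),
    Contact("a2", 50, "TX"),
    Contact("a3", 72, "NY"),
]
passed, rejected_by = run_filtering(contacts, rules, keep_in_sample=lambda c: True)
```

Passing an empty `rules` dict lets every contact through to sampling. The per-rule rejection counts are an addition for illustration; they show which rule is removing the most contacts.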
Deterministic Sampling
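Filtering uses hash-based sampling so results are reproducible: the same contact always gets the same sampling decision. The hash depends on the contact ID, so a re-scrape that assigns new IDs can change the decision for what looks like the same contact.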
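A minimal sketch of how a hash-based sampling decision can work, assuming the decision is keyed on the contact ID. The function name `keep_in_sample`, the choice of SHA-256, and the optional `salt` parameter are assumptions made for illustration, not the system's actual implementation.

```python
import hashlib


def keep_in_sample(contact_id: str, sampling_rate: float, salt: str = "") -> bool:
    """Return a stable keep/drop decision for one contact.

    Assumption: the decision depends only on the contact ID (plus an
    optional salt), so repeated runs give the same answer.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{contact_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64) and compare it
    # against the sampling rate scaled to the same range.
    value = int.from_bytes(digest[:8], "big")
    return value < sampling_rate * 2**64


ids = [f"contact-{i}" for i in range(10_000)]
first_run = [keep_in_sample(cid, 0.5) for cid in ids]
second_run = [keep_in_sample(cid, 0.5) for cid in ids]
```

Both runs produce identical decisions because nothing random is involved. The sketch uses `hashlib` rather than Python's built-in `hash()`, since string hashes from `hash()` are randomized per process by default and would not be reproducible across runs.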
How To Use It
Viewing Filtered Contacts
- Go to Contacts in the sidebar
- Select the workspace and month
- The contacts table shows filtering status for each contact
- Use the status filter dropdown to see only `filtered` contacts
Filter Rules
Filter rules are configured per workspace in Settings → Filter Configuration. Common rules:
- Age range (e.g., 65-80)
- Coverage type (Medicare Supplement, etc.)
- State inclusion/exclusion
- Sampling rate (e.g., 50% of eligible contacts)
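As a rough illustration of how rules like these combine, the sketch below models one workspace's configuration and checks a contact against it. Everything here is hypothetical: the `FilterConfig` class, its field names, the inclusive age bounds, and the convention that an empty inclusion list means "all states" are assumptions, not the product's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterConfig:
    # Hypothetical per-workspace settings mirroring the rules listed above.
    min_age: int = 65
    max_age: int = 80
    coverage_types: frozenset[str] = frozenset({"Medicare Supplement"})
    included_states: frozenset[str] = frozenset()  # assumed: empty means all states
    excluded_states: frozenset[str] = frozenset()
    sampling_rate: float = 0.5  # fraction of eligible contacts to keep


def is_eligible(config: FilterConfig, age: int, coverage_type: str, state: str) -> bool:
    """Return True only if the contact passes every configured rule."""
    if not config.min_age <= age <= config.max_age:  # bounds assumed inclusive
        return False
    if coverage_type not in config.coverage_types:
        return False
    if config.included_states and state not in config.included_states:
        return False
    return state not in config.excluded_states


config = FilterConfig(excluded_states=frozenset({"NY"}))
in_range = is_eligible(config, 70, "Medicare Supplement", "TX")
too_young = is_eligible(config, 64, "Medicare Supplement", "TX")
excluded_state = is_eligible(config, 70, "Medicare Supplement", "NY")
```

The sampling rate is carried in the configuration but applied afterwards, and only to contacts that pass the eligibility rules, which matches the "50% of eligible contacts" wording above.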
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| 0 contacts after filtering | Filter rules too restrictive | Check the filter rules in Settings → Filter Configuration. An empty rule set is not the cause: if no rules exist, filtering passes everything through |
| Filtering takes too long | Large batch (>50K contacts) | Normal for large batches. Check queue depth on the Status Page; the filtering queue has a concurrency of 2 |
| Different results after re-run | Contact IDs changed because the contacts were re-scraped, which changes the hash. Sampling itself is deterministic | Expected for new scrapes. With unchanged contact IDs, a re-run should produce the same results |
Related Alerts
- PipelineSuccessRateSLOBreach: Filtering failures contribute to overall pipeline SLO
- CeleryHighFailureRate: Check if filtering tasks are failing