Alerts & Troubleshooting

What each Slack alert means and decision trees for common issues

What This Does

The platform sends alerts to Slack when things go wrong. Alerts are routed by severity:

  • Critical: #maverick-alerts-critical (immediate action needed)
  • Warning: #maverick-alerts (investigate when possible)
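
The routing rule above can be sketched as a small lookup. This is an illustration only: the name `channel_for` and the fallback for unknown severities are assumptions, and the real routing presumably lives in the alerting configuration rather than in application code.

```python
# Hypothetical sketch of severity-based routing (not the platform's actual code).
SEVERITY_CHANNELS = {
    "critical": "#maverick-alerts-critical",  # immediate action needed
    "warning": "#maverick-alerts",            # investigate when possible
}


def channel_for(severity: str) -> str:
    """Return the Slack channel for an alert severity (case-insensitive)."""
    key = severity.strip().lower()
    # Assumption: unknown severities fall back to the warning channel.
    return SEVERITY_CHANNELS.get(key, SEVERITY_CHANNELS["warning"])
```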

Alert Reference

API Health

| Alert | Severity | Trigger | What To Do |
|---|---|---|---|
| APIInstanceDown | Critical | Prometheus can't reach API for >2 min | Check Status Page → Core Services. If down, SSH to server and run `docker compose -f docker-compose.production.yml restart api` |
| APIHighErrorRateWarning | Warning | 5xx rate >1% over 5 min | Check Status Page → Logs → api for error patterns |
| APIHighErrorRateCritical | Critical | 5xx rate >5% over 3 min | Likely a code bug or DB issue. Check Logs → api for stack traces. May need rollback |
| APIHighLatencyWarning | Warning | P95 latency >1s | Check database latency on Status Page → Database module |
| APIHighLatencyCritical | Critical | P95 latency >3s | Database or Redis likely overloaded. Check System Resources for CPU/memory |
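
To make the API thresholds concrete, here is a minimal sketch that maps measured values to the alert names listed in this section. The function names are hypothetical, and the real alerts are presumably evaluated by Prometheus rules. For simplicity the sketch returns only the most severe matching alert, whereas both severities may fire together in practice.

```python
from typing import Optional


def classify_error_rate(rate_5m: float, rate_3m: float) -> Optional[str]:
    """Classify 5xx error rates, given as fractions (0.01 means 1%).

    rate_5m is the 5xx fraction over the last 5 minutes,
    rate_3m the 5xx fraction over the last 3 minutes.
    """
    if rate_3m > 0.05:
        return "APIHighErrorRateCritical"
    if rate_5m > 0.01:
        return "APIHighErrorRateWarning"
    return None


def classify_latency(p95_seconds: float) -> Optional[str]:
    """Classify P95 latency (in seconds) against the 1s and 3s thresholds."""
    if p95_seconds > 3.0:
        return "APIHighLatencyCritical"
    if p95_seconds > 1.0:
        return "APIHighLatencyWarning"
    return None
```

The comparisons are strict, matching the ">" in the triggers: a P95 of exactly 1s does not alert.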

Celery Workers and Queues

| Alert | Severity | Trigger | What To Do |
|---|---|---|---|
| EmailEventsQueueDepthHigh | Warning | email_events queue >200 | Webhook backlog building. Check if email_bison worker is running |
| EmailEventsQueueDepthCritical | Critical | email_events queue >1000 | Worker likely crashed. Restart: `docker restart celery-email-bison` |
| VerificationQueueDepthHigh | Warning | verification queue >5000 | Large batch submitted. Normal: takes ~2h to clear at 3K/hour |
| ScrapingQueueDepthHigh | Warning | scraping queue >10 | Multiple scraping jobs queued. Only 1 runs at a time (browser automation). Queue will drain slowly |
| CeleryHighFailureRateWarning | Warning | >10% task failures | Check Logs for failing task patterns |
| CeleryHighFailureRateCritical | Critical | >25% task failures | Systemic issue: check DB connectivity, Redis, external APIs |
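
The ~2h figure for VerificationQueueDepthHigh is simple arithmetic on the stated 3K/hour throughput. A minimal sketch, assuming the rate stays constant and nothing new is enqueued while the queue drains; `hours_to_drain` is a hypothetical helper, not a platform function.

```python
def hours_to_drain(queue_depth: int, rate_per_hour: float = 3000.0) -> float:
    """Estimate hours needed to clear a queue at a constant processing rate."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return queue_depth / rate_per_hour


# A queue right at the 5000 warning threshold:
print(round(hours_to_drain(5000), 2))  # 1.67, i.e. roughly the ~2h quoted above
```

If the estimate keeps growing between checks, tasks are arriving faster than they are processed, and the backlog will not clear on its own.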

System Resources

| Alert | Severity | Trigger | What To Do |
|---|---|---|---|
| DiskSpaceWarning | Warning | Disk >70% full | Check Docker images and logs consuming space. Run `docker system prune` on server |
| DiskSpaceCritical | Critical | Disk >85% full | Urgent: clean up immediately or services will crash |
| HighMemoryUsageWarning | Warning | RAM >80% | Check which containers are using most memory: `docker stats` |
| HighCPULoad | Warning | 5-min load avg >6 | Usually scraping or verification spike. Should resolve on its own |
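
For a quick manual check on the server, the disk and load thresholds can be reproduced with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, and the used/total ratio computed here may differ slightly from what the monitoring stack or `df` reports.

```python
import shutil
from typing import Optional


def disk_used_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_alert(used_fraction: float) -> Optional[str]:
    """Map a disk usage fraction to the alert names in this section."""
    if used_fraction > 0.85:
        return "DiskSpaceCritical"
    if used_fraction > 0.70:
        return "DiskSpaceWarning"
    return None


def high_cpu_load(load_5m: float) -> bool:
    """True when the 5-minute load average exceeds the HighCPULoad threshold."""
    return load_5m > 6.0
```

On Linux, `os.getloadavg()[1]` returns the 5-minute load average that can be passed to `high_cpu_load`, and `disk_alert(disk_used_fraction("/"))` checks the root filesystem.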

Pipeline SLOs

| Alert | Severity | Trigger | What To Do |
|---|---|---|---|
| PipelineSuccessRateSLOBreach | Warning | below 95% success over 1h | Check which pipeline stage is failing; look at the Celery Workers module |
| VerificationBacklogSLOBreach | Warning | Verification queue >15K | Massive batch: will take 4+ hours. No action needed unless it grows |
| BisonAPIHighErrorRate | Warning | Bison >10% errors | Check Bison status. May be rate limiting or token issue |
| BisonAPIDown | Critical | Bison >75% errors | Bison API is likely down. Check send.maverickmarketingllc.com. No action until they resolve |
| DebounceAPIHighErrorRate | Warning | Debounce >10% errors | Likely rate limiting. Verification will slow but continue |
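
The success-rate and Bison triggers are ratios over a window, so they can be sketched from raw counts. The helper names below are hypothetical, and treating an empty window as "no alert" is an assumption rather than documented platform behavior.

```python
from typing import Optional


def pipeline_slo_breached(succeeded: int, total: int, target: float = 0.95) -> bool:
    """True when the success rate over the window is below the 95% target."""
    if total <= 0:
        # Assumption: no runs in the window means there is nothing to breach.
        return False
    return succeeded / total < target


def bison_alert(errors: int, total: int) -> Optional[str]:
    """Map Bison API error counts to the alert names in this section."""
    if total <= 0:
        return None
    error_rate = errors / total
    if error_rate > 0.75:
        return "BisonAPIDown"
    if error_rate > 0.10:
        return "BisonAPIHighErrorRate"
    return None
```

For example, 94 successes out of 100 runs in the last hour is a breach, while 95 out of 100 is not.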

General Troubleshooting Decision Tree

Something seems wrong
├── Is the Status Page showing any red modules?
│   ├── Yes → Click the red module for details
│   │   ├── Core Services down → restart API container
│   │   ├── Celery down → check which queue, restart that worker
│   │   ├── Database down → check Supabase status
│   │   └── System Resources critical → check disk/memory
│   └── No → Check Slack for recent alerts
│
├── Is a specific pipeline slow?
│   ├── Check Celery Workers → Active Tasks tab for what's running
│   ├── Check queue depths — high depth = backlog, not failure
│   └── Check Logs module for the relevant worker
│
└── Is data not updating?
    ├── Campaign data → wait for hourly sync (check Beat schedule)
    ├── Pipeline data → check if the stage completed (jobs table)
    └── Dashboard → hard refresh (Ctrl+Shift+R)
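
The first branch of the tree (red modules on the Status Page) can be expressed as a lookup from module name to first action. This is only a sketch: `next_steps` is a hypothetical helper, and the module names are taken from the tree above rather than from any platform API.

```python
from typing import List

# First action for each red Status Page module, copied from the tree above.
RED_MODULE_ACTIONS = {
    "Core Services": "restart API container",
    "Celery": "check which queue, restart that worker",
    "Database": "check Supabase status",
    "System Resources": "check disk/memory",
}


def next_steps(red_modules: List[str]) -> List[str]:
    """Return the suggested first action for each red module."""
    if not red_modules:
        # No red modules: fall through to the "No" branch of the tree.
        return ["Check Slack for recent alerts"]
    return [
        # Modules not named in the tree get the generic "click for details" step.
        RED_MODULE_ACTIONS.get(module, f"Click the {module} module for details")
        for module in red_modules
    ]
```

Calling `next_steps(["Database", "Celery"])` returns the Supabase check followed by the worker restart step, in the order the modules were given.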
