The problem
Standard uptime monitors hit your site from one origin, with one profile, and log a 200 OK. Meanwhile, a silent WAF rule blocks everyone without a specific cookie. A geo-IP rule flags legitimate visitors from a particular country as bots. A rate limit fires against a usage pattern nobody tested.
False-positive blocks on real customers are invisible to conventional monitoring. You find out from a support ticket days later, after the customer has already given up and bought from a competitor.
Approach
Crawl every target URL through multiple configurable profiles: different user agents, headers, proxies, and request patterns. Persist the response fingerprints. When profiles receive different responses for the same URL, flag it.
- Scheduled crawls via APScheduler.
- Full response capture (status, headers, body hash) for every probe.
- Analysis engine compares across profiles and surfaces discrepancies.
- Alerts via Slack webhook and SMTP when a discrepancy persists past a threshold.
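The comparison step above can be sketched roughly as follows. This is a minimal, stdlib-only illustration rather than SiteProbe's actual code: the names `ProbeResult`, `find_divergent_urls`, and `update_streaks` are hypothetical, the fingerprint is assumed to be status plus a SHA-256 body hash, and "persists past a threshold" is assumed to mean a count of consecutive crawls.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One captured response for a (URL, profile) pair (hypothetical shape)."""
    url: str
    profile: str
    status: int
    headers: tuple[tuple[str, str], ...]
    body: bytes

    def fingerprint(self) -> tuple[int, str]:
        # Status plus a SHA-256 of the body. Headers are captured but left out
        # of the comparison here, since many of them vary on every request.
        return (self.status, hashlib.sha256(self.body).hexdigest())


def find_divergent_urls(results):
    """Return {url: {profile: fingerprint}} for URLs whose profiles disagree."""
    by_url = defaultdict(dict)
    for r in results:
        by_url[r.url][r.profile] = r.fingerprint()
    return {
        url: prints
        for url, prints in by_url.items()
        if len(set(prints.values())) > 1
    }


def update_streaks(streaks, divergent_urls, threshold):
    """Update consecutive-divergence counts in place.

    Returns the URLs whose streak has reached the threshold (assumed to be
    measured in consecutive crawls).
    """
    for url in list(streaks):
        if url not in divergent_urls:
            del streaks[url]  # discrepancy cleared, so the streak resets
    for url in divergent_urls:
        streaks[url] = streaks.get(url, 0) + 1
    return sorted(u for u, n in streaks.items() if n >= threshold)


# Example crawl: the "no-cookie" profile is blocked on the home page only.
results = [
    ProbeResult("https://example.com/", "desktop-chrome", 200, (), b"<html>ok</html>"),
    ProbeResult("https://example.com/", "no-cookie", 403, (), b"blocked"),
    ProbeResult("https://example.com/about", "desktop-chrome", 200, (), b"about"),
    ProbeResult("https://example.com/about", "no-cookie", 200, (), b"about"),
]
divergent = find_divergent_urls(results)
```

A real fingerprint would probably need to normalize dynamic body content (timestamps, CSRF tokens) before hashing, otherwise every page with per-request markup would look divergent.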
Architecture
Python 3.11+ FastAPI backend with an async httpx crawler; SQLAlchemy 2.0 (async) on PostgreSQL, with Alembic migrations. React + Vite + TypeScript frontend with Recharts for visualization. Docker Compose, nginx reverse proxy, Let's Encrypt HTTPS.
Discrepancy detection lives in a small rule engine, so new discrepancy categories can be added without touching the crawler.
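One way such a rule engine could be structured is a registry of small functions, each inspecting the per-profile observations for a single URL. The sketch below is an assumption about the design, not the project's code: `Observation`, `rule`, `evaluate`, and both category labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Observation:
    """What one profile saw for a URL (hypothetical, trimmed to two fields)."""
    status: int
    body_hash: str


# A rule maps {profile_name: Observation} for one URL to a category label,
# or None when it has nothing to report.
Rule = Callable[[dict[str, Observation]], Optional[str]]
RULES: list[Rule] = []


def rule(fn: Rule) -> Rule:
    """Decorator that registers a rule; the crawler never needs to know about it."""
    RULES.append(fn)
    return fn


@rule
def blocked_for_some_profiles(obs):
    # Some profiles get 200 while others get an access-denied style status.
    statuses = {o.status for o in obs.values()}
    if statuses & {401, 403, 429} and 200 in statuses:
        return "access_block"
    return None


@rule
def content_differs_on_success(obs):
    # Every profile succeeded, but the successful bodies are not identical.
    hashes = {o.body_hash for o in obs.values() if o.status == 200}
    return "content_mismatch" if len(hashes) > 1 else None


def evaluate(obs):
    """Run every registered rule and collect the categories that fired."""
    return [label for r in RULES if (label := r(obs)) is not None]


findings = evaluate({
    "desktop-chrome": Observation(200, "aaa"),
    "no-cookie": Observation(403, "bbb"),
})
```

Under this layout, adding a category means writing one more decorated function; the crawl and storage code stay unchanged.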
Status & links
Live at siteprobe.james-gault.com. The companion mock target, MockShield, is deployed separately and used for regression tests and demos against targets whose behavior is controllable on demand.