SiteProbe — for the "some customers can't get through" problem.

A crawler that hits target URLs through multiple simulated visitor profiles and flags when responses diverge. Built because a real production incident taught me that "uptime monitors all green" doesn't mean "all customers can actually buy."

Role: Solo (design, build, deploy)
Timeline: Mar 2026, ongoing
Stack: Python, FastAPI, PostgreSQL, React, Vite, Docker, nginx
Status: Live at siteprobe.james-gault.com

The problem

Standard uptime monitors hit your site from one origin, with one profile, and log a 200 OK. Meanwhile, a silent WAF rule blocks everyone without a specific cookie. A geo-IP rule flags legitimate visitors from a particular country as bots. A rate limit fires against a usage pattern nobody tested.

False-positive blocks on real customers are invisible to conventional monitoring. You find out from a support ticket days later — after the customer has already given up and bought from a competitor.

Approach

Crawl every target URL through multiple configurable profiles — different user agents, headers, proxies, request patterns. Persist the response fingerprints. When profiles diverge on the same URL, flag it.
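
A minimal sketch of the fingerprint-and-compare idea, not SiteProbe's actual code. The `Fingerprint` class, the `COMPARED_HEADERS` tuple, and both function names are hypothetical, and it assumes a fingerprint is the status code, a few selected headers, and a SHA-256 of the body.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """What is kept from one probe: status, selected headers, body hash."""
    status: int
    headers: tuple[tuple[str, str], ...]
    body_sha256: str


# Assumption: only headers expected to be stable across profiles are compared.
COMPARED_HEADERS = ("content-type", "location")


def fingerprint(status: int, headers: dict[str, str], body: bytes) -> Fingerprint:
    """Reduce a raw response to a hashable, comparable fingerprint."""
    lowered = {k.lower(): v for k, v in headers.items()}
    kept = tuple((name, lowered[name]) for name in COMPARED_HEADERS if name in lowered)
    return Fingerprint(status, kept, hashlib.sha256(body).hexdigest())


def diverging_groups(by_profile: dict[str, Fingerprint]) -> list[list[str]]:
    """Group profile names by identical fingerprint.

    Returns an empty list when every profile saw the same response,
    otherwise the sorted groups of profiles that agree with each other.
    """
    groups: dict[Fingerprint, list[str]] = defaultdict(list)
    for profile, fp in by_profile.items():
        groups[fp].append(profile)
    if len(groups) <= 1:
        return []
    return sorted(sorted(names) for names in groups.values())


probes = {
    "desktop-chrome": fingerprint(200, {"Content-Type": "text/html"}, b"<html>shop</html>"),
    "mobile-safari": fingerprint(200, {"Content-Type": "text/html"}, b"<html>shop</html>"),
    "no-cookie": fingerprint(403, {"Content-Type": "text/html"}, b"Access denied"),
}
print(diverging_groups(probes))
# → [['desktop-chrome', 'mobile-safari'], ['no-cookie']]
```

A raw body hash will also differ on pages that embed per-request values such as CSRF tokens or timestamps, so a real comparison would likely need some normalization before hashing.
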

  • Scheduled crawls via APScheduler.
  • Full response capture (status, headers, body hash) for every probe.
  • Analysis engine compares across profiles and surfaces discrepancies.
  • Alerts via Slack webhook and SMTP when a discrepancy persists past a threshold.
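
The "persists past a threshold" rule in the last bullet could look roughly like the sketch below. It assumes the threshold is counted in consecutive crawls rather than elapsed time, which the description above does not specify. `PersistenceTracker` and its default of 3 are hypothetical, and the actual Slack or SMTP send is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PersistenceTracker:
    """Counts consecutive diverging crawls per URL and signals once at the threshold."""
    threshold: int = 3  # assumption: three consecutive diverging crawls
    streaks: dict[str, int] = field(default_factory=dict)

    def observe(self, url: str, diverged: bool) -> bool:
        """Record one crawl result.

        Returns True only on the crawl where the streak reaches the threshold,
        so a long-running discrepancy triggers a single alert, not one per crawl.
        """
        if not diverged:
            self.streaks.pop(url, None)  # a clean crawl resets the streak
            return False
        self.streaks[url] = self.streaks.get(url, 0) + 1
        return self.streaks[url] == self.threshold
```

Signalling only when the streak first reaches the threshold keeps a one-off blip from paging anyone and keeps a persistent discrepancy from alerting on every scheduled crawl.
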

Architecture

Python 3.11+ FastAPI backend with an async httpx crawler, SQLAlchemy 2.0 async + PostgreSQL + Alembic migrations. React + Vite + TypeScript frontend with Recharts for visualization. Docker Compose, nginx reverse proxy, Let's Encrypt HTTPS.

Discrepancy detection is rule-based and lives in a small rule engine, so new discrepancy categories can be added without touching the crawler.
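
One way such a rule engine might be structured is a registry of small functions keyed by category. This is an illustrative sketch only: `Probe`, `RULES`, the `rule` decorator, and the two example categories are hypothetical names, not taken from the SiteProbe codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Probe:
    """One stored result for a URL, as seen through one profile."""
    profile: str
    status: int
    body_sha256: str


# A rule inspects all probes for one URL and returns a finding message, or None.
Rule = Callable[[list[Probe]], Optional[str]]
RULES: dict[str, Rule] = {}


def rule(category: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a rule function under a category name."""
    def register(fn: Rule) -> Rule:
        RULES[category] = fn
        return fn
    return register


@rule("status_mismatch")
def status_mismatch(probes: list[Probe]) -> Optional[str]:
    statuses = {p.status for p in probes}
    if len(statuses) > 1:
        return f"statuses differ: {sorted(statuses)}"
    return None


@rule("selective_block")
def selective_block(probes: list[Probe]) -> Optional[str]:
    # Flag only when some, but not all, profiles get a block-style status.
    blocked = sorted(p.profile for p in probes if p.status in (403, 429))
    if blocked and len(blocked) < len(probes):
        return f"blocked for some profiles only: {blocked}"
    return None


def evaluate(probes: list[Probe]) -> dict[str, str]:
    """Run every registered rule and collect the findings by category."""
    findings: dict[str, str] = {}
    for category, fn in RULES.items():
        message = fn(probes)
        if message is not None:
            findings[category] = message
    return findings
```

Because the rules only read stored probe results, adding a category in this layout means writing one more decorated function; nothing in the crawl path changes.
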

Status & links

Live at siteprobe.james-gault.com. The companion mock target, MockShield, is deployed separately. It is used for regression tests and demos, because its behavior can be controlled on demand.