A static rate limit can't tell a sales spike from a credential-stuffing burst. Set it loose and you let attackers in; set it tight and you 429 your own customers in the middle of a launch. This post covers how we moved our WAF off fixed per-route ceilings, and how the adaptive cap behaves against real traffic.
Why static caps weren't enough
Static caps treat every second as equally suspicious. A flash sale and a botnet pulse consume the same budget. For per-principal caps we already had something defensible: req/s on a sliding window keyed by API client. The gap was the shared, unauthenticated edge (/login, /forgot-password, /signup) where most abuse landed.
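For readers who have not built one, a per-principal sliding-window cap can be sketched roughly as follows. This is a minimal in-memory illustration, not our production limiter: the class name `SlidingWindowLimiter`, its parameters, and the explicit `now` argument are all assumptions made for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        # key (e.g. API client id) -> timestamps of accepted requests
        self._hits = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would answer with a 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_s=1.0)
decisions = [limiter.allow("client-a", t) for t in (0.0, 0.1, 0.2, 0.3)]
# The fourth request inside the same second is rejected.
```

Keying by client is what makes this defensible: one noisy client exhausts only its own budget. On unauthenticated routes there is no such key, which is why the shared edge needed a different approach.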
Our adaptive approach
The adaptive cap has three parts:
- A short-horizon anomaly predictor that looks at the last few minutes of request shape: path entropy, UA-mix divergence, body-size distribution, geo dispersion.
- A budget controller that rebalances per-route ceilings inside a sliding window while respecting the published SLO floor.
- A strict floor on the cap so a noisy classifier can't lock out legitimate users.
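To make the request-shape features more concrete, here is one plausible way to compute two of them: path entropy and UA-mix divergence. It is a sketch under assumptions, not the production predictor. Using Shannon entropy for paths and Jensen-Shannon divergence for the user-agent mix is a choice made for this example, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def js_divergence(baseline, current):
    """Jensen-Shannon divergence, in bits, between two samples.

    0 means the two mixes are identical; 1 means they share no values.
    """
    p, q = Counter(baseline), Counter(current)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        return 0.0
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p[key] / n_p, q[key] / n_q
        mk = (pk + qk) / 2  # mixture of the two distributions
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


# Traffic concentrated on one path has zero entropy.
burst_entropy = shannon_entropy(["/login"] * 4)
# Traffic spread evenly over four paths has two bits of entropy.
mixed_entropy = shannon_entropy(["/login", "/signup", "/forgot-password", "/"])
```

Under this framing, a burst that hammers a single route shows up as a drop in path entropy, and a sudden change in client software shows up as a rise in divergence from the baseline mix. How such features are weighted into a single score is outside this sketch.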
The budget controller is essentially a leaky bucket whose ceiling shifts on anomaly score:
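bucket = 0
ceiling = base_ceiling
for window in request_stream:
    # Score the request shape of this window.
    score = anomaly_predictor(window.shape)
    # Shrink the ceiling in proportion to the clamped score.
    ceiling = base_ceiling * (1 - alpha * clamp(score, 0, 1))
    # Never drop below the SLO floor.
    ceiling = max(ceiling, slo_floor)
    if window.rps > ceiling:
        throttle(window)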
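The ceiling rule above can be isolated into a small runnable function to see how the score and the floor interact. The numbers used here (`BASE`, `ALPHA`, `FLOOR`) are hypothetical values picked for illustration, not our production settings, and `adaptive_ceiling` is a name introduced for this sketch.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def adaptive_ceiling(base_ceiling, score, alpha, slo_floor):
    """Shrink the ceiling as the anomaly score rises, never below the floor."""
    ceiling = base_ceiling * (1 - alpha * clamp(score, 0.0, 1.0))
    return max(ceiling, slo_floor)


# Hypothetical values, for illustration only.
BASE, ALPHA, FLOOR = 1000.0, 0.8, 300.0

for score in (0.0, 0.5, 1.0):
    print(score, adaptive_ceiling(BASE, score, ALPHA, FLOOR))
```

With these values, a score of 0 leaves the ceiling at the base, a score of 0.5 cuts it to roughly 600, and a score of 1 would cut it to roughly 200 if the floor did not hold it at 300. That last case is the point of the floor: even a predictor that is fully convinced it sees an attack cannot push the cap below the SLO.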
Results
Across two months of production traffic, mitigation onset on credential-stuffing campaigns was 4× faster than with the previous static thresholds. The false-positive rate on legitimate spikes stayed flat. The cap never dropped below the documented SLO floor.
Lessons
The hardest part wasn't the predictor; it was the distribution path. CDN configs, client SDKs, and one ancient batch job all assumed fixed caps and retried tightly on 429. Finding and softening those assumptions took longer than building the predictor itself.
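As an illustration of what softening a tight retry loop can look like on the client side, here is a sketch of capped exponential backoff with full jitter that also respects a server-provided `Retry-After` value. It is a generic pattern, not the change we shipped to any particular SDK: the function name, the defaults, and the assumption that `Retry-After` arrives as a number of seconds (the header may also carry an HTTP date) are all ours for the example.

```python
import random


def backoff_delay(attempt, retry_after=None, base_s=0.5, cap_s=30.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Exponential growth, capped so delays stay bounded.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so clients do not retry in lockstep.
    delay = rng.uniform(0.0, ceiling)
    # If the server sent Retry-After (in seconds), never retry sooner than that.
    if retry_after is not None:
        delay = max(delay, float(retry_after))
    return delay


# Deterministic generator so the example is reproducible.
rng = random.Random(0)
delays = [backoff_delay(attempt, rng=rng) for attempt in range(5)]
```

The jitter matters as much as the backoff. When a ceiling moves, many clients hit it at the same moment, and without randomization they all come back at the same moment too.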