Adaptive rate caps for a noisy edge

A static rate limit can't tell a sales spike from a credential-stuffing burst. Set it loose and you let attackers in; set it tight and you 429 your own customers in the middle of a launch. This post covers how we moved our WAF off fixed per-route ceilings, and how the adaptive cap behaves against real traffic.

Why static caps weren't enough

Static caps treat every second as equally suspicious. A flash sale and a botnet pulse consume the same budget. For per-principal caps we already had something defensible: req/s on a sliding window keyed by API client. The gap was the shared, unauthenticated edge (/login, /forgot-password, /signup) where most abuse landed.
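For reference, a per-principal cap of the kind described above can be sketched in a few lines. This is a minimal illustration, not our production limiter: the class name, the `limit` and `window_s` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window_s` seconds."""

    def __init__(self, limit, window_s=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        q = self.hits[client_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the caller would answer 429 here
        q.append(now)
        return True
```

Keying by client is what makes this defensible: one noisy client exhausts only its own budget. On /login and the other unauthenticated routes there is no such key, which is why the same structure does not carry over.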

The two failure modes pull in opposite directions, and a fixed number has to sit at one point between them:

Cap setting	Sales spike	Credential-stuffing burst
Loose	customers get through	attackers get through too
Tight	real users hit 429 at launch	abuse is blocked
Adaptive	cap stays open	cap tightens on the drifting route

A static cap forces one number to cover two cases it cannot tell apart: legitimate load and an attack.

Our adaptive approach

The system has three parts:

A short-horizon anomaly predictor that looks at the last few minutes of request shape: path entropy, UA-mix divergence, body-size distribution, geo dispersion.
A budget controller that rebalances per-route ceilings inside a sliding window while respecting the published SLO floor.
A strict floor on the cap so a noisy classifier can't lock out legitimate users.
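To make two of the shape signals concrete, here is a rough sketch of path entropy and UA-mix divergence over a window of requests. The function names are hypothetical, and Jensen-Shannon divergence is an assumption on our part for this example; any bounded divergence between the current and baseline user-agent mix would fill the same role.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Entropy in bits of the empirical distribution over `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def js_divergence(current, baseline):
    """Jensen-Shannon divergence in bits (0 to 1) between two lists of labels."""
    p, q = Counter(current), Counter(baseline)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        return 0.0
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p[key] / n_p, q[key] / n_q  # Counter yields 0 for missing keys
        mk = (pk + qk) / 2
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div
```

A window where every request hits the same path has entropy 0, and a window whose user agents do not overlap with the baseline at all has divergence 1. How these signals are weighted into a single score is left out here.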

A static cap stays flat. The adaptive cap tightens during the anomaly window and releases when the request shape returns to baseline.

The budget controller is essentially a leaky bucket whose ceiling shifts on anomaly score:

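def clamp(x, lo, hi):
    return max(lo, min(x, hi))

for window in request_stream:
    # Anomaly score for this window's request shape, clamped to [0, 1].
    score = clamp(anomaly_predictor(window.shape), 0, 1)
    # Lower the ceiling in proportion to the score (alpha sets how far),
    # but never below the SLO floor.
    ceiling = max(base_ceiling * (1 - alpha * score), slo_floor)
    if window.rps > ceiling:
        throttle(window)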
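A worked version of the same rule, runnable on its own, may help with the arithmetic. The `Window` type, the `decide` helper, and the numbers (base ceiling 1000 req/s, alpha 0.8, floor 300 req/s) are illustrative assumptions, not our production values, and the anomaly score is taken as already computed.

```python
from dataclasses import dataclass


@dataclass
class Window:
    rps: float    # observed requests per second in this window
    score: float  # anomaly score, assumed precomputed by the predictor


def adaptive_ceiling(score, base_ceiling, alpha, slo_floor):
    """Ceiling shrinks linearly with the clamped score, never below slo_floor."""
    score = max(0.0, min(score, 1.0))
    return max(base_ceiling * (1 - alpha * score), slo_floor)


def decide(windows, base_ceiling=1000.0, alpha=0.8, slo_floor=300.0):
    """Return a (ceiling, throttled) pair for each window."""
    out = []
    for w in windows:
        ceiling = adaptive_ceiling(w.score, base_ceiling, alpha, slo_floor)
        out.append((ceiling, w.rps > ceiling))
    return out
```

With these numbers, a score of 0 leaves the ceiling at 1000, a score of 0.5 brings it to about 600, and a score of 1.0 would bring it to about 200, so the floor takes over and holds it at 300. The same 900 req/s window passes at score 0 and is throttled at score 0.5, which is the behavior a single static number cannot express.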

Results

Across two months of production traffic, mitigation onset on credential-stuffing campaigns was 4× faster than with the previous static thresholds. The false-positive rate on legitimate spikes stayed flat, and the cap never dropped below the documented SLO floor.

Lessons

The hardest part wasn't the predictor; it was the distribution path. CDN configs, client SDKs, and one ancient batch job all assumed fixed caps and retried tightly on 429. Finding and softening those assumptions took longer than building the predictor itself.
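One common way to soften a tight retry loop is capped exponential backoff with jitter, deferring to the server's Retry-After hint when one is present. The sketch below is a generic illustration, not the change we shipped to any particular client: the function name and the `base_s` and `cap_s` defaults are assumptions, and it assumes Retry-After arrives as a number of seconds rather than an HTTP date.

```python
import random


def retry_delay(attempt, retry_after=None, base_s=0.5, cap_s=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        # The server gave an explicit hint (assumed to be seconds): respect it.
        return max(0.0, float(retry_after))
    # Full jitter: a uniform draw below min(cap_s, base_s * 2**attempt),
    # so clients that were throttled together do not retry together.
    return rng() * min(cap_s, base_s * (2 ** attempt))
```

The jitter matters more under an adaptive cap than a fixed one: synchronized retries look like a burst, and a burst is exactly what tightens the ceiling.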

A static rate limit can't tell a sales spike from a credential-stuffing burst. Loose lets attackers in; tight 429s your own customers at launch. We moved the WAF off fixed per-route ceilings and let the cap flex with traffic shape.

Where static caps broke. Per-principal caps were fine: req/s on a sliding window keyed by API client. The gap was the shared, unauthenticated edge (/login, /forgot-password, /signup), where one fixed number had to cover both load and abuse.

What we built. Three parts:

an anomaly predictor reading the last few minutes of request shape (path entropy, UA mix, body size, geo spread);
a budget controller that rebalances per-route ceilings in a sliding window;
a hard floor so a noisy classifier can't lock real users out.

It's a leaky bucket whose ceiling drops as the anomaly score rises, clamped to the SLO floor. When a route's shape drifts, its cap tightens; when shape returns to baseline, the cap releases.

The cap spends budget where traffic gets weird, and leaves the quiet routes alone.

Results. Over two months of production traffic, 4× faster mitigation onset on credential-stuffing versus static thresholds, false positives on legitimate spikes flat, and the floor never dipped below the documented SLO.

The real cost was distribution. CDN configs, client SDKs, and one ancient batch job all assumed fixed caps and retried tightly on 429. Finding and softening those assumptions took longer than the predictor.