A static rate limit can't tell a sales spike from a credential-stuffing burst. Set it loose and you let attackers in; set it tight and you 429 your own customers in the middle of a launch. This post covers how we moved our WAF off fixed per-route ceilings, and how the adaptive cap behaves against real traffic.

Why static caps weren't enough

Static caps treat every second as equally suspicious: a flash sale and a botnet pulse consume the same budget. For per-principal caps we already had something defensible, a req/s limit over a sliding window keyed by API client. The gap was the shared, unauthenticated edge (/login, /forgot-password, /signup), where most abuse landed.
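For readers who have not built one, a per-client sliding-window cap can be sketched roughly as follows. This is a minimal illustration, not our production limiter: the class name, the `allow` method, the in-memory dict, and the injectable `clock` are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client."""

    def __init__(self, limit, window_s=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        q = self.hits[client_id]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # the caller would answer with a 429
        q.append(now)
        return True
```

Keying by client is what makes this defensible for authenticated traffic, and it is also why it does not help on /login or /signup, where there is no trustworthy principal to key on.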

Our adaptive approach

[Figure: rps over time, static cap vs. adaptive cap; the adaptive cap tightens where the shape drifts.]
The static cap is flat. The adaptive cap tightens during the anomaly window and releases when the traffic shape returns to baseline.

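The budget controller is essentially a leaky bucket whose ceiling shifts with the anomaly score. The simplified loop below shows only the ceiling adjustment; the bucket accounting is left out:

def clamp(x, lo, hi):
    return max(lo, min(x, hi))

def run_controller(request_stream, anomaly_predictor, throttle,
                   base_ceiling, alpha, slo_floor):
    for window in request_stream:
        # Anomaly score clamped to [0, 1]; higher means the shape has drifted further.
        score = clamp(anomaly_predictor(window.shape), 0, 1)
        # Shrink the ceiling with the score, but never below the SLO floor.
        ceiling = max(base_ceiling * (1 - alpha * score), slo_floor)
        if window.rps > ceiling:
            throttle(window)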
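To make the ceiling arithmetic concrete, here is a small worked example. The function name `adaptive_ceiling` and the numbers (a base ceiling of 1000 rps, alpha of 0.8, an SLO floor of 300 rps) are purely illustrative assumptions, not our production values.

```python
def clamp(x, lo, hi):
    return max(lo, min(x, hi))


def adaptive_ceiling(score, base_ceiling, alpha, slo_floor):
    """Ceiling for one window: shrinks with the anomaly score, never below the floor."""
    scaled = base_ceiling * (1 - alpha * clamp(score, 0.0, 1.0))
    return max(scaled, slo_floor)


# Illustrative values only (assumptions for this example).
BASE, ALPHA, FLOOR = 1000.0, 0.8, 300.0

for score in (0.0, 0.5, 1.0, 1.7):
    print(score, adaptive_ceiling(score, BASE, ALPHA, FLOOR))
```

With these numbers a score of 0 leaves the ceiling at the base, a score of 0.5 cuts it to about 600 rps, and a score of 1.0 would cut it to about 200 rps if the floor did not hold it at 300. Out-of-range scores such as 1.7 are clamped first, so they behave like 1.0.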

Results

Across two months of production traffic, mitigation onset on credential-stuffing campaigns was 4× faster than with the previous static thresholds. The false-positive rate on legitimate spikes stayed flat, and the adaptive cap never dropped below the documented SLO floor.

Lessons

The hardest part wasn't the predictor; it was the distribution path. CDN configs, client SDKs, and one ancient batch job all assumed fixed caps and retried aggressively on 429. Finding and softening those assumptions took longer than building the predictor itself.
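One common way to soften a tight retry loop is exponential backoff with jitter that also respects the server's Retry-After hint. The sketch below shows the general technique rather than the exact change we shipped to any client; `backoff_delay` and its parameters are hypothetical, and it assumes Retry-After has already been parsed into seconds.

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        # The server told us how long to wait; honor it instead of guessing.
        return float(retry_after)
    # "Full jitter": a random delay between 0 and the capped exponential bound,
    # so clients that were throttled together do not retry together.
    bound = min(cap, base * (2 ** attempt))
    return rng() * bound
```

The jitter matters as much as the exponent: once the cap can move, many clients tend to be throttled in the same window, and without randomization they all come back at the same instant.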