Interval-Aware Caching for Druid

At scale, analytical query systems hit the same pattern: identical time-range queries fired over and over by dashboards, scheduled jobs, and ad-hoc exploration. When each one traverses the full compute path, it burns cycles re-deriving answers the cluster already produced minutes ago. This is a walkthrough of an interval-aware caching layer we built on top of Druid: it cuts the redundant work sharply while keeping results correct as data arrives late.

The problem

Druid is optimised for slicing time-series data by interval. A typical dashboard query asks for metrics over the last 24 hours, the last 7 days, or a specific business window. Two queries that differ by a few seconds of range still require independent scans, even though 99% of the underlying segments are shared.

A naive whole-query cache fails here for two reasons:

Query keys rarely match byte-for-byte: dashboards constantly shift their time windows forward.
Partial invalidation is awkward. When late-arriving data lands in a segment, any cached answer that overlaps it must be recomputed.

A whole-query cache treats two near-identical questions as strangers. The shared 99% pays full price on every call.
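The mismatch is easy to reproduce. The sketch below is illustrative only: it assumes hour-aligned segments and a whole-query cache that hashes the query text, and the names `whole_query_key` and `hourly_segments` are hypothetical helpers, not part of Druid. Two 24-hour queries issued ten seconds apart get different cache keys even though they touch the same set of segments.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)

def whole_query_key(start, end, agg):
    """Hash the full query, as a naive whole-query cache would (hypothetical)."""
    payload = json.dumps(
        {"start": start.isoformat(), "end": end.isoformat(), "agg": agg},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def hourly_segments(start, end):
    """Hour-aligned segment start times that [start, end) touches (assumed layout)."""
    cursor = start.replace(minute=0, second=0, microsecond=0)
    touched = set()
    while cursor < end:
        touched.add(cursor)
        cursor += HOUR
    return touched

now = datetime(2024, 1, 2, 12, 0, 30, tzinfo=timezone.utc)
shift = timedelta(seconds=10)
q1 = (now - timedelta(hours=24), now)
q2 = (q1[0] + shift, q1[1] + shift)  # same dashboard, refreshed 10 s later

keys_match = whole_query_key(*q1, "sum(clicks)") == whole_query_key(*q2, "sum(clicks)")
s1, s2 = hourly_segments(*q1), hourly_segments(*q2)
shared_fraction = len(s1 & s2) / len(s1 | s2)

print(keys_match, shared_fraction)
```

In this toy case the overlap is total, yet the whole-query cache still misses, which is the gap the interval-aware keys are meant to close.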

Interval-aware keys

Instead of caching the full query result, we decompose each query into closed and open intervals. Closed intervals point at segments Druid has already sealed; these are safe to cache indefinitely. Open intervals cover the live edge of the data and are always recomputed from source.

Interval	Covers	Cache policy	Key is a function of
Closed	sealed segments, fixed history	cache indefinitely, shared across dashboards	segment id and version
Open	the live edge, still mutating	always recompute from source	nothing, never cached

Figure: query time range split into sealed (cacheable) and open (live) intervals.

A simplified key derivation looks like this:

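def cache_keys(query, segments):
    """Split the segments covering a query into cache keys and live segments."""
    closed, open_ = [], []
    for seg in segments.covering(query.interval):
        if seg.is_sealed():
            # Sealed: key on id, version, and aggregation so the entry is reusable.
            closed.append(("seg", seg.id, seg.version, query.agg))
        else:
            # Live edge: return the segment itself so it is always recomputed.
            open_.append(seg)
    return closed, open_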

Once a segment is sealed, any aggregate over it is a pure function of the segment id and version. That makes each entry a cache hit reusable across every dashboard asking the same question.
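To show how such keys might be used on the read path, here is a minimal sketch. It rests on several assumptions not stated in the text: the cache is a plain dict, the aggregate is a sum (so per-segment partial results can simply be added), and `Segment` and `run_sum` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a Druid segment."""
    id: str
    version: str
    sealed: bool
    rows: tuple  # metric values held by this segment

def run_sum(segments, agg, cache, stats):
    """Sum a metric across segments, caching only the sealed ones."""
    total = 0
    for seg in segments:
        if seg.sealed:
            key = ("seg", seg.id, seg.version, agg)
            if key in cache:
                stats["hits"] += 1
            else:
                stats["scans"] += 1
                cache[key] = sum(seg.rows)
            total += cache[key]
        else:
            # Open interval: never cached, always recomputed from source.
            stats["scans"] += 1
            total += sum(seg.rows)
    return total

segments = [
    Segment("2024-01-01T00", "v1", True, (1, 2, 3)),
    Segment("2024-01-01T01", "v1", True, (4, 5)),
    Segment("2024-01-01T02", "v1", False, (6,)),  # live edge
]
cache, stats = {}, {"hits": 0, "scans": 0}

first = run_sum(segments, "sum(clicks)", cache, stats)
second = run_sum(segments, "sum(clicks)", cache, stats)
print(first, second, stats)
```

On the second call only the open segment is scanned again. Aggregates that cannot be merged by simple addition would need a mergeable intermediate form, which this sketch does not attempt.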

Late arrivals and correctness

Real-world pipelines don't always deliver events in order. To handle this, each cached interval records the segment version it was computed against. When a segment is replaced (by compaction, late events, or re-ingestion), every cache entry computed against it is invalidated atomically.

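def on_segment_replaced(old_id, new_id):
    # Drop every aggregate cached against the replaced segment, whatever its version.
    # new_id needs no action here: nothing has been cached under it yet.
    # list() materialises the scan so deleting does not disturb the iteration.
    for key in list(cache.scan(prefix=("seg", old_id))):
        cache.delete(key)
    metrics.inc("cache.invalidations")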
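The handler above relies on a cache that supports prefix scans, which the text does not define. One way it could look, as a sketch only: `PrefixCache` is a hypothetical in-memory stand-in that holds a single lock while it removes every key under a prefix, so readers in the same process see either all of a segment's entries or none of them. A distributed cache would need its own mechanism for that guarantee.

```python
import threading

class PrefixCache:
    """Hypothetical in-memory cache keyed by tuples, with prefix invalidation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def delete_prefix(self, prefix):
        """Remove every key that starts with `prefix`; return how many were removed."""
        with self._lock:
            doomed = [k for k in self._data if k[:len(prefix)] == prefix]
            for k in doomed:
                del self._data[k]
            return len(doomed)

cache = PrefixCache()
cache.put(("seg", "s1", "v1", "sum(clicks)"), 10)
cache.put(("seg", "s1", "v1", "count"), 3)
cache.put(("seg", "s2", "v1", "sum(clicks)"), 7)

# Segment s1 is replaced: drop everything computed against it.
removed = cache.delete_prefix(("seg", "s1"))
print(removed, cache.get(("seg", "s2", "v1", "sum(clicks)")))
```

Entries for other segments are untouched, so one late event costs a recompute of one segment rather than of every query that overlaps it.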

Results

After rolling this out across the analytics fleet we observed:

Query latency dropped by roughly 70% on cache-friendly workloads.
Druid broker CPU usage fell by a double-digit percentage during peak dashboard hours.
Cache correctness incidents went to zero: the interval versioning caught every late-arrival case automatically.

What's next

We're exploring two extensions: adaptive TTLs on open intervals, tuned to observed event-arrival patterns, and a shared cache layer across regions so that a dashboard rebuilt in one datacentre warms the others. If you operate analytics systems at scale and want to compare notes on interval caching, reach out.

Dashboards, scheduled jobs, and ad-hoc queries fire the same time-range questions over and over. Each one runs the full compute path and re-derives an answer the cluster produced minutes ago. The waste is in the overlap: two queries that differ by a few seconds of range still scan independently, even when 99% of their segments are shared.

Why a whole-query cache fails. Dashboards shift their windows forward, so query keys almost never match byte-for-byte. And invalidation is coarse: one late event in a segment forces you to drop every cached answer that touches it.

Split the query, not the cache

Decompose each query into closed and open intervals.

Closed: sealed segments. The aggregate is a pure function of segment id and version, so cache it indefinitely and share it across every dashboard.
Open: the live edge, still mutating. Never cache it; always recompute from source.

A sealed segment's aggregate never changes. That is a cache hit every dashboard asking the same question can reuse.

Stay correct under late data

Each cached interval records the segment version it was computed against. When a segment is replaced by compaction, late events, or re-ingestion, every key under that segment id is deleted atomically. Versioning, not a TTL, is what keeps stale answers from leaking.
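A small illustration of why the version in the key matters, under the assumption that keys have the shape `("seg", id, version, agg)`. The helper `key_for` and the version labels `v1` and `v2` are placeholders for this example. Even if the delete for a replaced segment has not yet run, a lookup for the new version cannot return the old answer.

```python
def key_for(segment_id, version, agg):
    """Build a versioned cache key (hypothetical helper)."""
    return ("seg", segment_id, version, agg)

# One entry, computed against version v1 of the segment.
cache = {key_for("2024-01-01T00", "v1", "sum(clicks)"): 120}

# Late events arrive and the segment is republished as v2.
# The query layer now asks for v2, so the v1 entry is simply never matched.
fresh_lookup = cache.get(key_for("2024-01-01T00", "v2", "sum(clicks)"))
old_entry = cache.get(key_for("2024-01-01T00", "v1", "sum(clicks)"))

print(fresh_lookup, old_entry)
```

A TTL alone would keep serving the old value until it expired; with a versioned key the stale entry is unreachable straight away, and deleting it only reclaims space.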

Results

Across the analytics fleet: query latency down roughly 70% on cache-friendly workloads, broker CPU down a double-digit percentage at peak, and zero correctness incidents. Next: adaptive TTLs on open intervals, and a cross-region cache so one warmed dashboard warms the others.

Sources

Apache Druid documentation, Query caching: how Druid's per-segment result cache keys on segment id and is invalidated when a segment is replaced, the primitive this layer builds on.
Apache Druid documentation, Segments and versioning: how sealed segments become immutable and how compaction and re-ingestion produce new versions, which is what the cache keys track.
