Detection teams ship one-off models per signal class. A new EDR signal arrives, somebody trains an autoencoder. Threat-intel feeds get richer, somebody else trains a similarity model. Each model carries its own preprocessing, its own evaluation, its own retraining cadence. ThreatFM is our attempt to collapse that into a single shared embedding for security telemetry. The metric that ended up mattering most isn't any single head's accuracy; it's what happens when two heads disagree on the same host. More on that at the end.

The signals we care about

Logs, netflow, endpoint telemetry, and threat intel are heterogeneous, but for detection they all answer the same question: does this look like the kind of thing we've seen go bad?

Architecture

[Figure: Logs, Netflow, Endpoint, and Threat intel inputs feed a Cross-modal Transformer, which outputs a Shared embedding.]
Per-modality encoders feed a shared cross-modal transformer, producing one embedding space across all four signal classes.

Downstream consumers fetch embeddings by host or session:

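# Per-event embeddings for this host over a one-hour window
emb = threatfm.embed(host_id, window="1h")
# emb.shape == (num_events, d_model)
# Mean-pool the events into one host vector, then fetch its 20 nearest neighbors
neighbors = index.knn(emb.mean(axis=0), k=20)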
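
To make the lookup above concrete, here is a minimal, self-contained sketch of the same pattern: mean-pool per-event embeddings into one vector, then rank stored vectors by cosine similarity. The `mean_pool` and `knn_cosine` helpers, the brute-force search, and the synthetic data are all assumptions for illustration; they stand in for `threatfm.embed` and `index.knn`, whose real implementations are not shown here.

```python
import numpy as np


def mean_pool(event_embeddings: np.ndarray) -> np.ndarray:
    """Collapse a (num_events, d_model) array into one (d_model,) vector."""
    return event_embeddings.mean(axis=0)


def knn_cosine(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k corpus rows most cosine-similar to query, best first."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    k = min(k, len(sims))
    return np.argsort(-sims)[:k]


# Synthetic stand-ins (hypothetical): 100 stored host vectors with d_model = 16.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 16))

# Five events that sit close to stored host 7, mimicking one host's window.
events = corpus[7] + 0.01 * rng.normal(size=(5, 16))

neighbors = knn_cosine(mean_pool(events), corpus, k=20)
```

A production index would likely use an approximate nearest-neighbor structure rather than a full scan; the brute-force version is only meant to show what the call computes.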

What it powers

Once the embedding is stable, downstream detection heads are cheap to add: anomaly detection, IoC similarity, attack-chain clustering, alert deduplication. New heads ship in days instead of quarters because the representational work already happened.
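
As an illustration of why a head can be cheap once the embedding exists, the sketch below builds an anomaly head that needs no training beyond a centroid and a distance distribution. The `CentroidAnomalyHead` class and the synthetic baseline are hypothetical, and this is not a description of the heads actually running on ThreatFM; it only shows how little code sits on top of a fixed embedding.

```python
import numpy as np


class CentroidAnomalyHead:
    """Scores embeddings by z-scored distance from the baseline centroid."""

    def fit(self, baseline: np.ndarray) -> "CentroidAnomalyHead":
        # baseline: (num_samples, d_model) embeddings assumed to be benign
        self.center = baseline.mean(axis=0)
        dists = np.linalg.norm(baseline - self.center, axis=1)
        self.mu = dists.mean()
        self.sigma = dists.std() + 1e-12  # guard against division by zero
        return self

    def score(self, emb: np.ndarray) -> np.ndarray:
        # Accepts one (d_model,) vector or a (n, d_model) batch.
        dists = np.linalg.norm(np.atleast_2d(emb) - self.center, axis=1)
        return (dists - self.mu) / self.sigma


# Synthetic stand-ins (hypothetical) for embeddings of known-benign hosts.
rng = np.random.default_rng(1)
baseline = rng.normal(size=(500, 16))
head = CentroidAnomalyHead().fit(baseline)

typical = rng.normal(size=16)
shifted = typical + 10.0  # far from anything in the baseline

typical_score = head.score(typical)[0]
shifted_score = head.score(shifted)[0]
```

The head never touches raw logs or flows, which is the point: the per-modality preprocessing lives in the encoders, not in each consumer.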

The metric that surprised us

The most useful metric isn't any single head's AUC. It's cross-head consistency. When an anomaly score and an attack-chain cluster disagree on the same host, the disagreement itself is a signal worth investigating. Roughly a third of the time, it surfaces something the original heads would have missed on their own.
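
One way to operationalize cross-head consistency is sketched below, under stated assumptions: each host has an anomaly score in [0, 1] and a boolean saying whether the attack-chain head placed it in a suspicious cluster. The `HeadOutputs` record, the 0.8 threshold, and the host names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeadOutputs:
    host_id: str
    anomaly_score: float     # higher means more anomalous (assumed range 0..1)
    in_attack_cluster: bool  # attack-chain head's verdict for this host


def disagreements(outputs, anomaly_threshold=0.8):
    """Return host IDs where the two heads reach opposite verdicts."""
    flagged = []
    for o in outputs:
        anomalous = o.anomaly_score >= anomaly_threshold
        if anomalous != o.in_attack_cluster:
            flagged.append(o.host_id)
    return flagged


def consistency_rate(outputs, anomaly_threshold=0.8):
    """Fraction of hosts on which both heads agree."""
    if not outputs:
        return 1.0
    return 1 - len(disagreements(outputs, anomaly_threshold)) / len(outputs)


hosts = [
    HeadOutputs("host-a", 0.95, True),   # both heads say suspicious
    HeadOutputs("host-b", 0.10, False),  # both heads say benign
    HeadOutputs("host-c", 0.92, False),  # anomalous, but in no attack cluster
    HeadOutputs("host-d", 0.05, True),   # clustered with attacks, yet looks normal
]

review_queue = disagreements(hosts)
rate = consistency_rate(hosts)
```

Here `host-c` and `host-d` would go to an analyst queue. Treating the disagreement set as its own alert stream, rather than averaging the two heads, keeps the signal visible.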