ThreatFM · H2 Labs

Detection teams ship one-off models per signal class. When a new EDR signal arrives, somebody trains an autoencoder; when threat-intel feeds get richer, somebody else trains a similarity model. Each model carries its own preprocessing, its own evaluation, and its own retraining cadence. ThreatFM is our attempt to collapse all of that into a single shared embedding for security telemetry. The metric that ended up mattering most isn't any single head's accuracy; it's what happens when two heads disagree on the same host. More on that at the end.

The signals we care about

Logs: application, OS, audit. Structured and unstructured.
Netflow: connection-level traffic graphs and volume rollups.
Endpoint telemetry: process trees, file events, registry, EDR alerts.
Threat intel: indicators, narrative reports, TTP mappings.

These are heterogeneous, but for detection they answer the same question: does this look like the kind of thing we've seen go bad?

Architecture

Per-modality tokenizers: each signal class has its own encoder head with shape-aware preprocessing.
A cross-modal transformer that fuses tokens into a single embedding sequence per host or per session.
Contrastive pretraining: related signals are pulled together, so the netflow window around a host should sit close to that host's endpoint events in embedding space.
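The contrastive bullet can be made concrete with an InfoNCE-style loss, a common choice for this kind of pairing. The post doesn't say which contrastive objective ThreatFM actually uses, so treat this as a sketch under that assumption. Each host's netflow embedding is scored against every endpoint embedding in the batch, and the loss is low when the host's own endpoint embedding wins. The function names and the temperature of 0.1 are illustrative, not ThreatFM's.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(netflow, endpoint, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical stand-in for ThreatFM's objective).

    netflow[i] and endpoint[i] come from the same host (the positive pair);
    endpoint[j] for j != i act as in-batch negatives.
    """
    n = len(netflow)
    total = 0.0
    for i in range(n):
        logits = [cosine(netflow[i], endpoint[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When each host's netflow and endpoint embeddings line up, the loss is near zero; pairing them with the wrong hosts drives it up sharply.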

Per-modality encoders feed a shared cross-modal transformer, producing one embedding space across all four signal classes.
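A toy version of the tokenizer stage, assuming each modality's tokenizer is a linear projection from that modality's raw features into a shared width. The real encoders are surely richer, and the weights, widths, and names here are hypothetical. What it shows is the shape contract: whatever a modality's raw features look like, every token comes out at width `D_MODEL`, and a host's tokens merge into one time-ordered sequence. The cross-modal transformer that would consume that sequence is left out.

```python
from dataclasses import dataclass

D_MODEL = 4  # toy shared width (hypothetical; the real d_model isn't stated)

# Hypothetical per-modality projections. Raw feature widths differ
# (netflow: 2, endpoint: 3), but each one maps into D_MODEL dimensions.
PROJECTIONS = {
    "netflow": [[1, 0], [0, 1], [0, 0], [0, 0]],
    "endpoint": [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]],
}


@dataclass
class Token:
    ts: float      # event timestamp in seconds
    modality: str  # which signal class produced the token
    vec: list      # embedding of length D_MODEL


def tokenize(modality, ts, features):
    """Project one raw event into the shared D_MODEL space."""
    weights = PROJECTIONS[modality]
    vec = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return Token(ts, modality, vec)


def host_sequence(events):
    """Tokenize (modality, ts, features) events for one host and order them by time.

    The result has shape (num_events, D_MODEL), the input a cross-modal
    transformer would fuse; the transformer itself is omitted here.
    """
    tokens = [tokenize(m, ts, f) for m, ts, f in events]
    return sorted(tokens, key=lambda t: t.ts)


seq = host_sequence([("endpoint", 5.0, [1, 2, 3]), ("netflow", 1.0, [10, 20])])
```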

Downstream consumers fetch embeddings by host or session:

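emb = threatfm.embed(host_id, window="1h")  # one embedding per event in a 1h window
# emb.shape == (num_events, d_model)
neighbors = index.knn(emb.mean(axis=0), k=20)  # mean-pool to one host vector, then fetch 20 nearest neighbors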
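`threatfm.embed` and `index.knn` are internal APIs, so here is a stand-in for what the lookup does, assuming the index is a plain mapping from host ID to a pooled embedding and similarity is cosine. It is brute force; a production index would presumably use an approximate-nearest-neighbor structure. All names below are hypothetical.

```python
import heapq
import math


def mean_pool(emb):
    """Average a (num_events, d_model) list of rows into one d_model vector."""
    n = len(emb)
    return [sum(col) / n for col in zip(*emb)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(index, query, k):
    """Brute-force k nearest neighbors by cosine similarity.

    index maps host_id -> pooled embedding (hypothetical layout). Returns
    (host_id, similarity) pairs, most similar first.
    """
    scored = ((host, cosine(query, vec)) for host, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


emb = [[1.0, 0.0], [0.8, 0.2]]  # two event embeddings for one host
query = mean_pool(emb)
index = {"web-01": [1.0, 0.0], "db-02": [0.0, 1.0], "web-03": [0.9, 0.1]}
result = knn(index, query, k=2)
```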

Do the representational work once. Every detection head after that is cheap.

What it powers

Once the embedding is stable, downstream detection heads are cheap to add: anomaly detection, IoC similarity, attack-chain clustering, alert deduplication. New heads ship in days instead of quarters because the representational work already happened.
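To show how little a new head needs once the embeddings exist, here is a minimal anomaly head, assuming a k-nearest-neighbor distance score against a reference set of embeddings from hosts believed to be behaving normally. That scoring rule is a common baseline, not necessarily what ThreatFM's anomaly head does; the function name and the `k` default are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_anomaly_score(query, reference, k=3):
    """Hypothetical anomaly head: 1 - mean cosine similarity to the k closest references.

    reference holds embeddings from hosts assumed to be behaving normally.
    Scores near 0 mean "looks like something we've seen"; larger is more unusual.
    """
    sims = sorted((cosine(query, r) for r in reference), reverse=True)[:k]
    return 1.0 - sum(sims) / len(sims)


reference = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
normal = knn_anomaly_score([1.0, 0.02], reference, k=2)
odd = knn_anomaly_score([0.0, 1.0], reference, k=2)
```

The whole head is a distance computation over vectors the shared model already produced; there is no per-head preprocessing or feature pipeline to build.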

The metric that surprised us

The most useful metric isn't any single head's AUC. It's cross-head consistency: when an anomaly score and an attack-chain cluster disagree on the same host, the disagreement itself is a signal worth investigating, and roughly a third of the time it surfaces something the original heads would have missed on their own.
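One way to operationalize cross-head consistency, assuming the anomaly head is thresholded into flagged or not flagged, and some attack-chain clusters are labeled as malicious. The post doesn't say how disagreement is defined, so the threshold, the binarization, and every name here are hypothetical.

```python
def disagreements(anomaly_scores, cluster_labels, attack_clusters, threshold=0.5):
    """Return (host, reason) pairs where the anomaly and attack-chain heads disagree.

    anomaly_scores: host -> score from the anomaly head (higher = more unusual)
    cluster_labels: host -> cluster id from the attack-chain clustering head
    attack_clusters: cluster ids currently believed to be attack chains
    threshold: hypothetical cut-off that turns an anomaly score into a flag
    """
    flagged = []
    for host, score in anomaly_scores.items():
        anomalous = score >= threshold
        in_attack_chain = cluster_labels.get(host) in attack_clusters
        if anomalous != in_attack_chain:
            reason = "anomalous, benign cluster" if anomalous else "attack cluster, low anomaly"
            flagged.append((host, reason))
    return flagged


def consistency_rate(anomaly_scores, cluster_labels, attack_clusters, threshold=0.5):
    """Fraction of hosts on which the two heads agree (1.0 for an empty input)."""
    if not anomaly_scores:
        return 1.0
    n_disagree = len(disagreements(anomaly_scores, cluster_labels, attack_clusters, threshold))
    return 1.0 - n_disagree / len(anomaly_scores)


scores = {"h1": 0.9, "h2": 0.1, "h3": 0.8, "h4": 0.2}
clusters = {"h1": "c7", "h2": "c1", "h3": "c1", "h4": "c7"}
attack = {"c7"}
flagged = disagreements(scores, clusters, attack)
```

In this framing, the disagreement list is the triage queue, and the consistency rate is a single number that could be tracked over time alongside each head's own metrics.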
