Zoya is a sovereign agent runtime: a single static binary you host on your own hardware that drives an LLM, holds its own memory, reaches the world through tools, and answers across half a dozen channels. It is, in the broad sense, a harness, a program that runs an agent loop and calls a model, which puts it in the same family as the dozen agent harnesses shipping this year. So the useful question isn't whether to use one of those; it's how Zoya differs from them. The short version: the constraints it's built to (one binary, zero runtime dependencies, sandboxed, sovereign, MCP-native, provider-agnostic) are the ones those harnesses tend to give up.

channels Telegram Discord CLI HTTP API MCP Zoya one Zig binary · zero deps ReAct agent loop 64 tools · MCP client/server memory · SQLite FTS5 sandbox · vault · approval provider ranking + failover models (fallback) Anthropic OpenAI Zhipu Google local probe · rank · fail over Zoya is the runtime, not the model: one binary between your channels and the models it drives.
Zoya runs the loop, holds the memory, and ranks and fails over across a chain of model providers. The model is a service it calls, one of several.

What Zoya actually is

Concretely, Zoya is one ~140 MB executable, roughly 39,000 lines of Zig across 48 modules, with SQLite compiled in and no runtime dependencies beyond the LLM API. It runs a ReAct loop: take a message, inject context (memory, profiles, timeline, commitments), call the model, run any tool calls, repeat until there's an answer, deliver it. Around that loop sits everything that makes it usable unattended:

The agent loop itself is the small part. The reason Zoya is worth building rather than adopting is everything in that list around the loop.

What sets Zoya apart

The agent-harness shelf is full, and some of it is good work, so the difference isn't quality; it's the constraints. Most are Python applications carrying a large dependency surface, often coupled to a managed service or wired to one model, with security treated as something you add later. Zoya inverts all of those defaults, and the defaults are the whole point.

Next to OpenClaw and Hermes

OpenClaw and Hermes are the two closest comparables. OpenClaw is the one that looks most like Zoya from a distance: an open-source personal agent that runs on your own hardware, answers over Telegram, WhatsApp, Slack, and Discord, plugs into any model from Claude to a local Ollama, keeps persistent memory, and gates risky actions through a human-in-the-loop layer. It is also the breakout project of the year, north of two hundred thousand stars and fifty-plus tools. Hermes is the other neighbour, memory-first and multi-surface, smaller in mindshare but the same broad shape. Put side by side with Zoya, the family resemblance is real, and so are the differences.

With OpenClaw the shared DNA is striking: both are sovereign personal agents that run on your hardware, speak over the same messaging apps, drive local or hosted models, keep persistent memory, and gate the risky calls. Zoya lands on those conclusions by convergence, not inheritance, the same way serious agents keep arriving at the same handful of ideas because those are the ones that survive contact with the real world.

The difference is the build, not the idea. OpenClaw and Hermes are large Python projects: a runtime, a dependency tree, a sprawling plugin and tool surface, with sandboxing and credential handling layered on. Zoya is a single static Zig binary with SQLite compiled in and no runtime deps; OS-level sandboxing, an encrypted vault, and approval gates are core modules rather than add-ons; it acts as an MCP server as well as a client; and instead of pointing at a model it drives a ranked fallback chain across several providers and local models, probing and failing over. Same idea, opposite center of gravity: OpenClaw optimizes for reach and a huge ecosystem, Zoya for a small, self-contained, sovereign binary you can audit in one place.

The one place the comparison turns into borrowing is memory. Hermes's memory design is worth reading, and Zoya took ideas from it, the same way it studied Mem0, Zep, Letta, and Honcho and then hardened its own native store instead of adopting any of them, because memory is an attack surface, not a recall contest. Take the idea, skip the dependency.

The wider field

The rest of the shelf, on the same axes:

ProjectWhat it isSovereignHow it differs from Zoya
OpenClawpopular personal-agent harness (Python)YesLarge Python project + plugin surface; security layered on; no ranked provider fallback.
Hermes Agentmemory-first multi-surface harnessYes (MIT)Richer memory, Python-shaped; its memory design is worth borrowing.
Browser UseLLM-driven browserYesA second model to drive a browser Zoya drives itself.
Stagehandbrowser agent frameworkcloud-coupledPulls toward Browserbase; not sovereign.
Skyvernform/workflow agentYes (AGPL)Workflow focus; Python; copyleft.
computer-usemanaged desktop driverNoManaged desktop; not sovereign.
Zoyasovereign agent runtime (Zig, one binary)YesSingle binary, sandboxed, provider-agnostic.

The line that separates them

Strip away features and the difference comes down to one axis: packaging and posture. One static binary versus a runtime and a dependency tree. OS-level sandboxing and a vault built in versus added later. A ranked fallback across providers and local models versus a single hard-wired endpoint. Sovereign and self-contained versus cloud-coupled. Most of the contrasts above are that same difference seen from different sides.

Where Zoya converges with the field

Building from scratch doesn't mean reaching different conclusions. A few patterns show up in every serious agent, Zoya included, not because anyone copied anyone, but because they're what holds up:

What would change my mind

Zoya already orchestrates several models, ranking and failing over between them. The next step that would pull a harness-shaped component back in is delegation: handing a whole sub-task to a smaller, cheaper local model (a classifier to pre-filter pages before the main model reads them, say). At that point there's a second, lesser model to drive, and a library like OpenClaw's loop becomes a candidate for that one internal component, used as a tool inside the runtime rather than as the runtime.

Until then, the loop was never the hard part. The hard part is the single binary, the sandbox, the fallback chain, and the MCP surface around it, and that is the part you can't import.