Zoya is a sovereign agent runtime: a single static binary you host on your own hardware that drives an LLM, holds its own memory, reaches the world through tools, and answers across half a dozen channels. In the broad sense it is a harness (a program that runs an agent loop and calls a model), which puts it in the same family as the dozen or so agent harnesses shipping this year. So the useful question isn't whether to use one of those; it's how Zoya differs from them. The short version: the constraints it's built to (one binary, zero runtime dependencies, sandboxed, sovereign, MCP-native, provider-agnostic) are the ones those harnesses tend to give up.
What Zoya actually is
Concretely, Zoya is one ~140 MB executable, roughly 39,000 lines of Zig across 48 modules, with SQLite compiled in and no runtime dependencies beyond the LLM API. It runs a ReAct loop: take a message, inject context (memory, profiles, timeline, commitments), call the model, run any tool calls, repeat until there's an answer, deliver it. Around that loop sits everything that makes it usable unattended:
- Single static binary, zero runtime deps. One file you drop on a box, or a 140 MB container. Nothing to pip install, no interpreter, no dependency tree to keep patched.
- Sovereign and self-hosted. Runs on the operator's own hardware. No managed control plane, no telemetry leaving the box, no vendor that can change the terms underneath it.
- MCP-native, both ways. It's an MCP client (external tools plug in over stdio) and an MCP server (its own 64 tools are exposed to other clients).
- Provider-agnostic, with a fallback chain. It drives Anthropic, OpenAI, Zhipu, Google, and local models, ranking them on live latency, probing health, breaking circuits, and failing over. It isn't one model; it orchestrates several and picks the fast healthy one.
- Multi-channel in one process. Telegram, Discord, WhatsApp, CLI, an HTTP API, and MCP, with per-user profiles and group awareness.
- Security built in, not bolted on. OS-level sandboxing (landlock/bubblewrap), an encrypted vault, human-in-the-loop approval for sensitive tools, per-sender budgets, rate limits, cyclic-call detection, output leak checks.
The agent loop itself is the small part. The reason Zoya is worth building rather than adopting is everything in that list around the loop.
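To show how small the loop is, here is a minimal sketch of the ReAct cycle described above: inject context, call the model, run any tool call, repeat until there is an answer. It is written in Python purely for brevity and is not Zoya's Zig code. `ModelReply`, `run_agent`, `fake_model`, and the `add` tool are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ModelReply:
    """Hypothetical reply shape: a final answer, or a request to run one tool."""
    answer: Optional[str] = None
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)


def run_agent(message: str, context: str,
              call_model: Callable[[list], ModelReply],
              tools: dict, max_steps: int = 8) -> str:
    # Inject context (memory, profiles, ...) ahead of the user's message.
    transcript = [("context", context), ("user", message)]
    for _ in range(max_steps):
        reply = call_model(transcript)
        if reply.answer is not None:
            return reply.answer  # an answer ends the loop
        if reply.tool not in tools:
            transcript.append(("tool", f"unknown tool: {reply.tool}"))
            continue
        # Run the requested tool and feed the result back to the model.
        result = tools[reply.tool](**reply.args)
        transcript.append(("tool", f"{reply.tool} -> {result}"))
    return "step limit reached"


# Stand-in model: asks for one tool call, then answers from its result.
def fake_model(transcript: list) -> ModelReply:
    tool_lines = [text for role, text in transcript if role == "tool"]
    if tool_lines:
        return ModelReply(answer=f"Done: {tool_lines[-1]}")
    return ModelReply(tool="add", args={"a": 2, "b": 2})


print(run_agent("What is 2 + 2?", "memory: (empty)", fake_model,
                {"add": lambda a, b: a + b}))
```

The whole cycle fits in about twenty lines, which is the point: nothing in it addresses sandboxing, budgets, failover, or delivery.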
What sets Zoya apart
The agent-harness shelf is full, and some of it is good work, so the difference isn't quality; it's the constraints. Most harnesses are Python applications carrying a large dependency surface, often coupled to a managed service or wired to one model, with security treated as something you add later. Zoya inverts all of those defaults, and the defaults are the whole point.
- The binary is the deliverable. For an agent meant to run unattended on your own machine, the dependency surface is the attack surface and the ops burden. A single static binary with SQLite inside has neither; a Python harness brings the runtime and the tree.
- Sovereignty is non-negotiable. Anything cloud-coupled (Stagehand with its pull toward Browserbase, hosted computer-use) is ruled out immediately. That alone sets Zoya apart from a chunk of the shelf before features are even weighed.
- The security primitives are core modules. Sandbox, vault, approval, budgets, and loop detection are built in rather than retrofitted onto a general-purpose harness later.
- It already reasons. Zoya runs its own loop and emits typed JSON from what it fetches. A second LLM-driven harness on top, to drive a browser or extract structured data, would be a redundant second model doing what the primary already does (the reasoning behind skipping the LLM extractors).
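Two of the security primitives named in this list, approval gates and cyclic-call detection, can be sketched as one wrapper that every tool call passes through. This is an illustration in Python, not Zoya's implementation. The `GuardedTools` class, its parameters, and the repeat threshold are assumptions made for the example.

```python
import json
from collections import Counter


class GuardedTools:
    """Hypothetical wrapper: every tool call passes through here first."""

    def __init__(self, tools, sensitive, approve, max_repeats=3):
        self.tools = tools              # name -> callable
        self.sensitive = sensitive      # names that need a human yes/no
        self.approve = approve          # callable(name, args) -> bool
        self.max_repeats = max_repeats  # identical calls allowed per task
        self.seen = Counter()

    def call(self, name, **args):
        # Cyclic-call detection: the same tool with the same arguments,
        # over and over, usually means the model is stuck in a loop.
        key = (name, json.dumps(args, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise RuntimeError(f"cyclic call detected: {name}")
        # Approval gate: sensitive tools run only after the operator agrees.
        if name in self.sensitive and not self.approve(name, args):
            raise PermissionError(f"operator denied: {name}")
        return self.tools[name](**args)


guard = GuardedTools(
    tools={"read_file": lambda path: f"contents of {path}",
           "delete_file": lambda path: f"deleted {path}"},
    sensitive={"delete_file"},
    approve=lambda name, args: False,   # stand-in operator who always says no
    max_repeats=2,
)
print(guard.call("read_file", path="notes.txt"))
```

Because the checks sit in the dispatch path rather than in each tool, a new tool cannot bypass them by accident, which is one reading of "built in rather than retrofitted."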
Next to OpenClaw and Hermes
OpenClaw and Hermes are the two closest comparables. OpenClaw is the one that looks most like Zoya from a distance: an open-source personal agent that runs on your own hardware, answers over Telegram, WhatsApp, Slack, and Discord, plugs into any model from Claude to a local Ollama, keeps persistent memory, and gates risky actions through a human-in-the-loop layer. It is also the breakout project of the year, north of two hundred thousand stars and fifty-plus tools. Hermes is the other neighbour, memory-first and multi-surface, smaller in mindshare but the same broad shape. Put side by side with Zoya, the family resemblance is real, and so are the differences.
With OpenClaw the shared DNA is striking: both are sovereign personal agents that run on your hardware, speak over the same messaging apps, drive local or hosted models, keep persistent memory, and gate the risky calls. Zoya lands on those conclusions by convergence, not inheritance, the same way serious agents keep arriving at the same handful of ideas because those are the ones that survive contact with the real world.
The difference is the build, not the idea. OpenClaw and Hermes are large Python projects: a runtime, a dependency tree, a sprawling plugin and tool surface, with sandboxing and credential handling layered on. Zoya is a single static Zig binary with SQLite compiled in and no runtime deps; OS-level sandboxing, an encrypted vault, and approval gates are core modules rather than add-ons; it acts as an MCP server as well as a client; and instead of pointing at a model it drives a ranked fallback chain across several providers and local models, probing and failing over. Same idea, opposite center of gravity: OpenClaw optimizes for reach and a huge ecosystem, Zoya for a small, self-contained, sovereign binary you can audit in one place.
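A ranked fallback chain of the kind described here can be sketched in a few dozen lines. The Python below is illustrative only and says nothing about how Zoya's Zig code is organised. `Provider`, `complete`, the failure threshold, the cooldown, and the smoothing weights are all assumptions chosen for the example.

```python
import time


class Provider:
    """Hypothetical provider record with a simple circuit breaker."""

    def __init__(self, name, send, max_failures=2, cooldown=30.0):
        self.name = name
        self.send = send            # callable(prompt) -> str; raises on error
        self.max_failures = max_failures
        self.cooldown = cooldown    # seconds the circuit stays open
        self.failures = 0
        self.open_until = 0.0       # provider is skipped until this time
        self.latency = 0.0          # smoothed latency; 0.0 means untried

    def healthy(self, now):
        return now >= self.open_until


def complete(providers, prompt, clock=time.monotonic):
    now = clock()
    # Rank the healthy providers, fastest observed latency first.
    ranked = sorted((p for p in providers if p.healthy(now)),
                    key=lambda p: p.latency)
    for p in ranked:
        start = clock()
        try:
            reply = p.send(prompt)
        except Exception:
            p.failures += 1
            if p.failures >= p.max_failures:
                # Break the circuit: stop trying this provider for a while.
                p.open_until = clock() + p.cooldown
                p.failures = 0
            continue  # fail over to the next provider in the ranking
        # Success: fold the new measurement into the running latency.
        p.latency = 0.8 * p.latency + 0.2 * (clock() - start)
        p.failures = 0
        return p.name, reply
    raise RuntimeError("no healthy provider")
```

A caller passes the list of providers and gets back the name of whichever one answered along with the reply; a provider that keeps failing drops out of the ranking until its cooldown expires.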
The one place the comparison turns into borrowing is memory. Hermes's memory design is worth reading, and Zoya took ideas from it, just as it studied Mem0, Zep, Letta, and Honcho. In each case it hardened its own native store instead of adopting the library, because memory is an attack surface, not a recall contest. Take the idea, skip the dependency.
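One concrete reading of "memory is an attack surface" is provenance tagging: each row records where it came from, and nothing counts as a believed fact until it is explicitly promoted. The sketch below uses Python's built-in `sqlite3` module. The table layout, column names, and the functions `open_store`, `remember`, `promote`, and `believed_facts` are assumptions for illustration, not Zoya's schema.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE memory (
            id      INTEGER PRIMARY KEY,
            content TEXT NOT NULL,
            source  TEXT NOT NULL,              -- where the row came from
            status  TEXT NOT NULL DEFAULT 'observed'
                    CHECK (status IN ('observed', 'believed'))
        )
    """)
    return db


def remember(db, content, source):
    # New rows always start as 'observed', whatever the content claims.
    cur = db.execute("INSERT INTO memory (content, source) VALUES (?, ?)",
                     (content, source))
    return cur.lastrowid


def promote(db, row_id, approved_by):
    # Promotion is an explicit, attributed step, never a side effect.
    if not approved_by:
        raise ValueError("promotion needs a named approver")
    db.execute("UPDATE memory SET status = 'believed', "
               "source = source || ' +approved:' || ? WHERE id = ?",
               (approved_by, row_id))


def believed_facts(db):
    rows = db.execute("SELECT content FROM memory "
                      "WHERE status = 'believed' ORDER BY id")
    return [r[0] for r in rows]


db = open_store()
rid = remember(db, "user prefers replies in French", "telegram:message")
print(believed_facts(db))   # the observed row is not yet a believed fact
promote(db, rid, "operator")
print(believed_facts(db))
```

Only rows with `status = 'believed'` would be injected as facts, so text planted on a web page or in a chat message stays marked as an observation until someone vouches for it.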
The wider field
The rest of the shelf, on the same axes:
| Project | What it is | Sovereign | How it differs from Zoya |
|---|---|---|---|
| OpenClaw | popular personal-agent harness (Python) | Yes | Large Python project + plugin surface; security layered on; no ranked provider fallback. |
| Hermes Agent | memory-first multi-surface harness | Yes (MIT) | Richer memory, Python-shaped; its memory design is worth borrowing. |
| Browser Use | LLM-driven browser | Yes | A second model to drive a browser Zoya drives itself. |
| Stagehand | browser agent framework | cloud-coupled | Pulls toward Browserbase; not sovereign. |
| Skyvern | form/workflow agent | Yes (AGPL) | Workflow focus; Python; copyleft. |
| computer-use | managed desktop driver | No | Managed desktop; not sovereign. |
| Zoya | sovereign agent runtime (Zig, one binary) | Yes | Single binary, sandboxed, provider-agnostic. |
The line that separates them
Strip away features and the difference comes down to one axis: packaging and posture. One static binary versus a runtime and a dependency tree. OS-level sandboxing and a vault built in versus added later. A ranked fallback across providers and local models versus a single hard-wired endpoint. Sovereign and self-contained versus cloud-coupled. Most of the contrasts above are that same difference seen from different sides.
Where Zoya converges with the field
Building from scratch doesn't mean reaching different conclusions. A few patterns show up in every serious agent, Zoya included, not because anyone copied anyone, but because they're what holds up:
- Provenance-tagged memory. Every row carries where it came from, so an observed input is never silently promoted to a believed fact (Agent memory is an attack surface).
- The escalation ladder. Try cheap, escalate only on failure, which is how Zoya's web tools are wired (web_fetch → lightpanda → scrapling).
- Approval gates. Risky tool calls wrap in human-in-the-loop. Every agent that actually touches the world converges here.
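The escalation ladder in the list above is simple enough to sketch: try the cheapest fetcher, and move to the next rung only when it fails or returns nothing usable. The rung names `web_fetch`, `lightpanda`, and `scrapling` come from the text. The function `fetch_with_escalation` and the stub fetchers are hypothetical stand-ins written in Python for illustration, and they do not call the real tools.

```python
def fetch_with_escalation(url, ladder):
    """Try each (name, fetch) rung in order; return the first usable body."""
    errors = []
    for name, fetch in ladder:
        try:
            body = fetch(url)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue                      # escalate only on failure
        if body and body.strip():
            return name, body             # cheapest rung that worked
        errors.append(f"{name}: empty body")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))


# Stub rungs standing in for the real tools, cheapest first.
def web_fetch(url):
    raise TimeoutError("plain HTTP fetch timed out")


def lightpanda(url):
    return "<html>rendered page</html>"


def scrapling(url):
    return "<html>scraped page</html>"


ladder = [("web_fetch", web_fetch),
          ("lightpanda", lightpanda),
          ("scrapling", scrapling)]
print(fetch_with_escalation("https://example.com", ladder))
```

In this run the first rung fails, the second succeeds, and the most expensive rung is never invoked.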
What would change my mind
Zoya already orchestrates several models, ranking and failing over between them. The next step that would pull a harness-shaped component back in is delegation: handing a whole sub-task to a smaller, cheaper local model (a classifier to pre-filter pages before the main model reads them, say). At that point there's a second, lesser model to drive, and a library like OpenClaw's loop becomes a candidate for that one internal component, used as a tool inside the runtime rather than as the runtime.
Until then, the loop was never the hard part. The hard part is the single binary, the sandbox, the fallback chain, and the MCP surface around it, and that is the part you can't import.