zvm3: a balanced-ternary VM

A follow-up to the LC-3 toolchain: a balanced-ternary VM with the same scaffolding (assembler, image format, fetch/decode/dispatch loop) but a different number system underneath. The question this answers is what changes at the ISA level when each digit takes three values instead of two. The most concrete answer the project produced is one instruction:

BR3 R1, BELOW, EQUAL, ABOVE

One instruction, three branch offsets, dispatch on the sign of R1. Negative goes to BELOW, zero to EQUAL, positive to ABOVE. There is no single-instruction equivalent in an LC-3-style binary ISA: every branch there is a two-way fork, never a trichotomy, and writing the three-way version explains why an exercise like this is worth doing.

Sizing

The 1958 Setun, with its 9-trit short word and 18-trit long word, remains the historical reference for ternary instruction set design. The project takes its 18-trit width from there:

18 trits per word, signed range ±193,710,244.
9 trits per address, memory of 19,683 word slots. Same order of magnitude as LC-3's 65,536-word space.
9 registers, encoded in 2 trits per field (3² = 9 distinct values).
3 trits per opcode, giving a range of −13..+13. Seventeen ops are defined; the rest are unused.
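
The figures in this list all follow from one formula: an n-trit balanced field holds 3^n values, centred on zero. A minimal sketch that recomputes them (the helper name balanced_max is made up for illustration, not taken from the project):

```python
def balanced_max(trits: int) -> int:
    """Largest magnitude an n-trit balanced-ternary field can hold."""
    return (3 ** trits - 1) // 2

# Field widths as listed above.
print(balanced_max(18))   # word: signed range is +/- this value
print(3 ** 9)             # address: number of distinct word slots
print(3 ** 2)             # register field: number of registers
print(balanced_max(3))    # opcode: range is -13..+13
```

The range is symmetric by construction, which is the property the rest of the ISA leans on.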

Words are stored host-side as i32. Trit-level encoding only happens at instruction decode and at the .o3 file boundary; the inner loop sees plain integers. The code mostly looks like a binary VM with a few unfamiliar arithmetic helpers around the edges.
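
Those arithmetic helpers are small. A sketch of the two conversions, in Python rather than the project's host language and with hypothetical names (to_trits, from_trits); the real helpers may differ in trit order and error handling:

```python
def to_trits(value: int, width: int) -> list[int]:
    """Balanced-ternary digits of value, least significant trit first."""
    limit = (3 ** width - 1) // 2
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {width} balanced trits")
    trits = []
    for _ in range(width):
        t = (value + 1) % 3 - 1      # remainder folded into {-1, 0, +1}
        trits.append(t)
        value = (value - t) // 3     # exact division: value - t is a multiple of 3
    return trits

def from_trits(trits: list[int]) -> int:
    """Inverse of to_trits: weight trit i by 3**i and sum."""
    return sum(t * 3 ** i for i, t in enumerate(trits))

print(to_trits(5, 3))                  # 5 = 9 - 3 - 1
print(from_trits(to_trits(-5, 3)))     # round trip
```

Everything between decode and the file boundary can then stay an ordinary host integer.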

The instruction worth the project: BR3

The encoding for BR3 packs three signed PC-relative offsets into a single 18-trit word:

[Figure: the 18-trit BR3 instruction. Three offsets fit into one word; an offset of zero falls through.]

The reach is asymmetric on purpose. After the 3-trit opcode and the 2-trit register field, 13 trits remain for the three offsets, split 4 + 4 + 5. The 4-trit offsets (±40) cover the cases that often resolve to short jumps (return-to-loop, fall-through); the 5-trit offset (±121) covers the case that more often hops past a block. Not a deep claim about real workloads, just a pragmatic budget for the trits that fit.

An LC-3-style equivalent of BR3 exists as a sequence: load the comparison, examine the sign, branch on each flag in turn. Three instructions minimum, often four. The ternary ISA collapses that to one because the source register's sign is already a three-valued thing; the dispatch is mechanical once the ALU is balanced.
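
The dispatch itself can be modelled in a few lines. This is a sketch, not the VM's code: it assumes the LC-3 convention that offsets are relative to the already-incremented PC, which is consistent with a zero offset falling through, and the function names are invented.

```python
def sign(x: int) -> int:
    """-1, 0 or +1: the three-valued result BR3 dispatches on."""
    return (x > 0) - (x < 0)

def br3(next_pc: int, reg_value: int, below: int, equal: int, above: int) -> int:
    """Return the PC after a BR3; next_pc already points past the BR3 word."""
    offset = (below, equal, above)[sign(reg_value) + 1]
    return next_pc + offset

print(br3(100, -7, below=-3, equal=0, above=5))   # negative: takes BELOW
print(br3(100, 0, below=-3, equal=0, above=5))    # zero offset: falls through
print(br3(100, 9, below=-3, equal=0, above=5))    # positive: takes ABOVE
```

There is no flag register in the model: the only input besides the offsets is the value of the named register.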

Every binary branch is a fork. A trit's sign is already a trichotomy, so one instruction can dispatch three ways.

Where balanced ternary actually feels different

A handful of places where the binary mental model doesn't quite carry over:

NEG is genuinely one instruction. In binary, NEG is "flip every bit and add 1" because two's complement biases the encoding. In balanced ternary, negation is just flipping the sign of every trit. NEG R1, R2 writes −R2 into R1 with no fixup pass.
Subtraction has no special case. Because arithmetic is symmetric, ADD R1, R2, #-5 just works: the immediate is a balanced-ternary value, the carry is also balanced, the result is whatever it is. The sign of the result lands directly in the destination register, ready for BR3.
MIN and MAX are first-class. The tritwise analogues of AND and OR (sometimes called Łukasiewicz operators) generalise Boolean logic to three values. Useful in their own right, and the same algebra SQL uses for NULL three-valued logic. MIN(+1, −1) = −1; MAX(0, −1) = 0. Cleaner than emulating them on top of bitwise ops.
Truncation is rounding. Right-shifting a balanced-ternary number (dividing by 3 by dropping the lowest trit) rounds to nearest, with no bias. An arithmetic right shift in two's complement rounds toward negative infinity, so negative results come out biased and programmers learn to compensate. Ternary doesn't make you compensate.
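
The items above can be checked at the trit level. A sketch with hypothetical helper names, working on trit lists for clarity; the VM itself keeps values as plain integers, so this models the semantics rather than the implementation:

```python
WIDTH = 18  # trits per word, as in the sizing section

def to_trits(value: int, width: int = WIDTH) -> list[int]:
    """Balanced-ternary digits, least significant first (no range check)."""
    trits = []
    for _ in range(width):
        t = (value + 1) % 3 - 1
        trits.append(t)
        value = (value - t) // 3
    return trits

def from_trits(trits: list[int]) -> int:
    return sum(t * 3 ** i for i, t in enumerate(trits))

def neg(a: int) -> int:
    """Negation: flip every trit, no fixup pass."""
    return from_trits([-t for t in to_trits(a)])

def tmin(a: int, b: int) -> int:
    """Tritwise MIN, the three-valued analogue of AND."""
    return from_trits([min(x, y) for x, y in zip(to_trits(a), to_trits(b))])

def tmax(a: int, b: int) -> int:
    """Tritwise MAX, the three-valued analogue of OR."""
    return from_trits([max(x, y) for x, y in zip(to_trits(a), to_trits(b))])

def shift_right(a: int) -> int:
    """Drop the lowest trit: division by 3, rounded to nearest."""
    return from_trits(to_trits(a)[1:])

print(neg(5), tmin(1, -1), tmax(0, -1))
print(shift_right(-4), -4 // 3)   # nearest vs. floor for the same quotient
print(-7 >> 1)                    # binary arithmetic shift floors -3.5
```

The dropped trit is always −1, 0 or +1, so the shifted result is never more than a third away from the exact quotient and there are no ties to break.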

An encoding bug that doesn't exist in binary

The first iteration of the opcode table assigned values 0..14 plus −1 and −2: seventeen ops, exactly what the ISA needs. The first run of the assembler and VM produced halted: BadOpcode on the very first instruction.

The reason: a 3-trit balanced field holds 27 values, −13..+13, and has no slot for +14. JMP = 14 doesn't fit. The opcode value silently wraps when encoded, and the VM's range check catches it. Renumbering moves jmp, jsr, and trap to negative codes (−1, −2, −3) so the other fourteen ops keep the non-negative codes 0..13. Seventeen ops in the −3..+13 window.

What makes this bug ternary-specific is that the binary equivalent doesn't exist. A 4-bit opcode field has range 0..15, and any value in that range is a valid bit pattern. The off-by-one between "values that fit balanced" and "values that fit unsigned" is the kind of small thing you trip over once when designing ternary instruction formats. The validation code in the dispatch loop made it a one-line fix instead of a half-hour debugging session.
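
The wrap is easy to reproduce. A sketch assuming the encoder keeps only the low three trits of the opcode value (the post does not show the encoder, so wrap_field and fits are illustrative names):

```python
def fits(value: int, width: int) -> bool:
    """True if value is representable in `width` balanced trits."""
    limit = (3 ** width - 1) // 2
    return -limit <= value <= limit

def wrap_field(value: int, width: int) -> int:
    """What an unchecked encoder stores: value reduced to the low `width` trits."""
    span = 3 ** width
    return (value + span // 2) % span - span // 2

print(fits(14, 3))          # the original JMP code does not fit
print(wrap_field(14, 3))    # 14 = 27 - 13, so the low three trits read as -13
print(all(fits(op, 3) for op in range(-3, 14)))   # the renumbered window
```

Under that assumption JMP = 14 lands on −13, an unassigned code, which is why a range check in the dispatch loop reports it immediately.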

A small language whose only branch is `match sign`

BR3 is the instruction that makes this ISA distinctive, so the follow-up was a small high-level language whose only branching primitive is match sign(expr), compiling directly to one BR3. There is no if/else. Once two-way branches are allowed, the compiler emits the same BR3 anyway and the language stops feeling distinctively ternary.

What that looks like in source:

fn classify(x: int) -> int {
  match sign(x) {
    neg  => -x,
    zero => 0,
    pos  => x,
  }
}

fn cmp(a: int, b: int) -> int {
  match sign(a + -b) {
    neg  => -1,
    zero => 0,
    pos  => 1,
  }
}

The rule is enforced by the grammar: the only conditional construct is match sign, with three mandatory arms. Loops are loop { ... } with explicit break, and the loop-exit condition has to be expressed in terms of the sign trichotomy:

let n = 3;
loop {
  match sign(n) {
    neg  => { break; },
    zero => { putc(n + 48); putc(10); break; },
    pos  => { putc(n + 48); putc(10); n = n + -1; },
  }
}

The verbosity is the point. In a binary language you'd write while n > 0 and never think about the difference between zero and negative. Here you have to spell it out, which is exactly what the hardware sees. The compiler emits one BR3 per match, and that's the only branch instruction it ever produces.
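
A rough picture of what single-pass codegen for match sign could look like, in Python. Everything here is an assumption made for illustration: the label scheme, the layout of the arms, and the use of JMP with a label operand to leave an arm. The real zcomp3 output may be arranged differently.

```python
import itertools

_label_ids = itertools.count()  # fresh label suffix per match

def emit_match_sign(reg, neg_arm, zero_arm, pos_arm):
    """Return .s3-style lines for one match sign(reg) with three arms.

    Each arm is a list of already-generated instruction strings.
    """
    n = next(_label_ids)
    neg_l, zero_l, pos_l, end_l = (f"M{n}_{s}" for s in ("NEG", "ZERO", "POS", "END"))
    lines = [f"        BR3 {reg}, {neg_l}, {zero_l}, {pos_l}"]
    for label, body in ((neg_l, neg_arm), (zero_l, zero_arm), (pos_l, pos_arm)):
        lines.append(label)
        lines.extend(f"        {ins}" for ins in body)
        if label != pos_l:                  # last arm falls into the end label
            lines.append(f"        JMP {end_l}")
    lines.append(end_l)
    return lines

# classify(x) with x in R0: only the negative arm has work to do.
for line in emit_match_sign("R0", ["NEG R0, R0"], [], []):
    print(line)
```

However the arms are laid out, the three-way decision itself costs exactly one BR3.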

Tooling shape: the compiler emits .s3 text and hands off to the assembler, so the pipeline is now three stages instead of two:

foo.t3 → zcomp3 → foo.s3 → zasm3 → foo.o3 → zvm3 → output

About 750 lines for the compiler, same order of magnitude as the assembler. Single-pass codegen (no AST, no IR), variables live in memory at fixed slots, calling convention passes args in R0 to R3 with the return value in R0. No software stack means no recursion in v1; everything is iterative. The generated assembly is mechanical and verbose, but it assembles and runs without hand fixups.

A binary instruction family on the same registers

Can a single VM run both ternary and binary code? zvm3 already runs on a binary host with 18-trit values stored as plain i32, so the bits are just bits; only the operation that reads them decides whether they represent a balanced-ternary number or a two's-complement integer. Adding a binary family is roughly five new opcodes:

AND, OR, XOR, BNOT: bitwise ops over the same i32 storage that ADD/MIN/MAX already use.
BR with an nzp flag mask, plus the seven combined mnemonics (BRn, BRz, BRp, BRnz, BRnp, BRzp, BRnzp) that match LC-3's branch idiom.

The BR form deliberately departs from LC-3 in one detail: it takes an explicit source register. zvm3 has no implicit condition register, so every dispatch (both BR3 and the new BR) is a named sign check on a named register. Small pedagogical difference, but real; nothing in the machine moves silently.

The strongest hybrid program in the project is a tic-tac-toe winner check. Each cell is one trit (+1=X, −1=O, 0=empty), and a line of three cells sums to exactly ±3 only when all three are non-empty and identical. Empty cells drag the magnitude below 3 automatically; mixed cells cancel. The win predicate is a single arithmetic test, and it's a property of balanced ternary that two's-complement can't match.
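
The line-sum predicate is short enough to state directly. A Python model (not the ttt.s3 code; the board layout and names are assumptions), using the cell encoding +1 = X, −1 = O, 0 = empty:

```python
# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def line_winner(board, line):
    """+1 if X owns the whole line, -1 if O does, 0 otherwise."""
    total = sum(board[i] for i in line)
    if total == 3:
        return 1
    if total == -3:
        return -1
    return 0          # empty cells shrink the sum, mixed cells cancel

board = [ 1,  1,  1,
         -1, -1,  0,
          0,  0,  0]
print([line_winner(board, line) for line in LINES])
```

No per-cell comparison appears anywhere: the sum reaching ±3 is the whole test.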

The binary half accumulates two facts ("did X win any line?" and "did O win any line?") into a 2-bit flag using OR-with-immediate across all eight winning lines. After the loop the decoder is the tightest sequence in the project:

AND R0, R6, #2     ; binary view: extract X bit (0 or 2)
AND R1, R6, #1     ; binary view: extract O bit (0 or 1)
NEG R1, R1         ; ternary view: one-instruction negate
ADD R2, R0, R1     ; pos = X, neg = O, zero = none
BR3 R2, O_WINS, NO_WIN, X_WINS

Five instructions, three different readings of the same register file. Lines 1 and 2 read R6 as a binary bit-set. Line 3 negates one component using ternary's free NEG. Line 4 adds the two components as signed integers; line 5 dispatches on the sign of the result. Same register file throughout; only the opcode picks the lens.
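
The same five steps as a Python model, one statement per instruction. The variable names mirror the registers in the listing; the function itself is illustrative.

```python
def sign(x: int) -> int:
    return (x > 0) - (x < 0)

def decode(r6: int) -> str:
    """r6 is the 2-bit flag: bit 1 = X won a line, bit 0 = O won a line."""
    r0 = r6 & 2          # AND R0, R6, #2   -> 0 or 2
    r1 = r6 & 1          # AND R1, R6, #1   -> 0 or 1
    r1 = -r1             # NEG R1, R1
    r2 = r0 + r1         # ADD R2, R0, R1
    return ("O_WINS", "NO_WIN", "X_WINS")[sign(r2) + 1]   # BR3 R2, ...

for flags in range(4):
    print(flags, decode(flags))
```

One consequence of weighting the X bit at 2 and the O bit at 1: if both bits are set, the sum is +1 and the model reports X_WINS. A legal board cannot reach that state, so this only matters for malformed input.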

Pure-binary and pure-ternary versions of the program both work, slightly worse. Pure ternary makes the eight-lines accumulator a four-arm BR3 chain instead of an OR-bitset. Pure binary makes the per-line predicate two BRzs instead of one cell-sum. Mixing the two gives you both wins; the cost is remembering which lens applies to which instruction.

A multiplication-free matmul

The most topical use of this VM's primitives is the trick BitNet b1.58 demonstrated in February 2024: when neural-network weights are restricted to {−1, 0, +1}, every weight × activation in a matrix multiplication collapses into selective addition. The matmul path becomes BR3 over add, skip, or NEG-then-add, with no integer multiplier required. That is exactly what zvm3 has.

The demo is a 3×3 ternary weight matrix times a 3-vector of integer activations. Each weight × activation sub-expression compiles to a five-instruction fall-through pattern in which the SUB arm intentionally flows into the ADD arm:

LDR R2, R1, #0           ; weight (a trit: -1, 0, +1)
LD  R3, X_0              ; activation (a small integer)
BR3 R2, SUB_0, SKP_0, ADD_0
SUB_0   NEG R3, R3       ; SUB arm falls through to ADD with the negation
ADD_0   ADD R0, R0, R3   ; weighted accumulation
SKP_0                    ; (zero arm lands here, no work done)

At most one NEG, one ADD, plus the loads. No MUL opcode, no shifter, no condition register. Three multiplications per output cell, three output cells, all in a 70-word program. Across seven test cases (identity, zeros, negative-identity, cyclic shift, cancellation patterns), every output came out arithmetically correct.
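
For comparison, the same fall-through pattern in Python. This is a model of the technique, not a port of matmul.s3; the function name and the test matrices are made up for the example.

```python
def ternary_matvec(weights, x):
    """Multiply a matrix with entries in {-1, 0, +1} by an integer vector
    using only negate, add and skip."""
    out = []
    for row in weights:
        acc = 0
        for w, a in zip(row, x):
            if w < 0:
                a = -a        # SUB arm: negate, then fall through to the add
            if w != 0:
                acc += a      # ADD arm; the zero arm skips both steps
        out.append(acc)
    return out

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cyclic   = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
mixed    = [[1, -1, 0], [-1, -1, -1], [0, 0, 0]]
x = [4, 7, -2]
print(ternary_matvec(identity, x))
print(ternary_matvec(cyclic, x))
print(ternary_matvec(mixed, x))
```

The multiplication operator never appears; the weight only selects which of the three paths runs.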

That is the connection between this VM and present-day work. The Setun history is interesting, the SQL three-valued logic is useful in databases, and the BitNet line is the recent reason the encoding matters again. Replace transformer multiplications with conditional adds, recover competitive results at lower cost. zvm3 won't outrun a real BitNet kernel, but it does make legible what such a kernel is doing: BR3 over add/skip/subtract, looped over a tensor.

Restrict weights to minus one, zero, plus one and the multiplier disappears. Every multiply becomes a three-way branch over add, skip, or negate-then-add.

Where ternary's structure pays off

Beyond the demos: is there actually mathematics that's simpler in ternary than in binary? Not new mathematics, but a list of domains where ternary's structure matches the problem:

Free negation. Sign-flip is per-trit; no two's-complement +1 fixup. The savings compound in algorithms with frequent sign changes (Karatsuba multiplication, CORDIC iterations, sign-flip steps in numerical methods).
Unbiased division by 3. Truncation rounds correctly. Arithmetic right-shift of a negative two's-complement value rounds toward negative infinity; the equivalent operation on a balanced-ternary value rounds to nearest. Matters in fixed-point DSP and any algorithm that divides while looking at signed residuals.
Multiplication-free linear algebra with weights in {−1, 0, +1}: the BitNet trick the matmul demo demonstrates. Currently being adopted as a quantization target for transformer inference in research and a few open-source projects.
Compressed sensing with sparse ternary sensing matrices: a known construction in the literature with recovery guarantees comparable to Gaussian sensing matrices, but easier to implement in measurement hardware.
Fractal geometry. The middle-thirds Cantor set is the set of reals in [0, 1] that have a (standard, unbalanced) ternary expansion with no 1 digit. Many self-similar constructions (Sierpinski gasket, Koch curve, Devil's staircase) are one-line in ternary and awkward in binary.
Qutrit quantum computing. Three-level quantum systems give more compact circuits for some algorithms (qudit Deutsch-Jozsa, qudit phase estimation). Active research direction.
Lattice cryptography. NTRU and several post-quantum schemes use {−1, 0, +1} polynomial coefficients as small-noise distributions; the sparsity of those distributions is one of the parameters the security argument turns on.
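
As one concrete instance from the list, the Cantor-set item really is close to a one-liner once numbers are written in base 3. A sketch using standard ternary digits 0, 1, 2 (not balanced trits); the function name is invented:

```python
from fractions import Fraction
from itertools import product

def cantor_left_endpoints(depth: int) -> list[Fraction]:
    """Left endpoints of the 2**depth intervals left after `depth` rounds of
    removing middle thirds: every ternary fraction of that length whose
    digits are all 0 or 2."""
    return sorted(
        sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))
        for digits in product((0, 2), repeat=depth)
    )

print(cantor_left_endpoints(2))
```

The construction is just "enumerate digit strings over {0, 2}"; the geometry falls out of the number system.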

None of these requires a new mathematical universe. They're domains where the operational primitives this VM exposes (sign-of-difference dispatch, free negation, ternary-weight inner products) line up with what the problem actually needs. The list is long enough that ternary stops being a single hook and starts looking like a family of related techniques, several of them being adopted in practice.

What runs end-to-end

Twelve small programs exercise the toolchain. Nine are hand-written in assembly, three are written in the high-level language:

hello.s3: LEA/PUTS/HALT, prints a greeting.
count.s3: counts down 3, 2, 1, 0 using a BR3 loop.
sign.s3: reads a character, classifies as below/equal/above 'M' in one BR3 instruction.
sort3.s3: selection-sort of three numbers via three compare-and-swap calls. CSWAP is a JSR/RET subroutine that exercises LDR, STR, NEG, ADD, and BR3 in 7 instructions.
calc.s3: a single-digit calculator. Reads d op d, supports + and −, prints the signed result. Verified across 11 inputs spanning every sign/magnitude combination.
parity.s3: small hybrid. AND extracts the low bit (binary view), BR3 dispatches even/odd (ternary view). Verified across all 10 digits.
flags.s3: purely binary chmod-style permission decoder using AND with BRz and BRnzp. All eight bit patterns verified, from --- through rwx.
ttt.s3: the substantive hybrid. Tic-tac-toe winner check. Per-line predicate uses balanced-ternary cell sums (±3 when all match); the across-lines accumulator and the result decoder use OR/AND/NEG/BR3. Verified on seven board configurations.
matmul.s3: multiplication-free 3×3 ternary matmul times a 3-vector. The BitNet b1.58 trick. Each weight × activation collapses into BR3 over add / skip / subtract, no MUL opcode required. Verified across seven matrices including identity, zeros, negative-identity, and cyclic shift.
abs.t3: classify(x) returning -x/0/x. Uses match sign as the function's return expression, compiles to a single BR3.
cmp.t3: two-arg cmp(a, b) returning −1/0/1 from the sign of the difference. Main's output uses a separate match for character dispatch (so two BR3s per run).
count.t3: counted-down loop in the high-level language. Produces output identical to count.s3, generated mechanically by the compiler.

The calculator is the most concrete demonstration that "no two's complement" isn't a decorative claim. To subtract, negate the second operand and add:

; If op is '-', negate the second operand. Then always add.
ADD R6, R2, #-45                ; R6 = op - '-'
BR3 R6, NOT_MINUS, IS_MINUS, NOT_MINUS
IS_MINUS NEG R3, R3
NOT_MINUS

ADD R5, R1, R3                  ; R5 = result

The LC-3 (or any two's-complement ISA) equivalent of NEG R3, R3 is NOT R3, R3 ; ADD R3, R3, #1: two instructions for what zvm3 does in one. Trivial savings on any single subtraction, mildly annoying repetition once you've written it ten times.
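
The two-step binary version, modelled on a 16-bit word as in LC-3. The function names are illustrative:

```python
MASK = 0xFFFF  # LC-3 words are 16 bits

def lc3_neg(x: int) -> int:
    """Two's-complement negation as the two-instruction sequence."""
    x = ~x & MASK            # NOT R3, R3
    return (x + 1) & MASK    # ADD R3, R3, #1

def as_signed(x: int) -> int:
    """Interpret a 16-bit pattern as a signed integer."""
    return x - 0x10000 if x & 0x8000 else x

print(as_signed(lc3_neg(5)))
print(as_signed(lc3_neg(0x8000)))   # the most negative value has no positive twin
```

The second line shows the other cost of the biased encoding: −32768 negates to itself. A balanced-ternary range is symmetric, so every representable value has a representable negation.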

The same single-instruction win shows up in sort3.s3's comparator. NEG R6, R4 ; ADD R6, R3, R6 is the entire "compute A − B" sequence, which immediately feeds into a BR3 dispatching on the sign of the difference. Two-instruction subtraction plus one-instruction three-way branch is the closest the project gets to "this ISA was actually designed for the workload."

What it doesn't have

Two real gaps. First, no port of a real-world ROM. Setun programs from the 1960s exist as schematics and instruction listings, not as files anyone hosts; running something at the scale of the LC-3 2048 demo from the previous post would mean writing it from scratch.

Second, this isn't fast in any absolute sense. The dispatch loop converts each instruction word to a trit array on every fetch, extracts fields by trit index, then does plain i32 arithmetic. Roughly an order of magnitude slower than the LC-3 VM at the same task. Tolerable for a teaching tool: still hundreds of thousands of instructions per second, enough to drive an interactive demo. No one's going to run a workload on it.

The educational payoff is the BR3 instruction itself, and the moment in the assembler where writing NEG R1, R2 just works without any thought about two's complement. Both are concrete enough to remember after the implementation is done.

The wider framing (radix economy, the Setun history, why BitNet b1.58 in 2024 made ternary weights matter again for AI) is well-trodden elsewhere. This post is the specific report on what one Zig-hosted balanced-ternary VM ended up looking like.

A balanced-ternary VM, hosted in Zig, reusing the LC-3 scaffolding (assembler, image format, fetch/decode/dispatch) but with three values per digit instead of two. The whole question is what changes at the ISA level, and the sharpest answer is a single instruction.

BR3 is the payoff. One instruction packs three PC-relative offsets and dispatches on the sign of a register: negative, zero, or positive each go somewhere. A trit's sign is already three-valued, so the branch is mechanical. The binary equivalent is three or four instructions: load, test the sign, branch per flag.

The number system removes fixups. Negation flips the sign of every trit, so NEG is one instruction with no two's-complement plus-one. Subtraction is negate-then-add, no special case. Dividing by three is a truncation that rounds to nearest, with none of binary's round-toward-minus-infinity bias.


Sizing and a ternary-only bug

Words are 18 trits, addresses 9 trits (19,683 slots), 9 registers, 3-trit opcodes. Words live host-side as i32; trit encoding only happens at decode and at the file boundary, so the inner loop is plain integers. One bug has no binary analogue: a 3-trit balanced field runs minus 13 to plus 13, so an opcode of 14 silently wraps. Renumbering three ops to negative codes fixed it.

What it demonstrates

A tiny language whose only branch is match sign(expr) with three mandatory arms, compiling one-to-one to BR3. No if/else, or it stops feeling ternary.
A binary opcode family (AND, OR, XOR, plus an LC-3-style BR) over the same registers, so one program can read a register as bits or as a balanced number.
A multiplication-free matmul, the BitNet b1.58 trick: ternary weights turn each multiply into BR3 over add, skip, or subtract.

What it isn't

No real-world ROM port (Setun programs survive as schematics, not files) and it is about an order of magnitude slower than the LC-3 VM, since it rebuilds a trit array on every fetch. Fine for teaching, useless for workloads. The payoff is conceptual: BR3, and writing NEG that just works.

Sources

N. P. Brusentsov's Setun (Moscow State University, 1958): the historical balanced-ternary computer, source of the 18-trit word width used here.
Ma et al., "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (2024), the BitNet b1.58 paper: transformer weights restricted to {-1, 0, +1}, turning matrix multiplication into selective addition, the recent reason ternary encodings matter again.
The companion piece, the LC-3 toolchain on Zig 0.16: the binary VM this one mirrors, with the same assembler and dispatch scaffolding.
