A follow-up to the LC-3 toolchain: a balanced-ternary VM with the same scaffolding (assembler, image format, fetch/decode/dispatch loop) but a different number system underneath. The question this answers is what changes at the ISA level when each digit takes three values instead of two. The most concrete answer the project produced is one instruction:

BR3 R1, BELOW, EQUAL, ABOVE

One instruction, three branch offsets, dispatch on the sign of R1. Negative goes to BELOW, zero to EQUAL, positive to ABOVE. LC-3 and the other mainstream binary ISAs have no single-instruction equivalent: their conditional branches fork two ways and never split three ways. Writing a three-way branch explains why an exercise like this is worth doing.

Sizing

The 1958 Setun, with its 9-trit short word and 18-trit long word, remains the historical reference for ternary instruction set design. The project takes its 18-trit width from there:

Words are stored host-side as i32. Trit-level encoding only happens at instruction decode and at the .o3 file boundary; the inner loop sees plain integers. The code mostly looks like a binary VM with a few unfamiliar arithmetic helpers around the edges.
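
As a rough illustration of that boundary conversion, here is a minimal Python sketch. The VM itself is Zig-hosted; the names to_trits, from_trits, and MAX_WORD are hypothetical and not taken from the project. It also shows why i32 is wide enough: an 18-trit balanced word spans ±193,710,244.

```python
TRITS = 18
MAX_WORD = (3**TRITS - 1) // 2   # largest magnitude an 18-trit balanced word can hold


def to_trits(value, width=TRITS):
    """Return balanced trits, least significant first, each in {-1, 0, +1}."""
    limit = (3**width - 1) // 2
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {width} balanced trits")
    trits = []
    for _ in range(width):
        r = value % 3            # Python's % always yields 0, 1, or 2 here
        if r == 2:
            r = -1               # digit 2 becomes -1 with a carry into the next trit
        trits.append(r)
        value = (value - r) // 3
    return trits


def from_trits(trits):
    """Inverse of to_trits: fold trits (least significant first) back into an int."""
    value = 0
    for t in reversed(trits):
        value = value * 3 + t
    return value
```

Only the decode step and the file boundary would need helpers like these; everything else can stay in plain integer arithmetic, as the paragraph above describes.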

The instruction worth the project: BR3

The encoding for BR3 packs three signed PC-relative offsets into a single 18-trit word:

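[Figure: layout of the 18-trit BR3 instruction word, from trit 17 down to trit 0: opcode (3 trits), sr (2 trits), neg_off (4 trits, taken if R1 < 0), zero_off (4 trits, taken if R1 == 0), pos_off (5 trits, taken if R1 > 0). One instruction word, three landing sites.]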
The 18-trit BR3 instruction. Three offsets fit into one word; an offset of zero falls through.

The reach is asymmetric on purpose. 4-trit offsets (±40) cover the cases that often resolve to short jumps (return-to-loop, fall-through); 5 trits (±121) covers the case that more often hops past a block. Not a deep claim about real workloads, just a pragmatic budget for the trits that fit.
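
To make the field budget concrete, here is a hedged Python sketch of packing and unpacking a BR3 word with the 3/2/4/4/5 layout described above. Several details are assumptions for illustration only: the opcode number in the test values is invented, sr is stored as a raw balanced value, offsets are taken relative to the next instruction (which is consistent with "an offset of zero falls through" but not confirmed by the source), and the function names are hypothetical.

```python
# Field layout, most significant first: 3 + 2 + 4 + 4 + 5 = 18 trits.
FIELDS = [("opcode", 3), ("sr", 2), ("neg_off", 4), ("zero_off", 4), ("pos_off", 5)]


def half_range(width):
    """Largest magnitude representable in `width` balanced trits."""
    return (3**width - 1) // 2


def encode(**values):
    """Pack balanced field values into one integer word."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if abs(v) > half_range(width):
            raise ValueError(f"{name}={v} does not fit in {width} trits")
        word = word * 3**width + v
    return word


def decode(word):
    """Unpack a word into its balanced fields, least significant field first."""
    out = {}
    for name, width in reversed(FIELDS):
        base, half = 3**width, half_range(width)
        v = (word + half) % base - half   # balanced remainder in [-half, +half]
        out[name] = v
        word = (word - v) // base
    return out


def br3_target(next_pc, reg_value, fields):
    """Pick the offset by the sign of the register (assumed next-PC relative)."""
    if reg_value < 0:
        return next_pc + fields["neg_off"]
    if reg_value == 0:
        return next_pc + fields["zero_off"]
    return next_pc + fields["pos_off"]
```

The range check in encode is where the ±40 and ±121 limits show up: a neg_off of 41 is rejected, while a pos_off of 121 is accepted.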

An LC-3-style equivalent of BR3 exists as a sequence: load the comparison, examine the sign, branch on each flag in turn. Three instructions minimum, often four. The ternary ISA collapses that to one because the source register's sign is already a three-valued thing; the dispatch is mechanical once the ALU is balanced.

Where balanced ternary actually feels different

A handful of places where the binary mental model doesn't quite carry over:

An encoding bug that doesn't exist in binary

The first iteration of the opcode table assigned values 0..14 plus −1 and −2: seventeen ops, exactly what the ISA needs. The first run of the assembler and VM produced halted: BadOpcode on the very first instruction.

The reason: a 3-trit balanced field has range −13..+13 (27 values, symmetric around zero), not −13..+14. JMP = 14 doesn't fit. The opcode value silently wraps when encoded, and the VM's range check catches it. Renumbering moves jmp, jsr, and trap to negative codes (−1, −2, −3), so the other fourteen ops keep non-negative codes in 0..13. Seventeen ops in the −3..+13 window.

What makes this bug ternary-specific is that the binary equivalent doesn't exist. A 4-bit opcode field has range 0..15, and any value in that range is a valid bit pattern. The off-by-one between "values that fit balanced" and "values that fit unsigned" is the kind of small thing you trip over once when designing ternary instruction formats. The validation code in the dispatch loop made it a one-line fix instead of a half-hour debugging session.
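
The mismatch is small enough to show in a few lines. This is an illustrative Python sketch, not the project's validation code, and the function names are hypothetical:

```python
def fits_balanced(value, trits):
    """True if value is representable in `trits` balanced-ternary digits."""
    return abs(value) <= (3**trits - 1) // 2


def fits_unsigned(value, bits):
    """True if value is representable in `bits` unsigned binary digits."""
    return 0 <= value < 2**bits


first_attempt = list(range(0, 15)) + [-1, -2]   # 0..14 plus -1 and -2
renumbered = list(range(-3, 14))                # -3..+13

# Opcodes from each table that do not fit the 3-trit field.
rejected_before = [op for op in first_attempt if not fits_balanced(op, 3)]
rejected_after = [op for op in renumbered if not fits_balanced(op, 3)]
```

Here rejected_before contains only 14 and rejected_after is empty, while fits_unsigned(14, 4) holds: the same value that breaks the balanced field is unremarkable in a 4-bit one.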

A small language whose only branch is match sign

BR3 is the instruction that makes this ISA distinctive, so the follow-up was a small high-level language whose only branching primitive is match sign(expr), compiling directly to one BR3. There is no if/else. Once two-way branches are allowed, the compiler emits the same BR3 anyway and the language stops feeling distinctively ternary.

What that looks like in source:

fn classify(x: int) -> int {
  match sign(x) {
    neg  => -x,
    zero => 0,
    pos  => x,
  }
}

fn cmp(a: int, b: int) -> int {
  match sign(a + -b) {
    neg  => -1,
    zero => 0,
    pos  => 1,
  }
}

The rule is enforced by the grammar: the only conditional construct is match sign, with three mandatory arms. Loops are loop { ... } with explicit break, and the loop-exit condition has to be expressed in terms of the sign trichotomy:

let n = 3;
loop {
  match sign(n) {
    neg  => { break; },
    zero => { putc(n + 48); putc(10); break; },
    pos  => { putc(n + 48); putc(10); n = n + -1; },
  }
}

The verbosity is the point. In a binary language you'd write while n > 0 and never think about the difference between zero and negative. Here you have to spell it out, which is exactly what the hardware sees. The compiler emits one BR3 per match, and that's the only branch instruction it ever produces.
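
For readers who want to trace the countdown by hand, here is a rough Python mirror of the loop above. Python is used only as a stand-in; sign and countdown are hypothetical names, and the three branches correspond to the neg, zero, and pos arms.

```python
def sign(value):
    """Return -1, 0, or +1: the three-valued result that match sign dispatches on."""
    return (value > 0) - (value < 0)


def countdown(n):
    """Mirror of the loop/match example; returns what putc would have written."""
    out = []
    while True:                      # loop { ... }
        s = sign(n)
        if s < 0:                    # neg arm: nothing to print
            break
        elif s == 0:                 # zero arm: print the final digit, then stop
            out.append(chr(n + 48))  # putc(n + 48)
            out.append(chr(10))      # putc(10)
            break
        else:                        # pos arm: print and count down
            out.append(chr(n + 48))
            out.append(chr(10))
            n = n + -1
    return "".join(out)
```

A while n > 0 version would skip the final "0" line; keeping the zero arm separate is what makes that case explicit.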

Tooling shape: the compiler emits .s3 text and hands off to the assembler, so the pipeline is now three stages instead of two:

foo.t3 → zcomp3 → foo.s3 → zasm3 → foo.o3 → zvm3 → output

About 750 lines for the compiler, same order of magnitude as the assembler. Single-pass codegen (no AST, no IR), variables live in memory at fixed slots, calling convention passes args in R0–R3 with the return value in R0. No software stack means no recursion in v1; everything is iterative. The generated assembly is mechanical and verbose, but it assembles, links, and runs without hand fixups.

A binary instruction family on the same registers

Can a single VM run both ternary and binary code? zvm3 already runs on a binary host with 18-trit values stored as plain i32, so the bits are just bits; only the operation that reads them decides whether they represent a balanced-ternary number or a two's-complement integer. Adding a binary family is roughly five new opcodes:

The BR form deliberately departs from LC-3 in one detail: it takes an explicit source register. zvm3 has no implicit condition register, so every dispatch (both BR3 and the new BR) is a named sign check on a named register. Small pedagogical difference, but real; nothing in the machine moves silently.

The strongest hybrid program in the project is a tic-tac-toe winner check. Each cell is one trit (+1=X, −1=O, 0=empty), and a line of three cells sums to exactly ±3 only when all three are non-empty and identical. Empty cells drag the magnitude below 3 automatically; mixed cells cancel. The win predicate is a single arithmetic test, and it's a property of balanced ternary that two's-complement can't match.
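
The claim about line sums can be checked exhaustively, since a line has only 27 possible states. A small Python sketch (the function names are hypothetical):

```python
from itertools import product


def line_winner(a, b, c):
    """Cells are +1 (X), -1 (O), or 0 (empty). Return 'X', 'O', or None."""
    total = a + b + c
    if total == 3:
        return "X"
    if total == -3:
        return "O"
    return None


def sum_predicate_is_exact():
    """Check all 27 lines: |sum| == 3 exactly when the line is full and uniform."""
    for cells in product((-1, 0, 1), repeat=3):
        uniform = cells[0] != 0 and cells[0] == cells[1] == cells[2]
        if (abs(sum(cells)) == 3) != uniform:
            return False
    return True
```

Any empty cell caps the magnitude at 2, and any mixed pair cancels, so no other combination reaches ±3.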

The binary half accumulates two facts ("did X win any line?" and "did O win any line?") into a 2-bit flag using OR-with-immediate across all eight winning lines. After the loop the decoder is the tightest sequence in the project:

AND R0, R6, #2     ; binary view: extract X bit (0 or 2)
AND R1, R6, #1     ; binary view: extract O bit (0 or 1)
NEG R1, R1         ; ternary view: one-instruction negate
ADD R2, R0, R1     ; pos = X, neg = O, zero = none
BR3 R2, O_WINS, NO_WIN, X_WINS

Five instructions, three different interpretations of the bits that start in R6. Lines 1–2 read R6 as a binary bit-set. Line 3 negates one component using ternary's free NEG. Line 4 adds the two components as signed integers; line 5 dispatches on the sign of the result. Same register file throughout; only the opcode picks the lens.
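
The whole check, accumulator plus decoder, can be sketched in Python as follows. This is a model of the logic described above, not the project's assembly: the board layout (nine cells, row-major) and the names are assumptions, while the bit assignments (X = 2, O = 1) follow the AND immediates in the listing.

```python
# The eight winning lines on a row-major 3x3 board (indices 0..8).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]
X_BIT, O_BIT = 2, 1


def winner(board):
    """board: nine cells of +1 (X), -1 (O), 0 (empty)."""
    flags = 0
    for a, b, c in LINES:
        total = board[a] + board[b] + board[c]   # ternary view: cell sum
        if total == 3:
            flags |= X_BIT                       # binary view: OR into the bit-set
        elif total == -3:
            flags |= O_BIT
    x = flags & 2            # AND R0, R6, #2 -> 0 or 2
    o = flags & 1            # AND R1, R6, #1 -> 0 or 1
    verdict = x + (-o)       # NEG R1, R1 then ADD R2, R0, R1
    if verdict < 0:          # BR3 R2, O_WINS, NO_WIN, X_WINS
        return "O_WINS"
    if verdict == 0:
        return "NO_WIN"
    return "X_WINS"          # also reached if both bits are set (2 - 1 = +1)
```

The last comment marks a case that cannot occur on a legal board but is worth knowing about when feeding the decoder arbitrary input.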

Pure-binary and pure-ternary versions of the program both work, each slightly worse. Pure ternary makes the eight-line accumulator a four-arm BR3 chain instead of an OR-bitset. Pure binary makes the per-line predicate two BRzs instead of one cell-sum. Mixing the two gives you both wins; the cost is remembering which lens applies to which instruction.

A multiplication-free matmul

The most topical use of this VM's primitives is the trick BitNet b1.58 demonstrated in February 2024: when neural-network weights are restricted to {−1, 0, +1}, every weight × activation in a matrix multiplication collapses into selective addition. The matmul path becomes BR3 over add, skip, or NEG-then-add, with no integer multiplier required. Those are exactly the primitives zvm3 has.

The demo is a 3×3 ternary weight matrix times a 3-vector of integer activations. Each weight × activation sub-expression compiles to a five-instruction fall-through pattern in which the SUB arm intentionally flows into the ADD arm:

LDR R2, R1, #0           ; weight (a trit: -1, 0, +1)
LD  R3, X_0              ; activation (a small integer)
BR3 R2, SUB_0, SKP_0, ADD_0
SUB_0   NEG R3, R3       ; SUB arm falls through to ADD with the negation
ADD_0   ADD R0, R0, R3   ; weighted accumulation
SKP_0                    ; (zero arm lands here, no work done)

At most one NEG and one ADD per term, plus the loads. No MUL opcode, no shifter, no condition register. Three weight × activation terms per output cell, three output cells, all in a 70-word program. Across seven test cases (identity, zeros, negative-identity, cyclic shift, cancellation patterns), every output came out arithmetically correct.
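
The same add/skip/negate-then-add pattern, written out as a Python sketch for a ternary weight matrix times an integer vector. The function name and the sample matrices in the usage note are illustrative and are not the project's seven test cases.

```python
def ternary_matvec(weights, activations):
    """Multiply a {-1, 0, +1} matrix by an integer vector without using '*'."""
    out = []
    for row in weights:
        acc = 0
        for w, a in zip(row, activations):
            if w < 0:
                acc += -a      # SUB arm: NEG, then fall through to ADD
            elif w > 0:
                acc += a       # ADD arm
            # zero arm: skip, no work done
        out.append(acc)
    return out
```

For example, an identity matrix returns the activations unchanged, a negative-identity matrix returns them negated, and a row such as [1, -1, 0] applied to two equal activations cancels to zero.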

That is the connection between this VM and present-day work. The Setun history is interesting, SQL's three-valued logic is useful in databases, and the BitNet line is the recent reason the encoding matters again: replace transformer multiplications with conditional adds and recover competitive results at lower cost. zvm3 won't outrun a real BitNet kernel, but it does make legible what such a kernel is doing: BR3 over add/skip/subtract, looped over a tensor.

Where ternary's structure pays off

Beyond the demos: is there actually mathematics that's simpler in ternary than in binary? Not new mathematics, but a list of domains where ternary's structure matches the problem:

None of these requires a new mathematical universe. They're domains where the operational primitives this VM exposes (sign-of-difference dispatch, free negation, ternary-weight inner products) line up with what the problem actually needs. The list is long enough that ternary stops being a single hook and starts looking like a family of related techniques, several of them being adopted in practice.

What runs end-to-end

Twelve small programs exercise the toolchain. Nine are hand-written in assembly, three are written in the high-level language:

The calculator is the most concrete demonstration that "no two's complement" isn't a decorative claim. To subtract, negate the second operand and add:

; If op is '-', negate the second operand. Then always add.
ADD R6, R2, #-45                ; R6 = op - '-'
BR3 R6, NOT_MINUS, IS_MINUS, NOT_MINUS
IS_MINUS NEG R3, R3
NOT_MINUS

ADD R5, R1, R3                  ; R5 = result

The LC-3 (or any two's-complement ISA) equivalent of NEG R3, R3 is NOT R3, R3 ; ADD R3, R3, #1: two instructions for what zvm3 does in one. Trivial savings on any single subtraction, mildly annoying repetition once you've written it ten times.
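
The difference between the two negations can be sketched in Python. In balanced ternary, negating a number is just flipping the sign of every trit; in two's complement it is the NOT-then-add-one sequence quoted above. The helper names and the 16-bit width (LC-3's word size) are choices made for this illustration.

```python
def to_trits(value, width=18):
    """Balanced trits, least significant first, each in {-1, 0, +1}."""
    trits = []
    for _ in range(width):
        r = value % 3
        if r == 2:
            r = -1
        trits.append(r)
        value = (value - r) // 3
    return trits


def from_trits(trits):
    value = 0
    for t in reversed(trits):
        value = value * 3 + t
    return value


def neg_balanced(trits):
    """Balanced-ternary negate: flip each trit, no carry involved."""
    return [-t for t in trits]


def neg_twos_complement(pattern, bits=16):
    """Two's-complement negate on a raw bit pattern: NOT, then ADD #1."""
    mask = (1 << bits) - 1
    inverted = ~pattern & mask
    return (inverted + 1) & mask


def as_signed(pattern, bits=16):
    """Interpret a raw bit pattern as a signed two's-complement value."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern
```

One side effect of the symmetric range: every balanced value has a negation, whereas the 16-bit pattern 0x8000 (−32768) negates to itself.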

The same single-instruction win shows up in sort3.s3's comparator. NEG R6, R4 ; ADD R6, R3, R6 is the entire "compute A − B" sequence, which immediately feeds into a BR3 dispatching on the sign of the difference. Two-instruction subtraction plus one-instruction three-way branch is the closest the project gets to "this ISA was actually designed for the workload."

What it doesn't have

Two real gaps. First, no port of a real-world ROM. Setun programs from the 1960s exist as schematics and instruction listings, not as files anyone hosts; running something at the scale of the LC-3 2048 demo from the previous post would mean writing it from scratch.

Second, this isn't fast in any absolute sense. The dispatch loop converts each instruction word to a trit array on every fetch, extracts fields by trit index, then does plain i32 arithmetic. Roughly an order of magnitude slower than the LC-3 VM at the same task. Tolerable for a teaching tool: still hundreds of thousands of instructions per second, enough to drive an interactive demo. No one's going to run a workload on it.

The educational payoff is the BR3 instruction itself, and the moment in the assembler where writing NEG R1, R2 just works without any thought about two's complement. Both are concrete enough to remember after the implementation is done.

The wider framing (radix economy, the Setun history, why BitNet b1.58 in 2024 made ternary weights matter again for AI) is well-trodden elsewhere. This post is the specific report on what one Zig-hosted balanced-ternary VM ended up looking like.