Conway on the GPU

Same field as the Canvas2D version, different engine. This page runs Conway's Game of Life entirely on the GPU: two RGBA16F textures hold the state, four fragment shaders take turns reading and writing them, and the CPU spends its time on user input and not much else. Move the mouse to paint, hold ctrl (or cmd) and drag to tilt the field in 3D.

The Canvas2D version was already fast enough on a desktop, but it scales badly. Each visible cell costs a drawImage call; a 1080p field at cellSize=14 has about 10,000 cells, of which 1,000-3,000 are visible in steady state, so the CPU is dispatching that many state-changes plus blits per frame. Doubling the resolution doubles the cost linearly. On a GPU the whole grid is one full-screen quad. The cost is a fixed handful of drawArrays calls per frame, near-independent of cell count.

State as a texture

The interesting thing about Conway on a GPU is that the state is already a 2D array. Pack it into a texture and the cellular update becomes a fragment shader: for each output texel, read the eight neighbours of the corresponding texel in the input texture, apply the B3/S23 rule, write the result. Wrap with fract() for the toroidal topology.

Four channels of RGBA16F give us enough room to carry everything in one texture:

R: alive (0 or 1). The Conway grid itself.
G: display fade (0..1). Smoothly ramps toward R; this is what actually gets rendered.
B: noise flash (0..1). Decays toward 0 independently; the subliminal shimmer channel.
A: spare.

Two texture instances (call them A and B) plus two framebuffers means we can ping-pong: each pass reads from A, renders into B, then the two swap. No CPU readback. The state never leaves the GPU.

Four passes per frame

A frame runs these passes, in order, on the same single full-screen quad:

Step, only on tick boundaries. The fragment shader reads its texel's R and the eight neighbours' R, applies B3/S23, writes the new R, carries G and B through unchanged.
Fade, every frame. Reads R and G, ramps G toward R at fadeIn or fadeOut rate (modulated by speed), and decays B toward 0.
Paint, only when there are pending events. The shader takes a paint position and a radius in cell coords and writes new values inside the disk. Used both for mouse strokes (alive=1, boost G) and for the subliminal noise (alive=0, bump B). The CPU generates a handful of random positions per frame for the noise channel.
Render, every frame. Samples the state texture at grid resolution, applies a signed-distance-field mask to give each cell its rounded shape, composites peakAlpha * G for live cells plus noiseAlpha * B for the shimmer. The mask is computed at screen resolution with fwidth, so the cell edges anti-alias correctly at any DPR.

The vertex shader is the same for all four passes: a single triangle strip drawn from {-1,-1, 1,-1, -1,1, 1,1}, output position is the input, UVs are the input scaled and biased to 0..1. Every pass is just "for each pixel, compute this".

The step shader

The Conway rule fits in about a dozen lines of GLSL. The trick is that the eight neighbour samples wrap automatically: textures are bound with REPEAT, so an out-of-range UV just wraps. No modulo, no conditional on the edges.

float aliveAt(vec2 uv) {
    return texture(uState, fract(uv)).r;
}

void main() {
    float a = texture(uState, vUV).r;
    float n = aliveAt(vUV + vec2(-uTexel.x, -uTexel.y))
            + aliveAt(vUV + vec2( 0.0,       -uTexel.y))
            + aliveAt(vUV + vec2( uTexel.x, -uTexel.y))
            + aliveAt(vUV + vec2(-uTexel.x,  0.0))
            + aliveAt(vUV + vec2( uTexel.x,  0.0))
            + aliveAt(vUV + vec2(-uTexel.x,  uTexel.y))
            + aliveAt(vUV + vec2( 0.0,        uTexel.y))
            + aliveAt(vUV + vec2( uTexel.x,  uTexel.y));
    bool alive = a > 0.5;
    bool willLive = abs(n - 3.0) < 0.5
                 || (alive && abs(n - 2.0) < 0.5);
    outColor = vec4(willLive ? 1.0 : 0.0, /* G, B, A passed through */);
}

The Canvas2D step ran in JavaScript with a scratch Uint8Array and a manual modulo-aware neighbour count, iterating about 10,000 cells. The GPU step touches every cell in parallel and finishes in microseconds.

The render shader

The bit that surprised me the most was how much nicer the cell rendering becomes when it's done in a shader. The Canvas2D version baked a per-cell sprite at the start, then blitted it per visible cell. Here the cell shape is a signed-distance-field computed at the output pixel resolution; the rounded corners are mathematically exact and anti-alias to whatever DPR the screen happens to use.

vec2 cellCoord = vUV * uGridSize;
vec2 cellIdx = floor(cellCoord);
vec2 inCell = cellCoord - cellIdx - 0.5;     // -0.5..0.5

// Rounded-rect SDF, halfExtent shrunk by gap and corner radius
// (`half` is reserved in GLSL ES 3.00).
float halfExtent = 0.5 - uCellGap * 0.5 - uCellRadius;
vec2 q = abs(inCell) - vec2(halfExtent);
float dist = length(max(q, 0.0))
           + min(max(q.x, q.y), 0.0)
           - uCellRadius;

float aa = fwidth(dist) * 0.7 + 0.001;
float mask = 1.0 - smoothstep(-aa, aa, dist);

Zero state changes per cell. Zero blits. The fragment shader runs once per output pixel, computes which cell it belongs to, samples that cell's state, applies the mask, blends.

Where this leaves the Canvas2D version

This version is feature-equivalent to the Canvas2D one: multi-channel birth flashes (separate ceilings for noise / pattern, mouse paint, and natural Conway births with the heartbeat-fix gating), locality-biased seed promotion done entirely in the paint shader, drop-pattern button, ctrl-drag 3D tilt, soft edge fade. State spans two RGBA16F textures: the original four channels for alive / display / noise / birth-pop, plus a second texture's R and G channels for paint-pop and the Conway-birth-pop with its negative-value-encodes-pending trick.

Where the Canvas2D version's draw loop chooses different sprites per channel and stacks drawImage overlays, the WebGL render shader collapses the same composition into one fragment: it samples both state textures at grid resolution and adds the channels' weighted contributions into a single output alpha. No per-cell branching, no state changes per cell.

The reason to do this at all isn't current performance, it's headroom. The Canvas2D version starts to feel its limits at cellSize=8 on a 4K monitor; the WebGL version is comfortable at cellSize=4 on the same screen. If you want a dense field with real glow and post-processing effects, the GPU is the only sensible place to do it.

Live controls

The same sliders as the Canvas2D post, driving the GPU simulation behind this page. Hold ctrl (or cmd) and drag anywhere to tilt the field in 3D. Hit "drop pattern" for a random glider, lightweight spaceship, acorn, or Gosper gun.

alpha 0.06

tickMs 2500

speed 1.00x

flashRate 0.060

flashAlpha 0.30

noiseAlpha 0.40

birthAlpha 0.40

paintAlpha 0.20

normalBirthAlpha 0.10

seedBoost 0.005

toneVariation 0.40

tintOpacity 75%

tintBlur 0.0px

renderScale 0.50x

The full simulation, including all four shader sources, fits in about 350 lines of code. The shaders themselves are short enough to read all of them in a sitting; most of the file is the WebGL boilerplate to compile, link, and bind them.

This is the loud, interactive end of the family. The quiet end is the dot field behind every page on the site, tunable on the background settings page, which grew out of the Canvas2D ambient version.