Build a Bytecode VM

A bytecode VM is a small machine inside your program.

It does not run source code directly. It runs simple instructions called bytecode.

For example, instead of running this text:

1 + 2

a VM might run these instructions:

push 1
push 2
add
print

The VM reads one instruction at a time and changes its internal state.

The Goal

We will build a tiny stack-based VM.

It will support:

push integer
add
subtract
multiply
divide
print
halt

The program:

push 1
push 2
add
print
halt

will print:

3

Instructions

Start with an enum:

const OpCode = enum(u8) {
    push,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};

Each opcode names one kind of instruction. Because the enum is backed by u8, every opcode fits in a single byte.

Some instructions need extra data. push needs a number.

So we define an instruction struct:

const Instruction = struct {
    op: OpCode,
    value: i64 = 0,
};

For push, value matters.

For every other opcode (add, sub, mul, div, print, and halt), value is ignored and keeps its default of 0.
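To see the default value in action, here is a small sketch. It assumes Zig 0.11 or later (for @intFromEnum), and the variable names are only for illustration.

```zig
const std = @import("std");

const OpCode = enum(u8) { push, add, sub, mul, div, print, halt };

const Instruction = struct {
    op: OpCode,
    value: i64 = 0, // default used when an instruction carries no operand
};

pub fn main() void {
    // push names its operand explicitly.
    const with_operand = Instruction{ .op = .push, .value = 42 };
    // add leaves value out, so it falls back to the default.
    const without_operand = Instruction{ .op = .add };

    // enum(u8) gives every opcode a one-byte integer tag, numbered from 0.
    std.debug.print("push tag = {d}, opcode size = {d} byte\n", .{
        @intFromEnum(OpCode.push),
        @sizeOf(OpCode),
    });
    std.debug.print("{s} {d}\n", .{ @tagName(with_operand.op), with_operand.value });
    std.debug.print("{s} {d}\n", .{ @tagName(without_operand.op), without_operand.value });
}
```

The default keeps programs short to write: only push has to mention value.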

The VM State

A stack VM needs a stack.

const VM = struct {
    stack: [256]i64,
    stack_top: usize,
    instructions: []const Instruction,
    ip: usize,
};

The fields mean:

stack         temporary values
stack_top     next free stack slot
instructions  program bytecode
ip            instruction pointer

The instruction pointer tells the VM which instruction to run next.

Initialize the VM

fn init(instructions: []const Instruction) VM {
    return .{
        .stack = undefined,
        .stack_top = 0,
        .instructions = instructions,
        .ip = 0,
    };
}

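At the beginning, the stack is empty and ip points to instruction 0. The stack array can stay undefined because stack_top is 0: push always writes a slot before pop can read it.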

Stack Operations

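The VM needs push and pop. These functions, like init above, belong inside the VM struct (the complete program shows the layout), which is what lets you call them as vm.push(...) and vm.pop().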

fn push(self: *VM, value: i64) !void {
    if (self.stack_top >= self.stack.len) {
        return error.StackOverflow;
    }

    self.stack[self.stack_top] = value;
    self.stack_top += 1;
}

This stores the value and moves stack_top forward.

Now pop:

fn pop(self: *VM) !i64 {
    if (self.stack_top == 0) {
        return error.StackUnderflow;
    }

    self.stack_top -= 1;
    return self.stack[self.stack_top];
}

The last pushed value is the first value returned.

That is why this is called a stack.
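A quick way to convince yourself of the last-in, first-out order is a standalone sketch. MiniStack is a hypothetical, cut-down copy of the VM's stack fields with only four slots. It is not part of the VM we are building.

```zig
const std = @import("std");

// Hypothetical stand-in for the VM's stack and stack_top fields.
const MiniStack = struct {
    items: [4]i64 = undefined,
    top: usize = 0, // index of the next free slot

    fn push(self: *MiniStack, value: i64) !void {
        if (self.top >= self.items.len) return error.StackOverflow;
        self.items[self.top] = value;
        self.top += 1;
    }

    fn pop(self: *MiniStack) !i64 {
        if (self.top == 0) return error.StackUnderflow;
        self.top -= 1;
        return self.items[self.top];
    }
};

pub fn main() !void {
    var s = MiniStack{};
    try s.push(1);
    try s.push(2);
    try s.push(3);

    // Values come back in reverse order of insertion: 3, then 2, then 1.
    while (s.top > 0) {
        std.debug.print("{d}\n", .{try s.pop()});
    }

    // Popping an empty stack is reported as an error value.
    if (s.pop()) |_| {
        unreachable;
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```

Both failure cases are ordinary Zig errors, so the caller decides what to do with them.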

Running Instructions

The VM runs a loop:

fetch instruction
execute instruction
repeat

Add this method:

fn run(self: *VM) !void {
    while (self.ip < self.instructions.len) {
        const instruction = self.instructions[self.ip];
        self.ip += 1;

        switch (instruction.op) {
            .push => try self.push(instruction.value),

            .add => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a + b);
            },

            .sub => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a - b);
            },

            .mul => {
                const b = try self.pop();
                const a = try self.pop();
                try self.push(a * b);
            },

            .div => {
                const b = try self.pop();
                const a = try self.pop();

                if (b == 0) {
                    return error.DivisionByZero;
                }

                try self.push(@divTrunc(a, b));
            },

            .print => {
                const value = try self.pop();
                std.debug.print("{d}\n", .{value});
            },

            .halt => return,
        }
    }
}

Notice the order in subtraction and division:

const b = try self.pop();
const a = try self.pop();

The right operand is popped first.

For:

push 10
push 3
sub

The stack holds 10 at the bottom and 3 on top, so b is 3 and a is 10.

sub computes:

10 - 3

not:

3 - 10
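One case the loop does not turn into an error is integer overflow. In Zig, a + b on i64 values is checked in Debug and ReleaseSafe builds, so a program that overflows panics instead of returning an error. If you would rather report overflow the same way as DivisionByZero, one option is the checked helpers in std.math. The sketch below is an assumption about how that could look. BinaryOp and checkedBinary are hypothetical names, not part of the VM above.

```zig
const std = @import("std");

// Hypothetical helper type, used only in this sketch.
const BinaryOp = enum { add, sub, mul, div };

// Apply one arithmetic operation and return an error instead of
// overflowing or dividing by zero.
fn checkedBinary(op: BinaryOp, a: i64, b: i64) !i64 {
    return switch (op) {
        .add => try std.math.add(i64, a, b),
        .sub => try std.math.sub(i64, a, b),
        .mul => try std.math.mul(i64, a, b),
        // divTrunc rejects a zero divisor and minInt / -1.
        .div => try std.math.divTrunc(i64, a, b),
    };
}

pub fn main() void {
    const max = std.math.maxInt(i64);

    if (checkedBinary(.add, max, 1)) |v| {
        std.debug.print("{d}\n", .{v});
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```

With a helper like this, the add branch in run could become try self.push(try checkedBinary(.add, a, b)), if you decide overflow should be a recoverable error.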

Complete Program

Put this in src/main.zig (the zig build commands below assume the project has a build.zig):

const std = @import("std");

const OpCode = enum(u8) {
    push,
    add,
    sub,
    mul,
    div,
    print,
    halt,
};

const Instruction = struct {
    op: OpCode,
    value: i64 = 0,
};

const VM = struct {
    stack: [256]i64,
    stack_top: usize,
    instructions: []const Instruction,
    ip: usize,

    fn init(instructions: []const Instruction) VM {
        return .{
            .stack = undefined,
            .stack_top = 0,
            .instructions = instructions,
            .ip = 0,
        };
    }

    fn push(self: *VM, value: i64) !void {
        if (self.stack_top >= self.stack.len) {
            return error.StackOverflow;
        }

        self.stack[self.stack_top] = value;
        self.stack_top += 1;
    }

    fn pop(self: *VM) !i64 {
        if (self.stack_top == 0) {
            return error.StackUnderflow;
        }

        self.stack_top -= 1;
        return self.stack[self.stack_top];
    }

    fn run(self: *VM) !void {
        while (self.ip < self.instructions.len) {
            const instruction = self.instructions[self.ip];
            self.ip += 1;

            switch (instruction.op) {
                .push => try self.push(instruction.value),

                .add => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a + b);
                },

                .sub => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a - b);
                },

                .mul => {
                    const b = try self.pop();
                    const a = try self.pop();
                    try self.push(a * b);
                },

                .div => {
                    const b = try self.pop();
                    const a = try self.pop();

                    if (b == 0) {
                        return error.DivisionByZero;
                    }

                    try self.push(@divTrunc(a, b));
                },

                .print => {
                    const value = try self.pop();
                    std.debug.print("{d}\n", .{value});
                },

                .halt => return,
            }
        }
    }
};

pub fn main() !void {
    const program = [_]Instruction{
        .{ .op = .push, .value = 1 },
        .{ .op = .push, .value = 2 },
        .{ .op = .add },
        .{ .op = .print },
        .{ .op = .halt },
    };

    var vm = VM.init(&program);
    try vm.run();
}

Run:

zig build run

Output:

3

A More Interesting Program

Try this:

const program = [_]Instruction{
    .{ .op = .push, .value = 10 },
    .{ .op = .push, .value = 3 },
    .{ .op = .sub },
    .{ .op = .push, .value = 4 },
    .{ .op = .mul },
    .{ .op = .print },
    .{ .op = .halt },
};

This means:

(10 - 3) * 4

Output:

28

The VM evaluates the expression using the stack.

What the Stack Looks Like

For this program:

push 1
push 2
add
print

The stack changes like this:

start:  []

push 1: [1]

push 2: [1, 2]

add:    [3]

print:  []

The add instruction removes two values and pushes one result.

That pattern appears often in stack VMs.
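The trace above can be reproduced with a few lines of code. This is a sketch, not the tutorial's VM: Op, Step, apply, and printStack are hypothetical names, only three opcodes are covered, and bounds checks are left out because the program is fixed. It assumes Zig 0.11 or later for the indexed for loop.

```zig
const std = @import("std");

const Op = enum { push, add, print };
const Step = struct { op: Op, value: i64 = 0 };

// Apply one instruction to the stack. No bounds checks: sketch only.
fn apply(stack: []i64, top: *usize, step: Step) void {
    switch (step.op) {
        .push => {
            stack[top.*] = step.value;
            top.* += 1;
        },
        .add => {
            // Two values are consumed and one result takes their place.
            stack[top.* - 2] += stack[top.* - 1];
            top.* -= 1;
        },
        .print => top.* -= 1, // the real VM also prints the popped value
    }
}

// Print the live part of the stack in the form [1, 2].
fn printStack(label: []const u8, live: []const i64) void {
    std.debug.print("{s}: [", .{label});
    for (live, 0..) |v, i| {
        if (i != 0) std.debug.print(", ", .{});
        std.debug.print("{d}", .{v});
    }
    std.debug.print("]\n", .{});
}

pub fn main() void {
    const program = [_]Step{
        .{ .op = .push, .value = 1 },
        .{ .op = .push, .value = 2 },
        .{ .op = .add },
        .{ .op = .print },
    };

    var stack: [8]i64 = undefined;
    var top: usize = 0;

    printStack("start", stack[0..top]);
    for (program) |step| {
        apply(&stack, &top, step);
        printStack(@tagName(step.op), stack[0..top]);
    }
}
```

Printing the stack after every instruction is also a handy debugging aid once programs get longer.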

Add Tests

Add these tests:

test "push and pop" {
    const program = [_]Instruction{};
    var vm = VM.init(&program);

    try vm.push(42);
    const value = try vm.pop();

    try std.testing.expectEqual(@as(i64, 42), value);
}

test "addition program leaves result on stack" {
    const program = [_]Instruction{
        .{ .op = .push, .value = 1 },
        .{ .op = .push, .value = 2 },
        .{ .op = .add },
        .{ .op = .halt },
    };

    var vm = VM.init(&program);
    try vm.run();

    const result = try vm.pop();
    try std.testing.expectEqual(@as(i64, 3), result);
}

test "division by zero fails" {
    const program = [_]Instruction{
        .{ .op = .push, .value = 1 },
        .{ .op = .push, .value = 0 },
        .{ .op = .div },
        .{ .op = .halt },
    };

    var vm = VM.init(&program);

    try std.testing.expectError(error.DivisionByZero, vm.run());
}

Run:

zig build test

Why This Is Called Bytecode

Our Instruction struct is easy to read, but it is not compact.

Real bytecode often stores instructions in a byte array:

opcode byte
optional operand bytes
opcode byte
optional operand bytes

Example:

01 00 00 00 2a

This might mean:

push 42

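The opcode is one byte, which is where the name bytecode comes from. In this example encoding, 01 stands for push and the operand follows as a four-byte big-endian integer: 00 00 00 2a is 42, because 0x2a is 42 in decimal.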

Our version uses a Zig struct so beginners can see the idea clearly before dealing with binary encoding.
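For a taste of what decoding a byte array involves, here is a minimal sketch. The encoding is an assumption made up for this example: opcode 0x01 is push followed by a four-byte big-endian unsigned operand, and opcode 0x02 is add with no operand. The names op_push, op_add, readU32BigEndian, and disassemble are hypothetical.

```zig
const std = @import("std");

// Assumed encoding, for illustration only.
const op_push: u8 = 0x01; // followed by 4 operand bytes, big-endian
const op_add: u8 = 0x02; // no operand

// Combine four bytes, most significant first, into one unsigned integer.
fn readU32BigEndian(bytes: []const u8) u32 {
    return (@as(u32, bytes[0]) << 24) |
        (@as(u32, bytes[1]) << 16) |
        (@as(u32, bytes[2]) << 8) |
        @as(u32, bytes[3]);
}

// Walk the byte array and print one line per decoded instruction.
fn disassemble(code: []const u8) !void {
    var ip: usize = 0;
    while (ip < code.len) {
        const opcode = code[ip];
        ip += 1;

        switch (opcode) {
            op_push => {
                // The operand must be fully present before it is read.
                if (code.len - ip < 4) return error.TruncatedOperand;
                const operand = readU32BigEndian(code[ip .. ip + 4]);
                std.debug.print("push {d}\n", .{operand});
                ip += 4; // skip past the operand bytes
            },
            op_add => std.debug.print("add\n", .{}),
            else => return error.UnknownOpcode,
        }
    }
}

pub fn main() !void {
    // push 42, push 1, add
    const code = [_]u8{
        0x01, 0x00, 0x00, 0x00, 0x2a,
        0x01, 0x00, 0x00, 0x00, 0x01,
        0x02,
    };
    try disassemble(&code);
}
```

Notice that ip now advances by a different amount for each opcode. That bookkeeping is the main cost of a compact encoding, and it is what the struct version lets us skip.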

Why Stack VMs Are Popular

A stack VM is simple.

Instructions do not need to name registers.

For example, add just means:

pop two values
add them
push the result

A register VM might say:

r3 = r1 + r2

Register VMs can be faster in some cases, typically because they need fewer instructions to do the same work, but stack VMs are easier to implement first.

Many language implementations begin with a stack VM because the architecture is small and teachable. Well-known stack-based designs include the JVM, CPython's bytecode interpreter, and WebAssembly.

What You Learned

You built a tiny bytecode virtual machine.

You defined opcodes.

You represented instructions.

You stored VM state.

You implemented a stack.

You wrote the fetch-execute loop.

You handled runtime errors like stack underflow and division by zero.

This is the core of many interpreters. A real language VM adds variables, functions, jumps, objects, strings, closures, garbage collection, and debugging support. The center is still the same: read an instruction, execute it, move to the next one.