A benchmark measures the cost of a program or operation. The result is useful only if the measurement is repeatable and the work being measured is clearly defined.
Small changes in allocation, copying, parsing, or cache behavior can change performance significantly. Zig keeps these operations visible, which makes benchmarking straightforward.
Suppose we want to measure two ways of summing integers in an array.
The first version uses indexing.
```zig
fn sumIndex(values: []const u64) u64 {
    var total: u64 = 0;
    var i: usize = 0;
    while (i < values.len) : (i += 1) {
        total += values[i];
    }
    return total;
}
```

The second version uses a for loop.
```zig
fn sumFor(values: []const u64) u64 {
    var total: u64 = 0;
    for (values) |value| {
        total += value;
    }
    return total;
}
```

A simple benchmark measures elapsed time around repeated execution.
```zig
const std = @import("std");

fn sumIndex(values: []const u64) u64 {
    var total: u64 = 0;
    var i: usize = 0;
    while (i < values.len) : (i += 1) {
        total += values[i];
    }
    return total;
}

fn sumFor(values: []const u64) u64 {
    var total: u64 = 0;
    for (values) |value| {
        total += value;
    }
    return total;
}

pub fn main() !void {
    var values: [100000]u64 = undefined;
    for (&values, 0..) |*value, i| {
        value.* = i;
    }

    const stdout = std.io.getStdOut().writer();
    var timer = try std.time.Timer.start();

    var result: u64 = 0;
    var n: usize = 0;
    while (n < 1000) : (n += 1) {
        result ^= sumIndex(&values);
    }
    const index_time = timer.read();

    timer.reset();
    n = 0;
    while (n < 1000) : (n += 1) {
        result ^= sumFor(&values);
    }
    const for_time = timer.read();

    try stdout.print("ignore: {d}\n", .{result});
    try stdout.print("index loop: {d} ns\n", .{index_time});
    try stdout.print("for loop: {d} ns\n", .{for_time});
}
```

The benchmark fills an array with predictable values.
```zig
for (&values, 0..) |*value, i| {
    value.* = i;
}
```

The capture syntax `|*value, i|` binds `value` as a pointer to each array element, so the element can be assigned through it, and binds `i` as the index.
The timer starts with:

```zig
var timer = try std.time.Timer.start();
```

The elapsed time is read in nanoseconds.

```zig
const index_time = timer.read();
```

The benchmark repeats each operation many times.
```zig
while (n < 1000) : (n += 1) {
    result ^= sumIndex(&values);
}
```

This is important: very small operations complete too quickly to be measured reliably in a single run.
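A common refinement, shown here as a sketch rather than part of the program above, is to repeat the whole timed loop several times and keep the fastest run, which filters out scheduler and cache noise. The fragment assumes `timer`, `result`, `values`, and `sumIndex` from the benchmark above:

```zig
// Sketch: take the minimum over several runs of the timed loop.
// Assumes timer, result, values, and sumIndex from the program above.
var best: u64 = std.math.maxInt(u64);
var run: usize = 0;
while (run < 5) : (run += 1) {
    timer.reset();
    var iter: usize = 0;
    while (iter < 1000) : (iter += 1) {
        result ^= sumIndex(&values);
    }
    best = @min(best, timer.read());
}
```

The minimum is a reasonable summary for a tight compute loop because external interference can only add time, never remove it.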
The variable `result` prevents the compiler from removing the computation entirely.

```zig
result ^= sumIndex(&values);
```

Without observable use of the result, an optimizing compiler may conclude that the loop has no effect and remove it.
Benchmarks should measure one thing at a time. If allocation cost matters, measure allocation separately from computation.
This benchmark keeps file I/O, printing, and allocation out of the timed region; only the summation loop is measured.
Build benchmarks in release mode.

```sh
zig build-exe main.zig -O ReleaseFast
```

Debug mode includes safety checks and disables many optimizations. It is useful for development, not for performance measurement.
Different release modes change performance characteristics.
| Mode | Purpose |
|---|---|
| Debug | safety and debugging |
| ReleaseSafe | optimized with safety checks |
| ReleaseFast | maximum optimization |
| ReleaseSmall | smaller binaries |
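To compare the modes directly, the same program can be built once per mode. This is a sketch; the file name `main.zig` and output names are assumptions:

```sh
# Build the same benchmark in each optimization mode
# and run each binary to compare timings.
zig build-exe main.zig -O Debug -femit-bin=bench-debug
zig build-exe main.zig -O ReleaseSafe -femit-bin=bench-safe
zig build-exe main.zig -O ReleaseFast -femit-bin=bench-fast
zig build-exe main.zig -O ReleaseSmall -femit-bin=bench-small
./bench-debug && ./bench-safe && ./bench-fast && ./bench-small
```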
A benchmark should record:
| Question | Example |
|---|---|
| what is measured | parsing integers |
| input size | 1 million numbers |
| build mode | ReleaseFast |
| target CPU | x86_64 |
| allocator | arena allocator |
| iteration count | 1000 |
Without this context, numbers are hard to compare.
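One way to keep that context attached to the numbers is to print it next to the measurement. This fragment is a sketch, not part of the earlier program; it assumes the `stdout`, `values`, and `index_time` variables from the benchmark above, and uses the compiler-provided `builtin.mode` to report the build mode:

```zig
const builtin = @import("builtin");

// Inside main(), after the timed loops: report what was measured
// together with the measurement itself.
try stdout.print(
    "sum u64 | len={d} iters={d} mode={s} | {d} ns\n",
    .{ values.len, 1000, @tagName(builtin.mode), index_time },
);
```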
Memory allocation is often worth measuring directly.
```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var timer = try std.time.Timer.start();
    var n: usize = 0;
    while (n < 100000) : (n += 1) {
        const memory = try allocator.alloc(u8, 1024);
        allocator.free(memory);
    }
    const elapsed = timer.read();

    try std.io.getStdOut().writer().print(
        "{d} ns\n",
        .{elapsed},
    );
}
```

This measures allocation and deallocation together.
Benchmarking allocation is useful because allocator choice affects program structure. Arena allocation, fixed-buffer allocation, and general-purpose allocation have different trade-offs.
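For comparison, a minimal arena-based variant might look like the following sketch. It creates and destroys an arena on each iteration, so the number includes arena setup and teardown as well as the allocation itself; that is a choice of this sketch, not a requirement of arena allocation:

```zig
const std = @import("std");

pub fn main() !void {
    var timer = try std.time.Timer.start();
    var n: usize = 0;
    while (n < 100000) : (n += 1) {
        // Each iteration gets a fresh arena; deinit frees all of its
        // allocations at once instead of freeing them individually.
        var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
        defer arena.deinit();
        const memory = try arena.allocator().alloc(u8, 1024);
        _ = memory;
    }
    const elapsed = timer.read();
    try std.io.getStdOut().writer().print("{d} ns\n", .{elapsed});
}
```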
Microbenchmarks are only one level of measurement. A real program may spend most of its time in:
- filesystem access
- network latency
- memory copying
- parsing
- cache misses
- synchronization
- allocator contention
Measure the actual bottleneck before rewriting code.
A useful process is:
- Write the correct program.
- Measure it.
- Find the slowest operation.
- Change one thing.
- Measure again.
Without measurement, optimization becomes guesswork.
Exercise 20-31. Benchmark array summation with different array sizes.
Exercise 20-32. Compare while and for loops.
Exercise 20-33. Benchmark allocation using an arena allocator.
Exercise 20-34. Measure the cost of parsing integers from strings.
Exercise 20-35. Compare Debug and ReleaseFast builds.