A binary file format stores data as bytes with a specific structure.
Text files store data as readable characters:
    name: Alice
    age: 30
Binary files store data in compact byte layouts:
    41 4C 49 43 45 1E
Those bytes may represent a name, an integer, a timestamp, an image, a database page, an executable header, or anything else. The bytes only make sense if you know the format.
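As a small illustration, the six bytes above can be decoded under one assumed format: five ASCII name bytes followed by one unsigned age byte. That format, the Person type, and the decode helper are hypothetical and exist only for this sketch.

```zig
const std = @import("std");

// Hypothetical format for illustration: 5 ASCII name bytes, then 1 age byte.
const Person = struct {
    name: []const u8,
    age: u8,
};

// Interprets exactly six bytes according to the assumed format.
fn decode(bytes: *const [6]u8) Person {
    return .{ .name = bytes[0..5], .age = bytes[5] };
}

pub fn main() void {
    const raw = [_]u8{ 0x41, 0x4C, 0x49, 0x43, 0x45, 0x1E };
    const person = decode(&raw);
    // prints "name=ALICE age=30"
    std.debug.print("name={s} age={d}\n", .{ person.name, person.age });
}
```

Under a different format, the same six bytes would mean something else entirely.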
Text vs Binary
A text file is designed to be read by humans.
A binary file is designed to be read by programs.
That does not mean binary files are mysterious. They are just more strict. Instead of reading lines and words, you read exact byte positions.
For example, a simple binary format might say:
    bytes 0..4     magic number
    bytes 4..8     version
    bytes 8..16    record count
    bytes 16..     records
Your program must follow that layout exactly.
Magic Numbers
Many binary formats start with a magic number.
A magic number is a short byte sequence that identifies the file type.
For example, a custom format might begin with:
    ZDB1
In Zig:
    const magic = "ZDB1";
When reading the file, check the first bytes:
    if (!std.mem.eql(u8, bytes[0..4], "ZDB1")) {
        return error.BadMagic;
    }
This prevents your parser from treating the wrong file as valid data.
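The check above assumes the input holds at least four bytes. A minimal sketch that also guards the length might look like this; checkMagic and the Truncated error name are assumptions made for the example.

```zig
const std = @import("std");

const magic = "ZDB1";

// Hypothetical helper: fails with error.Truncated when the input is shorter
// than the magic, and with error.BadMagic when the first bytes do not match.
fn checkMagic(bytes: []const u8) error{ Truncated, BadMagic }!void {
    if (bytes.len < magic.len) return error.Truncated;
    if (!std.mem.eql(u8, bytes[0..magic.len], magic)) return error.BadMagic;
}

pub fn main() void {
    // Accepted: nothing is printed.
    checkMagic("ZDB1 payload") catch |err| std.debug.print("rejected: {s}\n", .{@errorName(err)});
    // Too short to contain the magic at all.
    checkMagic("ZD") catch |err| std.debug.print("rejected: {s}\n", .{@errorName(err)});
    // Long enough, but a different file type.
    checkMagic("NUMS....") catch |err| std.debug.print("rejected: {s}\n", .{@errorName(err)});
}
```

Checking the length first means a short file produces an error value instead of an out-of-bounds slice.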
A Tiny Binary Format
Let’s design a small file format for storing unsigned 32-bit numbers.
The file layout:
    bytes 0..4    magic: "NUMS"
    bytes 4..8    count: u32 little-endian
    bytes 8..     count numbers, each u32 little-endian
A file with three numbers:
    magic = "NUMS"
    count = 3
    numbers = 10, 20, 30
The byte layout is:
    4E 55 4D 53 03 00 00 00 0A 00 00 00 14 00 00 00 1E 00 00 00
Each number uses 4 bytes.
Writing the File
    const std = @import("std");

    pub fn main() !void {
        // The handle itself is never reassigned, so it can be const.
        const file = try std.fs.cwd().createFile("numbers.bin", .{});
        defer file.close();
        const numbers = [_]u32{ 10, 20, 30 };
        // Magic number first.
        try file.writeAll("NUMS");
        // Then the count, encoded as 4 little-endian bytes.
        var buffer: [4]u8 = undefined;
        std.mem.writeInt(u32, &buffer, numbers.len, .little);
        try file.writeAll(&buffer);
        // Then each number, reusing the same 4-byte buffer.
        for (numbers) |n| {
            std.mem.writeInt(u32, &buffer, n, .little);
            try file.writeAll(&buffer);
        }
    }
The key function is:
    std.mem.writeInt(u32, &buffer, n, .little);
It writes an integer into bytes using little-endian order.
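The same encoding can also be done in memory, which makes it easy to compare against the byte layout shown earlier. The sketch below assumes a caller-provided buffer; encodeNums and its error names are hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: encodes `numbers` in the NUMS layout into `out`
// and returns the part of `out` that was written.
fn encodeNums(out: []u8, numbers: []const u32) error{ NoSpace, TooMany }![]u8 {
    // The count field is a u32, so reject inputs that cannot be counted in it.
    const count = std.math.cast(u32, numbers.len) orelse return error.TooMany;
    const needed = 8 + numbers.len * 4;
    if (out.len < needed) return error.NoSpace;

    @memcpy(out[0..4], "NUMS");
    std.mem.writeInt(u32, out[4..8], count, .little);
    for (numbers, 0..) |n, i| {
        std.mem.writeInt(u32, out[8 + i * 4 ..][0..4], n, .little);
    }
    return out[0..needed];
}

pub fn main() !void {
    var buffer: [64]u8 = undefined;
    const encoded = try encodeNums(&buffer, &[_]u32{ 10, 20, 30 });
    // Prints the 20 encoded bytes as two-digit hex values.
    for (encoded) |b| std.debug.print("{X:0>2} ", .{b});
    std.debug.print("\n", .{});
}
```

Keeping the encoder separate from file I/O lets you test the format without touching the disk.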
Reading the File
    const std = @import("std");

    const ParseError = error{
        BadMagic,
        Truncated,
    };

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const allocator = gpa.allocator();
        const bytes = try std.fs.cwd().readFileAlloc(
            allocator,
            "numbers.bin",
            1024 * 1024,
        );
        defer allocator.free(bytes);
        const numbers = try parseNumbers(bytes);
        for (numbers) |n| {
            std.debug.print("{}\n", .{n});
        }
    }

    fn parseNumbers(bytes: []const u8) ParseError![]const u32 {
        if (bytes.len < 8) {
            return error.Truncated;
        }
        if (!std.mem.eql(u8, bytes[0..4], "NUMS")) {
            return error.BadMagic;
        }
        const count = std.mem.readInt(u32, bytes[4..8], .little);
        const needed = 8 + @as(usize, count) * 4;
        if (bytes.len < needed) {
            return error.Truncated;
        }
        // The input is validated, but there is no []const u32 to hand back:
        // the file holds bytes, not native integers. This placeholder return
        // only keeps the sketch compiling; the next version parses each number.
        return error.Truncated;
    }
This version shows validation, but the return type is not the right design. The file contains bytes, not a native []const u32 slice. You should not pretend those bytes are already a safe Zig u32 array.
A better parser reads each integer from the byte slice.
    const std = @import("std");

    const ParseError = error{
        BadMagic,
        Truncated,
    };

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const allocator = gpa.allocator();
        const bytes = try std.fs.cwd().readFileAlloc(
            allocator,
            "numbers.bin",
            1024 * 1024,
        );
        defer allocator.free(bytes);
        try printNumbers(bytes);
    }

    fn printNumbers(bytes: []const u8) ParseError!void {
        if (bytes.len < 8) {
            return error.Truncated;
        }
        if (!std.mem.eql(u8, bytes[0..4], "NUMS")) {
            return error.BadMagic;
        }
        const count = std.mem.readInt(u32, bytes[4..8], .little);
        const needed = 8 + @as(usize, count) * 4;
        if (bytes.len < needed) {
            return error.Truncated;
        }
        var offset: usize = 8;
        var i: u32 = 0;
        while (i < count) : (i += 1) {
            const n = std.mem.readInt(u32, bytes[offset..][0..4], .little);
            offset += 4;
            std.debug.print("{}\n", .{n});
        }
    }
This is safer. It treats the file as bytes and converts bytes into integers deliberately.
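If the caller needs the numbers as a slice rather than printed output, one option is to decode them into newly allocated memory. The following is a sketch under that assumption; this parseNumbers signature and the ownership rule are choices made for the example.

```zig
const std = @import("std");

const ParseError = error{ BadMagic, Truncated, OutOfMemory };

// Decodes a NUMS file into a newly allocated slice. The caller owns the
// result and must free it with the same allocator.
fn parseNumbers(allocator: std.mem.Allocator, bytes: []const u8) ParseError![]u32 {
    if (bytes.len < 8) return error.Truncated;
    if (!std.mem.eql(u8, bytes[0..4], "NUMS")) return error.BadMagic;

    const count = std.mem.readInt(u32, bytes[4..8], .little);
    // Compare against the bytes actually present before allocating anything.
    if ((bytes.len - 8) / 4 < count) return error.Truncated;

    const numbers = try allocator.alloc(u32, count);
    for (numbers, 0..) |*slot, i| {
        slot.* = std.mem.readInt(u32, bytes[8 + i * 4 ..][0..4], .little);
    }
    return numbers;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // The three-number example file, written out byte by byte.
    const file_bytes = [_]u8{ 'N', 'U', 'M', 'S', 3, 0, 0, 0, 10, 0, 0, 0, 20, 0, 0, 0, 30, 0, 0, 0 };
    const numbers = try parseNumbers(allocator, &file_bytes);
    defer allocator.free(numbers);
    for (numbers) |n| std.debug.print("{d}\n", .{n});
}
```

Because the count is checked against the input length before allocation, a corrupt count cannot request more memory than the file could justify.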
Endianness
Endianness means byte order.
The integer 0x12345678 can be stored in memory as:
    big-endian:    12 34 56 78
    little-endian: 78 56 34 12
Many modern machines are little-endian, but file formats should not depend on the current machine unless they are explicitly machine-local.
A good binary format states its byte order.
Example:
    All integers are little-endian.
Then every reader and writer must follow that rule.
In Zig, make the byte order explicit:
    std.mem.writeInt(u32, &buffer, value, .little);
    std.mem.readInt(u32, bytes[0..4], .little);
This makes the file format portable across machines.
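To see the two byte orders side by side, the sketch below encodes 0x12345678 both ways and then decodes the little-endian bytes with the wrong order. encodeU32 is a hypothetical helper for the example.

```zig
const std = @import("std");

// Hypothetical helper: encodes `value` as 4 bytes in the requested byte order.
fn encodeU32(value: u32, endian: std.builtin.Endian) [4]u8 {
    var out: [4]u8 = undefined;
    std.mem.writeInt(u32, &out, value, endian);
    return out;
}

pub fn main() void {
    const little = encodeU32(0x12345678, .little);
    const big = encodeU32(0x12345678, .big);

    std.debug.print("little:", .{});
    for (little) |b| std.debug.print(" {x:0>2}", .{b});
    std.debug.print("\nbig:   ", .{});
    for (big) |b| std.debug.print(" {x:0>2}", .{b});
    std.debug.print("\n", .{});

    // Decoding with the wrong byte order does not fail; it yields 0x78563412.
    const wrong = std.mem.readInt(u32, &little, .big);
    std.debug.print("wrong: 0x{x}\n", .{wrong});
}
```

Nothing in the bytes themselves reveals the mistake, which is why the format has to state its byte order.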
Alignment
A binary file is a sequence of bytes. It does not automatically obey the alignment rules of your CPU.
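This is dangerous:
    const value: *const u32 = @ptrCast(@alignCast(bytes.ptr));
Zig rejects the cast without @alignCast because it increases pointer alignment, and @alignCast only asserts the alignment rather than providing it. The pointer may not be aligned for u32. The file may use a different endianness. The layout may not match Zig’s in-memory layout.
Prefer this:
    const value = std.mem.readInt(u32, bytes[0..4], .little);
Parsing through bytes is clearer and safer.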
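Reading through bytes also works at offsets that are not multiples of four. A minimal sketch, with readU32At as a hypothetical helper:

```zig
const std = @import("std");

// Hypothetical helper: reads a little-endian u32 at any byte offset,
// aligned or not, after checking that four bytes are available.
fn readU32At(bytes: []const u8, offset: usize) error{Truncated}!u32 {
    if (offset > bytes.len or bytes.len - offset < 4) return error.Truncated;
    return std.mem.readInt(u32, bytes[offset..][0..4], .little);
}

pub fn main() !void {
    // One leading byte, then the value 10 as a little-endian u32 at offset 1.
    const bytes = [_]u8{ 0xFF, 0x0A, 0x00, 0x00, 0x00 };
    const value = try readU32At(&bytes, 1);
    // prints "10"
    std.debug.print("{d}\n", .{value});
}
```

No pointer to u32 is ever formed, so the alignment of the underlying buffer does not matter.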
Struct Layout Is Not a File Format
A common beginner mistake is to write a struct directly to disk and treat that as a file format.
    const Header = struct {
        version: u32,
        count: u32,
    };
Zig does not guarantee the in-memory layout of a plain struct: the compiler may reorder fields and insert padding. The layout depends on alignment rules, changes when fields change, and can differ between targets unless you control it carefully, for example with extern struct or packed struct.
For file formats, define bytes, not structs.
Better:
    bytes 0..4    version, u32 little-endian
    bytes 4..8    count, u32 little-endian
Then write parsing code that follows the byte layout.
You may use structs internally after parsing, but the file format itself should be described as bytes.
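One way to keep the struct internal while the file format stays defined in bytes is to give the struct explicit encode and decode functions. This is a sketch; encoded_len, encode, and decode are names chosen for the example.

```zig
const std = @import("std");

const Header = struct {
    version: u32,
    count: u32,

    // The on-disk size is defined by the format, not by @sizeOf(Header).
    const encoded_len = 8;

    // Writes each field at its documented byte position.
    fn encode(self: Header) [encoded_len]u8 {
        var out: [encoded_len]u8 = undefined;
        std.mem.writeInt(u32, out[0..4], self.version, .little);
        std.mem.writeInt(u32, out[4..8], self.count, .little);
        return out;
    }

    // Reads each field from its documented byte position.
    fn decode(bytes: *const [encoded_len]u8) Header {
        return .{
            .version = std.mem.readInt(u32, bytes[0..4], .little),
            .count = std.mem.readInt(u32, bytes[4..8], .little),
        };
    }
};

pub fn main() void {
    const header = Header{ .version = 1, .count = 3 };
    const encoded = header.encode();
    const decoded = Header.decode(&encoded);
    // prints "version=1 count=3"
    std.debug.print("version={d} count={d}\n", .{ decoded.version, decoded.count });
}
```

The struct can now gain or reorder fields without silently changing what is written to disk.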
Offsets
Binary parsing is mostly offset management.
You keep track of where you are in the byte slice.
    var offset: usize = 0;
    const magic = bytes[offset..][0..4];
    offset += 4;
    const version = std.mem.readInt(u32, bytes[offset..][0..4], .little);
    offset += 4;
This pattern appears everywhere in parsers.
For larger formats, it is useful to create a small reader.
    const ByteReader = struct {
        bytes: []const u8,
        offset: usize = 0,

        fn readBytes(self: *ByteReader, n: usize) ![]const u8 {
            // offset never exceeds bytes.len, so the subtraction cannot
            // underflow, and the comparison cannot overflow as offset + n could.
            if (n > self.bytes.len - self.offset) {
                return error.Truncated;
            }
            const out = self.bytes[self.offset..][0..n];
            self.offset += n;
            return out;
        }

        fn readU32(self: *ByteReader) !u32 {
            const b = try self.readBytes(4);
            // readInt takes a pointer to exactly 4 bytes, so narrow the slice.
            return std.mem.readInt(u32, b[0..4], .little);
        }
    };
Now the parser is cleaner:
    var reader = ByteReader{ .bytes = bytes };
    const magic = try reader.readBytes(4);
    const count = try reader.readU32();
Versioning
Binary formats should include a version field.
Example:
    bytes 0..4     magic: "NUMS"
    bytes 4..8     version: u32 little-endian
    bytes 8..12    count: u32 little-endian
    bytes 12..     numbers
Versioning lets your format evolve.
Version 1 might store only numbers.
Version 2 might add timestamps.
Version 3 might add compression.
Without a version field, future readers must guess which layout the file uses. Guessing is fragile.
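A reader for the versioned layout can reject versions it does not understand instead of guessing. The sketch below assumes that only version 1 is supported; parseHeader and the UnsupportedVersion error name are hypothetical.

```zig
const std = @import("std");

const Header = struct {
    version: u32,
    count: u32,
};

// Hypothetical helper: parses the 12-byte versioned NUMS header.
fn parseHeader(bytes: []const u8) error{ Truncated, BadMagic, UnsupportedVersion }!Header {
    if (bytes.len < 12) return error.Truncated;
    if (!std.mem.eql(u8, bytes[0..4], "NUMS")) return error.BadMagic;

    const version = std.mem.readInt(u32, bytes[4..8], .little);
    switch (version) {
        1 => {}, // Assumption: this reader only knows the version 1 layout.
        else => return error.UnsupportedVersion,
    }
    return .{
        .version = version,
        .count = std.mem.readInt(u32, bytes[8..12], .little),
    };
}

pub fn main() void {
    const v1 = [_]u8{ 'N', 'U', 'M', 'S', 1, 0, 0, 0, 3, 0, 0, 0 };
    const v9 = [_]u8{ 'N', 'U', 'M', 'S', 9, 0, 0, 0, 3, 0, 0, 0 };
    for ([_][]const u8{ &v1, &v9 }) |file| {
        if (parseHeader(file)) |header| {
            std.debug.print("version {d}, {d} numbers\n", .{ header.version, header.count });
        } else |err| {
            std.debug.print("rejected: {s}\n", .{@errorName(err)});
        }
    }
}
```

When a later version is added, it becomes another arm of the switch with its own layout.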
Length Fields
Binary formats often use length fields.
Example:
    bytes 0..4      name length, u32 little-endian
    next N bytes    UTF-8 name bytes
When parsing length fields, always check bounds.
Bad:
    const name = bytes[offset .. offset + name_len];
Good:
    if (offset + name_len > bytes.len) {
        return error.Truncated;
    }
    const name = bytes[offset .. offset + name_len];
Also watch for integer overflow when computing sizes.
    const end = std.math.add(usize, offset, name_len) catch {
        return error.Truncated;
    };
For parsers that read untrusted files, these checks are not optional.
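Putting the checks together, a length-prefixed name could be read as follows. This is a sketch; readName is a hypothetical helper, and it compares lengths by subtraction so that no addition can overflow.

```zig
const std = @import("std");

// Hypothetical helper: reads a u32 little-endian length followed by that
// many name bytes, starting at offset.*, and advances the offset past them.
fn readName(bytes: []const u8, offset: *usize) error{Truncated}![]const u8 {
    const start = offset.*;
    if (start > bytes.len or bytes.len - start < 4) return error.Truncated;

    const name_len: usize = std.mem.readInt(u32, bytes[start..][0..4], .little);
    const name_start = start + 4;
    // name_start <= bytes.len holds here, so the subtraction cannot underflow.
    if (bytes.len - name_start < name_len) return error.Truncated;

    const name = bytes[name_start..][0..name_len];
    offset.* = name_start + name_len;
    return name;
}

pub fn main() !void {
    // Length 5 as a little-endian u32, followed by the five name bytes.
    const bytes = [_]u8{ 5, 0, 0, 0, 'A', 'l', 'i', 'c', 'e' };
    var offset: usize = 0;
    const name = try readName(&bytes, &offset);
    // prints "name=Alice next offset=9"
    std.debug.print("name={s} next offset={d}\n", .{ name, offset });
}
```

The returned slice still points into the input buffer, so it is only valid while that buffer is alive.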
Checksums
Some binary formats include checksums.
A checksum is a value computed from bytes to detect corruption.
Example layout:
    bytes 0..4     magic
    bytes 4..8     payload length
    bytes 8..12    checksum
    bytes 12..     payload
When reading, the parser recomputes the checksum and compares it with the stored checksum.
Checksums do not prove that data is safe or authentic. They mainly detect accidental corruption. For security, use cryptographic authentication such as MACs or signatures.
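The layout above does not say which checksum to use. As one possibility, the sketch below assumes CRC-32 from std.hash.Crc32 over the payload, with little-endian length and checksum fields. The verifiedPayload helper, the "DATA" magic, and the error names are hypothetical, and the magic check is left out to keep the focus on the checksum.

```zig
const std = @import("std");

// Hypothetical helper: returns the payload if the stored CRC-32 matches.
fn verifiedPayload(bytes: []const u8) error{ Truncated, ChecksumMismatch }![]const u8 {
    if (bytes.len < 12) return error.Truncated;
    const payload_len: usize = std.mem.readInt(u32, bytes[4..8], .little);
    const stored = std.mem.readInt(u32, bytes[8..12], .little);
    if (bytes.len - 12 < payload_len) return error.Truncated;

    const payload = bytes[12..][0..payload_len];
    // Recompute the checksum and compare it with the stored value.
    if (std.hash.Crc32.hash(payload) != stored) return error.ChecksumMismatch;
    return payload;
}

pub fn main() !void {
    const payload = "hello";
    var file: [12 + payload.len]u8 = undefined;
    @memcpy(file[0..4], "DATA"); // hypothetical magic, not checked here
    std.mem.writeInt(u32, file[4..8], payload.len, .little);
    std.mem.writeInt(u32, file[8..12], std.hash.Crc32.hash(payload), .little);
    @memcpy(file[12..], payload);

    const ok = try verifiedPayload(&file);
    std.debug.print("payload: {s}\n", .{ok});

    file[12] ^= 0x01; // simulate a single corrupted bit
    if (verifiedPayload(&file)) |_| {
        std.debug.print("corruption not detected\n", .{});
    } else |err| {
        std.debug.print("rejected: {s}\n", .{@errorName(err)});
    }
}
```

An attacker who can change the payload can also recompute the CRC, which is why this only guards against accidental corruption.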
Binary Formats Must Be Defensive
A binary parser should assume the input may be invalid.
The file may be too short.
The magic number may be wrong.
The version may be unsupported.
A length field may point past the end.
A count may be huge.
Offsets may overflow.
Data may be compressed incorrectly.
Strings may not be valid UTF-8.
Your parser should reject bad data cleanly instead of crashing or reading outside the buffer.
Zig helps because slices carry lengths and integer conversions are explicit, but you still need to write the checks.
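Two of the checks listed above, the huge count and the invalid string, can be sketched as small validation functions. The limit of one million records and the names checkCount and checkName are assumptions for the example; a real format would document its own limits.

```zig
const std = @import("std");

// Hypothetical limit chosen for illustration only.
const max_records: u32 = 1_000_000;

// Rejects counts that are implausibly large before any memory is reserved.
fn checkCount(count: u32) error{TooManyRecords}!void {
    if (count > max_records) return error.TooManyRecords;
}

// Rejects byte sequences that are not valid UTF-8.
fn checkName(name: []const u8) error{InvalidUtf8}!void {
    if (!std.unicode.utf8ValidateSlice(name)) return error.InvalidUtf8;
}

pub fn main() void {
    checkCount(0xFFFF_FFFF) catch |err| std.debug.print("count rejected: {s}\n", .{@errorName(err)});
    checkName("\xff\xfe") catch |err| std.debug.print("name rejected: {s}\n", .{@errorName(err)});
    // Valid inputs pass silently.
    checkCount(3) catch unreachable;
    checkName("Alice") catch unreachable;
}
```

Each check turns one kind of bad input into a named error that the caller can report.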
When Binary Formats Are Useful
Binary formats are useful when you care about compact size, fast parsing, exact layout, or compatibility with existing systems.
Common examples:
image files
audio files
video files
database files
index files
network packets
executables
object files
archives
game assets
compiler caches
Text formats are often better for configuration, logs, and simple data exchange. Binary formats are better when layout, speed, and size matter more.
Mental Model
A binary file is a contract.
The contract says what each byte means.
Your Zig code should follow that contract exactly: check the magic number, read integers with explicit endianness, validate lengths, manage offsets carefully, and reject malformed data.
Do not treat file bytes as native structs too early. Parse bytes first. Build structured values after validation.