Memory Mapped Files

A memory mapped file is a file that the operating system maps into your program’s virtual address space.

That sounds strange at first, so use this simple picture:

Normally, when you read a file, you ask the operating system to copy bytes from the file into a buffer.

With a memory mapped file, you ask the operating system to make the file look like memory. Then your program can read the file through a pointer or slice.

You do not call read() again and again. You access bytes directly.

const bytes = mapped_file[0..];
const first = bytes[0];

The file is still on disk, but the operating system manages how pages of the file are loaded into RAM.

Normal File Reading

In normal file reading, the program owns a buffer.

const std = @import("std");

pub fn main() !void {
    // const, not var: the handle is never reassigned, and Zig rejects an unmutated var.
    const file = try std.fs.cwd().openFile("data.txt", .{});
    defer file.close();

    var buffer: [1024]u8 = undefined;
    const n = try file.read(&buffer);

    const bytes = buffer[0..n];
    std.debug.print("{s}\n", .{bytes});
}

This program opens a file, reads up to 1024 bytes into buffer, then prints the bytes that were read.

The important thing is this:

try file.read(&buffer);

The operating system copies file data into your buffer.

That is simple and works well for many programs.

Memory Mapping

With memory mapping, the program does not manually copy the whole file into a buffer. Instead, it asks the operating system for a mapped region.

Conceptually:

file on disk -> virtual memory region -> slice of bytes

Your program sees the mapped file as bytes in memory.

This is useful when:

You want random access to a large file.

You want to scan a file without manually managing read buffers.

You want the operating system to decide which parts of the file should be loaded.

You want multiple processes to share the same file-backed memory.

You are building tools like databases, search indexes, compilers, log readers, or binary file analyzers.

Why Memory Mapping Exists

Suppose you have a 4 GB file.

With normal reading, you might read the file in chunks:

read bytes 0..4096
read bytes 4096..8192
read bytes 8192..12288
...

That works, but you need to manage the loop, the buffer, and the current offset.
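To make that bookkeeping concrete, here is a minimal sketch of such a loop. The helper name countBytesChunked and the 4096-byte buffer size are assumptions chosen for illustration. The sketch only counts bytes, and it relies on the same file.read call used earlier.

```zig
const std = @import("std");

// Hypothetical helper: counts the bytes in a file by reading fixed-size chunks.
fn countBytesChunked(file: std.fs.File) !u64 {
    var buffer: [4096]u8 = undefined; // chunk size is an arbitrary choice
    var total: u64 = 0;
    while (true) {
        const n = try file.read(&buffer);
        if (n == 0) break; // read returns 0 at end of file
        // buffer[0..n] holds the current chunk; a real program would process it here.
        total += n;
    }
    return total;
}

pub fn main() !void {
    const file = try std.fs.cwd().openFile("data.txt", .{});
    defer file.close();

    const total = try countBytesChunked(file);
    std.debug.print("read {} bytes\n", .{total});
}
```

The operating system tracks the current offset inside the file handle here. Random access would add explicit seek calls on top of this loop.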

With memory mapping, you can treat the file like one large byte slice:

bytes[0]
bytes[4096]
bytes[100_000_000]

The operating system loads the needed pages when your program touches them.

A page is a fixed-size block of virtual memory. 4 KiB is the most common size, but some systems use larger pages, such as 16 KiB. The exact size depends on the operating system and CPU.
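The relationship between a byte offset and its page is plain integer arithmetic. The sketch below assumes 4 KiB pages, and the names assumed_page_size, pageIndex, and offsetInPage are made up for this illustration. A real program should query the page size from the system rather than hard-code it.

```zig
const std = @import("std");

// Assumption: 4 KiB pages. The real value depends on the OS and CPU.
const assumed_page_size: u64 = 4096;

// Index of the page that contains the given byte offset.
fn pageIndex(offset: u64) u64 {
    return offset / assumed_page_size;
}

// Position of the byte inside its page.
fn offsetInPage(offset: u64) u64 {
    return offset % assumed_page_size;
}

pub fn main() void {
    const offsets = [_]u64{ 0, 4096, 100_000_000 };
    for (offsets) |off| {
        std.debug.print("byte {} is in page {}, at offset {} within that page\n", .{
            off,
            pageIndex(off),
            offsetInPage(off),
        });
    }
}
```

Touching bytes[100_000_000] therefore needs only the one page that holds that byte, not the pages before it.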

Memory Mapping Is Virtual Memory

A memory mapped file depends on virtual memory.

Your program does not directly use physical RAM addresses. It uses virtual addresses. The operating system and CPU translate those virtual addresses to real memory.

When you map a file, the operating system says:

This range of virtual addresses represents this file.

When your program reads from that range, the operating system loads the relevant file page if needed.

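This means a mapped file can be larger than available RAM, as long as it fits in the process’s virtual address space. The operating system does not need to load the whole file immediately, and it can drop pages that have not been used recently.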

Read-Only Mapping

The safest first use of memory mapping is read-only mapping.

A read-only mapping lets your program inspect a file without modifying it.

In Zig, the exact API differs by operating system and standard library version. On POSIX systems such as Linux and macOS, the basic idea is:

const std = @import("std");

pub fn main() !void {
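    const file = try std.fs.cwd().openFile("data.txt", .{});
    defer file.close();

    const stat = try file.stat();
    // stat.size is a u64, while mmap takes a usize length.
    const size: usize = @intCast(stat.size);

    // A zero-length mapping is invalid, so handle empty files up front.
    if (size == 0) return;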

    const mapped = try std.posix.mmap(
        null,
        size,
        std.posix.PROT.READ,
        .{ .TYPE = .PRIVATE },
        file.handle,
        0,
    );
    defer std.posix.munmap(mapped);

    const bytes = mapped[0..size];

    std.debug.print("first byte: {}\n", .{bytes[0]});
}

This example maps data.txt into memory and reads from it.

The key call is:

std.posix.mmap(...)

The matching cleanup call is:

std.posix.munmap(mapped);

This pair matters. If you map memory, you must unmap it when you are done.

Understanding the Arguments

The mmap call has several arguments because memory mapping is a low-level operating system feature.

A simplified view:

std.posix.mmap(
    address,
    length,
    protection,
    flags,
    file_descriptor,
    offset,
)

address is usually null. That means: let the operating system choose where the mapping goes.

length is the number of bytes to map.

protection controls what the program may do with the mapped memory. For read-only mapping, use read permission.

flags control whether writes are private or shared.

file_descriptor identifies the open file.

offset says where in the file the mapping starts.

For beginners, the safest first pattern is:

const mapped = try std.posix.mmap(
    null,
    size,
    std.posix.PROT.READ,
    .{ .TYPE = .PRIVATE },
    file.handle,
    0,
);

This maps the whole file, read-only, starting from byte 0.
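Starting anywhere other than byte 0 adds one constraint: mmap requires offset to be a multiple of the page size. A common workaround is to round the offset down to a page boundary, map a slightly longer region, and skip the extra bytes at the front. The sketch below shows only that arithmetic. The Window type, the alignWindow function, and the 4096-byte page size in main are assumptions for illustration.

```zig
const std = @import("std");

// Hypothetical result type describing how to map an arbitrary byte range.
const Window = struct {
    map_offset: u64, // page-aligned offset to pass to mmap
    map_len: usize, // length to pass to mmap
    skip: usize, // bytes to skip at the start of the mapping
};

// Rounds `offset` down to a page boundary and grows the length to compensate.
// `page_size` is supplied by the caller.
fn alignWindow(offset: u64, len: usize, page_size: u64) Window {
    const skip: usize = @intCast(offset % page_size);
    return .{
        .map_offset = offset - @as(u64, skip),
        .map_len = len + skip,
        .skip = skip,
    };
}

pub fn main() void {
    // Assumption: 4 KiB pages.
    const w = alignWindow(10_000, 100, 4096);
    std.debug.print("mmap offset {}, length {}, skip {}\n", .{ w.map_offset, w.map_len, w.skip });
}
```

After mapping with map_offset and map_len, the requested bytes are mapped[w.skip..][0..len].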

Private vs Shared Mapping

Memory mappings can be private or shared.

A private mapping means changes are private to your process. The file itself is not modified.

A shared mapping means changes can be written back to the file and can be visible to other processes mapping the same file.

For beginners, use private read-only mappings first.

.{ .TYPE = .PRIVATE }

This is a good default when you only want to inspect bytes.

Shared writable mappings are more powerful, but they also require more care. You need to think about synchronization, file size, flushing, crashes, and what other processes may see.

Mapping an Empty File

An empty file has size 0.

You should not blindly map a zero-length file.

if (size == 0) {
    std.debug.print("empty file\n", .{});
    return;
}

This check avoids asking the operating system to map nothing. On POSIX systems, an mmap call with a length of zero is an error, not an empty mapping.

A robust program should handle empty files separately.
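One possible way to handle this is to hide the special case behind a small wrapper, so that callers always get a byte slice. This is a sketch, not a standard library type: the MappedFile name, its init and deinit functions, and the error.FileTooBig name are all assumptions. It uses the same std.posix.mmap call as above and represents an empty file as an empty slice.

```zig
const std = @import("std");

// Hypothetical wrapper: a read-only view of a file that also works for empty files.
const MappedFile = struct {
    bytes: []const u8,

    fn init(file: std.fs.File) !MappedFile {
        const stat = try file.stat();
        // Fail cleanly if the size does not fit in usize (possible on 32-bit targets).
        const size = std.math.cast(usize, stat.size) orelse return error.FileTooBig;

        // Nothing to map: represent an empty file as an empty slice.
        if (size == 0) return .{ .bytes = &.{} };

        const mapped = try std.posix.mmap(
            null,
            size,
            std.posix.PROT.READ,
            .{ .TYPE = .PRIVATE },
            file.handle,
            0,
        );
        return .{ .bytes = mapped };
    }

    fn deinit(self: MappedFile) void {
        // Only unmap when init actually created a mapping.
        // @alignCast restores the page alignment that munmap expects.
        if (self.bytes.len != 0) std.posix.munmap(@alignCast(self.bytes));
    }
};

pub fn main() !void {
    const file = try std.fs.cwd().openFile("data.txt", .{});
    defer file.close();

    const mapped = try MappedFile.init(file);
    defer mapped.deinit();

    std.debug.print("mapped {} bytes\n", .{mapped.bytes.len});
}
```

The rest of the program can then loop over mapped.bytes without caring whether the file was empty.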

Accessing the Bytes

Once the file is mapped, you usually work with it as a slice:

const bytes = mapped[0..size];

Then you can read it like any other slice:

for (bytes, 0..) |b, i| {
    std.debug.print("{}: {}\n", .{ i, b });
}

You can search inside it:

const needle = "error";

if (std.mem.indexOf(u8, bytes, needle)) |pos| {
    std.debug.print("found at byte {}\n", .{pos});
}

This is one reason memory mapping is pleasant for file analysis. Many normal slice operations work naturally.
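The same idea extends to finding every match, not only the first. The sketch below uses std.mem.indexOfPos, which searches from a given start index. The reportMatches helper is a made-up name, and main uses a string literal as a stand-in for a mapped slice so the example runs without a file.

```zig
const std = @import("std");

// Hypothetical helper: prints every non-overlapping match and returns how many were found.
fn reportMatches(bytes: []const u8, needle: []const u8) usize {
    if (needle.len == 0) return 0; // an empty needle would never advance

    var count: usize = 0;
    var start: usize = 0;
    while (std.mem.indexOfPos(u8, bytes, start, needle)) |pos| {
        std.debug.print("found at byte {}\n", .{pos});
        count += 1;
        start = pos + needle.len; // continue searching after this match
    }
    return count;
}

pub fn main() void {
    // Stand-in for a mapped slice.
    const bytes = "error: a\nok\nerror: b\n";
    const n = reportMatches(bytes, "error");
    std.debug.print("{} matches\n", .{n});
}
```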

Example: Count Newlines

Here is a small example that counts lines in a file by counting newline bytes.

const std = @import("std");

pub fn main() !void {
    const file = try std.fs.cwd().openFile("data.txt", .{});
    defer file.close();

    const stat = try file.stat();
    // stat.size is a u64, while mmap takes a usize length.
    const size: usize = @intCast(stat.size);

    if (size == 0) {
        std.debug.print("lines: 0\n", .{});
        return;
    }

    const mapped = try std.posix.mmap(
        null,
        size,
        std.posix.PROT.READ,
        .{ .TYPE = .PRIVATE },
        file.handle,
        0,
    );
    defer std.posix.munmap(mapped);

    const bytes = mapped[0..size];

    var lines: usize = 0;
    for (bytes) |b| {
        if (b == '\n') {
            lines += 1;
        }
    }

    std.debug.print("lines: {}\n", .{lines});
}

This program does not create a read buffer. It maps the file and scans the mapped bytes.
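Counting newline bytes undercounts by one when the last line has no trailing newline. Because the mapped file is just a slice, the counting logic can be moved into a function that takes []const u8 and handles that case. The countLines name is an assumption for this sketch, and main feeds it a literal instead of a mapping to keep the example short.

```zig
const std = @import("std");

// Hypothetical helper: counts lines, including a final line that lacks a trailing newline.
fn countLines(bytes: []const u8) usize {
    if (bytes.len == 0) return 0;

    var lines: usize = 0;
    for (bytes) |b| {
        if (b == '\n') lines += 1;
    }

    // The last line still counts even when the file does not end with '\n'.
    if (bytes[bytes.len - 1] != '\n') lines += 1;
    return lines;
}

pub fn main() void {
    // Stand-in for a mapped slice: two lines, the second one unterminated.
    const bytes = "first\nsecond";
    std.debug.print("lines: {}\n", .{countLines(bytes)});
}
```

Keeping the logic in a slice-based function also makes it easy to test without touching the file system.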

Important Safety Rule

A mapped slice is only valid while the mapping exists.

This is correct:

const mapped = try std.posix.mmap(...);
defer std.posix.munmap(mapped);

const bytes = mapped[0..size];
// use bytes here

This is wrong:

const bytes = mapped[0..size];
std.posix.munmap(mapped);

// bytes is now invalid

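After munmap, the memory region no longer belongs to your program. Reading from it is a serious bug: the program will usually crash, and if the address range has since been reused it may silently read unrelated data.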

Treat mapped memory like borrowed memory. It has a lifetime, and that lifetime ends when you unmap it.
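When some bytes need to outlive the mapping, one option is to copy them into memory the program owns before unmapping. The sketch below does that with the allocator's dupe function. The copyOut helper is a made-up name, and main uses a literal in place of a mapped slice.

```zig
const std = @import("std");

// Hypothetical helper: copies up to `len` bytes out of a borrowed slice.
// The caller owns the returned memory and must free it with the same allocator.
fn copyOut(allocator: std.mem.Allocator, borrowed: []const u8, len: usize) ![]u8 {
    const n = @min(len, borrowed.len);
    return allocator.dupe(u8, borrowed[0..n]);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Stand-in for a mapped slice.
    const bytes = "header: rest of the file";

    const header = try copyOut(allocator, bytes, 6);
    defer allocator.free(header);

    // `header` stays valid even after the original mapping is unmapped.
    std.debug.print("{s}\n", .{header});
}
```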

Memory Mapping Does Not Remove Errors

Memory mapping can make file access look like ordinary memory access, but the operating system is still involved.

Several things can go wrong:

The file may not exist.

The program may not have permission to open it.

The file may be empty.

The mapping may fail.

The file may be truncated by another process while mapped.

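A mapped region may trigger a fault if the backing file becomes invalid. On POSIX systems, touching a mapped page that lies beyond the new end of a truncated file raises SIGBUS, which is a signal and not a Zig error you can catch with try.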

So memory mapping is not magic. It is a powerful operating system feature, and your program still needs careful error handling.
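The first two failures in that list surface as ordinary Zig errors from openFile, so they can be handled before any mapping is attempted. The sketch below is one way to turn them into readable messages. The describeOpenError helper and its message strings are assumptions for illustration.

```zig
const std = @import("std");

// Hypothetical helper: turns common open errors into human-readable messages.
fn describeOpenError(err: anyerror) []const u8 {
    return switch (err) {
        error.FileNotFound => "the file does not exist",
        error.AccessDenied => "permission denied",
        else => "could not open the file",
    };
}

pub fn main() void {
    const file = std.fs.cwd().openFile("data.txt", .{}) catch |err| {
        std.debug.print("data.txt: {s}\n", .{describeOpenError(err)});
        return;
    };
    defer file.close();

    std.debug.print("opened data.txt\n", .{});
}
```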

When to Use Memory Mapping

Memory mapping is a good fit when you need random access to file contents.

For example, a database may map pages of a database file. A search engine may map an index file. A compiler may map source files. A binary analysis tool may map an executable file and inspect headers at different offsets.
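As an illustration of inspecting headers at different offsets, here is a minimal sketch that reads a little-endian 32-bit field from any position in a byte slice, with a bounds check. The readU32LittleAt helper and the sample bytes are invented for this example and do not describe a real file format.

```zig
const std = @import("std");

// Hypothetical helper: reads a little-endian u32 at `offset`,
// or returns null if fewer than 4 bytes are available there.
fn readU32LittleAt(bytes: []const u8, offset: usize) ?u32 {
    if (offset > bytes.len or bytes.len - offset < 4) return null;
    return std.mem.readInt(u32, bytes[offset..][0..4], .little);
}

pub fn main() void {
    // Stand-in for a mapped file: 4 magic bytes followed by a 32-bit field.
    const bytes = [_]u8{ 'D', 'E', 'M', 'O', 0x78, 0x56, 0x34, 0x12 };

    if (readU32LittleAt(&bytes, 4)) |value| {
        std.debug.print("field at offset 4: 0x{x}\n", .{value});
    }
    if (readU32LittleAt(&bytes, 6) == null) {
        std.debug.print("offset 6 is too close to the end\n", .{});
    }
}
```

The bounds check matters with mapped files in particular, because a header may claim an offset that lies past the end of the file.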

Memory mapping can also simplify code. Instead of writing a loop that repeatedly fills a buffer, you can write code over a slice.

But memory mapping is not always better.

For small files, normal reading is often simpler.

For streaming input, normal reading is often better.

For network data, memory mapping usually does not apply.

For highly controlled I/O patterns, explicit reads may be easier to tune.

A Practical Rule

Use normal file reading first.

Reach for memory mapping when one of these is true:

The file is large.

You need random access.

You want the operating system to page file contents lazily.

You want to treat file contents as a byte slice.

You are building a system where file-backed memory is part of the design.

For beginners, memory mapping is best understood as a bridge between files and memory. It lets a file behave like memory, while the operating system handles loading pages behind the scenes.