A Line Filter

A line filter reads text, changes or selects some lines, and writes the result. Many Unix programs have this shape.

This program prints only the lines that contain a given piece of text.

filter needle input.txt

Here is a first version.

const std = @import("std");

pub fn main() !void {
    var args = std.process.args();

    _ = args.next();

    const needle = args.next() orelse {
        std.debug.print("missing search text\n", .{});
        return;
    };

    const path = args.next() orelse {
        std.debug.print("missing input file\n", .{});
        return;
    };

    const cwd = std.fs.cwd();

    const file = try cwd.openFile(path, .{});
    defer file.close();

    var buffer: [4096]u8 = undefined;

    while (try file.reader().readUntilDelimiterOrEof(&buffer, '\n')) |line| {
        if (std.mem.indexOf(u8, line, needle) != null) {
            try std.io.getStdOut().writer().print("{s}\n", .{line});
        }
    }
}

The program reads two arguments. The first is the text to search for. The second is the file name.

const needle = args.next() orelse {
    std.debug.print("missing search text\n", .{});
    return;
};

const path = args.next() orelse {
    std.debug.print("missing input file\n", .{});
    return;
};

The path is resolved relative to the current directory, and the file is opened for reading. The defer statement closes it when main returns.

const file = try cwd.openFile(path, .{});
defer file.close();

The buffer holds one line at a time.

var buffer: [4096]u8 = undefined;

A line that does not fit in this buffer cannot be read by this version: readUntilDelimiterOrEof fails with error.StreamTooLong, and try passes that error out of main. That is an intentional limit. Small programs should make their limits visible.
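The limit is easy to observe on a small scale. The sketch below is an illustration, not part of the filter: it assumes the same std.io reader API the chapter uses, replaces the file with an in-memory stream, and shrinks the buffer to 4 bytes. The helper name firstLineLen is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper: returns the length of the first line of `text`,
/// read through a deliberately tiny 4-byte buffer.
fn firstLineLen(text: []const u8) !usize {
    var stream = std.io.fixedBufferStream(text);
    const reader = stream.reader();

    var buffer: [4]u8 = undefined;

    const line = try reader.readUntilDelimiterOrEof(&buffer, '\n');
    return if (line) |l| l.len else 0;
}

pub fn main() void {
    // The first input fits in the buffer; the second does not.
    const inputs = [_][]const u8{ "abc\n", "abcdefgh\n" };

    for (inputs) |input| {
        if (firstLineLen(input)) |len| {
            std.debug.print("read {d} bytes\n", .{len});
        } else |err| {
            std.debug.print("failed: {s}\n", .{@errorName(err)});
        }
    }
}
```

The short line is read normally. The long one produces error.StreamTooLong instead of a truncated line, so the program never silently works with half a line.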

The loop reads one line on each pass.

while (try file.reader().readUntilDelimiterOrEof(&buffer, '\n')) |line| {
    ...
}

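The call returns an optional slice. If a line is read, the loop body receives it as line. The slice points into buffer and does not include the newline, so it is valid only until the next read. At end of file, the value is null and the loop stops.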
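The behavior of the loop can be tried without a file. This sketch assumes the same reader API as the chapter and uses an in-memory stream as a stand-in for the file. The helper countLines is hypothetical and exists only for the illustration.

```zig
const std = @import("std");

/// Hypothetical helper: prints each line of `text` and returns how many
/// lines the loop saw.
fn countLines(text: []const u8) !usize {
    var stream = std.io.fixedBufferStream(text);
    const reader = stream.reader();

    var buffer: [16]u8 = undefined;
    var count: usize = 0;

    // Each pass unwraps the optional slice; null ends the loop.
    while (try reader.readUntilDelimiterOrEof(&buffer, '\n')) |line| {
        std.debug.print("{d} bytes: {s}\n", .{ line.len, line });
        count += 1;
    }

    return count;
}

pub fn main() !void {
    // The last line has no trailing newline but is still returned.
    const n = try countLines("alpha\nbeta\ngamma");
    std.debug.print("{d} lines\n", .{n});
}
```

The final line is delivered even though no newline follows it. The loop ends only when a read finds no bytes at all.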

The test is a substring search.

if (std.mem.indexOf(u8, line, needle) != null) {
    ...
}

std.mem.indexOf returns an optional index. If the result is not null, the line contains the search text.
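When the position matters, the optional can be unwrapped instead of only compared with null. A minimal sketch follows. The helper firstMatch is a hypothetical wrapper added for the example.

```zig
const std = @import("std");

/// Hypothetical helper: the byte offset of the first occurrence of
/// `needle` in `line`, or null if there is none.
fn firstMatch(line: []const u8, needle: []const u8) ?usize {
    return std.mem.indexOf(u8, line, needle);
}

pub fn main() void {
    const line = "the quick brown fox";

    // The capture |pos| is available only when the result is not null.
    if (firstMatch(line, "quick")) |pos| {
        std.debug.print("found at byte {d}\n", .{pos});
    }

    if (firstMatch(line, "slow") == null) {
        std.debug.print("not found\n", .{});
    }
}
```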

The output is written to standard output.

try std.io.getStdOut().writer().print("{s}\n", .{line});

This program is useful, but it has two rough edges. It creates a new reader and writer expression inside the loop, and it always adds a newline even if the last line in the file had none.

We can clean up the first point by naming the reader and writer.

const std = @import("std");

pub fn main() !void {
    var args = std.process.args();

    _ = args.next();

    const needle = args.next() orelse return error.MissingNeedle;
    const path = args.next() orelse return error.MissingPath;

    const cwd = std.fs.cwd();

    const file = try cwd.openFile(path, .{});
    defer file.close();

    const reader = file.reader();
    const out = std.io.getStdOut().writer();

    var buffer: [4096]u8 = undefined;

    while (try reader.readUntilDelimiterOrEof(&buffer, '\n')) |line| {
        if (std.mem.indexOf(u8, line, needle) != null) {
            try out.print("{s}\n", .{line});
        }
    }
}

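On valid input this program behaves the same, but the main loop is easier to read. One thing does change: a missing argument is now returned as an error from main instead of being printed as a message.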

A line filter often has this structure:

while (try reader.readUntilDelimiterOrEof(&buffer, '\n')) |line| {
    if (keep(line)) {
        try write(line);
    }
}

The work is divided into three parts: read a line, decide whether to keep it, and write it.
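One way to make the three parts explicit is to pass the reader, the writer, and the decision into a single function. This is a sketch under assumptions, not the chapter's program. The names filterLines and nonEmpty are hypothetical, the predicate takes only the line, and in-memory streams stand in for the file and for standard output.

```zig
const std = @import("std");

/// Hypothetical predicate: keeps lines that are not empty.
fn nonEmpty(line: []const u8) bool {
    return line.len > 0;
}

/// Copies every line of `reader` for which `keep` returns true to `writer`.
fn filterLines(
    reader: anytype,
    writer: anytype,
    buffer: []u8,
    comptime keep: fn ([]const u8) bool,
) !void {
    while (try reader.readUntilDelimiterOrEof(buffer, '\n')) |line| { // read
        if (keep(line)) { // decide
            try writer.print("{s}\n", .{line}); // write
        }
    }
}

pub fn main() !void {
    var input = std.io.fixedBufferStream(@as([]const u8, "one\n\ntwo\n"));

    var storage: [64]u8 = undefined;
    var output = std.io.fixedBufferStream(&storage);

    var buffer: [32]u8 = undefined;

    try filterLines(input.reader(), output.writer(), &buffer, nonEmpty);

    // Show what the filter kept.
    std.debug.print("{s}", .{output.getWritten()});
}
```

Because the function names its reader and writer only as parameters, the same loop can be run against a file, standard input, or a buffer in memory.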

The decision can be moved into a function.

fn contains(line: []const u8, needle: []const u8) bool {
    return std.mem.indexOf(u8, line, needle) != null;
}

Then the loop says exactly what it does.

while (try reader.readUntilDelimiterOrEof(&buffer, '\n')) |line| {
    if (contains(line, needle)) {
        try out.print("{s}\n", .{line});
    }
}

This is a good habit. Keep I/O code near the edge of the program. Put simple decisions in small functions.
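A small function with no I/O is also easy to test. The sketch below repeats contains from above and adds a few test blocks. The test names and inputs are illustrative. Saved in a file, it can be checked with zig test.

```zig
const std = @import("std");

fn contains(line: []const u8, needle: []const u8) bool {
    return std.mem.indexOf(u8, line, needle) != null;
}

test "contains finds a substring" {
    try std.testing.expect(contains("hello world", "lo wo"));
}

test "contains rejects missing text" {
    try std.testing.expect(!contains("hello world", "xyz"));
}

test "contains is case-sensitive" {
    // Relevant to the case-insensitive exercise: this match fails today.
    try std.testing.expect(!contains("hello world", "World"));
}
```

None of these tests opens a file or writes to standard output, which is the practical benefit of keeping the decision separate from the I/O.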

Exercise 20-11. Make the match case-insensitive.

Exercise 20-12. Add a -v option that prints lines that do not match.

Exercise 20-13. Print line numbers before matching lines.

Exercise 20-14. Return a descriptive error of your own, such as error.LineTooLong, when a line is too long.

Exercise 20-15. Read from standard input when no file name is given.