Strings Are Bytes

A Zig string is a sequence of bytes.

This is a string literal:

const s = "hello";

It has five visible characters:

h e l l o

It also has a zero sentinel after the last byte, so the literal can be used where sentinel-terminated data is required.
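As a minimal sketch of that point, the following program reads the sentinel slot directly. It assumes only std.debug.print; the variable name s matches the literal above.

```zig
const std = @import("std");

pub fn main() void {
    const s = "hello";

    // s.len counts only the five visible bytes, not the sentinel.
    std.debug.print("{d}\n", .{s.len});

    // Index s.len is the sentinel slot. Reading it is allowed because
    // the literal's type records the zero terminator.
    std.debug.print("{d}\n", .{s[s.len]});
}
```

This should print 5 and then 0.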

The bytes can be printed one by one:

const std = @import("std");

pub fn main() void {
    const s = "hello";

    for (s) |b| {
        std.debug.print("{d}\n", .{b});
    }
}

The output is:

104
101
108
108
111

These are byte values. The letter h is byte 104. The letter e is byte 101.

To print them as characters, use {c}:

const std = @import("std");

pub fn main() void {
    const s = "hello";

    for (s) |b| {
        std.debug.print("{c}\n", .{b});
    }
}

The output is:

h
e
l
l
o

A string literal is not a special string object. Zig has no hidden string class. A string literal is a pointer to a constant sentinel-terminated array of bytes.
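One way to see this is to ask the compiler for the type of a literal. The sketch below uses the builtins @TypeOf and @typeName; the names a and b are only for illustration.

```zig
const std = @import("std");

pub fn main() void {
    const a = "hello";
    const b = "zig";

    // The length and the zero sentinel are part of the type.
    // a has type *const [5:0]u8, b has type *const [3:0]u8.
    std.debug.print("{s}\n", .{@typeName(@TypeOf(a))});
    std.debug.print("{s}\n", .{@typeName(@TypeOf(b))});
}
```

Literals of different lengths therefore have different types, which is one reason functions usually accept a slice instead.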

In ordinary code, it is often used as a slice:

const s: []const u8 = "hello";

The type []const u8 means a read-only slice of bytes.

This is the most common string type in Zig.

const std = @import("std");

fn printString(s: []const u8) void {
    std.debug.print("{s}\n", .{s});
}

pub fn main() void {
    printString("zig");
    printString("language");
}

The output is:

zig
language

The {s} format prints a byte slice as a string.

Since strings are bytes, s.len gives the number of bytes, not the number of characters a reader would count.

const std = @import("std");

pub fn main() void {
    const s = "hello";

    std.debug.print("{d}\n", .{s.len});
}

The output is:

5

For plain ASCII text, the number of bytes and the number of characters are the same.

For UTF-8 text, they may differ.

const std = @import("std");

pub fn main() void {
    const s = "é";

    std.debug.print("{d}\n", .{s.len});
}

The output is:

2

The character é is encoded as two bytes in UTF-8.

This is important. Indexing a string gives a byte, not a character.

const std = @import("std");

pub fn main() void {
    const s = "é";

    std.debug.print("{d}\n", .{s[0]});
    std.debug.print("{d}\n", .{s[1]});
}

The output is:

195
169

These are the two UTF-8 bytes for é.

For byte-oriented work, this is exactly what you want. Files, network protocols, and memory buffers are byte sequences.

For text-oriented work, you must decode UTF-8 deliberately.
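A sketch of deliberate decoding, assuming the standard library's std.unicode.Utf8View, is shown here. The helper countCodepoints is a hypothetical name introduced for this example, not a library function.

```zig
const std = @import("std");

// Hypothetical helper: counts code points by decoding, not by s.len.
fn countCodepoints(s: []const u8) !usize {
    // init validates the bytes and returns an error on invalid UTF-8.
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    var n: usize = 0;
    while (it.nextCodepoint()) |_| {
        n += 1;
    }
    return n;
}

pub fn main() !void {
    const s = "héllo";

    // Walk the string one code point at a time.
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    while (it.nextCodepoint()) |cp| {
        std.debug.print("U+{X:0>4}\n", .{cp});
    }

    std.debug.print("{d} bytes, {d} code points\n", .{ s.len, try countCodepoints(s) });
}
```

For "héllo" the byte count is 6 and the code point count is 5. A code point is still not always one character on screen, since some characters are built from several code points.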

String literals may contain escapes:

const newline = "first\nsecond";
const tab = "a\tb";
const quote = "he said \"zig\"";
const slash = "c:\\tmp\\file.txt";
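An escape is written with two source characters but is stored as a single byte. The short program below is a sketch of that, reusing the tab and slash literals from above.

```zig
const std = @import("std");

pub fn main() void {
    const tab = "a\tb";
    const slash = "c:\\tmp\\file.txt";

    // Three bytes: 'a', the tab byte, and 'b'.
    std.debug.print("{d}\n", .{tab.len});

    // The middle byte is 9, the ASCII code for tab.
    std.debug.print("{d}\n", .{tab[1]});

    // Each doubled backslash in the source is one backslash byte.
    std.debug.print("{s}\n", .{slash});
}
```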

A string may also be written across several lines. Each line of a multi-line string literal begins with \\, and escape sequences are not processed inside it:

const text =
    \\first line
    \\second line
    \\third line
;

This produces the bytes for:

first line
second line
third line

Multi-line strings are useful for help text, generated source, and test data.
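To check exactly which bytes a multi-line literal produces, it can be compared with an ordinary literal. This sketch assumes std.mem.eql and std.debug.assert from the standard library; the name same is only for illustration.

```zig
const std = @import("std");

pub fn main() void {
    const text =
        \\first line
        \\second line
        \\third line
    ;
    const same = "first line\nsecond line\nthird line";

    // The lines are joined with '\n'. No newline follows the last line.
    std.debug.assert(std.mem.eql(u8, text, same));

    // 10 + 1 + 11 + 1 + 10 bytes.
    std.debug.print("{d}\n", .{text.len});
}
```

The length printed should be 33.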

A mutable string needs mutable storage. A string literal is constant: the compiler rejects any attempt to assign to one of its bytes.

var buf = [_]u8{ 'h', 'e', 'l', 'l', 'o' };

buf[0] = 'H';

Now buf contains:

Hello

To pass it to a function that expects a string slice, use slicing:

const std = @import("std");

pub fn main() void {
    var buf = [_]u8{ 'h', 'e', 'l', 'l', 'o' };

    buf[0] = 'H';

    std.debug.print("{s}\n", .{buf[0..]});
}

The output is:

Hello

Use []const u8 for read-only strings. Use []u8 for mutable byte buffers.
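The difference shows up in function signatures. In this sketch, upperAscii is a hypothetical helper that needs []u8 because it writes to the bytes; it relies on std.ascii.toUpper, which changes only ASCII letters.

```zig
const std = @import("std");

// Hypothetical helper: upper-cases ASCII letters in place.
// The parameter is []u8, not []const u8, because the bytes are modified.
fn upperAscii(buf: []u8) void {
    for (buf) |*b| {
        b.* = std.ascii.toUpper(b.*);
    }
}

pub fn main() void {
    var buf = [_]u8{ 'h', 'e', 'l', 'l', 'o' };

    // A pointer to the array coerces to a mutable slice.
    upperAscii(&buf);

    std.debug.print("{s}\n", .{buf[0..]});
}
```

This should print HELLO. Passing a string literal to upperAscii would not compile, since a literal cannot coerce to []u8.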

Exercises.

Exercise 6-17. Write a program that prints the byte values of "zig".

Exercise 6-18. Write a function that takes []const u8 and prints each byte as a character.

Exercise 6-19. Print the .len of "hello" and "é".

Exercise 6-20. Create a mutable byte array containing hello, change it to Hello, and print it.