Characters and Bytes

Zig does not have a separate character type for ordinary strings.

A string is a sequence of bytes.

const message = "hello";

The bytes are:

104 101 108 108 111

These are the ASCII byte values for h, e, l, l, and o.
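
One way to confirm this is to compare the literal with an array written out as those numbers. This is a minimal sketch; it uses std.mem.eql, which compares two slices element by element.

```zig
const std = @import("std");

pub fn main() void {
    const message = "hello";
    // The same five bytes, written as plain numbers.
    const bytes = [_]u8{ 104, 101, 108, 108, 111 };

    // Both arguments coerce to []const u8, so eql compares them byte by byte.
    const same = std.mem.eql(u8, message, &bytes);
    const verdict: []const u8 = if (same) "same bytes" else "different bytes";
    std.debug.print("{s}\n", .{verdict});
}
```

This prints "same bytes": the literal and the array hold identical data.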

A byte has type u8.

const c: u8 = 'A';

The value of c is 65.

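A character literal is an integer. Its type is comptime_int, holding the character's code point, and it coerces to u8 whenever the value fits in a byte, as it does for every ASCII character.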

const a: u8 = 'a';
const newline: u8 = '\n';
const tab: u8 = '\t';
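
Because a character literal is just a number, ordinary arithmetic works on it. The sketch below relies on the ASCII layout, where each lowercase letter sits exactly 32 above its uppercase form. The helper upperAscii is hypothetical and written only for illustration; the standard library's std.ascii.toUpper does the same job.

```zig
const std = @import("std");

// Hypothetical helper: converts an ASCII lowercase letter to uppercase.
// Any other byte is returned unchanged.
fn upperAscii(b: u8) u8 {
    if (b >= 'a' and b <= 'z') return b - ('a' - 'A'); // 'a' - 'A' is 32
    return b;
}

pub fn main() void {
    const c: u8 = 'q';
    std.debug.print("{c} -> {c}\n", .{ c, upperAscii(c) });
    std.debug.print("'a' - 'A' = {d}\n", .{'a' - 'A'});
}
```

The output is "q -> Q" followed by "'a' - 'A' = 32".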

Escape sequences are used for bytes that are hard to write directly.

Escape   Meaning
\n       newline
\t       tab
\r       carriage return
\\       backslash
\"       double quote
\'       single quote
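
Each escape sequence takes two characters to write but stands for exactly one byte. A quick check, as a minimal sketch:

```zig
const std = @import("std");

pub fn main() void {
    // Two source characters each, but one byte in memory.
    std.debug.print("{d} {d} {d}\n", .{ "\n".len, "\\".len, "\"".len });
    // The numeric values of some of those bytes.
    std.debug.print("{d} {d} {d}\n", .{ '\n', '\t', '\\' });
}
```

The first line prints "1 1 1" and the second prints "10 9 92".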

A string literal may contain escape sequences.

const text = "one\ntwo\n";

This contains two newline bytes.

Printing it gives:

one
two
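
Because the newlines are ordinary bytes, they can be counted like any other byte. The function countByte below is a hypothetical helper written for this sketch; it simply compares each byte with the target value.

```zig
const std = @import("std");

// Hypothetical helper: counts how many times `target` occurs in `s`.
fn countByte(s: []const u8, target: u8) usize {
    var n: usize = 0;
    for (s) |b| {
        if (b == target) n += 1;
    }
    return n;
}

pub fn main() void {
    const text = "one\ntwo\n";
    std.debug.print("bytes: {d}, newlines: {d}\n", .{ text.len, countByte(text, '\n') });
}
```

This prints "bytes: 8, newlines: 2": six letters plus two newline bytes.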

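A string literal is a pointer to a constant, sentinel-terminated byte array. The literal "zig" has type *const [3:0]u8: three bytes followed by a terminating zero byte that len does not count. In ordinary code, a literal is usually used as a slice of constant bytes, []const u8, and it coerces to that type automatically.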

const name: []const u8 = "zig";

Read this as: name is a slice of constant u8 values.

The elements cannot be changed through name.

name[0] = 'Z'; // compile error: the bytes behind name are constant
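
The literal's type can be checked at compile time with @TypeOf. This minimal sketch also reads the zero sentinel, which a sentinel-terminated array allows at index len, and then coerces the literal to []const u8.

```zig
const std = @import("std");

pub fn main() void {
    const lit = "zig";
    // A pointer to a constant array of 3 bytes, terminated by a 0 byte.
    comptime std.debug.assert(@TypeOf(lit) == *const [3:0]u8);
    // The sentinel sits just past the last byte; len does not include it.
    std.debug.print("len = {d}, sentinel = {d}\n", .{ lit.len, lit[3] });

    // The same literal coerces to a slice of constant bytes.
    const name: []const u8 = lit;
    std.debug.print("{s}\n", .{name});
}
```

This prints "len = 3, sentinel = 0" and then "zig".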

A mutable byte array can be changed.

var name = [_]u8{ 'z', 'i', 'g' };

name[0] = 'Z';

Now the array contains:

Zig
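
A function that changes bytes takes []u8 instead of []const u8. In this sketch, capitalize is a hypothetical helper; it uses std.ascii.toUpper to convert a single ASCII byte.

```zig
const std = @import("std");

// Hypothetical helper: uppercases the first byte of a mutable byte slice.
fn capitalize(buf: []u8) void {
    if (buf.len == 0) return;
    buf[0] = std.ascii.toUpper(buf[0]);
}

pub fn main() void {
    var name = [_]u8{ 'z', 'i', 'g' };
    // &name points to the whole array and coerces to the slice []u8.
    capitalize(&name);
    const text: []const u8 = &name;
    std.debug.print("{s}\n", .{text});
}
```

This prints "Zig". Passing a literal such as "zig" to capitalize would not compile, because a literal's bytes are constant.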

A string is not the same thing as text in the full human sense. Zig strings are bytes; which characters those bytes represent depends on the text encoding, and that is a separate matter.

Most modern text uses UTF-8. UTF-8 stores some characters in one byte and others in several bytes.

const s = "é";

This looks like one character, but in UTF-8 it uses two bytes.

195 169

So this is not a good way to count human characters:

const s = "é";
const n = s.len; // 2

len counts bytes, not Unicode characters.

For ASCII text, each character is exactly one byte, although some ASCII characters, such as newline and tab, are not visible.

const s = "abc";

Here s.len is 3.

For UTF-8 text, byte length and character count may differ.

const s = "hello 世界";

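The text has 8 characters (counting the space), but its byte length is 12: "hello " takes six bytes, and each of the two Chinese characters takes three bytes in UTF-8.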
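
The difference can be measured. This sketch uses std.unicode.utf8CountCodepoints, which decodes the bytes as UTF-8 and returns an error if they are not valid UTF-8. Code points match what a reader sees as characters for text like this one, though not in every case.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "hello 世界";
    // len counts bytes.
    const bytes = s.len;
    // Counts UTF-8 code points; fails on invalid UTF-8.
    const codepoints = try std.unicode.utf8CountCodepoints(s);
    std.debug.print("bytes: {d}, code points: {d}\n", .{ bytes, codepoints });
}
```

This prints "bytes: 12, code points: 8".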

This is deliberate. Zig keeps the low-level representation clear. A string is bytes. If the program needs Unicode rules, it must use code that understands Unicode.
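
The standard library's std.unicode namespace is one source of such code. As a minimal sketch, std.unicode.Utf8View walks a string one code point at a time instead of one byte at a time; Utf8View.init rejects bytes that are not valid UTF-8.

```zig
const std = @import("std");

pub fn main() !void {
    // init validates the bytes as UTF-8.
    const view = try std.unicode.Utf8View.init("aé世");
    var it = view.iterator();
    // nextCodepointSlice returns the bytes of the next code point, or null at the end.
    while (it.nextCodepointSlice()) |cp_bytes| {
        std.debug.print("{s}: {d} byte(s)\n", .{ cp_bytes, cp_bytes.len });
    }
}
```

The loop runs three times, reporting 1 byte for a, 2 for é, and 3 for 世, even though the string is 6 bytes long.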

A byte can be printed as a character with {c}.

const std = @import("std");

pub fn main() void {
    const c: u8 = 'A';
    std.debug.print("{c}\n", .{c});
}

The output is:

A

The same byte can be printed as a number with {d}.

std.debug.print("{d}\n", .{c});

The output is:

65

This is often useful when inspecting data.
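
Hexadecimal is another common way to inspect a byte. The {x} specifier prints an integer in lowercase hex, and {x:0>2} pads it to two digits with zeros. This sketch formats into a buffer with std.fmt.bufPrint so the result can be checked; the helper name hexByte is hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: formats one byte as two lowercase hex digits.
fn hexByte(buf: *[2]u8, b: u8) []u8 {
    // A u8 never needs more than two hex digits, so the buffer cannot overflow.
    return std.fmt.bufPrint(buf, "{x:0>2}", .{b}) catch unreachable;
}

pub fn main() void {
    var buf: [2]u8 = undefined;
    for ("é\n") |b| {
        std.debug.print("{d} = 0x{s}\n", .{ b, hexByte(&buf, b) });
    }
}
```

This prints "195 = 0xc3", "169 = 0xa9", and "10 = 0x0a": the two UTF-8 bytes of é, then the newline.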

A simple loop over a string visits bytes.

const std = @import("std");

pub fn main() void {
    const s = "abc";

    for (s) |b| {
        std.debug.print("{c} {d}\n", .{ b, b });
    }
}

The output is:

a 97
b 98
c 99

Each value b has type u8.
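
When the position of each byte is needed too, a for loop can iterate over the string and the range 0.. at the same time (this form requires Zig 0.11 or later). In this sketch, findByte is a hypothetical helper; std.mem.indexOfScalar in the standard library performs the same search.

```zig
const std = @import("std");

// Hypothetical helper: index of the first `target` byte in `s`, or null.
fn findByte(s: []const u8, target: u8) ?usize {
    for (s, 0..) |b, i| {
        if (b == target) return i;
    }
    return null;
}

pub fn main() void {
    const s = "abc";
    // `0..` yields an index alongside each byte.
    for (s, 0..) |b, i| {
        std.debug.print("{d}: {c}\n", .{ i, b });
    }
    if (findByte(s, 'c')) |i| {
        std.debug.print("'c' is at index {d}\n", .{i});
    }
}
```

The loop prints "0: a", "1: b", "2: c", and the search reports index 2.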

Use u8 when you mean a byte. Use []const u8 when you mean a read-only byte string. Treat Unicode as an encoding problem, not as a hidden language feature.
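
A function that only reads bytes can take []const u8, so it accepts string literals and slices of mutable arrays alike. The helper isAscii below is hypothetical; it relies on the fact that ASCII bytes are below 128, while every byte of a multi-byte UTF-8 character is 128 or above.

```zig
const std = @import("std");

// Hypothetical helper: true when every byte is ASCII (value below 128).
fn isAscii(s: []const u8) bool {
    for (s) |b| {
        if (b >= 0x80) return false;
    }
    return true;
}

pub fn main() void {
    var word = [_]u8{ 'z', 'i', 'g' };
    word[0] = 'Z';
    // Literals and a slice of the mutable array all coerce to []const u8.
    const inputs = [_][]const u8{ "abc", "é", &word };
    for (inputs) |s| {
        const verdict: []const u8 = if (isAscii(s)) "ASCII" else "not ASCII";
        std.debug.print("{s}: {s}\n", .{ s, verdict });
    }
}
```

This prints "abc: ASCII", "é: not ASCII", and "Zig: ASCII".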

Exercises:

  1. Declare a u8 with value 'A' and print it with {c} and {d}.

  2. Write a string containing a newline and print it.

  3. Create a mutable array containing the bytes for cat, then change it to bat.

  4. Loop over "zig" and print each byte as both a character and a number.

  5. Check the .len of "é" and explain why it is not 1.