Zig does not have a separate character type for ordinary strings.
A string is a sequence of bytes.
```zig
const message = "hello";
```

The bytes are:

```
104 101 108 108 111
```

These are the ASCII byte values for h, e, l, l, and o.
A byte has type u8.
```zig
const c: u8 = 'A';
```

The value of c is 65.
A character literal has type comptime_int. One that holds a single ASCII character fits in a u8.
```zig
const a: u8 = 'a';
const newline: u8 = '\n';
const tab: u8 = '\t';
```

Escape sequences are used for bytes that are hard to write directly.
| Escape | Meaning |
|---|---|
| `\n` | newline |
| `\t` | tab |
| `\r` | carriage return |
| `\\` | backslash |
| `\"` | double quote |
| `\'` | single quote |
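As a quick check of the table, the sketch below stores the six escapes as bytes and prints their numeric values. The array name `escapes` is chosen here for illustration only.

```zig
const std = @import("std");

// The six escapes from the table, stored as plain bytes.
// `escapes` is an illustrative name, not part of any library.
const escapes = [_]u8{ '\n', '\t', '\r', '\\', '\"', '\'' };

pub fn main() void {
    for (escapes) |b| {
        // Print only the number; printing the byte itself would move
        // the cursor for newline, tab, and carriage return.
        std.debug.print("{d}\n", .{b});
    }
}
```

The values printed are the ASCII codes 10, 9, 13, 92, 34, and 39, one per line.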
A string literal may contain escape sequences.
```zig
const text = "one\ntwo\n";
```

This contains two newline bytes.
Printing it gives:
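```
one
two
```

A string literal is a constant pointer to a sentinel-terminated array of bytes. The sentinel is a 0 byte stored after the last element. In ordinary code, a literal is most often used as a slice of constant bytes.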
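The types involved can be inspected directly. This is a small sketch using the builtins `@TypeOf` and `@typeName`; the variable names are illustrative.

```zig
const std = @import("std");

pub fn main() void {
    const lit = "zig";

    // The literal's own type: a pointer to a sentinel-terminated array.
    std.debug.print("{s}\n", .{@typeName(@TypeOf(lit))});

    // len does not count the terminating 0, but the 0 can be read at index len.
    std.debug.print("{d} {d}\n", .{ lit.len, lit[lit.len] });

    // The literal coerces to a slice of constant bytes.
    const s: []const u8 = lit;
    std.debug.print("{s} {d}\n", .{ s, s.len });
}
```

The first line printed should be the type `*const [3:0]u8`: three bytes, followed by a 0 sentinel.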
```zig
const name: []const u8 = "zig";
```

Read this as: name is a slice of constant u8 values.
The elements cannot be changed through name.
```zig
name[0] = 'Z'; // error
```

A mutable byte array can be changed.

```zig
var name = [_]u8{ 'z', 'i', 'g' };
name[0] = 'Z';
```

Now the array contains:

```
Zig
```

A string is not the same thing as text in the full human sense. Zig strings are bytes. Text encoding is a separate matter.
Most modern text uses UTF-8. UTF-8 stores some characters in one byte and others in several bytes.
```zig
const s = "é";
```

This looks like one character, but in UTF-8 it uses two bytes.

```
195 169
```

So this is not a good way to count human characters:

```zig
const s = "é";
const n = s.len; // 2
```

len counts bytes, not Unicode characters.
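To see where the difference comes from, here is a minimal hand-written counter. It relies on one property of UTF-8: every byte after the first in a multi-byte character has the bit pattern 10xxxxxx. The function name `countCodePoints` is hypothetical, and the sketch assumes its input is already valid UTF-8.

```zig
const std = @import("std");

/// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
/// Hypothetical helper: assumes `s` is valid UTF-8 and does no validation.
fn countCodePoints(s: []const u8) usize {
    var n: usize = 0;
    for (s) |b| {
        // A byte that is not a continuation byte starts a new code point.
        if ((b & 0xC0) != 0x80) n += 1;
    }
    return n;
}

pub fn main() void {
    const s = "\u{e9}"; // é written as an escape; stored as bytes 195 169
    std.debug.print("{d} bytes, {d} code point(s)\n", .{ s.len, countCodePoints(s) });
}
```

This counts code points, which is still not the same as what a reader sees as one character when combining marks are involved.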
For ASCII text, one byte usually corresponds to one visible character.
```zig
const s = "abc";
```

Here s.len is 3.
For UTF-8 text, byte length and character count may differ.
```zig
const s = "hello 世界";
```

The visible text has 8 characters, but s.len is 12. Each of the two Chinese characters takes three bytes in UTF-8.
This is deliberate. Zig keeps the low-level representation clear. A string is bytes. If the program needs Unicode rules, it must use code that understands Unicode.
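One example of code that understands Unicode is the standard library's `std.unicode` namespace. The sketch below uses `std.unicode.utf8CountCodepoints`, which validates the input and returns an error for invalid UTF-8, so `main` must be able to return an error.

```zig
const std = @import("std");

pub fn main() !void {
    const s = "hello 世界";

    // Fails with an error if s is not valid UTF-8.
    const code_points = try std.unicode.utf8CountCodepoints(s);

    std.debug.print("bytes: {d}, code points: {d}\n", .{ s.len, code_points });
}
```

For this string the byte count is 12 and the code point count is 8.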
A byte can be printed as a character with {c}.
```zig
const std = @import("std");

pub fn main() void {
    const c: u8 = 'A';
    std.debug.print("{c}\n", .{c});
}
```

The output is:

```
A
```

The same byte can be printed as a number with {d}.

```zig
std.debug.print("{d}\n", .{c});
```

The output is:

```
65
```

This is often useful when inspecting data.
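When inspecting data, hexadecimal is often easier to compare against encoding tables than decimal. The following sketch prints each byte both ways. The sample bytes in `data` are chosen here for illustration.

```zig
const std = @import("std");

// Illustrative sample: 'A', a newline, and the two UTF-8 bytes of é.
const data = "A\n\u{e9}";

pub fn main() void {
    for (data) |b| {
        // {x:0>2} prints the byte in hexadecimal, zero-padded to two digits.
        std.debug.print("{d} 0x{x:0>2}\n", .{ b, b });
    }
}
```

Printing numbers instead of {c} also avoids sending control bytes or partial UTF-8 sequences to the terminal.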
A simple loop over a string visits bytes.
```zig
const std = @import("std");

pub fn main() void {
    const s = "abc";
    for (s) |b| {
        std.debug.print("{c} {d}\n", .{ b, b });
    }
}
```

The output is:

```
a 97
b 98
c 99
```

Each value b has type u8.
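Because each element is a plain u8, it can be compared with a character literal directly. A minimal sketch, with a hypothetical helper named `countByte`:

```zig
const std = @import("std");

/// Counts how many bytes in `s` equal `target`.
/// Hypothetical helper written for this example.
fn countByte(s: []const u8, target: u8) usize {
    var n: usize = 0;
    for (s) |b| {
        if (b == target) n += 1;
    }
    return n;
}

pub fn main() void {
    std.debug.print("{d}\n", .{countByte("hello", 'l')});
}
```

The program prints 2, since "hello" contains the byte 108 twice.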
Use u8 when you mean a byte. Use []const u8 when you mean a read-only byte string. Treat Unicode as an encoding problem, not as a hidden language feature.
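The choice of type also says what a function may do with its argument. In this sketch, the hypothetical function `upperAscii` takes []u8 because it writes to the bytes; a function that only reads would take []const u8. It uses `std.ascii.toUpper`, which changes ASCII letters and leaves every other byte alone.

```zig
const std = @import("std");

/// Uppercases ASCII letters in place; all other bytes are left unchanged.
/// Hypothetical helper written for this example.
fn upperAscii(buf: []u8) void {
    for (buf) |*b| {
        b.* = std.ascii.toUpper(b.*);
    }
}

pub fn main() void {
    var word = [_]u8{ 'z', 'i', 'g' };
    upperAscii(&word); // a pointer to the array coerces to []u8

    // View the same bytes as a read-only byte string for printing.
    const view: []const u8 = &word;
    std.debug.print("{s}\n", .{view});
}
```

Passing a string literal to `upperAscii` would be rejected by the compiler, because a literal's bytes are constant.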
Exercises:

1. Declare a `u8` with value `'A'` and print it with `{c}` and `{d}`.
2. Write a string containing a newline and print it.
3. Create a mutable array containing the bytes for `cat`, then change it to `bat`.
4. Loop over `"zig"` and print each byte as both a character and a number.
5. Check the `.len` of `"é"` and explain why it is not `1`.