Strings and UTF-8

A string is text.

const name = "Zig";

The text "Zig" is a string literal. A string literal is text written directly in the source code.

const message = "Hello";
const language = "Zig";
const path = "/usr/bin";

Strings look simple, but in Zig they are more explicit than in many beginner languages. Zig does not hide the fact that text is stored as bytes.

Strings are bytes

A Zig string literal is a sequence of bytes.

For ordinary English text, this is easy to see:

const text = "abc";

The letters have byte values:

Character   Byte value
a           97
b           98
c           99

So "abc" is stored as three visible bytes:

97 98 99

A string literal also ends with a sentinel zero byte, which is not part of the visible text and helps with C interoperability. For now, the important point is: the visible text is stored as bytes.
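The byte values and the sentinel can be inspected directly. The following is a minimal sketch, not a canonical program: it walks the bytes of "abc", prints the literal's type name with the builtin @typeName, and reads the byte stored just past the visible text.

```zig
const std = @import("std");

pub fn main() void {
    const text = "abc";

    // Walk the visible bytes one at a time.
    for (text) |byte| {
        std.debug.print("{c} = {d}\n", .{ byte, byte });
    }

    // The literal's type records both the length and the zero sentinel.
    std.debug.print("type: {s}\n", .{@typeName(@TypeOf(text))});

    // The byte just past the visible text is the sentinel.
    std.debug.print("visible bytes = {d}, sentinel = {d}\n", .{ text.len, text[text.len] });
}
```

The type of the literal is *const [3:0]u8: a pointer to three constant bytes followed by a zero sentinel.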

Printing strings

To print a string with std.debug.print, use {s}.

const std = @import("std");

pub fn main() void {
    const language = "Zig";

    std.debug.print("Language: {s}\n", .{language});
}

Output:

Language: Zig

The formatter {s} means “print this as a string.”

This is different from {}, which is used for many ordinary values:

std.debug.print("Number: {}\n", .{123});

For strings, use {s}.

String length

A string has a length.

const text = "hello";

You can ask for the length:

const len = text.len;

Complete example:

const std = @import("std");

pub fn main() void {
    const text = "hello";

    std.debug.print("text = {s}\n", .{text});
    std.debug.print("length = {}\n", .{text.len});
}

Output:

text = hello
length = 5

Here, text.len is 5 because the visible text has five bytes.

Length means bytes, not always characters

For plain ASCII text, one character is always one byte.

const text = "hello";

This has 5 characters and 5 bytes.

But many languages use characters that need more than one byte in UTF-8.

const text = "é";

The visible text is one character, but UTF-8 stores the precomposed é (U+00E9) as two bytes: 0xC3 0xA9.

So in Zig:

const text = "é";

text.len is the number of bytes, not the number of human-visible characters.

This is very important. Zig treats strings as bytes. Unicode text is built on top of those bytes.
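The gap between bytes and characters can be measured. In this sketch, codepointCount is a hypothetical wrapper around std.unicode.utf8CountCodepoints, and é is written as the escape \u{e9} so the bytes do not depend on how the source file was saved. Note that a code point is still not always a full human-visible character, since some characters are built from several code points.

```zig
const std = @import("std");

// Hypothetical helper: counts Unicode code points in a UTF-8 byte slice.
// Returns an error if the bytes are not valid UTF-8.
fn codepointCount(text: []const u8) !usize {
    return std.unicode.utf8CountCodepoints(text);
}

pub fn main() !void {
    const text = "\u{e9}"; // é, written as an escape sequence

    std.debug.print("bytes = {d}\n", .{text.len});
    std.debug.print("code points = {d}\n", .{try codepointCount(text)});
}
```

Here text.len is 2 while the code point count is 1.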

What is UTF-8

UTF-8 is a way to store Unicode text as bytes.

Unicode is a large system for representing text from many languages and symbol sets. UTF-8 is the common byte encoding used on the web, in source code, in JSON, and in many operating systems.

ASCII text uses one byte per character:

A B C

Many non-ASCII characters use multiple bytes:

é
🙂

Zig string literals are UTF-8 encoded. That means this is allowed:

const greeting = "こんにちは";
const icon = "✓";

But Zig does not pretend that every visible character is one byte. It keeps the byte representation visible.
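To see both levels at once, the sketch below prints the raw bytes of the check mark and then decodes them with std.unicode.Utf8View from the standard library. The escape \u{2713} stands in for the literal "✓"; the variable names are only illustrative.

```zig
const std = @import("std");

pub fn main() !void {
    const icon = "\u{2713}"; // the check mark, written as an escape

    // Raw storage: the UTF-8 bytes of the literal, printed in hex.
    for (icon) |byte| {
        std.debug.print("{x} ", .{byte});
    }
    std.debug.print("\n", .{});

    // Decoded view: walk the same bytes one code point at a time.
    const view = try std.unicode.Utf8View.init(icon);
    var it = view.iterator();
    while (it.nextCodepoint()) |cp| {
        std.debug.print("U+{X}\n", .{cp});
    }
}
```

The check mark is a single code point, U+2713, stored as the three bytes e2 9c 93.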

Indexing a string

You can index a string to get a byte.

const text = "abc";
const first = text[0];

first is the byte for a.

Complete example:

const std = @import("std");

pub fn main() void {
    const text = "abc";

    std.debug.print("{}\n", .{text[0]});
    std.debug.print("{c}\n", .{text[0]});
}

Output:

97
a

The first print uses {} and shows the byte value.

The second print uses {c} and shows it as a character.

String indexes start at zero

Indexes start at zero.

For this string:

const text = "abc";

The indexes are:

Index   Byte   Character
0       97     a
1       98     b
2       99     c

So:

text[0] // 'a'
text[1] // 'b'
text[2] // 'c'

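Index 3 is past the last visible byte:

text[3] // 0: the literal's sentinel byte, not part of the text

There is no visible byte at index 3. Because text is a string literal, index 3 reads the terminating zero byte, and text[4] is a compile error. On an ordinary slice such as []const u8, any index at or past .len is out of bounds: a compile error when the compiler can see it, otherwise a safety check failure at runtime in safe build modes.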
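When an index comes from runtime data, checking it against .len first avoids the out-of-range case entirely. A minimal sketch, where byteAt is a hypothetical helper and not part of the standard library:

```zig
const std = @import("std");

// Hypothetical helper: returns the byte at `index`, or null when the
// index is outside the slice.
fn byteAt(text: []const u8, index: usize) ?u8 {
    if (index >= text.len) return null;
    return text[index];
}

pub fn main() void {
    const text: []const u8 = "abc";

    var i: usize = 0;
    while (i < 5) : (i += 1) {
        if (byteAt(text, i)) |byte| {
            std.debug.print("text[{d}] = {c}\n", .{ i, byte });
        } else {
            std.debug.print("text[{d}] is out of range\n", .{i});
        }
    }
}
```

Returning an optional makes the caller handle the missing byte explicitly instead of relying on a safety check.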

Slicing a string

A slice refers to part of a string.

const text = "hello";
const part = text[0..2];

The slice text[0..2] contains bytes from index 0 up to but not including index 2.

So it contains:

he

Complete example:

const std = @import("std");

pub fn main() void {
    const text = "hello";

    const first_two = text[0..2];
    const rest = text[2..];

    std.debug.print("{s}\n", .{first_two});
    std.debug.print("{s}\n", .{rest});
}

Output:

he
llo

The range rule is:

start included
end excluded

So 0..2 means indexes 0 and 1.
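Slice bounds do not have to be constants. As an illustration of the start-included, end-excluded rule, the sketch below cuts a string at the first space; firstWord is a hypothetical helper built on std.mem.indexOfScalar.

```zig
const std = @import("std");

// Hypothetical helper: returns the bytes before the first space,
// or the whole input when there is no space.
fn firstWord(text: []const u8) []const u8 {
    const end = std.mem.indexOfScalar(u8, text, ' ') orelse text.len;
    return text[0..end]; // `end` itself is excluded
}

pub fn main() void {
    const text = "hello world";

    const word = firstWord(text);
    const rest = text[word.len..]; // starts at the space

    std.debug.print("[{s}]\n", .{word});
    std.debug.print("[{s}]\n", .{rest});
}
```

Because the end index is excluded, the space is not part of word and becomes the first byte of rest.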

Be careful slicing UTF-8

Because strings are bytes, slicing can cut through the middle of a UTF-8 character.

That creates invalid UTF-8.

For ASCII text, this is fine:

const text = "hello";
const part = text[0..2]; // "he"

For non-ASCII text, you must be more careful:

const text = "é";

Here é occupies two bytes, so the slice text[0..1] holds only the first of them, which is not valid UTF-8 on its own.

Zig does not automatically protect you from every Unicode mistake. It gives you byte-level control. When you need proper Unicode processing, use UTF-8 aware logic.
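One way to stay on character boundaries is to ask the first byte how long its sequence is. The sketch below is an assumption-laden illustration, not a full Unicode library: firstCodepoint is a hypothetical helper that uses std.unicode.utf8ByteSequenceLength and std.unicode.utf8ValidateSlice, and it works on code points, not on larger user-perceived characters.

```zig
const std = @import("std");

// Hypothetical helper: returns the bytes of the first code point in `text`,
// or null when `text` is empty or does not start with a valid UTF-8 sequence.
fn firstCodepoint(text: []const u8) ?[]const u8 {
    if (text.len == 0) return null;
    const n = std.unicode.utf8ByteSequenceLength(text[0]) catch return null;
    if (n > text.len) return null;
    if (!std.unicode.utf8ValidateSlice(text[0..n])) return null;
    return text[0..n];
}

pub fn main() void {
    const text = "\u{e9}t\u{e9}"; // "été", escaped so the bytes are unambiguous

    // Cutting after one byte splits the first character in half.
    const broken = text[0..1];
    std.debug.print("byte cut is valid: {}\n", .{std.unicode.utf8ValidateSlice(broken)});

    // Cutting on a code point boundary keeps the text valid.
    if (firstCodepoint(text)) |first| {
        std.debug.print("first = {s} ({d} bytes)\n", .{ first, first.len });
    }
}
```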

Strings are usually immutable

A string literal is read-only.

const text = "hello";

The literal's type is a pointer to constant bytes, so an assignment such as text[0] = 'H' is a compile error.

If you need mutable text, use an array or a buffer.

var buffer = [_]u8{ 'h', 'e', 'l', 'l', 'o' };
buffer[0] = 'H';

Complete example:

const std = @import("std");

pub fn main() void {
    var buffer = [_]u8{ 'h', 'e', 'l', 'l', 'o' };

    buffer[0] = 'H';

    std.debug.print("{s}\n", .{buffer[0..]});
}

Output:

Hello

Here, buffer is a mutable array of bytes. The slice buffer[0..] can be printed as a string because it contains text bytes.
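A mutable array can also be created by copying a literal instead of listing each byte. In this sketch, dereferencing the literal with .* produces the copy, and upperInPlace is a hypothetical helper that edits the bytes through a mutable slice. It only handles ASCII letters.

```zig
const std = @import("std");

// Hypothetical helper: upper-cases ASCII letters in place.
fn upperInPlace(bytes: []u8) void {
    for (bytes) |*b| {
        b.* = std.ascii.toUpper(b.*);
    }
}

pub fn main() void {
    // Dereferencing the literal copies its bytes into a mutable array.
    var buffer = "hello".*;

    upperInPlace(&buffer);

    std.debug.print("{s}\n", .{&buffer});
}
```

The original literal is untouched; only the copy held in buffer changes.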

Escape sequences

A string can contain special escape sequences.

Escape   Meaning
\n       newline
\t       tab
\"       double quote
\\       backslash

Example:

const std = @import("std");

pub fn main() void {
    std.debug.print("one\ntwo\n", .{});
}

Output:

one
two

The \n creates a new line.

To include quotes inside a string:

const text = "She said \"hello\"";

To include a backslash:

const path = "C:\\Users\\Ada";
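Zig string literals also accept \xNN for a single byte given in hexadecimal and \u{NNNN} for a Unicode code point, which the compiler stores as UTF-8. A short sketch of both, with illustrative variable names:

```zig
const std = @import("std");

pub fn main() void {
    const letter = "\x41"; // one byte with the value 0x41, which is ASCII 'A'
    const accent = "\u{e9}"; // code point U+00E9 (é), stored as UTF-8 bytes

    std.debug.print("{s}: {d} byte(s)\n", .{ letter, letter.len });
    std.debug.print("{s}: {d} byte(s)\n", .{ accent, accent.len });
}
```

The \x form always adds exactly one byte, while the \u form adds as many bytes as the code point needs in UTF-8.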

Multiline strings

Zig supports multiline string literals using lines that begin with \\.

const text =
    \\first line
    \\second line
    \\third line
;

Complete example:

const std = @import("std");

pub fn main() void {
    const text =
        \\first line
        \\second line
        \\third line
    ;

    std.debug.print("{s}\n", .{text});
}

Output:

first line
second line
third line

This is useful for embedded text, templates, generated code, SQL, JSON examples, and help messages.
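One detail worth knowing is that multiline literals do not process escape sequences, so backslashes inside them are ordinary bytes. The sketch below compares a multiline literal with the equivalent ordinary literal; the sample text is made up for illustration.

```zig
const std = @import("std");

pub fn main() void {
    // Inside a multiline literal, backslashes are ordinary bytes.
    const raw =
        \\C:\Users\Ada
        \\a literal \n, not a newline
    ;

    // The same two lines as an ordinary literal need escapes.
    const escaped = "C:\\Users\\Ada\na literal \\n, not a newline";

    std.debug.print("{s}\n", .{raw});
    std.debug.print("equal: {}\n", .{std.mem.eql(u8, raw, escaped)});
}
```

Each source line contributes a newline except the last one, so the literal does not end with a trailing newline.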

Comparing strings

Do not compare strings with == when you mean “same text.” Zig rejects == on slices, and on pointers it compares addresses, not contents.

Use std.mem.eql.

const std = @import("std");

pub fn main() void {
    const a = "zig";
    const b = "zig";

    if (std.mem.eql(u8, a, b)) {
        std.debug.print("same\n", .{});
    }
}

Output:

same

The call:

std.mem.eql(u8, a, b)

means: compare these two sequences of u8 bytes.

Zig is explicit because strings are byte slices.
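The same byte-slice approach covers other common checks. The sketch below uses std.mem.startsWith, std.mem.endsWith, std.mem.indexOf, and std.ascii.eqlIgnoreCase from the standard library; isZigFile and the sample path are hypothetical names chosen for illustration.

```zig
const std = @import("std");

// Hypothetical helper: true when `path` ends with the ".zig" extension.
fn isZigFile(path: []const u8) bool {
    return std.mem.endsWith(u8, path, ".zig");
}

pub fn main() void {
    const path = "src/main.zig";

    std.debug.print("zig file: {}\n", .{isZigFile(path)});
    std.debug.print("in src: {}\n", .{std.mem.startsWith(u8, path, "src/")});

    // indexOf returns the byte offset of the first match, or null.
    if (std.mem.indexOf(u8, path, "main")) |offset| {
        std.debug.print("\"main\" starts at byte {d}\n", .{offset});
    }

    // ASCII-only comparison that ignores letter case.
    std.debug.print("same ignoring case: {}\n", .{std.ascii.eqlIgnoreCase("ZIG", "zig")});
}
```

All of these work on bytes, so the offsets they return are byte offsets, and the case-insensitive comparison only understands ASCII letters.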

Building strings

String literals are fixed. If you need to build a string at runtime, you usually use an allocator or a buffer.

A simple buffer example:

const std = @import("std");

pub fn main() void {
    var buffer: [64]u8 = undefined;

    const name = "Zig";
    const message = std.fmt.bufPrint(&buffer, "Hello, {s}!", .{name}) catch return;

    std.debug.print("{s}\n", .{message});
}

Output:

Hello, Zig!

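Here, buffer provides storage. std.fmt.bufPrint writes formatted text into that storage and returns a slice of the buffer covering only the bytes it wrote. If the text does not fit, bufPrint returns error.NoSpaceLeft, which this example handles by returning early.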

This pattern is common in Zig: caller provides memory, function writes into it.
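When the final size is not known in advance, the allocator route looks similar. This is a minimal sketch assuming std.heap.page_allocator is acceptable for a small program; greet is a hypothetical helper around std.fmt.allocPrint, and the caller is responsible for freeing the returned slice.

```zig
const std = @import("std");

// Hypothetical helper: builds a greeting in memory obtained from `allocator`.
// The caller owns the returned slice and must free it.
fn greet(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "Hello, {s}!", .{name});
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const message = try greet(allocator, "Zig");
    defer allocator.free(message); // release the memory when main exits

    std.debug.print("{s}\n", .{message});
}
```

The allocation is visible in the code: the allocator is passed in explicitly, and the matching free sits next to the call that created the string.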

A complete example

const std = @import("std");

pub fn main() void {
    const language = "Zig";
    const description = "systems programming";

    std.debug.print("Language: {s}\n", .{language});
    std.debug.print("Length in bytes: {}\n", .{language.len});

    const first = language[0];
    std.debug.print("First byte: {}\n", .{first});
    std.debug.print("First character: {c}\n", .{first});

    const short = description[0..7];
    std.debug.print("Short description: {s}\n", .{short});

    if (std.mem.eql(u8, language, "Zig")) {
        std.debug.print("The language is Zig.\n", .{});
    }
}

Output:

Language: Zig
Length in bytes: 3
First byte: 90
First character: Z
Short description: systems
The language is Zig.

This example shows the central facts: strings are byte sequences, .len counts bytes, indexing gets bytes, slicing gets byte ranges, and string comparison uses a memory comparison function.

The main idea

In Zig, strings are not magical objects. They are byte sequences, usually encoded as UTF-8.

That design gives you control. You can inspect bytes, slice text, pass strings to C, store raw data, and avoid hidden allocation. The tradeoff is that you must remember the difference between bytes and human characters.

For beginner Zig code, use string literals for fixed text, {s} for printing, .len for byte length, slicing for substrings, and std.mem.eql(u8, a, b) for string comparison.