% Strings

Strings are an important concept for any programmer to master. Rust’s string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rust’s
strings also work differently than in some other systems languages, such as C.

Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be valid
UTF-8. Additionally, unlike in some systems languages, strings are not
null-terminated and can contain null bytes.
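
As a minimal sketch of these guarantees: the helper names `contains_nul` and
`is_valid_utf8` below are made up for illustration, while `str::contains` and
`std::str::from_utf8` come from the standard library.

```rust
/// Hypothetical helper: reports whether a string slice contains a null character.
fn contains_nul(s: &str) -> bool {
    s.contains('\0')
}

/// Hypothetical helper: checks whether raw bytes form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // A null byte in the middle does not end the string; `len` counts every byte.
    let s = "foo\0bar";
    println!("len = {}, contains null: {}", s.len(), contains_nul(s));

    // The byte 0xff never appears in valid UTF-8, so these bytes are rejected.
    println!("valid: {}", is_valid_utf8(&[0x66, 0x6f, 0xff]));
}
```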

Rust has two main types of strings: `&str` and `String`. Let’s talk about
`&str` first. These are called ‘string slices’. String literals are of the type
`&'static str`:

```rust
let string = "Hello there."; // string: &'static str
```

This string is statically allocated, meaning that it’s saved inside our
compiled program, and exists for the entire duration it runs. The `string`
binding is a reference to this statically allocated string. String slices
have a fixed size, and cannot be mutated.

A `String`, on the other hand, is a heap-allocated string. This string is
growable, and is also guaranteed to be UTF-8. `String`s are commonly created by
converting from a string slice using the `to_string` method.

```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```

`String`s will coerce into `&str` with an `&`:

```rust
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(&s);
}
```

Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!
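
One way to see the difference is to compare buffer addresses. This is only a
sketch, and `shares_buffer` is a hypothetical helper rather than anything from
the standard library:

```rust
/// Hypothetical helper: true if both slices start at the same address.
fn shares_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let owned = "Hello".to_string();

    // Borrowing as a slice points into the same buffer: nothing is copied.
    let view: &str = &owned;
    println!("view shares buffer: {}", shares_buffer(view, &owned));

    // Converting back to a `String` copies the bytes into a new heap buffer.
    let copy = view.to_string();
    println!("copy shares buffer: {}", shares_buffer(&copy, &owned));
}
```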

## Indexing

Because strings are valid UTF-8, they do not support indexing by position:

```rust,ignore
let s = "hello";

println!("The first letter of s is {}", s[0]); // ERROR!!!
```

Usually, access to a vector with `[]` is very fast. But because each character
in a UTF-8 encoded string can take up multiple bytes, you have to walk over the
string to find its nᵗʰ letter. This is a significantly more expensive
operation, and we don’t want to be misleading. Furthermore, ‘letter’ isn’t
something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as `char`s (Unicode scalar values):

```rust
let hachiko = "忠犬ハチ公";

// Walk over the raw UTF-8 bytes.
for b in hachiko.as_bytes() {
    print!("{}, ", b);
}

println!();

// Walk over the decoded `char`s.
for c in hachiko.chars() {
    print!("{}, ", c);
}

println!();
```

This prints:

```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172, 
忠, 犬, ハ, チ, 公, 
```

As you can see, there are more bytes than `char`s.
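
The two counts can also be compared directly. In this sketch,
`byte_and_char_counts` is a hypothetical helper; `len` reports bytes, while
`chars().count()` has to decode the whole string:

```rust
/// Hypothetical helper: returns (number of bytes, number of `char`s).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let (bytes, chars) = byte_and_char_counts("忠犬ハチ公");
    println!("{} bytes, {} chars", bytes, chars); // prints "15 bytes, 5 chars"
}
```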

You can get something similar to an index like this:

```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // dog: Option<char>, kinda like hachiko[1]
```

This emphasizes that we have to walk through the list of `chars` from the
beginning to reach the one we want.
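
Since `nth` returns an `Option`, asking for a position past the end yields
`None` instead of an error. The sketch below wraps this in two hypothetical
helpers, `nth_char` and `byte_offset_of`; the second uses `char_indices` to
report where a `char` starts in the underlying bytes:

```rust
/// Hypothetical helper: returns the nth `char` of a string, if there is one.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Hypothetical helper: byte offset at which the nth `char` starts.
fn byte_offset_of(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hachiko = "忠犬ハチ公";

    println!("{:?}", nth_char(hachiko, 1));       // Some('犬')
    println!("{:?}", nth_char(hachiko, 10));      // None: only five chars
    println!("{:?}", byte_offset_of(hachiko, 1)); // Some(3): each char here is 3 bytes
}
```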

## Concatenation

If you have a `String`, you can concatenate a `&str` to the end of it:

```rust
let hello = "Hello ".to_string();
let world = "world!";

let hello_world = hello + world;
```

But if you have two `String`s, you need an `&`:

```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();

let hello_world = hello + &world;
```

This is because `&String` can automatically coerce to a `&str`. This is a
feature called ‘[`Deref` coercions][dc]’.

[dc]: deref-coercions.html
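
The same coercion applies when calling a function that takes `&str`. As a
rough sketch, `join` below is a hypothetical helper built on `format!`, which
borrows both inputs. By contrast, `+` takes its left-hand `String` by value,
so that binding can’t be used afterwards:

```rust
/// Hypothetical helper: joins two strings without consuming either.
fn join(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

fn main() {
    let hello = "Hello ".to_string();
    let world = "world!".to_string();

    // `&hello` and `&world` are `&String`s; they coerce to `&str` at the call.
    let joined = join(&hello, &world);
    println!("{}", joined);

    // `+` moves `hello` into the result; `world` is only borrowed
    // and is still usable afterwards.
    let hello_world = hello + &world;
    println!("{} / {}", hello_world, world);
}
```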