author Steve Klabnik <steve@steveklabnik.com> 2014-08-28 13:56:55 -0400
committer Steve Klabnik <steve@steveklabnik.com> 2014-08-28 13:56:55 -0400
commit bda3ceda039f0d37c726ca62e0bac457ce39d071 (patch)
tree 0b16541d1bc49d81a0c58288f0f51f78261f2ebb
parent b5165321e48c1fd8422803fb40693afab7939c8c (diff)
Add note about string indexing.
Thanks @chris-morgan!
-rw-r--r-- src/doc/guide-strings.md 56
1 files changed, 56 insertions, 0 deletions
diff --git a/src/doc/guide-strings.md b/src/doc/guide-strings.md
index 6c6d6e36899..bf762d13b78 100644
--- a/src/doc/guide-strings.md
+++ b/src/doc/guide-strings.md
@@ -121,6 +121,62 @@ fn compare(string: String) {
 Converting a `String` to a `&str` is cheap, but converting the `&str` to a
 `String` involves an allocation.
 
+## Indexing strings
+
+You may be tempted to try to access a certain character of a `String`, like
+this:
+
+```{rust,ignore}
+let s = "hello".to_string();
+
+println!("{}", s[0]);
+```
+
+This does not compile, and that is on purpose. In the world of UTF-8, direct
+indexing is basically never what you want to do, because each character can
+take a variable number of bytes. Finding the nth character therefore means
+iterating through the characters before it anyway, which is an O(n) operation.
+
+To iterate over a string, use the `graphemes()` method on `&str`:
+
+```{rust}
+let s = "αἰθήρ";
+
+for l in s.graphemes(true) {
+    println!("{}", l);
+}
+```
+
+This will print out each grapheme in turn, as you'd expect: first "α", then
+"ἰ", etc. You can see that this is different from the individual bytes.
+Here's a version that prints out each byte:
+
+```{rust}
+let s = "αἰθήρ";
+
+for l in s.as_bytes().iter() {
+    println!("{}", l);
+}
+```
+
+This will print:
+
+```{notrust,ignore}
+206
+177
+225
+188
+176
+206
+184
+206
+174
+207
+129
+```
+
+That is eleven bytes for only five graphemes!
+
 # Other Documentation
 
 * [the `&str` API documentation](/std/str/index.html)
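
Note on the patch above: the "Indexing strings" section contrasts bytes with graphemes using the 2014 `graphemes()` method. The following is a minimal sketch of the same contrast that should build on a present-day toolchain using only the standard library. It substitutes `chars()` (Unicode scalar values) for graphemes, which is an assumption that happens to hold for this precomposed string but not in general; grapheme segmentation is, as far as I know, now provided by the external `unicode-segmentation` crate. The helper names `byte_and_char_counts` and `nth_char` are hypothetical and do not come from the patch.

```rust
/// Returns (byte count, `char` count) for a string slice.
/// Hypothetical helper, not part of the patch.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len()` is the UTF-8 byte length; `chars().count()` walks the string.
    (s.len(), s.chars().count())
}

/// Returns the `n`th `char` by walking from the start, an O(n) operation.
/// Hypothetical helper, not part of the patch.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // "αἰθήρ" written with escapes so the code points are unambiguous.
    let s = "\u{3b1}\u{1f30}\u{3b8}\u{3ae}\u{3c1}";

    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} chars", bytes, chars);

    // There is no `s[1]`; the second char has to be found by iteration.
    println!("{:?}", nth_char(s, 1));

    // The raw UTF-8 bytes, matching the list shown in the patch.
    println!("{:?}", s.as_bytes());
}
```

The first line printed is `11 bytes, 5 chars`, which agrees with the byte listing in the patch. Returning `Option<char>` from `nth_char` makes the out-of-range case explicit instead of panicking.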