author Steve Klabnik <steve@steveklabnik.com> 2014-07-17 19:24:12 -0400
committer Steve Klabnik <steve@steveklabnik.com> 2014-07-17 20:50:14 -0400
commit 226b7d1b725300fe6fbcfc1a66dfc1298f0af893
tree 461d249bd998bd95d11c2dffbbfa6af35b85237c
parent e288fc6a996562c5e4aca46e22c1da46eb3d086b
Guide: strings
-rw-r--r-- src/doc/guide-strings.md 129
-rw-r--r-- src/libcollections/str.rs 8
2 files changed, 133 insertions, 4 deletions
diff --git a/src/doc/guide-strings.md b/src/doc/guide-strings.md
new file mode 100644
index 00000000000..6f301afc849
--- /dev/null
+++ b/src/doc/guide-strings.md
@@ -0,0 +1,129 @@
+% The Strings Guide
+
+# Strings
+
+Strings are an important concept to master in any programming language. If you
+come from a managed language background, you may be surprised at the complexity
+of string handling in a systems programming language. Efficient access and
+allocation of memory for a dynamically sized structure involves a lot of
+details. Luckily, Rust has lots of tools to help us here.
+
+A **string** is a sequence of Unicode scalar values encoded as a stream of
+UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8 sequences.
+Additionally, strings are not null-terminated and can contain null bytes.
+
+Rust has two main types of strings: `&str` and `String`.
+
+## &str
+
+The first kind is a `&str`. This is called a 'string slice'. String literals
+are of the type `&str`:
+
+```{rust}
+let string = "Hello there.";
+```
+
+Like any Rust type, string slices have an associated lifetime. A string literal
+is a `&'static str`.  A string slice can be written without an explicit
+lifetime in many cases, such as in function arguments. In these cases the
+lifetime will be inferred:
+
+```{rust}
+fn takes_slice(slice: &str) {
+    println!("Got: {}", slice);
+}
+```
+
+Like vector slices, string slices are simply a pointer plus a length. This
+means that they're a 'view' into an already-allocated string, such as a
+`&'static str` or a `String`.
+
+## String
+
+A `String` is a heap-allocated string. This string is growable, and is also
+guaranteed to be UTF-8.
+
+```{rust}
+let mut s = "Hello".to_string();
+println!("{}", s);
+
+s.push_str(", world.");
+println!("{}", s);
+```
+
+You can get a `&str` from a `String` with the `as_slice()` method:
+
+```{rust}
+fn takes_slice(slice: &str) {
+    println!("Got: {}", slice);
+}
+
+fn main() {
+    let s = "Hello".to_string();
+    takes_slice(s.as_slice());
+}
+```
+
+You can also get a `&str` from a stack-allocated array of bytes:
+
+```{rust}
+use std::str;
+
+let x: &[u8] = &[b'a', b'b'];
+let stack_str: &str = str::from_utf8(x).unwrap();
+```
+
+## Best Practices
+
+### `String` vs. `&str`
+
+In general, you should prefer `String` when you need ownership, and `&str` when
+you just need to borrow a string. This is very similar to using `Vec<T>` vs. `&[T]`,
+and `T` vs. `&T` in general.
+
+This means starting off with this:
+
+```{rust,ignore}
+fn foo(s: &str) {
+```
+
+and only moving to this:
+
+```{rust,ignore}
+fn foo(s: String) {
+```
+
+if you have a good reason. It's not polite to hold on to ownership you don't
+need, and it can make your lifetimes more complex. Furthermore, you can pass
+either kind of string into `foo` by using `.as_slice()` on any `String` you
+need to pass in, so the `&str` version is more flexible.
+
+### Comparisons
+
+To compare a `String` to a string literal, prefer `as_slice()`...
+
+```{rust}
+fn compare(string: String) {
+    if string.as_slice() == "Hello" {
+        println!("yes");
+    }
+}
+```
+
+... over `to_string()`:
+
+```{rust}
+fn compare(string: String) {
+    if string == "Hello".to_string() {
+        println!("yes");
+    }
+}
+```
+
+Converting a `String` to a `&str` is cheap, but converting the `&str` to a
+`String` involves an allocation.
+
+## Other Documentation
+
+* [the `&str` API documentation](std/str/index.html)
+* [the `String` API documentation](std/string/index.html)
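
The guide's advice on borrowing versus owning, and on comparing without allocating, can be sketched as follows. This is only an illustration and not part of the commit: the helper names `is_hello` and `greet` are hypothetical, and the code assumes a current stable Rust toolchain, where `as_str()` plays the role that `as_slice()` plays in the guide.

```rust
// Hypothetical helper: it only reads the string, so it borrows a `&str`.
// Both string literals and borrowed `String`s can be passed in.
fn is_hello(s: &str) -> bool {
    // Comparing two string slices does not allocate.
    s == "Hello"
}

// Hypothetical helper: it takes ownership because it grows the string
// in place and hands the result back to the caller.
fn greet(mut name: String) -> String {
    name.push_str(", world.");
    name
}

fn main() {
    let owned = String::from("Hello");

    // Assumption: `as_str()` is the modern spelling of the guide's `as_slice()`.
    assert!(is_hello(owned.as_str()));
    assert!(is_hello("Hello"));

    // Ownership of `owned` moves into `greet` here.
    let greeting = greet(owned);
    println!("{}", greeting);
}
```

Because `is_hello` takes a `&str`, callers holding either kind of string can use it, which is the flexibility the guide argues for.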
diff --git a/src/libcollections/str.rs b/src/libcollections/str.rs
index 85921f1176a..bd6a13ae2e3 100644
--- a/src/libcollections/str.rs
+++ b/src/libcollections/str.rs
@@ -55,10 +55,10 @@ other languages.
 
 # Representation
 
-Rust's string type, `str`, is a sequence of unicode codepoints encoded as a
-stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly
-encoded UTF-8 sequences. Additionally, strings are not null-terminated
-and can contain null codepoints.
+Rust's string type, `str`, is a sequence of Unicode scalar values encoded as a
+stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
+sequences. Additionally, strings are not null-terminated and can contain null
+bytes.
 
 The actual representation of strings have direct mappings to vectors: `&str`
 is the same as `&[u8]`.
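
The representation described above (UTF-8 bytes, no null terminator, embedded null bytes allowed) can be checked with a small sketch. It is not part of the commit: the helper `byte_and_char_counts` is hypothetical, and the code assumes a current stable Rust toolchain rather than the 2014 compiler this diff targets.

```rust
use std::str;

// Hypothetical helper: returns (length in bytes, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'h', a null byte, then U+00E9, which takes two bytes in UTF-8.
    let s = "h\0\u{e9}";

    // The string slice is a view over exactly these bytes, with no terminator.
    let expected: &[u8] = &[b'h', 0, 0xC3, 0xA9];
    assert_eq!(s.as_bytes(), expected);

    // Four bytes, but only three scalar values.
    assert_eq!(byte_and_char_counts(s), (4, 3));

    // Going from bytes to `&str` is checked: 0xFF never appears in valid UTF-8.
    assert!(str::from_utf8(&[0xFF]).is_err());
    assert_eq!(str::from_utf8(expected).unwrap(), s);
}
```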