author bors <bors@rust-lang.org> 2017-08-19 05:46:46 +0000
committer bors <bors@rust-lang.org> 2017-08-19 05:46:46 +0000
commit 7f397bdb062fe13a4707219a2f32486c5294f642
tree 92d3a1f24950bc3dbec0cd389c6b0bb294400373 /src/libstd
parent c7e3c7932c5d76a59d494797fe71530daf534ed3
parent 1065ad418e9693a8bbd4592237f858bc862d2482
Auto merge of #43919 - frewsxcv:frewsxcv-char-primitive, r=QuietMisdreavus
Minor rewrite of char primitive unicode intro.

Opened primarily to address #36998.

Despite my love for emoji, the heart example is a little confusing because both heart characters start with the same code point and there can be stark rendering differences across browsers. I also spelled out what each of the code points is in the code block, which (hopefully) sheds light on why one character is one code point while the other is two.

Very much open to suggestions and improvements. I was pretty tired when I wrote this, so I might wake up and realize that this makes things more confusing 😅
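
The one-versus-two code point distinction described above can also be checked without relying on how the two glyphs are rendered or encoded in the source file. The following is a minimal sketch, not part of this change: it writes both forms with `\u{...}` escapes, and the helper name `code_points` is made up for illustration.

```rust
/// Hypothetical helper (not from the commit): list the Unicode scalar
/// values in a string as plain integers.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Precomposed form: a single code point, U+00E9.
    let precomposed = "\u{00e9}";
    // Decomposed form: U+0065 followed by the combining accent U+0301.
    let decomposed = "e\u{0301}";

    assert_eq!(code_points(precomposed), vec![0x00e9]);
    assert_eq!(code_points(decomposed), vec![0x0065, 0x0301]);

    // The strings look alike when rendered, but `str` equality compares
    // bytes and performs no Unicode normalization.
    assert_ne!(precomposed, decomposed);

    // UTF-8 lengths: U+00E9 takes 2 bytes; 'e' takes 1 and U+0301 takes 2.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);
}
```

Using escapes here is only a convenience for the sketch: it keeps the example stable if an editor or web page normalizes the literal characters.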
Diffstat (limited to 'src/libstd')
-rw-r--r-- src/libstd/primitive_docs.rs 31
1 files changed, 18 insertions, 13 deletions
diff --git a/src/libstd/primitive_docs.rs b/src/libstd/primitive_docs.rs
index c52899db437..6746754ebc3 100644
--- a/src/libstd/primitive_docs.rs
+++ b/src/libstd/primitive_docs.rs
@@ -103,26 +103,31 @@ mod prim_bool { }
 /// [`String`]: string/struct.String.html
 ///
 /// As always, remember that a human intuition for 'character' may not map to
-/// Unicode's definitions. For example, emoji symbols such as '❤️' can be more
-/// than one Unicode code point; this ❤️ in particular is two:
+/// Unicode's definitions. For example, despite looking similar, the 'é'
+/// character is one Unicode code point while 'é' is two Unicode code points:
 ///
 /// ```
-/// let s = String::from("❤️");
+/// let mut chars = "é".chars();
+/// // U+00e9: 'latin small letter e with acute'
+/// assert_eq!(Some('\u{00e9}'), chars.next());
+/// assert_eq!(None, chars.next());
 ///
-/// // we get two chars out of a single ❤️
-/// let mut iter = s.chars();
-/// assert_eq!(Some('\u{2764}'), iter.next());
-/// assert_eq!(Some('\u{fe0f}'), iter.next());
-/// assert_eq!(None, iter.next());
+/// let mut chars = "é".chars();
+/// // U+0065: 'latin small letter e'
+/// assert_eq!(Some('\u{0065}'), chars.next());
+/// // U+0301: 'combining acute accent'
+/// assert_eq!(Some('\u{0301}'), chars.next());
+/// assert_eq!(None, chars.next());
 /// ```
 ///
-/// This means it won't fit into a `char`. Trying to create a literal with
-/// `let heart = '❤️';` gives an error:
+/// This means that the contents of the first string above _will_ fit into a
+/// `char` while the contents of the second string _will not_. Trying to create
+/// a `char` literal with the contents of the second string gives an error:
 ///
 /// ```text
-/// error: character literal may only contain one codepoint: '❤
-/// let heart = '❤️';
-///             ^~
+/// error: character literal may only contain one codepoint: 'é'
+/// let c = 'é';
+///         ^^^^
 /// ```
 ///
 /// Another implication of the 4-byte fixed size of a `char` is that