author bors <bors@rust-lang.org> 2014-11-21 01:06:47 +0000
committer bors <bors@rust-lang.org> 2014-11-21 01:06:47 +0000
commit 98300516072c6afd0e93654b325f5924b60dea53
tree f7f46d5b72a9df36b7296fc014bf80e3702a1e16
parent 770378a313a573776b16237a46b75bafa49072c1
parent 16bb4e6400607ae51a929fe1c034fe709a57bb92
auto merge of #18441 : mdinger/rust/literals, r=steveklabnik
Closes #18415

This links the [`std::str`](http://doc.rust-lang.org/std/str/index.html) documentation to the [literals](http://doc.rust-lang.org/reference.html#literals) section of the reference guide and collects examples of literals into one group at the beginning of that section. ~~The new tables are not exhaustive (some escapes were skipped) and so I try to link back to the respective sections where more detail is located.~~ The tables are mostly exhaustive; I had misunderstood some of the whitespace codes.

I don't think the tables look that nice, if that's important, and I'm not sure how they could be improved. I do think they do a good job of collecting the available options together. Listing the escapes together is particularly helpful because they vary with type and are otherwise embedded in paragraphs.

[EDIT]
The [ASCII table](http://man-ascii.com/) is linked here and may be useful.
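
For readers who want to try the literal kinds collected in the new tables, here is a minimal sketch. It is not part of this patch. It assumes a current (post-1.0) toolchain, where the `i`/`u` suffixes and the `\uXXXX`/`\UXXXXXXXX` escapes shown in the diff have been replaced by explicit-width suffixes such as `i32` and by `\u{...}`. The function names are hypothetical and exist only for illustration.

```rust
// Sketch only: modern spellings of the literal kinds tabulated in this patch.
// Function names are hypothetical and exist only for illustration.

/// An escaped string and a raw string spelling the same text.
fn escaped_equals_raw() -> bool {
    let escaped = "say \"hi\"\\n"; // `\"` and `\\` are escapes
    let raw = r#"say "hi"\n"#; // raw: quotes and backslash are taken literally
    escaped == raw
}

/// Byte, byte string, and raw byte string literals as plain `u8` data.
fn byte_forms() -> (u8, Vec<u8>, Vec<u8>) {
    let byte: u8 = b'H';
    let bytes: &[u8] = b"a\x7Fb"; // `\x7F` is a byte escape: 3 bytes total
    let raw_bytes: &[u8] = br#"a\x7Fb"#; // no escapes processed: 6 ASCII bytes
    (byte, bytes.to_vec(), raw_bytes.to_vec())
}

/// Number literals with `_` separators and explicit suffixes.
fn number_forms() -> (i32, i32, i32, i32, f64) {
    (98_222i32, 0xff_i32, 0o77_i32, 0b1111_0000_i32, 123.0E+77f64)
}

fn main() {
    let c: char = '\u{7FFF}'; // modern Unicode escape, 1 to 6 hex digits
    println!("char U+{:04X}", c as u32);
    println!("escaped == raw: {}", escaped_equals_raw());
    println!("bytes: {:?}", byte_forms());
    println!("numbers: {:?}", number_forms());
}
```

The raw forms are the ones where the number of `#` pairs matters: one pair is enough here because the body never contains `"#`.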
-rw-r--r-- src/doc/reference.md 60
-rw-r--r-- src/libcollections/str.rs 6
2 files changed, 63 insertions, 3 deletions
diff --git a/src/doc/reference.md b/src/doc/reference.md
index 4f0c9a50422..4bba6bef5cf 100644
--- a/src/doc/reference.md
+++ b/src/doc/reference.md
@@ -225,6 +225,52 @@ reserved for future extension, that is, the above gives the lexical
 grammar, but a Rust parser will reject everything but the 12 special
 cases mentioned in [Number literals](#number-literals) below.
 
+#### Examples
+
+##### Characters and strings
+
+|   | Example | Number of `#` pairs allowed | Available characters | Escapes | Equivalent to |
+|---|---------|-----------------------------|----------------------|---------|---------------|
+| [Character](#character-literals) | `'H'` | `N/A` | All unicode | `\'` & [Byte escapes](#byte-escapes) & [Unicode escapes](#unicode-escapes) | `N/A` |
+| [String](#string-literals) | `"hello"` | `N/A` | All unicode | `\"` & [Byte escapes](#byte-escapes) & [Unicode escapes](#unicode-escapes) | `N/A` |
+| [Raw](#raw-string-literals) | `r##"hello"##`  | `0...` | All unicode | `N/A` | `N/A` |
+| [Byte](#byte-literals) | `b'H'` | `N/A` | All ASCII | `\'` & [Byte escapes](#byte-escapes) | `u8` |
+| [Byte string](#byte-string-literals) | `b"hello"` | `N/A`  | All ASCII | `\"` & [Byte escapes](#byte-escapes) | `&'static [u8]` |
+| [Raw byte string](#raw-byte-string-literals) | `br##"hello"##` | `0...` | All ASCII | `N/A` | `&'static [u8]` (unsure...not stated) |
+
+##### Byte escapes
+
+|   | Name |
+|---|------|
+| `\x7F` | 8-bit character code (exactly 2 digits) |
+| `\n` | Newline |
+| `\r` | Carriage return |
+| `\t` | Tab |
+| `\\` | Backslash |
+
+##### Unicode escapes
+|   | Name |
+|---|------|
+| `\u7FFF` | 16-bit character code (exactly 4 digits) |
+| `\U7EEEFFFF` | 32-bit character code (exactly 8 digits) |
+
+##### Numbers
+
+| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes |
+|----------------------------------------|---------|----------------|----------|
+| Decimal integer | `98_222i` | `N/A` | Integer suffixes |
+| Hex integer | `0xffi` | `N/A` | Integer suffixes |
+| Octal integer | `0o77i` | `N/A` | Integer suffixes |
+| Binary integer | `0b1111_0000i` | `N/A` | Integer suffixes |
+| Floating-point | `123.0E+77f64` | `Optional` | Floating-point suffixes |
+
+`*` All number literals allow `_` as a visual separator: `1_234.0E+18f64`
+
+##### Suffixes
+| Integer | Floating-point |
+|---------|----------------|
+| `i` (`int`), `u` (`uint`), `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64` | `f32`, `f64` |
+
 #### Character and string literals
 
 ```{.ebnf .gram}
@@ -253,15 +299,21 @@ nonzero_dec: '1' | '2' | '3' | '4'
            | '5' | '6' | '7' | '8' | '9' ;
 ```
 
+##### Character literals
+
 A _character literal_ is a single Unicode character enclosed within two
 `U+0027` (single-quote) characters, with the exception of `U+0027` itself,
 which must be _escaped_ by a preceding U+005C character (`\`).
 
+##### String literals
+
 A _string literal_ is a sequence of any Unicode characters enclosed within two
 `U+0022` (double-quote) characters, with the exception of `U+0022` itself,
 which must be _escaped_ by a preceding `U+005C` character (`\`), or a _raw
 string literal_.
 
+##### Character escapes
+
 Some additional _escapes_ are available in either character or non-raw string
 literals. An escape starts with a `U+005C` (`\`) and continues with one of the
 following forms:
@@ -281,6 +333,8 @@ following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
   escaped in order to denote *itself*.
 
+##### Raw string literals
+
 Raw string literals do not process any escapes. They start with the character
 `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
 `U+0022` (double-quote) character. The _raw string body_ is not defined in the
@@ -322,12 +376,16 @@ raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
 
 ```
 
+##### Byte literals
+
 A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F`
 range) enclosed within two `U+0027` (single-quote) characters, with the
 exception of `U+0027` itself, which must be _escaped_ by a preceding U+005C
 character (`\`), or a single _escape_. It is equivalent to a `u8` unsigned
 8-bit integer _number literal_.
 
+##### Byte string literals
+
 A _byte string literal_ is a sequence of ASCII characters and _escapes_
 enclosed within two `U+0022` (double-quote) characters, with the exception of
 `U+0022` itself, which must be _escaped_ by a preceding `U+005C` character
@@ -347,6 +405,8 @@ following forms:
 * The _backslash escape_ is the character `U+005C` (`\`) which must be
   escaped in order to denote its ASCII encoding `0x5C`.
 
+##### Raw byte string literals
+
 Raw byte string literals do not process any escapes. They start with the
 character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more
 of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
diff --git a/src/libcollections/str.rs b/src/libcollections/str.rs
index aaa7da312f2..a7df5f4644a 100644
--- a/src/libcollections/str.rs
+++ b/src/libcollections/str.rs
@@ -42,9 +42,9 @@
 //! # Representation
 //!
 //! Rust's string type, `str`, is a sequence of Unicode scalar values encoded as a
-//! stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
-//! sequences. Additionally, strings are not null-terminated and can thus contain
-//! null bytes.
+//! stream of UTF-8 bytes. All [strings](../../reference.html#literals) are
+//! guaranteed to be validly encoded UTF-8 sequences. Additionally, strings are
+//! not null-terminated and can thus contain null bytes.
 //!
 //! The actual representation of strings have direct mappings to slices: `&str`
 //! is the same as `&[u8]`.
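
The representation described in the doc comment above (UTF-8 bytes, no null terminator) can be observed with a short sketch. It is not part of this patch and assumes a current toolchain. `utf8_bytes` is a hypothetical helper, while `as_bytes`, `chars`, and `std::str::from_utf8` are standard library calls.

```rust
// Sketch only: observing the UTF-8 representation described in the doc comment.
// `utf8_bytes` is a hypothetical helper, not part of `std::str`.

/// Borrows the UTF-8 bytes backing a string slice, without copying.
fn utf8_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

fn main() {
    // An embedded null byte is fine: strings are not null-terminated.
    let s = "a\0\u{e9}";
    println!("{:?}", utf8_bytes(s));
    println!("{} bytes, {} chars", s.len(), s.chars().count());
    // Bytes that are not valid UTF-8 are rejected when building a `&str`.
    println!("{}", std::str::from_utf8(&[0xffu8]).is_err());
}
```

The string holds three characters but four bytes, because `U+00E9` takes two bytes in UTF-8.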