summary refs log tree commit diff
path: root/src/doc/grammar.md
blob: 78432b6a9659370cac1ae373770e795f0b8469b7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
% Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

  - Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background.

This document also does not serve as a reference to the [standard] library
included in the language distribution. Those libraries are documented
separately by extracting documentation attributes from their source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule    : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeat` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - NUMBER trailing a repeat symbol gives a maximum repetition count
  - NUMBER on its own gives an exact repetition count

This EBNF dialect should hopefully be familiar to many readers.

## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar — notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords) — are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.

## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode[^non_ascii_idents] string of
the following form:

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
  gated. This is expected to improve soon.

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027`  (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

**FIXME:** add doc grammar?

## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords

<p id="keyword-table-marker"></p>

|          |          |          |          |          |
|----------|----------|----------|----------|----------|
| _        | abstract | alignof  | as       | become   |
| box      | break    | const    | continue | crate    |
| do       | else     | enum     | extern   | false    |
| final    | fn       | for      | if       | impl     |
| in       | let      | loop     | macro    | match    |
| mod      | move     | mut      | offsetof | override |
| priv     | proc     | pub      | pure     | ref      |
| return   | Self     | self     | sizeof   | static   |
| struct   | super    | trait    | true     | type     |
| typeof   | unsafe   | unsized  | use      | virtual  |
| where    | while    | yield    |          |          |


Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

Not all of these keywords are used by the language. Some of them were used
before Rust 1.0, and were left reserved once their implementations were
removed. Some of them were reserved before 1.0 to make space for possible
future features.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
```

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference/tokens.html#number-literals) in the
reference.

#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2
unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```

#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
            | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;

```

#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

#### Boolean literals

```antlr
bool_lit : [ "true" | "false" ] ;
```

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [tokens](#tokens) that play structural
roles in a variety of grammar productions. They are cataloged here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
                 | "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

# Crates and source files

**FIXME:** grammar? What production covers #![crate_id = "foo"] ?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
     | const_item | static_item | trait_item | impl_item | extern_block_item ;
```

### Type Parameters

**FIXME:** grammar?

### Modules

```antlr
mod_item : "mod" ident ( ';' | '{' mod '}' );
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ';' ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name
crate_name: ident | ( ident "as" ident )
```

##### Use declarations

```antlr
use_decl : vis ? "use" [ path "as" ident
                        | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                          | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
```

### Functions

**FIXME:** grammar?

#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Enumerations

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

```antlr
vis : "pub" ;
```
### Re-exporting and Visibility

See [Use declarations](#use-declarations).

## Attributes

```antlr
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

# Statements and expressions

## Statements

```antlr
stmt : decl_stmt | expr_stmt | ';' ;
```

### Declaration statements

```antlr
decl_stmt : item | let_decl ;
```

#### Item declarations

See [Items](#items).

#### Variable declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

### Expression statements

```antlr
expr_stmt : expr ';' ;
```

## Expressions

```antlr
expr : literal | path | tuple_expr | unit_expr | struct_expr
     | block_expr | method_call_expr | field_expr | array_expr
     | idx_expr | range_expr | unop_expr | binop_expr
     | paren_expr | call_expr | lambda_expr | while_expr
     | loop_expr | break_expr | continue_expr | for_expr
     | if_expr | match_expr | if_let_expr | while_let_expr
     | return_expr ;
```

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).

### Path expressions

See [Paths](#paths).

### Tuple expressions

```antlr
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
```

### Unit expressions

```antlr
unit_expr : "()" ;
```

### Structure expressions

```antlr
struct_expr_field_init : ident | ident ':' expr ;
struct_expr : expr_path '{' struct_expr_field_init
                      [ ',' struct_expr_field_init ] *
                      [ ".." expr ] '}' |
              expr_path '(' expr
                      [ ',' expr ] * ')' |
              expr_path ;
```

### Block expressions

```antlr
block_expr : '{' [ stmt | item ] *
                 [ expr ] '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
```

### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Range expressions

```antlr
range_expr : expr ".." expr |
             expr ".." |
             ".." expr |
             ".." ;
```

### Unary operator expressions

```antlr
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
```

### Binary operator expressions

```antlr
binop_expr : expr binop expr | type_cast_expr
           | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op
```

#### Arithmetic operators

```antlr
arith_op : '+' | '-' | '*' | '/' | '%' ;
```

#### Bitwise operators

```antlr
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
```

#### Lazy boolean operators

```antlr
lazy_bool_op : "&&" | "||" ;
```

#### Comparison operators

```antlr
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
```

#### Type cast expressions

```antlr
type_cast_expr : value "as" type ;
```

#### Assignment expressions

```antlr
assignment_expr : expr '=' expr ;
```

#### Compound assignment expressions

```antlr
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
```

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

### While loops

```antlr
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ] ?;
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ] ?;
```

### For expressions

```antlr
for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
```

### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```

### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
               else_tail ? ;
```

### While let loops

```antlr
while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
```

### Return expressions

```antlr
return_expr : "return" expr ? ;
```

# Type system

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Types

### Primitive types

**FIXME:** grammar?

#### Machine types

**FIXME:** grammar?

#### Machine-dependent integer types

**FIXME:** grammar?

### Textual types

**FIXME:** grammar?

### Tuple types

**FIXME:** grammar?

### Array, and Slice types

**FIXME:** grammar?

### Structure types

**FIXME:** grammar?

### Enumerated types

**FIXME:** grammar?

### Pointer types

**FIXME:** grammar?

### Function types

**FIXME:** grammar?

### Closure types

```antlr
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
```

### Never type
An empty type

```antlr
never_type : "!" ;
```

### Object types

**FIXME:** grammar?

### Type parameters

**FIXME:** grammar?

### Type parameter bounds

```antlr
bound-list := bound | bound '+' bound-list '+' ?
bound := ty_bound | lt_bound
lt_bound := lifetime
ty_bound := ty_bound_noparen | (ty_bound_noparen)
ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path
```

### Self types

**FIXME:** grammar?

## Type kinds

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?

## Memory model

### Memory allocation and lifetime

### Memory ownership

### Variables

### Boxes

## Threads

### Communication between threads

### Thread lifecycle