path: root/compiler/rustc_parse/src/lexer/tokentrees.rs
Age Commit message Author Lines
2025-07-22Clean code for `rustc_parse/src/lexer`xizheyin-55/+51
1. Rename `make_unclosed_delims_error` and return `Vec<Diag>`
2. Change magic number `unclosed_delimiter_show_limit` to const
3. Move `eof_err` below parsing logic
4. Add `calculate_spacing` for `bump` and `bump_minimal`

Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-07-18Deduplicate `unmatched_delims` in `rustc_parse` to reduce confusionxizheyin-2/+13
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-22Rename `open_brace` to `open_delimiters`xizheyin-21/+24
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-21Remove `token::{Open,Close}Delim`.Nicholas Nethercote-46/+39
By replacing them with `{Open,Close}{Paren,Brace,Bracket,Invisible}`.

PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing with delimiters. It also makes `ast::TokenKind` more similar to `parser::TokenType`.

This requires a few new methods:
- `TokenKind::is_{,open_,close_}delim()` replace various kinds of pattern matches.
- `Delimiter::as_{open,close}_token_kind` are used to convert `Delimiter` values to `TokenKind`.

Despite these additions, it's a net reduction in lines of code. This is because e.g. `token::OpenParen` is so much shorter than `token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms reduce to single line forms. And many places where the number of lines doesn't change are still easier to read, just because the names are shorter, e.g.:
```
-    } else if self.token != token::CloseDelim(Delimiter::Brace) {
+    } else if self.token != token::CloseBrace {
```
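To make the shape of this change concrete, here is a minimal sketch of fieldless delimiter token kinds together with the helper methods the message names. It is a cut-down model, not the rustc definitions: the `Invisible` delimiter is omitted, `Comma` stands in for all other tokens, and the method bodies are assumptions about how such helpers could look.

```rust
// Illustrative model only; the real `Delimiter` and `TokenKind` have more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

// Fieldless open/close variants instead of `OpenDelim(Delimiter)` / `CloseDelim(Delimiter)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TokenKind {
    OpenParen,
    CloseParen,
    OpenBrace,
    CloseBrace,
    OpenBracket,
    CloseBracket,
    Comma, // placeholder for every non-delimiter token
}

impl Delimiter {
    // Convert a delimiter into the token kind that opens it.
    pub fn as_open_token_kind(self) -> TokenKind {
        match self {
            Delimiter::Parenthesis => TokenKind::OpenParen,
            Delimiter::Brace => TokenKind::OpenBrace,
            Delimiter::Bracket => TokenKind::OpenBracket,
        }
    }

    // Convert a delimiter into the token kind that closes it.
    pub fn as_close_token_kind(self) -> TokenKind {
        match self {
            Delimiter::Parenthesis => TokenKind::CloseParen,
            Delimiter::Brace => TokenKind::CloseBrace,
            Delimiter::Bracket => TokenKind::CloseBracket,
        }
    }
}

impl TokenKind {
    pub fn is_open_delim(self) -> bool {
        matches!(self, TokenKind::OpenParen | TokenKind::OpenBrace | TokenKind::OpenBracket)
    }

    pub fn is_close_delim(self) -> bool {
        matches!(self, TokenKind::CloseParen | TokenKind::CloseBrace | TokenKind::CloseBracket)
    }

    pub fn is_delim(self) -> bool {
        self.is_open_delim() || self.is_close_delim()
    }
}
```

With this layout a comparison such as `kind != TokenKind::CloseBrace` needs no nested pattern, which is the readability gain the message describes.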
2024-12-13Remove `Lexer`'s dependency on `Parser`.Nicholas Nethercote-80/+14
Lexing precedes parsing, as you'd expect: `Lexer` creates a `TokenStream` and `Parser` then parses that `TokenStream`.

But, in a horrendous violation of layering abstractions and common sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err` method does some error recovery that relies on creating a `Parser` to do some post-processing of the `TokenStream` that the `Lexer` just created.

This commit just removes `unclosed_delim_err`. This change removes `Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s return value can have a more typical form.

The cost is slightly worse error messages in two obscure cases, as shown in these tests:
- tests/ui/parser/brace-in-let-chain.rs: there is slightly less explanation in this case involving an extra `{`.
- tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff marker detection is no longer supported (because that detection is implemented in the parser).

In my opinion this cost is outweighed by the magnitude of the code cleanup.
2024-12-12Remove `PErr`.Nicholas Nethercote-7/+7
It's just a synonym for `Diag` that adds no value and is only used in a few places.
2024-11-25Split `Lexer::bump`.Nicholas Nethercote-7/+27
It has two different ways of being called.
2024-11-25Merge `TokenTreesReader` into `StringReader`.Nicholas Nethercote-39/+12
There is a not-very-useful layering in the lexer, where `TokenTreesReader` contains a `StringReader`. This commit combines them and names the result `Lexer`, which is a more obvious name for it. The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs` which isn't ideal, but it doesn't seem worth moving a bunch of code to avoid it.
2024-11-21Prepare for invisible delimiters.Nicholas Nethercote-4/+12
Current places where `Interpolated` is used are going to change to instead use invisible delimiters. This prepares for that.
- It adds invisible delimiter cases to the `can_begin_*`/`may_be_*` methods and the `failed_to_match_macro` that are equivalent to the existing `Interpolated` cases.
- It adds panics/asserts in some places where invisible delimiters should never occur.
- In `Parser::parse_struct_fields` it excludes an ident + invisible delimiter from special consideration in an error message, because that's quite different to an ident + paren/brace/bracket.
2024-09-22Reformat using the new identifier sorting from rustfmtMichael Goulet-1/+1
2024-09-09Remove needless returns detected by clippy in the compilerEduardo Sánchez Muñoz-1/+1
2024-08-14Use `impl PartialEq<TokenKind> for Token` more.Nicholas Nethercote-1/+1
This lets us compare a `Token` with a `TokenKind`. It's used a lot, but can be used even more, avoiding the need for some `.kind` uses.
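A minimal sketch of the idea follows. The `Token` and `TokenKind` types here are simplified stand-ins (the `lo`/`hi` fields are a hypothetical substitute for a span), so only the shape of the `PartialEq<TokenKind>` impl should be taken from it.

```rust
// Simplified stand-ins for the real types.
#[derive(Clone, Debug, PartialEq)]
pub enum TokenKind {
    Comma,
    Semi,
    Ident(String),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub lo: u32, // hypothetical span start
    pub hi: u32, // hypothetical span end
}

// Lets callers write `token == TokenKind::Comma` instead of
// `token.kind == TokenKind::Comma`.
impl PartialEq<TokenKind> for Token {
    fn eq(&self, rhs: &TokenKind) -> bool {
        self.kind == *rhs
    }
}
```

The comparison ignores the position fields entirely, so two tokens at different positions both compare equal to the same `TokenKind`.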
2024-07-30Auto merge of #127955 - chenyukang:yukang-fix-mismatched-delimiter-issue-12786, r=nnethercotebors-3/+18
Add limit for unclosed delimiters in lexer diagnostic

Fixes #127868

The first commit shows the original diagnostic, and the second commit shows the changes.
2024-07-29Reformat `use` declarations.Nicholas Nethercote-5/+6
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-07-25add limit for unclosed delimiters in lexer diagnosticyukang-3/+18
2024-06-18Prefer `dcx` methods over fields or fields' methodsOli Scherer-2/+2
2024-06-05Remove `stream_to_parser`.Nicholas Nethercote-1/+2
It's a zero-value wrapper of `Parser::new`.
2024-06-05Don't use the word "parse" for lexing operations.Nicholas Nethercote-25/+22
Lexing converts source text into a token stream. Parsing converts a token stream into AST fragments. This commit renames several lexing operations that have "parse" in the name. I think these names have been subtly confusing me for years. This is just a `s/parse/lex/` on function names, with one exception: `parse_stream_from_source_str` becomes `source_str_to_stream`, to make it consistent with the existing `source_file_to_stream`. The commit also moves that function's location in the file to be just above `source_file_to_stream`. The commit also cleans up a few comments along the way.
2024-05-17Clarify that the diff_marker is talking about version control system conflicts specifically and a few more improvements.ardi-1/+1
2024-04-08parser: reduce visibility of unnecessary public `UnmatchedDelim`Yutaro Ohno-3/+1
The `lexer::UnmatchedDelim` struct in `rustc_parse` is unnecessarily public outside of the crate. This commit reduces its visibility to `pub(crate)`. It also removes the unnecessary field `expected_delim`, which causes warnings once the visibility is changed.
2024-03-05Rename all `ParseSess` variables/fields/lifetimes as `psess`.Nicholas Nethercote-17/+17
Existing names for values of this type are `sess`, `parse_sess`, `parse_session`, and `ps`. `sess` is particularly annoying because that's also used for `Session` values, which are often co-located, and it can be difficult to know which type a value named `sess` refers to. (That annoyance is the main motivation for this change.) `psess` is nice and short, which is good for a name used this much. The commit also renames some `parse_sess_created` values as `psess_created`.
2024-01-11Fix lifetimes in `StringReader`.Nicholas Nethercote-10/+14
Two different lifetimes are conflated. This doesn't matter right now, but needs to be fixed for the next commit to work. And the more descriptive lifetime names make the code easier to read.
2024-01-08Remove `DiagnosticBuilder::delay_as_bug_without_consuming`.Nicholas Nethercote-3/+3
The existing uses are replaced in one of three ways.
- In a function that also has calls to `emit`, just rearrange the code so that exactly one of `delay_as_bug` or `emit` is called on every path.
- In a function returning a `DiagnosticBuilder`, use `downgrade_to_delayed_bug`. That's good enough because it will get emitted later anyway.
- In `unclosed_delim_err`, one set of errors is being replaced with another set, so just cancel the original errors.
2024-01-08Make `DiagnosticBuilder::emit` consuming.Nicholas Nethercote-1/+1
This works for most of its call sites. This is nice, because `emit` very much makes sense as a consuming operation -- indeed, `DiagnosticBuilderState` exists to ensure no diagnostic is emitted twice, but it uses runtime checks.

For the small number of call sites where a consuming emit doesn't work, the commit adds `DiagnosticBuilder::emit_without_consuming`. (This will be removed in subsequent commits.)

Likewise, `emit_unless` becomes consuming. And `delay_as_bug` becomes consuming, while `delay_as_bug_without_consuming` is added (which will also be removed in subsequent commits.)

All this requires significant changes to `DiagnosticBuilder`'s chaining methods. Currently `DiagnosticBuilder` method chaining uses a non-consuming `&mut self -> &mut Self` style, which allows chaining to be used when the chain ends in `emit()`, like so:
```
    struct_err(msg).span(span).emit();
```
But it doesn't work when producing a `DiagnosticBuilder` value, requiring this:
```
    let mut err = self.struct_err(msg);
    err.span(span);
    err
```
This style of chaining won't work with consuming `emit` though. For that, we need to use a `self -> Self` style. That also would allow `DiagnosticBuilder` production to be chained, e.g.:
```
    self.struct_err(msg).span(span)
```
However, removing the `&mut self -> &mut Self` style would require that individual modifications of a `DiagnosticBuilder` go from this:
```
    err.span(span);
```
to this:
```
    err = err.span(span);
```
There are *many* such places. I have a high tolerance for tedious refactorings, but even I gave up after a long time trying to convert them all.

Instead, this commit has it both ways: the existing `&mut self -> &mut Self` chaining methods are kept, and new `self -> Self` chaining methods are added, all of which have a `_mv` suffix (short for "move"). Changes to the existing `forward!` macro let this happen with very little additional boilerplate code. I chose to add the suffix to the new chaining methods rather than the existing ones, because the number of changes required is much smaller that way.

This doubled chaining is a bit clumsy, but I think it is worthwhile because it allows a *lot* of good things to subsequently happen. In this commit, there are many `mut` qualifiers removed in places where diagnostics are emitted without being modified. In subsequent commits:
- chaining can be used more, making the code more concise;
- more use of chaining also permits the removal of redundant diagnostic APIs like `struct_err_with_code`, which can be replaced easily with `struct_err` + `code_mv`;
- `emit_without_consuming` can be removed, which simplifies a lot of machinery, removing the need for `DiagnosticBuilderState`.
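The two chaining styles can be sketched on a toy builder. Everything below is hypothetical (the `Builder` type, its fields, and the string returned by `emit`); only the method signatures mirror the styles the message describes, including the `_mv` suffix for the by-value variant.

```rust
// Hypothetical miniature of a diagnostic builder; not the rustc type.
#[derive(Debug)]
pub struct Builder {
    msg: String,
    span: Option<(u32, u32)>,
}

impl Builder {
    pub fn new(msg: &str) -> Self {
        Builder { msg: msg.to_owned(), span: None }
    }

    // `&mut self -> &mut Self`: convenient for `err.span(span);` statements,
    // but the chain ends in a reference, so it cannot feed a consuming `emit`
    // or be returned as an owned value.
    pub fn span(&mut self, span: (u32, u32)) -> &mut Self {
        self.span = Some(span);
        self
    }

    // `self -> Self`: the `_mv` variant moves the builder through the chain.
    pub fn span_mv(mut self, span: (u32, u32)) -> Self {
        self.span(span);
        self
    }

    // Consuming emit: the type system rules out emitting the same value twice.
    pub fn emit(self) -> String {
        format!("{} at {:?}", self.msg, self.span)
    }
}

// Producing a builder in one expression only works with the by-value style.
pub fn make_err(msg: &str, span: (u32, u32)) -> Builder {
    Builder::new(msg).span_mv(span)
}
```

In this sketch `Builder::new(msg).span(span).emit()` would not compile, because `emit` needs an owned `Builder` and the chain yields `&mut Builder`; that is the conflict the `_mv` methods work around.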
2023-12-24Remove `Parser` methods that duplicate `DiagCtxt` methods.Nicholas Nethercote-1/+1
2023-12-18Rename `ParseSess::span_diagnostic` as `ParseSess::dcx`.Nicholas Nethercote-2/+2
2023-12-11Add spacing information to delimiters.Nicholas Nethercote-34/+53
This is an extension of the previous commit. It means the output of something like this:
```
stringify!(let a: Vec<u32> = vec![];)
```
goes from this:
```
let a: Vec<u32> = vec![] ;
```
to this:
```
let a: Vec<u32> = vec![];
```
2023-12-11Improve `print_tts` by changing `tokenstream::Spacing`.Nicholas Nethercote-2/+7
`tokenstream::Spacing` appears on all `TokenTree::Token` instances, both punct and non-punct. Its current usage:
- `Joint` means "can join with the next token *and* that token is a punct".
- `Alone` means "cannot join with the next token *or* can join with the next token but that token is not a punct".

The fact that `Alone` is used for two different cases is awkward. This commit augments `tokenstream::Spacing` with a new variant `JointHidden`, resulting in:
- `Joint` means "can join with the next token *and* that token is a punct".
- `JointHidden` means "can join with the next token *and* that token is not a punct".
- `Alone` means "cannot join with the next token".

This *drastically* improves the output of `print_tts`. For example, this:
```
stringify!(let a: Vec<u32> = vec![];)
```
currently produces this string:
```
let a : Vec < u32 > = vec! [] ;
```
With this PR, it now produces this string:
```
let a: Vec<u32> = vec![] ;
```
(The space after the `]` is because `TokenTree::Delimited` currently doesn't have spacing information. The subsequent commit fixes this.)

The new `print_tts` doesn't replicate original code perfectly. E.g. multiple space characters will be condensed into a single space character. But it's much improved.

`print_tts` still produces the old, uglier output for code produced by proc macros. That is because we have to translate the generated code from `proc_macro::Spacing` to the more expressive `token::Spacing`, which results in too much `proc_macro::Alone` usage and no `proc_macro::JointHidden` usage. So `space_between` still exists and is used by `print_tts` in conjunction with the `Spacing` field.

This change will also help with the removal of `Token::Interpolated`. Currently interpolated tokens are pretty-printed nicely via AST pretty printing. `Token::Interpolated` removal will mean they get printed with `print_tts`. Without this change, that would result in much uglier output for code produced by decl macro expansions. With this change, AST pretty printing and `print_tts` produce similar results.

The commit also tweaks the comments on `proc_macro::Spacing`. In particular, it refers to "compound tokens" rather than "multi-char operators" because lifetimes aren't operators.
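As a rough illustration of why the third variant helps printing, the sketch below prints a flat token list and inserts a space only after tokens marked `Alone`. The `print_tokens` function and the `(text, spacing)` token representation are assumptions made for this example; the real `print_tts` works on token trees and also consults `space_between`.

```rust
// Simplified model of the three spacing states described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Spacing {
    Alone,       // cannot join with the next token
    Joint,       // can join, and the next token is a punct
    JointHidden, // can join, and the next token is not a punct
}

// Hypothetical printer: emit a space only after `Alone` tokens.
pub fn print_tokens(tokens: &[(&str, Spacing)]) -> String {
    let mut out = String::new();
    for (i, (text, spacing)) in tokens.iter().enumerate() {
        out.push_str(text);
        let is_last = i + 1 == tokens.len();
        if !is_last && *spacing == Spacing::Alone {
            out.push(' ');
        }
    }
    out
}
```

Feeding it `a`, `:`, `Vec`, `<`, `u32`, `>` with every token marked `Alone` gives `a : Vec < u32 >`, while marking `a`, `Vec`, `<` and `u32` as `JointHidden` gives `a: Vec<u32>`.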
2023-11-21Fix `clippy::needless_borrow` in the compilerNilstrieb-3/+3
`x clippy compiler -Aclippy::all -Wclippy::needless_borrow --fix`. Then I had to remove a few unnecessary parens and muts that were exposed now.
2023-11-11Move unclosed delim errors to separate functionsjwang05-53/+58
2023-11-10Correctly handle while-let-chainssjwang05-1/+1
2023-11-09Catch an edge casesjwang05-1/+5
2023-11-09Catch stray { in let-chainssjwang05-1/+33
2023-10-30When encountering unclosed delimiters during parsing, check for diff markersEsteban Küber-18/+46
Fix #116252.
2023-10-12Reorder an expression to improve readability.Nicholas Nethercote-12/+7
2023-10-12Rename `Token::is_op` as `Token::is_punct`.Nicholas Nethercote-2/+5
For consistency with `proc_macro::Punct`.
2023-07-30inline format!() args up to and including rustc_middleMatthias Krüger-1/+1
2023-05-03Restrict `From<S>` for `{D,Subd}iagnosticMessage`.Nicholas Nethercote-2/+1
Currently a `{D,Subd}iagnosticMessage` can be created from any type that impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static, str>`, which are reasonable. It also includes `&String`, which is pretty weird, and results in many places making unnecessary allocations for patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the reference to `fatal`, which does an `into()`, which clones the reference, doing a second allocation. Two allocations for a single string, bleh.

This commit changes the `From` impls so that you can only create a `{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static, str>`. This requires changing all the places that currently create one from a `&String`. Most of these are of the `&format!(...)` form described above; each one removes an unnecessary static `&`, plus an allocation when executed. There are also a few places where the existing use of `&String` was more reasonable; these now just use `clone()` at the call site.

As well as making the code nicer and more efficient, this is a step towards possibly using `Cow<'static, str>` in `{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing the `From<&'a str>` impls to `From<&'static str>`, which is doable, but I'm not yet sure if it's worthwhile.
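The restriction can be sketched with a toy message type. `Message` and `fatal` below are hypothetical stand-ins, not the rustc items; the point is only that three explicit `From` impls accept `&str`, `String`, and `Cow<'static, str>` while leaving `&String` without a conversion.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a diagnostic message type.
#[derive(Debug, PartialEq)]
pub struct Message(String);

impl<'a> From<&'a str> for Message {
    fn from(s: &'a str) -> Self {
        Message(s.to_owned()) // one allocation, needed to own the text
    }
}

impl From<String> for Message {
    fn from(s: String) -> Self {
        Message(s) // takes ownership, no extra allocation
    }
}

impl From<Cow<'static, str>> for Message {
    fn from(s: Cow<'static, str>) -> Self {
        Message(s.into_owned())
    }
}

// Hypothetical sink that accepts anything convertible into a `Message`.
pub fn fatal(msg: impl Into<Message>) -> Message {
    msg.into()
}
```

With only these impls, `fatal(format!("bad {}", 1))` compiles and moves the `String` in, whereas `fatal(&format!("bad {}", 1))` is rejected because `&String` has no `From` impl, which pushes call sites towards the single-allocation form.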
2023-04-10Fix typos in compilerDaniPopes-2/+2
2023-02-28refactor parse_token_trees to not return unmatched_delimsyukang-1/+0
2023-02-28rename unmatched_braces to unmatched_delimsyukang-5/+6
2023-02-28remove duplicated diagnostic for unclosed delimiteryukang-8/+9
2023-01-27Improve unexpected close and mismatch delimiter hint in TokenTreesReaderyukang-87/+42
2022-10-03Invert `is_top_level` to avoid negation.Nicholas Nethercote-5/+5
2022-10-03Remove `TokenStreamBuilder`.Nicholas Nethercote-37/+20
It's now only used in one function. Also, the "should we glue the tokens?" check is only necessary when pushing a `TokenTree::Token`, not when pushing a `TokenTree::Delimited`. As part of this, we now do the "should we glue the tokens?" check immediately, which avoids having to look back at the previous token. It also puts all the logic dealing with token gluing in a single place.
2022-10-03Inline and remove `parse_token_tree_non_delim_non_eof`.Nicholas Nethercote-16/+14
It has a single call site.
2022-10-03Merge `parse_token_trees_until_close_delim` and `parse_all_token_trees`.Nicholas Nethercote-23/+16
Because they're very similar, and this will allow some follow-up changes.
2022-09-28Address review comments.Nicholas Nethercote-7/+7
2022-09-27Rename some variables.Nicholas Nethercote-10/+10
These make the delimiter processing clearer.
2022-09-27Minor improvements.Nicholas Nethercote-3/+5
Add some comments, and mark one path as unreachable.