path: root/compiler/rustc_parse/src/lexer
Age | Commit message | Author | Lines
2025-09-09 | Strip frontmatter in fewer places | León Orell Valerian Liehr | -1/+1
2025-09-06 | Disallow shebang in `--cfg` and `--check-cfg` arguments | Urgau | -5/+30
2025-09-01 | fix(lexer): Only allow horizontal whitespace in frontmatter | Ed Page | -3/+3
In writing up the reference for frontmatter, I realized that we probably shouldn't be accepting Unicode Line Ending characters between the code fence and infostring, or trailing after the infostring or a code fence. Digging into the Unicode specification we use for whitespace, I found that it divides whitespace into categories, so I'm deferring to its definition of horizontal whitespace for what is allowed within a line. Note: I am leaving out support for Unicode Default Ignorable characters. I figure that can be discussed outside of this change, within the reference and tracking issue.
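To illustrate the rule described above, here is a minimal sketch, not the actual rustc code. It assumes that "horizontal whitespace" means tab and space only, following the horizontal-space category of Unicode's Pattern_White_Space breakdown; `parse_opening_fence` is a hypothetical helper, and the infostring rules are simplified (any whitespace inside the infostring is rejected).

```rust
/// Horizontal whitespace: tab and space only.
/// Assumption: this mirrors the intent of the commit, not the exact rustc helper.
fn is_horizontal_whitespace(c: char) -> bool {
    matches!(c, '\u{0009}' | '\u{0020}')
}

/// Hypothetical helper: extracts the infostring from a frontmatter opening
/// line such as `--- cargo`. Only horizontal whitespace may surround the
/// infostring; anything else that counts as whitespace makes the line invalid.
fn parse_opening_fence(line: &str) -> Option<&str> {
    // The fence is a run of at least three dashes at the start of the line.
    let rest = line.trim_start_matches('-');
    let dashes = line.len() - rest.len();
    if dashes < 3 {
        return None;
    }
    // Strip horizontal whitespace on both sides of the infostring.
    let info = rest.trim_matches(is_horizontal_whitespace);
    // Whatever whitespace is left (e.g. U+2028 LINE SEPARATOR) is an error.
    if info.chars().any(|c| c.is_whitespace()) {
        return None;
    }
    Some(info)
}
```

With this sketch, `--- cargo` and `---<TAB>cargo` both yield the infostring `cargo`, while a line ending in a Unicode line separator is rejected instead of being silently trimmed.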
2025-08-22 | Migrate `BuiltinLintDiag::HiddenUnicodeCodepoints` to use `LintDiagnostic` directly | Josh Triplett | -9/+9
2025-08-18 | ignore frontmatters in `TokenStream::new` | Deadbeef | -1/+2
2025-07-22 | Clean code for `rustc_parse/src/lexer` | xizheyin | -85/+83
1. Rename `make_unclosed_delims_error` and return `Vec<Diag>`
2. Change magic number `unclosed_delimiter_show_limit` to const
3. Move `eof_err` below parsing logic
4. Add `calculate_spacing` for `bump` and `bump_minimal`
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-07-18 | Deduplicate `unmatched_delims` in `rustc_parse` to reduce confusion | xizheyin | -9/+26
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-06-23 | update to literal-escaper 0.0.4 for better API without `unreachable` and faster string parsing | Marijn Schouten | -62/+32
2025-05-27 | Report text_direction_codepoint_in_literal when parsing | Matthew Jasper | -2/+87
- The lint is now reported in code that gets removed/modified/duplicated by macro expansion.
- Spans are more accurate
- Fixes #140281
2025-05-05 | Implement RFC 3503: frontmatters | Deadbeef | -7/+99
Supersedes #137193
2025-04-23 | Rollup merge of #140175 - Kivooeo:new-fix-one, r=compiler-errors | Chris Denton | -2/+2
`rc""` more clear error message
Here is a small fix that provides a better error message when a user tries to use `rc""`, the same way it was done for `rb""`. Example of how it works:
```rust
  |
2 |     rc"\n";
  |     ^^ unknown prefix
  |
  = note: prefixed identifiers and literals are reserved since Rust 2021
help: use `cr` for a raw C-string
  |
2 -     rc"\n";
2 +     cr"\n";
  |
```
**related issue** fixes #140170
cc `@cyrgani` (issue author)
2025-04-23 | rc and cr more clear error message | Kivooeo | -2/+2
2025-04-22 | Rename `open_brace` to `open_delimiters` | xizheyin | -23/+26
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-22 | Move make_unclosed_delims_error to lexer/diagnostics.rs | xizheyin | -1/+26
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-21 | Remove `token::{Open,Close}Delim`. | Nicholas Nethercote | -59/+52
By replacing them with `{Open,Close}{Paren,Brace,Bracket,Invisible}`.
PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing with delimiters. It also makes `ast::TokenKind` more similar to `parser::TokenType`.
This requires a few new methods:
- `TokenKind::is_{,open_,close_}delim()` replace various kinds of pattern matches.
- `Delimiter::as_{open,close}_token_kind` are used to convert `Delimiter` values to `TokenKind`.
Despite these additions, it's a net reduction in lines of code. This is because e.g. `token::OpenParen` is so much shorter than `token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms reduce to single line forms. And many places where the number of lines doesn't change are still easier to read, just because the names are shorter, e.g.:
```
-        } else if self.token != token::CloseDelim(Delimiter::Brace) {
+        } else if self.token != token::CloseBrace {
```
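The shape of this change can be sketched with simplified stand-in types. The enums below are illustrative assumptions, not the real `rustc_ast` definitions: the invisible delimiter is omitted, `Comma` is just a stand-in for any non-delimiter token, and the method signatures are guesses based on the names given in the commit message.

```rust
// Simplified stand-ins for the real types (assumption: no invisible delimiters).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

// Fieldless delimiter variants instead of `OpenDelim(Delimiter)` / `CloseDelim(Delimiter)`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TokenKind {
    OpenParen,
    CloseParen,
    OpenBrace,
    CloseBrace,
    OpenBracket,
    CloseBracket,
    Comma,
}

impl Delimiter {
    /// Converts a delimiter to its opening token kind.
    fn as_open_token_kind(self) -> TokenKind {
        match self {
            Delimiter::Parenthesis => TokenKind::OpenParen,
            Delimiter::Brace => TokenKind::OpenBrace,
            Delimiter::Bracket => TokenKind::OpenBracket,
        }
    }

    /// Converts a delimiter to its closing token kind.
    fn as_close_token_kind(self) -> TokenKind {
        match self {
            Delimiter::Parenthesis => TokenKind::CloseParen,
            Delimiter::Brace => TokenKind::CloseBrace,
            Delimiter::Bracket => TokenKind::CloseBracket,
        }
    }
}

impl TokenKind {
    /// Replaces pattern matches of the form `OpenDelim(_)`.
    fn is_open_delim(self) -> bool {
        matches!(self, TokenKind::OpenParen | TokenKind::OpenBrace | TokenKind::OpenBracket)
    }

    /// Replaces pattern matches of the form `CloseDelim(_)`.
    fn is_close_delim(self) -> bool {
        matches!(self, TokenKind::CloseParen | TokenKind::CloseBrace | TokenKind::CloseBracket)
    }

    /// True for any opening or closing delimiter.
    fn is_delim(self) -> bool {
        self.is_open_delim() || self.is_close_delim()
    }
}
```

The trade-off is the one the commit message names: more variants and a few helper methods, in exchange for shorter names at every use site.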
2025-04-14 | Auto merge of #124141 - nnethercote:rm-Nonterminal-and-TokenKind-Interpolated, r=petrochenkov | bors | -1/+1
Remove `Nonterminal` and `TokenKind::Interpolated`
A third attempt at this; the first attempt was #96724 and the second was #114647.
r? `@ghost`
2025-04-04 | Replace `rustc_lexer/unescape` with `rustc-literal-escaper` crate | Guillaume Gomez | -8/+4
2025-04-02 | Impl `Copy` for `Token` and `TokenKind`. | Nicholas Nethercote | -1/+1
2025-03-18 | Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu" | Ralf Jung | -4/+4
This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-03-17 | Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu | Jacob Pratt | -4/+4
Add `*_value` methods to proc_macro lib
This is the implementation of https://github.com/rust-lang/libs-team/issues/459. It allows getting the actual (unescaped) value of the different string literals. Part of https://github.com/rust-lang/rust/issues/136652.
r? libs-api
2025-03-03 | Rename `ast::TokenKind::Not` as `ast::TokenKind::Bang`. | Nicholas Nethercote | -2/+2
For consistency with `rustc_lexer::TokenKind::Bang`, and because other `ast::TokenKind` variants generally have syntactic names instead of semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
2025-03-03 | Replace `ast::TokenKind::BinOp{,Eq}` and remove `BinOpToken`. | Nicholas Nethercote | -13/+13
`BinOpToken` is badly named, because it only covers the assignable binary ops and excludes comparisons and `&&`/`||`. Its use in `ast::TokenKind` does allow a small amount of code sharing, but it's a clumsy factoring. This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one with 10 individual variants. This makes `ast::TokenKind` more similar to `rustc_lexer::TokenKind`, which has individual variants for all operators. Although the number of lines of code increases, the number of chars decreases due to the frequent use of shorter names like `token::Plus` instead of `token::BinOp(BinOpToken::Plus)`.
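A before/after sketch of this refactoring, using heavily reduced stand-in enums (only three of the ten operators are shown; the real `ast::TokenKind` has many more variants). The `flatten` function is purely illustrative and does not exist in rustc; it just makes the one-to-one mapping between the two shapes explicit.

```rust
// Old shape: compound variants carrying a `BinOpToken` payload.
mod before {
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum BinOpToken {
        Plus,
        Minus,
        Star,
    }

    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum TokenKind {
        BinOp(BinOpToken),
        BinOpEq(BinOpToken),
        Eq,
    }
}

// New shape: one fieldless variant per operator.
mod after {
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub enum TokenKind {
        Plus,
        Minus,
        Star,
        PlusEq,
        MinusEq,
        StarEq,
        Eq,
    }
}

/// Hypothetical helper mapping each old token kind to its flattened equivalent.
fn flatten(old: before::TokenKind) -> after::TokenKind {
    match old {
        before::TokenKind::BinOp(before::BinOpToken::Plus) => after::TokenKind::Plus,
        before::TokenKind::BinOp(before::BinOpToken::Minus) => after::TokenKind::Minus,
        before::TokenKind::BinOp(before::BinOpToken::Star) => after::TokenKind::Star,
        before::TokenKind::BinOpEq(before::BinOpToken::Plus) => after::TokenKind::PlusEq,
        before::TokenKind::BinOpEq(before::BinOpToken::Minus) => after::TokenKind::MinusEq,
        before::TokenKind::BinOpEq(before::BinOpToken::Star) => after::TokenKind::StarEq,
        before::TokenKind::Eq => after::TokenKind::Eq,
    }
}
```

At use sites the difference is `TokenKind::Plus` versus `TokenKind::BinOp(BinOpToken::Plus)`, which is where the reduction in characters comes from.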
2025-02-10 | Extract `unescape` from `rustc_lexer` into its own crate | Guillaume Gomez | -4/+4
2025-01-28 | replaces few consts with statics to reduce readonly section | klensy | -1/+1
2024-12-18 | Re-export more `rustc_span::symbol` things from `rustc_span`. | Nicholas Nethercote | -4/+2
`rustc_span::symbol` defines some things that are re-exported from `rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some closely related things such as `Ident` and `kw`. So you can do `use rustc_span::{Symbol, sym}` but you have to do `use rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good reason. This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`, and changes many `rustc_span::symbol::` qualifiers in `compiler/` to `rustc_span::`. This is a 200+ net line of code reduction, mostly because many files with two `use rustc_span` items can be reduced to one.
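The re-export pattern can be sketched with a toy module tree. The module `span` stands in for the `rustc_span` crate, and the types and the `kw::FN` constant are hypothetical, simplified placeholders rather than the real definitions.

```rust
// `span` stands in for the `rustc_span` crate (assumption: toy types only).
mod span {
    pub mod symbol {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Symbol(pub u32);

        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct Ident {
            pub name: Symbol,
        }

        pub mod kw {
            // Hypothetical keyword symbol; the index is arbitrary.
            pub const FN: super::Symbol = super::Symbol(0);
        }
    }

    // Re-exporting everything closely related makes the short paths
    // `span::Ident` and `span::kw` work alongside `span::Symbol`.
    pub use self::symbol::{kw, Ident, Symbol};
}

/// Uses only the short, re-exported paths.
fn fn_keyword_ident() -> span::Ident {
    span::Ident { name: span::kw::FN }
}

/// The short and long paths name the same type, so this is the identity.
fn same_type(id: span::symbol::Ident) -> span::Ident {
    id
}
```

Callers then need a single `use` of the parent module's items instead of one for the parent and one for `symbol`.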
2024-12-13 | Remove `Lexer`'s dependency on `Parser`. | Nicholas Nethercote | -97/+37
Lexing precedes parsing, as you'd expect: `Lexer` creates a `TokenStream` and `Parser` then parses that `TokenStream`. But, in a horrendous violation of layering abstractions and common sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err` method does some error recovery that relies on creating a `Parser` to do some post-processing of the `TokenStream` that the `Lexer` just created.
This commit just removes `unclosed_delim_err`. This change removes `Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s return value can have a more typical form.
The cost is slightly worse error messages in two obscure cases, as shown in these tests:
- tests/ui/parser/brace-in-let-chain.rs: there is slightly less explanation in this case involving an extra `{`.
- tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff marker detection is no longer supported (because that detection is implemented in the parser).
In my opinion this cost is outweighed by the magnitude of the code cleanup.
2024-12-12 | Remove `PErr`. | Nicholas Nethercote | -7/+7
It's just a synonym for `Diag` that adds no value and is only used in a few places.
2024-12-09 | Fix typo in RFC mention 3598 -> 3593 | Esteban Küber | -1/+1
https://github.com/rust-lang/rfcs/blob/master/text/3593-unprefixed-guarded-strings.md
2024-12-01 | Only error raw lifetime followed by \' in edition 2021+ | Michael Goulet | -2/+21
2024-11-28 | Rollup merge of #133487 - pitaj:reserve-guarded-strings, r=fee1-dead | Guillaume Gomez | -6/+10
fix confusing diagnostic for reserved `##`
Closes #131615
2024-11-25 | fix confusing diagnostic for reserved `##` | Peter Jaszkowiak | -6/+10
2024-11-25 | Streamline `lex_token_trees` error handling. | Nicholas Nethercote | -20/+14
- Use iterators instead of `for` loops.
- Use `if`/`else` instead of `match`.
2024-11-25 | Fix some formatting. | Nicholas Nethercote | -5/+15
Must be one of those cases where the function is too long and rustfmt bails out.
2024-11-25 | Split `Lexer::bump`. | Nicholas Nethercote | -7/+27
It has two different ways of being called.
2024-11-25 | Merge `TokenTreesReader` into `StringReader`. | Nicholas Nethercote | -49/+31
There is a not-very-useful layering in the lexer, where `TokenTreesReader` contains a `StringReader`. This commit combines them and names the result `Lexer`, which is a more obvious name for it. The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs` which isn't ideal, but it doesn't seem worth moving a bunch of code to avoid it.
2024-11-21 | Prepare for invisible delimiters. | Nicholas Nethercote | -4/+12
Current places where `Interpolated` is used are going to change to instead use invisible delimiters. This prepares for that.
- It adds invisible delimiter cases to the `can_begin_*`/`may_be_*` methods and the `failed_to_match_macro` that are equivalent to the existing `Interpolated` cases.
- It adds panics/asserts in some places where invisible delimiters should never occur.
- In `Parser::parse_struct_fields` it excludes an ident + invisible delimiter from special consideration in an error message, because that's quite different to an ident + paren/brace/bracket.
2024-11-19 | Remove `TokenKind::InvalidPrefix`. | Nicholas Nethercote | -3/+2
It was added in #123752 to handle some cases involving emoji, but it isn't necessary because it's always treated the same as `TokenKind::InvalidIdent`. This commit removes it, which makes things a little simpler.
2024-10-30 | Enforce that raw lifetime identifiers must be valid raw identifiers | Michael Goulet | -4/+10
2024-10-23 | "innermost", "outermost", "leftmost", and "rightmost" don't need hyphens | Josh Triplett | -1/+1
These are all standard dictionary words and don't require hyphenation.
2024-10-08 | Reserve guarded string literals (RFC 3593) | Peter Jaszkowiak | -1/+83
2024-09-22 | Reformat using the new identifier sorting from rustfmt | Michael Goulet | -5/+5
2024-09-17 | Store raw ident span for raw lifetime | Michael Goulet | -0/+3
2024-09-09 | Remove needless returns detected by clippy in the compiler | Eduardo Sánchez Muñoz | -1/+1
2024-09-06 | Add some more tests | Michael Goulet | -1/+1
2024-09-06 | Add initial support for raw lifetimes | Michael Goulet | -3/+36
2024-09-06 | Format lexer | Michael Goulet | -19/+22
2024-09-06 | Reserve prefix lifetimes too | Michael Goulet | -0/+10
2024-08-14 | Use `impl PartialEq<TokenKind> for Token` more. | Nicholas Nethercote | -1/+1
This lets us compare a `Token` with a `TokenKind`. It's used a lot, but can be used even more, avoiding the need for some `.kind` uses.
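A minimal sketch of the comparison this enables, using simplified stand-ins for `Token`, `TokenKind`, and `Span` (the real types carry far more information). The helpers `token` and `is_statement_end` are hypothetical and exist only for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum TokenKind {
    Comma,
    Semi,
    Eof,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    lo: u32,
    hi: u32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

// Comparing a `Token` against a `TokenKind` ignores the span entirely.
impl PartialEq<TokenKind> for Token {
    fn eq(&self, rhs: &TokenKind) -> bool {
        self.kind == *rhs
    }
}

/// Hypothetical constructor with a dummy span, to keep the example short.
fn token(kind: TokenKind) -> Token {
    Token { kind, span: Span { lo: 0, hi: 1 } }
}

/// Hypothetical check written without any `.kind` accesses.
fn is_statement_end(token: &Token) -> bool {
    *token == TokenKind::Semi || *token == TokenKind::Eof
}
```

Without the extra impl, the body of `is_statement_end` would have to read `token.kind == TokenKind::Semi || token.kind == TokenKind::Eof`.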
2024-07-30 | Auto merge of #127955 - chenyukang:yukang-fix-mismatched-delimiter-issue-12786, r=nnethercote | bors | -3/+18
Add limit for unclosed delimiters in lexer diagnostic
Fixes #127868
The first commit shows the original diagnostic, and the second commit shows the changes.
2024-07-29 | Reformat `use` declarations. | Nicholas Nethercote | -19/+23
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.