about summary refs log tree commit diff
path: root/compiler/rustc_parse/src/lexer/mod.rs
AgeCommit message (Collapse)AuthorLines
2025-09-09Strip frontmatter in fewer placesLeón Orell Valerian Liehr-1/+1
2025-09-06Disallow shebang in `--cfg` and `--check-cfg` argumentsUrgau-5/+30
2025-09-01fix(lexer): Only allow horizontal whitespace in frontmatterEd Page-3/+3
In writing up the reference for frontmatter, I realized that we probably shouldn't be accepting Unicode Line Ending characters between the code fence and infostring or trailing after the infostring or a code fence. In digging into the unicode specification we use for Whitespace, it divides it up into categories, so I'm deferring to what it says for horizontal whitespace for what should be used within a line. Note, I am leaving out support for Unicde Default Ignorable characters. I figure that can be discussed outside of this change within the reference and tracking issue.
2025-08-22Migrate `BuiltinLintDiag::HiddenUnicodeCodepoints` to use `LintDiagnostic` ↵Josh Triplett-9/+9
directly
2025-08-18ignore frontmatters in `TokenStream::new`Deadbeef-1/+2
2025-07-22Clean code for `rustc_parse/src/lexer`xizheyin-11/+7
1. Rename `make_unclosed_delims_error` and return `Vec<Diag>` 2. change magic number `unclosed_delimiter_show_limit` to const 3. move `eof_err` below parsing logic 4. Add `calculate_spacing` for `bump` and `bump_minimal` Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-06-23update to literal-escaper 0.0.4 for better API without `unreachable` and ↵Marijn Schouten-62/+32
faster string parsing
2025-05-27Report text_direction_codepoint_in_literal when parsingMatthew Jasper-2/+87
- The lint is now reported in code that gets removed/modified/duplicated by macro expansion. - Spans are more accurate - Fixes #140281
2025-05-05Implement RFC 3503: frontmattersDeadbeef-7/+99
Supercedes #137193
2025-04-23Rollup merge of #140175 - Kivooeo:new-fix-one, r=compiler-errorsChris Denton-2/+2
`rc""` more clear error message here is small fix that provides better error message when user is trying to use `rc""` the same way it was made for `rb""` example of it's work ```rust | 2 | rc"\n"; | ^^ unknown prefix | = note: prefixed identifiers and literals are reserved since Rust 2021 help: use `cr` for a raw C-string | 2 - rc"\n"; 2 + cr"\n"; | ``` **related issue** fixes #140170 cc `@cyrgani` (issue author)
2025-04-23rc and cr more clear error messageKivooeo-2/+2
2025-04-22Move make_unclosed_delims_error to lexer/diagonostics.rsxizheyin-1/+2
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-04-21Remove `token::{Open,Close}Delim`.Nicholas Nethercote-6/+6
By replacing them with `{Open,Close}{Param,Brace,Bracket,Invisible}`. PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing with delimiters. It also makes `ast::TokenKind` more similar to `parser::TokenType`. This requires a few new methods: - `TokenKind::is_{,open_,close_}delim()` replace various kinds of pattern matches. - `Delimiter::as_{open,close}_token_kind` are used to convert `Delimiter` values to `TokenKind`. Despite these additions, it's a net reduction in lines of code. This is because e.g. `token::OpenParen` is so much shorter than `token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms reduce to single line forms. And many places where the number of lines doesn't change are still easier to read, just because the names are shorter, e.g.: ``` - } else if self.token != token::CloseDelim(Delimiter::Brace) { + } else if self.token != token::CloseBrace { ```
2025-04-04Replace `rustc_lexer/unescape` with `rustc-literal-escaper` crateGuillaume Gomez-7/+3
2025-03-18Revert "Rollup merge of #136355 - ↵Ralf Jung-3/+3
GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu" This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-03-17Rollup merge of #136355 - ↵Jacob Pratt-3/+3
GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu Add `*_value` methods to proc_macro lib This is the implementation of https://github.com/rust-lang/libs-team/issues/459. It allows to get the actual value (unescaped) of the different string literals. Part of https://github.com/rust-lang/rust/issues/136652. r? libs-api
2025-03-03Rename `ast::TokenKind::Not` as `ast::TokenKind::Bang`.Nicholas Nethercote-1/+1
For consistency with `rustc_lexer::TokenKind::Bang`, and because other `ast::TokenKind` variants generally have syntactic names instead of semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
2025-03-03Replace `ast::TokenKind::BinOp{,Eq}` and remove `BinOpToken`.Nicholas Nethercote-8/+8
`BinOpToken` is badly named, because it only covers the assignable binary ops and excludes comparisons and `&&`/`||`. Its use in `ast::TokenKind` does allow a small amount of code sharing, but it's a clumsy factoring. This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one with 10 individual variants. This makes `ast::TokenKind` more similar to `rustc_lexer::TokenKind`, which has individual variants for all operators. Although the number of lines of code increases, the number of chars decreases due to the frequent use of shorter names like `token::Plus` instead of `token::BinOp(BinOpToken::Plus)`.
2025-02-10Extract `unescape` from `rustc_lexer` into its own crateGuillaume Gomez-3/+3
2024-12-18Re-export more `rustc_span::symbol` things from `rustc_span`.Nicholas Nethercote-2/+1
`rustc_span::symbol` defines some things that are re-exported from `rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some closely related things such as `Ident` and `kw`. So you can do `use rustc_span::{Symbol, sym}` but you have to do `use rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good reason. This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`, and changes many `rustc_span::symbol::` qualifiers in `compiler/` to `rustc_span::`. This is a 200+ net line of code reduction, mostly because many files with two `use rustc_span` items can be reduced to one.
2024-12-13Remove `Lexer`'s dependency on `Parser`.Nicholas Nethercote-17/+23
Lexing precedes parsing, as you'd expect: `Lexer` creates a `TokenStream` and `Parser` then parses that `TokenStream`. But, in a horrendous violation of layering abstractions and common sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err` method does some error recovery that relies on creating a `Parser` to do some post-processing of the `TokenStream` that the `Lexer` just created. This commit just removes `unclosed_delim_err`. This change removes `Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s return value can have a more typical form. The cost is slightly worse error messages in two obscure cases, as shown in these tests: - tests/ui/parser/brace-in-let-chain.rs: there is slightly less explanation in this case involving an extra `{`. - tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff marker detection is no longer supported (because that detection is implemented in the parser). In my opinion this cost is outweighed by the magnitude of the code cleanup.
2024-12-09Fix typo in RFC mention 3598 -> 3593Esteban Küber-1/+1
https://github.com/rust-lang/rfcs/blob/master/text/3593-unprefixed-guarded-strings.md
2024-12-01Only error raw lifetime followed by \' in edition 2021+Michael Goulet-2/+21
2024-11-28Rollup merge of #133487 - pitaj:reserve-guarded-strings, r=fee1-deadGuillaume Gomez-6/+10
fix confusing diagnostic for reserved `##` Closes #131615
2024-11-25fix confusing diagnostic for reserved `##`Peter Jaszkowiak-6/+10
2024-11-25Streamline `lex_token_trees` error handling.Nicholas Nethercote-20/+14
- Use iterators instead of `for` loops. - Use `if`/`else` instead of `match`.
2024-11-25Fix some formatting.Nicholas Nethercote-5/+15
Must be one of those cases where the function is too long and rustfmt bails out.
2024-11-25Merge `TokenTreesReader` into `StringReader`.Nicholas Nethercote-6/+15
There is a not-very-useful layering in the lexer, where `TokenTreesReader` contains a `StringReader`. This commit combines them and names the result `Lexer`, which is a more obvious name for it. The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs` which isn't ideal, but it doesn't seem worth moving a bunch of code to avoid it.
2024-11-19Remove `TokenKind::InvalidPrefix`.Nicholas Nethercote-3/+2
It was added in #123752 to handle some cases involving emoji, but it isn't necessary because it's always treated the same as `TokenKind::InvalidIdent`. This commit removes it, which makes things a little simpler.
2024-10-30Enforce that raw lifetime identifiers must be valid raw identifiersMichael Goulet-4/+10
2024-10-08Reserve guarded string literals (RFC 3593)Peter Jaszkowiak-1/+83
2024-09-22Reformat using the new identifier sorting from rustfmtMichael Goulet-2/+2
2024-09-17Store raw ident span for raw lifetimeMichael Goulet-0/+3
2024-09-06Add some more testsMichael Goulet-1/+1
2024-09-06Add initial support for raw lifetimesMichael Goulet-3/+36
2024-09-06Format lexerMichael Goulet-19/+22
2024-09-06Reserve prefix lifetimes tooMichael Goulet-0/+10
2024-07-29Reformat `use` declarations.Nicholas Nethercote-7/+8
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-06-18Use a dedicated type instead of a reference for the diagnostic contextOli Scherer-2/+2
This paves the way for tracking more state (e.g. error tainting) in the diagnostic context handle
2024-06-18Prefer `dcx` methods over fields or fields' methodsOli Scherer-5/+4
2024-06-05Don't use the word "parse" for lexing operations.Nicholas Nethercote-2/+2
Lexing converts source text into a token stream. Parsing converts a token stream into AST fragments. This commit renames several lexing operations that have "parse" in the name. I think these names have been subtly confusing me for years. This is just a `s/parse/lex/` on function names, with one exception: `parse_stream_from_source_str` becomes `source_str_to_stream`, to make it consistent with the existing `source_file_to_stream`. The commit also moves that function's location in the file to be just above `source_file_to_stream`. The commit also cleans up a few comments along the way.
2024-05-23Remove `#[macro_use] extern crate tracing` from `rustc_parse`.Nicholas Nethercote-0/+1
2024-05-21Rename buffer_lint_with_diagnostic to buffer_lintXiretza-2/+2
2024-05-21Generate lint diagnostic message from BuiltinLintDiagXiretza-3/+1
Translation of the lint message happens when the actual diagnostic is created, not when the lint is buffered. Generating the message from BuiltinLintDiag ensures that all required data to construct the message is preserved in the LintBuffer, eventually allowing the messages to be moved to fluent. Remove the `msg` field from BufferedEarlyLint, it is either generated from the data in the BuiltinLintDiag or stored inside BuiltinLintDiag::Normal.
2024-05-07narrow down visibilities in `rustc_parse::lexer`Lin Yihai-1/+1
2024-04-18Rollup merge of #123752 - estebank:emoji-prefix, r=wesleywiserJubilee-1/+4
Properly handle emojis as literal prefix in macros Do not accept the following ```rust macro_rules! lexes {($($_:tt)*) => {}} lexes!(🐛"foo"); ``` Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro pre-expansion of literal prefixes. Fix #123696.
2024-04-18Simplify `static_assert_size`s.Nicholas Nethercote-1/+1
We want to run them on all 64-bit platforms.
2024-04-12Rollup merge of #123223 - estebank:issue-123079, r=pnkfelixMatthias Krüger-13/+7
Fix invalid silencing of parsing error Given ```rust macro_rules! a { ( ) => { impl<'b> c for d { e::<f'g> } }; } ``` ensure an error is emitted. Fix #123079.
2024-04-10Properly handle emojis as literal prefix in macrosEsteban Küber-1/+4
Do not accept the following ```rust macro_rules! lexes {($($_:tt)*) => {}} lexes!(🐛"foo"); ``` Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes. Fix #123696.
2024-04-08parser: reduce visibility of unnecessary public `UnmatchedDelim`Yutaro Ohno-2/+1
`lexer::UnmatchedDelim` struct in `rustc_parse` is unnecessary public outside of the crate. This commit reduces the visibility to `pub(crate)`. Beside, this removes unnecessary field `expected_delim` that causes warnings after changing the visibility.