path: root/compiler/rustc_ast/src/util/literal.rs
Age  Commit message  Author  Lines
2025-07-31Deduplicate `IntTy`/`UintTy`/`FloatTy`.Nicholas Nethercote-3/+3
There are identical definitions in `rustc_type_ir` and `rustc_ast`. This commit removes them and places a single definition in `rustc_ast_ir`. This requires adding `rustc_span` as a dependency of `rustc_ast_ir`, but means a bunch of silly conversion functions can be removed. The one annoying wrinkle is that the old versions had differences in their `Debug` impls, e.g. one printed `u32` while the other printed `U32`. Some compiler error messages rely on the former (yuk), and some clippy output depends on the latter. So the commit also changes clippy to not rely on `Debug` and just implement what it needs itself.
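The `Debug` wrinkle above can be sketched roughly as follows. This is a minimal illustration, not the real definition: the two-variant enum, the hand-written `Debug` impl, and the `variant_name` helper are all assumptions made for the example, including the assumption that the shared type keeps the lower-case output the compiler messages rely on.

```rust
use std::fmt;

/// Simplified stand-in for the shared `UintTy` enum; only two variants shown.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum UintTy {
    U8,
    U32,
}

/// Assumed `Debug` behaviour for this sketch: print the lower-case type name.
impl fmt::Debug for UintTy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            UintTy::U8 => "u8",
            UintTy::U32 => "u32",
        })
    }
}

/// Hypothetical helper of the kind a lint could define for itself, so its
/// output no longer depends on how `Debug` happens to be implemented.
pub fn variant_name(ty: UintTy) -> &'static str {
    match ty {
        UintTy::U8 => "U8",
        UintTy::U32 => "U32",
    }
}
```

With this split, changing the `Debug` output later would not silently change what the lint prints.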
2025-07-08update to literal-escaper-0.0.5Marijn Schouten-3/+3
2025-06-30Introduce `ByteSymbol`.Nicholas Nethercote-13/+12
It's like `Symbol` but for byte strings. The interner is now used for both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"` you'll get a `Symbol` and a `ByteSymbol` with the same index and the characters will only be stored once. The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate `ast::LitKind` in HIR. The latter change reduces peak memory by a non-trivial amount on literal-heavy benchmarks such as `deep-vector` and `tuple-stress`. `Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some changes so that they can handle normal strings and byte strings. This change does slow down compilation of programs that use `include_bytes!` on large files, because the contents of those files are now interned (hashed). This makes `include_bytes!` more similar to `include_str!`, though `include_bytes!` contents still aren't escaped, and hashing is still much cheaper than escaping.
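The shared-index behaviour described above can be illustrated with a small sketch. Everything here is hypothetical (the `Interner` type, its methods, and the newtype wrappers); it only assumes that the table is keyed on raw bytes, so a string and a byte string with the same contents map to the same entry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for `Symbol` and `ByteSymbol`: both wrap an index
/// into the same interner table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ByteSymbol(u32);

impl Symbol {
    pub fn index(self) -> u32 {
        self.0
    }
}

impl ByteSymbol {
    pub fn index(self) -> u32 {
        self.0
    }
}

/// Illustrative interner keyed on raw bytes. Each distinct byte sequence is
/// stored once, as a key of the map.
#[derive(Default)]
pub struct Interner {
    indices: HashMap<Vec<u8>, u32>,
}

impl Interner {
    /// Return the existing index for `bytes`, or assign the next free one.
    fn intern_bytes(&mut self, bytes: &[u8]) -> u32 {
        if let Some(&idx) = self.indices.get(bytes) {
            return idx;
        }
        let idx = self.indices.len() as u32;
        self.indices.insert(bytes.to_vec(), idx);
        idx
    }

    pub fn intern_str(&mut self, s: &str) -> Symbol {
        Symbol(self.intern_bytes(s.as_bytes()))
    }

    pub fn intern_byte_str(&mut self, b: &[u8]) -> ByteSymbol {
        ByteSymbol(self.intern_bytes(b))
    }

    /// Number of distinct byte sequences stored so far.
    pub fn entry_count(&self) -> usize {
        self.indices.len()
    }
}
```

Interning `"dog"` and then `b"dog"` with this sketch yields two handles with the same index and a single stored entry, and both handle types are `Copy`.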
2025-06-23update to literal-escaper 0.0.4 for better API without `unreachable` and faster string parsingMarijn Schouten-8/+7
2025-04-04Replace `rustc_lexer/unescape` with `rustc-literal-escaper` crateGuillaume Gomez-1/+1
2025-03-18Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"Ralf Jung-1/+1
This reverts commit 08dfbf49e30d917c89e49eb14cb3f1e8b8a1c9ef, reversing changes made to 10bcdad7df0de3cfb95c7bdb7b16908e73cafc09.
2025-02-10Extract `unescape` from `rustc_lexer` into its own crateGuillaume Gomez-1/+1
2025-02-03tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`Askar Safin-2/+2
2024-12-18Re-export more `rustc_span::symbol` things from `rustc_span`.Nicholas Nethercote-2/+1
`rustc_span::symbol` defines some things that are re-exported from `rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some closely related things such as `Ident` and `kw`. So you can do `use rustc_span::{Symbol, sym}` but you have to do `use rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good reason. This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`, and changes many `rustc_span::symbol::` qualifiers in `compiler/` to `rustc_span::`. This is a net reduction of 200+ lines of code, mostly because many files with two `use rustc_span` items can be reduced to one.
2024-09-22Reformat using the new identifier sorting from rustfmtMichael Goulet-2/+2
2024-07-29Reformat `use` declarations.Nicholas Nethercote-3/+5
The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-04-30Remove `extern crate tracing` from numerous crates.Nicholas Nethercote-0/+1
2024-03-14Add compiler support for parsing `f16` and `f128`Trevor Gross-0/+2
2024-02-15Add suffixes to `LitError`.Nicholas Nethercote-11/+13
To avoid some unwrapping.
2024-02-15Add `ErrorGuaranteed` to `ast::LitKind::Err`, `token::LitKind::Err`.Nicholas Nethercote-3/+3
This mostly works well, and eliminates a couple of delayed bugs. One annoying thing is that we should really also add an `ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`, so we have to fake it.
2024-02-15Remove `LitError::LexerError`.Nicholas Nethercote-8/+3
`cook_lexer_literal` can emit an error about an invalid int literal but then return a non-`Err` token. And then `integer_lit` has to account for this to avoid printing a redundant error message. This commit changes `cook_lexer_literal` to return `Err` in that case. Then `integer_lit` doesn't need the special case, and `LitError::LexerError` can be removed.
2024-01-25Rename the unescaping functions.Nicholas Nethercote-5/+4
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string` becomes `unescape_mixed`. Because rfc3349 will mean that C string literals will no longer be the only mixed utf8 literals.
2024-01-25Rework `CStrUnit`.Nicholas Nethercote-3/+3
- Rename it as `MixedUnit`, because it will soon be used in more than just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for `\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.
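A rough sketch of the shape described in the list above, to show why restricting `HighByte` to `\x80`..`\xff` removes the ambiguity. The variant names come from the commit message; the two helper methods are hypothetical and exist only for this example.

```rust
/// Illustrative version of the unit described above: one element of a
/// literal that can mix Unicode chars and raw high bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MixedUnit {
    /// Any Unicode scalar value, including every ASCII character.
    Char(char),
    /// A byte in the range 0x80..=0xff, as written with `\x80`..`\xff`.
    HighByte(u8),
}

impl MixedUnit {
    /// Hypothetical helper: classify the byte produced by a `\xNN` escape.
    /// ASCII bytes always become `Char`, so each value has one encoding.
    pub fn from_hex_escape(byte: u8) -> MixedUnit {
        if byte.is_ascii() {
            MixedUnit::Char(byte as char)
        } else {
            MixedUnit::HighByte(byte)
        }
    }

    /// Hypothetical helper: append this unit's bytes to a buffer. Chars are
    /// written as UTF-8, high bytes are written verbatim.
    pub fn push_to(self, buf: &mut Vec<u8>) {
        match self {
            MixedUnit::Char(c) => {
                let mut tmp = [0u8; 4];
                buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            }
            MixedUnit::HighByte(b) => buf.push(b),
        }
    }
}
```

Because `from_hex_escape` never produces `HighByte` for an ASCII value, two units compare equal exactly when they denote the same content.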
2024-01-25Avoid useless checking in `from_token_lit`.Nicholas Nethercote-62/+21
The parser already does a check-only unescaping which catches all errors. So the checking done in `from_token_lit` never hits. But literals causing warnings can still occur in `from_token_lit`. So the commit changes `str-escape.rs` to use byte string literals and C string literals as well, to give better coverage and ensure the new assertions in `from_token_lit` are correct.
2024-01-19Pack the u128 in LitKind::IntJosh Stone-1/+1
2024-01-12Detect `NulInCStr` error earlier.Nicholas Nethercote-10/+2
By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars. NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together. One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.
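To make the off-by-one concrete, here is a small sketch of how the opening delimiter length differs between `c"` and `cr"`. The `contents_offset` function is hypothetical and is not the code that was removed; it also handles `cr#"` forms, which the commit message does not discuss.

```rust
/// Hypothetical helper: byte offset of the first content byte in the source
/// text of a C string literal, i.e. the length of its opening delimiter.
/// Returns `None` if the text does not start like a C string literal.
pub fn contents_offset(literal: &str) -> Option<usize> {
    let rest = literal.strip_prefix('c')?;
    if let Some(raw) = rest.strip_prefix('r') {
        // Raw form: `c`, `r`, zero or more `#`, then the opening quote.
        let hashes = raw.bytes().take_while(|&b| b == b'#').count();
        raw[hashes..].starts_with('"').then_some(3 + hashes)
    } else {
        // Cooked form: `c` followed directly by the opening quote.
        rest.starts_with('"').then_some(2)
    }
}
```

A hardwired `+2` matches only the cooked branch; for `cr"..."` the correct offset is 3, which is the one-column error the commit message describes.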
2023-12-13Unify single-char and multi-char `CStrUnit::Char` handling.Nicholas Nethercote-1/+0
The two cases are equivalent. C string literals aren't common so there is no performance need here.
2023-12-13Don't rebuild raw strings when unescaping.Nicholas Nethercote-43/+30
Raw strings don't have escape sequences, so for them "unescaping" just means checking for invalid chars like bare CR. Which means there is no need to rebuild them one char or byte at a time while unescaping, because the unescaped version will be the same. This commit removes that rebuilding. Also, the commit changes things so that "unescaping" is unconditional for raw strings and raw byte strings. That's simpler and they're rare enough that the perf effect is negligible.
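A check-only pass of the kind described above might look like the following sketch. The function name and callback shape are assumptions, and it treats every `\r` as invalid for simplicity; the point is only that nothing is rebuilt, because the contents are already the final value.

```rust
use std::ops::Range;

/// Illustrative check for raw string contents. It reports the byte range of
/// each `\r` through `on_error` and never constructs a new string.
pub fn check_raw_str(contents: &str, mut on_error: impl FnMut(Range<usize>)) {
    for (pos, ch) in contents.char_indices() {
        if ch == '\r' {
            on_error(pos..pos + ch.len_utf8());
        }
    }
}
```

A caller that needs the value of the literal can use `contents` directly once the check has run.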
2023-05-24Use `Option::is_some_and` and `Result::is_ok_and` in the compilerMaybe Waffle-2/+1
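For readers unfamiliar with the two standard-library methods named in this commit, a short illustration follows. The function names and the suffix check are made up for the example; only `Option::is_some_and` and `Result::is_ok_and` are real APIs.

```rust
/// Hypothetical example: true if a literal suffix is present and is one of
/// two float suffixes. Replaces patterns like `map_or(false, ...)`.
pub fn suffix_is_float(suffix: Option<&str>) -> bool {
    suffix.is_some_and(|s| s == "f32" || s == "f64")
}

/// Hypothetical example: true if `text` parses as a `u32` below 256.
pub fn parses_to_small(text: &str) -> bool {
    text.parse::<u32>().is_ok_and(|n| n < 256)
}
```

Both methods return `false` for the empty case (`None` or `Err`) without evaluating the closure.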
2023-05-02make it semantic errorDeadbeef-0/+2
2023-05-02fix TODO commentsDeadbeef-2/+8
2023-05-02update and add a few testsDeadbeef-2/+2
2023-05-02initial step towards implementing C string literalsDeadbeef-1/+54
2023-01-05Fix `uninlined_format_args` for some compiler cratesnils-3/+3
Convert all the crates that have had their diagnostic migration completed (except save_analysis because that will be deleted soon and apfloat because of the licensing problem).
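The kind of change the `uninlined_format_args` lint asks for is shown below on a made-up function; the message text is illustrative and not taken from the compiler.

```rust
/// Before: positional arguments passed separately from the format string.
pub fn describe_old(kind: &str, value: u64) -> String {
    format!("{} literal with value {}", kind, value)
}

/// After: the same output with the variables captured inline.
pub fn describe_new(kind: &str, value: u64) -> String {
    format!("{kind} literal with value {value}")
}
```

Both versions produce identical strings; the change is purely stylistic.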
2023-01-02Print correct base for too-large literalsclubby789-2/+2
Also update tests
2022-12-12Auto merge of #105160 - nnethercote:rm-Lit-token_lit, r=petrochenkovbors-48/+94
Remove `token::Lit` from `ast::MetaItemLit`. Currently `ast::MetaItemLit` represents the literal kind twice. This PR removes that redundancy. Best reviewed one commit at a time. r? `@petrochenkov`
2022-12-05Remove `LitKind::synthesize_token_lit`.Nicholas Nethercote-39/+44
It has a single call site in the HIR pretty printer, where the resulting token lit is immediately converted to a string. This commit replaces `LitKind::synthesize_token_lit` with a `Display` impl for `LitKind`, which can be used by the HIR pretty printer.
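The idea of printing through `Display` instead of building an intermediate token literal can be sketched as follows. The three-variant `LitKind` here is a deliberately reduced, hypothetical version of the real enum, and the output format is an assumption for the example.

```rust
use std::fmt;

/// Reduced, illustrative literal kind with only three variants.
pub enum LitKind {
    Bool(bool),
    Byte(u8),
    Int(u128),
}

impl fmt::Display for LitKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LitKind::Bool(b) => write!(f, "{b}"),
            // `u8::escape_ascii` yields the escaped form, e.g. `\n`.
            LitKind::Byte(b) => write!(f, "b'{}'", b.escape_ascii()),
            LitKind::Int(n) => write!(f, "{n}"),
        }
    }
}
```

A pretty printer can then call `to_string()` or use `{}` directly, with no separate conversion step.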
2022-12-05Remove `ExtCtxt::expr_lit`.Nicholas Nethercote-11/+23
2022-12-02Remove `token::Lit` from `ast::MetaItemLit`.Nicholas Nethercote-2/+25
`token::Lit` contains a `kind` field that indicates what kind of literal it is. `ast::MetaItemLit` currently wraps a `token::Lit` but also has its own `kind` field. This means that `ast::MetaItemLit` encodes the literal kind in two different ways. This commit changes `ast::MetaItemLit` so it no longer wraps `token::Lit`. It now contains the `symbol` and `suffix` fields from `token::Lit`, but not the `kind` field, eliminating the redundancy.
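The resulting layout can be sketched like this. The types are heavily simplified (plain `String`s instead of interned symbols, two variants per enum) and the `as_token_lit` helper is included only to show that the token-level kind can be derived on demand; treat all of it as an assumption rather than the real definitions.

```rust
/// Token-level literal kind, simplified to two variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TokenLitKind {
    Integer,
    Str,
}

/// Simplified `token::Lit`: a kind plus the raw symbol and optional suffix.
pub struct TokenLit {
    pub kind: TokenLitKind,
    pub symbol: String,
    pub suffix: Option<String>,
}

/// Semantic literal kind, simplified to two variants.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum LitKind {
    Int(u128),
    Str(String),
}

/// After the change: no wrapped `TokenLit`, so the kind is recorded once.
pub struct MetaItemLit {
    pub symbol: String,
    pub suffix: Option<String>,
    pub kind: LitKind,
}

impl MetaItemLit {
    /// Illustrative helper: rebuild a token-level literal from the single
    /// `kind` field when one is needed.
    pub fn as_token_lit(&self) -> TokenLit {
        let kind = match self.kind {
            LitKind::Int(_) => TokenLitKind::Integer,
            LitKind::Str(_) => TokenLitKind::Str,
        };
        TokenLit { kind, symbol: self.symbol.clone(), suffix: self.suffix.clone() }
    }
}
```

With only one `kind` field, the two representations can no longer disagree with each other.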
2022-12-02Add `StrStyle` to `ast::LitKind::ByteStr`.Nicholas Nethercote-5/+11
This is required to distinguish between cooked and raw byte string literals in an `ast::LitKind`, without referring to an adjacent `token::Lit`. It's a prerequisite for the next commit.
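A sketch of why the style has to travel with the bytes: the same byte contents print differently depending on whether the literal was cooked or raw. The `StrStyle` shape (a hash count on the raw variant) and the `byte_str_source` function are assumptions made for this example.

```rust
/// Illustrative string style: cooked, or raw with a number of `#`s.
pub enum StrStyle {
    Cooked,
    Raw(u8),
}

/// Hypothetical helper: render byte string contents back to source text
/// using only the bytes and their style.
pub fn byte_str_source(bytes: &[u8], style: StrStyle) -> String {
    match style {
        // Cooked literals need their contents escaped.
        StrStyle::Cooked => format!("b\"{}\"", bytes.escape_ascii()),
        // Raw literals are emitted verbatim between `br#..."` and `"#...`.
        StrStyle::Raw(n) => {
            let hashes = "#".repeat(n as usize);
            format!("br{hashes}\"{}\"{hashes}", String::from_utf8_lossy(bytes))
        }
    }
}
```

Without the style, a printer would have to consult the original token to know which of the two forms to produce.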
2022-12-02Rename `LitKind::to_token_lit` as `LitKind::synthesize_token_lit`.Nicholas Nethercote-2/+2
This makes it clearer that it's not a lossless conversion, which I find helpful.
2022-12-01Remove useless borrows and derefsMaybe Waffle-5/+5
2022-11-29Avoid more `MetaItem`-to-`Attribute` conversions.Nicholas Nethercote-9/+0
There is code for converting `Attribute` (syntactic) to `MetaItem` (semantic). There is also code for the reverse direction. The reverse direction isn't really necessary; it's currently only used when generating attributes, e.g. in `derive` code. This commit adds some new functions for creating `Attribute`s directly, without involving `MetaItem`s: `mk_attr_word`, `mk_attr_name_value_str`, `mk_attr_nested_word`, and `ExtCtxt::attr_{word,name_value_str,nested_word}`. These new methods replace the old functions for creating `Attribute`s: `mk_attr_inner`, `mk_attr_outer`, and `ExtCtxt::attribute`. Those functions took `MetaItem`s as input, and relied on many other functions that created `MetaItem`s, which are also removed: `mk_name_value_item`, `mk_list_item`, `mk_word_item`, `mk_nested_word_item`, `{MetaItem,MetaItemKind,NestedMetaItem}::token_trees`, `MetaItemKind::attr_args`, `MetaItemLit::{from_lit_kind,to_token}`, `ExtCtxt::meta_word`. Overall this cuts more than 100 lines of code and makes things simpler.
2022-11-29Inline and remove `MetaItemLit::from_lit_kind`.Nicholas Nethercote-7/+0
It has a single call site.
2022-11-28Rename `ast::Lit` as `ast::MetaItemLit`.Nicholas Nethercote-12/+12
2022-11-28Remove `Lit::from_included_bytes`.Nicholas Nethercote-8/+0
`Lit::from_included_bytes` calls `Lit::from_lit_kind`, but the two call sites only need the resulting `token::Lit`, not the full `ast::Lit`. This commit changes those call sites to use `LitKind::to_token_lit`, which means `from_included_bytes` can be removed.
2022-11-16Use `token::Lit` in `ast::ExprKind::Lit`.Nicholas Nethercote-22/+5
Instead of `ast::Lit`. Literal lowering now happens at two different times. Expression literals are lowered when HIR is created. Attribute literals are lowered during parsing. This commit changes the language very slightly. Some programs that used to not compile now will compile. This is because some invalid literals that are removed by `cfg` or attribute macros will no longer trigger errors. See this comment for more details: https://github.com/rust-lang/rust/pull/102944#issuecomment-1277476773
2022-11-11Introduce `ExprKind::IncludedBytes`clubby789-0/+8
2022-11-05Remove `unescape_byte_literal`.Nicholas Nethercote-18/+11
It's easy to just use `unescape_literal` + `byte_from_char`.
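The second half of that combination is a narrowing conversion, which can be sketched as below. This is an illustrative version written for this note, not a copy of the compiler's helper; it assumes the caller has already ensured the char fits in one byte.

```rust
/// Illustrative helper: narrow a `char` that is known to fit in one byte.
pub fn byte_from_char(c: char) -> u8 {
    let value = c as u32;
    debug_assert!(value <= u8::MAX as u32, "expected a char that fits in one byte");
    value as u8
}
```

Unescaping a byte literal through the char-based path and then narrowing each result gives the same bytes as a dedicated byte unescaper would.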
2022-09-12Rollup merge of #100767 - kadiwa4:escape_ascii, r=jackh726Dylan DPC-6/+1
Remove manual <[u8]>::escape_ascii `@rustbot` label: +C-cleanup
2022-09-01Always import all tracing macros for the entire crate instead of piecemeal by moduleOli Scherer-1/+0
2022-08-25Handle `Err` in `ast::LitKind::to_token_lit`.Nicholas Nethercote-1/+3
Fixes #100948.
2022-08-23Remove the symbol from `ast::LitKind::Err`.Nicholas Nethercote-2/+2
Because it's never used meaningfully.
2022-08-19use <[u8]>::escape_ascii instead of core::ascii::escape_defaultKaDiWa-6/+1
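The replacement named in this commit can be illustrated by comparing a per-byte loop over `std::ascii::escape_default` with the slice method. Both function names below are made up for the example; the two standard-library calls are real.

```rust
/// Manual approach: escape each byte and collect the resulting ASCII chars.
pub fn escape_manual(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| std::ascii::escape_default(b))
        .map(char::from)
        .collect()
}

/// Built-in approach: `<[u8]>::escape_ascii` does the same in one call.
pub fn escape_builtin(bytes: &[u8]) -> String {
    bytes.escape_ascii().to_string()
}
```

The slice method applies the same per-byte escaping, so the two functions agree on any input.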
2022-08-16Rename some things related to literals.Nicholas Nethercote-9/+9
- Rename `ast::Lit::token` as `ast::Lit::token_lit`, because its type is `token::Lit`, which is not a token. (This has been confusing me for a long time.) The new name is reasonable because we have an `ast::token::Lit` inside an `ast::Lit`.
- Rename `LitKind::{from,to}_lit_token` as `LitKind::{from,to}_token_lit`, to match the above change and `token::Lit`.