summary refs log tree commit diff
path: root/compiler/rustc_parse/src/lexer
AgeCommit message (Collapse)AuthorLines
2024-01-29Stop using `String` for error codes.Nicholas Nethercote-8/+8
Error codes are integers, but `String` is used everywhere to represent them. Gross! This commit introduces `ErrCode`, an integral newtype for error codes, replacing `String`. It also introduces a constant for every error code, e.g. `E0123`, and removes the `error_code!` macro. The constants are imported wherever used with `use rustc_errors::codes::*`. With the old code, we have three different ways to specify an error code at a use point: ``` error_code!(E0123) // macro call struct_span_code_err!(dcx, span, E0123, "msg"); // bare ident arg to macro call \#[diag(name, code = "E0123")] // string struct Diag; ``` With the new code, they all use the `E0123` constant. ``` E0123 // constant struct_span_code_err!(dcx, span, E0123, "msg"); // constant \#[diag(name, code = E0123)] // constant struct Diag; ``` The commit also changes the structure of the error code definitions: - `rustc_error_codes` now just defines a higher-order macro listing the used error codes and nothing else. - Because that's now the only thing in the `rustc_error_codes` crate, I moved it into the `lib.rs` file and removed the `error_codes.rs` file. - `rustc_errors` uses that macro to define everything, e.g. the error code constants and the `DIAGNOSTIC_TABLES`. This is in its new `codes.rs` file.
2024-01-25Use `unescape_unicode` for raw C string literals.Nicholas Nethercote-1/+1
They can't contain `\x` escapes, which means they can't contain high bytes, which means we can used `unescape_unicode` instead of `unescape_mixed` to unescape them. This avoids unnecessary used of `MixedUnit`.
2024-01-25Rename the unescaping functions.Nicholas Nethercote-12/+12
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string` becomes `unescape_mixed`. Because rfc3349 will mean that C string literals will no longer be the only mixed utf8 literals.
2024-01-12Detect `NulInCStr` error earlier.Nicholas Nethercote-0/+3
By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars. NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together. One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.
2024-01-11Stop using `DiagnosticBuilder::buffer` in the parser.Nicholas Nethercote-4/+4
One consequence is that errors returned by `maybe_new_parser_from_source_str` now must be consumed, so a bunch of places that previously ignored those errors now cancel them. (Most of them explicitly dropped the errors before. I guess that was to indicate "we are explicitly ignoring these", though I'm not 100% sure.)
2024-01-11Fix lifetimes in `StringReader`.Nicholas Nethercote-23/+27
Two different lifetimes are conflated. This doesn't matter right now, but needs to be fixed for the next commit to work. And the more descriptive lifetime names make the code easier to read.
2024-01-10Rename consuming chaining methods on `DiagnosticBuilder`.Nicholas Nethercote-6/+6
In #119606 I added them and used a `_mv` suffix, but that wasn't great. A `with_` prefix has three different existing uses. - Constructors, e.g. `Vec::with_capacity`. - Wrappers that provide an environment to execute some code, e.g. `with_session_globals`. - Consuming chaining methods, e.g. `Span::with_{lo,hi,ctxt}`. The third case is exactly what we want, so this commit changes `DiagnosticBuilder::foo_mv` to `DiagnosticBuilder::with_foo`. Thanks to @compiler-errors for the suggestion.
2024-01-10Rename `{create,emit}_warning` as `{create,emit}_warn`.Nicholas Nethercote-2/+2
For consistency with `warn`/`struct_warn`, and also `{create,emit}_err`, all of which use an abbreviated form.
2024-01-08Remove `DiagnosticBuilder::delay_as_bug_without_consuming`.Nicholas Nethercote-4/+4
The existing uses are replaced in one of three ways. - In a function that also has calls to `emit`, just rearrange the code so that exactly one of `delay_as_bug` or `emit` is called on every path. - In a function returning a `DiagnosticBuilder`, use `downgrade_to_delayed_bug`. That's good enough because it will get emitted later anyway. - In `unclosed_delim_err`, one set of errors is being replaced with another set, so just cancel the original errors.
2024-01-08Remove all eight `DiagnosticBuilder::*_with_code` methods.Nicholas Nethercote-36/+37
These all have relatively low use, and can be perfectly emulated with a simpler construction method combined with `code` or `code_mv`.
2024-01-08Use chaining in `DiagnosticBuilder` construction.Nicholas Nethercote-3/+3
To avoid the use of a mutable local variable, and because it reads more nicely.
2024-01-08Make `DiagnosticBuilder::emit` consuming.Nicholas Nethercote-1/+1
This works for most of its call sites. This is nice, because `emit` very much makes sense as a consuming operation -- indeed, `DiagnosticBuilderState` exists to ensure no diagnostic is emitted twice, but it uses runtime checks. For the small number of call sites where a consuming emit doesn't work, the commit adds `DiagnosticBuilder::emit_without_consuming`. (This will be removed in subsequent commits.) Likewise, `emit_unless` becomes consuming. And `delay_as_bug` becomes consuming, while `delay_as_bug_without_consuming` is added (which will also be removed in subsequent commits.) All this requires significant changes to `DiagnosticBuilder`'s chaining methods. Currently `DiagnosticBuilder` method chaining uses a non-consuming `&mut self -> &mut Self` style, which allows chaining to be used when the chain ends in `emit()`, like so: ``` struct_err(msg).span(span).emit(); ``` But it doesn't work when producing a `DiagnosticBuilder` value, requiring this: ``` let mut err = self.struct_err(msg); err.span(span); err ``` This style of chaining won't work with consuming `emit` though. For that, we need to use to a `self -> Self` style. That also would allow `DiagnosticBuilder` production to be chained, e.g.: ``` self.struct_err(msg).span(span) ``` However, removing the `&mut self -> &mut Self` style would require that individual modifications of a `DiagnosticBuilder` go from this: ``` err.span(span); ``` to this: ``` err = err.span(span); ``` There are *many* such places. I have a high tolerance for tedious refactorings, but even I gave up after a long time trying to convert them all. Instead, this commit has it both ways: the existing `&mut self -> Self` chaining methods are kept, and new `self -> Self` chaining methods are added, all of which have a `_mv` suffix (short for "move"). Changes to the existing `forward!` macro lets this happen with very little additional boilerplate code. I chose to add the suffix to the new chaining methods rather than the existing ones, because the number of changes required is much smaller that way. This doubled chainging is a bit clumsy, but I think it is worthwhile because it allows a *lot* of good things to subsequently happen. In this commit, there are many `mut` qualifiers removed in places where diagnostics are emitted without being modified. In subsequent commits: - chaining can be used more, making the code more concise; - more use of chaining also permits the removal of redundant diagnostic APIs like `struct_err_with_code`, which can be replaced easily with `struct_err` + `code_mv`; - `emit_without_diagnostic` can be removed, which simplifies a lot of machinery, removing the need for `DiagnosticBuilderState`.
2024-01-04Inline and remove `StringReader::struct_fatal_span_char`.Nicholas Nethercote-22/+11
It has a single call site.
2024-01-03Rename some `Diagnostic` setters.Nicholas Nethercote-1/+1
`Diagnostic` has 40 methods that return `&mut Self` and could be considered setters. Four of them have a `set_` prefix. This doesn't seem necessary for a type that implements the builder pattern. This commit removes the `set_` prefixes on those four methods.
2023-12-24Remove `Session` methods that duplicate `DiagCtxt` methods.Nicholas Nethercote-8/+8
Also add some `dcx` methods to types that wrap `TyCtxt`, for easier access.
2023-12-24Remove `ParseSess` methods that duplicate `DiagCtxt` methods.Nicholas Nethercote-11/+15
Also add missing `#[track_caller]` attributes to `DiagCtxt` methods as necessary to keep tests working.
2023-12-24Remove `Parser` methods that duplicate `DiagCtxt` methods.Nicholas Nethercote-1/+1
2023-12-19Add `EmitResult` associated type to `EmissionGuarantee`.Nicholas Nethercote-2/+4
This lets different error levels share the same return type from `emit_*`. - A lot of inconsistencies in the `DiagCtxt` API are removed. - `Noted` is removed. - `FatalAbort` is introduced for fatal errors (abort via `raise`), replacing the `EmissionGuarantee` impl for `!`. - `Bug` is renamed `BugAbort` (to avoid clashing with `Level::Bug` and to mirror `FatalAbort`), and modified to work in the new way with bug errors (abort via panic). - Various diagnostic creators and emitters updated to the new, better signatures. Note that `DiagCtxt::bug` no longer needs to call `panic_any`, because `emit` handles that. Also shorten the obnoxiously long `diagnostic_builder_emit_producing_guarantee` name.
2023-12-18Rename many `DiagCtxt` arguments.Nicholas Nethercote-25/+22
2023-12-18Rename `ParseSess::span_diagnostic` as `ParseSess::dcx`.Nicholas Nethercote-13/+13
2023-12-18Rename `Handler` as `DiagCtxt`.Nicholas Nethercote-2/+2
2023-12-16Auto merge of #118897 - nnethercote:more-unescaping-cleanups, r=fee1-deadbors-33/+42
More unescaping cleanups More minor improvements I found while working on #118699. r? `@fee1-dead`
2023-12-14Remove one use of `span_bug_no_panic`.Nicholas Nethercote-2/+1
It's unclear why this is used here. All entries in the third column of `UNICODE_ARRAY` are covered by `ASCII_ARRAY`, so if the lookup fails it's a genuine compiler bug. It was added way back in #29837, for no clear reason. This commit changes it to `span_bug`, which is more typical.
2023-12-13Rename the `span` args to `emit_unescape_error`.Nicholas Nethercote-33/+42
The `span` arg is described in a comment as "interior span of the literal, without quotes", which is incorrect. It's actually the span of the error part of the literal, corresponding to `range`. This commit renames `span` and `span_without_quotes` to make things clearer, and fixes the erroneous comment.
2023-12-11Add spacing information to delimiters.Nicholas Nethercote-35/+54
This is an extension of the previous commit. It means the output of something like this: ``` stringify!(let a: Vec<u32> = vec![];) ``` goes from this: ``` let a: Vec<u32> = vec![] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![]; ```
2023-12-11Improve `print_tts` by changing `tokenstream::Spacing`.Nicholas Nethercote-2/+7
`tokenstream::Spacing` appears on all `TokenTree::Token` instances, both punct and non-punct. Its current usage: - `Joint` means "can join with the next token *and* that token is a punct". - `Alone` means "cannot join with the next token *or* can join with the next token but that token is not a punct". The fact that `Alone` is used for two different cases is awkward. This commit augments `tokenstream::Spacing` with a new variant `JointHidden`, resulting in: - `Joint` means "can join with the next token *and* that token is a punct". - `JointHidden` means "can join with the next token *and* that token is a not a punct". - `Alone` means "cannot join with the next token". This *drastically* improves the output of `print_tts`. For example, this: ``` stringify!(let a: Vec<u32> = vec![];) ``` currently produces this string: ``` let a : Vec < u32 > = vec! [] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![] ; ``` (The space after the `]` is because `TokenTree::Delimited` currently doesn't have spacing information. The subsequent commit fixes this.) The new `print_tts` doesn't replicate original code perfectly. E.g. multiple space characters will be condensed into a single space character. But it's much improved. `print_tts` still produces the old, uglier output for code produced by proc macros. Because we have to translate the generated code from `proc_macro::Spacing` to the more expressive `token::Spacing`, which results in too much `proc_macro::Along` usage and no `proc_macro::JointHidden` usage. So `space_between` still exists and is used by `print_tts` in conjunction with the `Spacing` field. This change will also help with the removal of `Token::Interpolated`. Currently interpolated tokens are pretty-printed nicely via AST pretty printing. `Token::Interpolated` removal will mean they get printed with `print_tts`. Without this change, that would result in much uglier output for code produced by decl macro expansions. With this change, AST pretty printing and `print_tts` produce similar results. The commit also tweaks the comments on `proc_macro::Spacing`. In particular, it refers to "compound tokens" rather than "multi-char operators" because lifetimes aren't operators.
2023-12-01Auto merge of #117472 - jmillikin:stable-c-str-literals, r=Nilstriebbors-3/+0
Stabilize C string literals RFC: https://rust-lang.github.io/rfcs/3348-c-str-literal.html Tracking issue: https://github.com/rust-lang/rust/issues/105723 Documentation PR (reference manual): https://github.com/rust-lang/reference/pull/1423 # Stabilization report Stabilizes C string and raw C string literals (`c"..."` and `cr#"..."#`), which are expressions of type [`&CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html). Both new literals require Rust edition 2021 or later. ```rust const HELLO: &core::ffi::CStr = c"Hello, world!"; ``` C strings may contain any byte other than `NUL` (`b'\x00'`), and their in-memory representation is guaranteed to end with `NUL`. ## Implementation Originally implemented by PR https://github.com/rust-lang/rust/pull/108801, which was reverted due to unintentional changes to lexer behavior in Rust editions < 2021. The current implementation landed in PR https://github.com/rust-lang/rust/pull/113476, which restricts C string literals to Rust edition >= 2021. ## Resolutions to open questions from the RFC * Adding C character literals (`c'.'`) of type `c_char` is not part of this feature. * Support for `c"..."` literals does not prevent `c'.'` literals from being added in the future. * C string literals should not be blocked on making `&CStr` a thin pointer. * It's possible to declare constant expressions of type `&'static CStr` in stable Rust (as of v1.59), so C string literals are not adding additional coupling on the internal representation of `CStr`. * The unstable `concat_bytes!` macro should not accept `c"..."` literals. * C strings have two equally valid `&[u8]` representations (with or without terminal `NUL`), so allowing them to be used in `concat_bytes!` would be ambiguous. * Adding a type to represent C strings containing valid UTF-8 is not part of this feature. * Support for a hypothetical `&Utf8CStr` may be explored in the future, should such a type be added to Rust.
2023-11-21Fix `clippy::needless_borrow` in the compilerNilstrieb-6/+6
`x clippy compiler -Aclippy::all -Wclippy::needless_borrow --fix`. Then I had to remove a few unnecessary parens and muts that were exposed now.
2023-11-11Move unclosed delim errors to separate functionsjwang05-53/+58
2023-11-10Correctly handle while-let-chainssjwang05-1/+1
2023-11-09Catch an edge casesjwang05-1/+5
2023-11-09Catch stray { in let-chainssjwang05-1/+33
2023-11-01Stabilize C string literalsJohn Millikin-3/+0
2023-10-30When encountering unclosed delimiters during parsing, check for diff markersEsteban Küber-24/+54
Fix #116252.
2023-10-13Format all the let chains in compilerMichael Goulet-3/+4
2023-10-12Reorder an expression to improve readability.Nicholas Nethercote-12/+7
2023-10-12Rename `Token::is_op` as `Token::is_punct`.Nicholas Nethercote-2/+5
For consistency with `proc_macro::Punct`.
2023-08-16Fix suggestion for attempting to define a string with single quotesbeetrees-14/+8
2023-08-13Remove reached_eof from ParseSessbjorn3-1/+0
It was only ever set in a function which isn't called anywhere.
2023-07-30inline format!() args up to and including rustc_middleMatthias Krüger-11/+10
2023-07-25Auto merge of #113476 - fee1-dead-contrib:c-str-lit, r=petrochenkovbors-9/+32
Reimplement C-str literals This reverts #113334, cc `@fmease.` While converting lexer tokens to ast Tokens in `rustc_parse`, we check the edition of the span of the token. If the edition < 2021, we split the token into two, one being the identifier and other being the str literal.
2023-07-25extract common codeDeadbeef-11/+10
2023-07-23fix some clippy::style findingsMatthias Krüger-1/+1
comparison_to_empty iter_nth_zero for_kv_map manual_next_back redundant_pattern
2023-07-23reimplement C string literalsDeadbeef-1/+25
2023-06-10Use a better linkHankai Zhang-1/+1
2023-06-10Update links to Rust Reference page on literals in diagnosticHankai Zhang-1/+1
Instead of linking to the old Rust Reference site on static.rust-lang.org, link to the current website doc.rust-lang.org/stable/reference instead in diagnostic about incorrect literals.
2023-05-16Avoid `&format("...")` calls in error message code.Nicholas Nethercote-1/+1
Error message all end up passing into a function as an `impl Into<{D,Subd}iagnosticMessage>`. If an error message is creatd as `&format("...")` that means we allocate a string (in the `format!` call), then take a reference, and then clone (allocating again) the reference to produce the `{D,Subd}iagnosticMessage`, which is silly. This commit removes the leading `&` from a lot of these cases. This means the original `String` is moved into the `{D,Subd}iagnosticMessage`, avoiding the double allocations. This requires changing some function argument types from `&str` to `String` (when all arguments are `String`) or `impl Into<{D,Subd}iagnosticMessage>` (when some arguments are `String` and some are `&str`).
2023-05-05Rollup merge of #108801 - fee1-dead-contrib:c-str, r=compiler-errorsDylan DPC-11/+74
Implement RFC 3348, `c"foo"` literals RFC: https://github.com/rust-lang/rfcs/pull/3348 Tracking issue: #105723
2023-05-03Restrict `From<S>` for `{D,Subd}iagnosticMessage`.Nicholas Nethercote-12/+8
Currently a `{D,Subd}iagnosticMessage` can be created from any type that impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static, str>`, which are reasonable. It also includes `&String`, which is pretty weird, and results in many places making unnecessary allocations for patterns like this: ``` self.fatal(&format!(...)) ``` This creates a string with `format!`, takes a reference, passes the reference to `fatal`, which does an `into()`, which clones the reference, doing a second allocation. Two allocations for a single string, bleh. This commit changes the `From` impls so that you can only create a `{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static, str>`. This requires changing all the places that currently create one from a `&String`. Most of these are of the `&format!(...)` form described above; each one removes an unnecessary static `&`, plus an allocation when executed. There are also a few places where the existing use of `&String` was more reasonable; these now just use `clone()` at the call site. As well as making the code nicer and more efficient, this is a step towards possibly using `Cow<'static, str>` in `{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing the `From<&'a str>` impls to `From<&'static str>`, which is doable, but I'm not yet sure if it's worthwhile.
2023-05-02make cook genericDeadbeef-37/+27