about summary refs log tree commit diff
path: root/compiler/rustc_ast/src/tokenstream.rs
AgeCommit message (Collapse)AuthorLines
2023-12-11Add spacing information to delimiters.Nicholas Nethercote-20/+35
This is an extension of the previous commit. It means the output of something like this: ``` stringify!(let a: Vec<u32> = vec![];) ``` goes from this: ``` let a: Vec<u32> = vec![] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![]; ```
2023-12-11Improve `print_tts` by changing `tokenstream::Spacing`.Nicholas Nethercote-18/+64
`tokenstream::Spacing` appears on all `TokenTree::Token` instances, both punct and non-punct. Its current usage: - `Joint` means "can join with the next token *and* that token is a punct". - `Alone` means "cannot join with the next token *or* can join with the next token but that token is not a punct". The fact that `Alone` is used for two different cases is awkward. This commit augments `tokenstream::Spacing` with a new variant `JointHidden`, resulting in: - `Joint` means "can join with the next token *and* that token is a punct". - `JointHidden` means "can join with the next token *and* that token is a not a punct". - `Alone` means "cannot join with the next token". This *drastically* improves the output of `print_tts`. For example, this: ``` stringify!(let a: Vec<u32> = vec![];) ``` currently produces this string: ``` let a : Vec < u32 > = vec! [] ; ``` With this PR, it now produces this string: ``` let a: Vec<u32> = vec![] ; ``` (The space after the `]` is because `TokenTree::Delimited` currently doesn't have spacing information. The subsequent commit fixes this.) The new `print_tts` doesn't replicate original code perfectly. E.g. multiple space characters will be condensed into a single space character. But it's much improved. `print_tts` still produces the old, uglier output for code produced by proc macros. Because we have to translate the generated code from `proc_macro::Spacing` to the more expressive `token::Spacing`, which results in too much `proc_macro::Along` usage and no `proc_macro::JointHidden` usage. So `space_between` still exists and is used by `print_tts` in conjunction with the `Spacing` field. This change will also help with the removal of `Token::Interpolated`. Currently interpolated tokens are pretty-printed nicely via AST pretty printing. `Token::Interpolated` removal will mean they get printed with `print_tts`. Without this change, that would result in much uglier output for code produced by decl macro expansions. With this change, AST pretty printing and `print_tts` produce similar results. The commit also tweaks the comments on `proc_macro::Spacing`. In particular, it refers to "compound tokens" rather than "multi-char operators" because lifetimes aren't operators.
2023-11-16More detail when expecting expression but encountering bad macro argumentEsteban Küber-2/+2
Partially address #71039.
2023-10-13Format all the let chains in compilerMichael Goulet-1/+3
2023-09-04improve `AttrTokenStream`mojave2-11/+7
2023-07-31Remove `TokenTreeCursor::replace_prev_and_rewind`.Nicholas Nethercote-9/+0
It's no longer used.
2023-07-31Move doc comment desugaring out of `TokenCursor`.Nicholas Nethercote-3/+89
`TokenCursor` currently does doc comment desugaring on the fly, if the `desugar_doc_comment` field is set. This requires also modifying the token stream on the fly with `replace_prev_and_rewind`. This commit moves the doc comment desugaring out of `TokenCursor`, by introducing a new `TokenStream::desugar_doc_comment` method. This separation of desugaring and iterating makes the code nicer.
2023-07-27Remove `Iterator` impl for `TokenTreeCursor`.Nicholas Nethercote-13/+8
This is surprising, but the new comment explains why. It's a logical conclusion in the drive to avoid `TokenTree` clones. `TokenTreeCursor` is now only used within `Parser`. It's still needed due to `replace_prev_and_rewind`.
2023-07-27Make `TokenTree::uninterpolate` take `&self` and return a `Cow`.Nicholas Nethercote-5/+7
Making it similar to `Token::uninterpolate`. This avoids some more token tree cloning.
2023-07-03Use Lrc::make_mut instead of Lrc::get_mutThe 8472-10/+4
2023-07-03perform TokenStream replacement in-place when possible in expand_macroThe 8472-3/+18
2023-05-13refactor: add chunks method to TokenStream to obviate rustdoc clonesCaleb Cartwright-0/+4
2023-05-06correct literals for dyn thread safeSparrowLii-1/+1
2023-05-06introduce `DynSend` and `DynSync` auto traitSparrowLii-6/+7
2023-02-03Rename `Cursor`/`CursorRef` as `TokenTreeCursor`/`RefTokenTreeCursor`.Nicholas Nethercote-14/+16
This makes it clear they return token trees, and makes for a nice comparison against `TokenCursor` which returns tokens.
2023-02-03Make clear that `TokenTree::Token` shouldn't contain a delimiter.Nicholas Nethercote-1/+2
2023-02-03Improve doc comment desugaring.Nicholas Nethercote-0/+9
Sometimes the parser needs to desugar a doc comment into `#[doc = r"foo"]`. Currently it does this in a hacky way: by pushing a "fake" new frame (one without a delimiter) onto the `TokenCursor` stack. This commit changes things so that the token stream itself is modified in place. The nice thing about this is that it means `TokenCursorFrame::delim_sp` is now only `None` for the outermost frame.
2023-01-05Fix `uninlined_format_args` for some compiler cratesnils-2/+1
Convert all the crates that have had their diagnostic migration completed (except save_analysis because that will be deleted soon and apfloat because of the licensing problem).
2022-12-10compiler: remove unnecessary imports and qualified pathsKaDiWa-1/+1
2022-12-01Remove useless borrows and derefsMaybe Waffle-4/+4
2022-11-27Prefer doc comments over `//`-comments in compilerMaybe Waffle-10/+10
2022-10-12Use `tidy-alphabetical` in the compilerNilstrieb-1/+2
2022-10-05Remove `TokenStreamBuilder`.Nicholas Nethercote-72/+45
`TokenStreamBuilder` exists to concatenate multiple `TokenStream`s together. This commit removes it, and moves the concatenation functionality directly into `TokenStream`, via two new methods `push_tree` and `push_stream`. This makes things both simpler and faster. `push_tree` is particularly important. `TokenStreamBuilder` only had a single `push` method, which pushed a stream. But in practice most of the time we push a single token tree rather than a stream, and `push_tree` avoids the need to build a token stream with a single entry (which requires two allocations, one for the `Lrc` and one for the `Vec`). The main `push_tree` use arises from a change to one of the `ToInternal` impls in `proc_macro_server.rs`. It now returns a `SmallVec` instead of a `TokenStream`. This return value is then iterated over by `concat_trees`, which does `push_tree` on each element. Furthermore, the use of `SmallVec` avoids more allocations, because there is always only one or two token trees. Note: the removed `TokenStreamBuilder::push` method had some code to deal with a quadratic blowup case from #57735. This commit removes the code. I tried and failed to reproduce the blowup from that PR, before and after this change. Various other changes have happened to `TokenStreamBuilder` in the meantime, so I suspect the original problem is no longer relevant, though I don't have proof of this. Generally speaking, repeatedly extending a `Vec` without pre-determining its capacity is *not* quadratic. It's also incredibly common, within rustc and many other Rust programs, so if there were performance problems there you'd think it would show up in other places, too.
2022-10-03Add comments to `Spacing`.Nicholas Nethercote-0/+11
2022-10-01Group together more size assertions.Nicholas Nethercote-8/+13
Also add a few more assertions for some relevant token-related types. And fix an erroneous comment in `rustc_errors`.
2022-09-09Rename `{Create,Lazy}TokenStream` as `{To,Lazy}AttrTokenStream`.Nicholas Nethercote-21/+21
`To` is better than `Create` for indicating that this is a non-consuming conversion, rather than creating something out of nothing. And the addition of `Attr` is because the current names makes them sound like they relate to `TokenStream`, but really they relate to `AttrTokenStream`.
2022-09-09Inline and remove `TokenStream::opt_from_ast`.Nicholas Nethercote-15/+12
2022-09-09Tweak some formatting.Nicholas Nethercote-9/+5
2022-09-09Change return type of `Attribute::tokens`.Nicholas Nethercote-2/+2
The `AttrTokenStream` is always immediately turned into a `TokenStream`.
2022-09-09Rename `AttrAnnotatedToken{Stream,Tree}`.Nicholas Nethercote-20/+20
These two type names are long and have long matching prefixes. I find them hard to read, especially in combinations like `AttrAnnotatedTokenStream::new(vec![AttrAnnotatedTokenTree::Token(..)])`. This commit renames them as `AttrToken{Stream,Tree}`.
2022-09-09Move `Spacing` out of `AttrAnnotatedTokenStream`.Nicholas Nethercote-16/+7
And into `AttrAnnotatedTokenTree::Token`. PR #99887 did the same thing for `TokenStream`.
2022-09-01Auto merge of #100869 - nnethercote:replace-ThinVec, r=spastorinobors-1/+2
Replace `rustc_data_structures::thin_vec::ThinVec` with `thin_vec::ThinVec` `rustc_data_structures::thin_vec::ThinVec` looks like this: ``` pub struct ThinVec<T>(Option<Box<Vec<T>>>); ``` It's just a zero word if the vector is empty, but requires two allocations if it is non-empty. So it's only usable in cases where the vector is empty most of the time. This commit removes it in favour of `thin_vec::ThinVec`, which is also word-sized, but stores the length and capacity in the same allocation as the elements. It's good in a wider variety of situation, e.g. in enum variants where the vector is usually/always non-empty. The commit also: - Sorts some `Cargo.toml` dependency lists, to make additions easier. - Sorts some `use` item lists, to make additions easier. - Changes `clean_trait_ref_with_bindings` to take a `ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this avoid some unnecessary allocations. r? `@spastorino`
2022-08-30Use more `into_iter` rather than `drain(..)`Donough Liu-1/+1
2022-08-29Replace `rustc_data_structures::thin_vec::ThinVec` with `thin_vec::ThinVec`.Nicholas Nethercote-1/+2
`rustc_data_structures::thin_vec::ThinVec` looks like this: ``` pub struct ThinVec<T>(Option<Box<Vec<T>>>); ``` It's just a zero word if the vector is empty, but requires two allocations if it is non-empty. So it's only usable in cases where the vector is empty most of the time. This commit removes it in favour of `thin_vec::ThinVec`, which is also word-sized, but stores the length and capacity in the same allocation as the elements. It's good in a wider variety of situation, e.g. in enum variants where the vector is usually/always non-empty. The commit also: - Sorts some `Cargo.toml` dependency lists, to make additions easier. - Sorts some `use` item lists, to make additions easier. - Changes `clean_trait_ref_with_bindings` to take a `ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this avoid some unnecessary allocations.
2022-07-29Remove `TreeAndSpacing`.Nicholas Nethercote-78/+74
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is not quite right. `Spacing` makes sense for `TokenTree::Token`, but does not make sense for `TokenTree::Delimited`, because a `TokenTree::Delimited` cannot be joined with another `TokenTree`. This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`, changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the `TreeAndSpacing` typedef. The commit removes these two impls: - `impl From<TokenTree> for TokenStream` - `impl From<TokenTree> for TreeAndSpacing` These were useful, but also resulted in code with many `.into()` calls that was hard to read, particularly for anyone not highly familiar with the relevant types. This commit makes some other changes to compensate: - `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`. - `TokenStream::token_{alone,joint}()` are added. - `TokenStream::delimited` is added. This results in things like this: ```rust TokenTree::token(token::Semi, stmt.span).into() ``` changing to this: ```rust TokenStream::token_alone(token::Semi, stmt.span) ``` This makes the type of the result, and its spacing, clearer. These changes also simplifies `Cursor` and `CursorRef`, because they no longer need to distinguish between `next` and `next_with_spacing`.
2022-06-20Merge `TokenStreamBuilder::push` into `TokenStreamBuilder::build`.Nicholas Nethercote-53/+32
Both functions do some modifying of streams using `make_mut`: - `push` sometimes glues the first token of the next stream to the last token of the first stream. - `build` appends tokens to the first stream. By doing all of this in the one place, things are simpler. The first stream can be modified in both ways (if necessary) in the one place, and any next stream with the first token removed doesn't need to be stored.
2022-06-20Remove `TokenStream::from_streams`.Nicholas Nethercote-40/+37
By inlining it into the only non-test call site. The one test call site is changed to use `TokenStreamBuilder`.
2022-06-20Remove `Cursor::index`.Nicholas Nethercote-4/+0
It's unused.
2022-06-20Remove `Cursor::append`.Nicholas Nethercote-11/+1
It's a weird function: it lets you modify the token stream in the middle of iteration. There is only one call site, and it is only used for the rare `ProceduralMasquerade` legacy case.
2022-06-08Use delayed error handling for `Encodable` and `Encoder` infallible.Nicholas Nethercote-2/+2
There are two impls of the `Encoder` trait: `opaque::Encoder` and `opaque::FileEncoder`. The former encodes into memory and is infallible, the latter writes to file and is fallible. Currently, standard `Result`/`?`/`unwrap` error handling is used, but this is a bit verbose and has non-trivial cost, which is annoying given how rare failures are (especially in the infallible `opaque::Encoder` case). This commit changes how `Encoder` fallibility is handled. All the `emit_*` methods are now infallible. `opaque::Encoder` requires no great changes for this. `opaque::FileEncoder` now implements a delayed error handling strategy. If a failure occurs, it records this via the `res` field, and all subsequent encoding operations are skipped if `res` indicates an error has occurred. Once encoding is complete, the new `finish` method is called, which returns a `Result`. In other words, there is now a single `Result`-producing method instead of many of them. This has very little effect on how any file errors are reported if `opaque::FileEncoder` has any failures. Much of this commit is boring mechanical changes, removing `Result` return values and `?` or `unwrap` from expressions. The more interesting parts are as follows. - serialize.rs: The `Encoder` trait gains an `Ok` associated type. The `into_inner` method is changed into `finish`, which returns `Result<Vec<u8>, !>`. - opaque.rs: The `FileEncoder` adopts the delayed error handling strategy. Its `Ok` type is a `usize`, returning the number of bytes written, replacing previous uses of `FileEncoder::position`. - Various methods that take an encoder now consume it, rather than being passed a mutable reference, e.g. `serialize_query_result_cache`.
2022-05-22rustc_parse: Move AST -> TokenStream conversion logic to `rustc_ast`Vadim Petrochenkov-7/+86
2022-05-18use `CursorRef` more, to not to clone `Tree`sklensy-2/+11
2022-04-28rustc_ast: Harmonize delimiter naming with `proc_macro::Delimiter`Vadim Petrochenkov-4/+4
2022-04-21Introduced `Cursor::next_with_spacing_ref`.Nicholas Nethercote-0/+8
This lets us clone just the parts within a `TokenTree` that need cloning, rather than the entire thing. This is a surprisingly large performance win, up to 4% on `async-std-1.10.0`.
2022-04-20Inline `Cursor::next_with_spacing`.Nicholas Nethercote-0/+1
2022-04-19Inline and remove `TokenTree::{open_tt,close_tt}`.Nicholas Nethercote-10/+0
They both have a single call site.
2022-04-19Tweak `Cursor::next_with_spacing`.Nicholas Nethercote-5/+3
This makes it more like `CursorRef::next_with_spacing`. There is no performance effect, just a consistency improvement.
2022-03-30Spellchecking compiler commentsYuri Astrakhan-1/+1
This PR cleans up the rest of the spelling mistakes in the compiler comments. This PR does not change any literal or code spelling issues.
2022-02-262 - Make more use of let_chainsCaio-35/+33
Continuation of #94376. cc #53667
2022-01-22Make `Decodable` and `Decoder` infallible.Nicholas Nethercote-1/+1
`Decoder` has two impls: - opaque: this impl is already partly infallible, i.e. in some places it currently panics on failure (e.g. if the input is too short, or on a bad `Result` discriminant), and in some places it returns an error (e.g. on a bad `Option` discriminant). The number of places where either happens is surprisingly small, just because the binary representation has very little redundancy and a lot of input reading can occur even on malformed data. - json: this impl is fully fallible, but it's only used (a) for the `.rlink` file production, and there's a `FIXME` comment suggesting it should change to a binary format, and (b) in a few tests in non-fundamental ways. Indeed #85993 is open to remove it entirely. And the top-level places in the compiler that call into decoding just abort on error anyway. So the fallibility is providing little value, and getting rid of it leads to some non-trivial performance improvements. Much of this commit is pretty boring and mechanical. Some notes about a few interesting parts: - The commit removes `Decoder::{Error,error}`. - `InternIteratorElement::intern_with`: the impl for `T` now has the same optimization for small counts that the impl for `Result<T, E>` has, because it's now much hotter. - Decodable impls for SmallVec, LinkedList, VecDeque now all use `collect`, which is nice; the one for `Vec` uses unsafe code, because that gave better perf on some benchmarks.