summary refs log tree commit diff
path: root/compiler/rustc_ast/src/tokenstream.rs
AgeCommit message (Collapse)AuthorLines
2023-01-05Fix `uninlined_format_args` for some compiler cratesnils-2/+1
Convert all the crates that have had their diagnostic migration completed (except save_analysis because that will be deleted soon and apfloat because of the licensing problem).
2022-12-10compiler: remove unnecessary imports and qualified pathsKaDiWa-1/+1
2022-12-01Remove useless borrows and derefsMaybe Waffle-4/+4
2022-11-27Prefer doc comments over `//`-comments in compilerMaybe Waffle-10/+10
2022-10-12Use `tidy-alphabetical` in the compilerNilstrieb-1/+2
2022-10-05Remove `TokenStreamBuilder`.Nicholas Nethercote-72/+45
`TokenStreamBuilder` exists to concatenate multiple `TokenStream`s together. This commit removes it, and moves the concatenation functionality directly into `TokenStream`, via two new methods `push_tree` and `push_stream`. This makes things both simpler and faster. `push_tree` is particularly important. `TokenStreamBuilder` only had a single `push` method, which pushed a stream. But in practice most of the time we push a single token tree rather than a stream, and `push_tree` avoids the need to build a token stream with a single entry (which requires two allocations, one for the `Lrc` and one for the `Vec`). The main `push_tree` use arises from a change to one of the `ToInternal` impls in `proc_macro_server.rs`. It now returns a `SmallVec` instead of a `TokenStream`. This return value is then iterated over by `concat_trees`, which does `push_tree` on each element. Furthermore, the use of `SmallVec` avoids more allocations, because there is always only one or two token trees. Note: the removed `TokenStreamBuilder::push` method had some code to deal with a quadratic blowup case from #57735. This commit removes the code. I tried and failed to reproduce the blowup from that PR, before and after this change. Various other changes have happened to `TokenStreamBuilder` in the meantime, so I suspect the original problem is no longer relevant, though I don't have proof of this. Generally speaking, repeatedly extending a `Vec` without pre-determining its capacity is *not* quadratic. It's also incredibly common, within rustc and many other Rust programs, so if there were performance problems there you'd think it would show up in other places, too.
2022-10-03Add comments to `Spacing`.Nicholas Nethercote-0/+11
2022-10-01Group together more size assertions.Nicholas Nethercote-8/+13
Also add a few more assertions for some relevant token-related types. And fix an erroneous comment in `rustc_errors`.
2022-09-09Rename `{Create,Lazy}TokenStream` as `{To,Lazy}AttrTokenStream`.Nicholas Nethercote-21/+21
`To` is better than `Create` for indicating that this is a non-consuming conversion, rather than creating something out of nothing. And the addition of `Attr` is because the current names makes them sound like they relate to `TokenStream`, but really they relate to `AttrTokenStream`.
2022-09-09Inline and remove `TokenStream::opt_from_ast`.Nicholas Nethercote-15/+12
2022-09-09Tweak some formatting.Nicholas Nethercote-9/+5
2022-09-09Change return type of `Attribute::tokens`.Nicholas Nethercote-2/+2
The `AttrTokenStream` is always immediately turned into a `TokenStream`.
2022-09-09Rename `AttrAnnotatedToken{Stream,Tree}`.Nicholas Nethercote-20/+20
These two type names are long and have long matching prefixes. I find them hard to read, especially in combinations like `AttrAnnotatedTokenStream::new(vec![AttrAnnotatedTokenTree::Token(..)])`. This commit renames them as `AttrToken{Stream,Tree}`.
2022-09-09Move `Spacing` out of `AttrAnnotatedTokenStream`.Nicholas Nethercote-16/+7
And into `AttrAnnotatedTokenTree::Token`. PR #99887 did the same thing for `TokenStream`.
2022-09-01Auto merge of #100869 - nnethercote:replace-ThinVec, r=spastorinobors-1/+2
Replace `rustc_data_structures::thin_vec::ThinVec` with `thin_vec::ThinVec` `rustc_data_structures::thin_vec::ThinVec` looks like this: ``` pub struct ThinVec<T>(Option<Box<Vec<T>>>); ``` It's just a zero word if the vector is empty, but requires two allocations if it is non-empty. So it's only usable in cases where the vector is empty most of the time. This commit removes it in favour of `thin_vec::ThinVec`, which is also word-sized, but stores the length and capacity in the same allocation as the elements. It's good in a wider variety of situation, e.g. in enum variants where the vector is usually/always non-empty. The commit also: - Sorts some `Cargo.toml` dependency lists, to make additions easier. - Sorts some `use` item lists, to make additions easier. - Changes `clean_trait_ref_with_bindings` to take a `ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this avoid some unnecessary allocations. r? `@spastorino`
2022-08-30Use more `into_iter` rather than `drain(..)`Donough Liu-1/+1
2022-08-29Replace `rustc_data_structures::thin_vec::ThinVec` with `thin_vec::ThinVec`.Nicholas Nethercote-1/+2
`rustc_data_structures::thin_vec::ThinVec` looks like this: ``` pub struct ThinVec<T>(Option<Box<Vec<T>>>); ``` It's just a zero word if the vector is empty, but requires two allocations if it is non-empty. So it's only usable in cases where the vector is empty most of the time. This commit removes it in favour of `thin_vec::ThinVec`, which is also word-sized, but stores the length and capacity in the same allocation as the elements. It's good in a wider variety of situation, e.g. in enum variants where the vector is usually/always non-empty. The commit also: - Sorts some `Cargo.toml` dependency lists, to make additions easier. - Sorts some `use` item lists, to make additions easier. - Changes `clean_trait_ref_with_bindings` to take a `ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this avoid some unnecessary allocations.
2022-07-29Remove `TreeAndSpacing`.Nicholas Nethercote-78/+74
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is not quite right. `Spacing` makes sense for `TokenTree::Token`, but does not make sense for `TokenTree::Delimited`, because a `TokenTree::Delimited` cannot be joined with another `TokenTree`. This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`, changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the `TreeAndSpacing` typedef. The commit removes these two impls: - `impl From<TokenTree> for TokenStream` - `impl From<TokenTree> for TreeAndSpacing` These were useful, but also resulted in code with many `.into()` calls that was hard to read, particularly for anyone not highly familiar with the relevant types. This commit makes some other changes to compensate: - `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`. - `TokenStream::token_{alone,joint}()` are added. - `TokenStream::delimited` is added. This results in things like this: ```rust TokenTree::token(token::Semi, stmt.span).into() ``` changing to this: ```rust TokenStream::token_alone(token::Semi, stmt.span) ``` This makes the type of the result, and its spacing, clearer. These changes also simplifies `Cursor` and `CursorRef`, because they no longer need to distinguish between `next` and `next_with_spacing`.
2022-06-20Merge `TokenStreamBuilder::push` into `TokenStreamBuilder::build`.Nicholas Nethercote-53/+32
Both functions do some modifying of streams using `make_mut`: - `push` sometimes glues the first token of the next stream to the last token of the first stream. - `build` appends tokens to the first stream. By doing all of this in the one place, things are simpler. The first stream can be modified in both ways (if necessary) in the one place, and any next stream with the first token removed doesn't need to be stored.
2022-06-20Remove `TokenStream::from_streams`.Nicholas Nethercote-40/+37
By inlining it into the only non-test call site. The one test call site is changed to use `TokenStreamBuilder`.
2022-06-20Remove `Cursor::index`.Nicholas Nethercote-4/+0
It's unused.
2022-06-20Remove `Cursor::append`.Nicholas Nethercote-11/+1
It's a weird function: it lets you modify the token stream in the middle of iteration. There is only one call site, and it is only used for the rare `ProceduralMasquerade` legacy case.
2022-06-08Use delayed error handling for `Encodable` and `Encoder` infallible.Nicholas Nethercote-2/+2
There are two impls of the `Encoder` trait: `opaque::Encoder` and `opaque::FileEncoder`. The former encodes into memory and is infallible, the latter writes to file and is fallible. Currently, standard `Result`/`?`/`unwrap` error handling is used, but this is a bit verbose and has non-trivial cost, which is annoying given how rare failures are (especially in the infallible `opaque::Encoder` case). This commit changes how `Encoder` fallibility is handled. All the `emit_*` methods are now infallible. `opaque::Encoder` requires no great changes for this. `opaque::FileEncoder` now implements a delayed error handling strategy. If a failure occurs, it records this via the `res` field, and all subsequent encoding operations are skipped if `res` indicates an error has occurred. Once encoding is complete, the new `finish` method is called, which returns a `Result`. In other words, there is now a single `Result`-producing method instead of many of them. This has very little effect on how any file errors are reported if `opaque::FileEncoder` has any failures. Much of this commit is boring mechanical changes, removing `Result` return values and `?` or `unwrap` from expressions. The more interesting parts are as follows. - serialize.rs: The `Encoder` trait gains an `Ok` associated type. The `into_inner` method is changed into `finish`, which returns `Result<Vec<u8>, !>`. - opaque.rs: The `FileEncoder` adopts the delayed error handling strategy. Its `Ok` type is a `usize`, returning the number of bytes written, replacing previous uses of `FileEncoder::position`. - Various methods that take an encoder now consume it, rather than being passed a mutable reference, e.g. `serialize_query_result_cache`.
2022-05-22rustc_parse: Move AST -> TokenStream conversion logic to `rustc_ast`Vadim Petrochenkov-7/+86
2022-05-18use `CursorRef` more, to not to clone `Tree`sklensy-2/+11
2022-04-28rustc_ast: Harmonize delimiter naming with `proc_macro::Delimiter`Vadim Petrochenkov-4/+4
2022-04-21Introduced `Cursor::next_with_spacing_ref`.Nicholas Nethercote-0/+8
This lets us clone just the parts within a `TokenTree` that need cloning, rather than the entire thing. This is a surprisingly large performance win, up to 4% on `async-std-1.10.0`.
2022-04-20Inline `Cursor::next_with_spacing`.Nicholas Nethercote-0/+1
2022-04-19Inline and remove `TokenTree::{open_tt,close_tt}`.Nicholas Nethercote-10/+0
They both have a single call site.
2022-04-19Tweak `Cursor::next_with_spacing`.Nicholas Nethercote-5/+3
This makes it more like `CursorRef::next_with_spacing`. There is no performance effect, just a consistency improvement.
2022-03-30Spellchecking compiler commentsYuri Astrakhan-1/+1
This PR cleans up the rest of the spelling mistakes in the compiler comments. This PR does not change any literal or code spelling issues.
2022-02-262 - Make more use of let_chainsCaio-35/+33
Continuation of #94376. cc #53667
2022-01-22Make `Decodable` and `Decoder` infallible.Nicholas Nethercote-1/+1
`Decoder` has two impls: - opaque: this impl is already partly infallible, i.e. in some places it currently panics on failure (e.g. if the input is too short, or on a bad `Result` discriminant), and in some places it returns an error (e.g. on a bad `Option` discriminant). The number of places where either happens is surprisingly small, just because the binary representation has very little redundancy and a lot of input reading can occur even on malformed data. - json: this impl is fully fallible, but it's only used (a) for the `.rlink` file production, and there's a `FIXME` comment suggesting it should change to a binary format, and (b) in a few tests in non-fundamental ways. Indeed #85993 is open to remove it entirely. And the top-level places in the compiler that call into decoding just abort on error anyway. So the fallibility is providing little value, and getting rid of it leads to some non-trivial performance improvements. Much of this commit is pretty boring and mechanical. Some notes about a few interesting parts: - The commit removes `Decoder::{Error,error}`. - `InternIteratorElement::intern_with`: the impl for `T` now has the same optimization for small counts that the impl for `Result<T, E>` has, because it's now much hotter. - Decodable impls for SmallVec, LinkedList, VecDeque now all use `collect`, which is nice; the one for `Vec` uses unsafe code, because that gave better perf on some benchmarks.
2021-10-25fix: inner attribute followed by outer attribute causing ICEEliseZeroTwo-6/+0
2021-05-30don't clone attrsklensy-3/+2
2021-04-19fix few typosklensy-1/+1
2021-04-11Implement token-based handling of attributes during expansionAaron Hill-5/+155
This PR modifies the macro expansion infrastructure to handle attributes in a fully token-based manner. As a result: * Derives macros no longer lose spans when their input is modified by eager cfg-expansion. This is accomplished by performing eager cfg-expansion on the token stream that we pass to the derive proc-macro * Inner attributes now preserve spans in all cases, including when we have multiple inner attributes in a row. This is accomplished through the following changes: * New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced. These are very similar to a normal `TokenTree`, but they also track the position of attributes and attribute targets within the stream. They are built when we collect tokens during parsing. An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when we invoke a macro. * Token capturing and `LazyTokenStream` are modified to work with `AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which is created during the parsing of a nested AST node to make the 'outer' AST node aware of the attributes and attribute target stored deeper in the token stream. * When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`), we tokenize and reparse our target, capturing additional information about the locations of `#[cfg]` and `#[cfg_attr]` attributes at any depth within the target. This is a performance optimization, allowing us to perform less work in the typical case where captured tokens never have eager cfg-expansion run.
2021-04-05Fix typo in TokenStream documentationGuillaume Gomez-1/+1
2021-03-27Remove (lots of) dead codeJoshua Nelson-20/+0
Found with https://github.com/est31/warnalyzer. Dubious changes: - Is anyone else using rustc_apfloat? I feel weird completely deleting x87 support. - Maybe some of the dead code in rustc_data_structures, in case someone wants to use it in the future? - Don't change rustc_serialize I plan to scrap most of the json module in the near future (see https://github.com/rust-lang/compiler-team/issues/418) and fixing the tests needed more work than I expected. TODO: check if any of the comments on the deleted code should be kept.
2021-03-26Use iter::zip in compiler/Josh Stone-1/+1
2021-03-06Change x64 size checks to not apply to x32.Harald van Dijk-1/+1
Rust contains various size checks conditional on target_arch = "x86_64", but these checks were never intended to apply to x86_64-unknown-linux-gnux32. Add target_pointer_width = "64" to the conditions.
2021-01-22Refactor token collection to capture trailing token immediatelyAaron Hill-11/+0
2021-01-03Edit rustc_ast::tokenstream docspierwill-16/+17
Fix some punctuation and wording, and add intra-documentation links.
2020-12-29Remove pretty-print/reparse hack, and add derive-specific hackAaron Hill-0/+6
2020-11-26Properly handle attributes on statementsAaron Hill-0/+11
We now collect tokens for the underlying node wrapped by `StmtKind` instead of storing tokens directly in `Stmt`. `LazyTokenStream` now supports capturing a trailing semicolon after it is initially constructed. This allows us to avoid refactoring statement parsing to wrap the parsing of the semicolon in `parse_tokens`. Attributes on item statements (e.g. `fn foo() { #[bar] struct MyStruct; }`) are now treated as item attributes, not statement attributes, which is consistent with how we handle attributes on other kinds of statements. The feature-gating code is adjusted so that proc-macro attributes are still allowed on item statements on stable. Two built-in macros (`#[global_allocator]` and `#[test]`) needed to be adjusted to support being passed `Annotatable::Stmt`.
2020-11-13Reserve space in advanceDániel Buga-1/+1
2020-11-03Expand `NtExpr` tokens only in key-value attributesVadim Petrochenkov-0/+34
2020-11-01Do not remove tokens before AST json serializationVadim Petrochenkov-2/+3
2020-10-31parser: Cleanup `LazyTokenStream` and avoid some clonesVadim Petrochenkov-43/+22
by using a named struct instead of a closure.
2020-10-21Auto merge of #77250 - Aaron1011:feature/flat-token-collection, r=petrochenkovbors-1/+73
Rewrite `collect_tokens` implementations to use a flattened buffer Instead of trying to collect tokens at each depth, we 'flatten' the stream as we go allong, pushing open/close delimiters to our buffer just like regular tokens. One capturing is complete, we reconstruct a nested `TokenTree::Delimited` structure, producing a normal `TokenStream`. The reconstructed `TokenStream` is not created immediately - instead, it is produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This closure stores a clone of the original `TokenCursor`, plus a record of the number of calls to `next()/next_desugared()`. This is sufficient to reconstruct the tokenstream seen by the callback without storing any additional state. If the tokenstream is never used (e.g. when a captured `macro_rules!` argument is never passed to a proc macro), we never actually create a `TokenStream`. This implementation has a number of advantages over the previous one: * It is significantly simpler, with no edge cases around capturing the start/end of a delimited group. * It can be easily extended to allow replacing tokens an an arbitrary 'depth' by just using `Vec::splice` at the proper position. This is important for PR #76130, which requires us to track information about attributes along with tokens. * The lazy approach to `TokenStream` construction allows us to easily parse an AST struct, and then decide after the fact whether we need a `TokenStream`. This will be useful when we start collecting tokens for `Attribute` - we can discard the `LazyTokenStream` if the parsed attribute doesn't need tokens (e.g. is a builtin attribute). The performance impact seems to be neglibile (see https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a small slowdown on a few benchmarks, but it only rises above 1% for incremental builds, where it represents a larger fraction of the much smaller instruction count. There a ~1% speedup on a few other incremental benchmarks - my guess is that the speedups and slowdowns will usually cancel out in practice.