| Age | Commit message (Collapse) | Author | Lines |
|
|
|
|
|
Rollup of 7 pull requests
Successful merges:
- #112931 (Enable zlib in LLVM on aarch64-apple-darwin)
- #113158 (tests: unset `RUSTC_LOG_COLOR` in a test)
- #113173 (CI: include workflow name in concurrency group)
- #113335 (Reveal opaques in new solver)
- #113390 (CGU formation tweaks)
- #113399 (Structurally normalize again for byte string lit pat checking)
- #113412 (Add basic types to SMIR)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
CGU formation tweaks
Minor improvements I found while trying out something bigger that didn't work out.
r? ``@wesleywiser``
|
|
It makes it sound like the `ExprKind` and `Rvalue` are supposed to represent all pointer related
casts, when in reality their just used to share a some enum variants. Make it clear there these
are only coercion to make it clear why only some pointer related "casts" are in the enum.
|
|
|
|
An assertion failure was reported in #112946. This extra information
will help diagnose the problem.
|
|
|
|
It's needless verbosity.
|
|
|
|
It's nicer this way.
|
|
For non-incremental builds on Unix, currently all the thread names look
like `opt regex.f10ba03eb5ec7975-cgu.0`. But they are truncated by
`pthread_setname` to `opt regex.f10ba`, hiding the numeric suffix that
distinguishes them. This is really annoying when using a profiler like
Samply.
This commit changes these thread names to a form like `opt cgu.0`, which
is much better.
|
|
Currently there are two problems.
First, the CGUS don't end up in size order. The merging loop does sort
by size on each iteration, but we don't sort after the final merge, so
typically there is one CGU out of place. (And sometimes we don't enter
the merging loop at all, in which case they end up in random order.)
Second, we then assign names that differ only by a numeric suffix, and
then we sort them lexicographically by name, giving us an order like
this:
regex.f10ba03eb5ec7975-cgu.1
regex.f10ba03eb5ec7975-cgu.10
regex.f10ba03eb5ec7975-cgu.11
regex.f10ba03eb5ec7975-cgu.12
regex.f10ba03eb5ec7975-cgu.13
regex.f10ba03eb5ec7975-cgu.14
regex.f10ba03eb5ec7975-cgu.15
regex.f10ba03eb5ec7975-cgu.2
regex.f10ba03eb5ec7975-cgu.3
regex.f10ba03eb5ec7975-cgu.4
regex.f10ba03eb5ec7975-cgu.5
regex.f10ba03eb5ec7975-cgu.6
regex.f10ba03eb5ec7975-cgu.7
regex.f10ba03eb5ec7975-cgu.8
regex.f10ba03eb5ec7975-cgu.9
These two problems are really annoying when debugging and profiling the
CGUs.
This commit ensures CGUs are sorted by name *and* reverse sorted by
size. This involves (a) one extra sort by size operation, and (b)
padding the numeric indices with zeroes, e.g.
`regex.f10ba03eb5ec7975-cgu.01`.
(Note that none of this applies for incremental builds, where a
different hash-based CGU naming scheme is used.)
|
|
- Rename `create_size_estimate` as `compute_size_estimate`, because that
makes more sense for the second and subsequent calls for each CGU.
- Change `CodegenUnit::size_estimate` from `Option<usize>` to `usize`.
We can still assert that `compute_size_estimate` is called first.
- Move the size estimation for `place_mono_items` inside the function,
for consistency with `merge_codegen_units`.
|
|
There's no longer any need for them to be separate, and putting them
together reduces the amount of code.
|
|
Because CGU merging relies on CGU sizes, but the CGU sizes before
inlining aren't accurate.
This requires tweaking how the sizes are updated during merging: if CGU
A and B both have an inlined function F, then `size(A + B)` will be a
little less than `size(A) + size(B)`, because `A + B` will only have one
copy of F. Also, the minimum CGU size is increased because it now has to
account for inlined functions.
This change doesn't have much effect on compile perf, but it makes
follow-on changes that involve more sophisticated reasoning about CGU
sizes much easier.
|
|
|
|
Remove `box_free` lang item
This PR removes the `box_free` lang item, replacing it with `Box`'s `Drop` impl. Box dropping is still slightly magic because the contained value is still dropped by the compiler.
|
|
|
|
|
|
Always put the `create_size_estimate` calls and `debug_dump` calls
within a timed scopes. This makes the four main steps look more similar
to each other.
|
|
The comment says "Find the smallest CGU that has exported symbols and
put the dead function stubs in that CGU". But the code sorts the CGUs by
size (smallest first) and then searches them in reverse order, which
means it will find the *largest* CGU that has exported symbols.
The erroneous code was introduced in #92142.
This commit changes it to use a simpler search, avoiding the sort, and
fixes the bug in the process.
|
|
The other major steps in `partition` have their own function, so it's
nice for this one to be likewise.
|
|
Because tiny CGUs make compilation less efficient *and* result in worse
generated code.
We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.
This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.
Here are some example before/after CGU sizes for opt builds.
- html5ever
- CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
- CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]
- libc
- CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
- CGUs: 1, mean size: 424.0, sizes: [424]
- tt-muncher
- CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
- CGUs: 1, mean size: 9075.0, sizes: [9075]
Note that CGUs of size 100,000+ aren't unusual in larger programs.
|
|
For example, we go from this:
```
FINAL (4059 items, total_size=232342; 16 CGUs, max_size=39608,
min_size=5468, max_size/min_size=7.2):
- CGU[0] regex.f2ff11e98f8b05c7-cgu.0 (318 items, size=39608):
- fn ...
- fn ...
```
to this:
```
FINAL
- unique items: 2726 (1459 root + 1267 inlined), unique size: 201214 (146046 root + 55168 inlined)
- placed items: 4059 (1459 root + 2600 inlined), placed size: 232342 (146046 root + 86296 inlined)
- placed/unique items ratio: 1.49, placed/unique size ratio: 1.15
- CGUs: 16, mean size: 14521.4, sizes: [39608, 31122, 20318, 20236, 16268, 13777, 12310, 10531, 10205, 9810, 9250, 9065 (x2), 7785, 7524, 5468]
- CGU[0]
- regex.f2ff11e98f8b05c7-cgu.0, size: 39608
- items: 318, mean size: 124.6, sizes: [28395, 3418, 558, 485, 259, 228, 176, 166, 146, 118, 117 (x3), 114 (x5), 113 (x3), 101, 84, 82, 77, 76, 72, 71 (x2), 66, 65, 62, 61, 59 (x2), 57, 55, 54 (x2), 53 (x4), 52 (x5), 51 (x4), 50, 48, 47, 46, 45 (x3), 44, 43 (x5), 42, 40, 38 (x4), 37, 35, 34 (x2), 32 (x2), 31, 30, 28 (x2), 27 (x2), 26 (x3), 24 (x2), 23 (x3), 22 (x2), 21, 20, 16 (x4), 15, 13 (x7), 12 (x3), 11 (x6), 10, 9 (x2), 8 (x4), 7 (x8), 6 (x38), 5 (x21), 4 (x7), 3 (x45), 2 (x63), 1 (x13)]
- fn ...
- fn ...
```
This is a lot more information, distinguishing between root items and
inlined items, showing how much duplication there is of inlined items,
plus the full range of sizes for CGUs and items within CGUs. All of
which is really helpful when analyzing this stuff and trying different
CGU formation algorithms.
|
|
Because that value can be easily obtained from `Partitioning::tcx`.
|
|
It's currently created in `place_inlined_mono_items` and then used in
`internalize_symbols`. This commit moves the creation to
`internalize_symbols`.
|
|
It's no longer used.
|
|
This loop is doing two different things. For inlined items, it's adding
them to the CGU. For all items, it's recording them in
`mono_item_placements`.
This commit splits it into two separate loops. This avoids putting root
mono items into `reachable`, and removes the low-value check that
`roots` doesn't contain inlined mono items.
|
|
Because they have a lot of overlap.
|
|
Because the next commit will merge them.
|
|
Currently it sorts by symbol name, which is a mangled name like
`_ZN1a4main17hb29587cdb6db5f42E`, which leads to non-obvious orderings.
This commit changes it to use the existing
`items_in_deterministic_order`, which iterates in source code order.
|
|
|
|
I found this confusing because it includes the root item, plus the
inlined items reachable from the root item. The new formulation
separates the two parts more clearly.
|
|
Currently it overwrites all the CGUs with new CGUs. But those new CGUs
are just copies of the old CGUs, possibly with some things added. This
commit changes things so that each CGU just gets added to in place,
which makes things simpler and clearer.
|
|
It currently uses ranges, which index into `UsageMap::used_items`. This
commit changes it to just use `Vec`, which is much simpler to construct
and use. This change does result in more allocations, but it is few
enough that the perf impact is negligible.
|
|
`UsageMap` contains `used_map`, which maps from an item to the item it
uses. This commit add `user_map`, which is the inverse.
We already compute this inverse, but later on, and it is only held as a
local variable. Its simpler and nicer to put it next to `used_map`.
|
|
Currently, the code uses multiple words to describe when a mono item `f`
uses a mono item `g`, all of which have problems.
- `f` references `g`: confusing because there are multiple kinds of use,
e.g. "`f` calls `g`" is one, but "`f` takes a (`&T`-style) reference
of `g`" is another, and that's two subtly different meanings of
"reference" in play.
- `f` accesses `g`: meh, "accesses" makes me think of data, and this is
code.
- `g` is a neighbor (or neighbour) of `f`: is verbose, and doesn't
capture the directionality.
This commit changes the code to use "`f` uses `g`" everywhere. I think
it's better than the current terminology, and the consistency is
important.
Also, `InliningMap` is renamed `UsageMap` because (a) it was always
mostly about usage, and (b) the inlining information it did record was
removed in a recent commit.
|
|
Improve CGU debug printing.
- Add more total and per-CGU measurements.
- Ensure CGUs are sorted by name before the first `debug_dump` calls, for deterministic output.
- Print items within CGUs in sorted-by-name order, for deterministic output.
- Add some assertions and comments clarifying sortedness of CGUs at various points.
An example, before:
```
INITIAL PARTITIONING (5 CodegenUnits, max=29, min=1, max/min=29.0):
CodegenUnit scev95ysd7g4b0z estimated size 2:
- fn <() as std::process::Termination>::report [(External, Hidden)] [h082b15a6d07338dcE] estimated size 2
CodegenUnit 1j0frgtl72rsz24q estimated size 29:
- fn std::rt::lang_start::<()>::{closure#0} [(External, Hidden)] [h695c7b5d6a212565E] estimated size 17
- fn std::rt::lang_start::<()> [(External, Hidden)] [h4ca942948e9cb931E] estimated size 12
CodegenUnit 5dbzi1e5qm0d7kj2 estimated size 4:
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim [(External, Hidden)] [h24eaa44f03b2b233E] estimated size 1
- fn <fn() as std::ops::FnOnce<()>>::call_once - shim(fn()) [(External, Hidden)] [hf338f5339c3711acE] estimated size 1
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim(vtable) [(External, Hidden)] [h595d414cbb7651d5E] estimated size 1
- fn std::ptr::drop_in_place::<[closure@std::rt::lang_start<()>::{closure#0}]> - shim(None) [(External, Hidden)] [h17a19dcdb40600daE] estimated size 1
CodegenUnit 220m1mqa2mlbg7r3 estimated size 1:
- fn main [(External, Hidden)] [hb29587cdb6db5f42E] estimated size 1
CodegenUnit 4ulbh241f7tvyn7x estimated size 6:
- fn std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> [(External, Hidden)] [h41dada2c21a1259dE] estimated size 6
```
and after:
```
INITIAL PARTITIONING (9 items, total_size=42; 5 CGUs, max_size=29, min_size=1, max_size/min_size=29.0):
- CGU[0] 1j0frgtl72rsz24q (2 items, size=29):
- fn std::rt::lang_start::<()> [(External, Hidden)] [h4ca942948e9cb931E] (size=12)
- fn std::rt::lang_start::<()>::{closure#0} [(External, Hidden)] [h695c7b5d6a212565E] (size=17)
- CGU[1] 220m1mqa2mlbg7r3 (1 items, size=1):
- fn main [(External, Hidden)] [hb29587cdb6db5f42E] (size=1)
- CGU[2] 4ulbh241f7tvyn7x (1 items, size=6):
- fn std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> [(External, Hidden)] [h41dada2c21a1259dE] (size=6)
- CGU[3] 5dbzi1e5qm0d7kj2 (4 items, size=4):
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim(vtable) [(External, Hidden)] [h595d414cbb7651d5E] (size=1)
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim [(External, Hidden)] [h24eaa44f03b2b233E] (size=1)
- fn <fn() as std::ops::FnOnce<()>>::call_once - shim(fn()) [(External, Hidden)] [hf338f5339c3711acE] (size=1)
- fn std::ptr::drop_in_place::<[closure@std::rt::lang_start<()>::{closure#0}]> - shim(None) [(External, Hidden)] [h17a19dcdb40600daE] (size=1)
- CGU[4] scev95ysd7g4b0z (1 items, size=2):
- fn <() as std::process::Termination>::report [(External, Hidden)] [h082b15a6d07338dcE] (size=2)
```
r? ``@wesleywiser``
|
|
- Add more total and per-CGU measurements.
- Ensure CGUs are sorted by name before the first `debug_dump` calls,
for deterministic output.
- Print items within CGUs in sorted-by-name order, for deterministic
output.
- Add some assertions and comments clarifying sortedness of CGUs at
various points.
An example, before:
```
INITIAL PARTITIONING (5 CodegenUnits, max=29, min=1, max/min=29.0):
CodegenUnit scev95ysd7g4b0z estimated size 2:
- fn <() as std::process::Termination>::report [(External, Hidden)] [h082b15a6d07338dcE] estimated size 2
CodegenUnit 1j0frgtl72rsz24q estimated size 29:
- fn std::rt::lang_start::<()>::{closure#0} [(External, Hidden)] [h695c7b5d6a212565E] estimated size 17
- fn std::rt::lang_start::<()> [(External, Hidden)] [h4ca942948e9cb931E] estimated size 12
CodegenUnit 5dbzi1e5qm0d7kj2 estimated size 4:
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim [(External, Hidden)] [h24eaa44f03b2b233E] estimated size 1
- fn <fn() as std::ops::FnOnce<()>>::call_once - shim(fn()) [(External, Hidden)] [hf338f5339c3711acE] estimated size 1
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim(vtable) [(External, Hidden)] [h595d414cbb7651d5E] estimated size 1
- fn std::ptr::drop_in_place::<[closure@std::rt::lang_start<()>::{closure#0}]> - shim(None) [(External, Hidden)] [h17a19dcdb40600daE] estimated size 1
CodegenUnit 220m1mqa2mlbg7r3 estimated size 1:
- fn main [(External, Hidden)] [hb29587cdb6db5f42E] estimated size 1
CodegenUnit 4ulbh241f7tvyn7x estimated size 6:
- fn std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> [(External, Hidden)] [h41dada2c21a1259dE] estimated size 6
```
and after:
```
INITIAL PARTITIONING (9 items, total_size=42; 5 CGUs, max_size=29, min_size=1, max_size/min_size=29.0):
- CGU[0] 1j0frgtl72rsz24q (2 items, size=29):
- fn std::rt::lang_start::<()> [(External, Hidden)] [h4ca942948e9cb931E] (size=12)
- fn std::rt::lang_start::<()>::{closure#0} [(External, Hidden)] [h695c7b5d6a212565E] (size=17)
- CGU[1] 220m1mqa2mlbg7r3 (1 items, size=1):
- fn main [(External, Hidden)] [hb29587cdb6db5f42E] (size=1)
- CGU[2] 4ulbh241f7tvyn7x (1 items, size=6):
- fn std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> [(External, Hidden)] [h41dada2c21a1259dE] (size=6)
- CGU[3] 5dbzi1e5qm0d7kj2 (4 items, size=4):
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim(vtable) [(External, Hidden)] [h595d414cbb7651d5E] (size=1)
- fn <[closure@std::rt::lang_start<()>::{closure#0}] as std::ops::FnOnce<()>>::call_once - shim [(External, Hidden)] [h24eaa44f03b2b233E] (size=1)
- fn <fn() as std::ops::FnOnce<()>>::call_once - shim(fn()) [(External, Hidden)] [hf338f5339c3711acE] (size=1)
- fn std::ptr::drop_in_place::<[closure@std::rt::lang_start<()>::{closure#0}]> - shim(None) [(External, Hidden)] [h17a19dcdb40600daE] (size=1)
- CGU[4] scev95ysd7g4b0z (1 items, size=2):
- fn <() as std::process::Termination>::report [(External, Hidden)] [h082b15a6d07338dcE] (size=2)
```
|
|
We record inlining status for mono items in `MonoItems`, and then
transfer it to `InliningMap`, for later use in
`InliningMap::with_inlining_candidates`.
But we can just compute inlining status directly in
`InliningMap::with_inlining_candidates`, because the mono item is right
there. There's no need to compute it in advance.
This commit changes the code to do that, removing the need for
`MonoItems` and `InliningMap::inlines`. This does result in more calls
to `instantiation_mode` (one per static occurrence) but the performance
effect is negligible.
|
|
r=wesleywiser
Remove `-Zcgu-partitioning-strategy`.
This option was introduced three years ago, but it's never been meaningfully used, and `default` is the only acceptable value.
Also, I think the `Partition` trait presents an interface that is too closely tied to the existing strategy and would probably be wrong for other strategies. (My rule of thumb is to not make something generic until there are at least two instances of it, to avoid this kind of problem.)
Also, I don't think providing multiple partitioning strategies to the user is a good idea, because the compiler already has enough obscure knobs.
This commit removes the option, along with the `Partition` trait, and the `Partitioner` and `DefaultPartitioning` types. I left the existing code in `compiler/rustc_monomorphize/src/partitioning/default.rs`, though I could be persuaded that moving it into
`compiler/rustc_monomorphize/src/partitioning/mod.rs` is better.
r? ``@wesleywiser``
|
|
Because it's now the only file within
`compiler/rustc_monomorphize/src/partitioning/`.
|
|
Within `compiler/rustc_monomorphize/src/partitioning/`, because the
previous commit removed the need for `default.rs` to be a separate file.
|
|
This option was introduced three years ago, but it's never been
meaningfully used, and `default` is the only acceptable value.
Also, I think the `Partition` trait presents an interface that is too
closely tied to the existing strategy and would probably be wrong for
other strategies. (My rule of thumb is to not make something generic
until there are at least two instances of it, to avoid this kind of
problem.)
Also, I don't think providing multiple partitioning strategies to the
user is a good idea, because the compiler already has enough obscure
knobs.
This commit removes the option, along with the `Partition` trait, and
the `Partitioner` and `DefaultPartitioning` types. I left the existing
code in `compiler/rustc_monomorphize/src/partitioning/default.rs`,
though I could be persuaded that moving it into
`compiler/rustc_monomorphize/src/partitioning/mod.rs` is better.
|
|
|
|
|
|
As per review request.
|
|
|
|
This is something that took me some time to figure out.
|