| Age | Commit message | Author | Lines |
|
Signed-off-by: onur-ozkan <work@onurozkan.dev>
|
|
Optimize `librustc_driver.so` with BOLT
This PR optimizes `librustc_driver.so` on 64-bit Linux CI with BOLT.
### Code
One thing I'm not yet sure how to resolve is how best to pass a linker flag that we need for BOLT (the second commit). It is currently passed unconditionally, which is not a good idea. We somehow have to:
1) Only pass it when we actually plan to use BOLT. How best to do that: a `config.toml` entry, an environment variable, or a CLI flag for bootstrap? BOLT optimization is done by `opt-dist`, so bootstrap doesn't know about it by default.
2) Only pass it to `librustc_driver.so` (see performance below).
Some discussion of this flag already happened on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Adding.20a.20one-off.20linker.20flag).
### Performance
The latest perf results can be found [here](https://github.com/rust-lang/rust/pull/102487#issuecomment-1743469053). Note that instruction counts are not very interesting here: there are only regressions on hello-world programs, probably caused by a larger C++ libstd (?).
Summary:
- :heavy_check_mark: `-1.8%` mean improvement in cycle counts across many primary benchmarks.
- :heavy_check_mark: `-1.8%` mean Max-RSS improvement.
- :heavy_multiplication_x: 34 MiB (+48%) artifact size regression of `librustc_driver.so`.
  - This is caused by building `librustc_driver.so` with relocations (which are required for BOLT). Hopefully, this will be [fixed](https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674) in the future with BOLT improvements, but for now, reducing this size increase is [tricky](https://github.com/rust-lang/rust/pull/114649).
  - Note that the size of this file was recently reduced in https://github.com/rust-lang/rust/pull/115554 by roughly the same amount (33 MiB), so its size after this PR is basically the same as it was for the last ~year.
- :heavy_multiplication_x: 1.4 MiB (+53%) artifact size regression of `rustc`.
  - This is annoying and pretty much unnecessary. It is caused by the way relocations are currently applied in this PR: they are applied both to `librustc_driver.so` (where they are needed) and to `rustc` (where they aren't), since both are built with a single cargo invocation. We might need, e.g., some trick in the bootstrap `rustc` shim to only apply the relocation flag to the shared library and not to `rustc`.
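One possible shape for the shim trick mentioned above is sketched below. It is hypothetical: `bolt_enabled` stands in for whichever opt-in signal is chosen (config entry, environment variable or CLI flag), and it assumes the relocation flag is the linker's `--emit-relocs`, which BOLT's documentation recommends. The shim inspects `--crate-name` and only adds the flag when linking `rustc_driver`.

```rust
/// Returns the value that follows `flag` in a rustc command line,
/// e.g. the crate name after `--crate-name`.
fn arg_value<'a>(args: &'a [String], flag: &str) -> Option<&'a str> {
    args.windows(2).find(|w| w[0] == flag).map(|w| w[1].as_str())
}

/// Hypothetical shim helper: extra flags to append for BOLT.
/// `bolt_enabled` is an assumed opt-in signal, not an existing bootstrap setting.
fn bolt_extra_args(args: &[String], bolt_enabled: bool) -> Vec<String> {
    let is_driver = arg_value(args, "--crate-name") == Some("rustc_driver");
    if bolt_enabled && is_driver {
        // Keep relocations in the linked shared library so BOLT can rewrite it.
        vec!["-Clink-arg=-Wl,--emit-relocs".to_string()]
    } else {
        // The `rustc` binary (and every other crate) is linked as before.
        Vec::new()
    }
}
```

Because the decision is made per rustc invocation rather than per cargo invocation, the `rustc` binary built in the same cargo run would no longer pick up the relocations.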
### CI time
CI (try build) got slower by ~5 minutes, which is fine, IMO. It can be further reduced by running LLVM and `librustc_driver` BOLT profile gathering at the same time (now they are gathered separately for LLVM and `librustc_driver`).
r? `@Mark-Simulacrum`
Also CC `@onur-ozkan`, primarily for the bootstrap linker flag issue.
|
Signed-off-by: onur-ozkan <work@onurozkan.dev>
|
cc https://rust-lang.zulipchat.com/#narrow/stream/326414-t-infra.2Fbootstrap/topic/Building.20.60coretests.60.20by.20hand
|
|
Nils had an excellent idea the other day: the same way that rustdoc is
able to load `rustc_driver` from the sysroot, ui-fulldeps tests should
also be able to load it from the sysroot. That allows us to run fulldeps
tests with stage1, without having to fully rebuild the compiler twice.
It does unfortunately have the downside that we're running the tests on
the *bootstrap* compiler, not the in-tree sources, but since most of the
fulldeps tests are for the *API* of the compiler, that seems ok.
I think it's possible to extend this to `run-make-fulldeps`, but I've
run out of energy for tonight.
- Move `plugin` tests into a subdirectory.
Plugins are loaded at runtime with `dlopen` and so require the ABI of
the running compiler to match the ABI of the compiler linked with
`rustc_driver`. As a result they can't be supported in stage 1 and have
to use `// ignore-stage1`.
- Remove `ignore-stage1` from most non-plugin tests
- Ignore diagnostic tests in stage 1. Even though this requires a stage
2 build to load rustc_driver, it's primarily testing the error message
that the *running* compiler emits when the diagnostic struct is malformed.
- Pass `-Zforce-unstable-if-unmarked` in stage1, not just stage2. That
allows running `hash-stable-is-unstable` in stage1, since it now
suggests adding `rustc_private` to enable loading the crates.
- Add libLLVM.so to the stage0 target sysroot, to allow fulldeps tests
that act as custom drivers to load it at runtime.
- Pass `--sysroot stage0-sysroot` in compiletest so that we use the
correct version of std.
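The stage-dependent rules in the list above can be modeled roughly as follows. This is a hypothetical sketch with made-up function names, not compiletest's actual code: a test carrying an `// ignore-stage1` directive (as the `plugin` tests do) is skipped at stage 1, and `-Zforce-unstable-if-unmarked` is passed from stage 1 onward rather than only at stage 2.

```rust
/// Hypothetical: does a ui-fulldeps test run at the given stage?
/// Tests that must match the running compiler's ABI opt out of stage 1.
fn runs_at_stage(test_source: &str, stage: u32) -> bool {
    let ignore_stage1 = test_source
        .lines()
        .any(|line| line.trim() == "// ignore-stage1");
    !(stage == 1 && ignore_stage1)
}

/// Hypothetical: extra rustc flags for fulldeps tests.
/// `-Zforce-unstable-if-unmarked` now applies at stage 1 as well as stage 2.
fn extra_rustc_flags(stage: u32) -> Vec<&'static str> {
    let mut flags = Vec::new();
    if stage >= 1 {
        flags.push("-Zforce-unstable-if-unmarked");
    }
    flags
}
```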
|
The compiler currently has `-Ztime` and `-Ztime-passes`. I've used
`-Ztime-passes` for years but only recently learned about `-Ztime`.
What's the difference? Let's look at the `-Zhelp` output:
```
-Z time=val -- measure time of rustc processes (default: no)
-Z time-passes=val -- measure time of each rustc pass (default: no)
```
The `-Ztime-passes` description is clear, but the `-Ztime` one is less so.
Sounds like it measures the time for the entire process?
No. The real difference is that `-Ztime-passes` prints out info about passes,
and `-Ztime` does the same, but only for a subset of those passes. More
specifically, there is a distinction in the profiling code between a "verbose
generic activity" and an "extra verbose generic activity". `-Ztime-passes`
prints both kinds, while `-Ztime` only prints the first one. (It took me
a close reading of the source code to determine this difference.)
In practice this distinction has low value. Perhaps in the past the "extra
verbose" output was more voluminous, but now that we only print stats for a
pass if it exceeds 5ms or alters the RSS, `-Ztime-passes` is less spammy. Also,
a lot of the "extra verbose" cases are for individual lint passes, and you need
to also use `-Zno-interleave-lints` to see those anyway.
Therefore, this commit removes `-Ztime` and the associated machinery. One thing
to note is that the existing "extra verbose" activities all have an extra
string argument, so the commit adds the ability to accept an extra argument to
the "verbose" activities.
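A simplified model of the filtering described above may make the distinction easier to see. The names here are hypothetical and do not mirror rustc's profiling types: before this change, `-Ztime` printed only "verbose" activities while `-Ztime-passes` printed both kinds, and either way a pass is only reported if it exceeds 5ms or changes the RSS.

```rust
use std::time::Duration;

/// The two activity kinds that existed before this change (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Activity {
    Verbose,
    ExtraVerbose,
}

/// Old behaviour: which flag caused which kind of activity to be printed.
fn printed(activity: Activity, time: bool, time_passes: bool) -> bool {
    match activity {
        Activity::Verbose => time || time_passes,
        Activity::ExtraVerbose => time_passes,
    }
}

/// A pass is only reported if it took longer than 5ms or altered the RSS.
fn worth_reporting(elapsed: Duration, rss_start: Option<usize>, rss_end: Option<usize>) -> bool {
    elapsed > Duration::from_millis(5) || rss_start != rss_end
}
```

With the threshold in place, the extra "extra verbose" lines cost little, which is why collapsing the two kinds into one is reasonable.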
|
|
This commit does three things:
* First, it passes --cfg=bootstrap on stage 0 for rustdoc
invocations on proc_macro crates. This mirrors what we
do already for rustc invocations of those, and is needed
because cargo doesn't respect RUSTFLAGS or RUSTDOCFLAGS
when confronted with a proc macro.
* Second, it marks the bootstrap config variable as expected.
This is needed both on later stages where it's not set,
but also on stage 0, where it is set.
* Third, it adjusts the comment in the rustc wrapper to better
reflect the reason why we set the bootstrap variable as
expected: due to recent changes, setting it as expected
is also required even if the cfg variable is passed: ebf4cc361e0d0f11a25b42372bd629953365d17e .
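A rough sketch of the wrapper-side logic follows, under stated assumptions: `stage` is assumed to come from the environment (bootstrap's shim reads it from a variable such as `RUSTC_STAGE`), and the function name is hypothetical. The exact `--check-cfg` spelling has changed over time; this uses the current stable `cfg(...)` form.

```rust
/// Hypothetical wrapper helper for rustc/rustdoc invocations.
/// `args` are the arguments Cargo passed to the tool.
fn bootstrap_cfg_args(stage: &str, args: &[String]) -> Vec<String> {
    let is_proc_macro = args
        .windows(2)
        .any(|w| w[0] == "--crate-type" && w[1] == "proc-macro");
    let mut extra = Vec::new();
    if stage == "0" && is_proc_macro {
        // RUSTFLAGS/RUSTDOCFLAGS don't reach proc macros, so the wrapper adds the cfg itself.
        extra.push("--cfg=bootstrap".to_string());
    }
    // Declare `bootstrap` as expected at every stage, whether or not it is set,
    // so `#[cfg(bootstrap)]` doesn't trigger the unexpected-cfg lint.
    extra.push("--check-cfg=cfg(bootstrap)".to_string());
    extra
}
```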
|
As was discovered in https://github.com/rust-lang/rust/pull/93628#issuecomment-1154697627 ,
adding #[cfg(bootstrap)] to a rust-internal proc macro crate
would yield an unexpected cfg name error, at least on later
stages where the bootstrap cfg arg wasn't set.
rustc already passes arguments to mark bootstrap as expected,
however the means of delivery through the RUSTFLAGS env var
is unable to reach proc macro crates, as described
in the issue linked in the code this commit touches.
This wouldn't be an issue for cfg args that get passed through
RUSTFLAGS, as they would never become *active* either, so
any usage of one of these flags in a proc macro's code would
legitimately yield a lint warning. But since dc302587e2cf5105a3a864319d7e7bcb434bba20,
rust takes extra measures to pass --cfg=bootstrap even in
proc macros, by passing it via the wrapper. Thus, we need
to send the flags to mark bootstrap as expected also from the
wrapper, so that #[cfg(bootstrap)] also works from proc macros.
I want to thank Urgau and jplatte for helping me find the cause of this. ❤️
|
|
This reverts commit 6499c5e7fc173a3f55b7a3bd1e6a50e9edef782d, reversing
changes made to 78450d2d602b06d9b94349aaf8cece1a4acaf3a8.
|
|
This reverts commit e7cc3bddbe0d0e374d05e7003e662bba1742dbae, reversing
changes made to 734368a200904ef9c21db86c595dc04263c87be0.
|
|
pipeline without causing rebuilds
Useful for -Ztreat-err-as-bug
|
|
This slightly improves compilation time by reducing linking time
(saving about 1/10 of the total compilation time after
changing rustbuild) and slightly reduces disk usage (from 16MB for
the rustc wrapper to 4MB).
|
|
Currently the verbosity settings are:
- 2: RUSTC-SHIM envvars get spammed on every invocation, O(30) lines
cargo is passed -v which outputs CLI invocations, O(5) lines
- 3: cargo is passed -vv which outputs build script output, O(0-10) lines
This commit changes it to:
- 1: cargo is passed -v, O(5) lines
- 2: cargo is passed -vv, O(10) lines
- 3: RUSTC-SHIM envvars get spammed, O(30) lines
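The new mapping can be written as two small functions. This is a sketch with hypothetical names, not the bootstrap source: verbosity 1 passes `-v` to cargo, 2 and above pass `-vv`, and only level 3 enables the shim's environment dump.

```rust
/// Flag passed to cargo for a given bootstrap verbosity level.
fn cargo_verbosity_flag(verbose: usize) -> Option<&'static str> {
    match verbose {
        0 => None,
        1 => Some("-v"),
        _ => Some("-vv"),
    }
}

/// Whether the rustc shim should print its `RUSTC-SHIM` environment dump.
fn shim_dumps_env(verbose: usize) -> bool {
    verbose >= 3
}
```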
|
|
Cargo ignores RUSTFLAGS when building proc macro crates. However,
sometimes rustc_macros needs to have conditional compilation when there
are breaking changes to the `libproc_macro` API (see for example
tell the difference between stage 0 and stage 1.
Another alternative is to unconditionally build rustc_macros with the
master libstd instead of the beta one (i.e. use `--sysroot
stage0-sysroot`), but that led to strange and maddening errors:
```
error[E0460]: found possibly newer version of crate `std` which `proc_macro2` depends on
  --> /home/joshua/.local/lib/cargo/registry/src/github.com-1ecc6299db9ec823/tracing-attributes-0.1.13/src/lib.rs:90:5
   |
90 | use proc_macro2::TokenStream;
   |     ^^^^^^^^^^^
   |
   = note: perhaps that crate needs to be recompiled?
   = note: the following crate versions were found:
           crate `std`: /home/joshua/rustc2/build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-b3602c301b71cc3d.rmeta
           crate `proc_macro2`: /home/joshua/rustc2/build/x86_64-unknown-linux-gnu/stage0-rustc/release/deps/libproc_macro2-a83c1f01610c129e.rlib
```
|
|
This bug was only visible on macOS. Also, print_step_rusage is a relatively new
internal feature that is not heavily used and has no tests. All of these
factors contributed to it going uncaught for so long. Thanks to Josh Triplett
for pointing it out!
|
|
Attempt to gather similar stats as rusage on Windows
A follow up to #82532. This is a bit hacked in because I think we need to discuss this before merging, but this is an attempt to gather similar metrics as `libc::rusage` on Windows.
Some comments on differences:
* Currently, we're passing `RUSAGE_CHILDREN` to `getrusage`, which collects statistics on all children that have been waited on and terminated. I believe this is currently just the invocation of the real `rustc` that the shim is wrapping. Does `rustc` itself spawn child processes? The Windows version gets the child process's handle when spawning it, and uses that to collect the statistics. For maxrss, `getrusage` will return "the resident set size of the largest child, not the maximum resident set size of the process tree"; the Windows version will only collect statistics on the wrapped `rustc` child process directly, even if some theoretical subprocess has a larger memory footprint.
* There might be subtle differences between `rusage`'s "resident set" and Windows' "working set". The "working set" and "resident set" should both be the number of pages that are in memory and which would not cause a page fault when accessed.
* I'm not yet sure how best to get the same information that `ru_minflt`, `ru_inblock`, `ru_oublock`, `ru_nivcsw` and `ru_nvcsw` provide.
r? `@pnkfelix`
|
unconditionally.
1. I added `--test` based on review feedback from simulacrum: I decided I would
rather include such extra context than get confused later on by its absence.
(However, I chose to encode it differently than how `[RUSTC-TIMING]` does... I
don't have much basis for doing so, other than that, to me, `--test` more directly
reflects where it came from.)
2. I also decided to include `[RUSTC-SHIM]` at the start of all of these lines
driven by the verbosity level, to make it clear where these lines of text
originate from. (Basically, I skimmed over the output and realized that a casual
observer might not be able to tell where this huge set of new lines was coming
from.)
|
This change is mainly motivated by an issue with the environment
printing I added in PR 82403: multiple rustc invocations progress
in parallel, and the environment output, spanning multiple lines,
gets interleaved in ways that make it difficult to extract the environment settings.
(This aforementioned difficulty is more of a hiccup than an outright
show-stopper, because the environment variables tend to be the same for all of
the rustc invocations, so it doesn't matter too much if one mixes up which lines
one is looking at. But still: Better to fix it.)
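One way to keep the dump readable is sketched below; the function names are hypothetical and this is not necessarily how the commit did it. Every variable gets its own `[RUSTC-SHIM]`-prefixed line, the whole block is built in one buffer, and it is written with a single `write_all` through a locked stderr handle.

```rust
use std::io::{self, Write};

/// Render each variable as its own `[RUSTC-SHIM]`-prefixed line, all in one buffer.
fn render_env<'a>(vars: impl IntoIterator<Item = (&'a str, &'a str)>) -> String {
    let mut out = String::new();
    for (key, value) in vars {
        out.push_str("[RUSTC-SHIM] ");
        out.push_str(key);
        out.push('=');
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Write the whole block through one locked handle.
fn emit_env(block: &str) -> io::Result<()> {
    let mut stderr = io::stderr().lock();
    stderr.write_all(block.as_bytes())?;
    stderr.flush()
}
```

Locking only serializes writers within one process. Separate rustc processes sharing a pipe are only guaranteed not to interleave for writes up to `PIPE_BUF` bytes, so the per-line prefix is what keeps interleaved lines attributable.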
|
Add `build.print_step_rusage` to config.toml
Adds `build.print_step_rusage` to config.toml, which is meant to be an easy way to let compiler developers get feedback on the terminal during bootstrap about resource usage during each step.
The output is piggy-backed on `[PRINT-STEP-TIMINGS]`, mostly because the functionality seemed to naturally fit there in the overall control-flow and output structure (even if very little is shared between the implementations themselves).
Some sample output (from my Linux box, where I believe the `max rss` output to be somewhat trustworthy...):
```
[...]
Compiling regex v1.4.3
[RUSTC-TIMING] tempfile test:false 0.323 user: 1.418662 sys: 0.81767 max rss (kb): 182084 page reclaims: 26615 page faults: 0 fs block inputs: 0 fs block outputs: 2160 voluntary ctxt switches: 798 involuntary ctxt switches: 131
Completed tempfile v3.1.0 in 0.3s
[RUSTC-TIMING] chalk_ir test:false 1.890 user: 1.893603 sys: 0.99663 max rss (kb): 239432 page reclaims: 32107 page faults: 0 fs block inputs: 0 fs block outputs: 25008 voluntary ctxt switches: 108 involuntary ctxt switches: 183
Completed chalk-ir v0.55.0 in 1.9s
Compiling rustc_data_structures v0.0.0 (/home/pnkfelix/Dev/Rust/rust.git/compiler/rustc_data_structures)
[RUSTC-TIMING] chrono test:false 1.244 user: 3.333198 sys: 0.134963 max rss (kb): 246612 page reclaims: 44857 page faults: 0 fs block inputs: 0 fs block outputs: 11704 voluntary ctxt switches: 1043 involuntary ctxt switches: 326
Completed chrono v0.4.15 in 1.3s
[RUSTC-TIMING] rustc_rayon test:false 1.332 user: 1.763912 sys: 0.75996 max rss (kb): 239076 page reclaims: 35285 page faults: 0 fs block inputs: 0 fs block outputs: 19576 voluntary ctxt switches: 359 involuntary ctxt switches: 168
Completed rustc-rayon v0.3.0 in 1.3s
Compiling matchers v0.0.1
[RUSTC-TIMING] matchers test:false 0.100 user: 0.94495 sys: 0.15119 max rss (kb): 140076 page reclaims: 8200 page faults: 0 fs block inputs: 0 fs block outputs: 392 voluntary ctxt switches: 43 involuntary ctxt switches: 12
Completed matchers v0.0.1 in 0.1s
[...]
```
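The start of each `[RUSTC-TIMING]` line above can be reproduced with a small formatter. This is an illustrative sketch with hypothetical names, covering only the crate name, `test:` flag, wall-clock seconds, and user/sys CPU time. It assumes the CPU times arrive as `timeval`-style (seconds, microseconds) pairs and zero-pads the microseconds to six digits so that, e.g., 81767 microseconds prints as `0.081767`.

```rust
use std::time::Duration;

/// Format a `timeval`-style (seconds, microseconds) pair as decimal seconds.
fn fmt_timeval(secs: i64, usecs: i64) -> String {
    format!("{}.{:06}", secs, usecs)
}

/// Build the leading part of a `[RUSTC-TIMING]` line.
fn timing_line(krate: &str, is_test: bool, wall: Duration, user: (i64, i64), sys: (i64, i64)) -> String {
    format!(
        "[RUSTC-TIMING] {} test:{} {:.3} user: {} sys: {}",
        krate,
        is_test,
        wall.as_secs_f64(),
        fmt_timeval(user.0, user.1),
        fmt_timeval(sys.0, sys.1),
    )
}
```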
|
|
On non-unix platforms, does not try to call `getrusage` (and does not attempt to
implement its own shim; that could be follow-on work, though it's probably best
to not invest too much effort there, versus using separate dedicated tooling).
On unix platforms, calls `libc::getrusage` and attempts to emit the subset of fields
that are supported on Linux and Mac OS X. Omits groups of related stats which
appear to be unsupported on the platform (due to them all remaining zero).
Adjusts output to compensate for Mac using bytes instead of kb (a well known
discrepancy on Mac OS X). However, so far I observe a lot of strange values
(orders of magnitude wrong) reported on Mac OS X in some cases, so I would not
trust this in that context currently.
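The two output adjustments described above, normalizing max RSS and omitting all-zero groups, can be sketched in isolation. The function names are hypothetical. The sketch relies on the documented fact that `ru_maxrss` is in kilobytes on Linux but in bytes on macOS, and treats a group whose counters are all zero as unsupported on the platform.

```rust
/// Normalize `ru_maxrss` to kilobytes: Linux reports KB, macOS reports bytes.
fn max_rss_kb(raw: i64, is_macos: bool) -> i64 {
    if is_macos { raw / 1024 } else { raw }
}

/// Format a group of related counters, or return `None` if every counter
/// is zero (taken as a sign that the platform does not populate them).
fn format_group(group: &[(&str, i64)]) -> Option<String> {
    if group.iter().all(|&(_, value)| value == 0) {
        return None;
    }
    let parts: Vec<String> = group
        .iter()
        .map(|(name, value)| format!("{}: {}", name, value))
        .collect();
    Some(parts.join(" "))
}
```

At a call site, the platform check could be `cfg!(target_os = "macos")`.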
|
|
Fix issue 38686.
(update: placated tidy.)
|
These are quite long, usually, and in most cases not interesting. On smaller
terminals they can take up more than a full page of output, hiding the error
diagnostics emitted.
|
justification
|
exec never returns; it replaces the current process, so anything after it is
unreachable. That's not how exec_cmd() is used in the surrounding code.
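The point about `exec` can be encoded in the function's type. Below is a minimal sketch (a hypothetical `exec_cmd`, not the bootstrap function of the same name) built on `std::os::unix::process::CommandExt::exec`, which only returns on failure and therefore yields an `io::Error` rather than an `io::Result`. The non-unix fallback, an assumption for illustration, runs the child and exits with its status.

```rust
use std::io;
use std::process::Command;

/// Replace the current process with `cmd`. Only returns if exec failed.
#[cfg(unix)]
fn exec_cmd(cmd: &mut Command) -> io::Error {
    use std::os::unix::process::CommandExt;
    cmd.exec()
}

/// No exec on other platforms: run the child, then exit with its status code.
#[cfg(not(unix))]
fn exec_cmd(cmd: &mut Command) -> io::Error {
    match cmd.status() {
        Ok(status) => std::process::exit(status.code().unwrap_or(1)),
        Err(err) => err,
    }
}
```

Callers then treat any return value as an error to report, never as a point where the child has finished.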
|
This commit moves the compiler-builtins-specific build logic from
`src/bootstrap/bin/rustc.rs` into the workspace `Cargo.toml`'s
`[profile]` configuration. Now that rust-lang/cargo#7253 is fixed we can
ensure that Cargo knows about debug assertions settings, and it can also
be configured to specifically disable debug assertions unconditionally
for compiler-builtins. This should improve rebuild logic when
debug-assertions settings change and also improve build-std integration
where Cargo externally now has an avenue to learn how to build
compiler-builtins as well.
|
Add an option to use LLD to link the compiler on Windows platforms
Based on https://github.com/rust-lang/rust/pull/68609.
Using LLD is a good way to improve compile times on Windows since `link.exe` is quite slow. The time for `x.py build --stage 1 src/libtest` goes from 0:12:00 to 0:08:29. Compile time for `rustc_driver` goes from 226.34s to 18.5s. `rustc_macros` goes from 28.69s to 7.7s. The size of `rustc_driver` is also reduced from 83.3 MB to 78.7 MB.
r? @Mark-Simulacrum
|
This logic is *super* old and can be tweaked and moved into `builder.rs`
|
|
Instead let's do this via `RUSTFLAGS` in `builder.rs`. Currently
requires a submodule update of `stdarch` to fix a problem with previous
compilers.
|