| Age | Commit message (Collapse) | Author | Lines |
|
passed."
This reverts commit b12dcdef4fae5e3856e6911fd6cfbeedadcf3821.
|
|
Enable TrapUnreachable in LLVM.
This patch enables LLVM's TrapUnreachable flag, which tells it to translate `unreachable` instructions into hardware trap instructions, rather than allowing control flow to "fall through" into whatever code happens to follow it in memory.
This follows up on https://github.com/rust-lang/rust/issues/28728#issuecomment-332581533. For example, for @zackw's testcase [here](https://github.com/rust-lang/rust/issues/42009#issue-228745924), the output function contains a `ud2` instead of no code, so it won't "fall through" into whatever happens to be next in memory.
(I'm also working on the problem of LLVM optimizing away infinite loops, but the patch here is useful independently.)
I tested this patch on a few different codebases, and the code size increase ranged from 0.0% to 0.1%.
|
|
Update LLVM to fix miscompiles with -Copt-level=z on Windows
Fixes #45034
|
|
Disable LLVM assertions on Nightly, enable them in "alt" builds.
Per IRC discussion https://mozilla.logbot.info/rust-infra/20171106#c13812170-c13812204
Background: https://internals.rust-lang.org/t/disabling-llvm-assertions-in-nightly-builds/5388/14
|
|
Fixes #45034
|
|
|
|
The return value in these tests is just being used to generate extra
code so that it can be detected in the test script, which is just
counting lines in the assembly output.
|
|
With rustc now emitting "ud2" on unreachable code, a "return nothing"
sequence may take the same number of lines as an "unreachable" sequence
in assembly code. This test is testing that intrinsics::unreachable()
works by testing that it reduces the number of lines in the assembly
code. Fix it by adding a return value, which requires an extra
instruction in the reachable case, which provides the test what it's
looking for.
|
|
|
|
Shorten paths to auxiliary files created by tests
I'm hitting issues with long file paths to object files created by the test suite, similar to https://github.com/rust-lang/rust/issues/45103#issuecomment-335622075.
If we look at the object file path in https://github.com/rust-lang/rust/issues/45103 we can see that the patch contains of few components:
```
specialization-cross-crate-defaults.stage2-x86_64-pc-windows-gnu.run-pass.libaux\specialization_cross_crate_defaults.specialization_cross_crate_defaults0.rust-cgu.o
```
=>
1. specialization-cross-crate-defaults // test name, required
2. stage2 // stage disambiguator, required
3. x86_64-pc-windows-gnu // target disambiguator, required
4. run-pass // mode disambiguator, rarely required
5. libaux // suffix, can be shortened
6. specialization_cross_crate_defaults // required, there may be several libraries in the directory
7. specialization_cross_crate_defaults0 // codegen unit name, can be shortened?
8. rust-cgu // suffix, can be shortened?
9. o // object file extension
This patch addresses items `4`, `5` and `8`.
`libaux` is shortened to `aux`, `rust-cgu` is shortened to `rcgu`, mode disambiguator is omitted unless it's necessary (for pretty-printing and debuginfo tests, see https://github.com/rust-lang/rust/pull/24537/commits/38d26d811a44ba93637c84ce77a58af88c47f0ac)
I haven't touched names of codegen units though (`specialization_cross_crate_defaults0`).
Is it useful for them to have descriptive names including the crate name, as opposed to just `0` or `cgu0` or something?
|
|
Right now symbol exports, particularly in a cdylib, are handled by
assuming that `pub extern` combined with `#[no_mangle]` means "export
this". This isn't actually what we want for some symbols that the
standard library uses to implement itself, for example symbols related
to allocation. Additionally other special symbols like
`rust_eh_personallity` have no need to be exported from cdylib crate
types (only needed in dylib crate types).
This commit updates how rustc handles these special symbols by adding to
the hardcoded logic of symbols like `rust_eh_personallity` but also
adding a new attribute, `#[rustc_std_internal_symbol]`, which forces the
export level to be considered the same as all other Rust functions
instead of looking like a C function.
The eventual goal here is to prevent functions like `__rdl_alloc` from
showing up as part of a Rust cdylib as it's just an internal
implementation detail. This then further allows such symbols to get gc'd
by the linker when creating a cdylib.
|
|
|
|
|
|
save-analysis: support unions
r? @eddyb
|
|
session.
|
|
|
|
|
|
rustc: Handle #[inline(always)] at -O0
This commit updates the handling of `#[inline(always)]` functions at -O0 to
ensure that it's always inlined regardless of the number of codegen units used.
Closes #45201
|
|
|
|
This commit updates the handling of `#[inline(always)]` functions at -O0 to
ensure that it's always inlined regardless of the number of codegen units used.
Closes #45201
|
|
Fix data-layout field in x86_64-unknown-linux-gnu.json test file
The current data-layout causes the following error:
> rustc: /checkout/src/llvm/lib/CodeGen/MachineFunction.cpp:151: void llvm::MachineFunction::init(): Assertion `Target.isCompatibleDataLayout(getDataLayout()) && "Can't create a MachineFunction using a Module with a " "Target-incompatible DataLayout attached\n"' failed.
The new value was generated according to [this comment by @japaric](https://github.com/rust-lang/rust/issues/31367#issuecomment-213595571).
|
|
It seems like the file wasn't actually used, since there is a built-in target with the same name. See https://github.com/rust-lang/rust/pull/45108#issuecomment-335173165 for more details.
|
|
The value was generated according to [this comment by @japaric](https://github.com/rust-lang/rust/issues/31367#issuecomment-213595571).
|
|
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat `#[inline]` functions by translating them into all codegen units that
they're needed within, marking the linkage as `internal`. This commit changes
the behavior so that in debug mode (compiling at `-O0`) rustc will instead only
translate `#[inline]` functions into *one* codegen unit, forcing all other
codegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overal work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example `String` would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of `#[inline]` functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.
Collected [data] shows that this does indeed improve the situation from [before]
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collideA so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.
[data]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334880911
[before]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334583384
|
|
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
|
|
Fix native main() signature on 64bit
Hello,
in LLVM-IR produced by rustc on x86_64-linux-gnu, the native main() function had incorrect types for the function result and argc parameter: i64, while it should be i32 (really c_int). See also #20064, #29633.
So I've attempted a fix here. I tested it by checking the LLVM IR produced with --target x86_64-unknown-linux-gnu and i686-unknown-linux-gnu. Also I tried running the tests (`./x.py test`), however I'm getting two failures with and without the patch, which I'm guessing is unrelated.
|
|
|
|
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
|
|
This commit removes the `dep_graph` field from the `Session` type according to
issue #44390. Most of the fallout here was relatively straightforward and the
`prepare_session_directory` function was rejiggered a bit to reuse the results
in the later-called `load_dep_graph` function.
Closes #44390
|
|
CrateStore access in tcx.
|
|
Add `TargetOptions::min_global_align`, with s390x at 16-bit
The SystemZ `LALR` instruction provides PC-relative addressing for globals,
but only to *even* addresses, so other compilers make sure that such
globals are always 2-byte aligned. In Clang, this is modeled with
`TargetInfo::MinGlobalAlign`, and `TargetOptions::min_global_align` now
serves the same purpose for rustc.
In Clang, the only targets that set this are SystemZ, Lanai, and NVPTX, and
the latter two don't have targets in rust master.
Fixes #44411.
r? @eddyb
|
|
Fix sanitizer tests on buggy kernels
Travis recently pushed an update to the Linux environments, namely the kernels
that we're running on. This in turn caused some of the sanitizer tests we run to
fail. We also apparently weren't the first to hit these failures! Detailed in
google/sanitizers#837 these tests were failing due to a specific commit in the
kernel which has since been backed out, but for now work around the buggy kernel
that's deployed on Travis and eventually we should be able to remove these
flags.
|
|
This should now be entirely tracked through queries, so no need to have a
`DepGraph` in the `CStore` object any more!
|
|
|
|
Travis recently pushed an update to the Linux environments, namely the kernels
that we're running on. This in turn caused some of the sanitizer tests we run to
fail. We also apparently weren't the first to hit these failures! Detailed in
google/sanitizers#837 these tests were failing due to a specific commit in the
kernel which has since been backed out, but for now work around the buggy kernel
that's deployed on Travis and eventually we should be able to remove these
flags.
|
|
This commit adds logic to the compiler to attempt to handle super long linker
invocations by falling back to the `@`-file syntax if the invoked command is too
large. Each OS has a limit on how many arguments and how large the arguments can
be when spawning a new process, and linkers tend to be one of those programs
that can hit the limit!
The logic implemented here is to unconditionally attempt to spawn a linker and
then if it fails to spawn with an error from the OS that indicates the command
line is too big we attempt a fallback. The fallback is roughly the same for all
linkers where an argument pointing to a file, prepended with `@`, is passed.
This file then contains all the various arguments that we want to pass to the
linker.
Closes #41190
|
|
|
|
|
|
These fixes all have to do with the 64-bit PowerPC ELF ABI for big-endian
targets. The ELF v2 ABI for powerpc64le already worked well.
- Return after marking return aggregates indirect. Fixes #42757.
- Pass one-member float aggregates as direct argument values.
- Aggregate arguments less than 64-bit must be written in the least-
significant bits of the parameter space.
- Larger aggregates are instead padded at the tail.
(i.e. filling MSBs, padding the remaining LSBs.)
New tests were also added for the single-float aggregate, and a 3-byte
aggregate to check that it's filled into LSBs. Overall, at least these
formerly-failing tests now pass on powerpc64:
- run-make/extern-fn-struct-passing-abi
- run-make/extern-fn-with-packed-struct
- run-pass/extern-pass-TwoU16s.rs
- run-pass/extern-pass-TwoU8s.rs
- run-pass/struct-return.rs
|
|
|
|
|
|
Fixes #41701.
|
|
Use the `span` field in PathSegment and TyParam instead.
Fix #43796. Close #41478.
|
|
That makes ./x.py test --stage 1 work on x86_64-unknown-linux-gnu.
|
|
This is a follow-up to f189d7a6937 and 9d11b089ad1. While `-z ignore`
is what needs to be passed to the Solaris linker, because gcc is used as
the default linker, both that form and `-Wl,-z -Wl,ignore` (including
extra double quotes) need to be taken into account, which explains the
more complex regular expression.
|
|
This is a follow-up to ea23e50f, which fixed it for the build.
|
|
Run translation and LLVM in parallel when compiling with multiple CGUs
This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.
This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between either translation or running LLVM, which allows to reduce peak memory usage since not all LLVM module have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with `-Ccodegen-units=N`.
In order to help tuning and debugging the work scheduler, the PR adds the `-Ztrans-time-graph` flag which spits out html files that show how work packages where scheduled:

(red is translation, green is llvm)
One side effect here is that `-Ztime-passes` might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.
One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.
TODO:
- [x] Restore `-Z time-passes` output for LLVM.
- [x] Update documentation, esp. for work package scheduling.
- [x] Tune the scheduling algorithm.
cc @alexcrichton @rust-lang/compiler
|
|
available in memory.
|
|
|
|
The function should accept feature strings that old LLVM might not
support.
Simplify the code using the same approach used by
LLVMRustPrintTargetFeatures.
Dummify the function for non 4.0 LLVM and update the tests accordingly.
|