| Age | Commit message (Collapse) | Author | Lines |
|
|
|
This commit removes a hack in our ThinLTO passes which removes available
externally functions manually. The [upstream bug][1] has long since been fixed,
so we should be able to rely on LLVM natively for this now!
[1]: https://bugs.llvm.org/show_bug.cgi?id=35736
|
|
This reverts commit 9df56ca0eea1a8f5af945df25ce23e276b1d48a7.
|
|
This reverts commit e045a6cd8c0235a26ef11e6cd9a13ebd817f1265.
|
|
Add the `amdgpu-kernel` ABI.
Technically, there are requirements imposed by the LLVM
`AMDGPUTargetMachine` on functions with this ABI (eg, the return type
must be void), but I'm unsure exactly where this should be enforced.
|
|
Technically, there are requirements imposed by the LLVM
`AMDGPUTargetMachine` on functions with this ABI (eg, the return type
must be void), but I'm unsure exactly where this should be enforced.
|
|
|
|
|
|
|
|
The LLVM PassManager has a PrepareForThinLTO flag, which is intended
when compilation occurs in conjunction with linking by ThinLTO. The
flag has two effects:
* The NameAnonGlobal pass is run after all other passes, which
ensures that all globals have a name.
* In optimized builds, a number of late passes (mainly related to
vectorization and unrolling) are disabled, on the rationale that
these a) will increase codesize of the intermediate artifacts
and b) will be run by ThinLTO again anyway.
This patch enables the use of PrepareForThinLTO if Thin or ThinLocal
linking is used.
The background for this change is the CI failure in #49479, which
we assume to be caused by the NameAnonGlobal pass not being run.
As this changes which passes LLVM runs, this might have performance
(or other) impact, so we want to land this separately.
|
|
implement minmax intrinsics
This adds the `simd_{fmin,fmax}` intrinsics, which do a vertical (lane-wise) `min`/`max` for floating point vectors that's equivalent to Rust's `min`/`max` for `f32`/`f64`.
It might make sense to make `{f32,f64}::{min,max}` use the `minnum` and `minmax` intrinsics as well.
---
~~HELP: I need some help with these. Either I should go to sleep or there must be something that I must be missing. AFAICT I am calling the `maxnum` builder correctly, yet rustc/LLVM seem to insert a call to `llvm.minnum` there instead...~~ EDIT: Rust's LLVM version is too old :/
|
|
Add basic PGO support.
This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), `-C pgo-use` and `-C pgo-gen`.
See each commit for details.
|
|
|
|
|
|
Signed-off-by: Emilio Cobos Álvarez <emilio@crisal.io>
|
|
Signed-off-by: Emilio Cobos Álvarez <emilio@crisal.io>
|
|
Adds intrinsics::exact_div to take advantage of the unsafe, which reduces the implementation from
```asm
sub rcx, rdx
mov rax, rcx
sar rax, 63
shr rax, 62
lea rax, [rax + rcx]
sar rax, 2
ret
```
down to
```asm
sub rcx, rdx
sar rcx, 2
mov rax, rcx
ret
```
(for `*const i32`)
|
|
|
|
|
|
|
|
This commit updates our Fat LTO logic to tweak our custom wrapper around LLVM's
"link modules" functionality. Previously whenever the
`LLVMRustLinkInExternalBitcode` function was called it would call LLVM's
`Linker::linkModules` wrapper. Internally this would crate an instance of a
`Linker` which internally creates an instance of an `IRMover`. Unfortunately for
us the creation of `IRMover` is somewhat O(n) with the input module. This means
that every time we linked a module it was O(n) with respect to the entire module
we had built up!
Now the modules we build up during LTO are quite large, so this quickly started
creating an O(n^2) problem for us! Discovered in #48025 it turns out this has
always been a problem and we just haven't noticed it. It became particularly
worse recently though due to most libraries having 16x more object files than
they previously did (1 -> 16).
This commit fixes this performance issue by preserving the `Linker` instance
across all links into the main LLVM module. This means we only create one
`IRMover` and allows LTO to progress much speedier.
From the `cargo-cache` project in #48025 a **full build** locally when from
5m15s to 2m24s. Looking at the timing logs each object file was linked in in
single-digit millisecond rather than hundreds, clearly being a nice improvement!
Closes #48025
|
|
First round of LLVM 6.0.0 compatibility
This includes a number of commits for the first round of upgrading to LLVM 6. There are still [lingering bugs](https://github.com/rust-lang/rust/issues/47683) but I believe all of this will nonetheless be necessary!
|
|
Teach rustc about DW_AT_noreturn and a few more DIFlags
We achieve two small things with this PR:
1. We provide definitions for a few additional llvm debuginfo flags
1. We _use_ one of these new flags, `FlagNoReturn`, and add it to debuginfo for functions with the never return type (`!`).
|
|
It looks like LLVM also removed it in llvm-mirror/llvm@f45adc29d in favor of the
name "GNU64". This was added in the thought that we'd need such a variant when
adding mips64 support but we ended up not needing it! For now let's just
removing the various support on the Rust side of things.
|
|
LLVM has since removed the `CodeModel::Default` enum value in favor of an
`Optional` implementationg throughout LLVM. Let's mirror the same change in Rust
and update the various bindings we call accordingly.
Removed in llvm-mirror/llvm@9aafb854c
|
|
|
|
LLVM <= 4.0 used a non-standard interpretation of `DW_OP_plus`. In the
DWARF standard, this adds two items on the expressions stack. LLVM's
behavior was more like DWARF's `DW_OP_plus_uconst` -- adding a constant
that follows the op. The patch series starting with [D33892] switched
to the standard DWARF interpretation, so we need to follow.
[D33892]: https://reviews.llvm.org/D33892
|
|
|
|
Refs #46437 as it also removes LLVMRustWriteDebugLocToString()
|
|
Refs #46437
|
|
The same effect can be achieved using -Cllvm-args=-debug
Refs #46437 as it removes LLVMRustSetDebug()
|
|
Use name-discarding LLVM context
This is only applicable when neither of --emit=llvm-ir or --emit=llvm-bc are not
requested.
In case either of these outputs are wanted, but the benefits of such context are
desired as well, -Zfewer_names option provides the same functionality regardless
of the outputs requested.
Should be a viable fix for https://github.com/rust-lang/rust/issues/46449
|
|
This is only applicable when neither of --emit=llvm-ir or --emit=llvm-bc are not
requested.
In case either of these outputs are wanted, but the benefits of such context are
desired as well, -Zfewer_names option provides the same functionality regardless
of the outputs requested.
|
|
Refs #46437
|
|
The function was added as a wrapper to handle compatibility with older
LLVM versions that we no longer support, so it can be removed.
Refs #46437
|
|
This commit is the next attempt to enable multiple codegen units by default in
release mode, getting some of those sweet, sweet parallelism wins by running
codegen in parallel. Performance should not be lost due to ThinLTO being on by
default as well.
Closes #45320
|
|
This commit implements a workaround for #46346 which basically just
avoids triggering the situation that LLVM's bug
https://bugs.llvm.org/show_bug.cgi?id=35562 arises. More details can be
found in the code itself but this commit is also intended to ...
Closes #46346
|
|
...that were removed in 77c3bfa7429abf87b76ba84108df018d9e9d90e2.
|
|
This commit adds compiler support for two basic operations needed for binding
SIMD on x86 platforms:
* First, a `nontemporal_store` intrinsic was added for the `_mm_stream_ps`, seen
in rust-lang-nursery/stdsimd#114. This was relatively straightforward and is
quite similar to the volatile store intrinsic.
* Next, and much more intrusively, a new type to the backend was added. The
`x86_mmx` type is used in LLVM for a 64-bit vector register and is used in
various intrinsics like `_mm_abs_pi8` as seen in rust-lang-nursery/stdsimd#74.
This new type was added as a new layout option as well as having support added
to the trans backend. The type is enabled with the `#[repr(x86_mmx)]`
attribute which is intended to just be an implementation detail of SIMD in
Rust.
I'm not 100% certain about how the `x86_mmx` type was added, so any extra eyes
or thoughts on that would be greatly appreciated!
|
|
std: Add a new wasm32-unknown-unknown target
This commit adds a new target to the compiler: wasm32-unknown-unknown. This target is a reimagining of what it looks like to generate WebAssembly code from Rust. Instead of using Emscripten which can bring with it a weighty runtime this instead is a target which uses only the LLVM backend for WebAssembly and a "custom linker" for now which will hopefully one day be direct calls to lld.
Notable features of this target include:
* There is zero runtime footprint. The target assumes nothing exists other than the wasm32 instruction set.
* There is zero toolchain footprint beyond adding the target. No custom linker is needed, rustc contains everything.
* Very small wasm modules can be generated directly from Rust code using this target.
* Most of the standard library is stubbed out to return an error, but anything related to allocation works (aka `HashMap`, `Vec`, etc).
* Naturally, any `#[no_std]` crate should be 100% compatible with this new target.
This target is currently somewhat janky due to how linking works. The "linking" is currently unconditional whole program LTO (aka LLVM is being used as a linker). Naturally that means compiling programs is pretty slow! Eventually though this target should have a linker.
This target is also intended to be quite experimental. I'm hoping that this can act as a catalyst for further experimentation in Rust with WebAssembly. Breaking changes are very likely to land to this target, so it's not recommended to rely on it in any critical capacity yet. We'll let you know when it's "production ready".
### Building yourself
First you'll need to configure the build of LLVM and enable this target
```
$ ./configure --target=wasm32-unknown-unknown --set llvm.experimental-targets=WebAssembly
```
Next you'll want to remove any previously compiled LLVM as it needs to be rebuilt with WebAssembly support. You can do that with:
```
$ rm -rf build
```
And then you're good to go! A `./x.py build` should give you a rustc with the appropriate libstd target.
### Test support
Currently testing-wise this target is looking pretty good but isn't complete. I've got almost the entire `run-pass` test suite working with this target (lots of tests ignored, but many passing as well). The `core` test suite is [still getting LLVM bugs fixed](https://reviews.llvm.org/D39866) to get that working and will take some time. Relatively simple programs all seem to work though!
In general I've only tested this with a local fork that makes use of LLVM 5 rather than our current LLVM 4 on master. The LLVM 4 WebAssembly backend AFAIK isn't broken per se but is likely missing bug fixes available on LLVM 5. I'm hoping though that we can decouple the LLVM 5 upgrade and adding this wasm target!
### But the modules generated are huge!
It's worth nothing that you may not immediately see the "smallest possible wasm module" for the input you feed to rustc. For various reasons it's very difficult to get rid of the final "bloat" in vanilla rustc (again, a real linker should fix all this). For now what you'll have to do is:
cargo install --git https://github.com/alexcrichton/wasm-gc
wasm-gc foo.wasm bar.wasm
And then `bar.wasm` should be the smallest we can get it!
---
In any case for now I'd love feedback on this, particularly on the various integration points if you've got better ideas of how to approach them!
|
|
This commit adds a new target to the compiler: wasm32-unknown-unknown. This
target is a reimagining of what it looks like to generate WebAssembly code from
Rust. Instead of using Emscripten which can bring with it a weighty runtime this
instead is a target which uses only the LLVM backend for WebAssembly and a
"custom linker" for now which will hopefully one day be direct calls to lld.
Notable features of this target include:
* There is zero runtime footprint. The target assumes nothing exists other than
the wasm32 instruction set.
* There is zero toolchain footprint beyond adding the target. No custom linker
is needed, rustc contains everything.
* Very small wasm modules can be generated directly from Rust code using this
target.
* Most of the standard library is stubbed out to return an error, but anything
related to allocation works (aka `HashMap`, `Vec`, etc).
* Naturally, any `#[no_std]` crate should be 100% compatible with this new
target.
This target is currently somewhat janky due to how linking works. The "linking"
is currently unconditional whole program LTO (aka LLVM is being used as a
linker). Naturally that means compiling programs is pretty slow! Eventually
though this target should have a linker.
This target is also intended to be quite experimental. I'm hoping that this can
act as a catalyst for further experimentation in Rust with WebAssembly. Breaking
changes are very likely to land to this target, so it's not recommended to rely
on it in any critical capacity yet. We'll let you know when it's "production
ready".
---
Currently testing-wise this target is looking pretty good but isn't complete.
I've got almost the entire `run-pass` test suite working with this target (lots
of tests ignored, but many passing as well). The `core` test suite is still
getting LLVM bugs fixed to get that working and will take some time. Relatively
simple programs all seem to work though!
---
It's worth nothing that you may not immediately see the "smallest possible wasm
module" for the input you feed to rustc. For various reasons it's very difficult
to get rid of the final "bloat" in vanilla rustc (again, a real linker should
fix all this). For now what you'll have to do is:
cargo install --git https://github.com/alexcrichton/wasm-gc
wasm-gc foo.wasm bar.wasm
And then `bar.wasm` should be the smallest we can get it!
---
In any case for now I'd love feedback on this, particularly on the various
integration points if you've got better ideas of how to approach them!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
|