| Age | Commit message (Collapse) | Author | Lines |
|
Various refactors to the LTO handling code
In particular reducing the sharing of code paths between fat and thin-LTO and making the fat LTO implementation more self-contained. This also moves some autodiff handling out of cg_ssa into cg_llvm given that Enzyme only works with LLVM anyway and an implementation for another backend may do things entirely differently. This will also make it a bit easier to split LTO handling out of the coordinator thread main loop into a separate loop, which should reduce the complexity of the coordinator thread.
|
|
|
|
Rollup of 9 pull requests
Successful merges:
- rust-lang/rust#143403 (Port several trait/coherence-related attributes the new attribute system)
- rust-lang/rust#143633 (fix: correct assertion to check for 'noinline' attribute presence before removal)
- rust-lang/rust#143647 (Clarify and expand documentation for std::sys_common dependency structure)
- rust-lang/rust#143716 (compiler: doc/comment some codegen-for-functions interfaces)
- rust-lang/rust#143747 (Add target maintainer information for aarch64-unknown-linux-musl)
- rust-lang/rust#143759 (Fix typos in function names in the `target_feature` test)
- rust-lang/rust#143767 (Bump `src/tools/x` to Edition 2024 and some cleanups)
- rust-lang/rust#143769 (Remove support for SwitchInt edge effects in backward dataflow)
- rust-lang/rust#143770 (build-helper: clippy fixes)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
Rollup of 8 pull requests
Successful merges:
- rust-lang/rust#142391 (rust: library: Add `setsid` method to `CommandExt` trait)
- rust-lang/rust#143302 (`tests/ui`: A New Order [27/N])
- rust-lang/rust#143303 (`tests/ui`: A New Order [28/28] FINAL PART)
- rust-lang/rust#143568 (std: sys: net: uefi: tcp4: Add timeout support)
- rust-lang/rust#143611 (Mention more APIs in `ParseIntError` docs)
- rust-lang/rust#143661 (chore: Improve how the other suggestions message gets rendered)
- rust-lang/rust#143708 (fix: Include frontmatter in -Zunpretty output )
- rust-lang/rust#143718 (Make UB transmutes really UB in LLVM)
r? `@ghost`
`@rustbot` modify labels: rollup
try-job: i686-gnu-nopt-1
try-job: test-various
|
|
workingjubilee:document-some-codegen-backend-stuff, r=bjorn3,fee1-dead
compiler: doc/comment some codegen-for-functions interfaces
An out-of-date comment gets updated and some underdocumented functions get documented.
|
|
Partially documents the situation due to LLVM CFI.
|
|
So places that need `unreachable` but in the middle of a basic block can call that instead of figuring out the best way to do it.
|
|
|
|
|
|
|
|
|
|
Most uses of it either contain a fat or thin lto module. Only
WorkItem::LTO could contain both, but splitting that enum variant
doesn't complicate things much.
|
|
|
|
|
|
CodeGen: rework Aggregate implemention for rvalue_creates_operand cases
A non-trivial refactor pulled out from rust-lang/rust#138759
r? workingjubilee
The previous implementation I'd written here based on `index_by_increasing_offset` is complicated to follow and difficult to extend to non-structs.
This changes the implementation, without actually changing any codegen (thus no test changes either), to be more like the existing `extract_field` (<https://github.com/rust-lang/rust/blob/2b0274c71dba0e24370ebf65593da450e2e91868/compiler/rustc_codegen_ssa/src/mir/operand.rs#L345-L425>) in that it allows setting a particular field directly.
Notably I've found this one much easier to get right, in particular because having the `OperandRef<Result<V, Scalar>>` gives a really useful thing to include in ICE messages if something did happen to go wrong.
|
|
Another refactor pulled out from 138759
The previous implementation I'd written here based on `index_by_increasing_offset` is complicated to follow and difficult to extend to non-structs.
This changes the implementation, without actually changing any codegen (thus no test changes either), to be more like the existing `extract_field` (<https://github.com/rust-lang/rust/blob/2b0274c71dba0e24370ebf65593da450e2e91868/compiler/rustc_codegen_ssa/src/mir/operand.rs#L345-L425>) in that it allows setting a particular field directly.
Notably I've found this one much easier to get right, in particular because having the `OperandRef<Result<V, Scalar>>` gives a really useful thing to include in ICE messages if something did happen to go wrong.
|
|
r=workingjubilee,saethlin
Move metadata object generation for dylibs to the linker code
This deduplicates some code between codegen backends and may in the future allow adding extra metadata that is only known at link time.
Prerequisite of https://github.com/rust-lang/rust/issues/96708.
|
|
Simplify implementation of Rust intrinsics by using type parameters in the cache
The current implementation of intrinsics have a lot of duplication to handle different overloads of overloaded LLVM intrinsic. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. This has the benefit that `call_intrinsic` doesn't need to provide the full name, rather the type parameters (which is most of the time more available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function.
(originally was part of rust-lang/rust#140763, split off later)
`@rustbot` label A-codegen A-LLVM
r? codegen
|
|
|
|
|
|
This deduplicates some code between codegen backends and may in the
future allow adding extra metadata that is only known at link time.
|
|
And move passing it to the linker to the driver code.
|
|
It is only used within cg_llvm.
|
|
It is only used within cg_llvm.
|
|
atomic_load intrinsic: use const generic parameter for ordering
We have a gazillion intrinsics for the atomics because we encode the ordering into the intrinsic name rather than making it a parameter. This is particularly bad for those operations that take two orderings. Let's fix that!
This PR only converts `load`, to see if there's any feedback that would fundamentally change the strategy we pursue for the const generic intrinsics.
The first two commits are preparation and could be a separate PR if you prefer.
`@BoxyUwU` -- I hope this is a use of const generics that is unlikely to explode? All we need is a const generic of enum type. We could funnel it through an integer if we had to but an enum is obviously nicer...
`@bjorn3` it seems like the cranelift backend entirely ignores the ordering?
|
|
|
|
|
|
There is no safety contract and I don't think any of them can actually
cause UB in more ways than passing malicious source code to rustc can.
While LtoModuleCodegen::optimize says that the returned ModuleCodegen
points into the LTO module, the LTO module has already been dropped by
the time this function returns, so if the returned ModuleCodegen indeed
points into the LTO module, we would have seen crashes on every LTO
compilation, which we don't. As such the comment is outdated.
|
|
|
|
|
|
It is only relevant when using cg_ssa for driving compilation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Share the naked asm impl between cg_ssa and cg_clif
This was introduced in https://github.com/rust-lang/rust/pull/128004.
|
|
This was missed as part of [1].
[1]: https://github.com/rust-lang/rust/pull/140323
|
|
Support for `f16` and `f128` is varied across targets, backends, and
backend versions. Eventually we would like to reach a point where all
backends support these approximately equally, but until then we have to
work around some of these nuances of support being observable.
Introduce the `cfg_target_has_reliable_f16_f128` internal feature, which
provides the following new configuration gates:
* `cfg(target_has_reliable_f16)`
* `cfg(target_has_reliable_f16_math)`
* `cfg(target_has_reliable_f128)`
* `cfg(target_has_reliable_f128_math)`
`reliable_f16` and `reliable_f128` indicate that basic arithmetic for
the type works correctly. The `_math` versions indicate that anything
relying on `libm` works correctly, since sometimes this hits a separate
class of codegen bugs.
These options match configuration set by the build script at [1]. The
logic for LLVM support is duplicated as-is from the same script. There
are a few possible updates that will come as a follow up.
The config introduced here is not planned to ever become stable, it is
only intended to replace the build scripts for `std` tests and
`compiler-builtins` that don't have any way to configure based on the
codegen backend.
MCP: https://github.com/rust-lang/compiler-team/issues/866
Closes: https://github.com/rust-lang/compiler-team/issues/866
[1]: https://github.com/rust-lang/rust/blob/555e1d0386f024a8359645c3217f4b3eae9be042/library/std/build.rs#L84-L186
|
|
|
|
Lower BinOp::Cmp to llvm.{s,u}cmp.* intrinsics
Lowers `mir::BinOp::Cmp` (`three_way_compare` intrinsic) to the corresponding LLVM `llvm.{s,u}cmp.i8.*` intrinsics.
These are the intrinsics mentioned in https://github.com/rust-lang/rust/pull/118310, which are now available in LLVM 19.
I couldn't find any follow-up PRs/discussions about this, please let me know if I missed something.
r? `@scottmcm`
|
|
Speed up target feature computation
The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.
r? `@bjorn3`
|
|
Clean up various LLVM FFI things in codegen_llvm
cc ```@ZuseZ4``` I touched some autodiff parts
The major change of this PR is [bfd88ce](https://github.com/rust-lang/rust/pull/137549/commits/bfd88cead0dd79717f123ad7e9a26ecad88653cb) which makes `CodegenCx` generic just like `GenericBuilder`
The other commits mostly took advantage of the new feature of making extern functions safe, but also just used some wrappers that were already there and shrunk unsafe blocks.
best reviewed commit-by-commit
|
|
Lowers `mir::BinOp::Cmp` (`three_way_compare` intrinsic) to the corresponding
LLVM `llvm.{s,u}cmp.i8.*` intrinsics, added in LLVM 19.
|
|
Currently it is called twice, once with `allow_unstable` set to true and
once with it set to false. This results in some duplicated work. Most
notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for
every feature, and it's moderately slow. For very short running
compilations on platforms with many features (e.g. a `check` build of
hello-world on x86) this is a significant fraction of runtime.
This commit changes `target_features_cfg` so it is only called once, and
it now returns a pair of feature sets. This halves the number of
`LLVMRustHasFeature` calls.
|
|
This reverts commit a7a6c64a657f68113301c2ffe0745b49a16442d1, reversing
changes made to ebbe63891f1fae21734cb97f2f863b08b1d44bf8.
|
|
The embedded bitcode should always be prepared for LTO/ThinLTO
Fixes #115344. Fixes #117220.
There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.
When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.
This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.
r? nikic
|
|
|
|
|