| Age | Commit message (Collapse) | Author | Lines |
|
gpu offload host code generation
r? ghost
This will generate most of the host side code to use llvm's offload feature.
The first PR will only handle automatic mem-transfers to and from the device.
So if a user calls a kernel, we will copy inputs back and forth, but we won't do the actual kernel launch.
Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what openmp offloa generates in C++. `LIBOMPTARGET_INFO=-1 ./my_rust_binary` should print that a memcpy to and later from the device is happening.
A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to overwrite our default handling due to performance issues.
I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will obviously move this over to use proper macros, like I'm already doing it for the autodiff work.
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.
Tracking:
- https://github.com/rust-lang/rust/issues/131513
|
|
Fixes for LLVM 21
This fixes compatibility issues with LLVM 21 without performing the actual upgrade. Split out from https://github.com/rust-lang/rust/pull/143684.
This fixes three issues:
* Updates the AMDGPU data layout for address space 8.
* Makes emit-arity-indicator.rs a no_core test, so it doesn't fail on non-x86 hosts.
* Explicitly sets the exception model for wasm, as this is no longer implied by `-wasm-enable-eh`.
|
|
|
|
|
|
code generation
|
|
|
|
This is no longer implied by -wasm-enable-eh.
|
|
|
|
fix `-Zsanitizer=kcfi` on `#[naked]` functions
fixes https://github.com/rust-lang/rust/issues/143266
With `-Zsanitizer=kcfi`, indirect calls happen via generated intermediate shim that forwards the call. The generated shim preserves the attributes of the original, including `#[unsafe(naked)]`. The shim is not a naked function though, and violates its invariants (like having a body that consists of a single `naked_asm!` call).
My fix here is to match on the `InstanceKind`, and only use `codegen_naked_asm` when the instance is not a `ReifyShim`. That does beg the question whether there are other `InstanceKind`s that could come up. As far as I can tell the answer is no: calling via `dyn` seems to work find, and `#[track_caller]` is disallowed in combination with `#[naked]`.
r? codegen
````@rustbot```` label +A-naked
cc ````@maurer```` ````@rcvalle````
|
|
Various refactors to the LTO handling code
In particular reducing the sharing of code paths between fat and thin-LTO and making the fat LTO implementation more self-contained. This also moves some autodiff handling out of cg_ssa into cg_llvm given that Enzyme only works with LLVM anyway and an implementation for another backend may do things entirely differently. This will also make it a bit easier to split LTO handling out of the coordinator thread main loop into a separate loop, which should reduce the complexity of the coordinator thread.
|
|
|
|
Make more of codegen_llvm safe
Best reviewed commit-by-commit.
|
|
|
|
|
|
|
|
|
|
|
|
infrastructure
Signed-off-by: Jonathan Brouwer <jonathantbrouwer@gmail.com>
|
|
fix: correct assertion to check for 'noinline' attribute presence before removal
|
|
Remove support for dynamic allocas
Followup to rust-lang/rust#141811
|
|
Make some "safe" llvm ops actually sound
Noticed while doing other refactorings
it may cause some extra unnecessary allocations, but the current use sites are rare ones anyway
|
|
fix: correct parameter names in LLVMRustBuildMinNum and LLVMRustBuildMaxNum FFI declarations
|
|
emit `.att_syntax` when global/naked asm use that option
fixes https://github.com/rust-lang/rust/issues/143542
LLVM would error when using `-Cllvm-args=-x86-asm-syntax=intel` in combination with global/naked assembly with `att_syntax`. It turns out that for LLVM you do in this case need to emit `.att_syntax`.
r? `@Amanieu`
|
|
|
|
|
|
|
|
|
|
FFI declarations
|
|
|
|
Rollup of 9 pull requests
Successful merges:
- rust-lang/rust#132469 (Do not suggest borrow that is already there in fully-qualified call)
- rust-lang/rust#143340 (awhile -> a while where appropriate)
- rust-lang/rust#143438 (Fix the link in `rustdoc.md`)
- rust-lang/rust#143539 (Regression tests for repr ICEs)
- rust-lang/rust#143566 (Fix `x86_64-unknown-netbsd` platform support page)
- rust-lang/rust#143572 (Remove unused allow attrs)
- rust-lang/rust#143583 (`loop_match`: fix 'no terminator on block')
- rust-lang/rust#143584 (make `Machine::load_mir` infallible)
- rust-lang/rust#143591 (Fix missing words in future tracking issue)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
Remove unused allow attrs
These `#[allow]`s seem to be unused (at least according to `x check`, didn't run `x test` locally). Let's clean them up! 🧹
|
|
Allow custom default address spaces and parse `p-` specifications in the datalayout string
Some targets, such as CHERI, use as default an address space different from the "normal" default address space `0` (in the case of CHERI, [200 is used](https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-877.pdf)). Currently, `rustc` does not allow to specify custom address spaces and does not take into consideration [`p-` specifications in the datalayout string](https://llvm.org/docs/LangRef.html#langref-datalayout).
This patch tries to mitigate these problems by allowing targets to define a custom default address space (while keeping the default value to address space `0`) and adding the code to parse the `p-` specifications in `rustc_abi`. The main changes are that `TargetDataLayout` now uses functions to refer to pointer-related informations, instead of having specific fields for the size and alignment of pointers in the default address space; furthermore, the two `pointer_size` and `pointer_align` fields in `TargetDataLayout` are replaced with an `FxHashMap` that holds info for all the possible address spaces, as parsed by the `p-` specifications.
The potential performance drawbacks of not having ad-hoc fields for the default address space will be tested in this PR's CI run.
r? workingjubilee
|
|
|
|
There is one comment at a call site and one comment in the function
definition that are mostly saying the same thing. Fold the call site
comment into the function definition comment to reduce duplication.
There are actually some inaccuracies in the comments but let's
deduplicate before we address the inaccuracies.
|
|
Removing this reference was forgotten in eb4725fc54056. Grepping for
no_landing_pads returns no hits after this.
|
|
default data address space
|
|
|
|
|
|
Make __rust_alloc_error_handler_should_panic a function
Fixes rust-lang/rust#143253
`__rust_alloc_error_handler_should_panic` is a static but was being exported as a function.
For most targets this doesn't matter, but Arm64EC Windows uses different decorations for exported variables vs functions, hence it fails to link when `-Z oom=abort` is enabled.
We've had issues in the past with statics like this (see rust-lang/rust#141061) but the tldr; is that Arm64EC needs symbols correctly exported as either a function or data, and data MUST and MUST ONLY be marked `dllimport` when the symbol is being imported from another binary, which is non-trivial to calculate for these compiler-generated statics.
So, instead, the easiest thing to do is to make `__rust_alloc_error_handler_should_panic` a function instead.
Since `__rust_alloc_error_handler_should_panic` isn't involved in any linking shenanigans, I've marked it as `AlwaysInline` with the hopes that the various backends will see that it is just returning a constant and perform the same optimizations as the previous implementation.
r? `@bjorn3`
|
|
|
|
|
|
|
|
|
|
|
|
Most uses of it either contain a fat or thin lto module. Only
WorkItem::LTO could contain both, but splitting that enum variant
doesn't complicate things much.
|
|
|
|
Disable f16 on Aarch64 without neon for llvm < 20.1.1
This check was added unconditionally in c51b229140 ("Disable f16 on Aarch64 without `neon`") and reverted in 4a8d35709e ("Revert "Disable `f16` on Aarch64 without `neon`"") since it did not fail in Rust's build. However, it is still possible to hit this crash if using LLVM 19 built with assertions, so disable the type conditionally based on version here.
Note that for these builds, a similar patch is needed in the build script for `compiler-builtins` since it does not yet use `cfg(target_has_reliable_f16)` (hopefully to be resolved in the near future).
Report: https://github.com/rust-lang/rust/pull/139276#issuecomment-3014781652
Original LLVM issue: https://github.com/llvm/llvm-project/issues/129394
|
|
give Pointer::into_parts a more scary name and offer a safer alternative
`into_parts` is a bit too innocent of a name for a somewhat subtle operation.
r? `@oli-obk`
|
|
This check was added unconditionally in c51b229140 ("Disable f16 on
Aarch64 without `neon`") and reverted in 4a8d35709e ("Revert "Disable
`f16` on Aarch64 without `neon`"") since it did not fail in Rust's
build. However, it is still possible to hit this crash if using LLVM 19
built with assertions, so disable the type conditionally based on
version here.
Note that for these builds, a similar patch is needed in the build
script for `compiler-builtins` since it does not yet use
`cfg(target_has_reliable_f16)` (hopefully to be resolved in the near
future).
Report: https://www.github.com/rust-lang/rust/pull/139276#issuecomment-3014781652
Original LLVM issue: https://www.github.com/llvm/llvm-project/issues/129394
|
|
Add SIMD funnel shift and round-to-even intrinsics
This PR adds 3 new SIMD intrinsics
- `simd_funnel_shl` - funnel shift left
- `simd_funnel_shr` - funnel shift right
- `simd_round_ties_even` (vector version of `round_ties_even_fN`)
TODO (future PR): implement `simd_fsh{l,r}` in miri, cg_gcc and cg_clif (it is surprisingly hard to implement without branches, the common tricks that rotate uses doesn't work because we have 2 elements now. e.g, the `-n&31` trick used by cg_gcc to implement rotate doesn't work with this because then `fshl(a, b, 0)` will be `a | b`)
[#t-compiler > More SIMD intrinsics](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/More.20SIMD.20intrinsics/with/522130286)
`@rustbot` label T-compiler T-libs A-intrinsics F-core_intrinsics
r? `@workingjubilee`
|