rust/compiler/rustc_codegen_llvm/src, branch try

https://github.com/rust-lang/rust
http://git.dreamy.place/mirrors/rust/atom?h=try
2025-07-21T16:54:24+00:00

Rollup merge of #142097 - ZuseZ4:offload-host1, r=oli-obk
2025-07-21T16:54:24+00:00
许杰友 Jieyou Xu (Joe) 39484203+jieyouxu@users.noreply.github.com
2025-07-21T16:54:24+00:00
urn:sha1:5e3eb2512591df0cef52404f0ea4202f58935a54

gpu offload host code generation

r? ghost

This will generate most of the host-side code needed to use LLVM's offload feature. The first PR will only handle automatic memory transfers to and from the device. So if a user calls a kernel, we will copy inputs back and forth, but we won't do the actual kernel launch. Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what OpenMP offload generates in C++. `LIBOMPTARGET_INFO=-1 ./my_rust_binary` should print that a memcpy to and later from the device is happening.

A follow-up PR will generate the actual device-side kernel, which will then do computations on the GPU. A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to override our default handling due to performance issues.

I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will move this over to proper macros, as I'm already doing for the autodiff work. This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.

Tracking:
- https://github.com/rust-lang/rust/issues/131513

Rollup merge of #144116 - nikic:llvm-21-fixes, r=dianqk
2025-07-20T06:56:08+00:00
Matthias Krüger 476013+matthiaskrgr@users.noreply.github.com
2025-07-20T06:56:08+00:00
urn:sha1:d24684ef4f78f25e559eec469a49834c0e3cccf5

Fixes for LLVM 21

This fixes compatibility issues with LLVM 21 without performing the actual upgrade. Split out from https://github.com/rust-lang/rust/pull/143684.
This fixes three issues:
* Updates the AMDGPU data layout for address space 8.
* Makes emit-arity-indicator.rs a no_core test, so it doesn't fail on non-x86 hosts.
* Explicitly sets the exception model for wasm, as this is no longer implied by `-wasm-enable-eh`.

gpu host code generation
2025-07-18T23:30:42+00:00
Manuel Drehwald git@manuel.drehwald.info
2025-07-02T23:36:30+00:00
urn:sha1:4a1a5a42952d05533fd4309ad0f3fe290abbf57c

add various wrappers for gpu code generation
2025-07-18T23:24:12+00:00
Manuel Drehwald git@manuel.drehwald.info
2025-07-02T23:35:57+00:00
urn:sha1:5958ebe829429e3595e8211e6cb1b0328d515ab7

add -Zoffload=Enable flag behind -Zunstable-options, to enable gpu (host) code generation
2025-07-18T23:24:00+00:00
Manuel Drehwald git@manuel.drehwald.info
2025-06-18T22:29:43+00:00
urn:sha1:634016478ec95c6ff933d32789e663ace78e8f82

make more builder functions generic
2025-07-18T23:23:54+00:00
Manuel Drehwald git@manuel.drehwald.info
2025-06-18T22:25:29+00:00
urn:sha1:42d6b0d8bcdc5a0dfd77fe2daac6f8a8f67ac6cd

Pass wasm exception model to TargetOptions
2025-07-18T07:35:50+00:00
Nikita Popov npopov@redhat.com
2025-07-11T08:11:03+00:00
urn:sha1:12b19be741ea07934d7478bd8e450dca8f85afe5

This is no longer implied by -wasm-enable-eh.

Update AMDGPU data layout
2025-07-18T07:35:11+00:00
Nikita Popov npopov@redhat.com
2025-07-09T12:18:37+00:00
urn:sha1:63e1074c97b60d248f86321f021871f93ba10c31

Rollup merge of #143293 - folkertdev:naked-function-kcfi, r=compiler-errors
2025-07-18T02:27:51+00:00
Matthias Krüger 476013+matthiaskrgr@users.noreply.github.com
2025-07-18T02:27:51+00:00
urn:sha1:accf61dd42548bd5ec61d43f246b3eb499e980dd

fix `-Zsanitizer=kcfi` on `#[naked]` functions

fixes https://github.com/rust-lang/rust/issues/143266

With `-Zsanitizer=kcfi`, indirect calls happen via generated intermediate shim that forwards the call. The generated shim preserves the attributes of the original, including `#[unsafe(naked)]`.
The shim is not a naked function though, and violates its invariants (like having a body that consists of a single `naked_asm!` call).

My fix here is to match on the `InstanceKind`, and only use `codegen_naked_asm` when the instance is not a `ReifyShim`. That does raise the question of whether there are other `InstanceKind`s that could come up. As far as I can tell the answer is no: calling via `dyn` seems to work fine, and `#[track_caller]` is disallowed in combination with `#[naked]`.

r? codegen
`@rustbot` label +A-naked

cc `@maurer` `@rcvalle`

Rollup merge of #143388 - bjorn3:lto_refactors, r=compiler-errors
2025-07-17T01:58:28+00:00
León Orell Valerian Liehr me@fmease.dev
2025-07-17T01:58:28+00:00
urn:sha1:be5f8f299dce5c04e2a644546e780d8a07b0b14f

Various refactors to the LTO handling code

In particular reducing the sharing of code paths between fat and thin-LTO and making the fat LTO implementation more self-contained. This also moves some autodiff handling out of cg_ssa into cg_llvm given that Enzyme only works with LLVM anyway and an implementation for another backend may do things entirely differently.

This will also make it a bit easier to split LTO handling out of the coordinator thread main loop into a separate loop, which should reduce the complexity of the coordinator thread.
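
Illustrative note on #143293 (the `-Zsanitizer=kcfi` fix for `#[naked]` functions): the dispatch it describes can be sketched roughly as below. This is a minimal, self-contained model and not the compiler's code. The `InstanceKind` enum here is a two-variant stand-in for the real rustc type, and `CodegenPath` and `select_codegen_path` are hypothetical names introduced only for this sketch.

```rust
/// Stand-in for rustc's instance kinds; only the two cases relevant here.
/// (Assumption: the real type has many more variants.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceKind {
    /// An ordinary function item.
    Item,
    /// The forwarding shim generated for indirect calls under kcfi.
    ReifyShim,
}

/// Hypothetical label for which code generation route is taken.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodegenPath {
    /// Emit the body as a single naked assembly block.
    NakedAsm,
    /// Emit a regular function body.
    RegularBody,
}

/// Pick the code generation route for an instance.
///
/// The shim inherits the naked attribute from the function it wraps, but its
/// body is an ordinary forwarding call, so it must never take the naked path.
pub fn select_codegen_path(kind: InstanceKind, has_naked_attr: bool) -> CodegenPath {
    match kind {
        InstanceKind::ReifyShim => CodegenPath::RegularBody,
        _ if has_naked_attr => CodegenPath::NakedAsm,
        _ => CodegenPath::RegularBody,
    }
}
```

In this model, `select_codegen_path(InstanceKind::ReifyShim, true)` takes the regular path even though the attribute is present, while a plain `Item` carrying the attribute still goes through the naked-assembly path.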