<feed xmlns='http://www.w3.org/2005/Atom'>
<title>rust/compiler/rustc_codegen_llvm/src, branch try</title>
<subtitle>https://github.com/rust-lang/rust
</subtitle>
<id>http://git.dreamy.place/mirrors/rust/atom?h=try</id>
<link rel='self' href='http://git.dreamy.place/mirrors/rust/atom?h=try'/>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/'/>
<updated>2025-07-21T16:54:24+00:00</updated>
<entry>
<title>Rollup merge of #142097 - ZuseZ4:offload-host1, r=oli-obk</title>
<updated>2025-07-21T16:54:24+00:00</updated>
<author>
<name>许杰友 Jieyou Xu (Joe)</name>
<email>39484203+jieyouxu@users.noreply.github.com</email>
</author>
<published>2025-07-21T16:54:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=5e3eb2512591df0cef52404f0ea4202f58935a54'/>
<id>urn:sha1:5e3eb2512591df0cef52404f0ea4202f58935a54</id>
<content type='text'>
gpu offload host code generation

r? ghost

This will generate most of the host-side code needed to use LLVM's offload feature.
The first PR will only handle automatic mem-transfers to and from the device.
So if a user calls a kernel, we will copy inputs back and forth, but we won't do the actual kernel launch.
Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what OpenMP offload generates in C++. `LIBOMPTARGET_INFO=-1 ./my_rust_binary` should print that a memcpy to the device, and later one back from it, is happening.

A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to override our default handling due to performance issues.

I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will obviously move this over to use proper macros, as I'm already doing for the autodiff work.
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.

Tracking:
- https://github.com/rust-lang/rust/issues/131513
</content>
</entry>
<entry>
<title>Rollup merge of #144116 - nikic:llvm-21-fixes, r=dianqk</title>
<updated>2025-07-20T06:56:08+00:00</updated>
<author>
<name>Matthias Krüger</name>
<email>476013+matthiaskrgr@users.noreply.github.com</email>
</author>
<published>2025-07-20T06:56:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=d24684ef4f78f25e559eec469a49834c0e3cccf5'/>
<id>urn:sha1:d24684ef4f78f25e559eec469a49834c0e3cccf5</id>
<content type='text'>
Fixes for LLVM 21

This fixes compatibility issues with LLVM 21 without performing the actual upgrade. Split out from https://github.com/rust-lang/rust/pull/143684.

This fixes three issues:
 * Updates the AMDGPU data layout for address space 8.
 * Makes emit-arity-indicator.rs a no_core test, so it doesn't fail on non-x86 hosts.
 * Explicitly sets the exception model for wasm, as this is no longer implied by `-wasm-enable-eh`.
</content>
</entry>
<entry>
<title>gpu host code generation</title>
<updated>2025-07-18T23:30:42+00:00</updated>
<author>
<name>Manuel Drehwald</name>
<email>git@manuel.drehwald.info</email>
</author>
<published>2025-07-02T23:36:30+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=4a1a5a42952d05533fd4309ad0f3fe290abbf57c'/>
<id>urn:sha1:4a1a5a42952d05533fd4309ad0f3fe290abbf57c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add various wrappers for gpu code generation</title>
<updated>2025-07-18T23:24:12+00:00</updated>
<author>
<name>Manuel Drehwald</name>
<email>git@manuel.drehwald.info</email>
</author>
<published>2025-07-02T23:35:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=5958ebe829429e3595e8211e6cb1b0328d515ab7'/>
<id>urn:sha1:5958ebe829429e3595e8211e6cb1b0328d515ab7</id>
<content type='text'>
</content>
</entry>
<entry>
<title>add -Zoffload=Enable flag behind -Zunstable-options, to enable gpu (host) code generation</title>
<updated>2025-07-18T23:24:00+00:00</updated>
<author>
<name>Manuel Drehwald</name>
<email>git@manuel.drehwald.info</email>
</author>
<published>2025-06-18T22:29:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=634016478ec95c6ff933d32789e663ace78e8f82'/>
<id>urn:sha1:634016478ec95c6ff933d32789e663ace78e8f82</id>
<content type='text'>
</content>
</entry>
<entry>
<title>make more builder functions generic</title>
<updated>2025-07-18T23:23:54+00:00</updated>
<author>
<name>Manuel Drehwald</name>
<email>git@manuel.drehwald.info</email>
</author>
<published>2025-06-18T22:25:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=42d6b0d8bcdc5a0dfd77fe2daac6f8a8f67ac6cd'/>
<id>urn:sha1:42d6b0d8bcdc5a0dfd77fe2daac6f8a8f67ac6cd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Pass wasm exception model to TargetOptions</title>
<updated>2025-07-18T07:35:50+00:00</updated>
<author>
<name>Nikita Popov</name>
<email>npopov@redhat.com</email>
</author>
<published>2025-07-11T08:11:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=12b19be741ea07934d7478bd8e450dca8f85afe5'/>
<id>urn:sha1:12b19be741ea07934d7478bd8e450dca8f85afe5</id>
<content type='text'>
This is no longer implied by -wasm-enable-eh.
</content>
</entry>
<entry>
<title>Update AMDGPU data layout</title>
<updated>2025-07-18T07:35:11+00:00</updated>
<author>
<name>Nikita Popov</name>
<email>npopov@redhat.com</email>
</author>
<published>2025-07-09T12:18:37+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=63e1074c97b60d248f86321f021871f93ba10c31'/>
<id>urn:sha1:63e1074c97b60d248f86321f021871f93ba10c31</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Rollup merge of #143293 - folkertdev:naked-function-kcfi, r=compiler-errors</title>
<updated>2025-07-18T02:27:51+00:00</updated>
<author>
<name>Matthias Krüger</name>
<email>476013+matthiaskrgr@users.noreply.github.com</email>
</author>
<published>2025-07-18T02:27:51+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=accf61dd42548bd5ec61d43f246b3eb499e980dd'/>
<id>urn:sha1:accf61dd42548bd5ec61d43f246b3eb499e980dd</id>
<content type='text'>
fix `-Zsanitizer=kcfi` on `#[naked]` functions

fixes https://github.com/rust-lang/rust/issues/143266

With `-Zsanitizer=kcfi`, indirect calls happen via a generated intermediate shim that forwards the call. The generated shim preserves the attributes of the original, including `#[unsafe(naked)]`. The shim is not a naked function, though, and violates the invariants of one (like having a body that consists of a single `naked_asm!` call).

My fix here is to match on the `InstanceKind`, and only use `codegen_naked_asm` when the instance is not a `ReifyShim`. That does raise the question of whether there are other `InstanceKind`s that could come up. As far as I can tell the answer is no: calling via `dyn` seems to work fine, and `#[track_caller]` is disallowed in combination with `#[naked]`.

r? codegen
@rustbot label +A-naked
cc @maurer @rcvalle
</content>
</entry>
<entry>
<title>Rollup merge of #143388 - bjorn3:lto_refactors, r=compiler-errors</title>
<updated>2025-07-17T01:58:28+00:00</updated>
<author>
<name>León Orell Valerian Liehr</name>
<email>me@fmease.dev</email>
</author>
<published>2025-07-17T01:58:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.dreamy.place/mirrors/rust/commit/?id=be5f8f299dce5c04e2a644546e780d8a07b0b14f'/>
<id>urn:sha1:be5f8f299dce5c04e2a644546e780d8a07b0b14f</id>
<content type='text'>
Various refactors to the LTO handling code

In particular, this reduces the sharing of code paths between fat and thin LTO and makes the fat LTO implementation more self-contained. This also moves some autodiff handling out of cg_ssa into cg_llvm, given that Enzyme only works with LLVM anyway and an implementation for another backend may do things entirely differently. This will also make it a bit easier to split LTO handling out of the coordinator thread main loop into a separate loop, which should reduce the complexity of the coordinator thread.
</content>
</entry>
</feed>
