about summary refs log tree commit diff
path: root/src
diff options
context:
space:
mode:
authorStuart Cook <Zalathar@users.noreply.github.com>2025-08-07 20:49:36 +1000
committerGitHub <noreply@github.com>2025-08-07 20:49:36 +1000
commit1cd368a744e556d48e698670cc97928f5e320f70 (patch)
tree8319dc1f836c0e622ab176ef07dd30ce7ba42e1b /src
parentbcd50fd45fb92a6c96af7e91893488bd4499153c (diff)
parent35a485ddd86229101c4c17d9167f23cf75cae644 (diff)
downloadrust-1cd368a744e556d48e698670cc97928f5e320f70.tar.gz
rust-1cd368a744e556d48e698670cc97928f5e320f70.zip
Rollup merge of #138689 - jedbrown:jed/nvptx-target-feature, r=ZuseZ4
add nvptx_target_feature

Tracking issue: #141468 (nvptx), which is part of #44839 (catch-all arches)
The feature gate is `#![feature(nvptx_target_feature)]`

This exposes the target features `sm_20` through `sm_120a` [as defined](https://github.com/llvm/llvm-project/blob/llvmorg-20.1.1/llvm/lib/Target/NVPTX/NVPTX.td#L59-L85) by LLVM.

Cc: ``````@gonzalobg``````
``````@rustbot`````` label +O-NVPTX +A-target-feature
Diffstat (limited to 'src')
-rw-r--r--src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md40
1 files changed, 40 insertions, 0 deletions
diff --git a/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md b/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md
index 106ec562bfc..36598982481 100644
--- a/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md
+++ b/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md
@@ -10,6 +10,46 @@ platform.
 [@RDambrosio016](https://github.com/RDambrosio016)
 [@kjetilkjeka](https://github.com/kjetilkjeka)
 
+## Requirements
+
+This target is `no_std` and will typically be built with crate-type `cdylib` and `-C linker-flavor=llbc`, which generates PTX.
+The necessary components for this workflow are:
+
+- `rustup toolchain add nightly`
+- `rustup component add llvm-tools --toolchain nightly`
+- `rustup component add llvm-bitcode-linker --toolchain nightly`
+
+There are two options for using the core library:
+
+- `rustup component add rust-src --toolchain nightly` and build using `-Z build-std=core`.
+- `rustup target add nvptx64-nvidia-cuda --toolchain nightly`
+
+### Target and features
+
+It is generally necessary to specify the target, such as `-C target-cpu=sm_89`, because the default is very old. This implies two target features: `sm_89` and `ptx78` (and all preceding features within `sm_*` and `ptx*`). Rust will default to using the oldest PTX version that supports the target processor (see [this table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history)), which maximizes driver compatibility.
+One can use `-C target-feature=+ptx80` to choose a later PTX version without changing the target (the default in this case, `ptx78`, requires CUDA driver version 11.8, while `ptx80` would require driver version 12.0).
+Later PTX versions may allow more efficient code generation.
+
+Although Rust follows LLVM in representing `ptx*` and `sm_*` as target features, they should be thought of as having crate granularity, set via (either via `-Ctarget-cpu` and optionally `-Ctarget-feature`).
+While the compiler accepts `#[target_feature(enable = "ptx80", enable = "sm_89")]`, it is not supported, may not behave as intended, and may become erroneous in the future.
+
+## Building Rust kernels
+
+A `no_std` crate containing one or more functions with `extern "ptx-kernel"` can be compiled to PTX using a command like the following.
+
+```console
+$ RUSTFLAGS='-Ctarget-cpu=sm_89' cargo +nightly rustc --target=nvptx64-nvidia-cuda -Zbuild-std=core --crate-type=cdylib -- -Clinker-flavor=llbc -Zunstable-options
+```
+
+Intrinsics in `core::arch::nvptx` may use `#[cfg(target_feature = "...")]`, thus it's necessary to use `-Zbuild-std=core` with appropriate `RUSTFLAGS`. The following components are needed for this workflow:
+
+```console
+$ rustup component add rust-src --toolchain nightly
+$ rustup component add llvm-tools --toolchain nightly
+$ rustup component add llvm-bitcode-linker --toolchain nightly
+```
+
+
 <!-- FIXME: fill this out
 
 ## Requirements