diff options
| author | Stuart Cook <Zalathar@users.noreply.github.com> | 2025-08-07 20:49:36 +1000 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-08-07 20:49:36 +1000 |
| commit | 1cd368a744e556d48e698670cc97928f5e320f70 (patch) | |
| tree | 8319dc1f836c0e622ab176ef07dd30ce7ba42e1b /src | |
| parent | bcd50fd45fb92a6c96af7e91893488bd4499153c (diff) | |
| parent | 35a485ddd86229101c4c17d9167f23cf75cae644 (diff) | |
| download | rust-1cd368a744e556d48e698670cc97928f5e320f70.tar.gz rust-1cd368a744e556d48e698670cc97928f5e320f70.zip | |
Rollup merge of #138689 - jedbrown:jed/nvptx-target-feature, r=ZuseZ4
add nvptx_target_feature Tracking issue: #141468 (nvptx), which is part of #44839 (catch-all arches) The feature gate is `#![feature(nvptx_target_feature)]` This exposes the target features `sm_20` through `sm_120a` [as defined](https://github.com/llvm/llvm-project/blob/llvmorg-20.1.1/llvm/lib/Target/NVPTX/NVPTX.td#L59-L85) by LLVM. Cc: ``````@gonzalobg`````` ``````@rustbot`````` label +O-NVPTX +A-target-feature
Diffstat (limited to 'src')
| -rw-r--r-- | src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md b/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md index 106ec562bfc..36598982481 100644 --- a/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md +++ b/src/doc/rustc/src/platform-support/nvptx64-nvidia-cuda.md @@ -10,6 +10,46 @@ platform. [@RDambrosio016](https://github.com/RDambrosio016) [@kjetilkjeka](https://github.com/kjetilkjeka) +## Requirements + +This target is `no_std` and will typically be built with crate-type `cdylib` and `-C linker-flavor=llbc`, which generates PTX. +The necessary components for this workflow are: + +- `rustup toolchain add nightly` +- `rustup component add llvm-tools --toolchain nightly` +- `rustup component add llvm-bitcode-linker --toolchain nightly` + +There are two options for using the core library: + +- `rustup component add rust-src --toolchain nightly` and build using `-Z build-std=core`. +- `rustup target add nvptx64-nvidia-cuda --toolchain nightly` + +### Target and features + +It is generally necessary to specify the target, such as `-C target-cpu=sm_89`, because the default is very old. This implies two target features: `sm_89` and `ptx78` (and all preceding features within `sm_*` and `ptx*`). Rust will default to using the oldest PTX version that supports the target processor (see [this table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history)), which maximizes driver compatibility. +One can use `-C target-feature=+ptx80` to choose a later PTX version without changing the target (the default in this case, `ptx78`, requires CUDA driver version 11.8, while `ptx80` would require driver version 12.0). +Later PTX versions may allow more efficient code generation. + +Although Rust follows LLVM in representing `ptx*` and `sm_*` as target features, they should be thought of as having crate granularity, set via (either via `-Ctarget-cpu` and optionally `-Ctarget-feature`). +While the compiler accepts `#[target_feature(enable = "ptx80", enable = "sm_89")]`, it is not supported, may not behave as intended, and may become erroneous in the future. + +## Building Rust kernels + +A `no_std` crate containing one or more functions with `extern "ptx-kernel"` can be compiled to PTX using a command like the following. + +```console +$ RUSTFLAGS='-Ctarget-cpu=sm_89' cargo +nightly rustc --target=nvptx64-nvidia-cuda -Zbuild-std=core --crate-type=cdylib -- -Clinker-flavor=llbc -Zunstable-options +``` + +Intrinsics in `core::arch::nvptx` may use `#[cfg(target_feature = "...")]`, thus it's necessary to use `-Zbuild-std=core` with appropriate `RUSTFLAGS`. The following components are needed for this workflow: + +```console +$ rustup component add rust-src --toolchain nightly +$ rustup component add llvm-tools --toolchain nightly +$ rustup component add llvm-bitcode-linker --toolchain nightly +``` + + <!-- FIXME: fill this out ## Requirements |
