about summary refs log tree commit diff
path: root/compiler/rustc_codegen_gcc/src/intrinsic
AgeCommit message (Collapse)AuthorLines
2025-09-28Rollup merge of #144197 - KMJ-007:type-tree, r=ZuseZ4Matthias Krüger-0/+1
TypeTree support in autodiff # TypeTrees for Autodiff ## What are TypeTrees? Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently. ## Structure ```rust TypeTree(Vec<Type>) Type { offset: isize, // byte offset (-1 = everywhere) size: usize, // size in bytes kind: Kind, // Float, Integer, Pointer, etc. child: TypeTree // nested structure } ``` ## Example: `fn compute(x: &f32, data: &[f32]) -> f32` **Input 0: `x: &f32`** ```rust TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float, child: TypeTree::new() }]) }]) ``` **Input 1: `data: &[f32]`** ```rust TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float, // -1 = all elements child: TypeTree::new() }]) }]) ``` **Output: `f32`** ```rust TypeTree(vec![Type { offset: -1, size: 4, kind: Float, child: TypeTree::new() }]) ``` ## Why Needed? - Enzyme can't deduce complex type layouts from LLVM IR - Prevents slow memory pattern analysis - Enables correct derivative computation for nested structures - Tells Enzyme which bytes are differentiable vs metadata ## What Enzyme Does With This Information: Without TypeTrees (current state): ```llvm ; Enzyme sees generic LLVM IR: define float ``@distance(ptr*`` %p1, ptr* %p2) { ; Has to guess what these pointers point to ; Slow analysis of all memory operations ; May miss optimization opportunities } ``` With TypeTrees (our implementation): ```llvm define "enzyme_type"="{[]:Float@float}" float ``@distance(`` ptr "enzyme_type"="{[]:Pointer}" %p1, ptr "enzyme_type"="{[]:Pointer}" %p2 ) { ; Enzyme knows exact type layout ; Can generate efficient derivative code directly } ``` # TypeTrees - Offset and -1 Explained ## Type Structure ```rust Type { offset: isize, // WHERE this type starts size: usize, // HOW BIG this type is kind: Kind, // WHAT KIND of data (Float, Int, Pointer) child: TypeTree // WHAT'S INSIDE (for pointers/containers) } ``` ## Offset Values ### Regular Offset (0, 4, 8, etc.) **Specific byte position within a structure** ```rust struct Point { x: f32, // offset 0, size 4 y: f32, // offset 4, size 4 id: i32, // offset 8, size 4 } ``` TypeTree for `&Point` (internal representation): ```rust TypeTree(vec![ Type { offset: 0, size: 4, kind: Float }, // x at byte 0 Type { offset: 4, size: 4, kind: Float }, // y at byte 4 Type { offset: 8, size: 4, kind: Integer } // id at byte 8 ]) ``` Generates LLVM: ```llvm "enzyme_type"="{[]:Float@float}" ``` ### Offset -1 (Special: "Everywhere") **Means "this pattern repeats for ALL elements"** #### Example 1: Array `[f32; 100]` ```rust TypeTree(vec![Type { offset: -1, // ALL positions size: 4, // each f32 is 4 bytes kind: Float, // every element is float }]) ``` Instead of listing 100 separate Types with offsets `0,4,8,12...396` #### Example 2: Slice `&[i32]` ```rust // Pointer to slice data TypeTree(vec![Type { offset: -1, size: 8, kind: Pointer, child: TypeTree(vec![Type { offset: -1, // ALL slice elements size: 4, // each i32 is 4 bytes kind: Integer }]) }]) ``` #### Example 3: Mixed Structure ```rust struct Container { header: i64, // offset 0 data: [f32; 1000], // offset 8, but elements use -1 } ``` ```rust TypeTree(vec![ Type { offset: 0, size: 8, kind: Integer }, // header Type { offset: 8, size: 4000, kind: Pointer, child: TypeTree(vec![Type { offset: -1, size: 4, kind: Float // ALL array elements }]) } ]) ```
2025-09-21Add panic=immediate-abortBen Kimock-2/+1
2025-09-19added typetree support for memcpyKaran Janthe-0/+1
2025-09-12Remove unreachable unsized arg handling in `store_fn_arg/store_arg` in codegenZachary S-7/+2
2025-08-26Merge commit 'feb42827f11a7ae241ceecc81e9ae556fb6ba214' into ↵Guillaume Gomez-1/+0
subtree-update_cg_gcc_2025-08-26
2025-08-04Merge commit '482e8540a1b757ed7bccc2041c5400f051fdb01e' into ↵Guillaume Gomez-1/+149
subtree-update_cg_gcc_2025-08-04
2025-07-07compiler: Parse `p-` specs in datalayout string, allow definition of custom ↵Edoardo Marangoni-2/+2
default data address space
2025-06-30Merge commit '4b5c44b14166083eef8d71f15f5ea1f53fc976a0' into ↵Guillaume Gomez-59/+106
subtree-update_cg_gcc_2025-06-30
2025-06-29Rollup merge of #142078 - sayantn:more-intrinsics, r=workingjubileeGuillaume Gomez-0/+2
Add SIMD funnel shift and round-to-even intrinsics This PR adds 3 new SIMD intrinsics - `simd_funnel_shl` - funnel shift left - `simd_funnel_shr` - funnel shift right - `simd_round_ties_even` (vector version of `round_ties_even_fN`) TODO (future PR): implement `simd_fsh{l,r}` in miri, cg_gcc and cg_clif (it is surprisingly hard to implement without branches, the common tricks that rotate uses doesn't work because we have 2 elements now. e.g, the `-n&31` trick used by cg_gcc to implement rotate doesn't work with this because then `fshl(a, b, 0)` will be `a | b`) [#t-compiler > More SIMD intrinsics](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/More.20SIMD.20intrinsics/with/522130286) `@rustbot` label T-compiler T-libs A-intrinsics F-core_intrinsics r? `@workingjubilee`
2025-06-28Merge commit 'b7091eca6d8eb0fe88b58cc9a7aec405d8de5b85' into ↵Guillaume Gomez-9/+12
subtree-update_cg_gcc_2025-06-28
2025-06-27rustc_codegen_gcc: Fix clippy::manual_is_multiple_ofPhilipp Krones-2/+3
2025-06-18Merge commit 'fda0bb9588912a3e0606e880ca9f6e913cf8a5a4' into ↵Guillaume Gomez-9996/+10330
subtree-update_cg_gcc_2025-06-18
2025-06-16Fix RISC-V C function ABI when passing/returning structs containing floatsbeetrees-1/+1
2025-06-15Implement `simd_round_ties_even` for miri, cg_clif and cg_gccsayantn-0/+2
2025-06-03Remove type_test from IntrinsicCallBuilderMethodsbjorn3-5/+0
It is only used within cg_llvm.
2025-05-30Directly use from_immediate for handling boolbjorn3-6/+3
2025-05-30Avoid computing function type for intrinsic instancesbjorn3-8/+3
2025-05-30Use layout field of OperandRef in generic_simd_intrinsicbjorn3-47/+41
2025-05-30Use layout field of OperandRef and PlaceRef in codegen_intrinsic_callbjorn3-10/+11
This avoids having to get the function signature.
2025-05-28Remove unused arg_memory_ty methodbjorn3-11/+0
2025-05-26Remove usage of FnAbi in codegen_intrinsic_callbjorn3-21/+11
2025-05-26Pass PlaceRef rather than Bx::Value to codegen_intrinsic_callbjorn3-12/+9
2025-05-14Merge commit '6ba33f5e1189a5ae58fb96ce3546e76b13d090f5' into ↵Guillaume Gomez-140/+495
subtree-update_cg_gcc_2025-05-14
2025-05-09Use intrinsics for `{f16,f32,f64,f128}::{minimum,maximum}` operationsUrgau-0/+36
2025-05-05Rename Instance::new to Instance::new_raw and add a note that it is rawMichael Goulet-1/+1
2025-04-20Rollup merge of #137953 - RalfJung:simd-intrinsic-masks, r=WaffleLapkinChris Denton-14/+11
simd intrinsics with mask: accept unsigned integer masks, and fix some of the errors It's not clear at all why the mask would have to be signed, it is anyway interpreted bitwise. The backend should just make sure that works no matter the surface-level type; our LLVM backend already does this correctly. The note of "the mask may be widened, which only has the correct behavior for signed integers" explains... nothing? Why can't the code do the widening correctly? If necessary, just cast to the signed type first... Also while we are at it, fix the errors. For simd_masked_load/store, the errors talked about the "third argument" but they meant the first argument (the mask is the first argument there). They also used the wrong type for `expected_element`. I have extremely low confidence in the GCC part of this PR. See [discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/257879-project-portable-simd/topic/On.20the.20sign.20of.20masks)
2025-04-20simd intrinsics with mask: accept unsigned integer masksRalf Jung-14/+11
2025-04-18Merge commit 'db1a31c243a649e1fe20f5466ba181da5be35c14' into ↵Guillaume Gomez-44/+254
subtree-update_cg_gcc_2025-04-18
2025-02-28rename BackendRepr::Vector → SimdVectorRalf Jung-1/+1
2025-02-25Rollup merge of #137595 - folkertdev:remove-simd-pow-powi, r=RalfJungLeón Orell Valerian Liehr-20/+8
remove `simd_fpow` and `simd_fpowi` Discussed in https://github.com/rust-lang/rust/issues/137555 These functions are not exposed from `std::intrinsics::simd`, and not used anywhere outside of the compiler. They also don't lower to particularly good code at least on the major ISAs (I checked x86_64, aarch64, s390x, powerpc), where the vector is just spilled to the stack and scalar functions are used for the actual logic. r? `@RalfJung`
2025-02-25remove `simd_fpow` and `simd_fpowi`Folkert de Vries-20/+8
2025-02-23Rollup merge of #136543 - RalfJung:round-ties-even, r=tgross35Trevor Gross-6/+3
intrinsics: unify rint, roundeven, nearbyint in a single round_ties_even intrinsic LLVM has three intrinsics here that all do the same thing (when used in the default FP environment). There's no reason Rust needs to copy that historically-grown mess -- let's just have one intrinsic and leave it up to the LLVM backend to decide how to lower that. Suggested by `@hanna-kruppe` in https://github.com/rust-lang/rust/issues/136459; Cc `@tgross35` try-job: test-various
2025-02-20Remove `BackendRepr::Uninhabited`, replaced with an `uninhabited: bool` ↵Zachary S-1/+1
field in `LayoutData`. Also update comments that refered to BackendRepr::Uninhabited.
2025-02-19Rework `OperandRef::extract_field` to stop calling `to_immediate_scalar` on ↵Scott McMurray-3/+8
things which are already immediates That means it stops trying to truncate things that are already `i1`s.
2025-02-08Rustfmtbjorn3-239/+249
2025-02-04cg_gcc: Directly use rustc_abi instead of reexportsJubilee Young-8/+8
2025-02-04intrinsics: unify rint, roundeven, nearbyint in a single round_ties_even ↵Ralf Jung-6/+3
intrinsic
2025-01-15Use a C-safe return type for `__rust_[ui]128_*` overflowing intrinsicsTrevor Gross-2/+4
Combined with [1], this will change the overflowing multiplication operations to return an `extern "C"`-safe type. Link: https://github.com/rust-lang/compiler-builtins/pull/735 [1]
2025-01-13Fix formattingAntoni Boucher-2/+2
2025-01-13Merge commit '59a81c2ca1edc88ad3ac4b27a8e03977ffb8e73a' into ↵Antoni Boucher-6/+20
subtree-update_cg_gcc_2025_01_12
2024-12-18chore: fix some typosacceptacross-1/+1
Signed-off-by: acceptacross <csqcqs@gmail.com>
2024-11-23Add simd_relaxed_fma intrinsicCaleb Zulawski-0/+1
2024-11-18use `TypingEnv` when no `infcx` is availablelcnr-7/+9
the behavior of the type system not only depends on the current assumptions, but also the currentnphase of the compiler. This is mostly necessary as we need to decide whether and how to reveal opaque types. We track this via the `TypingMode`.
2024-11-17Likely unlikely fixJiri Bobek-2/+0
2024-10-29cg_gcc: `rustc_abi::Abi` => `BackendRepr`Jubilee Young-3/+3
2024-10-19Fix testsMichael Goulet-4/+8
2024-10-11intrinsics.fmuladdf{16,32,64,128}: expose llvm.fmuladd.* semanticsJed Brown-0/+3
Add intrinsics `fmuladd{f16,f32,f64,f128}`. This computes `(a * b) + c`, to be fused if the code generator determines that (i) the target instruction set has support for a fused operation, and (ii) that the fused operation is more efficient than the equivalent, separate pair of `mul` and `add` instructions. https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic MIRI support is included for f32 and f64. The codegen_cranelift uses the `fma` function from libc, which is a correct implementation, but without the desired performance semantic. I think this requires an update to cranelift to expose a suitable instruction in its IR. I have not tested with codegen_gcc, but it should behave the same way (using `fma` from libc).
2024-10-04Use wide pointers consistenly across the compilerUrgau-2/+2
2024-09-27FmtGuillaume Gomez-2/+1
2024-09-27Merge commit '3187d32079b817522cc17413ec9185b130daf693' into subtree-updateGuillaume Gomez-172/+651