| Age | Commit message (Collapse) | Author | Lines |
|
|
|
Add intrinsics `fmuladd{f16,f32,f64,f128}`. This computes `(a * b) +
c`, to be fused if the code generator determines that (i) the target
instruction set has support for a fused operation, and (ii) that the
fused operation is more efficient than the equivalent, separate pair
of `mul` and `add` instructions.
https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
MIRI support is included for f32 and f64.
The codegen_cranelift uses the `fma` function from libc, which is a
correct implementation, but without the desired performance semantic. I
think this requires an update to cranelift to expose a suitable
instruction in its IR.
I have not tested with codegen_gcc, but it should behave the same
way (using `fma` from libc).
|
|
codegen_ssa: consolidate tied target checks
Fixes #105110.
Fixes #105111.
`rustc_codegen_llvm` and `rustc_codegen_gcc` duplicated logic for checking if tied target features were partially enabled. This PR consolidates these checks into `rustc_codegen_ssa` in the `codegen_fn_attrs` query, which also is run pre-monomorphisation for each function, which ensures that this check is run for unused functions, as would be expected.
Also adds a test confirming that enabling one tied feature doesn't imply another - the appropriate error for this was already being emitted. I did a bisect and narrowed it down to two patches it was likely to be - something in #128796, probably #128221 or #128679.
|
|
|
|
|
|
Rollup of 4 pull requests
Successful merges:
- #130005 (Replace -Z default-hidden-visibility with -Z default-visibility)
- #130229 (ptr::add/sub: do not claim equivalence with `offset(c as isize)`)
- #130773 (Update Unicode escapes in `/library/core/src/char/methods.rs`)
- #130933 (rustdoc: lists items that contain multiple paragraphs are more clear)
r? `@ghost`
`@rustbot` modify labels: rollup
|
|
Support clobber_abi and vector/access registers (clobber-only) in s390x inline assembly
This supports `clobber_abi` which is one of the requirements of stabilization mentioned in #93335.
This also supports vector registers (as `vreg`) and access registers (as `areg`) as clobber-only, which need to support clobbering of them to implement clobber_abi.
Refs:
- "1.2.1.1. Register Preservation Rules" section in ELF Application Binary Interface s390x Supplement, Version 1.6.1 (lzsabi_s390x.pdf in https://github.com/IBM/s390x-abi/releases/tag/v1.6.1)
- Register definition in LLVM:
- Vector registers https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L249
- Access registers https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td#L332
I have three questions:
- ~~ELF Application Binary Interface s390x Supplement says that `cc` (condition code, bits 18-19 of PSW) is "Volatile".
However, we do not have a register class for `cc` and instead mark `cc` as clobbered unless `preserves_flags` is specified (https://github.com/rust-lang/rust/pull/111331).
Therefore, in the current implementation, if both `preserves_flags` and `clobber_abi` are specified, `cc` is not marked as clobbered. Is this okay? Or even if `preserves_flags` is used, should `cc` be marked as clobbered if `clobber_abi` is used?~~ UPDATE: resolved https://github.com/rust-lang/rust/pull/130630#issuecomment-2367923121
- ~~ELF Application Binary Interface s390x Supplement says that `pm` (program mask, bits 20-23 of PSW) is "Cleared".
There does not appear to be any registers associated with this in either [LLVM](https://github.com/llvm/llvm-project/blob/llvmorg-19.1.0/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td) or [GCC](https://github.com/gcc-mirror/gcc/blob/33ccc1314dcdb0b988a9276ca6b6ce9b07bea21e/gcc/config/s390/s390.h#L407-L431), so at this point I don't see any way other than to just ignore it. Is this okay as-is?~~ UPDATE: resolved https://github.com/rust-lang/rust/pull/130630#issuecomment-2367923121
- Is "areg" a good name for register class name for access registers? It may be a bit confusing between that and `reg_addr`, which uses the “a” constraint (https://github.com/rust-lang/rust/pull/119431)...
Note:
- GCC seems to [recognize only `a0` and `a1`](https://github.com/gcc-mirror/gcc/blob/33ccc1314dcdb0b988a9276ca6b6ce9b07bea21e/gcc/config/s390/s390.h#L428-L429), and using `a[2-15]` [causes errors](https://godbolt.org/z/a46vx8jjn).
Given that cg_gcc has a similar problem with other architecture (https://github.com/rust-lang/rustc_codegen_gcc/issues/485), I don't feel this is a blocker for this PR, but it is worth mentioning here.
- `vreg` should be able to accept `#[repr(simd)]` types as input if the `vector` target feature added in https://github.com/rust-lang/rust/pull/127506 is enabled, but core_arch has no s390x vector type and both `#[repr(simd)]` and `core::simd` are unstable, so I have not implemented it in this PR. EDIT: And supporting it is probably more complex than doing the equivalent on other architectures... https://github.com/rust-lang/rust/pull/88245#issuecomment-905559591
cc `@uweigand`
r? `@Amanieu`
`@rustbot` label +O-SystemZ
|
|
MCP: https://github.com/rust-lang/compiler-team/issues/782
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
|
|
|
|
|
|
|
|
`rustc_codegen_llvm` and `rustc_codegen_gcc` duplicated logic for
checking if tied target features were partially enabled. This commit
consolidates these checks into `rustc_codegen_ssa` in the
`codegen_fn_attrs` query, which also is run pre-monomorphisation for
each function, which ensures that this check is run for unused functions,
as would be expected.
|
|
llvm: replace some deprecated functions
`LLVMMDStringInContext` and `LLVMMDNodeInContext` are deprecated, replace them with `LLVMMDStringInContext2` and `LLVMMDNodeInContext2`.
Also replace `Value` with `Metadata` in some function signatures for better consistency.
|
|
|
|
|
|
inline assembly
|
|
It's crazy to have the integer methods in something close to random
order.
The reordering makes the gaps clear: `const_i64`, `const_i128`,
`const_isize`, and `const_u16`. I guess they just aren't needed.
|
|
|
|
This avoids some repetitive boilerplate code.
|
|
Supertraits of `BuilderMethods` are all called `XyzBuilderMethods`.
Supertraits of `CodegenMethods` are all called `XyzMethods`. This commit
changes the latter to `XyzCodegenMethods`, for consistency.
|
|
They both are part of `BuilderMethods`, and so should have `Builder` in
their name like all the other traits in `BuilderMethods`.
|
|
It has `Backend` and `Deref` boudns, plus an associated type
`CodegenCx`, and it has a single use. This commit "inlines" it into
`BuilderMethods`, which makes the complicated backend trait situation a
little simpler.
|
|
|
|
r=michaelwoerister
Don't leave debug locations for constants sitting on the builder indefinitely
Because constants are currently emitted *before* the prologue, leaving the debug location on the IRBuilder spills onto other instructions in the prologue and messes up both line numbers as well as the point LLVM chooses to be the prologue end.
Example LLVM IR (irrelevant IR elided):
Before:
```
define internal { i64, i64 } `@_ZN3tmp3Foo18var_return_opt_try17he02116165b0fc08cE(ptr` align 8 %self) !dbg !347 { start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8, !dbg !357
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
```
After:
```
define internal { i64, i64 } `@_ZN3tmp3Foo18var_return_opt_try17h00b17d08874ddd90E(ptr` align 8 %self) !dbg !347 { start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
```
Note in particular how !357 from %residual.dbg.spill's dbg_declare no longer falls through onto the store to %self.dbg.spill. This fixes argument values at entry when the constant is a ZST (e.g. `<Option as Try>::Residual`). This fixes #130003 (but note that it does *not* fix issues with argument values and non-ZST constants, which emit their own stores that have debug info on them, like #128945).
r? `@michaelwoerister`
|
|
Remove `serialized_bitcode` from `LtoModuleCodegen`.
It's unused.
r? ``@bjorn3``
|
|
It's unused.
|
|
Because constants are currently emitted *before* the prologue, leaving the
debug location on the IRBuilder spills onto other instructions in the prologue
and messes up both line numbers as well as the point LLVM chooses to be the
prologue end.
Example LLVM IR (irrelevant IR elided):
Before:
define internal { i64, i64 } @_ZN3tmp3Foo18var_return_opt_try17he02116165b0fc08cE(ptr align 8 %self) !dbg !347 {
start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8, !dbg !357
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
After:
define internal { i64, i64 } @_ZN3tmp3Foo18var_return_opt_try17h00b17d08874ddd90E(ptr align 8 %self) !dbg !347 {
start:
%self.dbg.spill = alloca [8 x i8], align 8
%_0 = alloca [16 x i8], align 8
%residual.dbg.spill = alloca [0 x i8], align 1
#dbg_declare(ptr %residual.dbg.spill, !353, !DIExpression(), !357)
store ptr %self, ptr %self.dbg.spill, align 8
#dbg_declare(ptr %self.dbg.spill, !350, !DIExpression(), !358)
Note in particular how !357 from %residual.dbg.spill's dbg_declare no longer
falls through onto the store to %self.dbg.spill. This fixes argument values
at entry when the constant is a ZST (e.g. <Option as Try>::Residual). This
fixes #130003 (but note that it does *not* fix issues with argument values and
non-ZST constants, which emit their own stores that have debug info on them,
like #128945).
|
|
This shrinks `compiler/rustc_codegen_gcc/Cargo.lock` quite a bit. The
only remaining dependencies in `compiler/rustc_codegen_gcc/Cargo.lock`
are `gccjit`, `lang_tester`, and `boml`, all of which aren't used in any
other compiler crates.
The commit also reorders and adds comments to the `extern crate` items
so they match those in miri.
|
|
simd_shuffle intrinsic: allow argument to be passed as vector
See https://github.com/rust-lang/rust/issues/128738 for context.
I'd like to get rid of [this hack](https://github.com/rust-lang/rust/blob/6c0b89dfac65be9a5be12f938f23098ebc36c635/compiler/rustc_codegen_ssa/src/mir/block.rs#L922-L935). https://github.com/rust-lang/rust/pull/128537 almost lets us do that since constant SIMD vectors will then be passed as immediate arguments. However, simd_shuffle for some reason actually takes an *array* as argument, not a vector, so the hack is still required to ensure that the array becomes an immediate (which then later stages of codegen convert into a vector, as that's what LLVM needs).
This PR prepares simd_shuffle to also support a vector as the `idx` argument. Once this lands, stdarch can hopefully be updated to pass `idx` as a vector, and then support for arrays can be removed, which finally lets us get rid of that hack.
|
|
|
|
Stabilize `asm_const`
tracking issue: https://github.com/rust-lang/rust/issues/93332
reference PR: https://github.com/rust-lang/reference/pull/1556
this will probably require some CI wrangling (and a rebase), so let's get that over with even though the final required PR is not merged yet.
r? `@ghost`
|
|
Shrink `TyKind::FnPtr`.
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and `FnHeader`, which can be packed more efficiently. This reduces the size of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.
r? `@compiler-errors`
|
|
|
|
bootstrap: don't use rustflags for `--rustc-args`
r? `@onur-ozkan`
This is going to require a bit of context.
https://github.com/rust-lang/rust/pull/47558 has added `--rustc-args` to `./x test` to allow passing flags when building `compiletest` tests. It was made specifically because using `RUSTFLAGS` would rebuild the compiler/stdlib, which would in turn require the flag you want to build tests with to successfully bootstrap.
#113178 made the request that it also works for other tests and doctests. This is not trivial to support across the board for `library`/`compiler` unit-tests/doctests and across stages. This issue was closed in #113948 by using `RUSTFLAGS`, seemingly incorrectly since https://github.com/rust-lang/rust/pull/123489 fixed that part to make it work.
Unfortunately #123489/#113948 have regressed the goals of `--rustc-args`:
- now we can't use rustc args that don't bootstrap, to run the UI tests: we can't test incomplete features. The new trait solver doesn't bootstrap, in-progress borrowck/polonius changes don't bootstrap, some other features are similarly incomplete, etc.
- using the flag now rebuilds everything from scratch: stage0 stdlib, stage1 compiler, stage1 stdlib. You don't need to re-do all this to compile UI tests, you only need the latter to run stdlib tests with a new flag, etc. This happens for contributors, but also on CI today. (Not to mention that in doing that it will rebuild things with flags that are not meant to be used, e.g. stdlib cfgs that don't exist in the compiler; or you could also imagine that this silently enables flags that were not meant to be enabled in this way).
Since then, https://github.com/rust-lang/rust/pull/125011/commits/bd71c71ea04b4a4f954e579c2a6d44113274846a has started using it to test a stdlib feature, relying on the fact that it now rebuilds everything. So #125011 also regressed CI times more than necessary because it rebuilds everything instead of just stage 1 stdlib.
It's not easy for me to know how to properly fix #113178 in bootstrap, but #113948/#123489 are not it since they regress the initial intent. I'd think bootstrap would have to know from the list of test targets that are passed that the `library` or `compiler` paths that are passed could require rebuilding these crates with different rustflags, probably also depending on stages. Therefore I would not be able to fix it, and will just try in this PR to unregress the situation to unblock the initial use-case.
It seems miri now also uses `./x miri --rustc-args` in this incorrect meaning to rebuild the `library` paths they support to run with the new args. I've not made any bootstrap changes related to `./x miri` in this PR, so `--rustc-args` wouldn't work there anymore. I'd assume this would need to use rustflags again but I don't know how to make that work properly in bootstrap, hence opening as draft, so you can tell me how to do that. I assume we don't want to break their use-case again now that it exists, even though there are ways to use `./x test` to do exactly that.
`RUSTFLAGS_NOT_BOOTSTRAP=flag ./x test library/std` is a way to run unit tests with a new flag without rebuilding everything, while with #123489 there is no way anymore to run tests with a flag that doesn't bootstrap.
---
edit: after review, this PR:
- renames `./x test --rustc-args` to `./x test --compiletest-rustc-args` as it only applies there, and cannot use rustflags for this purpose.
- fixes the regression that using these args rebuilt everything from scratch
- speeds up some CI jobs via the above point
- removes `./x miri --rustc-args` as only library tests are supported, needs to rebuild libstd, and `./x miri --compiletest-rustc-args` wouldn't work since compiletests are not supported.
|
|
|
|
array)
|
|
|
|
const vector passed through to codegen
This allows constant vectors using a repr(simd) type to be propagated
through to the backend by reusing the functionality used to do a similar
thing for the simd_shuffle intrinsic
#118209
r? RalfJung
|
|
nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
|
|
|
|
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and
`FnHeader`, which can be packed more efficiently. This reduces the size
of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms.
This reduces peak memory usage by a few percent on some benchmarks. It
also reduces cache misses and page faults similarly, though this doesn't
translate to clear cycles or wall-time improvements on CI.
|
|
|
|
|
|
Add implied target features to target_feature attribute
See [zulip](https://rust-lang.zulipchat.com/#narrow/stream/208962-t-libs.2Fstdarch/topic/Why.20would.20target-feature.20include.20implied.20features.3F) for some context. Adds implied target features, e.g. `#[target_feature(enable = "avx2")]` acts like `#[target_feature(enable = "avx2,avx,sse4.2,sse4.1...")]`. Fixes #128125, fixes #128426
The implied feature sets are taken from [the rust reference](https://doc.rust-lang.org/reference/attributes/codegen.html?highlight=target-fea#x86-or-x86_64), there are certainly more features and targets to add.
Please feel free to reassign this to whoever should review it.
r? ``@Amanieu``
|
|
|
|
|
|
|
|
|
|
|
|
Update compiler_builtins to 0.1.114
The `weak-intrinsics` feature was removed from compiler_builtins in https://github.com/rust-lang/compiler-builtins/pull/598, so dropped the `compiler-builtins-weak-intrinsics` feature from alloc/std/sysroot.
In https://github.com/rust-lang/compiler-builtins/pull/593, some builtins for f16/f128 were added. These don't work for all compiler backends, so add a `compiler-builtins-no-f16-f128` feature and disable it for cranelift and gcc.
|