| Age | Commit message (Collapse) | Author | Lines |
|
|
|
r=workingjubilee
Make repr(packed) vectors work with SIMD intrinsics
In #117116 I fixed `#[repr(packed, simd)]` by doing the expected thing and removing padding from the layout. This should be the last step in providing a solution to rust-lang/portable-simd#319
|
|
|
|
Unroll first iteration of checked_ilog loop
This follows the optimization of #115913. As shown in https://github.com/rust-lang/rust/pull/115913#issuecomment-2066788006, the performance was improved in all important cases, but some regressions were introduced for the benchmarks `u32_log_random_small`, `u8_log_random` and `u8_log_random_small`.
Basically, #115913 changed the implementation from one division per iteration to one multiplication per iteration plus one division. When there are zero iterations, this is a regression from zero divisions to one division.
This PR avoids this by avoiding the division if we need zero iterations by returning `Some(0)` early. It also reduces the number of multiplications by one in all other cases.
|
|
Except for `simd-intrinsic/`, which has a lot of files containing
multiple types like `u8x64` which really are better when hand-formatted.
There is a surprising amount of two-space indenting in this directory.
Non-trivial changes:
- `rustfmt::skip` needed in `debug-column.rs` to preserve meaning of the
test.
- `rustfmt::skip` used in a few places where hand-formatting read more
nicely: `enum/enum-match.rs`
- Line number adjustments needed for the expected output of
`debug-column.rs` and `coroutine-debug.rs`.
|
|
Add `-Zfixed-x18`
This PR is a follow-up to #124323 that proposes a different implementation. Please read the description of that PR for motivation.
See the equivalent flag in [the clang docs](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-ffixed-x18).
MCP: https://github.com/rust-lang/compiler-team/issues/748
Fixes https://github.com/rust-lang/rust/issues/121970
r? rust-lang/compiler
|
|
Make more of the test suite run on Mac Catalyst
Combined with https://github.com/rust-lang/rust/pull/125225, the only failing parts of the test suite are in `tests/rustdoc-js`, `tests/rustdoc-js-std` and `tests/debuginfo`. Tested with:
```console
./x test --target=aarch64-apple-ios-macabi library/std
./x test --target=aarch64-apple-ios-macabi --skip=tests/rustdoc-js --skip=tests/rustdoc-js-std --skip=tests/debuginfo tests
```
Will probably put up a PR later to enable _running_ on (not just compiling for) Mac Catalyst in CI, though not sure where exactly I should do so? `src/ci/github-actions/jobs.yml`?
Note that I've deliberately _not_ enabled stack overflow handlers on iOS/tvOS/watchOS/visionOS (see https://github.com/rust-lang/rust/issues/25872), but rather just skipped those tests, as it uses quite a few APIs that I'd be weary about getting rejected by the App Store (note that Swift doesn't do it on those platforms either).
r? ``@workingjubilee``
CC ``@thomcc``
``@rustbot`` label O-ios O-apple
|
|
Add an intrinsic for `ptr::metadata`
The follow-up to #123840, so we can remove `PtrComponents` and `PtrRepr` from libcore entirely (well, after a bootstrap update).
As discussed in <https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/.60ptr_metadata.60.20in.20MIR/near/435637808>, this introduces `UnOp::PtrMetadata` taking a raw pointer and returning the associated metadata value.
By no longer going through a `union`, this should also help future PRs better optimize pointer operations.
r? ``@oli-obk``
|
|
|
|
Omit non-needs_drop drop_in_place in vtables
This replaces the drop_in_place reference with null in vtables. On librustc_driver.so, this drops about ~17k (11%) dynamic relocations from the output, since many vtables can now be placed in read-only memory, rather than having a relocated pointer included.
This makes a tradeoff by adding a null check at vtable call sites. I'm not sure that's readily avoidable without changing the vtable format (e.g., so that we can use a pc-relative relocation instead of an absolute address, and avoid the dynamic relocation that way). But it seems likely that the check is cheap at runtime.
Accepted MCP: https://github.com/rust-lang/compiler-team/issues/730
|
|
|
|
This adds the `only-apple`/`ignore-apple` compiletest directive, and
uses that basically everywhere instead of `only-macos`/`ignore-macos`.
Some of the updates in `run-make` are a bit redundant, as they use
`ignore-cross-compile` and won't run on iOS - but using Apple in these
is still more correct, so I've made that change anyhow.
|
|
This replaces the drop_in_place reference with null in vtables. On
librustc_driver.so, this drops about ~17k dynamic relocations from the
output, since many vtables can now be placed in read-only memory, rather
than having a relocated pointer included.
This makes a tradeoff by adding a null check at vtable call sites.
That's hard to avoid without changing the vtable format (e.g., to use a
pc-relative relocation instead of an absolute address, and avoid the
dynamic relocation that way). But it seems likely that the check is
cheap at runtime.
|
|
Migrate `run-make/issue-46239` to `rmake`
Part of #121876 and the associated [Google Summer of Code project](https://blog.rust-lang.org/2024/05/01/gsoc-2024-selected-projects.html).
|
|
|
|
Add codegen test for array comparision opt
Fixed since rust 1.55
closes #62531
|
|
|
|
|
|
Fix ICE in non-operand `aggregate_raw_ptr` intrinsic codegen
Introduced in #123840
Found in #121571, cc `@clarfonthey`
|
|
|
|
These types are currently rejected for `as` casts by the compiler.
Remove this incorrect check and add codegen tests for all conversions
involving these types.
|
|
|
|
|
|
|
|
|
|
https://github.com/llvm/llvm-project/pull/89799 changes llvm.dbg.value/declare intrinsics to be in a different, out-of-instruction-line representation. For example
call void @llvm.dbg.declare(...)
becomes
#dbg_declare(...)
Update tests accordingly to work with both the old and new way.
|
|
|
|
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
|
|
|
|
There are a few tests that depend on some target features **not** being
enabled by default, and usually they are correct with the default x86-64
target CPU. However, in downstream builds we have modified the default
to fit our distros -- `x86-64-v2` in RHEL 9 and `x86-64-v3` in RHEL 10
-- and the latter especially trips tests that expect not to have AVX.
These cases are few enough that we can just set them back explicitly.
|
|
codegen tests: Tolerate `range()` qualifications in enum tests
Current LLVM can infer range bounds on the i8s involved with these tests, and annotates it. Accept these bounds if present.
`@rustbot` label: +llvm-main
cc `@durin42`
|
|
Current LLVM can infer range bounds on the i8s involved with these
tests, and annotates it. Accept these bounds if present.
|
|
No functional changes intended.
Found by our experimental rust + LLVM @ HEAD bot:
https://buildkite.com/llvm-project/rust-llvm-integrate-prototype/builds/27747#018f2570-018c-4b12-9c5a-38cf81453683/957-965
|
|
Set writable and dead_on_unwind attributes for sret arguments
Set the `writable` and `dead_on_unwind` attributes for `sret` arguments. This allows call slot optimization to remove more memcpy's.
See https://llvm.org/docs/LangRef.html#parameter-attributes for the specification of these attributes. In short, the statement we're making here is that:
* The return slot is writable.
* The return slot will not be read if the function unwinds.
Fixes https://github.com/rust-lang/rust/issues/90595.
|
|
When compiled with -C panic=abort we'd generate an extra
panic_cannot_unwind shim in the variant calling C-unwind.
|
|
|
|
|
|
|
|
And suggest adding the `#[coroutine]` to the closure
|
|
Stop using LLVM struct types for alloca
The alloca type has no semantic meaning, only the size (and alignment, but we specify it explicitly) matter. Using `[N x i8]` is a more direct way to specify that we want `N` bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.
Split out from #121577.
r? `@ghost`
|
|
Dellvmize some intrinsics (use `u32` instead of `Self` in some integer intrinsics)
This implements https://github.com/rust-lang/compiler-team/issues/693 minus what was implemented in #123226.
Note: I decided to _not_ change `shl`/... builder methods, as it just doesn't seem worth it.
r? ``@scottmcm``
|
|
The test confirms that when val < base, we do not divide or multiply.
|
|
|
|
This saves an extra load from memory.
|
|
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them.
For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.
|
|
|
|
|
|
Add the missing inttoptr when we ptrtoint in ptr atomics
Ralf noticed this here: https://github.com/rust-lang/rust/pull/122220#discussion_r1535172094
Our previous codegen forgot to add the cast back to integer type. The code compiles anyway, because of course all locals are in-memory to start with, so previous codegen would do the integer atomic, store the integer to a local, then load a pointer from that local. Which is definitely _not_ what we wanted: That's an integer-to-pointer transmute, so all pointers returned by these `AtomicPtr` methods didn't have provenance. Yikes.
Here's the IR for `AtomicPtr::fetch_byte_add` on 1.76: https://godbolt.org/z/8qTEjeraY
```llvm
define noundef ptr `@atomicptr_fetch_byte_add(ptr` noundef nonnull align 8 %a, i64 noundef %v) unnamed_addr #0 !dbg !7 {
start:
%0 = alloca ptr, align 8, !dbg !12
%val = inttoptr i64 %v to ptr, !dbg !12
call void `@llvm.lifetime.start.p0(i64` 8, ptr %0), !dbg !28
%1 = ptrtoint ptr %val to i64, !dbg !28
%2 = atomicrmw add ptr %a, i64 %1 monotonic, align 8, !dbg !28
store i64 %2, ptr %0, align 8, !dbg !28
%self = load ptr, ptr %0, align 8, !dbg !28
call void `@llvm.lifetime.end.p0(i64` 8, ptr %0), !dbg !28
ret ptr %self, !dbg !33
}
```
r? `@RalfJung`
cc `@nikic`
|
|
do not add prolog for variadic naked functions
fixes #99858
|
|
It's irrelevant for the purposes of this test (there is only one alloca)
and its size changes depending on the target, so it can't be matched
easily.
|