|
Add codegen tests for additional cases where noop iterators get optimized away
Optimizations have improved over time, and LLVM now manages to optimize more in-place-collect noop iterators into O(1) functions. This updates the codegen test to match.
Many but not all cases reported in #79308 work now.
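For reference, a hedged sketch of the kind of function such a codegen test checks (the function name and body here are illustrative, not the actual test contents):
```rust
// Illustrative only: an identity map/collect over a vector. With the
// improved optimizations, LLVM can lower this to just returning the input
// allocation, with no loop and no reallocation.
pub fn noop_map_collect(v: Vec<u32>) -> Vec<u32> {
    v.into_iter().map(|x| x).collect()
}
```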
|
|
Speed up Vec::clear().
Currently it just calls `truncate(0)`. `truncate()` is (a) not marked as
`#[inline]`, and (b) more general than needed for `clear()`.
This commit changes `clear()` to do the work itself. This modest change
was first proposed in rust-lang#74172, where the reviewer rejected it because
there was insufficient evidence that `Vec::clear()`'s performance
mattered enough to justify the change. Recent changes within rustc have
made `Vec::clear()` hot within `macro_parser.rs`, so the change is now
clearly worthwhile.
Although it doesn't show wins on CI perf runs, this seems to be because they
use PGO. But not all platforms currently use PGO. Also, local builds don't use
PGO, and `truncate` sometimes shows up in an over-represented fashion in local
profiles. So local profiling will be made easier by this change.
Note that this will also benefit `String::clear()`, because it just
calls `Vec::clear()`.
Finally, the commit removes the `vec-clear.rs` codegen test. It was
added in #52908. From before then until now, `Vec::clear()` just called
`Vec::truncate()` with a zero length. The body of `Vec::truncate()` has
changed a lot since then. Now that `Vec::clear()` is doing actual work
itself, and not just calling `Vec::truncate()`, it's not surprising that
its generated code includes a load and an icmp. I think it's reasonable
to remove this test.
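As a rough illustration (a minimal sketch against the public `Vec` API, not the actual standard library sources), the shape of the change is to drop the elements in place and reset the length directly instead of routing through `truncate(0)`:
```rust
use std::ptr;

// Sketch of the clear-does-the-work-itself approach: take a raw slice of
// the current elements, set the length to zero first (so a panicking
// `Drop` impl cannot lead to a double drop), then drop the old elements.
fn clear_in_place<T>(v: &mut Vec<T>) {
    let elems: *mut [T] = v.as_mut_slice();
    unsafe {
        v.set_len(0);
        ptr::drop_in_place(elems);
    }
}
```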
r? `@m-ou-se`
|
|
|
|
|
|
|
|
Fix miscompilation of inline assembly with outputs in cases where we emit an invoke instead of a call instruction.
We ran into this bug where rustc would segfault while trying to compile certain uses of inline assembly.
Here is a simple repro that demonstrates the issue:
```rust
#![feature(asm_unwind)]
fn main() {
    let _x = String::from("string here just cause we need something with a non-trivial drop");
    let foo: u64;
    unsafe {
        std::arch::asm!(
            "mov {}, 1",
            out(reg) foo,
            options(may_unwind)
        );
    }
    println!("{}", foo);
}
```
([playground link](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=7d6641e83370d2536a07234aca2498ff))
But crucially `feature(asm_unwind)` is not actually needed, and this can be triggered on stable as a result of the way async functions/generators are handled in the compiler, e.g.:
```rust
extern crate futures; // 0.3.21
async fn bar() {
    let foo: u64;
    unsafe {
        std::arch::asm!(
            "mov {}, 1",
            out(reg) foo,
        );
    }
    println!("{}", foo);
}

fn main() {
    futures::executor::block_on(bar());
}
```
([playground link](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1c7781c34dd4a3e80ae4bd936a0c82fc))
An example of the incorrect LLVM IR generated:
```llvm
bb1:                                              ; preds = %start
  %1 = invoke i64 asm sideeffect alignstack inteldialect unwind "mov ${0:q}, 1", "=&r,~{dirflag},~{fpsr},~{flags},~{memory}"()
          to label %bb2 unwind label %cleanup, !srcloc !9
  store i64 %1, i64* %foo, align 8
bb2:
  [...snip...]
```
The store should not be placed after the asm invoke but rather should be in the normal control flow basic block (`bb2` in this case).
[Here](https://gist.github.com/luqmana/be1af5b64d2cda5a533e3e23a7830b44) is a writeup of the investigation that led to finding this.
|
|
|
|
|
|
It was using `Iterator::by_ref` in the implementation, which ended up pessimizing it enough that, for example, it didn't vectorize when we tried it in the <https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Reducing.20sum.20into.20wider.20types> conversation.
Demonstration that the codegen test doesn't pass on the current nightly: <https://rust.godbolt.org/z/Taxev5eMn>
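For context, the kind of pattern from that conversation (example mine, not the changed library code): reducing narrow elements into a wider accumulator, which should vectorize once the `by_ref` pessimization is gone.
```rust
// Summing `u8`s into a `u32` accumulator: the sort of reduction that the
// linked thread wanted to see vectorized.
pub fn sum_bytes_widening(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```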
|
|
async: Give predictable name to binding generated from .await expressions.
This name makes it to debuginfo and allows debuggers to identify such bindings and their captured versions in suspended async fns.
This will be useful for async stack traces, as discussed in https://internals.rust-lang.org/t/async-debugging-logical-stack-traces-setting-goals-collecting-examples/15547.
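A small example (mine) of the binding in question:
```rust
async fn inner() -> u32 {
    42
}

// The temporary that holds the future returned by `inner()` across the
// suspension point is the compiler-generated binding that now shows up in
// debuginfo under the predictable name `__awaitee`.
async fn outer() -> u32 {
    inner().await
}
```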
I don't know if this needs some discussion by `@rust-lang/compiler`, e.g. about the name of the binding (`__awaitee`) or about the fact that this PR introduces a (soft) guarantee about a compiler generated name. Although, regarding the latter, I think the same reasoning applies here as it does for debuginfo in general.
r? `@tmandry`
|
|
|
|
|
|
|
|
Signed-off-by: codehorseman <cricis@yeah.net>
|
|
works for them
|
|
This commit
- changes names to use di_node instead of metadata
- uniformly names all functions that build new debuginfo nodes build_xyz_di_node
- renames CrateDebugContext to CodegenUnitDebugContext (which is more accurate)
- moves TypeMap and functions that directly work with it to a new type_map module
- moves and reimplements enum related builder functions to a new enums module
- splits enum debuginfo building for the native and cpp-like cases, since they are mostly separate
- uses SmallVec instead of Vec in many places
- removes the old infrastructure for dealing with recursion cycles (create_and_register_recursive_type_forward_declaration(), RecursiveTypeDescription, set_members_of_composite_type(), MemberDescription, MemberDescriptionFactory, prepare_xyz_metadata(), etc)
- adds type_map::build_type_with_children() as a replacement for dealing with recursion cycles
- adds many (doc-)comments explaining what's going on
- changes cpp-like naming for C-style enums so they don't get an enum$<...> name (because the NatVis visualizer does not apply to them)
- fixes detection of what is a C-style enum because some enums were classified as C-style even though they have fields
- changes the position of the discriminant debuginfo node so it is consistently nested inside the top-level union instead of, sometimes, next to it
|
|
This change is somewhat extensive: since this is called to determine Copy vs Move, it affects MIR, so any test that's `no_core` needs to actually have the normal `impl`s it uses.
|
|
We can't do it for everything, but it would be nice to at least stop emitting calls to clone methods in debug builds for things like derived clones.
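A hedged illustration of the kind of pattern meant (names are mine): a derived `Clone` on a `Copy` type, where the clone could be a plain copy instead of a call even in unoptimized builds.
```rust
#[derive(Clone, Copy)]
struct Point {
    x: u32,
    y: u32,
}

// In debug builds this currently lowers to an actual call to the derived
// `Clone::clone`; the goal is for trivial derived clones like this one to
// become a simple copy instead.
fn duplicate(p: &Point) -> Point {
    p.clone()
}
```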
|
|
This ensures that information about target features configured with
`-C target-feature=...` or detected with `-C target-cpu=native` is
retained for subsequent consumers of LLVM bitcode.
This is crucial for linker plugin LTO, since this information is not
conveyed to the plugin otherwise.
|
|
Reopen #91719
Reopened #91719, which was closed inadvertently due to technical difficulties.
|
|
Add !align metadata on loads of &/&mut/Box
Note that this refers to the alignment of what the loaded value points
to, _not_ the alignment of the loaded value itself.
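For illustration (example mine, not the codegen test itself): loading a reference out of another reference, where the new metadata describes the alignment of the loaded pointer's pointee.
```rust
// The load of the inner `&u64` can now carry `!align` metadata saying the
// loaded pointer is aligned to the alignment of `u64` (8 on typical
// 64-bit targets); it says nothing about the outer reference itself.
pub fn load_inner_ref<'a>(x: &'a &'a u64) -> &'a u64 {
    *x
}
```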
r? `@ghost` (blocked on #94158)
|
|
|
|
Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>
|
|
|
|
|
|
|
|
Restore the local filter on mono item sorting
In `CodegenUnit::items_in_deterministic_order`, there's a comment that
only local HirIds should be taken into account, but #90408 removed the
`as_local` call that sets others to None. Restoring that check fixes the
s390x hangs seen in [RHBZ 2058803].
[RHBZ 2058803]: https://bugzilla.redhat.com/show_bug.cgi?id=2058803
|
|
|
|
|
|
|
|
This changed with the upstream LLVM change https://reviews.llvm.org/D98152 (aka
https://github.com/llvm/llvm-project/commit/a266af721153fab6452094207b09ed265ab0be7b),
wherein LLVM got smarter about using intrinsics. As best I can tell, the
change I've made here preserves the intent of the test on LLVM 14 and
before while also passing on LLVM 15 and later.
|
|
Revert "Auto merge of #92419 - erikdesjardins:coldland, r=nagisa"
Should fix (untested) #94390
Reopens #46515, #87055
r? `@ehuss`
|
|
|
|
more complete sparc64 ABI fix for aggregates with floating point members
The previous fix didn't handle nested structures at all.
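An illustrative (hypothetical) shape of the nested case, with the floating point members one level down inside the aggregate:
```rust
// The previous fix did not handle nested structures like this one, where
// the `f64` members live inside an inner struct but the sparc64 C ABI
// still requires them to be passed in floating point registers.
#[repr(C)]
pub struct Inner {
    pub a: f64,
    pub b: f64,
}

#[repr(C)]
pub struct Outer {
    pub inner: Inner,
    pub c: f32,
}

extern "C" {
    pub fn takes_outer(o: Outer) -> f64;
}
```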
|
|
Apply noundef metadata to loads of types that do not permit raw init
This matches the noundef attributes we apply on arguments/return types.
Fixes (partially) #74378.
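A small illustration (mine): `char` has invalid bit patterns, so it is one of the types whose loads are affected.
```rust
// `char` does not permit raw/uninitialized init, so the load of `*c` is
// the kind of load that can now be annotated with `!noundef`.
pub fn load_char(c: &char) -> char {
    *c
}
```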
|
|
|
|
This reverts commit 4f49627c6fe2a32d1fed6310466bb0e1c535c0c0, reversing
changes made to 028c6f1454787c068ff5117e9000a1de4fd98374.
|
|
Apply noundef attribute to all scalar types which do not permit raw init
Beyond `&`/`&mut`/`Box`, this covers `char`, enum discriminants, `NonZero*`, etc.
All such types currently cause a Miri error if left uninitialized,
and an `invalid_value` lint in cases like `mem::uninitialized::<char>()`.
Note that this _does not_ change whether or not it is UB for `u64` (or
other integer types with no invalid values) to be undef.
Fixes (partially) #74378.
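For illustration (example mine), the kinds of signatures affected:
```rust
use std::num::NonZeroU32;

// `char`, `bool`, and `NonZeroU32` all have invalid values, so these
// parameters and the returned `bool` can be marked `noundef`; a plain
// `u64` parameter is not affected by this change.
pub fn demo(c: char, n: NonZeroU32, flag: bool) -> bool {
    flag && c.is_ascii() && n.get() > 1
}
```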
r? `@ghost` (blocked on #94127)
`@rustbot` label S-blocked
|
|
For MIRI, cfg out the swap vectorization logic from 94212
Because of #69488 the swap logic from #94212 doesn't currently work in MIRI.
Copying in smaller pieces is probably much worse for its performance anyway, so Miri would probably rather just use the simple path regardless.
Part of #94371, though another PR will be needed for the CTFE aspect.
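A minimal sketch of the gating pattern being described (names mine, not the actual `core` sources):
```rust
use std::mem;

// Under Miri, take the simple element-wise path; otherwise use the
// block-wise, vectorization-friendly path (stood in for here by
// `swap_with_slice`).
pub fn swap_slices_sketch<T>(x: &mut [T], y: &mut [T]) {
    assert_eq!(x.len(), y.len());
    if cfg!(miri) {
        for (a, b) in x.iter_mut().zip(y.iter_mut()) {
            mem::swap(a, b);
        }
    } else {
        x.swap_with_slice(y);
    }
}
```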
r? `@oli-obk`
cc `@RalfJung`
|
|
|
|
|
|
No branch protection metadata unless enabled
Even if we emit metadata disabling branch protection, this metadata may
conflict with other modules (e.g. during LTO) that have different branch
protection metadata set.
This is an unstable flag and feature, so ideally the flag not being
specified should act as if the feature wasn't implemented in the first
place.
Additionally, this PR ensures we emit an error if `-Zbranch-protection` is
set on targets other than the supported aarch64. For now the error is
emitted from codegen, but ideally it should be moved to earlier in the
pipeline before stabilization.
|
|
|
|
At opt-level=0, apply only ABI-affecting attributes to functions
This should provide a small perf improvement for debug builds,
and should more than cancel out the perf regression from adding noundef (https://github.com/rust-lang/rust/pull/93670#issuecomment-1038347581, #94106).
r? `@nikic`
|
|
|
|
|
|
debuginfo: Simplify TypeMap used during LLVM debuginfo generation.
This PR simplifies the TypeMap that is used in `rustc_codegen_llvm::debuginfo::metadata`. It was unnecessarily complicated because it was originally implemented when types were not yet normalized before codegen. So it did its own normalization and kept track of multiple unnormalized types being mapped to a single unique id.
This PR is based on https://github.com/rust-lang/rust/pull/93503, which is not merged yet.
The PR also removes the arena used for allocating string ids and instead uses `InlinableString` from the [inlinable_string](https://crates.io/crates/inlinable_string) crate. That might not be the best choice, since that crate does not seem to be very actively maintained. The [flexible-string](https://crates.io/crates/flexible-string) crate would be an alternative.
r? `@ghost`
|
|
Use undef for (some) partially-uninit constants
There needs to be some limit to avoid perf regressions on large arrays
with undef in each element (see comment in the code).
Fixes: #84565
Original PR: #83698
Depends on LLVM 14: #93577
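An illustrative example (mine) of a partially-uninitialized constant:
```rust
use std::mem::MaybeUninit;

// Only the leading `u8` is initialized; the `MaybeUninit<u32>` payload
// (and any padding) can now be encoded as `undef` in the emitted LLVM
// constant instead of arbitrary concrete bytes.
pub static PARTIAL: (u8, MaybeUninit<u32>) = (7, MaybeUninit::uninit());
```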
|
|
Stop manually SIMDing in `swap_nonoverlapping`
Like I previously did for `reverse` (#90821), this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have.
A variety of codegen tests are included to confirm that the various cases are still being vectorized.
It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.
As a bonus, this change also means one no longer gets the spurious `memcpy`(s?) at the end when swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
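For reference, the kind of call being exercised (example mine): swapping slices of 3-byte elements, the case that needs the type-erased path mentioned above.
```rust
// Slices of `[u8; 3]` hit the case where the implementation type-erases
// before swapping so that LLVM still vectorizes the copy loop.
fn swap_rgb_buffers(a: &mut [[u8; 3]], b: &mut [[u8; 3]]) {
    a.swap_with_slice(b);
}
```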
<details>
<summary>ASM for this example</summary>
## Before (from godbolt)
note the `push`/`pop`s and `memcpy`
```x86
swap_m256_slice:
push r15
push r14
push r13
push r12
push rbx
sub rsp, 32
cmp rsi, rcx
jne .LBB0_6
mov r14, rsi
shl r14, 5
je .LBB0_6
mov r15, rdx
mov rbx, rdi
xor eax, eax
.LBB0_3:
mov rcx, rax
vmovaps ymm0, ymmword ptr [rbx + rax]
vmovaps ymm1, ymmword ptr [r15 + rax]
vmovaps ymmword ptr [rbx + rax], ymm1
vmovaps ymmword ptr [r15 + rax], ymm0
add rax, 32
add rcx, 64
cmp rcx, r14
jbe .LBB0_3
sub r14, rax
jbe .LBB0_6
add rbx, rax
add r15, rax
mov r12, rsp
mov r13, qword ptr [rip + memcpy@GOTPCREL]
mov rdi, r12
mov rsi, rbx
mov rdx, r14
vzeroupper
call r13
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call r13
mov rdi, r15
mov rsi, r12
mov rdx, r14
call r13
.LBB0_6:
add rsp, 32
pop rbx
pop r12
pop r13
pop r14
pop r15
vzeroupper
ret
```
## After (from my machine)
Note no `rsp` manipulation, sorry for different ASM syntax
```x86
swap_m256_slice:
cmpq %r9, %rdx
jne .LBB1_6
testq %rdx, %rdx
je .LBB1_6
cmpq $1, %rdx
jne .LBB1_7
xorl %r10d, %r10d
jmp .LBB1_4
.LBB1_7:
movq %rdx, %r9
andq $-2, %r9
movl $32, %eax
xorl %r10d, %r10d
.p2align 4, 0x90
.LBB1_8:
vmovaps -32(%rcx,%rax), %ymm0
vmovaps -32(%r8,%rax), %ymm1
vmovaps %ymm1, -32(%rcx,%rax)
vmovaps %ymm0, -32(%r8,%rax)
vmovaps (%rcx,%rax), %ymm0
vmovaps (%r8,%rax), %ymm1
vmovaps %ymm1, (%rcx,%rax)
vmovaps %ymm0, (%r8,%rax)
addq $2, %r10
addq $64, %rax
cmpq %r10, %r9
jne .LBB1_8
.LBB1_4:
testb $1, %dl
je .LBB1_6
shlq $5, %r10
vmovaps (%rcx,%r10), %ymm0
vmovaps (%r8,%r10), %ymm1
vmovaps %ymm1, (%rcx,%r10)
vmovaps %ymm0, (%r8,%r10)
.LBB1_6:
vzeroupper
retq
```
</details>
This does all its copying operations as either the original type or as `MaybeUninit`s, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers.
<details>
<summary>Perf is essentially unchanged</summary>
Though perhaps with more target features this would help more, if it could pick bigger chunks
## Before
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 894 ns/iter (+/- 11)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,476 ns/iter (+/- 2,784)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,257 ns/iter (+/- 7)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,922 ns/iter (+/- 959)
test slice::swap_with_slice_rgb_30 ... bench: 328 ns/iter (+/- 27)
test slice::swap_with_slice_rgb_3000 ... bench: 16,215 ns/iter (+/- 176)
test slice::swap_with_slice_u8_30 ... bench: 312 ns/iter (+/- 9)
test slice::swap_with_slice_u8_3000 ... bench: 5,401 ns/iter (+/- 123)
test slice::swap_with_slice_usize_30 ... bench: 368 ns/iter (+/- 3)
test slice::swap_with_slice_usize_3000 ... bench: 28,472 ns/iter (+/- 3,913)
```
## After
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 868 ns/iter (+/- 36)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,642 ns/iter (+/- 1,507)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,194 ns/iter (+/- 11)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,761 ns/iter (+/- 5,018)
test slice::swap_with_slice_rgb_30 ... bench: 324 ns/iter (+/- 6)
test slice::swap_with_slice_rgb_3000 ... bench: 15,962 ns/iter (+/- 287)
test slice::swap_with_slice_u8_30 ... bench: 281 ns/iter (+/- 5)
test slice::swap_with_slice_u8_3000 ... bench: 5,324 ns/iter (+/- 40)
test slice::swap_with_slice_usize_30 ... bench: 275 ns/iter (+/- 5)
test slice::swap_with_slice_usize_3000 ... bench: 28,277 ns/iter (+/- 277)
```
</details>
|
|
|