about summary refs log tree commit diff
path: root/src/test/codegen
AgeCommit message (Collapse)AuthorLines
2022-11-26Auto merge of #103556 - clubby789:specialize-option-partial-eq, r=scottmcmbors-0/+34
Manually implement PartialEq for Option<T> and specialize non-nullable types This PR manually implements `PartialEq` and `StructuralPartialEq` for `Option`, which seems to produce slightly better codegen than the automatically derived implementation. It also allows specializing on the `core::num::NonZero*` and `core::ptr::NonNull` types, taking advantage of the niche optimization by transmuting the `Option<T>` to `T` to be compared directly, which can be done in just two instructions. A comparison of the original, new and specialized code generation is available [here](https://godbolt.org/z/dE4jxdYsa).
2022-11-24Tune RepeatWith::try_fold and Take::for_each and Vec::extend_trustedScott McMurray-0/+7
2022-11-24Avoid `GenFuture` shim when compiling async constructsArpad Borsos-3/+5
Previously, async constructs would be lowered to "normal" generators, with an additional `from_generator` / `GenFuture` shim in between to convert from `Generator` to `Future`. The compiler will now special-case these generators internally so that async constructs will *directly* implement `Future` without the need to go through the `from_generator` / `GenFuture` shim. The primary motivation for this change was hiding this implementation detail in stack traces and debuginfo, but it can in theory also help the optimizer as there is less abstractions to see through.
2022-11-22fix tests, update size assertsThe 8472-5/+5
2022-11-20Rollup merge of #104435 - scottmcm:iter-repeat-n, r=thomccYuki Okushi-0/+56
`VecDeque::resize` should re-use the buffer in the passed-in element Today it always copies it for *every* appended element, but one of those clones is avoidable. This adds `iter::repeat_n` (https://github.com/rust-lang/rust/issues/104434) as the primitive needed to do this. If this PR is acceptable, I'll also use this in `Vec` rather than its custom `ExtendElement` type & infrastructure that is harder to share between multiple different containers: https://github.com/rust-lang/rust/blob/101e1822c3e54e63996c8aaa014d55716f3937eb/library/alloc/src/vec/mod.rs#L2479-L2492
2022-11-18Rollup merge of #103456 - scottmcm:fix-unchecked-shifts, r=scottmcmManish Goregaokar-0/+66
`unchecked_{shl|shr}` should use `u32` as the RHS The other shift methods, such as https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.checked_shr and https://doc.rust-lang.org/nightly/std/primitive.i16.html#method.wrapping_shl, use `u32` for the shift amount. That's consistent with other things, like `count_ones`, which also always use `u32` for a bit count, regardless of the size of the type. This PR changes `unchecked_shl` and `unchecked_shr` to also use `u32` for the shift amount (rather than Self). cc #85122, the `unchecked_math` tracking issue
2022-11-18rustc_codegen_ssa: Fix for codegen_get_discrMichael Benfield-5/+2
When doing the optimized implementation of getting the discriminant, the arithmetic needs to be done in the tag type so wrapping behavior works correctly. Fixes #104519
2022-11-16Ignore the unchecked-shifts codegen test in debug-assertions buildsScott McMurray-0/+1
2022-11-15`VecDeque::resize` should re-use the buffer in the passed-in elementScott McMurray-0/+56
Today it always copies it for *every* appended element, but one of those clones is avoidable.
2022-11-11rustc_codegen_ssa: Better code generation for niche discriminants.Michael Benfield-0/+112
In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to #101872
2022-11-10Rollup merge of #104077 - nicholasbishop:bishop-uefi-aapcs, r=nagisaManish Goregaokar-1/+1
Use aapcs for efiapi calling convention on arm On arm, [llvm treats the C calling convention as `aapcs` on soft-float targets and `aapcs-vfp` on hard-float targets](https://github.com/rust-lang/compiler-builtins/issues/116#issuecomment-261057422). UEFI specifies in the arm calling convention that [floating point extensions aren't used](https://uefi.org/specs/UEFI/2.10/02_Overview.html#detailed-calling-convention), so always translate `efiapi` to `aapcs` on arm. https://github.com/rust-lang/rust/issues/65815
2022-11-08Rollup merge of #103353 - wesleywiser:fix_lld_thinlto_msvc, r=michaelwoeristerManish Goregaokar-0/+28
Fix Access Violation when using lld & ThinLTO on windows-msvc Users report an AV at runtime of the compiled binary when using lld and ThinLTO on windows-msvc. The AV occurs when accessing a static value which is defined in one crate but used in another. Based on the disassembly of the cross-crate use, it appears that the use is not correctly linked with the definition and is instead assigned a garbage pointer value. If we look at the symbol tables for each crates' obj file, we can see what is happening: *lib.obj*: ``` COFF SYMBOL TABLE ... 00E 00000000 SECT2 notype External | _ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` *bin.obj*: ``` COFF SYMBOL TABLE ... 010 00000000 UNDEF notype External | __imp__ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` The use of the symbol has the "import" style symbol name but the declaration doesn't generate any symbol with the same name. As a result, linking the files generates a warning from lld: > rust-lld: warning: bin.obj: locally defined symbol imported: reproducer::memrchr::FN::h612b61ca0e168901 (defined in lib.obj) [LNK4217] and the symbol reference remains undefined at runtime leading to the AV. To fix this, we just need to detect that we are performing ThinLTO (and thus, static linking) and omit the `dllimport` attribute on the extern item in LLVM IR. Fixes #81408
2022-11-06Use aapcs for efiapi calling convention on armNicholas Bishop-1/+1
On arm, llvm treats the C calling convention as `aapcs` on soft-float targets and `aapcs-vfp` on hard-float targets [1]. UEFI specifies in the arm calling convention that floating point extensions aren't used [2], so always translate `efiapi` to `aapcs` on arm. [1]: https://github.com/rust-lang/compiler-builtins/issues/116#issuecomment-261057422 [2]: https://uefi.org/specs/UEFI/2.10/02_Overview.html#detailed-calling-convention https://github.com/rust-lang/rust/issues/65815
2022-11-05Bless more testsMichael Goulet-11/+5
2022-11-04LLVM 16: Switch to using MemoryEffectsTim Neumann-2/+4
2022-11-03Fix Access Violation when using lld & ThinLTO on windows-msvcWesley Wiser-1/+1
Users report an AV at runtime of the compiled binary when using lld and ThinLTO on windows-msvc. The AV occurs when accessing a static value which is defined in one crate but used in another. Based on the disassembly of the cross-crate use, it appears that the use is not correctly linked with the definition and is instead assigned a garbage pointer value. If we look at the symbol tables for each crates' obj file, we can see what is happening: *lib.obj*: ``` COFF SYMBOL TABLE ... 00E 00000000 SECT2 notype External | _ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` *bin.obj*: ``` COFF SYMBOL TABLE ... 010 00000000 UNDEF notype External | __imp__ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` The use of the symbol has the "import" style symbol name but the declaration doesn't generate any symbol with the same name. As a result, linking the files generates a warning from lld: > rust-lld: warning: bin.obj: locally defined symbol imported: reproducer::memrchr::FN::h612b61ca0e168901 (defined in lib.obj) [LNK4217] and the symbol reference remains undefined at runtime leading to the AV. To fix this, we just need to detect that we are performing ThinLTO (and thus, static linking) and omit the `dllimport` attribute on the extern item in LLVM IR.
2022-11-03Add test caseWesley Wiser-0/+28
2022-10-31Specialize PartialEq for Option<num::NonZero*> and Option<ptr::NonNull>clubby789-0/+34
2022-10-31Remove bounds check with enum castouz-a-8/+4
2022-10-31Use `br` instead of `switch` in more cases.Nicholas Nethercote-29/+83
`codegen_switchint_terminator` already uses `br` instead of `switch` when there is one normal target plus the `otherwise` target. But there's another common case with two normal targets and an `otherwise` target that points to an empty unreachable BB. This comes up a lot when switching on the tags of enums that use niches. The pattern looks like this: ``` bb1: ; preds = %bb6 %3 = load i8, ptr %_2, align 1, !range !9, !noundef !4 %4 = sub i8 %3, 2 %5 = icmp eq i8 %4, 0 %_6 = select i1 %5, i64 0, i64 1 switch i64 %_6, label %bb3 [ i64 0, label %bb4 i64 1, label %bb2 ] bb3: ; preds = %bb1 unreachable ``` This commit adds code to convert the `switch` to a `br`: ``` bb1: ; preds = %bb6 %3 = load i8, ptr %_2, align 1, !range !9, !noundef !4 %4 = sub i8 %3, 2 %5 = icmp eq i8 %4, 0 %_6 = select i1 %5, i64 0, i64 1 %6 = icmp eq i64 %_6, 0 br i1 %6, label %bb4, label %bb2 bb3: ; No predecessors! unreachable ``` This has a surprisingly large effect on compile times, with reductions of 5% on debug builds of some crates. The reduction is all due to LLVM taking less time. Maybe LLVM is just much better at handling `br` than `switch`. The resulting code is still suboptimal. - The `icmp`, `select`, `icmp` sequence is silly, converting an `i1` to an `i64` and back to an `i1`. But with the current code structure it's hard to avoid, and LLVM will easily clean it up, in opt builds at least. - `bb3` is usually now truly dead code (though not always, so it can't be removed universally).
2022-10-30Auto merge of #103299 - nikic:usub-overflow, r=wesleywiserbors-0/+16
Don't use usub.with.overflow intrinsic The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential. The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR. Fixes #103285.
2022-10-28Auto merge of #103071 - wesleywiser:fix_inlined_line_numbers, r=davidtwcobors-0/+25
Fix line numbers for MIR inlined code `should_collapse_debuginfo` detects if the specified span is part of a macro expansion however it does this by checking if the span is anything other than a normal (non-expanded) kind, then the span sequence is walked backwards to the root span. This doesn't work when the MIR inliner inlines code as it creates spans with expansion information set to `ExprKind::Inlined` and results in the line number being attributed to the inline callsite rather than the normal line number of the inlined code. Fixes #103068
2022-10-23`unchecked_{shl|shr}` should use `u32` as the RHSScott McMurray-0/+65
2022-10-21Introduce deduced parameter attributes, and use them for deducing `readonly` onPatrick Walton-1/+61
indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization *now*; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: https://github.com/rust-lang/rust/issues/103103 [PR 103070]: https://github.com/rust-lang/rust/pull/103070
2022-10-20Don't use usub.with.overflow intrinsicNikita Popov-0/+16
The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential. The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR. Fixes #103285.
2022-10-14Fix line numbers for MIR inlined codeWesley Wiser-1/+1
`should_collapse_debuginfo` detects if the specified span is part of a macro expansion however it does this by checking if the span is anything other than a normal (non-expanded) kind, then the span sequence is walked backwards to the root span. This doesn't work when the MIR inliner inlines code as it creates spans with expansion information set to `ExprKind::Inlined` and results in the line number being attributed to the inline callsite rather than the normal line number of the inlined code.
2022-10-14Add test case for MIR inlining debuginfo line numbersWesley Wiser-0/+25
2022-10-14more dupe word typosRageking8-1/+1
2022-10-11Auto merge of #102724 - pcc:scs-fix-test, r=Mark-Simulacrumbors-3/+3
Fix the sanitizer_scs_attr_check.rs test The test is failing when targeting aarch64 Android. The intent appears to have been to look for a function attributes comment (or the absence of one) on the line preceding the function declaration. But this isn't quite possible with FileCheck and the test as written was looking for a line with `no_scs` after a line with `scs`, which doesn't appear in the output. Instead, match on the function attributes comment on the line following the demangled function name comment.
2022-10-10Auto merge of #102596 - scottmcm:option-bool-calloc, r=Mark-Simulacrumbors-1/+18
Do the `calloc` optimization for `Option<bool>` Inspired by <https://old.reddit.com/r/rust/comments/xtiqj8/why_is_this_functional_version_faster_than_my_for/iqqy37b/>.
2022-10-07add a few more assert_unsafe_preconditionRalf Jung-0/+1
2022-10-05Fix the sanitizer_scs_attr_check.rs testPeter Collingbourne-3/+3
The test is failing when targeting aarch64 Android. The intent appears to have been to look for a function attributes comment (or the absence of one) on the line preceding the function declaration. But this isn't quite possible with FileCheck and the test as written was looking for a line with `no_scs` after a line with `scs`, which doesn't appear in the output. Instead, match on the function attributes comment on the line following the demangled function name comment.
2022-10-03Auto merge of #102503 - cuviper:x86-stack-probes, r=nagisabors-1/+9
Enable inline stack probes on X86 with LLVM 16 The known problems with x86 inline-asm stack probes have been solved on LLVM main (16), so this flips the switch. Anyone using bleeding-edge LLVM with rustc can start testing this, as I have done locally. We'll get more direct rust-ci when LLVM 16 branches and we start our upgrade, and we can always patch or disable it then if we find new problems. The previous attempt was #77885, reverted in #84708.
2022-10-02Do the `calloc` optimization for `Option<bool>`Scott McMurray-1/+18
Inspired by <https://old.reddit.com/r/rust/comments/xtiqj8/why_is_this_functional_version_faster_than_my_for/iqqy37b/>.
2022-10-02Auto merge of #102535 - scottmcm:optimize-split-at-partition-point, r=thomccbors-0/+20
Tell LLVM that `partition_point` returns a valid fencepost This was already done for a successful `binary_search`, but this way `partition_point` can get similar optimizations. Demonstration that nightly can't do this optimization today, and leaves in the panicking path: <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=e1074cd2faf5f68e49cffd728ded243a> r? `@thomcc`
2022-10-02Auto merge of #102424 - sunfishcode:sunfishcode/hidden-main, r=nagisabors-1/+1
Declare `main` as visibility hidden on targets that default to hidden. On targets with `default_hidden_visibility` set, which is currrently just WebAssembly, declare the generated `main` function with visibility hidden. This makes it consistent with clang's WebAssembly target, where `main` is just a user function that gets the same visibility as any other user function, which is hidden on WebAssembly unless explicitly overridden. This will help simplify use cases which in the future may want to automatically wasm-export all visibility-"default" symbols. `main` isn't intended to be wasm-exported, and marking it hidden prevents it from being wasm-exported in that scenario.
2022-09-30Tell LLVM that `partition_point` returns a valid fencepostScott McMurray-0/+20
This was already done for a successful `binary_search`, but this way `partition_point` can get similar optimizations.
2022-09-30Allow `hidden` in src/test/codegen/abi-main-signature-32bit-c-int.rsDan Gohman-1/+1
2022-09-29Enable inline stack probes on X86 with LLVM 16Josh Stone-1/+9
2022-09-26Enable inline stack probes on PowerPC and SystemZJosh Stone-22/+48
2022-09-21Auto merge of #100214 - scottmcm:strict-range, r=thomccbors-0/+14
Optimize `array::IntoIter` `.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`. This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better. I added one simple demonstration of the improvement as a codegen test. (`vec::IntoIter` uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed. `array::IntoIter` can't use pointers because that would keep it from being movable.)
2022-09-19Optimize `array::IntoIter`Scott McMurray-0/+14
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`. This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better. I added one simple demonstration of the improvement as a codegen test.
2022-09-17Add a codegen test for slice::from_ptr_rangeScott McMurray-0/+23
2022-09-16Auto merge of #97800 - ↵bors-3/+252
pnkfelix:issue-97463-fix-aarch64-call-abi-does-not-zeroext, r=wesleywiser Aarch64 call abi does not zeroext (and one cannot assume it does so) Fix #97463
2022-09-13Bless codegen.Camille GILLOT-2/+2
2022-09-13Bless codegen test.Camille GILLOT-2/+2
2022-09-09Use RelocModel::Pic for UEFI targetsNicholas Bishop-1/+1
In https://github.com/rust-lang/rust/pull/100537, the relocation model for UEFI targets was changed from PIC (the default value) to static. There was some dicussion of this change here: https://github.com/rust-lang/rust/pull/100537#discussion_r952363012 It turns out that this can cause compilation to fail as described in https://github.com/rust-lang/rust/issues/101377, so switch back to PIC. Fixes https://github.com/rust-lang/rust/issues/101377
2022-09-05Add test for #98294Nikita Popov-0/+19
Add a test to make that the failure condition for this pattern is optimized away. Fixes #98294.
2022-09-01Auto merge of #100707 - dzvon:fix-typo, r=davidtwcobors-16/+16
Fix a bunch of typo This PR will fix some typos detected by [typos]. I only picked the ones I was sure were spelling errors to fix, mostly in the comments. [typos]: https://github.com/crate-ci/typos
2022-09-01Auto merge of #100537 - petrochenkov:piccheck, r=oli-obkbors-2/+2
rustc_target: Add some more target spec sanity checking