rust - https://github.com/rust-lang/rust

Age	Commit message (Collapse)	Author	Lines
2022-12-21	Auto merge of #105812 - ojeda:no-jump-tables, r=nikic	bors	-0/+22
	Add `-Zno-jump-tables` This flag mimics GCC/Clang's `-fno-jump-tables` [1][2], which makes the codegen backend avoid generating jump tables when lowering switches. In the case of LLVM, the `"no-jump-tables"="true"` function attribute is added to every function. The kernel currently needs it for x86 when enabling IBT [3], as well as for Alpha (plus VDSO objects in MIPS/LoongArch). [1] https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fno-jump-tables [2] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fjump-tables [3] https://github.com/torvalds/linux/blob/v6.1/arch/x86/Makefile#L75-L83
2022-12-20	Add `-Zno-jump-tables`	Miguel Ojeda	-0/+22
	This flag mimics GCC/Clang's `-fno-jump-tables` [1][2], which makes the codegen backend avoid generating jump tables when lowering switches. In the case of LLVM, the `"no-jump-tables"="true"` function attribute is added to every function. The kernel currently needs it for x86 when enabling IBT [3], as well as for Alpha (plus VDSO objects in MIPS/LoongArch). [1] https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fno-jump-tables [2] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fjump-tables [3] https://github.com/torvalds/linux/blob/v6.1/arch/x86/Makefile#L75-L83 Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-12-18	Auto merge of #105446 - erikdesjardins:vt-size, r=nikic	bors	-1/+52
	Add 0..=isize::MAX range metadata to size loads from vtables This is the (much belated) size counterpart to #91569. Inspired by https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/Range.20metadata.20for.20.60size_of_val.60.20and.20other.20isize.3A.3AMAX.20limits. This could help optimize layout computations based on the size of a dyn trait. Though, admittedly, adding this to vtables wouldn't be as beneficial as adding it to slice len, which is used much more often. Miri detects this UB already: https://github.com/rust-lang/rust/blob/b7cc99142ad0cfe47e2fe9f7a82eaf5b672c0573/compiler/rustc_const_eval/src/interpret/traits.rs#L119-L121 (In fact Miri goes further, [assuming a 48-bit address space on 64-bit platforms](https://github.com/rust-lang/rust/blob/9db224fc908059986c179fc6ec433944e9cfce50/compiler/rustc_abi/src/lib.rs#L312-L331), but I don't think we can assume that in an optimization.)
2022-12-16	Make enum-match.rs test robust against variable name changes	Collin Baker	-2/+2
	https://reviews.llvm.org/D140192 caused the LLVM variable generated for enum discriminant checks to be named differently (%narrow vs %1). This adjusts the test CHECK directives to match any name.
2022-12-14	Rollup merge of #105578 - erikdesjardins:addrspacecast, r=bjorn3	Matthias Krüger	-1/+23
	Fix transmutes between pointers in different address spaces (e.g. fn ptrs on AVR) Currently, this causes a verifier error (https://godbolt.org/z/YYohed4bj), since it uses `bitcast`, which can't convert between address spaces. Uncovered due to https://github.com/rust-lang/rust/pull/105545#discussion_r1045269309 r? `@bjorn3`
2022-12-11	fix transmutes between pointers in different address spaces	Erik Desjardins	-1/+23

2022-12-11	Auto merge of #102900 - abrachet:master, r=bjorn3	bors	-0/+10
	Don't internalize __llvm_profile_counter_bias Currently, LLVM profiling runtime counter relocation cannot be used by rust during LTO because symbols are being internalized before all symbol information is known. This mode makes LLVM emit a __llvm_profile_counter_bias symbol which is referenced by the profiling initialization, which itself is pulled in by the rust driver here [1]. It is enabled with -Cllvm-args=-runtime-counter-relocation for platforms which are opt-in to this mode like Linux. On these platforms there will be no link error, rather just surprising behavior for a user which request runtime counter relocation. The profiling runtime will not see that symbol go on as if it were never there. On Fuchsia, the profiling runtime must have this symbol which will cause a hard link error. As an aside, I don't have enough context as to why rust's LTO model is how it is. AFAICT, the internalize pass is only safe to run at link time when all symbol information is actually known, this being an example as to why. I think special casing this symbol as a known one that LLVM can emit which should not have it's visbility de-escalated should be fine given how seldom this pattern of defining an undefined symbol to get initilization code pulled in is. From a quick grep, __llvm_profile_runtime is the only symbol that rustc does this for. [1] https://github.com/rust-lang/rust/blob/0265a3e93bf1b89d97cae113ed214954d5c35e22/compiler/rustc_codegen_ssa/src/back/linker.rs#L598
2022-12-10	Auto merge of #105531 - matthiaskrgr:rollup-7y7zbgl, r=matthiaskrgr	bors	-0/+22
	Rollup of 6 pull requests Successful merges: - #104460 (Migrate parts of `rustc_expand` to session diagnostics) - #105192 (Point at LHS on binop type err if relevant) - #105234 (Remove unneeded field from `SwitchTargets`) - #105239 (Avoid heap allocation when truncating thread names) - #105410 (Consider `parent_count` for const param defaults) - #105482 (Fix invalid codegen during debuginfo lowering) Failed merges: - #105411 (Introduce `with_forced_trimmed_paths`) r? `@ghost` `@rustbot` modify labels: rollup
2022-12-10	Rollup merge of #105482 - wesleywiser:fix_debuginfo_ub, r=tmiasko	Matthias Krüger	-0/+22
	Fix invalid codegen during debuginfo lowering In order for LLVM to correctly generate debuginfo for msvc, we sometimes need to spill arguments to the stack and perform some direct & indirect offsets into the value. Previously, this code always performed those actions, even when not required as LLVM would clean it up during optimization. However, when MIR inlining is enabled, this can cause problems as the operations occur prior to the spilled value being initialized. To solve this, we first calculate the necessary offsets using just the type which is side-effect free and does not alter the LLVM IR. Then, if we are in a situation which requires us to generate the LLVM IR (and this situation only occurs for arguments, not local variables) then we perform the same calculation again, this time generating the appropriate LLVM IR as we go. r? `@tmiasko` but feel free to reassign if you want 🙂 Fixes #105386
2022-12-10	Auto merge of #105384 - uweigand:s390x-test-codegen, r=Mark-Simulacrum	bors	-2/+6
	Fix failing codegen tests on s390x Several codegen tests are currently failing due to making assumptions that are not valid for the s390x architecture: - catch-unwind.rs: fails due to inlining differences. Already ignored on another platform for the same reason. Solution: Ignore on s390x. - remap_path_prefix/main.rs: fails due to different alignment requirement for string constants. Solution: Do not test for the alignment requirement. - repr-transparent-aggregates-1.rs: many ABI assumptions. Already ignored on many platforms for the same reason. Solution: Ignore on s390x. - repr-transparent.rs: no vector ABI by default on s390x. Already ignored on another platform for a similar reason. Solution: Ignore on s390x. - uninit-consts.rs: hard-coded little-endian constant. Solution: Match both little- and big-endian versions. Fixes part of https://github.com/rust-lang/rust/issues/105383.
2022-12-10	Rollup merge of #105109 - rcvalle:rust-kcfi, r=bjorn3	Matthias Krüger	-0/+58
	Add LLVM KCFI support to the Rust compiler This PR adds LLVM Kernel Control Flow Integrity (KCFI) support to the Rust compiler. It initially provides forward-edge control flow protection for operating systems kernels for Rust-compiled code only by aggregating function pointers in groups identified by their return and parameter types. (See llvm/llvm-project@cff5bef.) Forward-edge control flow protection for C or C++ and Rust -compiled code "mixed binaries" (i.e., for when C or C++ and Rust -compiled code share the same virtual address space) will be provided in later work as part of this project by identifying C char and integer type uses at the time types are encoded (see Type metadata in the design document in the tracking issue #89653). LLVM KCFI can be enabled with -Zsanitizer=kcfi. Thank you again, `@bjorn3,` `@eddyb,` `@nagisa,` and `@ojeda,` for all the help!
2022-12-08	Don't generate pointer loads to spills unless necessary	Wesley Wiser	-2/+2
	In order for LLVM to correctly generate debuginfo for msvc, we sometimes need to spill arguments to the stack and perform some direct & indirect offsets into the value. Previously, this code always performed those actions, even when not required as LLVM would clean it up during optimization. However, when MIR inlining is enabled, this can cause problems as the operations occur prior to the spilled value being initialized. To solve this, we first calculate the necessary offsets using just the type which is side-effect free and does not alter the LLVM IR. Then, if we are in a situation which requires us to generate the LLVM IR (and this situation only occurs for arguments, not local variables) then we perform the same calculation again, this time generating the appropriate LLVM IR as we go.
2022-12-08	Add test case for ub in generated LLVM IR	Wesley Wiser	-0/+22

2022-12-08	Add LLVM KCFI support to the Rust compiler	Ramon de C Valle	-0/+58
	This commit adds LLVM Kernel Control Flow Integrity (KCFI) support to the Rust compiler. It initially provides forward-edge control flow protection for operating systems kernels for Rust-compiled code only by aggregating function pointers in groups identified by their return and parameter types. (See llvm/llvm-project@cff5bef.) Forward-edge control flow protection for C or C++ and Rust -compiled code "mixed binaries" (i.e., for when C or C++ and Rust -compiled code share the same virtual address space) will be provided in later work as part of this project by identifying C char and integer type uses at the time types are encoded (see Type metadata in the design document in the tracking issue #89653). LLVM KCFI can be enabled with -Zsanitizer=kcfi. Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
2022-12-08	disable mergefunc instead of making fns unique	Erik Desjardins	-8/+4

2022-12-08	Add 0..=isize::MAX range metadata to size loads from vtables	Erik Desjardins	-0/+55

2022-12-07	Don't internalize __llvm_profile_counter_bias	Alex Brachet	-0/+10
	Currently, LLVM profiling runtime counter relocation cannot be used by rust during LTO because symbols are being internalized before all symbol information is known. This mode makes LLVM emit a __llvm_profile_counter_bias symbol which is referenced by the profiling initialization, which itself is pulled in by the rust driver here [1]. It is enabled with -Cllvm-args=-runtime-counter-relocation for platforms which are opt-in to this mode like Linux. On these platforms there will be no link error, rather just surprising behavior for a user which request runtime counter relocation. The profiling runtime will not see that symbol go on as if it were never there. On Fuchsia, the profiling runtime must have this symbol which will cause a hard link error. As an aside, I don't have enough context as to why rust's LTO model is how it is. AFAICT, the internalize pass is only safe to run at link time when all symbol information is actually known, this being an example as to why. I think special casing this symbol as a known one that LLVM can emit which should not have it's visbility de-escalated should be fine given how seldom this pattern of defining an undefined symbol to get initilization code pulled in is. From a quick grep, __llvm_profile_runtime is the only symbol that rustc does this for. [1] https://github.com/rust-lang/rust/blob/0265a3e93bf1b89d97cae113ed214954d5c35e22/compiler/rustc_codegen_ssa/src/back/linker.rs#L598
2022-12-06	Fix failing codegen tests on s390x	Ulrich Weigand	-2/+6
	Several codegen tests are currently failing due to making assumptions that are not valid for the s390x architecture: - catch-unwind.rs: fails due to inlining differences. Already ignored on another platform for the same reason. Solution: Ignore on s390x. - remap_path_prefix/main.rs: fails due to different alignment requirement for string constants. Solution: Do not test for the alignment requirement. - repr-transparent-aggregates-1.rs: many ABI assumptions. Already ignored on many platforms for the same reason. Solution: Ignore on s390x. - repr-transparent.rs: no vector ABI by default on s390x. Already ignored on another platform for a similar reason. Solution: Ignore on s390x. - uninit-consts.rs: hard-coded little-endian constant. Solution: Match both little- and big-endian versions. Fixes part of https://github.com/rust-lang/rust/issues/105383.
2022-12-04	Auto merge of #104535 - mikebenfield:discr-fix, r=pnkfelix	bors	-5/+2
	rustc_codegen_ssa: Fix for codegen_get_discr When doing the optimized implementation of getting the discriminant, the arithmetic needs to be done in the tag type so wrapping behavior works correctly. Fixes #104519
2022-12-03	Disable coverage instrumentation for naked functions	Tomasz Miąsko	-0/+19

2022-12-01	Auto merge of #105003 - flba-eb:only_windows, r=Mark-Simulacrum	bors	-25/+3
	Run Windows-only tests only on Windows This removes the need to maintain an ignore-list of all other OSs. See https://github.com/rust-lang/rust/pull/102305 for a similar change.
2022-12-01	Ignore `gnu` systems (`windows-msvc` only)	Florian Bartels	-1/+2

2022-11-29	test/codegen: test inter-crate linkage with static relocation	David Rheinsberg	-0/+37
	Add a codegen-test that verifies inter-crate linkage with the static relocation model. We expect all symbols that are part of a rust compilation to end up in the same DSO, thus we expect `dso_local` annotations.
2022-11-28	Run Windows-only tests only on Windows	Florian Bartels	-25/+2
	This removes the need to maintain a list of all other OSs which ignore the tests.
2022-11-27	Auto merge of #104818 - scottmcm:refactor-extend-func, r=the8472	bors	-0/+7
	Stop peeling the last iteration of the loop in `Vec::resize_with` `resize_with` uses the `ExtendWith` code that peels the last iteration: https://github.com/rust-lang/rust/blob/341d8b8a2c290b4535e965867e876b095461ff6e/library/alloc/src/vec/mod.rs#L2525-L2529 But that's kinda weird for `ExtendFunc` because it does the same thing on the last iteration anyway: https://github.com/rust-lang/rust/blob/341d8b8a2c290b4535e965867e876b095461ff6e/library/alloc/src/vec/mod.rs#L2494-L2502 So this just has it use the normal `extend`-from-`TrustedLen` code instead. r? `@ghost`
2022-11-26	Auto merge of #103556 - clubby789:specialize-option-partial-eq, r=scottmcm	bors	-0/+34
	Manually implement PartialEq for Option<T> and specialize non-nullable types This PR manually implements `PartialEq` and `StructuralPartialEq` for `Option`, which seems to produce slightly better codegen than the automatically derived implementation. It also allows specializing on the `core::num::NonZero*` and `core::ptr::NonNull` types, taking advantage of the niche optimization by transmuting the `Option<T>` to `T` to be compared directly, which can be done in just two instructions. A comparison of the original, new and specialized code generation is available [here](https://godbolt.org/z/dE4jxdYsa).
2022-11-24	Tune RepeatWith::try_fold and Take::for_each and Vec::extend_trusted	Scott McMurray	-0/+7

2022-11-24	Avoid `GenFuture` shim when compiling async constructs	Arpad Borsos	-3/+5
	Previously, async constructs would be lowered to "normal" generators, with an additional `from_generator` / `GenFuture` shim in between to convert from `Generator` to `Future`. The compiler will now special-case these generators internally so that async constructs will directly implement `Future` without the need to go through the `from_generator` / `GenFuture` shim. The primary motivation for this change was hiding this implementation detail in stack traces and debuginfo, but it can in theory also help the optimizer as there is less abstractions to see through.
2022-11-22	fix tests, update size asserts	The 8472	-5/+5

2022-11-20	Rollup merge of #104435 - scottmcm:iter-repeat-n, r=thomcc	Yuki Okushi	-0/+56
	`VecDeque::resize` should re-use the buffer in the passed-in element Today it always copies it for every appended element, but one of those clones is avoidable. This adds `iter::repeat_n` (https://github.com/rust-lang/rust/issues/104434) as the primitive needed to do this. If this PR is acceptable, I'll also use this in `Vec` rather than its custom `ExtendElement` type & infrastructure that is harder to share between multiple different containers: https://github.com/rust-lang/rust/blob/101e1822c3e54e63996c8aaa014d55716f3937eb/library/alloc/src/vec/mod.rs#L2479-L2492
2022-11-18	Rollup merge of #103456 - scottmcm:fix-unchecked-shifts, r=scottmcm	Manish Goregaokar	-0/+66
	`unchecked_{shl\|shr}` should use `u32` as the RHS The other shift methods, such as https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.checked_shr and https://doc.rust-lang.org/nightly/std/primitive.i16.html#method.wrapping_shl, use `u32` for the shift amount. That's consistent with other things, like `count_ones`, which also always use `u32` for a bit count, regardless of the size of the type. This PR changes `unchecked_shl` and `unchecked_shr` to also use `u32` for the shift amount (rather than Self). cc #85122, the `unchecked_math` tracking issue
2022-11-18	rustc_codegen_ssa: Fix for codegen_get_discr	Michael Benfield	-5/+2
	When doing the optimized implementation of getting the discriminant, the arithmetic needs to be done in the tag type so wrapping behavior works correctly. Fixes #104519
2022-11-16	Ignore the unchecked-shifts codegen test in debug-assertions builds	Scott McMurray	-0/+1

2022-11-15	`VecDeque::resize` should re-use the buffer in the passed-in element	Scott McMurray	-0/+56
	Today it always copies it for every appended element, but one of those clones is avoidable.
2022-11-11	rustc_codegen_ssa: Better code generation for niche discriminants.	Michael Benfield	-0/+112
	In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to #101872
2022-11-10	Rollup merge of #104077 - nicholasbishop:bishop-uefi-aapcs, r=nagisa	Manish Goregaokar	-1/+1
	Use aapcs for efiapi calling convention on arm On arm, [llvm treats the C calling convention as `aapcs` on soft-float targets and `aapcs-vfp` on hard-float targets](https://github.com/rust-lang/compiler-builtins/issues/116#issuecomment-261057422). UEFI specifies in the arm calling convention that [floating point extensions aren't used](https://uefi.org/specs/UEFI/2.10/02_Overview.html#detailed-calling-convention), so always translate `efiapi` to `aapcs` on arm. https://github.com/rust-lang/rust/issues/65815
2022-11-08	Rollup merge of #103353 - wesleywiser:fix_lld_thinlto_msvc, r=michaelwoerister	Manish Goregaokar	-0/+28
	Fix Access Violation when using lld & ThinLTO on windows-msvc Users report an AV at runtime of the compiled binary when using lld and ThinLTO on windows-msvc. The AV occurs when accessing a static value which is defined in one crate but used in another. Based on the disassembly of the cross-crate use, it appears that the use is not correctly linked with the definition and is instead assigned a garbage pointer value. If we look at the symbol tables for each crates' obj file, we can see what is happening: lib.obj: ``` COFF SYMBOL TABLE ... 00E 00000000 SECT2 notype External \| _ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` bin.obj: ``` COFF SYMBOL TABLE ... 010 00000000 UNDEF notype External \| __imp__ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` The use of the symbol has the "import" style symbol name but the declaration doesn't generate any symbol with the same name. As a result, linking the files generates a warning from lld: > rust-lld: warning: bin.obj: locally defined symbol imported: reproducer::memrchr::FN::h612b61ca0e168901 (defined in lib.obj) [LNK4217] and the symbol reference remains undefined at runtime leading to the AV. To fix this, we just need to detect that we are performing ThinLTO (and thus, static linking) and omit the `dllimport` attribute on the extern item in LLVM IR. Fixes #81408
2022-11-06	Use aapcs for efiapi calling convention on arm	Nicholas Bishop	-1/+1
	On arm, llvm treats the C calling convention as `aapcs` on soft-float targets and `aapcs-vfp` on hard-float targets [1]. UEFI specifies in the arm calling convention that floating point extensions aren't used [2], so always translate `efiapi` to `aapcs` on arm. [1]: https://github.com/rust-lang/compiler-builtins/issues/116#issuecomment-261057422 [2]: https://uefi.org/specs/UEFI/2.10/02_Overview.html#detailed-calling-convention https://github.com/rust-lang/rust/issues/65815
2022-11-05	Bless more tests	Michael Goulet	-11/+5

2022-11-04	LLVM 16: Switch to using MemoryEffects	Tim Neumann	-2/+4

2022-11-03	Fix Access Violation when using lld & ThinLTO on windows-msvc	Wesley Wiser	-1/+1
	Users report an AV at runtime of the compiled binary when using lld and ThinLTO on windows-msvc. The AV occurs when accessing a static value which is defined in one crate but used in another. Based on the disassembly of the cross-crate use, it appears that the use is not correctly linked with the definition and is instead assigned a garbage pointer value. If we look at the symbol tables for each crates' obj file, we can see what is happening: lib.obj: ``` COFF SYMBOL TABLE ... 00E 00000000 SECT2 notype External \| _ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` bin.obj: ``` COFF SYMBOL TABLE ... 010 00000000 UNDEF notype External \| __imp__ZN10reproducer7memrchr2FN17h612b61ca0e168901E ... ``` The use of the symbol has the "import" style symbol name but the declaration doesn't generate any symbol with the same name. As a result, linking the files generates a warning from lld: > rust-lld: warning: bin.obj: locally defined symbol imported: reproducer::memrchr::FN::h612b61ca0e168901 (defined in lib.obj) [LNK4217] and the symbol reference remains undefined at runtime leading to the AV. To fix this, we just need to detect that we are performing ThinLTO (and thus, static linking) and omit the `dllimport` attribute on the extern item in LLVM IR.
2022-11-03	Add test case	Wesley Wiser	-0/+28

2022-10-31	Specialize PartialEq for Option<num::NonZero*> and Option<ptr::NonNull>	clubby789	-0/+34

2022-10-31	Remove bounds check with enum cast	ouz-a	-8/+4

2022-10-31	Use `br` instead of `switch` in more cases.	Nicholas Nethercote	-29/+83
	`codegen_switchint_terminator` already uses `br` instead of `switch` when there is one normal target plus the `otherwise` target. But there's another common case with two normal targets and an `otherwise` target that points to an empty unreachable BB. This comes up a lot when switching on the tags of enums that use niches. The pattern looks like this: ``` bb1: ; preds = %bb6 %3 = load i8, ptr %_2, align 1, !range !9, !noundef !4 %4 = sub i8 %3, 2 %5 = icmp eq i8 %4, 0 %_6 = select i1 %5, i64 0, i64 1 switch i64 %_6, label %bb3 [ i64 0, label %bb4 i64 1, label %bb2 ] bb3: ; preds = %bb1 unreachable ``` This commit adds code to convert the `switch` to a `br`: ``` bb1: ; preds = %bb6 %3 = load i8, ptr %_2, align 1, !range !9, !noundef !4 %4 = sub i8 %3, 2 %5 = icmp eq i8 %4, 0 %_6 = select i1 %5, i64 0, i64 1 %6 = icmp eq i64 %_6, 0 br i1 %6, label %bb4, label %bb2 bb3: ; No predecessors! unreachable ``` This has a surprisingly large effect on compile times, with reductions of 5% on debug builds of some crates. The reduction is all due to LLVM taking less time. Maybe LLVM is just much better at handling `br` than `switch`. The resulting code is still suboptimal. - The `icmp`, `select`, `icmp` sequence is silly, converting an `i1` to an `i64` and back to an `i1`. But with the current code structure it's hard to avoid, and LLVM will easily clean it up, in opt builds at least. - `bb3` is usually now truly dead code (though not always, so it can't be removed universally).
2022-10-30	Auto merge of #103299 - nikic:usub-overflow, r=wesleywiser	bors	-0/+16
	Don't use usub.with.overflow intrinsic The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential. The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR. Fixes #103285.
2022-10-28	Auto merge of #103071 - wesleywiser:fix_inlined_line_numbers, r=davidtwco	bors	-0/+25
	Fix line numbers for MIR inlined code `should_collapse_debuginfo` detects if the specified span is part of a macro expansion however it does this by checking if the span is anything other than a normal (non-expanded) kind, then the span sequence is walked backwards to the root span. This doesn't work when the MIR inliner inlines code as it creates spans with expansion information set to `ExprKind::Inlined` and results in the line number being attributed to the inline callsite rather than the normal line number of the inlined code. Fixes #103068
2022-10-23	`unchecked_{shl\|shr}` should use `u32` as the RHS	Scott McMurray	-0/+65

2022-10-21	Introduce deduced parameter attributes, and use them for deducing `readonly` on	Patrick Walton	-1/+61
	indirect immutable freeze by-value function parameters. Right now, `rustc` only examines function signatures and the platform ABI when determining the LLVM attributes to apply to parameters. This results in missed optimizations, because there are some attributes that can be determined via analysis of the MIR making up the function body. In particular, `readonly` could be applied to most indirectly-passed by-value function arguments (specifically, those that are freeze and are observed not to be mutated), but it currently is not. This patch introduces the machinery that allows `rustc` to determine those attributes. It consists of a query, `deduced_param_attrs`, that, when evaluated, analyzes the MIR of the function to determine supplementary attributes. The results of this query for each function are written into the crate metadata so that the deduced parameter attributes can be applied to cross-crate functions. In this patch, we simply check the parameter for mutations to determine whether the `readonly` attribute should be applied to parameters that are indirect immutable freeze by-value. More attributes could conceivably be deduced in the future: `nocapture` and `noalias` come to mind. Adding `readonly` to indirect function parameters where applicable enables some potential optimizations in LLVM that are discussed in [issue 103103] and [PR 103070] around avoiding stack-to-stack memory copies that appear in functions like `core::fmt::Write::write_fmt` and `core::panicking::assert_failed`. These functions pass a large structure unchanged by value to a subfunction that also doesn't mutate it. Since the structure in this case is passed as an indirect parameter, it's a pointer from LLVM's perspective. As a result, the intermediate copy of the structure that our codegen emits could be optimized away by LLVM's MemCpyOptimizer if it knew that the pointer is `readonly nocapture noalias` in both the caller and callee. We already pass `nocapture noalias`, but we're missing `readonly`, as we can't determine whether a by-value parameter is mutated by examining the signature in Rust. I didn't have much success with having LLVM infer the `readonly` attribute, even with fat LTO; it seems that deducing it at the MIR level is necessary. No large benefits should be expected from this optimization now; LLVM needs some changes (discussed in [PR 103070]) to more aggressively use the `noalias nocapture readonly` combination in its alias analysis. I have some LLVM patches for these optimizations and have had them looked over. With all the patches applied locally, I enabled LLVM to remove all the `memcpy`s from the following code: ```rust fn main() { println!("Hello {}", 3); } ``` which is a significant codegen improvement over the status quo. I expect that if this optimization kicks in in multiple places even for such a simple program, then it will apply to Rust code all over the place. [issue 103103]: https://github.com/rust-lang/rust/issues/103103 [PR 103070]: https://github.com/rust-lang/rust/pull/103070
2022-10-20	Don't use usub.with.overflow intrinsic	Nikita Popov	-0/+16
	The canonical form of a usub.with.overflow check in LLVM are separate sub + icmp instructions, rather than a usub.with.overflow intrinsic. Using usub.with.overflow will generally result in worse optimization potential. The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR. Fixes #103285.