rust - https://github.com/rust-lang/rust

Age	Commit message (Collapse)	Author	Lines
2025-01-26	Put all coretests in a separate crate	bjorn3	-2816/+0

2024-12-27	Fix typos	chloefeal	-1/+1
	Signed-off-by: chloefeal <188809157+chloefeal@users.noreply.github.com>
2024-12-22	Auto merge of #130733 - okaneco:is_ascii, r=scottmcm	bors	-3/+44
	Optimize `is_ascii` for `str` and `[u8]` further
	Replace the existing optimized function with one that enables auto-vectorization. This is especially beneficial on x86-64 as `pmovmskb` can be emitted with careful structuring of the code. The instruction can detect non-ASCII characters one vector register width at a time instead of the current `usize` at a time check. The resulting implementation is completely safe.
	`case00_libcore` is the current implementation, `case04_while_loop` is this PR.
	```
	benchmarks:
	ascii::is_ascii_slice::long::case00_libcore 22.25/iter +/- 1.09
	ascii::is_ascii_slice::long::case04_while_loop 6.78/iter +/- 0.92
	ascii::is_ascii_slice::medium::case00_libcore 2.81/iter +/- 0.39
	ascii::is_ascii_slice::medium::case04_while_loop 1.56/iter +/- 0.78
	ascii::is_ascii_slice::short::case00_libcore 5.55/iter +/- 0.85
	ascii::is_ascii_slice::short::case04_while_loop 3.75/iter +/- 0.22
	ascii::is_ascii_slice::unaligned_both_long::case00_libcore 26.59/iter +/- 0.66
	ascii::is_ascii_slice::unaligned_both_long::case04_while_loop 5.78/iter +/- 0.16
	ascii::is_ascii_slice::unaligned_both_medium::case00_libcore 2.97/iter +/- 0.32
	ascii::is_ascii_slice::unaligned_both_medium::case04_while_loop 2.41/iter +/- 0.10
	ascii::is_ascii_slice::unaligned_head_long::case00_libcore 23.71/iter +/- 0.79
	ascii::is_ascii_slice::unaligned_head_long::case04_while_loop 7.83/iter +/- 1.31
	ascii::is_ascii_slice::unaligned_head_medium::case00_libcore 3.69/iter +/- 0.54
	ascii::is_ascii_slice::unaligned_head_medium::case04_while_loop 7.05/iter +/- 0.32
	ascii::is_ascii_slice::unaligned_tail_long::case00_libcore 24.44/iter +/- 1.41
	ascii::is_ascii_slice::unaligned_tail_long::case04_while_loop 5.12/iter +/- 0.18
	ascii::is_ascii_slice::unaligned_tail_medium::case00_libcore 3.24/iter +/- 0.40
	ascii::is_ascii_slice::unaligned_tail_medium::case04_while_loop 2.86/iter +/- 0.14
	```
	`unaligned_head_medium` is the main regression in the benchmarks. It is a 32 byte string being sliced `bytes[1..]`.
	The first commit can be used to run the benchmarks against the current core implementation. The previous implementation was done in #74066.
	---
	Two potential drawbacks of this implementation are that it increases instruction count and may regress other platforms/architectures. The benches here may also be too artificial to glean much insight from.
	https://rust.godbolt.org/z/G9znGfY36
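The log does not show the merged code, so the following is only a rough sketch of the general idea the message describes: scan fixed-size chunks with a branch-free inner loop so the compiler has a chance to vectorize the high-bit test. The name `is_ascii_chunked` and the chunk width `CHUNK` are assumptions made for illustration, not names taken from `core`.

```rust
/// Hypothetical chunk width; the width used in `core` is not stated in the log.
const CHUNK: usize = 32;

/// Returns `true` if every byte has its high bit clear (i.e. is ASCII).
pub fn is_ascii_chunked(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // OR all bytes of the chunk together. A branch-free loop over a
        // fixed-size chunk is the kind of code a compiler may vectorize.
        let mut acc = 0u8;
        for &b in chunk {
            acc |= b;
        }
        // A set high bit in the accumulator means some byte was non-ASCII.
        if acc & 0x80 != 0 {
            return false;
        }
    }
    // Handle the tail that did not fill a whole chunk.
    chunks.remainder().iter().all(|&b| b & 0x80 == 0)
}
```

Whether this shape actually produces `pmovmskb` depends on the target and compiler version, so any speedup would need to be measured rather than assumed.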
2024-11-14	Auto merge of #122770 - iximeow:ixi/int-formatting-optimization, r=workingjubilee	bors	-0/+14
	improve codegen of fmt_num to delete unreachable panic
	it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`.
	adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.
	in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper: <img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef">
	where after this change the function is broadly similar, but smaller, with one fewer register updated in each pass through the loop in addition to the never-taken `cmp/ja` being gone: <img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa">
	this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.
	the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`.
	there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after.
	---
	i initially measured slightly better outcomes using `unreachable_unchecked()` here instead, but that was hacking on std and rebuilding with `-Z build-std` on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.
	<details> <summary>initial notes about that, seemingly irrelevant on modern rustc</summary>
	i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like
	```diff
	             if x == zero {
	                 // No more digits left to accumulate.
	                 break;
	             };
	         }
	     }
	+
	+    if curr >= buf.len() {
	+        unsafe { core::hint::unreachable_unchecked(); }
	+    }
	     let buf = &buf[curr..];
	```
	posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like
	```
	put_char(x & 0xf);
	let mut len = 1;
	if x > 0xf {
	    put_char((x >> 4) & 0xf);
	    len = 2;
	}
	pad_integral(buf[buf.len() - len..]);
	```
	it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to a 6% difference in my use case.
	</details>
	---
	i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_.
(i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
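As an illustration of the invariant this message relies on, here is a small sketch (not the code in `core`) of a digit loop that fills a buffer from the end. The loop body runs before the exit test, so `curr` ends up strictly below `buf.len()` even for zero. The function name `lower_hex` and the fixed 8-byte buffer are assumptions chosen for the example.

```rust
/// Formats `x` as lowercase hex into `buf`, filling from the end,
/// and returns the used tail as a string slice.
pub fn lower_hex(mut x: u32, buf: &mut [u8; 8]) -> &str {
    let mut curr = buf.len();
    // The body runs before the exit test, so `curr` is decremented at
    // least once, even when `x == 0`. A `u32` has at most 8 hex digits,
    // so `curr` never goes below zero.
    loop {
        curr -= 1;
        let d = (x & 0xf) as u8;
        buf[curr] = if d < 10 { b'0' + d } else { b'a' + (d - 10) };
        x >>= 4;
        if x == 0 {
            break;
        }
    }
    // Only ASCII digits were written, so this slice is valid UTF-8.
    std::str::from_utf8(&buf[curr..]).expect("hex digits are ASCII")
}
```

For example, `lower_hex(255, &mut [0u8; 8])` yields `"ff"`. Whether a compiler removes the bounds check on `&buf[curr..]` for a loop written this way is not guaranteed and would have to be checked in the generated assembly.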
2024-11-06	Add `is_ascii` function optimized for x86-64 for [u8]	okaneco	-9/+11
	The new `is_ascii` function is optimized to use the `pmovmskb` vector instruction, which tests the high bit in a lane. This corresponds to the same check of whether a byte is ASCII, so ASCII validity checking can be vectorized. This instruction does not exist on other platforms, where the new function is likely to regress performance, so it is gated to `all(target_arch = "x86_64", target_feature = "sse2")`.
	Add codegen test.
	Remove `crate::mem` import for functions included in the prelude.
2024-11-05	Add new implementation benchmark	okaneco	-3/+42
	Add LONG benchmarks for more comparison between the methods
2024-10-08	Stabilize `isqrt` feature	Chai T. Rex	-1/+0

2024-09-22	Reformat using the new identifier sorting from rustfmt	Michael Goulet	-23/+23

2024-08-28	Improve `isqrt` tests and add benchmarks	Chai T. Rex	-0/+64
	* Choose test inputs more thoroughly and systematically.
	* Check that `isqrt` and `checked_isqrt` have equivalent results for signed types, either equivalent numerically or equivalent as a panic and a `None`.
	* Check that `isqrt` has numerically-equivalent results for unsigned types and their `NonZero` counterparts.
	* Reuse `ilog10` benchmarks, plus benchmarks that use a uniform distribution.
2024-07-29	Reformat `use` declarations.	Nicholas Nethercote	-16/+26
	The previous commit updated `rustfmt.toml` appropriately. This commit is the outcome of running `x fmt --all` with the new formatting options.
2024-05-20	Write `char::DebugEscape` sequences using `write_str`	Arpad Borsos	-2/+2
	Instead of writing each `char` of an escape sequence one by one, this delegates to `Display`, which uses `write_str` internally in order to write the whole escape sequence at once.
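A minimal sketch of the difference, using a hypothetical `Escape` type in place of the real escape iterator: the char-by-char variant calls the writer once per character, while the delegating variant goes through `Display` and reaches the writer with a single `write_str`. The type and both function names are illustrative assumptions.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for an escape sequence such as `\n` or `\u{7f}`.
pub struct Escape(pub &'static str);

impl fmt::Display for Escape {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One call for the whole sequence.
        f.write_str(self.0)
    }
}

/// Char-by-char variant, shown for comparison: one `write_char` per character.
pub fn write_each_char(out: &mut impl Write, esc: &Escape) -> fmt::Result {
    for c in esc.0.chars() {
        out.write_char(c)?;
    }
    Ok(())
}

/// Delegating variant: formats through `Display`, which calls `write_str` once.
pub fn write_whole(out: &mut impl Write, esc: &Escape) -> fmt::Result {
    write!(out, "{esc}")
}
```

Both variants produce the same output; the second simply makes fewer calls into the underlying writer.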
2024-05-01	Add benchmarks for `impl Debug for str`	Arpad Borsos	-0/+80
	In order to inform future perf improvements and prevent regressions, let's add some benchmarks that stress `impl Debug for str`.
2024-04-22	Rollup merge of #115913 - FedericoStra:checked_ilog, r=the8472	Guillaume Gomez	-6/+73
	checked_ilog: improve performance Addresses #115874. (This PR replicates the original #115875, which I accidentally closed by deleting my forked repository...)
2024-04-07	disable benches in Miri	Ralf Jung	-0/+2

2024-03-23	try adding a test that LowerHex and friends don't panic, but it doesn't work	iximeow	-0/+14

2024-03-04	Add benches for `net` parsing	okaneco	-0/+80
	Add benches for IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr, SocketAddrV4, and SocketAddrV6 parsing
2024-01-21	Auto merge of #85528 - the8472:iter-markers, r=dtolnay	bors	-0/+13
	Implement iterator specialization traits on more adapters
	This adds
	* `TrustedLen` to `Skip` and `StepBy`
	* `TrustedRandomAccess` to `Skip`
	* `InPlaceIterable` and `SourceIter` to `Copied` and `Cloned`
	The first two might improve performance in the compiler itself since `skip` is used in several places. Constellations that would exercise the last point are probably rare since it would require an owning iterator that has references as Items somewhere in its iterator pipeline.
	Improvements for `Skip`:
	```
	# old
	test iter::bench_skip_trusted_random_access ... bench: 8,335 ns/iter (+/- 90)
	# new
	test iter::bench_skip_trusted_random_access ... bench: 2,753 ns/iter (+/- 27)
	```
2024-01-20	Rollup merge of #113142 - the8472:opt-cstr-display, r=Mark-Simulacrum	Matthias Krüger	-0/+28
	optimize EscapeAscii's Display and CStr's Debug
	```
	old:
	ascii::bench_ascii_escape_display_mixed 17.97µs/iter +/- 204.00ns
	ascii::bench_ascii_escape_display_no_escape 545.00ns/iter +/- 6.00ns
	new:
	ascii::bench_ascii_escape_display_mixed 4.99µs/iter +/- 56.00ns
	ascii::bench_ascii_escape_display_no_escape 91.00ns/iter +/- 1.00ns
	```
2024-01-11	Reduced amount of int_pow benches	Nicholas Thompson	-730/+43
	Also simplified the macros
2024-01-11	Edited int_pow micro-benchmarks	Nicholas Thompson	-103/+339

2024-01-11	Added int_pow micro-benchmarks	Nicholas Thompson	-0/+551

2024-01-10	bench trustedrandomaccess specialization in zip	The8472	-0/+13

2024-01-02	Adjust library tests for unused_tuple_struct_fields -> dead_code	Jake Goulding	-2/+2

2023-12-10	remove redundant imports	surechen	-3/+0
	Detects redundant imports that can be eliminated. For #117772: to facilitate review and modification, the checking code and the removal of redundant imports are split into two PRs.
2023-11-27	benchmarks for Chars::advance_by	The 8472	-0/+19

2023-09-22	checked_ilog: add benchmarks	Federico Stra	-6/+73

2023-07-23	fix	Deadbeef	-0/+1

2023-06-29	optimize Cstr/EscapeAscii display	The 8472	-0/+28
	old:
	ascii::bench_ascii_escape_display_mixed 17.97µs/iter +/- 204.00ns
	ascii::bench_ascii_escape_display_no_escape 545.00ns/iter +/- 6.00ns
	new:
	ascii::bench_ascii_escape_display_mixed 4.99µs/iter +/- 56.00ns
	ascii::bench_ascii_escape_display_no_escape 91.00ns/iter +/- 1.00ns
2023-06-23	Specialize StepBy<Range<{integer}>>	The 8472	-0/+52
	For ranges < usize we determine the number of items StepBy would yield and then store that in the range.end instead of the actual end. This significantly simplifies calculation of the loop induction variable, especially in cases where StepBy::step (a usize) could overflow the Range's item type.
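To make the "number of items StepBy would yield" concrete, here is a small sketch of that count for a `u8` range. It is a hypothetical helper written for illustration, not the specialization in `core`; the name `step_by_len` is an assumption.

```rust
/// Number of items `(start..end).step_by(step)` yields.
/// Hypothetical helper for illustration, not the code in `core`.
pub fn step_by_len(start: u8, end: u8, step: usize) -> usize {
    assert!(step != 0, "step_by panics on a zero step");
    if start >= end {
        return 0;
    }
    // Widen before dividing so a step larger than `u8::MAX` is harmless.
    let span = (end - start) as usize;
    // Ceiling division: the first item is always yielded, then one more
    // for every full step that still fits below `end`.
    (span - 1) / step + 1
}
```

Because the count is computed in `usize`, a step such as 300 over `0u8..255` simply gives one item instead of overflowing the item type, which is the case the message singles out.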
2023-06-12	add benchmark	The 8472	-0/+9

2023-05-20	optimize next_chunk impls for Filter and FilterMap	The 8472	-2/+44

2023-05-15	Rollup merge of #108291 - chenyukang:yukang/fix-benchmarks, r=workingjubilee	Matthias Krüger	-30/+30
	Fix more benchmark test with black_box Follow up fix for https://github.com/rust-lang/rust/issues/107590
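For context, a minimal sketch of what wrapping a benchmark body in `black_box` looks like, written against stable `std` rather than the in-tree bench harness. The workload `sum_to` and the timing helper `time_sum` are hypothetical names for the example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload: sum of the integers below `n`.
pub fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

/// Times `iters` calls of `sum_to`, hiding input and output from the optimizer.
pub fn time_sum(n: u64, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` on the input discourages constant folding; on the
        // output it discourages removing the call as dead code.
        black_box(sum_to(black_box(n)));
    }
    start.elapsed()
}
```

`black_box` is documented as a best-effort hint, so it reduces the risk of a benchmark being optimized away rather than ruling it out.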
2023-04-25	Add shortcut for Grisu3 algorithm.	mazong1123	-0/+27
	Check the requested digit length and the fractional or integral parts of the number. Fall back early, without trying the Grisu algorithm, if the specific condition is met. Fix #110129
2023-03-05	Auto merge of #108157 - scottmcm:tuple-gt-via-partialcmp, r=dtolnay	bors	-0/+23
	Use `partial_cmp` to implement tuple `lt`/`le`/`ge`/`gt` In today's implementation, `(A, B)::gt` contains calls to both `A::eq` and `A::gt`. That's fine for primitives, but for things like `String`s it's kinda weird -- `(String, usize)::gt` has a call to both `bcmp` and `memcmp` (<https://rust.godbolt.org/z/7jbbPMesf>) because when `bcmp` says the `String`s aren't equal, it turns around and calls `memcmp` to find out which one's bigger. This PR changes the implementation to instead implement `(A, …, C, Z)::gt` using `A::partial_cmp`, `…::partial_cmp`, `C::partial_cmp`, and `Z::gt`. (And analogously for `lt`, `le`, and `ge`.) That way expensive comparisons don't need to be repeated. Technically this is an observable change on stable, so I've marked it `needs-fcp` + `T-libs-api` and will r? rust-lang/libs-api I'm hoping that this will be non-controversial, however, since it's very similar to the observable changes that were made to the derives (#81384 #98655) -- like those, this only changes behaviour if a type overrode behaviour in a way inconsistent with the rules for the various traits involved. (The first commit here is #108156, adding the codegen test, which I used to make sure this doesn't regress behaviour for primitives.) Zulip conversation about this change: <https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.60.3E.60.20on.20Tuples/near/328392927>.
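The approach described above can be sketched for a two-element tuple as follows. This is an illustrative free function under the assumed name `pair_gt`, not the macro-generated impl in `core`.

```rust
use std::cmp::Ordering;

/// `lhs > rhs` for pairs, comparing the first field once via `partial_cmp`.
pub fn pair_gt<A: PartialOrd, B: PartialOrd>(lhs: &(A, B), rhs: &(A, B)) -> bool {
    match lhs.0.partial_cmp(&rhs.0) {
        // First fields are equal: the last field decides, using `gt` directly.
        Some(Ordering::Equal) => lhs.1 > rhs.1,
        // Otherwise the first field decides; `None` (unordered) is not greater.
        ordering => ordering == Some(Ordering::Greater),
    }
}
```

With `String` as the first field, this performs one three-way comparison instead of an equality test followed by a second ordering comparison.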
2023-02-21	fix more benchmark test with black_box	yukang	-30/+30

2023-02-17	Add a slightly-contrived tuple comparison benchmark	Scott McMurray	-0/+23

2023-02-14	Shrink size of array benchmarks	kadmin	-5/+5

2023-02-11	Add array::map benchmarks	kadmin	-0/+20

2023-02-03	fix #107590, Fix benchmarks in library/core with black_box	yukang	-32/+44

2023-01-04	Update rand in the stdlib tests, and remove the getrandom feature from it	Thom Chiovoloni	-2/+2

2022-11-09	Rollup merge of #103570 - lukas-code:stabilize-ilog, r=scottmcm	Dylan DPC	-1/+0
	Stabilize integer logarithms Stabilizes feature `int_log`. I've also made the functions const stable, because they don't depend on any unstable const features. `rustc_allow_const_fn_unstable` is just there for `Option::expect`, which could be replaced with a `match` and `panic!`. cc ``@rust-lang/wg-const-eval`` closes https://github.com/rust-lang/rust/issues/70887 (tracking issue) ~~blocked on FCP finishing: https://github.com/rust-lang/rust/issues/70887#issuecomment-1289028216~~ FCP finished: https://github.com/rust-lang/rust/issues/70887#issuecomment-1302121266
2022-11-07	add benchmark for iter::ArrayChunks::fold specialization	The 8472	-2/+22
	This also updates the existing iter::Copied::next_chunk benchmark so that the thing it benches doesn't get masked by the ArrayChunks specialization
2022-10-26	stabilize `int_log`	Lukas Markeffsky	-1/+0

2022-10-17	add a benchmark for slice_iter.copied().array_chunks()	The 8472	-0/+21

2022-08-21	Use internal iteration in `Iterator::{cmp_by, partial_cmp_by, eq_by}`	Tim Vermeulen	-0/+7

2022-08-09	Rename integer log* methods to ilog*	Eric Holk	-3/+3
	This reflects the consensus from the libs team as reported at https://github.com/rust-lang/rust/issues/70887#issuecomment-1209513261 Co-authored-by: Yosh Wuyts <github@yosh.is>
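As a brief usage sketch of the renamed methods (assuming a toolchain where `ilog`, `ilog2`, `ilog10` and their `checked_` variants are stable), a decimal digit count can be written with `checked_ilog10`. The helper name `decimal_digits` is an assumption for the example.

```rust
/// Number of decimal digits of `n`, via `checked_ilog10`.
pub fn decimal_digits(n: u32) -> u32 {
    // `checked_ilog10` returns `None` for zero, where the logarithm is
    // undefined; zero is still printed with one digit.
    n.checked_ilog10().map_or(1, |log| log + 1)
}
```

The unchecked forms, such as `8u32.ilog2()` or `1000u32.ilog(10)`, panic on zero instead of returning `None`.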
2022-05-31	Add unicode fast path to `is_printable`	Nilstrieb	-0/+11
	Before, it would enter the full expensive check even for normal ascii characters. Now, it skips the check for the ascii characters in `32..127`. This range was checked manually from the current behavior.
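The shape of that fast path can be sketched like this. The slow check here is a deliberately simplified, hypothetical stand-in (`slow_is_printable`), since the real table-driven check is not shown in the log.

```rust
/// Hypothetical stand-in for the expensive table-driven check.
/// The real check in `core` is more involved than this.
fn slow_is_printable(c: char) -> bool {
    !c.is_control()
}

pub fn is_printable(c: char) -> bool {
    // Fast path: `32..127` covers space through `~`, all printable ASCII,
    // so the expensive check can be skipped.
    if (32..127).contains(&(c as u32)) {
        return true;
    }
    slow_is_printable(c)
}
```

The range is half-open, so `0x7f` (DEL) is excluded from the fast path and still goes through the full check.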
2022-05-05	Auto merge of #96626 - thomcc:rand-bump, r=m-ou-se	bors	-2/+10
	Avoid using `rand::thread_rng` in the stdlib benchmarks. This is kind of an anti-pattern because it introduces extra nondeterminism for no real reason. In thread_rng's case this comes both from the random seed and also from the reseeding operations it does, which occasionally do syscalls (adding further nondeterminism). The impact of this would be pretty small in most cases, but it's a good practice to avoid (particularly because avoiding it was not hard). Several of our benchmarks already did the right thing here, so the change was pretty easy and mostly just applying it more universally. That said, the stdlib benchmarks aren't particularly stable (nor is our benchmark framework particularly great), so arguably this doesn't matter that much in practice. ~~Anyway, this also bumps the `rand` dev-dependency to 0.8, since it had fallen somewhat out of date.~~ Never mind, too much of a headache.
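A dependency-free sketch of the alternative: generate benchmark input from a fixed seed so every run sees identical data. The tiny xorshift generator below is a hypothetical stand-in for a seeded RNG from the `rand` crate; it is not what the stdlib benchmarks use, and the names `XorShift64` and `bench_input` are assumptions.

```rust
/// Minimal xorshift64 generator; a hypothetical stand-in for a seeded RNG,
/// used here only to keep the sketch dependency-free.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed nonzero value.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds benchmark input that is identical on every run for a given seed.
pub fn bench_input(seed: u64, len: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..len).map(|_| rng.next_u64()).collect()
}
```

Calling `bench_input(42, 1024)` twice gives the same vector, and no reseeding or syscalls happen while the benchmark runs.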
2022-05-02	add benchmark	The 8472	-0/+25

2022-05-02	Avoid use of `rand::thread_rng` in stdlib benchmarks	Thom Chiovoloni	-2/+10