Short-circuiting internal iteration with Iterator::try_fold & try_rfold
These are the core methods in terms of which the other methods (`fold`, `all`, `any`, `find`, `position`, `nth`, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).
Based on the `Try` trait, so it works with both `Result` and `Option` (:tada: https://github.com/rust-lang/rust/pull/42526). The `try_fold` rustdoc examples use `Option` and the `try_rfold` ones use `Result`.
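As a minimal sketch (assumed shape, not the actual libcore source), here is how a consumer like `all` falls out of `try_fold` when `Option` plays the `Try` role:
```rust
fn all_via_try_fold<I: Iterator>(
    iter: &mut I,
    mut f: impl FnMut(I::Item) -> bool,
) -> bool {
    // The first `None` short-circuits the fold immediately.
    iter.try_fold((), |(), x| if f(x) { Some(()) } else { None })
        .is_some()
}

fn main() {
    assert!(all_via_try_fold(&mut (1..5), |x| x > 0));
    assert!(!all_via_try_fold(&mut (1..5), |x| x < 3));
}
```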
This continues in the vein of PRs https://github.com/rust-lang/rust/pull/44682 and https://github.com/rust-lang/rust/pull/44856, extending internal iteration to more of `Iterator`.
New bench following the pattern from the latter of those:
```
test iter::bench_take_while_chain_ref_sum ... bench: 1,130,843 ns/iter (+/- 25,110)
test iter::bench_take_while_chain_sum ... bench: 362,530 ns/iter (+/- 391)
```
I also ran the benches without the `fold` & `rfold` overrides to test their new default impls, with basically no change. I left them there, though, to take advantage of existing overrides and because `AlwaysOk` has some sub-optimality due to https://github.com/rust-lang/rust/issues/43278 (which 45225 should fix).
If you're wondering why there are three type parameters, see issue https://github.com/rust-lang/rust/issues/45462
Thanks to @bluss for the [original IRLO thread](https://internals.rust-lang.org/t/pre-rfc-fold-ok-is-composable-internal-iteration/4434) and the rfold PR, and to @cuviper for adding so many folds, [encouraging me](https://github.com/rust-lang/rust/pull/45379#issuecomment-339424670) to make this PR, and finding a catastrophic bug in pre-review.
|
|
Improve SliceExt::binary_search performance
Improve the performance of `binary_search` by reducing the number of unpredictable conditional branches in the loop. In addition, improve the benchmarks to test performance with the data in the L1, L2, and L3 caches, on sorted arrays with and without duplicates.
Before:
```
test slice::binary_search_l1 ... bench: 48 ns/iter (+/- 1)
test slice::binary_search_l2 ... bench: 63 ns/iter (+/- 0)
test slice::binary_search_l3 ... bench: 152 ns/iter (+/- 12)
test slice::binary_search_l1_with_dups ... bench: 36 ns/iter (+/- 0)
test slice::binary_search_l2_with_dups ... bench: 64 ns/iter (+/- 1)
test slice::binary_search_l3_with_dups ... bench: 153 ns/iter (+/- 6)
```
After:
```
test slice::binary_search_l1 ... bench: 15 ns/iter (+/- 0)
test slice::binary_search_l2 ... bench: 23 ns/iter (+/- 0)
test slice::binary_search_l3 ... bench: 100 ns/iter (+/- 17)
test slice::binary_search_l1_with_dups ... bench: 15 ns/iter (+/- 0)
test slice::binary_search_l2_with_dups ... bench: 23 ns/iter (+/- 0)
test slice::binary_search_l3_with_dups ... bench: 98 ns/iter (+/- 14)
```
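The gist of the branch reduction, as a hedged sketch specialized to `u32` (not the exact libcore code): keep a `base` and a `size` and shrink the range with a comparison that LLVM can lower to a conditional move instead of an unpredictable taken/not-taken branch:
```rust
fn binary_search_branchless(s: &[u32], x: u32) -> Result<usize, usize> {
    if s.is_empty() {
        return Err(0);
    }
    let mut base = 0usize;
    let mut size = s.len();
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // A select (cmov), not a data-dependent branch.
        base = if s[mid] <= x { mid } else { base };
        size -= half;
    }
    match s[base].cmp(&x) {
        std::cmp::Ordering::Equal => Ok(base),
        std::cmp::Ordering::Less => Err(base + 1),
        std::cmp::Ordering::Greater => Err(base),
    }
}
```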
|
|
These functions were deprecated and removed in 1.5, but such simple
functionality shouldn't require using unsafe code, and it isn't
cluttering libstd too much.
|
|
remove FIXME(#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(#19649) now that stability markers render.
remove FIXME(#13642) now that the benchmarks were moved.
remove FIXME(#6220) now that floating points can be formatted.
remove FIXME(#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove references to irrelevant issues in FIXME(#1697, #2178...)
update FIXME(#5516) to point to getopts issue 7
update FIXME(#7771) to point to RFC 628
update FIXME(#19839) to point to issue 26925
|
|
Use `RangeInclusive { start, end }` construction; this way we suppress the warnings about `...` in expressions being deprecated until `..=` is available in the compiler.
|
|
Add ..= to the parser
Add ..= to libproc_macro
Add ..= to ICH
Highlight ..= in rustdoc
Update impl Debug for RangeInclusive to ..=
Replace `...` with `..=` in range docs
Make the dotdoteq warning point to the `...`
Add warning for `...` in expressions
Updated more tests to the ..= syntax
Updated even more tests to the ..= syntax
Updated the inclusive_range entry in unstable book
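For reference, a quick look at the syntax this series introduces, as it behaves once `..=` has landed:
```rust
fn main() {
    // `..=` in an expression: an inclusive range value.
    let v: Vec<u32> = (1..=5).collect();
    assert_eq!(v, [1, 2, 3, 4, 5]);

    // `..=` in a pattern, where `...` was the old spelling.
    match 7u32 {
        0..=9 => println!("single digit"),
        _ => println!("bigger"),
    }
}
```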
|
|
The safe version of a method from `ptr`, like `[T]::copy_from_slice`
|
|
Like #43008 (f668999), but _much more aggressive_.
|
|
This PR cuts down on a large number of `#[inline(always)]` and `#[inline]`
annotations in libcore for various core functions. The `#[inline(always)]`
annotation is almost never needed and is detrimental to debug build times as it
forces LLVM to perform inlining when it otherwise wouldn't need to in debug
builds. Additionally `#[inline]` is an unnecessary annotation on almost all
generic functions because the function will already be monomorphized into other
codegen units and otherwise rarely needs the extra "help" from us to tell LLVM
to inline something.
Overall this PR cut the compile time of a [microbenchmark][1] by 30% from 1s to
0.7s.
[1]: https://gist.github.com/alexcrichton/a7d70319a45aa60cf36a6a7bf540dd3a
|
|
Exposes the swapping logic from PR 40454 as `pub unsafe fn ptr::swap_nonoverlapping` under the `swap_nonoverlapping` feature.
This is most helpful for compound types where LLVM didn't vectorize the loop. Highlight: bench `slice::rotate_medium_by727_strings` gets 38% faster.
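Illustrative usage, assuming the signature that later stabilized as `ptr::swap_nonoverlapping(x, y, count)`:
```rust
fn main() {
    let mut buf = [1u8, 2, 3, 4];
    let p = buf.as_mut_ptr();
    // Swap two disjoint two-element regions; they must not overlap.
    unsafe { std::ptr::swap_nonoverlapping(p, p.add(2), 2) };
    assert_eq!(buf, [3, 4, 1, 2]);
}
```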
|
|
Add an in-place rotate method for slices to libcore
A helpful primitive for moving chunks of data around inside a slice.
For example, if you have a range selected and are drag-and-dropping it somewhere else (Example from [Sean Parent's talk](https://youtu.be/qH6sSOr-yk8?t=560)).
(If this should be an RFC instead of a PR, please let me know.)
Edit: changed example
|
|
Make RangeInclusive just a two-field struct
Not being an enum improves ergonomics and consistency, especially since the NonEmpty variant wasn't prevented from being empty. It can still be iterable without an extra "done" bit by making the exhausted range have `!(start <= end)`, which is even possible without changing the Step trait.
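A minimal sketch of that trick on a hypothetical stand-in type (`Incl`, not the real definition): exhaustion is encoded by making `!(start <= end)` hold, so no extra flag is needed:
```rust
struct Incl { start: u32, end: u32 }

impl Iterator for Incl {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.start <= self.end {
            let n = self.start;
            if n == self.end {
                // Exhaust the range by violating start <= end.
                self.start = 1;
                self.end = 0;
            } else {
                self.start += 1;
            }
            Some(n)
        } else {
            None
        }
    }
}
```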
Implements merged https://github.com/rust-lang/rfcs/pull/1980; tracking issue https://github.com/rust-lang/rust/issues/28237.
This is definitely a breaking change to anything consuming `RangeInclusive` directly (not as an Iterator) or constructing it without using the sugar. Is there some change that would make sense before this so compilation failures could be compatibly fixed ahead of time?
r? @aturon (as FCP proposer on the RFC)
|
|
It can be revisited later, after the `mem::swap` optimizations land.
|
|
A helpful primitive for moving chunks of data around inside a slice.
In particular, adding elements to the end of a Vec then moving them
somewhere else, as a way to do efficient multiple-insert. (There's
drain for efficient block-remove, but no easy way to block-insert.)
Talk with another example: <https://youtu.be/qH6sSOr-yk8?t=560>
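The multiple-insert pattern looks roughly like this (shown with the `rotate_right` name that eventually stabilized; the method in this PR was spelled `rotate`):
```rust
fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    // Block-insert [9, 9] at index 1: push to the end, then
    // rotate the tail so the new elements land in place.
    v.extend_from_slice(&[9, 9]);
    v[1..].rotate_right(2);
    assert_eq!(v, [1, 9, 9, 2, 3, 4, 5]);
}
```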
|
|
Not being an enum improves ergonomics, especially since the NonEmpty variant could be empty. It can still be iterable without an extra "done" bit by making the range have `!(start <= end)`, which is even possible without changing the Step trait.
Implements RFC 1980
|
|
These were found by running tidy on stable versions of Rust and finding
features stabilised with the wrong version numbers.
|
|
Since LLVM doesn't vectorize the loop for us, do unaligned reads
of a larger type and use LLVM's bswap intrinsic to do the
reversing of the actual bytes. cfg!-restricted to x86 and
x86_64, as I assume it wouldn't help on things like ARMv5.
Also makes [u16]::reverse() a more modest 1.5x faster by
loading/storing u32 and swapping the u16s with ROT16.
Thank you ptr::*_unaligned for making this easy :)
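A hedged sketch of the core step (hypothetical helper, not the actual code): read a word from each end unaligned, byte-swap both, and store them back exchanged:
```rust
/// Swap the usize-sized chunks at byte offsets `i` and `j` of `ptr`,
/// reversing the bytes inside each chunk with a single bswap.
/// Caller must ensure both chunks are in bounds and don't overlap.
unsafe fn swap_reversed_chunks(ptr: *mut u8, i: usize, j: usize) {
    let a = std::ptr::read_unaligned(ptr.add(i) as *const usize);
    let b = std::ptr::read_unaligned(ptr.add(j) as *const usize);
    std::ptr::write_unaligned(ptr.add(i) as *mut usize, b.swap_bytes());
    std::ptr::write_unaligned(ptr.add(j) as *mut usize, a.swap_bytes());
}
```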
|
|
Just like `find` in the forward case, implement `rfind` explicitly.
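For example, with the form that stabilized:
```rust
fn main() {
    let a = [1, 2, 3, 2];
    // Searches from the back, so it sees the `2` at index 3 first.
    assert_eq!(a.iter().rfind(|&&x| x == 2), Some(&2));
    assert_eq!(a.iter().rfind(|&&x| x == 9), None);
}
```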
|
|
`[T]::rsplit()` and `rsplit_mut()`, #41020
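Sketch of the usage as it later stabilized: like `split`, but yielding subslices back to front:
```rust
fn main() {
    let v = [1, 0, 2, 3, 0, 4];
    let mut it = v.rsplit(|&x| x == 0);
    assert_eq!(it.next(), Some(&[4][..]));
    assert_eq!(it.next(), Some(&[2, 3][..]));
    assert_eq!(it.next(), Some(&[1][..]));
    assert_eq!(it.next(), None);
}
```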
|
|
Add ptr::offset_to
This PR adds a method to calculate the signed distance (in number of elements) between two pointers. The resulting value can then be passed to `offset` to get one pointer from the other. This is similar to pointer subtraction in C/C++.
There are 2 special cases:
- If the distance is not a multiple of the element size, then the result is rounded towards zero. (In C/C++ this is UB.)
- ZSTs return `None`, while normal types return `Some(isize)`. This forces the user to handle the ZST case in unsafe code. (C/C++ doesn't have ZSTs.)
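Illustrative usage of the shape described above (`offset_to` was unstable and was later superseded by `offset_from`):
```rust
#![feature(offset_to)] // feature name assumed from the PR era

fn main() {
    let a = [10u32, 20, 30, 40];
    let first: *const u32 = &a[0];
    let third: *const u32 = &a[2];
    // Non-ZST: Some(signed element count); feed it back into `offset`.
    if let Some(d) = first.offset_to(third) {
        assert_eq!(d, 2);
        unsafe { assert_eq!(*first.offset(d), 30) };
    }
}
```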
|
|
Checked slicing for strings
cc https://github.com/rust-lang/rust/issues/39932
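The API in action, as it later stabilized on `str`:
```rust
fn main() {
    let s = "héllo";
    assert_eq!(s.get(0..3), Some("hé")); // ends on a char boundary
    assert_eq!(s.get(0..2), None);       // would split the 'é'
    assert_eq!(s.get(0..10), None);      // out of bounds
}
```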
|
|
`other` is guaranteed to be less than `2 * len`.
|
|
Select 3 random points instead of just 1.
Also, the code now compiles on 16-bit architectures.
|
|
This slightly restructures the main iteration loop so that LLVM can optimize it more efficiently.
Benchmark:
```
name                                    before ns/iter   after ns/iter    diff ns/iter  diff %
slice::sort_unstable_small_ascending    39 (2051 MB/s)   38 (2105 MB/s)   -1            -2.56%
slice::sort_unstable_small_big_random   579 (2210 MB/s)  575 (2226 MB/s)  -4            -0.69%
slice::sort_unstable_small_descending   80 (1000 MB/s)   70 (1142 MB/s)   -10           -12.50%
slice::sort_unstable_small_random       396 (202 MB/s)   386              -10           -2.53%
```
|
|
What is this magic‽