| Age | Commit message | Author | Lines |
|
Add FromStr impl for NonZero types
This is a WIP implementation because I do have some questions regarding the solution.
Somebody should ping the lang team on this I guess.
Please see the annotations on the code for more details.
Closes #58604
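As a rough illustration of what a `FromStr` impl on the `NonZero` types allows, the sketch below parses strings into `NonZeroU32`. The helper name `parse_nonzero` is hypothetical and only exists for this example; it assumes that zero and non-numeric input are both rejected as parse errors.

```rust
use std::num::NonZeroU32;

/// Hypothetical helper: parse a decimal string into a `NonZeroU32`,
/// discarding the error details.
fn parse_nonzero(s: &str) -> Option<NonZeroU32> {
    // `str::parse` dispatches to the `FromStr` impl of the target type.
    s.parse::<NonZeroU32>().ok()
}

fn main() {
    // A non-zero decimal value parses successfully.
    assert_eq!(parse_nonzero("42").map(NonZeroU32::get), Some(42));
    // Zero is not representable, so parsing fails.
    assert_eq!(parse_nonzero("0"), None);
    // Non-numeric input fails as it does for plain integers.
    assert_eq!(parse_nonzero("abc"), None);
}
```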
|
|
Make ASCII case conversions more than 4× faster
Reformatted output of `./x.py bench src/libcore --test-args ascii` below. The `libcore` benchmark calls `[u8]::make_ascii_lowercase`. `lookup` has code (effectively) identical to that before this PR, and ~~`branchless`~~ `mask_shifted_bool_match_range` after this PR.
~~See [code comments](https://github.com/rust-lang/rust/pull/59283/commits/ce933f77c865a15670855ac5941fe200752b739f#diff-01076f91a26400b2db49663d787c2576R3796) in `u8::to_ascii_uppercase` in `src/libcore/num/mod.rs` for an explanation of the branchless algorithm.~~
**Update:** the algorithm was simplified while keeping the performance. See `branchless` vs. `mask_shifted_bool_match_range` benchmarks.
Credits to @raphlinus for the idea in https://twitter.com/raphlinus/status/1107654782544736261, which extends this algorithm to “fake SIMD” on `u32` to convert four bytes at a time. The `fake_simd_u32` benchmark implements this with [`let (before, aligned, after) = bytes.align_to_mut::<u32>()`](https://doc.rust-lang.org/std/primitive.slice.html#method.align_to_mut). Note however that this is buggy when addition carries/overflows into the next byte (which does not happen if the input is known to be ASCII).
This could be fixed (to optimize `[u8]::make_ascii_lowercase` and `[u8]::make_ascii_uppercase` in `src/libcore/slice/mod.rs`) either with some more bitwise trickery that I didn’t quite figure out, or by using “real” SIMD intrinsics for byte-wise addition. I did not pursue this however because the current (incorrect) fake SIMD algorithm is only marginally faster than the one-byte-at-a-time branchless algorithm. This is because LLVM auto-vectorizes the latter, as can be seen on https://rust.godbolt.org/z/anKtbR.
Benchmark results on Linux x64 with Intel i7-7700K: (updated from https://github.com/rust-lang/rust/pull/59283#issuecomment-474146863)
```
6830 bytes string:
alloc_only ... bench: 112 ns/iter (+/- 0) = 62410 MB/s
black_box_read_each_byte ... bench: 1,733 ns/iter (+/- 8) = 4033 MB/s
lookup_table ... bench: 1,766 ns/iter (+/- 11) = 3958 MB/s
branch_and_subtract ... bench: 417 ns/iter (+/- 1) = 16762 MB/s
branch_and_mask ... bench: 401 ns/iter (+/- 1) = 17431 MB/s
branchless ... bench: 365 ns/iter (+/- 0) = 19150 MB/s
libcore ... bench: 367 ns/iter (+/- 1) = 19046 MB/s
fake_simd_u32 ... bench: 361 ns/iter (+/- 2) = 19362 MB/s
fake_simd_u64 ... bench: 361 ns/iter (+/- 1) = 19362 MB/s
mask_mult_bool_branchy_lookup_table ... bench: 6,309 ns/iter (+/- 19) = 1107 MB/s
mask_mult_bool_lookup_table ... bench: 4,183 ns/iter (+/- 29) = 1671 MB/s
mask_mult_bool_match_range ... bench: 339 ns/iter (+/- 0) = 20619 MB/s
mask_shifted_bool_match_range ... bench: 339 ns/iter (+/- 1) = 20619 MB/s
32 bytes string:
alloc_only ... bench: 15 ns/iter (+/- 0) = 2133 MB/s
black_box_read_each_byte ... bench: 29 ns/iter (+/- 0) = 1103 MB/s
lookup_table ... bench: 24 ns/iter (+/- 4) = 1333 MB/s
branch_and_subtract ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
branch_and_mask ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
branchless ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
libcore ... bench: 15 ns/iter (+/- 0) = 2133 MB/s
fake_simd_u32 ... bench: 17 ns/iter (+/- 0) = 1882 MB/s
fake_simd_u64 ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
mask_mult_bool_branchy_lookup_table ... bench: 42 ns/iter (+/- 0) = 761 MB/s
mask_mult_bool_lookup_table ... bench: 35 ns/iter (+/- 0) = 914 MB/s
mask_mult_bool_match_range ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
mask_shifted_bool_match_range ... bench: 16 ns/iter (+/- 0) = 2000 MB/s
7 bytes string:
alloc_only ... bench: 14 ns/iter (+/- 0) = 500 MB/s
black_box_read_each_byte ... bench: 22 ns/iter (+/- 0) = 318 MB/s
lookup_table ... bench: 16 ns/iter (+/- 0) = 437 MB/s
branch_and_subtract ... bench: 16 ns/iter (+/- 0) = 437 MB/s
branch_and_mask ... bench: 16 ns/iter (+/- 0) = 437 MB/s
branchless ... bench: 19 ns/iter (+/- 0) = 368 MB/s
libcore ... bench: 20 ns/iter (+/- 0) = 350 MB/s
fake_simd_u32 ... bench: 18 ns/iter (+/- 0) = 388 MB/s
fake_simd_u64 ... bench: 21 ns/iter (+/- 0) = 333 MB/s
mask_mult_bool_branchy_lookup_table ... bench: 20 ns/iter (+/- 0) = 350 MB/s
mask_mult_bool_lookup_table ... bench: 19 ns/iter (+/- 0) = 368 MB/s
mask_mult_bool_match_range ... bench: 19 ns/iter (+/- 0) = 368 MB/s
mask_shifted_bool_match_range ... bench: 19 ns/iter (+/- 0) = 368 MB/s
```
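For readers who want a concrete picture of the one-byte-at-a-time approach benchmarked above, here is a minimal sketch of a mask-and-shift conversion. It is an illustration under assumptions, not necessarily the exact code in `src/libcore/num/mod.rs`: the function names `to_upper_branchless` and `to_lower_branchless` are made up for this example, and whether the compiler emits branch-free machine code depends on the target and optimization level.

```rust
/// Sketch: ASCII uppercase by clearing bit 5 (0x20) only for `a..=z`.
fn to_upper_branchless(byte: u8) -> u8 {
    // The range check yields a bool; as `u8` it is 0 or 1.
    let is_lower = (b'a'..=b'z').contains(&byte) as u8;
    // Shifted to bit 5, it becomes either 0x00 or 0x20, used as a mask.
    byte & !(is_lower << 5)
}

/// Sketch: ASCII lowercase by setting bit 5 (0x20) only for `A..=Z`.
fn to_lower_branchless(byte: u8) -> u8 {
    let is_upper = (b'A'..=b'Z').contains(&byte) as u8;
    byte | (is_upper << 5)
}

fn main() {
    let mut bytes = *b"Hello, World! 123";
    for b in bytes.iter_mut() {
        *b = to_upper_branchless(*b);
    }
    // Letters change case; punctuation, digits and spaces are untouched.
    assert_eq!(&bytes, b"HELLO, WORLD! 123");
    assert_eq!(to_lower_branchless(b'Q'), b'q');
}
```

Because each byte is handled independently with plain bitwise operations, a loop over a slice with these functions is the kind of code that auto-vectorizes well.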
|
|
|
|
|
|
|
|
|
|
A few improvements to comments in user-facing crates
Not too many this time, and all concern comments (almost all doc comments) in user-facing crates (libstd, libcore, liballoc).
r? @steveklabnik
|
|
Fix documentation of from_ne_bytes and from_le_bytes
Copypasta mistake, the documentation of `from_ne_bytes` and `from_le_bytes` used the big-endian variant in the example snippets.
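To make the difference between the three constructors concrete, here is a small sketch (not the doc snippets themselves) that decodes the same four bytes with each variant. The helper `decode_all` is a hypothetical name used only for this example.

```rust
/// Hypothetical helper: decode the same bytes as big-endian,
/// little-endian and native-endian `u32` values.
fn decode_all(bytes: [u8; 4]) -> (u32, u32, u32) {
    (
        u32::from_be_bytes(bytes),
        u32::from_le_bytes(bytes),
        u32::from_ne_bytes(bytes),
    )
}

fn main() {
    let (be, le, ne) = decode_all([0x12, 0x34, 0x56, 0x78]);
    // Big-endian: the first byte is the most significant.
    assert_eq!(be, 0x1234_5678);
    // Little-endian: the first byte is the least significant.
    assert_eq!(le, 0x7856_3412);
    // Native-endian matches whichever of the two the target uses.
    let expected = if cfg!(target_endian = "big") { be } else { le };
    assert_eq!(ne, expected);
}
```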
|
|
Expand docs for `TryFrom` and `TryInto`.
The examples are still lacking for now, both for module docs and for methods/impl's. Will be adding those in further pushes.
Should hopefully resolve the doc concern in #33417 when finished?
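Since the examples are described as still lacking, here is one possible sketch of a fallible integer conversion with `TryFrom` and `TryInto`. The helper `to_u8` is a hypothetical name for illustration and is not taken from the docs being expanded.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical helper: narrow an `i32` to `u8`, or `None` if out of range.
fn to_u8(n: i32) -> Option<u8> {
    u8::try_from(n).ok()
}

fn main() {
    // In range: the conversion succeeds.
    assert_eq!(to_u8(200), Some(200));
    // Out of range in either direction: the conversion reports an error.
    assert_eq!(to_u8(256), None);
    assert_eq!(to_u8(-1), None);

    // `TryInto` is the mirror image, called on the source value.
    let narrowed: Result<u8, _> = 7i32.try_into();
    assert_eq!(narrowed.ok(), Some(7));
}
```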
|
|
|
|
Report the diagnostic on macro expansions, and add a label indicating
why the comment is unused.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
They're not as good as `From` 'cause they don't stringify
the types and generate examples and so on, but it's a start.
|
|
I wondered what the `<<!` operator is although the exclamation mark was
only the end of the sentence.
|
|
Stabilize TryFrom and TryInto with a convert::Infallible empty enum
This is the plan proposed in https://github.com/rust-lang/rust/issues/33417#issuecomment-423073898
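A short sketch of how the empty `Infallible` enum shows up in practice: when a conversion can never fail (here `u8` to `u32`, which has a `From` impl), `try_from` still type-checks, with `Infallible` as the error type. The function name `widen` is hypothetical.

```rust
use std::convert::{Infallible, TryFrom};

/// Hypothetical helper: widen a `u8` through `TryFrom`, proving to the
/// compiler that the error branch cannot happen.
fn widen(n: u8) -> u32 {
    // `u32: From<u8>`, so the blanket `TryFrom` impl applies with
    // `Error = Infallible`.
    let result: Result<u32, Infallible> = u32::try_from(n);
    match result {
        Ok(value) => value,
        // `Infallible` has no variants, so an empty match is exhaustive.
        Err(never) => match never {},
    }
}

fn main() {
    assert_eq!(widen(0), 0);
    assert_eq!(widen(255), 255);
}
```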
|
|
|
|
Make overflowing and wrapping negation const
Remember that the signed and unsigned versions are slightly different here, so there are four functions made const instead of just two.
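As an illustration of the four functions and how the signed and unsigned variants differ, the sketch below evaluates them in `const` context. The constant names are made up for this example.

```rust
// Signed: only the minimum value overflows when negated.
const WRAPPED_MIN: i32 = i32::MIN.wrapping_neg();
const OVERFLOWED_MIN: (i32, bool) = i32::MIN.overflowing_neg();
const NEGATED_FIVE: (i32, bool) = 5i32.overflowing_neg();

// Unsigned: every value except zero overflows when negated.
const WRAPPED_ONE: u32 = 1u32.wrapping_neg();
const OVERFLOWED_ONE: (u32, bool) = 1u32.overflowing_neg();
const OVERFLOWED_ZERO: (u32, bool) = 0u32.overflowing_neg();

fn main() {
    assert_eq!(WRAPPED_MIN, i32::MIN);
    assert_eq!(OVERFLOWED_MIN, (i32::MIN, true));
    assert_eq!(NEGATED_FIVE, (-5, false));

    assert_eq!(WRAPPED_ONE, u32::MAX);
    assert_eq!(OVERFLOWED_ONE, (u32::MAX, true));
    assert_eq!(OVERFLOWED_ZERO, (0, false));
}
```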
|
|
|
|
|
|
Cosmetic improvements to doc comments
This has been factored out from https://github.com/rust-lang/rust/pull/58036 to only include changes to documentation comments (throughout the rustc codebase).
r? @steveklabnik
Once you're happy with this, maybe we could get it through with r=1, so it doesn't constantly get invalidated? (I'm not sure this will be an issue, but just in case...) Anyway, thanks for your advice so far!
|
|
|
|
|
|
intrinsics #58030
|
|
attribute #58030
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Add grammar in docs for {f32,f64}::from_str, mention known bug.
- Original bug about documenting grammar
- https://github.com/rust-lang/rust/issues/32243
- Known bug with parsing
- https://github.com/rust-lang/rust/issues/31407
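To give a feel for the kind of input the grammar covers, here is a small sketch of accepted and rejected strings for `f64`. It is not the documented grammar itself, and it deliberately does not try to reproduce the known parsing bug. The helper name `parse_float` is hypothetical.

```rust
/// Hypothetical helper: parse a string as `f64`, discarding error details.
fn parse_float(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}

fn main() {
    // Plain decimals and exponents.
    assert_eq!(parse_float("3.14"), Some(3.14));
    assert_eq!(parse_float("1e5"), Some(100000.0));
    // Digits may be missing on one side of the decimal point.
    assert_eq!(parse_float(".5"), Some(0.5));
    assert_eq!(parse_float("5."), Some(5.0));
    // Special values.
    assert_eq!(parse_float("inf"), Some(f64::INFINITY));
    assert!(parse_float("NaN").unwrap().is_nan());
    // Empty input, surrounding whitespace and stray text are errors.
    assert_eq!(parse_float(""), None);
    assert_eq!(parse_float(" 1.0"), None);
    assert_eq!(parse_float("abc"), None);
}
```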
|
|
|
|
|
|
Multiple people have asked for them, in
https://github.com/rust-lang/rust/issues/49137.
Given that the unsigned ones already exist,
they are very easy to add and not an additional maintenance burden.
|
|
This reverts commit 722b4d695964906807b12379577bce5ee3d23e08, reversing
changes made to 956dba47d33fc8b2bdabcd50e5bfed264b570382.
|
|
|
|
Rollup of 16 pull requests
Successful merges:
- #57351 (Don't actually create a full MIR stack frame when not needed)
- #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
- #57412 (Improve the wording)
- #57436 (save-analysis: use a fallback when access levels couldn't be computed)
- #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
- #57454 (Some cleanups for core::fmt)
- #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
- #57473 (std: Render large exit codes as hex on Windows)
- #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
- #57494 (Speed up item_bodies for large match statements involving regions)
- #57496 (re-do docs for core::cmp)
- #57508 (rustdoc: Allow inlining of reexported crates and crate items)
- #57547 (Use `ptr::eq` where applicable)
- #57557 (resolve: Mark extern crate items as used in more cases)
- #57560 (hygiene: Do not treat `Self` ctor as a local variable)
- #57564 (Update the const fn tracking issue to the new metabug)
Failed merges:
r? @ghost
|
|
Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).
These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.
The `abs` bit-fiddling is simple (a single `and`), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:
```asm
is_infinite:
    andps xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
    ucomiss xmm0, dword ptr [rip + .LCPI2_1] ; 0x7F80_0000
    setae al
    ret
is_finite:
    andps xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
    movss xmm1, dword ptr [rip + .LCPI1_1] ; 0x7F80_0000
    ucomiss xmm1, xmm0
    seta al
    ret
```
When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFF_FFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).
The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparisons. These functions have had that old
implementation since they were added in
https://github.com/rust-lang/rust/commit/6284190ef9918e05cb9147a2a81100ddcb06fea8
7 years ago.
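The two forms described above can be sketched for `f32` as follows. This is an illustration, not the library source: the function names are made up, the bit-mask `abs` stands in for the private helper mentioned earlier, and the "old" versions are an assumption about the previous implementation based on the comparison counts given here.

```rust
/// Clear the sign bit: the single-`and` absolute value.
fn abs_bits(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7FFF_FFFF)
}

/// New form: one comparison. NaN compares unequal to everything, so it is `false`.
fn is_infinite_new(x: f32) -> bool {
    abs_bits(x) == f32::INFINITY
}

/// New form: one comparison. NaN fails every ordered comparison, so it is `false`.
fn is_finite_new(x: f32) -> bool {
    abs_bits(x) < f32::INFINITY
}

/// Assumed old form: two comparisons.
fn is_infinite_old(x: f32) -> bool {
    x == f32::INFINITY || x == f32::NEG_INFINITY
}

/// Assumed old form: a NaN check plus the two comparisons above.
fn is_finite_old(x: f32) -> bool {
    !(x.is_nan() || is_infinite_old(x))
}

fn main() {
    let samples = [0.0, -0.0, 1.5, -1.5, f32::MAX, f32::MIN, f32::MIN_POSITIVE,
                   f32::INFINITY, f32::NEG_INFINITY, f32::NAN];
    for &x in samples.iter() {
        // Both forms agree with each other and with the standard library.
        assert_eq!(is_infinite_new(x), is_infinite_old(x));
        assert_eq!(is_finite_new(x), is_finite_old(x));
        assert_eq!(is_infinite_new(x), x.is_infinite());
        assert_eq!(is_finite_new(x), x.is_finite());
    }
}
```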
Benchmark (`abs` is the new form, `std` is the old):
```
test f32_is_finite_abs ... bench: 55 ns/iter (+/- 10)
test f32_is_finite_std ... bench: 118 ns/iter (+/- 5)
test f32_is_infinite_abs ... bench: 53 ns/iter (+/- 1)
test f32_is_infinite_std ... bench: 84 ns/iter (+/- 6)
test f64_is_finite_abs ... bench: 52 ns/iter (+/- 12)
test f64_is_finite_std ... bench: 128 ns/iter (+/- 25)
test f64_is_infinite_abs ... bench: 54 ns/iter (+/- 5)
test f64_is_infinite_std ... bench: 93 ns/iter (+/- 23)
```
```rust
#![feature(test)]
extern crate test;
use std::{f32, f64};
use test::Bencher;
const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
#[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs() == f32::INFINITY));
}
#[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
#[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}
const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
#[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
#[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
#[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|