Auto merge of #130733 - okaneco:is_ascii, r=scottmcm - rust

diff options

author	bors <bors@rust-lang.org>	2024-12-22 02:44:13 +0000
committer	bors <bors@rust-lang.org>	2024-12-22 02:44:13 +0000
commit	c1132470a6986b12503e8000e322e9164c1f03ac (patch)
tree	f3123bde4473d5fb46434817a8a61d93cc3f59e7 /compiler/rustc_privacy/src/lib.rs
parent	00bf74da6d1e8e49b18c0203e0cbd993fdac5663 (diff)
parent	1b5c02b7578879ebfcd54fdc6a4f86c49d2d9ecd (diff)
download	rust-c1132470a6986b12503e8000e322e9164c1f03ac.tar.gz rust-c1132470a6986b12503e8000e322e9164c1f03ac.zip

Auto merge of #130733 - okaneco:is_ascii, r=scottmcm

Optimize `is_ascii` for `str` and `[u8]` further

Replace the existing optimized function with one that enables auto-vectorization.

This is especially beneficial on x86-64 as `pmovmskb` can be emitted with careful structuring of the code. The instruction can detect non-ASCII characters one vector register width at a time instead of the current `usize` at a time check.

The resulting implementation is completely safe.

`case00_libcore` is the current implementation, `case04_while_loop` is this PR.
```
benchmarks:
    ascii::is_ascii_slice::long::case00_libcore                             22.25/iter  +/- 1.09
    ascii::is_ascii_slice::long::case04_while_loop                           6.78/iter  +/- 0.92
    ascii::is_ascii_slice::medium::case00_libcore                            2.81/iter  +/- 0.39
    ascii::is_ascii_slice::medium::case04_while_loop                         1.56/iter  +/- 0.78
    ascii::is_ascii_slice::short::case00_libcore                             5.55/iter  +/- 0.85
    ascii::is_ascii_slice::short::case04_while_loop                          3.75/iter  +/- 0.22
    ascii::is_ascii_slice::unaligned_both_long::case00_libcore              26.59/iter  +/- 0.66
    ascii::is_ascii_slice::unaligned_both_long::case04_while_loop            5.78/iter  +/- 0.16
    ascii::is_ascii_slice::unaligned_both_medium::case00_libcore             2.97/iter  +/- 0.32
    ascii::is_ascii_slice::unaligned_both_medium::case04_while_loop          2.41/iter  +/- 0.10
    ascii::is_ascii_slice::unaligned_head_long::case00_libcore              23.71/iter  +/- 0.79
    ascii::is_ascii_slice::unaligned_head_long::case04_while_loop            7.83/iter  +/- 1.31
    ascii::is_ascii_slice::unaligned_head_medium::case00_libcore             3.69/iter  +/- 0.54
    ascii::is_ascii_slice::unaligned_head_medium::case04_while_loop          7.05/iter  +/- 0.32
    ascii::is_ascii_slice::unaligned_tail_long::case00_libcore              24.44/iter  +/- 1.41
    ascii::is_ascii_slice::unaligned_tail_long::case04_while_loop            5.12/iter  +/- 0.18
    ascii::is_ascii_slice::unaligned_tail_medium::case00_libcore             3.24/iter  +/- 0.40
    ascii::is_ascii_slice::unaligned_tail_medium::case04_while_loop          2.86/iter  +/- 0.14

```

`unaligned_head_medium` is the main regression in the benchmarks. It is a 32 byte string being sliced `bytes[1..]`.

The first commit can be used to run the benchmarks against the current core implementation.

Previous implementation was done in #74066

---

Two potential drawbacks of this implementation are that it increases instruction count and may regress other platforms/architectures. The benches here may also be too artificial to glean much insight from.
https://rust.godbolt.org/z/G9znGfY36

Diffstat (limited to 'compiler/rustc_privacy/src/lib.rs')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: