| author | Alex Crichton <alex@alexcrichton.com> | 2015-04-29 15:45:37 -0700 |
|---|---|---|
| committer | Alex Crichton <alex@alexcrichton.com> | 2015-04-29 15:45:37 -0700 |
| commit | 41ee6df26153556f8ff3a9b08f38e28cfb5bc06c | |
| tree | 7de411b0cd37c893267dfaa52be4a22d528504aa /src/test/codegen | |
| parent | dfb60802c5411881658d28b79b55606e27d9f827 | |
| parent | 36dccec2f39c7e1da7f056ea421ad5256df3fb0b | |
rollup merge of #24846: dotdash/fast_cttz8
Currently, LLVM lowers a cttz8 on x86_64 to these instructions:
```asm
movzbl %dil, %eax
bsfl %eax, %eax
movl $32, %ecx
cmovnel %eax, %ecx
cmpl $32, %ecx
movl $8, %eax
cmovnel %ecx, %eax
```
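To make the instruction sequence easier to follow, here is an illustrative Rust model of what it computes. It is not LLVM or rustc code; `bsf32` and `cttz8_current_lowering` are hypothetical names, and because the destination of `bsf` is not defined for a zero source, the model represents that case as `None`. The two `cmov`s together mean "if the byte was zero, return 8; otherwise return the index of the lowest set bit".

```rust
/// Hypothetical model of `bsf`: the result is not defined for a zero
/// source, which this sketch represents as `None`.
fn bsf32(x: u32) -> Option<u32> {
    if x == 0 { None } else { Some(x.trailing_zeros()) }
}

/// Step-by-step model of the current x86_64 lowering of cttz8.
fn cttz8_current_lowering(x: u8) -> u32 {
    let eax = x as u32;                    // movzbl %dil, %eax
    let bsf_result = bsf32(eax);           // bsfl %eax, %eax (ZF set when eax == 0)
    let ecx = bsf_result.unwrap_or(32);    // movl $32, %ecx; cmovnel %eax, %ecx
    if ecx != 32 { ecx } else { 8 }        // cmpl $32, %ecx; movl $8, %eax; cmovnel %ecx, %eax
}

fn main() {
    // The model agrees with the standard library for every possible byte.
    for x in 0..=255u8 {
        assert_eq!(cttz8_current_lowering(x), x.trailing_zeros());
    }
    println!("model matches u8::trailing_zeros for all 256 inputs");
}
```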
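To improve the codegen, we can zero-extend the 8-bit integer, set
bit 8, and perform a cttz operation on the extended value. Bit 8 acts
as a sentinel: for a zero input it is the lowest set bit, so the result
is 8 as required, and for any nonzero input a lower bit is found first.
Because the extended value is never zero, there's no conditional
operation involved at all.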
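The sentinel trick can be sketched in plain Rust to check that it gives the same answers. This only illustrates the arithmetic; the actual change lives in rustc's LLVM IR generation, which is not shown here, and `cttz8_fast` is a hypothetical name.

```rust
/// Branch-free cttz for a byte: zero-extend, set bit 8 as a sentinel,
/// then count trailing zeros of the wider value.
fn cttz8_fast(x: u8) -> u32 {
    // For x == 0 the sentinel at bit 8 is the lowest set bit, giving 8;
    // for x != 0 some bit in 0..=7 is set and is found first.
    let widened = (x as u32) | 0x100;
    // `widened` is never zero, so no zero-input handling is required.
    widened.trailing_zeros()
}

fn main() {
    // Exhaustively compare against the standard library for all bytes.
    for x in 0..=255u8 {
        assert_eq!(cttz8_fast(x), x.trailing_zeros());
    }
    println!("sentinel version matches u8::trailing_zeros for all 256 inputs");
}
```

Since the operand of the wide count is guaranteed nonzero, a backend is free to use a bare `bsf` (or `tzcnt`) without the compare-and-`cmov` fallback seen in the current lowering.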
This was discovered via this benchmark: https://github.com/Kimundi/long_strings_without_repeats
Timings on my box with the current nightly:
```
running 4 tests
test bench_cpp_naive_big ... bench: 5479222 ns/iter (+/- 254222)
test bench_noop_big ... bench: 571405 ns/iter (+/- 111950)
test bench_rust_naive_big ... bench: 7798102 ns/iter (+/- 148841)
test bench_rust_unsafe_big ... bench: 6606488 ns/iter (+/- 67529)
```
Timings with the patch applied:
```
running 4 tests
test bench_cpp_naive_big ... bench: 5470944 ns/iter (+/- 7109)
test bench_noop_big ... bench: 568944 ns/iter (+/- 6895)
test bench_rust_naive_big ... bench: 6795901 ns/iter (+/- 43806)
test bench_rust_unsafe_big ... bench: 5584879 ns/iter (+/- 5291)
```
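The two runs can be compared with a quick calculation. This snippet is an illustration added here, not part of the benchmark repository; `percent_faster` is a hypothetical helper, and the inputs are the ns/iter figures copied from the output above.

```rust
/// Relative reduction in ns/iter, as a percentage of the "before" time.
fn percent_faster(before_ns: f64, after_ns: f64) -> f64 {
    (before_ns - after_ns) / before_ns * 100.0
}

fn main() {
    // (name, nightly ns/iter, patched ns/iter) from the timings above.
    let results = [
        ("bench_cpp_naive_big", 5_479_222.0, 5_470_944.0),
        ("bench_noop_big", 571_405.0, 568_944.0),
        ("bench_rust_naive_big", 7_798_102.0, 6_795_901.0),
        ("bench_rust_unsafe_big", 6_606_488.0, 5_584_879.0),
    ];
    for (name, before, after) in results {
        println!("{name}: {:.1}% fewer ns/iter", percent_faster(before, after));
    }
}
```

Applied to these figures, the two Rust benchmarks come out roughly 13% and 15% faster, while the C++ and no-op baselines change by well under 1%.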
Diffstat (limited to 'src/test/codegen')
0 files changed, 0 insertions, 0 deletions
