author: bors <bors@rust-lang.org> 2017-05-10 08:54:50 +0000
committer: bors <bors@rust-lang.org> 2017-05-10 08:54:50 +0000
commit: 2b97174ada7fb1854269558ed2cf3b089e58beee
tree: 880576d0244be04f587f845d2cb96936d213f3ff /src/libstd
parent: 58b33ad70cdd11f9ce7b5874c6effab9627e51aa
parent: da91361d2a8ea86a42cbe2a23a7ff816cc5500af
Auto merge of #41764 - scottmcm:faster-reverse, r=brson
Make [u8]::reverse() 5x faster

Since LLVM doesn't vectorize the loop for us, do unaligned reads of a larger type and use LLVM's bswap intrinsic to reverse the bytes within each chunk. The fast path is cfg!-restricted to x86 and x86_64, as I assume it wouldn't help on targets like ARMv5.
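
To illustrate the idea, here is a minimal sketch of the chunked approach. It is not the code from this PR: the function name `reverse_bytes` and the choice of `usize` as the chunk type are assumptions made for the example, and it uses the current `pointer::add` API rather than what was available in 2017. It loads one chunk from each end with `ptr::read_unaligned`, byte-swaps both with `swap_bytes` (which LLVM lowers to `bswap` on x86), and stores each at the opposite end.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper (not the libstd implementation): reverses a byte
/// slice in place, one `usize`-sized chunk from each end at a time.
pub fn reverse_bytes(s: &mut [u8]) {
    let len = s.len();
    let chunk = size_of::<usize>();
    let mut i = 0;

    // Fast path only on targets assumed to have cheap unaligned access.
    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        let p = s.as_mut_ptr();
        // `i + chunk <= len / 2` keeps the front chunk in the first half and
        // the back chunk in the second half, so the two never overlap.
        while i + chunk <= len / 2 {
            unsafe {
                let front = p.add(i) as *mut usize;
                let back = p.add(len - i - chunk) as *mut usize;
                let a = ptr::read_unaligned(front);
                let b = ptr::read_unaligned(back);
                // Byte-swapping a chunk reverses its bytes in memory;
                // storing it at the opposite end reverses the chunk order.
                ptr::write_unaligned(front, b.swap_bytes());
                ptr::write_unaligned(back, a.swap_bytes());
            }
            i += chunk;
        }
    }

    // Scalar loop for whatever remains in the middle (or for everything,
    // on other targets).
    while i < len / 2 {
        s.swap(i, len - 1 - i);
        i += 1;
    }
}
```

The scalar loop doubles as the fallback, so the result is the same on every target; only the speed differs.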

Also makes [u16]::reverse() a more modest 1.5x faster by loading and storing u32s and swapping the two u16 halves with a 16-bit rotate (ROT16).
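
The u16 variant can be sketched the same way. Again, this is an illustration under assumptions rather than the PR's code: `reverse_u16` is a hypothetical name. Each `u32` load covers two adjacent `u16` elements, and `rotate_left(16)` exchanges the two halves without touching the bytes inside each element.

```rust
use std::ptr;

/// Hypothetical helper (not the libstd implementation): reverses a `u16`
/// slice in place, two elements from each end at a time.
pub fn reverse_u16(s: &mut [u16]) {
    let len = s.len();
    let chunk = 2; // number of u16 elements per u32 load
    let mut i = 0;

    // Fast path only on targets assumed to have cheap unaligned access.
    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        let p = s.as_mut_ptr();
        // Front and back pairs stay in opposite halves, so they never overlap.
        while i + chunk <= len / 2 {
            unsafe {
                // A `u16` slice is only guaranteed 2-byte alignment, so the
                // `u32` accesses must go through the unaligned read/write.
                let front = p.add(i) as *mut u32;
                let back = p.add(len - i - chunk) as *mut u32;
                let a = ptr::read_unaligned(front);
                let b = ptr::read_unaligned(back);
                // Rotating by 16 bits swaps the two u16 halves of the u32.
                ptr::write_unaligned(front, b.rotate_left(16));
                ptr::write_unaligned(back, a.rotate_left(16));
            }
            i += chunk;
        }
    }

    // Scalar loop for the remaining middle elements.
    while i < len / 2 {
        s.swap(i, len - 1 - i);
        i += 1;
    }
}
```

A rotate is used instead of a byte swap here because a full `swap_bytes` on the `u32` would also reverse the bytes inside each `u16`, changing the element values.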

Thank you ptr::*_unaligned for making this easy :)

Benchmark results (from my i5-2500K):
```text
# Before
test slice::reverse_u8      ... bench:  273,836 ns/iter (+/- 15,592) =  3829 MB/s
test slice::reverse_u16     ... bench:  139,793 ns/iter (+/- 17,748) =  7500 MB/s
test slice::reverse_u32     ... bench:   74,997 ns/iter  (+/- 5,130) = 13981 MB/s
test slice::reverse_u64     ... bench:   47,452 ns/iter  (+/- 2,213) = 22097 MB/s

# After
test slice::reverse_u8      ... bench:   52,170 ns/iter (+/- 3,962) = 20099 MB/s
test slice::reverse_u16     ... bench:   93,330 ns/iter (+/- 4,412) = 11235 MB/s
test slice::reverse_u32     ... bench:   74,731 ns/iter (+/- 1,425) = 14031 MB/s
test slice::reverse_u64     ... bench:   47,556 ns/iter (+/- 3,025) = 22049 MB/s
```

If you're curious about the assembly, instead of swapping one byte at a time like this
```
movzx	eax, byte ptr [rdi]
movzx	ecx, byte ptr [rsi]
mov	byte ptr [rdi], cl
mov	byte ptr [rsi], al
```
it now loads, byte-swaps, and stores eight bytes at a time
```
mov	rax, qword ptr [rdx]
mov	rbx, qword ptr [r11 + rcx - 8]
bswap	rbx
mov	qword ptr [rdx], rbx
bswap	rax
mov	qword ptr [r11 + rcx - 8], rax
```