auto merge of #11280 : c-a/rust/inline_byteswap, r=brson - rust

author	bors <bors@rust-lang.org>	2014-01-03 21:21:50 -0800
committer	bors <bors@rust-lang.org>	2014-01-03 21:21:50 -0800
commit	0ff6c12ce94993dae702d597a213eee6b969231a (patch)
tree	615162f0e4e4815acbe85ee4710a8e811a5b3088 /src/rustllvm/RustWrapper.cpp
parent	8bfd2a84cfe83b3a0ff8f3a828303b378a8d94b9 (diff)
parent	a82f32b3ebe712f6e67e2c17cb5920bde83bdb6f (diff)

After writing some benchmarks for ebml::reader::vuint_at(), I noticed that LLVM doesn't seem to inline the from_be32 function, even though on x86_64 it does nothing but call the bswap32 intrinsic. Marking the functions with #[inline(always)] fixes that, and it seems to me a reasonable thing to do. I got the following measurements in my vuint_at() benchmarks:
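A minimal sketch of what such a byte-swap helper looks like in today's Rust. It assumes a u32 signature and uses the standard u32::swap_bytes instead of the 2014 bswap32 intrinsic, so it is an illustration of the idea rather than the code this merge touched. On little-endian targets the big-endian conversion is a single byte swap; on big-endian targets it is the identity:

```rust
// Hypothetical stand-in for the from_be32 helper discussed above (u32 signature assumed).
// Little-endian targets (e.g. x86_64): converting from big-endian is one byte swap.
#[cfg(target_endian = "little")]
#[inline(always)]
pub fn from_be32(x: u32) -> u32 {
    x.swap_bytes() // on x86_64 this typically compiles to a single bswap instruction
}

// Big-endian targets: the value is already in native order.
#[cfg(target_endian = "big")]
#[inline(always)]
pub fn from_be32(x: u32) -> u32 {
    x
}

fn main() {
    // Four bytes as they would appear in a big-endian stream.
    let bytes = [0x12u8, 0x34, 0x56, 0x78];
    // Reinterpret them in native byte order, then convert from big-endian.
    let value = from_be32(u32::from_ne_bytes(bytes));
    assert_eq!(value, u32::from_be_bytes(bytes));
    println!("{:#010x}", value);
}
```

Note that #[inline(always)] is a strong hint to the compiler rather than a hard guarantee, but for a one-instruction body like this it normally lets the swap be folded straight into the caller.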

- Before
test ebml::bench::vuint_at_A_aligned          ... bench:      1075 ns/iter (+/- 58)
test ebml::bench::vuint_at_A_unaligned        ... bench:      1073 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_aligned          ... bench:      1150 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_unaligned        ... bench:      1151 ns/iter (+/- 6)

- Inline from_be32
test ebml::bench::vuint_at_A_aligned          ... bench:       769 ns/iter (+/- 9)
test ebml::bench::vuint_at_A_unaligned        ... bench:       795 ns/iter (+/- 6)
test ebml::bench::vuint_at_D_aligned          ... bench:       758 ns/iter (+/- 8)
test ebml::bench::vuint_at_D_unaligned        ... bench:       759 ns/iter (+/- 8)

- Using vuint_at_slow()
test ebml::bench::vuint_at_A_aligned          ... bench:       646 ns/iter (+/- 7)
test ebml::bench::vuint_at_A_unaligned        ... bench:       645 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_aligned          ... bench:       907 ns/iter (+/- 4)
test ebml::bench::vuint_at_D_unaligned        ... bench:      1085 ns/iter (+/- 16)
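For scale, the "Before" and "Inline from_be32" figures can be turned into speedup factors with a few lines of Rust. The numbers are copied from the runs above; the RESULTS table and the speedup helper are only illustrative names:

```rust
// ns/iter from the runs above: (benchmark, before, after inlining from_be32).
const RESULTS: [(&'static str, f64, f64); 4] = [
    ("vuint_at_A_aligned", 1075.0, 769.0),
    ("vuint_at_A_unaligned", 1073.0, 795.0),
    ("vuint_at_D_aligned", 1150.0, 758.0),
    ("vuint_at_D_unaligned", 1151.0, 759.0),
];

/// Speedup factor: how many times faster the "after" run is than the "before" run.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    for (name, before, after) in RESULTS {
        println!("{}: {:.2}x faster", name, speedup(before, after));
    }
}
```

By this measure, inlining alone made these benchmarks roughly 1.35x to 1.52x faster, with the larger gain on the class D inputs.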

As expected, inlining from_be32() gave a considerable speedup.
I also measured how the "slow" version fares against the optimized one and noticed that it is
actually a bit faster for small class A integers (using only two bytes) but slower for large class D integers (using four bytes).
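A minimal sketch, in modern Rust, of the two decoding strategies being compared. The names vuint_at and vuint_at_slow follow the commit message, but the bodies, the Res struct and the Option return type are assumptions for illustration, not the ebml code of the time. It assumes EBML-style vuints of at most four bytes, where the position of the first set bit in the leading byte gives the length. The slow path walks the bytes one at a time; the fast path does one 4-byte load plus a from_be32-style conversion and then picks the class from the leading zeros:

```rust
/// Hypothetical result type: decoded value and the index just past it.
#[derive(Debug, PartialEq)]
pub struct Res {
    pub val: usize,
    pub next: usize,
}

/// Byte-at-a-time decoder (assumed shape of the "slow" version).
pub fn vuint_at_slow(data: &[u8], start: usize) -> Option<Res> {
    let first = *data.get(start)?;
    // Length = 1 + leading zero bits of the first byte; only 1..=4 bytes are supported here.
    let len = first.leading_zeros() as usize + 1;
    if len > 4 || start + len > data.len() {
        return None;
    }
    // Drop the length-marker bit, then fold in the remaining bytes.
    let mut val = (first as usize) & ((1usize << (8 - len)) - 1);
    for &b in &data[start + 1..start + len] {
        val = (val << 8) | b as usize;
    }
    Some(Res { val, next: start + len })
}

/// Word-at-a-time decoder (assumed shape of the optimized version).
pub fn vuint_at(data: &[u8], start: usize) -> Option<Res> {
    // Fewer than four bytes left: fall back to the byte-wise decoder.
    if data.len().saturating_sub(start) < 4 {
        return vuint_at_slow(data, start);
    }
    let bytes = [data[start], data[start + 1], data[start + 2], data[start + 3]];
    // Native-endian load followed by a big-endian conversion; on x86_64 the
    // conversion is the byte swap that from_be32 performs.
    let word = u32::from_be(u32::from_ne_bytes(bytes));
    let len = word.leading_zeros() as usize + 1;
    if len > 4 {
        return None; // marker bit not in the first nibble: too long or invalid
    }
    // Shift the len-byte integer down to the low bits and mask off the marker bit.
    let val = (word >> (8 * (4 - len))) & ((1u32 << (7 * len)) - 1);
    Some(Res { val: val as usize, next: start + len })
}

fn main() {
    // A 1-byte, a 2-byte and a 4-byte vuint back to back.
    let data = [0x81u8, 0x40, 0x05, 0x10, 0x00, 0x01, 0x00];
    let mut pos = 0;
    while let Some(r) = vuint_at(&data, pos) {
        println!("value {} at byte {}", r.val, pos);
        pos = r.next;
    }
}
```

The fast path does the same amount of work whatever the class, while the slow path's loop grows with the integer's length, which is consistent with the slow version winning on short integers and losing on four-byte ones in the numbers above.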

Diffstat (limited to 'src/rustllvm/RustWrapper.cpp')

0 files changed, 0 insertions, 0 deletions