Performance-oriented hashtable. - rust - https://github.com/rust-lang/rust

author	Clark Gaebel <cg.wowus.cg@gmail.com>	2014-02-28 22:23:53 -0500
committer	Clark Gaebel <cg.wowus.cg@gmail.com>	2014-03-12 18:30:11 -0400
commit	5bdbd2100946a5204ef82b12eb474e8e4d9ba64e (patch)
tree	db1ee223a25bb8a6c9e0bad870868a0d59e6ef7f /src/rustllvm/ExecutionEngineWrapper.cpp
parent	3316a0e6b2ad9352bab58e7c046ef3d212411d82 (diff)

Performance-oriented hashtable.

Previously, Rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open addressing.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_notexisting' benchmark below. The new hashtable fixes this:
keys with long probe sequence lengths steal the good spots of keys in
'lucky' spots with a low probe sequence length. This reduces hashtable
probe length variance, while maintaining the same mean.
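
The slot-stealing rule described above is commonly known as Robin Hood
hashing. The following is a minimal sketch of it, not the code from this
commit: the names Table, Slot, insert, get and probe_distance are
hypothetical, hashes are passed in precomputed, the capacity is fixed,
duplicate keys are not handled, and an Option per slot is used only to
keep the sketch short.

```rust
// Illustrative sketch only; all names are hypothetical.
#[derive(Clone, Debug)]
struct Slot {
    hash: u64, // precomputed by the caller
    key: u32,
    value: u32,
}

struct Table {
    slots: Vec<Option<Slot>>,
}

impl Table {
    fn new(capacity: usize) -> Table {
        Table { slots: vec![None; capacity] }
    }

    // How far `index` is from the ideal slot of an entry with this hash.
    fn probe_distance(&self, hash: u64, index: usize) -> usize {
        let cap = self.slots.len();
        let ideal = (hash as usize) % cap;
        (index + cap - ideal) % cap
    }

    // Assumes at least one free slot and no duplicate keys.
    fn insert(&mut self, hash: u64, key: u32, value: u32) {
        let cap = self.slots.len();
        let mut incoming = Slot { hash, key, value };
        let mut index = (hash as usize) % cap;
        let mut distance = 0;
        loop {
            match self.slots[index].take() {
                None => {
                    self.slots[index] = Some(incoming);
                    return;
                }
                Some(resident) => {
                    let resident_distance = self.probe_distance(resident.hash, index);
                    if resident_distance < distance {
                        // The resident is "luckier" than the incoming entry:
                        // take its slot and keep probing with the resident.
                        self.slots[index] = Some(incoming);
                        incoming = resident;
                        distance = resident_distance;
                    } else {
                        self.slots[index] = Some(resident);
                    }
                }
            }
            index = (index + 1) % cap;
            distance += 1;
        }
    }

    fn get(&self, hash: u64, key: u32) -> Option<u32> {
        let cap = self.slots.len();
        let mut index = (hash as usize) % cap;
        for distance in 0..cap {
            match &self.slots[index] {
                None => return None,
                Some(resident) => {
                    if resident.hash == hash && resident.key == key {
                        return Some(resident.value);
                    }
                    // A resident closer to its ideal slot than we are to ours
                    // would have been displaced by the key, so it is absent.
                    if self.probe_distance(resident.hash, index) < distance {
                        return None;
                    }
                }
            }
            index = (index + 1) % cap;
        }
        None
    }
}
```

For example, in a table of capacity 8, inserting hashes 0, 0, 1, 0 in
that order leaves the entries at probe distances 0, 1, 2, 2. Plain
linear probing would give 0, 1, 1, 3: the same mean, but a longer worst
case.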

Other optimization liberties were taken as well. Everything is as
cache-aware as possible, and this hashtable should perform extremely well
for both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.18x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 6.5 words
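
To make the two formulas easier to compare, here is a small sketch that
evaluates only their per-entry terms. It assumes that aligned() rounds
up to a multiple of 8 bytes and that the "+ 6 words" and "+ 6.5 words"
terms are fixed per-table overhead, which is left out. The function
names are hypothetical.

```rust
use std::mem::size_of;

// Round `bytes` up to a multiple of 8 (assumption: 64-bit word alignment).
fn aligned(bytes: usize) -> usize {
    (bytes + 7) / 8 * 8
}

// Per-entry term of the old formula: aligned(8 + sizeof(K) + sizeof(V)) / 0.75
fn old_bytes_per_entry<K, V>() -> f64 {
    aligned(8 + size_of::<K>() + size_of::<V>()) as f64 / 0.75
}

// Per-entry term of the new formula:
// (aligned(sizeof(K)) + aligned(sizeof(V)) + 8) / 0.9
fn new_bytes_per_entry<K, V>() -> f64 {
    (aligned(size_of::<K>()) + aligned(size_of::<V>()) + 8) as f64 / 0.9
}
```

Under these assumptions, a (u64, u64) entry costs 32 bytes in the old
layout and about 26.7 bytes in the new one, while a (u8, u8) entry
costs about 21.3 bytes in the old layout and about 26.7 bytes in the
new one, because the key and the value are each aligned separately.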

But accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines' worth of hashes will be
pulled into cache. This is unlike the old version, which would have to
stride over the stored keys and values, and would become more cache
unfriendly the bigger the stored values got.
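
The "two cache lines" figure can be checked with a little arithmetic.
The sketch below assumes 64-byte cache lines, 8-byte hashes stored in
one contiguous array that starts on a cache-line boundary, and no
wraparound at the end of the table. The function name lines_spanned is
hypothetical. It counts how many cache lines a run of consecutive
slots spans for a given slot size.

```rust
const CACHE_LINE_BYTES: usize = 64; // assumption: typical x86-64 line size

// Number of cache lines spanned by `count` consecutive slots of
// `stride_bytes` each, starting at slot `start` of an array whose base
// address is cache-line aligned.
fn lines_spanned(start: usize, count: usize, stride_bytes: usize) -> usize {
    if count == 0 || stride_bytes == 0 {
        return 0;
    }
    let first_line = start * stride_bytes / CACHE_LINE_BYTES;
    let last_line = ((start + count) * stride_bytes - 1) / CACHE_LINE_BYTES;
    last_line - first_line + 1
}
```

With a stride of 8 bytes (hashes only), 8 consecutive slots never span
more than two lines, whatever the starting slot. With keys and values
interleaved at, say, 72 bytes per slot, the same 8 slots starting at
slot 0 span 9 lines, and the count grows with the size of the stored
values.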

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.

Diffstat (limited to 'src/rustllvm/ExecutionEngineWrapper.cpp')

0 files changed, 0 insertions, 0 deletions

