about summary refs log tree commit diff
path: root/src/rustllvm/ExecutionEngineWrapper.cpp
diff options
context:
space:
mode:
authorClark Gaebel <cg.wowus.cg@gmail.com>2014-02-28 22:23:53 -0500
committerClark Gaebel <cg.wowus.cg@gmail.com>2014-03-12 18:30:11 -0400
commit5bdbd2100946a5204ef82b12eb474e8e4d9ba64e (patch)
treedb1ee223a25bb8a6c9e0bad870868a0d59e6ef7f /src/rustllvm/ExecutionEngineWrapper.cpp
parent3316a0e6b2ad9352bab58e7c046ef3d212411d82 (diff)
downloadrust-5bdbd2100946a5204ef82b12eb474e8e4d9ba64e.tar.gz
rust-5bdbd2100946a5204ef82b12eb474e8e4d9ba64e.zip
Performance-oriented hashtable.
Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 6.5 words

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.
Diffstat (limited to 'src/rustllvm/ExecutionEngineWrapper.cpp')
0 files changed, 0 insertions, 0 deletions