auto merge of #12029 : zkamsler/rust/merge-sort-allocations, r=huonw - rust

author	bors <bors@rust-lang.org>	2014-02-07 14:21:30 -0800
committer	bors <bors@rust-lang.org>	2014-02-07 14:21:30 -0800
commit	1fd2d7786013f98c59f099a2a0413b61a6e82d9d (patch)
tree	467b44d8810035eec87b9d5c65817dad6d3be6b6 /src/rustllvm/PassWrapper.cpp
parent	7d7a060f8d95ee43406560e69a12631e52c617a7 (diff)
parent	cebe5e8e6baecd448f810f5960daab10fa2d089c (diff)
download	rust-1fd2d7786013f98c59f099a2a0413b61a6e82d9d.tar.gz rust-1fd2d7786013f98c59f099a2a0413b61a6e82d9d.zip

auto merge of #12029 : zkamsler/rust/merge-sort-allocations, r=huonw

This pull request:
1) Changes the initial insertion sort to be in-place, and defers allocation of the working set until a merge is needed.
2) Increases the maximum run length that insertion sort is used for from 8 to 32 elements. This increases the size of vectors that will not allocate, and reduces the number of merge passes by two. It seemed to be the sweet spot in the benchmarks that I ran.
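
For illustration, the two changes can be sketched roughly as follows. This is not the code from the pull request: it is written in present-day safe Rust, requires `Clone` to avoid unsafe moves, and the names `INSERTION_RUN`, `insertion_sort`, `hybrid_merge_sort` and `merge_passes` are hypothetical. It only shows the shape of the idea: sort short runs in place, and allocate the merge buffer only if the input is longer than one run.

```rust
/// Hypothetical name for the run-length threshold (the PR raises it from 8 to 32).
const INSERTION_RUN: usize = 32;

/// Stable insertion sort that works in place and never allocates.
fn insertion_sort<T: Ord>(v: &mut [T]) {
    for i in 1..v.len() {
        let mut j = i;
        // The strict comparison keeps equal elements in their original order.
        while j > 0 && v[j] < v[j - 1] {
            v.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Stable bottom-up merge sort with an in-place insertion sort on the initial runs.
/// Assumption: `Clone` is required only to keep this sketch in safe Rust.
fn hybrid_merge_sort<T: Ord + Clone>(v: &mut [T]) {
    let len = v.len();
    // Step 1: sort each run of up to INSERTION_RUN elements in place.
    for run in v.chunks_mut(INSERTION_RUN) {
        insertion_sort(run);
    }
    // Inputs that fit in a single run return here without allocating.
    if len <= INSERTION_RUN {
        return;
    }
    // Step 2: the working buffer is allocated only now that a merge is needed.
    let mut buf: Vec<T> = Vec::with_capacity(len);
    let mut width = INSERTION_RUN;
    while width < len {
        buf.clear();
        let mut start = 0;
        while start < len {
            let mid = (start + width).min(len);
            let end = (start + 2 * width).min(len);
            let (mut i, mut j) = (start, mid);
            while i < mid && j < end {
                // On ties, take from the left run to preserve stability.
                if v[j] < v[i] {
                    buf.push(v[j].clone());
                    j += 1;
                } else {
                    buf.push(v[i].clone());
                    i += 1;
                }
            }
            // Copy whatever remains of either run.
            buf.extend_from_slice(&v[i..mid]);
            buf.extend_from_slice(&v[j..end]);
            start = end;
        }
        // Copy the merged pass back; `buf` holds exactly `len` elements here.
        v.clone_from_slice(&buf);
        width *= 2;
    }
}

/// Number of merge passes needed for `len` elements when the initial runs
/// hold `run` elements each (the run width doubles on every pass).
fn merge_passes(len: usize, run: usize) -> u32 {
    let mut passes = 0;
    let mut width = run;
    while width < len {
        width *= 2;
        passes += 1;
    }
    passes
}
```

With this model, `merge_passes(10_000, 8)` is 11 and `merge_passes(10_000, 32)` is 9, which matches the "two fewer merge passes" claim for vectors that span more than a couple of runs. The length 10,000 is only an example value, not a figure stated in the pull request.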

Here are the results of some benchmarks. Note that they are sorting u64s, so types that are more expensive to compare or copy may have different behaviors.
Before changes:
```
test vec::bench::sort_random_large      bench:    719753 ns/iter (+/- 130173) = 111 MB/s
test vec::bench::sort_random_medium     bench:      4726 ns/iter (+/- 742) = 169 MB/s
test vec::bench::sort_random_small      bench:       344 ns/iter (+/- 76) = 116 MB/s
test vec::bench::sort_sorted            bench:    437244 ns/iter (+/- 70043) = 182 MB/s
```

Deferred allocation (8 element insertion sort):
```
test vec::bench::sort_random_large      bench:    702630 ns/iter (+/- 88158) = 113 MB/s
test vec::bench::sort_random_medium     bench:      4529 ns/iter (+/- 497) = 176 MB/s
test vec::bench::sort_random_small      bench:       185 ns/iter (+/- 49) = 216 MB/s
test vec::bench::sort_sorted            bench:    425853 ns/iter (+/- 60907) = 187 MB/s
```

Deferred allocation (16 element insertion sort):
```
test vec::bench::sort_random_large      bench:    692783 ns/iter (+/- 165837) = 115 MB/s
test vec::bench::sort_random_medium     bench:      4434 ns/iter (+/- 722) = 180 MB/s
test vec::bench::sort_random_small      bench:       187 ns/iter (+/- 38) = 213 MB/s
test vec::bench::sort_sorted            bench:    393783 ns/iter (+/- 85548) = 203 MB/s
```

Deferred allocation (32 element insertion sort):
```
test vec::bench::sort_random_large      bench:    682556 ns/iter (+/- 131008) = 117 MB/s
test vec::bench::sort_random_medium     bench:      4370 ns/iter (+/- 1369) = 183 MB/s
test vec::bench::sort_random_small      bench:       179 ns/iter (+/- 32) = 223 MB/s
test vec::bench::sort_sorted            bench:    358353 ns/iter (+/- 65423) = 223 MB/s
```

Deferred allocation (64 element insertion sort):
```
test vec::bench::sort_random_large      bench:    712040 ns/iter (+/- 132454) = 112 MB/s
test vec::bench::sort_random_medium     bench:      4425 ns/iter (+/- 784) = 180 MB/s
test vec::bench::sort_random_small      bench:       179 ns/iter (+/- 81) = 223 MB/s
test vec::bench::sort_sorted            bench:    317812 ns/iter (+/- 62675) = 251 MB/s
```
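
As a reading aid for the tables above, the small sketch below computes relative speedups from the `ns/iter` columns and the input size implied by each `ns/iter` and `MB/s` pair. The helper names are hypothetical, and the size calculation assumes that 1 MB means 10^6 bytes in the bench output; the element counts it yields are inferences, not figures stated in the pull request.

```rust
/// Ratio of the old time to the new time; values above 1.0 mean the change is faster.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// Bytes processed per iteration implied by a (ns/iter, MB/s) pair.
/// Assumption: 1 MB = 10^6 bytes, so MB/s * ns/iter / 1000 = bytes.
fn implied_bytes(ns_per_iter: f64, mb_per_s: f64) -> f64 {
    mb_per_s * ns_per_iter / 1000.0
}

/// Approximate number of u64 elements (8 bytes each) sorted per iteration.
fn implied_u64_count(ns_per_iter: f64, mb_per_s: f64) -> f64 {
    (implied_bytes(ns_per_iter, mb_per_s) / 8.0).round()
}
```

For example, `speedup(344.0, 179.0)` is about 1.92 for `sort_random_small` (before the changes versus the 32-element variant), while `speedup(719753.0, 682556.0)` is about 1.05 for `sort_random_large`, which is well within the reported +/- ranges. Under the stated assumption, the 32-element rows imply inputs of roughly 5, 100 and 10,000 `u64`s for the small, medium and large benchmarks.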

This is the best I could manage with the basic merge sort while keeping the invariant that the original vector must contain each element exactly once when the comparison function is called. If one is not married to a stable sort, an in-place n*log(n) sorting algorithm may have better performance in some cases.

for #12011
cc @huonw

Diffstat (limited to 'src/rustllvm/PassWrapper.cpp')

0 files changed, 0 insertions, 0 deletions
