auto merge of #5975 : huonw/rust/rustc-intrinsics-fixed-stack, r=pcwalton - rust

author	bors <bors@rust-lang.org>	2013-04-20 11:57:50 -0700
committer	bors <bors@rust-lang.org>	2013-04-20 11:57:50 -0700
commit	ae3b8690c1e9c4debf20b4455ad50a79d5859ee9 (patch)
tree	58406cf3c0ff22fc9ada505a40109c6acc779a2d /src/rt/rust_run_program.cpp
parent	2b09267b762398a3c851ecfd55d5d01aee906352 (diff)
parent	c5baeb1db3d84e1ab0d14a8055db3a7d3cba638d (diff)

This implements the `fixed_stack_segment` attribute for items with the rust-intrinsic ABI, and then uses it to make f32 and f64 use intrinsics where appropriate, without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully.

@pcwalton, the fixed_stack_segment implementation mirrors the one in `trans_closure` in `base.rs`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics). Leaving it out is possibly incorrect.

I'm a little hazy about how the underlying structure works, so I've annotated only the 4 intrinsics that have caused problems so far; there's no guarantee that the other intrinsics are entirely well-behaved.

Anyway, it has good results (each benchmark just sums the function's results over the inputs 1 up to 100 million):

```
$ ./intrinsics-perf.sh f32
func   new   old   speedup
sin    0.80  2.75  3.44
cos    0.80  2.76  3.45
sqrt   0.56  2.73  4.88
ln     1.01  2.94  2.91
log10  0.97  2.90  2.99
log2   1.01  2.95  2.92
exp    0.90  2.85  3.17
exp2   0.92  2.87  3.12
pow    6.95  8.57  1.23

   geometric mean: 2.97

$ ./intrinsics-perf.sh f64
func   new    old    speedup
sin    12.08  14.06  1.16
cos    12.04  13.67  1.14
sqrt   0.49   2.73   5.57
ln     4.11   5.59   1.36
log10  5.09   6.54   1.28
log2   2.78   5.10   1.83
exp    2.00   3.97   1.99
exp2   1.71   3.71   2.17
pow    5.90   7.51   1.27

   geometric mean: 1.72
```
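For reference, the geometric mean rows can be re-derived from the speedup columns. The snippet below is a minimal sketch in current Rust (not the 2013 dialect this patch targets); `geometric_mean` is a hypothetical helper written for this illustration and is not part of `intrinsics-perf.sh`.

```rust
/// Geometric mean of a slice of positive ratios, computed via logarithms:
/// exp(mean(ln(r_i))). Hypothetical helper, not part of the patch.
fn geometric_mean(ratios: &[f64]) -> f64 {
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

fn main() {
    // Speedup columns copied from the two tables above (already rounded).
    let f32_speedups = [3.44, 3.45, 4.88, 2.91, 2.99, 2.92, 3.17, 3.12, 1.23];
    let f64_speedups = [1.16, 1.14, 5.57, 1.36, 1.28, 1.83, 1.99, 2.17, 1.27];

    println!("f32 geometric mean: {:.2}", geometric_mean(&f32_speedups));
    println!("f64 geometric mean: {:.2}", geometric_mean(&f64_speedups));
}
```

Because the inputs here are the rounded speedups rather than the raw timings, the results should land close to the reported 2.97 and 1.72 but may differ in the last digit.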

So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly an apples-to-apples comparison though, since this patch also adds #[inline(always)] to all the function definitions, which possibly contributes to the speedup.
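The measurement described above (summing a function's results over 1 up to 100 million) might look roughly like the following in current Rust. This is only a sketch of the idea, not the actual `intrinsics-perf.sh` harness: `sum_over_range` and the reduced iteration count are assumptions made for illustration.

```rust
use std::time::Instant;

/// Sum `f(x)` for x = 1, 2, ..., n. The total is returned (and printed by
/// the caller) so the optimizer cannot discard the loop.
/// Hypothetical helper, not taken from the patch.
fn sum_over_range(f: impl Fn(f64) -> f64, n: u32) -> f64 {
    let mut total = 0.0;
    for i in 1..=n {
        total += f(f64::from(i));
    }
    total
}

fn main() {
    // The commit message uses 100 million iterations; a smaller count is
    // assumed here so the sketch runs quickly.
    let n = 1_000_000;

    let start = Instant::now();
    let total = sum_over_range(f64::sqrt, n);
    let elapsed = start.elapsed();

    println!("sqrt: sum = {}, elapsed = {:?}", total, elapsed);
}
```

Timing one function at a time like this, before and after the patch, would give the kind of `new` and `old` columns shown in the tables.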

(fwiw, GitHub is showing 93c0888 after d9c54f8, since I cherry-picked the latter from #5697, but git's order is the other way.)

Diffstat (limited to 'src/rt/rust_run_program.cpp')

0 files changed, 0 insertions, 0 deletions

