diff options
| author | bors <bors@rust-lang.org> | 2021-05-02 04:54:31 +0000 |
|---|---|---|
| committer | bors <bors@rust-lang.org> | 2021-05-02 04:54:31 +0000 |
| commit | e244e840f2ada19616ad3f18c388de9ea37a2550 (patch) | |
| tree | b846a4e666936b8f32e7dd504568f9348963130f /compiler/rustc_codegen_llvm/src/builder.rs | |
| parent | bd38aa104a3fa79e29bc2c96808b913d39e5e1b7 (diff) | |
| parent | c064b6560b7ce0adeb9bbf5d7dcf12b1acb0c807 (diff) | |
| download | rust-e244e840f2ada19616ad3f18c388de9ea37a2550.tar.gz rust-e244e840f2ada19616ad3f18c388de9ea37a2550.zip | |
Auto merge of #84725 - sebpop:arm64-isb, r=joshtriplett
[Arm64] use isb instruction instead of yield in spin loops
On arm64 we have seen on several databases that ISB (instruction synchronization
barrier) is better to use than yield in a spin loop. The yield instruction is a
nop. The isb instruction puts the processor to sleep for some short time. isb
is a good equivalent to the pause instruction on x86.
Below is an experiment that shows the effects of yield and isb on Arm64 and the
time of a pause instruction on x86 Intel processors. The micro-benchmarks use
https://github.com/google/benchmark.git
```
$ cat a.cc
static void BM_scalar_increment(benchmark::State& state) {
int i = 0;
for (auto _ : state)
benchmark::DoNotOptimize(i++);
}
BENCHMARK(BM_scalar_increment);
static void BM_yield(benchmark::State& state) {
for (auto _ : state)
asm volatile("yield"::);
}
BENCHMARK(BM_yield);
static void BM_isb(benchmark::State& state) {
for (auto _ : state)
asm volatile("isb"::);
}
BENCHMARK(BM_isb);
BENCHMARK_MAIN();
$ g++ -o run a.cc -O2 -lbenchmark -lpthread
$ ./run
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
AWS Graviton2 (Neoverse-N1) processor:
BM_scalar_increment 0.485 ns 0.485 ns 1000000000
BM_yield 0.400 ns 0.400 ns 1000000000
BM_isb 13.2 ns 13.2 ns 52993304
AWS Graviton (A-72) processor:
BM_scalar_increment 0.897 ns 0.874 ns 801558633
BM_yield 0.877 ns 0.875 ns 800002377
BM_isb 13.0 ns 12.7 ns 55169412
Apple Arm64 M1 processor:
BM_scalar_increment 0.315 ns 0.315 ns 1000000000
BM_yield 0.313 ns 0.313 ns 1000000000
BM_isb 9.06 ns 9.06 ns 77259282
```
```
static void BM_pause(benchmark::State& state) {
for (auto _ : state)
asm volatile("pause"::);
}
BENCHMARK(BM_pause);
Intel Skylake processor:
BM_scalar_increment 0.295 ns 0.295 ns 1000000000
BM_pause 41.7 ns 41.7 ns 16780553
```
Tested on Graviton2 aarch64-linux with `./x.py test`.
Diffstat (limited to 'compiler/rustc_codegen_llvm/src/builder.rs')
0 files changed, 0 insertions, 0 deletions
