% The Rust Testing Guide
# Quick start
To create test functions, add a `#[test]` attribute like this:
~~~
fn return_two() -> int {
    2
}

#[test]
fn return_two_test() {
    let x = return_two();
    assert!(x == 2);
}
~~~
To run these tests, compile with `rustc --test` and run the resulting
binary:
~~~ {.notrust}
$ rustc --test foo.rs
$ ./foo
running 1 test
test return_two_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
~~~
`rustc foo.rs` will *not* compile the tests, since `#[test]` implies
`#[cfg(test)]`. The `--test` flag to `rustc` implies `--cfg test`.
# Unit testing in Rust
Rust has built-in support for simple unit testing. Functions can be
marked as unit tests using the `test` attribute.
~~~
#[test]
fn return_none_if_empty() {
    // ... test code ...
}
~~~
A test function's signature must take no arguments and return no
value. To run the tests in a crate, it must be compiled with the
`--test` flag: `rustc myprogram.rs --test -o myprogram-tests`. Running
the resulting executable will run all the tests in the crate. A test
is considered successful if its function returns normally; if the task
running the test fails, whether through a call to `fail!`, a failed
`assert!` or `assert_eq!`, or some other means, then the test fails.
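For example, this hypothetical test fails because its assertion does
not hold:

~~~
#[test]
fn test_that_fails() {
    // assert_eq! fails the running task when the two values differ,
    // so this test is reported as FAILED
    assert_eq!(1 + 1, 3);
}
~~~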
When compiling a crate with the `--test` flag, `--cfg test` is also
implied, so that tests can be conditionally compiled.
~~~
#[cfg(test)]
mod tests {
    #[test]
    fn return_none_if_empty() {
        // ... test code ...
    }
}
~~~
Additionally, `#[test]` items behave as if they also have the
`#[cfg(test)]` attribute, and will not be compiled when the `--test` flag
is not used.
Tests that should not be run can be annotated with the `ignore`
attribute. The existence of these tests will be noted in the test
runner output, but the tests will not be run. Tests can also be
ignored by configuration: for example, to ignore a test on Windows you
can write `#[ignore(cfg(target_os = "win32"))]`.
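For example (the function name here is hypothetical):

~~~
#[test]
#[ignore]
fn test_not_implemented_yet() {
    // noted in the runner output as "ignored", but not run
}
~~~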
Tests that are intended to fail can be annotated with the
`should_fail` attribute. The test will be run, and if it causes its
task to fail then the test will be counted as successful; otherwise it
will be counted as a failure. For example:
~~~
#[test]
#[should_fail]
fn test_out_of_bounds_failure() {
    let v: ~[int] = ~[];
    v[0];
}
~~~
A test runner built with the `--test` flag supports a limited set of
arguments to control which tests are run: the first free argument
passed to a test runner specifies a filter used to narrow down the set
of tests being run; the `--ignored` flag tells the test runner to run
only tests with the `ignore` attribute.
## Parallelism
By default, tests are run in parallel, which can make interpreting
failure output difficult. In these cases you can set the
`RUST_TEST_TASKS` environment variable to 1 to make the tests run
sequentially.
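For example, assuming the compiled test runner is called `mytests` as
in the examples below:

~~~ {.notrust}
$ RUST_TEST_TASKS=1 ./mytests
~~~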
## Examples
### Typical test run
~~~ {.notrust}
$ mytests
running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... ok
result: ok. 28 passed; 0 failed; 2 ignored
~~~
### Test run with failures
~~~ {.notrust}
$ mytests
running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... FAILED
result: FAILED. 27 passed; 1 failed; 2 ignored
~~~
### Running ignored tests
~~~ {.notrust}
$ mytests --ignored
running 2 tests
running driver::tests::mytest2 ... failed
running driver::tests::mytest10 ... ok
result: FAILED. 1 passed; 1 failed; 0 ignored
~~~
### Running a subset of tests
~~~ {.notrust}
$ mytests mytest1
running 11 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest10 ... ignored
... snip ...
running driver::tests::mytest19 ... ok
result: ok. 11 passed; 0 failed; 1 ignored
~~~
# Microbenchmarking
The test runner also understands a simple form of benchmark execution.
Benchmark functions are marked with the `#[bench]` attribute, rather
than `#[test]`, and have a different form and meaning. They are
compiled along with `#[test]` functions when a crate is compiled with
`--test`, but they are not run by default. To run the benchmark
component of your testsuite, pass `--bench` to the compiled test
runner.
The type signature of a benchmark function differs from a unit test:
it takes a mutable reference to type
`extra::test::BenchHarness`. Inside the benchmark function, any setup
code that should not be timed should execute first, followed by a call
to `iter` on the benchmark harness, passing a closure that contains
the portion of the benchmark whose per-iteration speed you wish to
measure.
For benchmarks relating to processing/generating data, one can set the
`bytes` field to the number of bytes consumed/produced in each
iteration; this will be used to show the throughput of the benchmark.
This must be the amount used in each iteration, *not* the total
amount.
For example:
~~~
extern crate extra;
use std::vec;
use extra::test::BenchHarness;

#[bench]
fn bench_sum_1024_ints(b: &mut BenchHarness) {
    let v = vec::from_fn(1024, |n| n);
    b.iter(|| {v.iter().fold(0, |old, new| old + *new);} );
}

#[bench]
fn initialise_a_vector(b: &mut BenchHarness) {
    b.iter(|| {vec::from_elem(1024, 0u64);} );
    b.bytes = 1024 * 8;
}
~~~
The benchmark runner will calibrate measurement of the benchmark
function to run the `iter` block "enough" times to get a reliable
measure of the per-iteration speed.
Advice on writing benchmarks:

- Move setup code outside the `iter` loop; only put the part you
  want to measure inside (see the sketch after this list)
- Make the code do "the same thing" on each iteration; do not
  accumulate or change state
- Make the outer function idempotent too; the benchmark runner is
  likely to run it many times
- Make the inner `iter` loop short and fast so benchmark runs are
  fast and the calibrator can adjust the run-length at fine
  resolution
- Make the code in the `iter` loop do something simple, to assist in
  pinpointing performance improvements (or regressions)
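As an illustration of the first two rules (a sketch reusing the
`BenchHarness` API from the example above): the input vector is built
once in the outer function, and the closure performs the same pure
fold on every iteration without accumulating state.

~~~
extern crate extra;
use std::vec;
use extra::test::BenchHarness;

#[bench]
fn bench_fold_hoisted_setup(b: &mut BenchHarness) {
    // setup: built once, outside the measured closure
    let v = vec::from_fn(1024, |n| n);
    b.iter(|| {
        // measured: a pure fold that leaves no state behind
        v.iter().fold(0, |old, new| old + *new);
    });
}
~~~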
To run benchmarks, pass the `--bench` flag to the compiled
test-runner. Benchmarks are compiled-in but not executed by default.
~~~ {.notrust}
$ rustc mytests.rs -O --test
$ mytests --bench
running 2 tests
test bench_sum_1024_ints ... bench: 709 ns/iter (+/- 82)
test initialise_a_vector ... bench: 424 ns/iter (+/- 99) = 19320 MB/s
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
~~~
## Benchmarks and the optimizer
Benchmarks compiled with optimizations activated can be dramatically
changed by the optimizer so that the benchmark is no longer
benchmarking what one expects. For example, the compiler might
recognize that some calculation has no external effects and remove
it entirely.
~~~
extern crate extra;
use extra::test::BenchHarness;

#[bench]
fn bench_xor_1000_ints(bh: &mut BenchHarness) {
    bh.iter(|| {
        range(0, 1000).fold(0, |old, new| old ^ new);
    });
}
~~~
gives the following results:
~~~ {.notrust}
running 1 test
test bench_xor_1000_ints ... bench: 0 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~
The benchmarking runner offers two ways to avoid this. First, the
closure that the `iter` method receives can return an arbitrary value,
which forces the optimizer to consider the result used and ensures it
cannot remove the computation entirely. For the example above, this
can be done by adjusting the `bh.iter` call to:
~~~
bh.iter(|| range(0, 1000).fold(0, |old, new| old ^ new))
~~~
Second, you can call the generic `extra::test::black_box` function,
which is an opaque "black box" to the optimizer, forcing it to
consider any argument as used.

~~~
use extra::test::black_box;

bh.iter(|| {
    black_box(range(0, 1000).fold(0, |old, new| old ^ new));
});
~~~
Neither of these approaches reads or modifies the value, and both are
very cheap for small values. Larger values can be passed indirectly to
reduce overhead (e.g. `black_box(&huge_struct)`).
Performing either of the above changes gives the following
benchmarking results:
~~~ {.notrust}
running 1 test
test bench_xor_1000_ints ... bench: 375 ns/iter (+/- 148)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~
However, the optimizer can still modify a test case in an undesirable
manner even when using either of the above. Benchmarks can be checked
by hand by looking at the output of the compiler using the `--emit=ir`
(for LLVM IR) or `--emit=asm` (for assembly) flags, or by compiling
normally and using any method for examining object code.
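For example, one might inspect the generated LLVM IR or assembly like
this (a sketch; the output file names depend on the crate):

~~~ {.notrust}
$ rustc mytests.rs -O --test --emit=ir    # writes LLVM IR for inspection
$ rustc mytests.rs -O --test --emit=asm   # writes assembly for inspection
~~~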
## Saving and ratcheting metrics
When running benchmarks or other tests, the test runner can record
per-test "metrics". Each metric is a scalar `f64` value, plus a noise
value which represents uncertainty in the measurement. By default, all
`#[bench]` benchmarks are recorded as metrics, which can be saved as
JSON in an external file for further reporting.
In addition, the test runner supports _ratcheting_ against a metrics
file. Ratcheting is like saving metrics, except that after each run,
if the output file already exists the results of the current run are
compared against the contents of the existing file, and any regression
_causes the testsuite to fail_. If the comparison passes -- if all
metrics stayed the same (within noise) or improved -- then the metrics
file is overwritten with the new values. In this way, a metrics file
in your workspace can be used to ensure your work does not regress
performance.
Test runners take three options that are relevant to metrics:
- `--save-metrics=<file.json>` will save the metrics from a test run
  to `file.json`
- `--ratchet-metrics=<file.json>` will ratchet the metrics against
  `file.json`
- `--ratchet-noise-percent=N` will override the noise measurements
  in `file.json`, and consider a metric change less than `N%` to be
  noise. This can be helpful if you are testing in a noisy
  environment where the benchmark calibration loop cannot acquire a
  clear enough signal.
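For example, to record metrics from a benchmark run and then ratchet
later runs against them (a sketch using the flags above and the
`mytests` runner from earlier):

~~~ {.notrust}
$ mytests --bench --save-metrics=metrics.json     # record the current numbers
$ mytests --bench --ratchet-metrics=metrics.json  # fail if any metric regresses
~~~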