| author | Gray Olson <gray@grayolson.com> | 2020-09-21 13:07:48 -0700 |
|---|---|---|
| committer | Gray Olson <gray@grayolson.com> | 2020-09-21 13:07:48 -0700 |
| commit | 72a5cbbe81da15e65490b24182907afcbf208aa3 | |
| tree | f8a3d713348f047cc838212f949c2277e1262790 /library/std/src | |
| parent | b01326ab033e41986d4a5c8b96ce4f40f3b38e30 | |
Edit documentation for `std::{f32,f64}::mul_add`.
Makes it more clear that a performance improvement is not guaranteed when using FMA, even when the target architecture supports it natively.
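The commit does not prescribe a fix for the stall it describes. The sketch below shows one general technique. It is an assumption and does not come from the commit or from the standard library. It assumes the slowdown comes from a long chain of dependent `mul_add` calls, where each FMA must wait for the previous result. Splitting the work across several independent accumulators removes that dependency between consecutive FMAs. The names `dot_single` and `dot_four` are hypothetical. Whether the second version is actually faster depends on the target, the compiler, and the input size, so benchmark it rather than assume it.

```rust
/// Dot product with a single accumulator: every `mul_add` needs the
/// result of the previous one, forming one long dependency chain.
fn dot_single(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .fold(0.0_f64, |acc, (&x, &y)| x.mul_add(y, acc))
}

/// Dot product with four independent accumulators: the four `mul_add`
/// calls in each chunk do not depend on one another, which gives the CPU
/// the opportunity to overlap them. (Hypothetical mitigation, not from std.)
fn dot_four(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0_f64; 4];
    let a_chunks = a.chunks_exact(4);
    let b_chunks = b.chunks_exact(4);
    // Elements left over when the length is not a multiple of 4.
    let (a_rem, b_rem) = (a_chunks.remainder(), b_chunks.remainder());
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for k in 0..4 {
            acc[k] = ca[k].mul_add(cb[k], acc[k]);
        }
    }
    // Combine the partial sums, then fold in the remainder.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (&x, &y) in a_rem.iter().zip(b_rem) {
        sum = x.mul_add(y, sum);
    }
    sum
}

fn main() {
    let a: Vec<f64> = (1..=10).map(|i| i as f64).collect();
    let b = vec![2.0; 10];
    println!("single accumulator: {}", dot_single(&a, &b));
    println!("four accumulators:  {}", dot_four(&a, &b));
}
```

The two functions add the products in a different order. For general inputs, their results can therefore differ in the last few bits. The integer-valued inputs in `main` keep every intermediate exact, so both functions agree there.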
Diffstat (limited to 'library/std/src')

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | library/std/src/f32.rs | 7 |
| -rw-r--r-- | library/std/src/f64.rs | 7 |

2 files changed, 10 insertions, 4 deletions
```diff
diff --git a/library/std/src/f32.rs b/library/std/src/f32.rs
index 59c2da5273b..c97dac69634 100644
--- a/library/std/src/f32.rs
+++ b/library/std/src/f32.rs
@@ -206,8 +206,11 @@ impl f32 {
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
     /// error, yielding a more accurate result than an unfused multiply-add.
     ///
-    /// Using `mul_add` can be more performant than an unfused multiply-add if
-    /// the target architecture has a dedicated `fma` CPU instruction.
+    /// Using `mul_add` *can* be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction. However,
+    /// this is not always true, and care must be taken not to overload the
+    /// architecture's available FMA units when using many FMA instructions
+    /// in a row, which can cause a stall and performance degradation.
     ///
     /// # Examples
     ///
diff --git a/library/std/src/f64.rs b/library/std/src/f64.rs
index bd094bdb55d..1ef34409437 100644
--- a/library/std/src/f64.rs
+++ b/library/std/src/f64.rs
@@ -206,8 +206,11 @@ impl f64 {
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
     /// error, yielding a more accurate result than an unfused multiply-add.
     ///
-    /// Using `mul_add` can be more performant than an unfused multiply-add if
-    /// the target architecture has a dedicated `fma` CPU instruction.
+    /// Using `mul_add` *can* be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction. However,
+    /// this is not always true, and care must be taken not to overload the
+    /// architecture's available FMA units when using many FMA instructions
+    /// in a row, which can cause a stall and performance degradation.
     ///
     /// # Examples
     ///
```
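You can check the doc comment's accuracy claim (one rounding instead of two) directly. The sketch below assumes round-to-nearest IEEE 754 `f64` arithmetic. It also assumes the compiler does not contract `x * x - c` into an FMA on its own; Rust does not do this by default. The helper names `square_minus_unfused` and `square_minus_fused` are hypothetical.

With `x = 1 + ε` (where `ε = f64::EPSILON = 2^-52`), the exact square is `1 + 2ε + ε²`. The unfused product is rounded to `f64` before the subtraction, which loses the `ε²` term. `mul_add` rounds only once, at the end, so it keeps that term.

```rust
/// Computes x*x - c with two roundings: the product is rounded to f64
/// before the subtraction happens.
fn square_minus_unfused(x: f64, c: f64) -> f64 {
    x * x - c
}

/// Computes x*x - c with a single rounding, using fused multiply-add.
fn square_minus_fused(x: f64, c: f64) -> f64 {
    x.mul_add(x, -c)
}

fn main() {
    let eps = f64::EPSILON; // 2^-52
    let x = 1.0 + eps; // exactly representable
    let c = 1.0 + 2.0 * eps; // 1 + 2^-51, also exactly representable

    // The exact value of x*x - c is eps^2 = 2^-104.
    let unfused = square_minus_unfused(x, c);
    let fused = square_minus_fused(x, c);

    // The eps^2 term was lost when x*x was rounded, so this is zero.
    println!("unfused: {unfused:e}");
    // The single final rounding keeps eps^2, which is exactly representable.
    println!("fused:   {fused:e}");
}
```

The inputs are chosen so that `x`, `c`, and the exact result `ε²` are all exactly representable. The only rounding that can differ between the two versions is therefore the one applied to the intermediate product, which makes the comparison deterministic.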
