Edit documentation for `std::{f32,f64}::mul_add`.

Clarifies that using FMA does not guarantee a performance improvement, even when the target architecture supports it natively.
author: Gray Olson <gray@grayolson.com> 2020-09-21 13:07:48 -0700
committer: Gray Olson <gray@grayolson.com> 2020-09-21 13:07:48 -0700
commit: 72a5cbbe81da15e65490b24182907afcbf208aa3 (patch)
tree: f8a3d713348f047cc838212f949c2277e1262790 /library/std/src
parent: b01326ab033e41986d4a5c8b96ce4f40f3b38e30 (diff)
2 files changed, 10 insertions, 4 deletions
diff --git a/library/std/src/f32.rs b/library/std/src/f32.rs
index 59c2da5273b..c97dac69634 100644
--- a/library/std/src/f32.rs
+++ b/library/std/src/f32.rs
@@ -206,8 +206,11 @@ impl f32 {
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
     /// error, yielding a more accurate result than an unfused multiply-add.
     ///
-    /// Using `mul_add` can be more performant than an unfused multiply-add if
-    /// the target architecture has a dedicated `fma` CPU instruction.
+    /// Using `mul_add` *can* be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction. However,
+    /// this is not always true, and care must be taken not to overload the
+    /// architecture's available FMA units when using many FMA instructions
+    /// in a row, which can cause a stall and performance degradation.
     ///
     /// # Examples
     ///
diff --git a/library/std/src/f64.rs b/library/std/src/f64.rs
index bd094bdb55d..1ef34409437 100644
--- a/library/std/src/f64.rs
+++ b/library/std/src/f64.rs
@@ -206,8 +206,11 @@ impl f64 {
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
     /// error, yielding a more accurate result than an unfused multiply-add.
     ///
-    /// Using `mul_add` can be more performant than an unfused multiply-add if
-    /// the target architecture has a dedicated `fma` CPU instruction.
+    /// Using `mul_add` *can* be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction. However,
+    /// this is not always true, and care must be taken not to overload the
+    /// architecture's available FMA units when using many FMA instructions
+    /// in a row, which can cause a stall and performance degradation.
     ///
     /// # Examples
     ///
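
As a small illustration of the "only one rounding error" wording in the doc comments above, the following sketch compares a fused and an unfused evaluation of `x * x + c`. The function name `fused_vs_unfused` and the chosen inputs are illustrative assumptions and are not part of this commit. The example only concerns accuracy; it says nothing about the performance caveat the commit adds.

```rust
/// Hypothetical helper (not part of the commit): evaluates `x * x + c`
/// both unfused and fused, for inputs chosen so that rounding the
/// intermediate product changes the final result.
fn fused_vs_unfused() -> (f64, f64) {
    let eps = f64::EPSILON; // 2^-52, the spacing of f64 values just above 1.0
    let x = 1.0 + eps;

    // Exact product: x * x = 1 + 2*eps + eps^2.
    // The eps^2 term is far below half an ulp at 1.0, so the rounded
    // product is exactly 1 + 2*eps.
    let c = -(1.0 + 2.0 * eps);

    // Unfused: the product is rounded first, then the sum is rounded.
    let unfused = x * x + c;

    // Fused: the exact product is added to `c` and rounded only once,
    // so the eps^2 term survives.
    let fused = x.mul_add(x, c);

    (unfused, fused)
}

fn main() {
    let (unfused, fused) = fused_vs_unfused();
    println!("unfused: {:e}", unfused);
    println!("fused:   {:e}", fused);
}
```

With IEEE 754 double-precision arithmetic, the unfused result is expected to be zero, while the fused result keeps the small `eps * eps` term that the intermediate rounding would otherwise discard.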