
Commit 72a5cbb

Edit documentation for std::{f32,f64}::mul_add.
Makes it more clear that a performance improvement is not guaranteed when using FMA, even when the target architecture supports it natively.
1 parent b01326a commit 72a5cbb

2 files changed (+10, -4 lines)

library/std/src/f32.rs

Lines changed: 5 additions & 2 deletions
@@ -206,8 +206,11 @@ impl f32 {
 /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
 /// error, yielding a more accurate result than an unfused multiply-add.
 ///
-/// Using `mul_add` can be more performant than an unfused multiply-add if
-/// the target architecture has a dedicated `fma` CPU instruction.
+/// Using `mul_add` *can* be more performant than an unfused multiply-add if
+/// the target architecture has a dedicated `fma` CPU instruction. However,
+/// this is not always true, and care must be taken not to overload the
+/// architecture's available FMA units when using many FMA instructions
+/// in a row, which can cause a stall and performance degradation.
 ///
 /// # Examples
 ///
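The accuracy claim in this doc comment (one rounding instead of two) can be observed with a small standalone program. This is an illustrative sketch, not part of the commit: the helper names `fused` and `unfused` are hypothetical, and the inputs are chosen so that the exact product `0.1_f32 * 10.0` (which is 1 + 2^-26, because `0.1_f32` is stored as 0.100000001490116...) lies within half an ulp of 1.0.

```rust
/// Hypothetical helper: computes `x * y + z` with a single rounding
/// via `f32::mul_add`.
fn fused(x: f32, y: f32, z: f32) -> f32 {
    x.mul_add(y, z)
}

/// Hypothetical helper: computes `x * y + z` with two roundings. The product
/// is rounded to `f32` before the addition; rustc does not fuse `a * b + c`
/// into an FMA by default.
fn unfused(x: f32, y: f32, z: f32) -> f32 {
    x * y + z
}

fn main() {
    // Exact product of 0.1_f32 and 10.0 is 1 + 2^-26.
    let (x, y, z) = (0.1_f32, 10.0_f32, -1.0_f32);

    // Unfused: 1 + 2^-26 rounds to 1.0 first, so the residual is lost.
    println!("unfused: {:e}", unfused(x, y, z));

    // Fused: the exact product is kept, so the residual 2^-26 survives.
    println!("fused:   {:e}", fused(x, y, z));
}
```

Here the unfused expression yields `0.0`, while `mul_add` returns 2^-26 (about 1.49e-8), the true value of the expression for the stored inputs.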

library/std/src/f64.rs

Lines changed: 5 additions & 2 deletions
@@ -206,8 +206,11 @@ impl f64 {
 /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
 /// error, yielding a more accurate result than an unfused multiply-add.
 ///
-/// Using `mul_add` can be more performant than an unfused multiply-add if
-/// the target architecture has a dedicated `fma` CPU instruction.
+/// Using `mul_add` *can* be more performant than an unfused multiply-add if
+/// the target architecture has a dedicated `fma` CPU instruction. However,
+/// this is not always true, and care must be taken not to overload the
+/// architecture's available FMA units when using many FMA instructions
+/// in a row, which can cause a stall and performance degradation.
 ///
 /// # Examples
 ///
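The new wording warns that a long run of `mul_add` calls is not automatically faster. One related, commonly cited factor is that a loop like a dot product forms a single serial chain of FMAs, where each instruction waits on the previous result. The sketch below contrasts that with splitting the work across four independent accumulators. It is a hypothetical illustration, not something the commit prescribes: the function names are made up, the choice of four chains is arbitrary, and whether either variant is faster on a given target has to be measured. The four-chain version also reorders the additions, so for general inputs its result can differ from the serial one in the last bits.

```rust
/// Hypothetical: one serial chain of fused multiply-adds. Each `mul_add`
/// depends on the accumulator produced by the previous iteration.
fn dot_serial(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}

/// Hypothetical: the same dot product spread over four independent
/// accumulators, so four `mul_add` chains can be in flight at once.
fn dot_four_chains(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0_f64; 4];
    let chunks_a = a.chunks_exact(4);
    let chunks_b = b.chunks_exact(4);
    // Elements left over when the length is not a multiple of 4.
    let rem_a = chunks_a.remainder();
    let rem_b = chunks_b.remainder();
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for i in 0..4 {
            acc[i] = ca[i].mul_add(cb[i], acc[i]);
        }
    }
    // Fold the leftover elements into a separate tail accumulator.
    let mut tail = 0.0_f64;
    for (&x, &y) in rem_a.iter().zip(rem_b) {
        tail = x.mul_add(y, tail);
    }
    // Combine the partial sums pairwise.
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}

fn main() {
    let a: Vec<f64> = (1..=10_i32).map(f64::from).collect();
    let b: Vec<f64> = (1..=10_i32).map(|i| f64::from(i) * 0.5).collect();
    println!("serial:      {}", dot_serial(&a, &b));
    println!("four chains: {}", dot_four_chains(&a, &b));
}
```

With these small integer-valued inputs every intermediate value is exact, so both functions return the same sum; timing differences, if any, only show up on large inputs and must be benchmarked on the target in question.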
