It looks like there is about a 5x slowdown for targets without hardware FMA:
I think this is fine as long as the hardfloat performance is reasonable. Marked allow-regressions for the softfloat version.
Hm, even with hardware FMA it is 2x slower. That is probably still tolerable, and future optimization might be possible.
8d3422e to f453c3e
fn fmaf64(x: f64, y: f64, z: f64) -> f64 {
    #[cfg(intrinsics_enabled)]
    {
        return unsafe { core::intrinsics::fmaf64(x, y, z) };
    }

    #[cfg(not(intrinsics_enabled))]
    {
        return super::fma(x, y, z);
    }
}
Would this be better as a method on `support::Float`, similar to `abs` and `copysign`? That way the implementation could be shared between this and other (future) users.
Yeah, that would be preferable. I just did this as a temporary workaround until `f16` and `f128` also have an implementation, to keep the `impl_float` macro a bit simpler.
(I am hoping it will be possible to make this generic by putting the magic numbers in a helper trait and recalculating the polynomials for `f128`. But I'll get this cleaned up to merge before starting on that).
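For illustration, the trait-method idea discussed above could look roughly like the following. This is only a sketch: the `Float` trait here is a hypothetical stand-in for `support::Float`, `fused_mul_add` is an invented caller, and std's `f64::mul_add` is assumed in place of the intrinsic / `super::fma` dispatch shown in the diff.

```rust
/// Hypothetical stand-in for the crate's `support::Float` trait; only the
/// method needed for this sketch is shown (the real trait also carries
/// helpers such as `abs` and `copysign`).
pub trait Float: Copy {
    /// Fused multiply-add: `self * y + z` with a single rounding.
    fn fma(self, y: Self, z: Self) -> Self;
}

impl Float for f64 {
    fn fma(self, y: Self, z: Self) -> Self {
        // Assumption: std's `mul_add` replaces the intrinsic / soft-float
        // selection that the free function `fmaf64` performs.
        f64::mul_add(self, y, z)
    }
}

/// Invented generic caller: it never names a concrete float type, so the
/// same body could serve other widths once they implement `Float`.
pub fn fused_mul_add<F: Float>(x: F, y: F, z: F) -> F {
    x.fma(y, z)
}
```

With `x = 1.0 + f64::EPSILON`, calling `fused_mul_add(x, x, -(1.0 + 2.0 * f64::EPSILON))` keeps the low-order term of the product that a separate multiply and add would round away.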
src/math/fenv.rs
Outdated
    Nearest = 0,
    Downward = 1,
    Upward = 2,
    ToZero = 3,
-    Nearest = 0,
-    Downward = 1,
-    Upward = 2,
-    ToZero = 3,
+    Nearest = FE_TONEAREST as isize,
+    Downward = FE_DOWNWARD as isize,
+    Upward = FE_UPWARD as isize,
+    ToZero = FE_TOWARDZERO as isize,
To keep the constants specified in one place (could also do it the other way round, with `const FE_TONEAREST: i32 = Rounding::Nearest as i32` etc., if you prefer).
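A self-contained sketch of the suggestion, for reference. The constant values are an assumption: they mirror the literal discriminants in the original enum (0 through 3), not any particular platform's `<fenv.h>`. `from_raw` is a hypothetical helper added only to show the constants being reused.

```rust
// Assumed values, copied from the literal discriminants in the diff.
pub const FE_TONEAREST: i32 = 0;
pub const FE_DOWNWARD: i32 = 1;
pub const FE_UPWARD: i32 = 2;
pub const FE_TOWARDZERO: i32 = 3;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rounding {
    // Each discriminant is derived from its constant, so the numeric
    // values are written down in exactly one place.
    Nearest = FE_TONEAREST as isize,
    Downward = FE_DOWNWARD as isize,
    Upward = FE_UPWARD as isize,
    ToZero = FE_TOWARDZERO as isize,
}

impl Rounding {
    /// Hypothetical helper: map a raw `FE_*` value back to the enum.
    pub fn from_raw(raw: i32) -> Option<Self> {
        match raw {
            FE_TONEAREST => Some(Rounding::Nearest),
            FE_DOWNWARD => Some(Rounding::Downward),
            FE_UPWARD => Some(Rounding::Upward),
            FE_TOWARDZERO => Some(Rounding::ToZero),
            _ => None,
        }
    }
}
```

Going the other way round (constants defined from the enum) would work equally well; the point is only that the two representations cannot drift apart.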
Thanks, that is a good idea.
I don't really know what we should or shouldn't be doing to handle rounding modes; there is a moderate amount of untested code in this repo to handle them. I opened https://github.com/rust-lang/libm/issues/480 if you have any suggestions.
LGTM
let rm = Rounding::get();

/* rm=0 for rounding to nearest, and other values for directed roundings */
let hx: u64 = x.to_bits();
nit: this comment should be next to the `let rm` above, not the `let hx` below. Also the comment maybe needs modifying now that `rm` is an enum, not an integer?
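One way the comment and the check could read once `rm` is an enum is sketched below. Everything here is an assumption for illustration: `Rounding::get` is stubbed to always return `Nearest`, and `is_directed` is an invented helper, not code from this PR.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rounding {
    Nearest,
    Downward,
    Upward,
    ToZero,
}

impl Rounding {
    /// Hypothetical stub: always reports round-to-nearest.
    pub fn get() -> Self {
        Rounding::Nearest
    }
}

/// Invented helper showing the enum being matched on instead of
/// comparing against integer values.
pub fn is_directed(rm: Rounding) -> bool {
    // `Nearest` is round-to-nearest; every other variant is a directed
    // rounding mode.
    match rm {
        Rounding::Nearest => false,
        Rounding::Downward | Rounding::Upward | Rounding::ToZero => true,
    }
}
```

Because the `match` is exhaustive, adding a variant later would produce a compile error here rather than silently falling into a default branch.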
Good catch, I will update this. Thank you for reviewing!
Before merging I still want to include the `.wc` tests from core-math. Or maybe download/submodule those similar to musl since each is ~100k entries.
I haven't gotten around to this unfortunately. There is now https://github.com/rust-lang/libm/blob/e66ec88df8325fbe151939c4dc0a9f7c25759fdf/crates/libm-test/src/gen/case_list.rs, it should be reasonably easy to parse `.wc` files there from a core-math submodule.
That being said, our tests are thorough enough that I think I can just merge this now and add that later.
98f76af to c0b4f0a
We only round using nearest, but some incoming code has more handling of rounding modes that would be nice to `match` on. Rather than checking integer values, add an enum representation.
Replace our current implementation with one that is correctly rounded. Source: https://gitlab.inria.fr/core-math/core-math/-/blob/81d447bb1c46592291bec3476bc24fa2c2688c67/src/binary64/cbrt/cbrt.c
With the correctly rounded implementation, we can reduce the ULP requirement for `cbrt` to zero. There is still an override required for `i586` because of the imprecise FMA.
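To make "ULP requirement of zero" concrete, here is a minimal sketch of a ULP-distance check. The helper names `ulp_distance` and `within_ulp` are invented for this example and are not the test harness's API; the sketch only handles finite values of the same sign.

```rust
/// Distance in units in the last place between two finite `f64` values
/// of the same sign. Returns `None` for non-finite inputs or mixed signs,
/// which this sketch does not attempt to handle.
pub fn ulp_distance(a: f64, b: f64) -> Option<u64> {
    if !a.is_finite() || !b.is_finite() || a.is_sign_negative() != b.is_sign_negative() {
        return None;
    }
    // For same-sign finite floats, adjacent values have adjacent bit
    // patterns, so the difference of the raw bits counts the ULPs.
    Some(a.to_bits().abs_diff(b.to_bits()))
}

/// True if `actual` is within `allowed` ULPs of `expected`.
/// `allowed == 0` demands a bit-exact result.
pub fn within_ulp(actual: f64, expected: f64, allowed: u64) -> bool {
    matches!(ulp_distance(actual, expected), Some(d) if d <= allowed)
}
```

Under this definition an allowance of zero accepts only the exact expected value, which is the bar a correctly rounded implementation is meant to clear.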
Port the CORE-MATH version of `cbrt`
Replace our current implementation with one that is correctly rounded.
Source: https://gitlab.inria.fr/core-math/core-math/-/blob/81d447bb1c46592291bec3476bc24fa2c2688c67/src/binary64/cbrt/cbrt.c
ci: allow-regressions