Port the CORE-MATH version of cbrt #475

Open
wants to merge 4 commits into
base: master

Contributor

@tgross35 tgross35 commented Jan 24, 2025

Replace our current implementation with one that is correctly rounded.

Source: https://gitlab.inria.fr/core-math/core-math/-/blob/81d447bb1c46592291bec3476bc24fa2c2688c67/src/binary64/cbrt/cbrt.c

ci: allow-regressions

Contributor Author

tgross35 commented Jan 24, 2025

It looks like there is about a 6x slowdown (by instruction count) for targets without hardware FMA:

icount::icount_bench_cbrt_group::icount_bench_cbrt logspace:setup_cbrt()
Performance has regressed: Instructions (177374 > 28650) regressed by +519.106% (>+5.00000)
  Baselines:                      softfloat|softfloat
  Instructions:                      177374|28650                (+519.106%) [+6.19106x]
  L1 Hits:                           209370|31728                (+559.890%) [+6.59890x]
  L2 Hits:                                1|1                    (No change)
  RAM Hits:                              44|10                   (+340.000%) [+4.40000x]
  Total read+write:                  209415|31739                (+559.803%) [+6.59803x]
  Estimated Cycles:                  210915|32083                (+557.404%) [+6.57404x]

I think this is fine as long as the hardfloat result is reasonable. Marked allow-regressions for the softfloat version.
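For reference, the percentage and factor columns in these reports follow from the two raw counts. A minimal sketch of that arithmetic, where `regression` is a hypothetical helper name and not part of the benchmark tooling:

```rust
/// Relative change of `new` against `old`, returned as
/// (percent change, multiplicative factor).
/// Hypothetical helper for illustration only.
fn regression(new: u64, old: u64) -> (f64, f64) {
    let factor = new as f64 / old as f64;
    ((factor - 1.0) * 100.0, factor)
}
```

Calling `regression(177_374, 28_650)` with the softfloat instruction counts gives a factor of roughly 6.19 and a change of roughly +519%, which is where the "about 6x" figure comes from.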

Contributor Author

tgross35 commented Jan 24, 2025

Hm, even with hardware FMA it is about 2.5x slower by instruction count. This is probably still tolerable, and future optimization might be possible.

icount::icount_bench_cbrt_group::icount_bench_cbrt logspace:setup_cbrt()
Performance has regressed: Instructions (72584 > 28650) regressed by +153.347% (>+5.00000)
  Baselines:                      hardfloat|hardfloat
  Instructions:                       72584|28650                (+153.347%) [+2.53347x]
  L1 Hits:                            95112|31730                (+199.754%) [+2.99754x]
  L2 Hits:                                3|0                    (+++inf+++) [+++inf+++]
  RAM Hits:                              29|9                    (+222.222%) [+3.22222x]
  Total read+write:                   95144|31739                (+199.770%) [+2.99770x]
  Estimated Cycles:                   96142|32045                (+200.022%) [+3.00022x]

We only round using nearest, but some incoming code has more handling of
rounding modes that would be nice to `match` on. Rather than checking
integer values, add an enum representation.

With the correctly rounded implementation, we can reduce the ULP
requirement for `cbrt` to zero. There is still an override required for
`i586` because of the imprecise FMA.
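
A ULP requirement of zero means the result must match the reference bit for bit. As a rough illustration of what a ULP distance measures, here is a minimal sketch; `ordered` and `ulp_diff` are hypothetical names, this is not necessarily how the crate's test harness computes it, and NaN inputs are not handled:

```rust
/// Map an f64 onto an integer line on which adjacent finite floats
/// differ by exactly 1. Hypothetical helper for illustration.
fn ordered(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // Floats with the sign bit set sort in reverse bit order, so mirror
    // them below zero; -0.0 and +0.0 both map to 0.
    if bits < 0 { i64::MIN - bits } else { bits }
}

/// Number of representable f64 steps between `a` and `b`.
fn ulp_diff(a: f64, b: f64) -> u64 {
    ordered(a).abs_diff(ordered(b))
}
```

With this metric, a tolerance of zero accepts only `ulp_diff(actual, expected) == 0`.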
@tgross35 tgross35 changed the title from "core-math cbrt" to "Port the CORE-MATH version of cbrt" Jan 25, 2025
@tgross35 tgross35 marked this pull request as ready for review January 25, 2025 00:43
Comment on lines +200 to +210
fn fmaf64(x: f64, y: f64, z: f64) -> f64 {
    #[cfg(intrinsics_enabled)]
    {
        return unsafe { core::intrinsics::fmaf64(x, y, z) };
    }

    #[cfg(not(intrinsics_enabled))]
    {
        return super::fma(x, y, z);
    }
}
Contributor

Would this be better as a method on support::Float, similar to abs and copysign? That way the implementation could be shared between this and other (future) users.

Contributor Author

Yeah, that would be preferable. I just did this as a temporary workaround until f16 and f128 also have an implementation, to keep the impl_float macro a bit simpler.

(I am hoping it will be possible to make this generic by putting the magic numbers in a helper trait and recalculating the polynomials for f128. But I'll get this cleaned up to merge before starting on that).
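
The suggestion above, moving the helper onto a shared float trait, could look roughly like the following sketch. The `Float` trait here is a hypothetical stand-in, not the crate's actual `support::Float`, and `mul_add` from the standard library stands in for the intrinsic and software-fallback pair used in the diff:

```rust
/// Hypothetical stand-in for a shared float trait; the real
/// `support::Float` has many more items.
trait Float: Copy {
    /// Fused multiply-add: `self * y + z` with a single rounding.
    fn fma(self, y: Self, z: Self) -> Self;
}

impl Float for f64 {
    fn fma(self, y: f64, z: f64) -> f64 {
        // Assumption: std's `mul_add` replaces the cfg-gated
        // intrinsic / `super::fma` dispatch shown in the diff.
        f64::mul_add(self, y, z)
    }
}

impl Float for f32 {
    fn fma(self, y: f32, z: f32) -> f32 {
        f32::mul_add(self, y, z)
    }
}

/// Generic callers can then use the operation without naming a type.
fn fma_generic<F: Float>(x: F, y: F, z: F) -> F {
    x.fma(y, z)
}
```

Each additional float type would then need only one more `impl` block, which is the sharing benefit the review comment points at.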

src/math/fenv.rs Outdated
Comment on lines 34 to 37
Nearest = 0,
Downward = 1,
Upward = 2,
ToZero = 3,
Contributor

Suggested change
-    Nearest = 0,
-    Downward = 1,
-    Upward = 2,
-    ToZero = 3,
+    Nearest = FE_TONEAREST as isize,
+    Downward = FE_DOWNWARD as isize,
+    Upward = FE_UPWARD as isize,
+    ToZero = FE_TOWARDZERO as isize,

This keeps the constants specified in one place. It could also be done the other way round, with `const FE_TONEAREST: i32 = Rounding::Nearest as i32` etc., if you prefer.
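
Put together, the suggested form might look like the sketch below. The constant values are the ones from the original enum in this diff, the `i32` type is taken from the alternative mentioned above, and `from_raw` is a hypothetical helper added only to show a `match` on the shared constants:

```rust
// Values as written in the original enum in this diff.
pub const FE_TONEAREST: i32 = 0;
pub const FE_DOWNWARD: i32 = 1;
pub const FE_UPWARD: i32 = 2;
pub const FE_TOWARDZERO: i32 = 3;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rounding {
    // Discriminants reuse the constants, so each value is defined once.
    Nearest = FE_TONEAREST as isize,
    Downward = FE_DOWNWARD as isize,
    Upward = FE_UPWARD as isize,
    ToZero = FE_TOWARDZERO as isize,
}

impl Rounding {
    /// Hypothetical helper: convert a raw mode value, if recognised.
    pub fn from_raw(raw: i32) -> Option<Self> {
        match raw {
            FE_TONEAREST => Some(Self::Nearest),
            FE_DOWNWARD => Some(Self::Downward),
            FE_UPWARD => Some(Self::Upward),
            FE_TOWARDZERO => Some(Self::ToZero),
            _ => None,
        }
    }
}
```

Because the enum and the constants share one definition, `Rounding::ToZero as i32` and `FE_TOWARDZERO` cannot drift apart.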

Contributor Author

Thanks, that is a good idea.

I don't really know what we should or shouldn't be doing to handle rounding modes; there is a moderate amount of untested code in this repo to handle them. I opened #480 if you have any suggestions.

Co-authored-by: beetrees <[email protected]>
Contributor

@beetrees beetrees left a comment

LGTM

Comment on lines +36 to +39
let rm = Rounding::get();

/* rm=0 for rounding to nearest, and other values for directed roundings */
let hx: u64 = x.to_bits();
Contributor

nit: this comment should be next to the `let rm` above, not the `let hx` below. Also, the comment may need modifying now that `rm` is an enum, not an integer.

Contributor Author

@tgross35 tgross35 Jan 25, 2025

Good catch, I will update this. Thank you for reviewing!

Before merging I still want to include the .wc tests from core-math, or maybe download/submodule them similar to musl, since each is ~100k entries.
