Skip to content

Reimplement the generic fmod #536

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

quaternic
Copy link
Contributor

Full reimplementation for fmod, that should be somewhat cleaner. This was showing a decent perf gain locally, but the major improvement will come later by implementing the reduction helper with something smarter, which does involve some tradeoffs and alternatives to consider.

@quaternic
Copy link
Contributor Author

error: consider choosing a more descriptive name
  --> src/math/generic/fmod.rs:21:9
   |
21 |     let _1 = F::Int::ONE;
   |         ^^

Thanks, clippy, very helpful

@tgross35
Copy link
Contributor

tgross35 commented Apr 16, 2025

 icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      hardfloat|hardfloat
  Instructions:                     1027764|1103643              (-6.87532%) [-1.07383x]
  L1 Hits:                          1033134|1105843              (-6.57498%) [-1.07038x]
  L2 Hits:                                3|1                    (+200.000%) [+3.00000x]
  RAM Hits:                              13|11                   (+18.1818%) [+1.18182x]
  Total read+write:                 1033150|1105855              (-6.57455%) [-1.07037x]
  Estimated Cycles:                 1033604|1106233              (-6.56543%) [-1.07027x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      hardfloat|hardfloat
  Instructions:                    30704086|31329937             (-1.99761%) [-1.02038x]
  L1 Hits:                         30720037|31364055             (-2.05336%) [-1.02096x]
  L2 Hits:                                6|11                   (-45.4545%) [-1.83333x]
  RAM Hits:                              24|36                   (-33.3333%) [-1.50000x]
  Total read+write:                30720067|31364102             (-2.05341%) [-1.02096x]
  Estimated Cycles:                30720907|31365370             (-2.05470%) [-1.02098x]
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
  Baselines:                      hardfloat|hardfloat
  Instructions:                       78528|84682                (-7.26719%) [-1.07837x]
  L1 Hits:                            92228|101322               (-8.97535%) [-1.09860x]
  L2 Hits:                                3|1                    (+200.000%) [+3.00000x]
  RAM Hits:                              19|17                   (+11.7647%) [+1.11765x]
  Total read+write:                   92250|101340               (-8.96980%) [-1.09854x]
  Estimated Cycles:                   92908|101922               (-8.84402%) [-1.09702x]
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      hardfloat|hardfloat
  Instructions:                      156024|187367               (-16.7281%) [-1.20089x]
  L1 Hits:                           158776|189568               (-16.2432%) [-1.19393x]
  L2 Hits:                                1|1                    (No change)
  RAM Hits:                              11|10                   (+10.0000%) [+1.10000x]
  Total read+write:                  158788|189579               (-16.2418%) [-1.19391x]
  Estimated Cycles:                  159166|189923               (-16.1945%) [-1.19324x]

Awesome! The cleanup is great too. I'll take a deeper look soon.

Cc @sjrd since you tracked this implementation.

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks great, thanks for working on this. Left some review but it only touches superficial things, and this needs a rebase since unfortunately I gave you a conflict from #535 (edit: and also #533 now, sorry about that).

/// Given the bits of a positive float, clamp the exponent field to [0,1]
fn collapse_exponent<F: Float>(bits: F::Int) -> F::Int {
let sig = bits & F::SIG_MASK;
if sig == bits { sig } else { sig | F::IMPLICIT_BIT }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sig == bits { sig } else { sig | F::IMPLICIT_BIT }
// Set the implicit bit the input was not subnormal
if sig == bits { sig } else { sig | F::IMPLICIT_BIT }

Comment on lines +52 to 54
if num.is_zero() {
F::from_bits(sx)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if num.is_zero() {
F::from_bits(sx)
} else {
if num.is_zero() {
// Return zero with the sign of x
return F::from_bits(sx)
}

May a well early return to get rid of the else branch block.

Comment on lines -1 to -2
/* SPDX-License-Identifier: MIT */
/* origin: musl src/math/fmod.c. Ported to generic Rust algorithm in 2025, TG. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is your own work then could you add /* SPDX-License-Identifier: MIT OR Apache-2.0 */, or leave MIT if it's still pretty derivative? I'm trying to get a bit better track of origins since licensing is pretty mixed.

Also feel free to add something like /* Authored in 2025 by quaternic */. We need to carry around copyrights from Sun in the 90s, so IMO we can afford to give some credit to anyone who authors an original implementation :)

Comment on lines +55 to +61
let ilog = num.ilog2();
let shift = (ey + ilog).min(F::SIG_BITS) - ilog;
let scale = (ey + ilog).saturating_sub(F::SIG_BITS);

ix |= sx;

F::from_bits(ix)
let normalized = num << shift;
let scaled = normalized + (F::Int::cast_from(scale) << F::SIG_BITS);
F::from_bits(sx | scaled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment about the algorithm here?

if sig == bits { sig } else { sig | F::IMPLICIT_BIT }
}

/// Computes (x << e) % y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Computes (x << e) % y
/// Computes `(x << e) % y` exactly without using bigint math

Or something to indicate why this isn't exactly (x << e) % y

Comment on lines 14 to +15
#![allow(clippy::many_single_char_names)]
#![allow(clippy::just_underscores_and_digits)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting nit

Suggested change
#![allow(clippy::many_single_char_names)]
#![allow(clippy::just_underscores_and_digits)]
#![allow(clippy::just_underscores_and_digits)]
#![allow(clippy::many_single_char_names)]

@tgross35
Copy link
Contributor

This repo has been merged into https://github.com/rust-lang/compiler-builtins and is getting archived, so I am going to close this PR; please do reopen it there! I think this should apply cleanly with git format-patch HEAD -1 then patch < /path/to/0001-whatever.patch within compiler-builtins/libm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants