-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add assembly version of simple operations on aarch64 #459
base: master
Are you sure you want to change the base?
Conversation
5d0075b
to
51718a1
Compare
@Amanieu would you mind double checking the assembly in Cc @hanna-kruppe, while I was working on the others I also replaced the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fmin
/fmax
you can use the fminnm
/fmaxnm
instructions which map to IEEE minNum/maxNum.
Also you will want to make these impls conditional on the fp
target feature so that these are not used on soft-float targets.
However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.
However I am questioning
src/math/arch/aarch64.rs
Outdated
pub fn rint(mut x: f64) -> f64 { | ||
unsafe { | ||
asm!( | ||
"frintx {x:d}, {x:d}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want either:
frintn
to use round-to-nearest, ties to even.frinti
to use the current rounding mode infpcr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, rint
should follow the rounding mode so I guess frinti
is more correct to the C spec. Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually rint
may optionally raise FE_INEXACT so I think frintx
might have worked? Irrelevant for Rust in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only for Rust and should never be picked up by C code, I’d argue for frintn
. Rust does not support other rounding modes nor FP exceptions, and if someone ignores that and e.g. causes UB by configuring a non-default rounding mode then it’s better if they get unexpected results immediately than if it appears to work in simple cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If the symbols from libm-via-compiler_builtins do get picked up by C code compiled with FENV_ACCESS
enabled, then there’s a much bigger problem because none of the Rust code in libm can support that.)
At least some of these are used internally within libm by functions that still need to exist on hard-float targets. For example, floor is used by (Plus the benefits for non-compiler-builtins consumers, who are not the main point of this crate but it’s still nice-to-have.) |
For the operations that are used internally, the ideal end state that we want is for libm to use the float methods from core, which will then be lowered by LLVM to the appropriate instructions. |
Is there any harm in taking the improvement now and revisiting once those methods are actually available in core? |
For aarch64 and arm64ec with Neon, add assembly versions of the following: * `ceil` * `ceilf` * `fabs` * `fabsf` * `floor` * `floorf` * `fma` * `fmaf` * `round` * `roundf` * `sqrt` * `sqrtf` * `trunc` * `truncf` If the `fp16` target feature is available, which implies `neon`, also include the following: * `ceilf16` * `fabsf16` * `floorf16` * `rintf16` * `roundf16` * `sqrtf16` * `truncf16` Additionally, replace `core::arch` versions of the following with handwritten assembly (which avoids issues with `aarch64be`): * `rint` * `rintf` Instructions for `fmax` and `fmin` are also available but seem to provide different results based on whether NaN inputs are signaling or quiet. Our current implementation does not do this, so omit these for now.
My only motivation here is |
For aarch64 and arm64ec with Neon, add assembly versions of the following:
ceil
ceilf
fabs
fabsf
floor
floorf
fma
fmaf
round
roundf
sqrt
sqrtf
trunc
truncf
If the
fp16
target feature is available, which impliesneon
, also include the following:ceilf16
fabsf16
floorf16
rintf16
sqrtf16
truncf16
Additionally, replace
core::arch
versions of the following with handwritten assembly (which avoids issues withaarch64be
):rint
rintf
Instructions for
fmax
andfmin
are also available but seem to provide different results based on whether NaN inputs are signaling or quiet. Our current implementation does not do this, so omit these for now.