-
Notifications
You must be signed in to change notification settings - Fork 103
Add assembly version of simple operations on aarch64 #459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5d0075b
to
51718a1
Compare
@Amanieu would you mind double checking the assembly in Cc @hanna-kruppe, while I was working on the others I also replaced the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fmin
/fmax
you can use the fminnm
/fmaxnm
instructions which map to IEEE minNum/maxNum.
Also you will want to make these impls conditional on the fp
target feature so that these are not used on soft-float targets.
However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.
However I am questioning
src/math/arch/aarch64.rs
Outdated
pub fn rint(mut x: f64) -> f64 { | ||
unsafe { | ||
asm!( | ||
"frintx {x:d}, {x:d}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want either:
frintn
to use round-to-nearest, ties to even.frinti
to use the current rounding mode infpcr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, rint
should follow the rounding mode so I guess frinti
is more correct to the C spec. Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually rint
may optionally raise FE_INEXACT so I think frintx
might have worked? Irrelevant for Rust in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only for Rust and should never be picked up by C code, I’d argue for frintn
. Rust does not support other rounding modes nor FP exceptions, and if someone ignores that and e.g. causes UB by configuring a non-default rounding mode then it’s better if they get unexpected results immediately than if it appears to work in simple cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If the symbols from libm-via-compiler_builtins do get picked up by C code compiled with FENV_ACCESS
enabled, then there’s a much bigger problem because none of the Rust code in libm can support that.)
At least some of these are used internally within libm by functions that still need to exist on hard-float targets. For example, floor is used by (Plus the benefits for non-compiler-builtins consumers, who are not the main point of this crate but it’s still nice-to-have.) |
For the operations that are used internally, the ideal end state that we want is for libm to use the float methods from core, which will then be lowered by LLVM to the appropriate instructions. |
Is there any harm in taking the improvement now and revisiting once those methods are actually available in core? |
58fec5c
to
8703127
Compare
My only motivation here is |
f851cb5
to
1a13caa
Compare
I dropped most of this change but kept:
|
Replace `core::arch` versions of the following with handwritten assembly, which avoids recursion issues (cg_gcc using `rint` as a fallback) as well as problems with `aarch64be`. * `rint` * `rintf` Additionally, add assembly versions of the following: * `fma` * `fmaf` * `sqrt` * `sqrtf` If the `fp16` target feature is available, which implies `neon`, also include the following: * `rintf16` * `sqrtf16` `sqrt` is added to match the implementation for `x86`. `fma` is included since it is used by many other routines. There are a handful of other operations that have assembly implementations. They are omitted here because we should have basic float math routines available in `core` in the near future, which will allow us to defer to LLVM for assembly lowering rather than implementing these ourselves.
Includes [1] and [2], which should resolve problems cg_gcc has using scalar math operations as a fallback for vector operations. [1]: rust-lang/libm#459 [2]: rust-lang/libm#534
Includes [1] and [2], which should resolve problems cg_gcc has using scalar math operations as a fallback for vector operations. [1]: rust-lang/libm#459 [2]: rust-lang/libm#534
Replace
core::arch
versions of the following with handwrittenassembly, which avoids recursion issues (cg_gcc using
rint
as afallback) as well as problems with
aarch64be
.rint
rintf
Additionally, add assembly versions of the following:
fma
fmaf
sqrt
sqrtf
If the
fp16
target feature is available, which impliesneon
, alsoinclude the following:
rintf16
sqrtf16
sqrt
is added to match the implementation forx86
.fma
is includedsince it is used by many other routines.
There are a handful of other operations that have assembly
implementations. They are omitted here because we should have basic
float math routines available in
core
in the near future, which willallow us to defer to LLVM for assembly lowering rather than implementing
these ourselves.