Add assembly version of simple operations on aarch64 #459

Merged
merged 1 commit into rust-lang:master from the aarch64-asm branch on Apr 9, 2025

tgross35
Contributor

@tgross35 tgross35 commented Jan 23, 2025

Replace core::arch versions of the following with handwritten
assembly, which avoids recursion issues (cg_gcc using rint as a
fallback) as well as problems with aarch64be.

  • rint
  • rintf

Additionally, add assembly versions of the following:

  • fma
  • fmaf
  • sqrt
  • sqrtf

If the fp16 target feature is available, which implies neon, also
include the following:

  • rintf16
  • sqrtf16

sqrt is added to match the implementation for x86. fma is included
since it is used by many other routines.

There are a handful of other operations that have assembly
implementations. They are omitted here because we should have basic
float math routines available in core in the near future, which will
allow us to defer to LLVM for assembly lowering rather than implementing
these ourselves.
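
As a rough illustration of what such handwritten routines can look like, here is a minimal sketch for `fma` and `sqrt` on `f64`. The `cfg` gate on `neon`, the chosen `asm!` options, and the portable fallbacks (which defer to `f64::mul_add` and `f64::sqrt` from `std`) are assumptions made so the snippet builds on any target; they are not necessarily what `src/math/arch/aarch64.rs` contains.

```rust
// Sketch only: the cfg gate, asm options, and fallbacks are illustrative assumptions.

/// Fused multiply-add, `x * y + z` with a single rounding.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn fma(mut x: f64, y: f64, z: f64) -> f64 {
    // SAFETY: `fmadd` only touches the named registers and does not access memory.
    unsafe {
        core::arch::asm!(
            "fmadd {x:d}, {x:d}, {y:d}, {z:d}",
            x = inout(vreg) x,
            y = in(vreg) y,
            z = in(vreg) z,
            options(nomem, nostack, pure),
        );
    }
    x
}

/// Square root via the scalar `fsqrt` instruction.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn sqrt(mut x: f64) -> f64 {
    // SAFETY: `fsqrt` only touches the named register and does not access memory.
    unsafe {
        core::arch::asm!(
            "fsqrt {x:d}, {x:d}",
            x = inout(vreg) x,
            options(nomem, nostack, pure),
        );
    }
    x
}

// Portable fallbacks so the sketch also compiles on other targets.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn fma(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn sqrt(x: f64) -> f64 {
    x.sqrt()
}
```

Reusing `x` as an `inout` operand keeps each routine to a single instruction with no extra register moves.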

@tgross35 tgross35 force-pushed the aarch64-asm branch 4 times, most recently from 5d0075b to 51718a1 Compare January 23, 2025 02:12
@tgross35
Contributor Author

@Amanieu would you mind double-checking the assembly in src/math/arch/aarch64.rs? I am unsure whether preserves_flags should be set; I believe some of these operations may set flags based on the exception control register.

Cc @hanna-kruppe, while I was working on the others I also replaced the rint vector implementation.

Member

@Amanieu Amanieu left a comment

For fmin/fmax you can use the fminnm/fmaxnm instructions which map to IEEE minNum/maxNum.

Also you will want to make these impls conditional on the fp target feature so that these are not used on soft-float targets.

However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.

pub fn rint(mut x: f64) -> f64 {
    unsafe {
        asm!(
            "frintx {x:d}, {x:d}",
Member

You want either:

  • frintn to use round-to-nearest, ties to even.
  • frinti to use the current rounding mode in fpcr.
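
To make the first option concrete, here is a minimal sketch of `rint` built on `frintn`. It picks `frintn` purely for illustration and does not claim to be the version that was merged. The `cfg` gate and the non-aarch64 fallback through `f64::round_ties_even` are assumptions added so the snippet builds everywhere. Marking the block `pure` is reasonable for `frintn` because its result depends only on the input, whereas `frinti` also reads the rounding mode from `fpcr`.

```rust
// Sketch only: the cfg gate and the fallback are illustrative assumptions.

/// Round to the nearest integer, with ties going to the even neighbor.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn rint(mut x: f64) -> f64 {
    // SAFETY: `frintn` only touches the named register and does not access memory.
    unsafe {
        core::arch::asm!(
            "frintn {x:d}, {x:d}",
            x = inout(vreg) x,
            options(nomem, nostack, pure),
        );
    }
    x
}

// Portable fallback with the same ties-to-even behavior.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn rint(x: f64) -> f64 {
    x.round_ties_even()
}
```

With ties-to-even, `rint(0.5)` is `0.0`, while `rint(1.5)` and `rint(2.5)` are both `2.0`.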

Contributor Author

Thanks, rint should follow the rounding mode so I guess frinti is more correct to the C spec. Updated.

Contributor Author

Actually rint may optionally raise FE_INEXACT so I think frintx might have worked? Irrelevant for Rust in any case.

Contributor

If this is only for Rust and should never be picked up by C code, I’d argue for frintn. Rust does not support other rounding modes nor FP exceptions, and if someone ignores that and e.g. causes UB by configuring a non-default rounding mode then it’s better if they get unexpected results immediately than if it appears to work in simple cases.

Contributor

(If the symbols from libm-via-compiler_builtins do get picked up by C code compiled with FENV_ACCESS enabled, then there’s a much bigger problem because none of the Rust code in libm can support that.)

@hanna-kruppe
Contributor

hanna-kruppe commented Jan 23, 2025

> However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.

At least some of these are used internally within libm by functions that still need to exist on hard-float targets. For example, floor is used by rem_pio2_large which is needed by many trigonometric functions.

(Plus the benefits for non-compiler-builtins consumers, who are not the main point of this crate but it’s still nice-to-have.)

@Amanieu
Member

Amanieu commented Jan 23, 2025

For the operations that are used internally, the ideal end state that we want is for libm to use the float methods from core, which will then be lowered by LLVM to the appropriate instructions.

@hanna-kruppe
Contributor

Is there any harm in taking the improvement now and revisiting once those methods are actually available in core?

@tgross35 tgross35 force-pushed the aarch64-asm branch 2 times, most recently from 58fec5c to 8703127 Compare January 24, 2025 13:30
@tgross35
Contributor Author

My only motivation here is fma: some of the incoming CORE-math routines rely on it, and I wanted a more accurate icount comparison without soft fma before mul_add is available in core. Nothing else is important; I just included the other simple ops since they are reasonably trivial.
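
For context on why a fast fma matters to such routines, here is a sketch of one common pattern in correctly rounded math code: recovering the rounding error of a product with a single fused multiply-add. The helper name `two_prod` is hypothetical, and whether the CORE-math ports use exactly this pattern is an assumption. The sketch relies on `f64::mul_add` from `std`, which falls back to a much slower software fma when no hardware instruction is available.

```rust
// Sketch only: `two_prod` is a hypothetical helper, not a libm API.

/// Splits `x * y` into a rounded product `hi` and its rounding error `lo`,
/// so that `hi + lo` equals the exact product (barring overflow and underflow).
pub fn two_prod(x: f64, y: f64) -> (f64, f64) {
    // Rounded product.
    let hi = x * y;
    // The fused operation computes x * y - hi with a single rounding,
    // which recovers the part of the product that `hi` lost.
    let lo = x.mul_add(y, -hi);
    (hi, lo)
}
```

For `x = 1.0 + f64::EPSILON` and `y = 1.0 - f64::EPSILON`, the exact product is `1 - 2^-104`: `hi` rounds to `1.0` and `lo` holds the remaining `-2^-104`.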

@tgross35 tgross35 force-pushed the aarch64-asm branch 2 times, most recently from f851cb5 to 1a13caa Compare April 9, 2025 02:29
@tgross35
Contributor Author

tgross35 commented Apr 9, 2025

I dropped most of this change but kept:

  • rint, because the SIMD calls are preexisting and are causing issues with cg_gcc
  • sqrt and fma, because they are used by a lot of other routines. This is mostly for direct users of libm until math in core is stable.

@tgross35 tgross35 merged commit 96d1400 into rust-lang:master Apr 9, 2025
35 checks passed
@tgross35 tgross35 deleted the aarch64-asm branch April 9, 2025 05:46
tgross35 added a commit to tgross35/compiler-builtins that referenced this pull request Apr 9, 2025
Includes [1] and [2], which should resolve problems cg_gcc has using
scalar math operations as a fallback for vector operations.

[1]: rust-lang/libm#459
[2]: rust-lang/libm#534
tgross35 added a commit to rust-lang/compiler-builtins that referenced this pull request Apr 9, 2025
Includes [1] and [2], which should resolve problems cg_gcc has using
scalar math operations as a fallback for vector operations.

[1]: rust-lang/libm#459
[2]: rust-lang/libm#534