Add assembly version of simple operations on aarch64 #459

tgross35 · 2025-01-23T01:57:50Z

Replace core::arch versions of the following with handwritten
assembly, which avoids recursion issues (cg_gcc using rint as a
fallback) as well as problems with aarch64be.

rint
rintf

Additionally, add assembly versions of the following:

fma
fmaf
sqrt
sqrtf

If the fp16 target feature is available, which implies neon, also
include the following:

rintf16
sqrtf16

sqrt is added to match the implementation for x86. fma is included
since it is used by many other routines.

There are a handful of other operations that have assembly
implementations. They are omitted here because we should have basic
float math routines available in core in the near future, which will
allow us to defer to LLVM for assembly lowering rather than implementing
these ourselves.
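
For readers unfamiliar with the approach, here is a minimal sketch of what a handwritten-assembly `sqrt` along these lines could look like. It is not the code from this PR. The `cfg` condition on `neon` and the fallback to the standard library's `f64::sqrt` are assumptions made so the example builds on any target, and `preserves_flags` is deliberately left out because the discussion below treats it as an open question.

```rust
/// Square root using a single `fsqrt` instruction.
/// Only compiled on aarch64 when the `neon` target feature is enabled.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn sqrt(mut x: f64) -> f64 {
    // SAFETY: `fsqrt` only reads and writes the named register and
    // does not touch memory or the stack.
    unsafe {
        core::arch::asm!(
            // `{x:d}` selects the 64-bit (double) view of the vector register.
            "fsqrt {x:d}, {x:d}",
            x = inout(vreg) x,
            options(nomem, nostack, pure),
        );
    }
    x
}

/// Fallback for every other target. Assumption for this sketch only:
/// defer to the standard library instead of a generic software routine.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn sqrt(x: f64) -> f64 {
    x.sqrt()
}
```

Calling `sqrt(4.0)` returns `2.0` on either path, and a negative input produces NaN.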

tgross35 · 2025-01-23T02:15:32Z

@Amanieu would you mind double-checking the assembly in src/math/arch/aarch64.rs? I am unsure whether preserves_flags should be set; I believe some of these operations may set flags based on the exception control register.

Cc @hanna-kruppe: while working on the others, I also replaced the rint vector implementation.

Amanieu

For fmin/fmax you can use the fminnm/fmaxnm instructions which map to IEEE minNum/maxNum.

Also you will want to make these impls conditional on the fp target feature so that these are not used on soft-float targets.

However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.
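
To illustrate the two suggestions above (using `fminnm` and gating on hardware float support), here is a hedged sketch. It assumes `target_feature = "neon"` is a suitable `cfg` check for hard-float aarch64 targets, and it uses `f64::min` as a stand-in fallback; neither detail is taken from the PR itself.

```rust
/// Minimum of two values using `fminnm`, which returns the numeric
/// operand when exactly one input is a quiet NaN (IEEE minNum).
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn fmin(mut x: f64, y: f64) -> f64 {
    // SAFETY: `fminnm` only reads and writes the named registers.
    unsafe {
        core::arch::asm!(
            "fminnm {x:d}, {x:d}, {y:d}",
            x = inout(vreg) x,
            y = in(vreg) y,
            options(nomem, nostack, pure),
        );
    }
    x
}

/// Fallback used on soft-float or non-aarch64 targets (assumption:
/// `f64::min` stands in for a generic software implementation).
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn fmin(x: f64, y: f64) -> f64 {
    x.min(y)
}
```

With either path, `fmin(f64::NAN, 2.0)` returns `2.0` rather than propagating the NaN.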

Amanieu · 2025-01-23T10:55:17Z

src/math/arch/aarch64.rs

+pub fn rint(mut x: f64) -> f64 {
+    unsafe {
+        asm!(
+            "frintx {x:d}, {x:d}",


You want either:

frintn to use round-to-nearest, ties to even.

frinti to use the current rounding mode in fpcr.

tgross35 · 2025-01-24

Thanks, rint should follow the rounding mode, so I guess frinti is more correct per the C spec. Updated.

tgross35 · 2025-01-24

Actually, rint may optionally raise FE_INEXACT, so I think frintx might have worked? Irrelevant for Rust in any case.

hanna-kruppe · 2025-01-24

If this is only for Rust and should never be picked up by C code, I’d argue for frintn. Rust does not support other rounding modes nor FP exceptions, and if someone ignores that and e.g. causes UB by configuring a non-default rounding mode, then it’s better if they get unexpected results immediately than if it appears to work in simple cases.

hanna-kruppe · 2025-01-24

(If the symbols from libm-via-compiler_builtins do get picked up by C code compiled with FENV_ACCESS enabled, then there’s a much bigger problem because none of the Rust code in libm can support that.)
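
The frintn option discussed in this thread always rounds to nearest with ties to even, regardless of the rounding mode in fpcr. The sketch below shows that behavior. It is an illustration rather than the merged code: the `neon` gate and the fallback to `f64::round_ties_even` (stable since Rust 1.77) are assumptions so that the example runs on any target.

```rust
/// Round to an integral value, to nearest with ties to even, using
/// `frintn`. This instruction ignores the dynamic rounding mode.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn rint(mut x: f64) -> f64 {
    // SAFETY: `frintn` only reads and writes the named register.
    unsafe {
        core::arch::asm!(
            "frintn {x:d}, {x:d}",
            x = inout(vreg) x,
            options(nomem, nostack, pure),
        );
    }
    x
}

/// Fallback with the same ties-to-even semantics (assumption: the
/// standard library method stands in for a software implementation).
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn rint(x: f64) -> f64 {
    x.round_ties_even()
}
```

Ties go to the even neighbor: `rint(0.5)` is `0.0`, while `rint(1.5)` and `rint(2.5)` are both `2.0`.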

hanna-kruppe · 2025-01-23T11:49:32Z

> However I'm then questioning how useful these are on hard-float targets: the standard library will invoke the LLVM intrinsic which will lower to the instruction, so the libm function will never be called. If this is only for compiler-builtins then it might be better to keep libm soft-float only.

At least some of these are used internally within libm by functions that still need to exist on hard-float targets. For example, floor is used by rem_pio2_large which is needed by many trigonometric functions.

(Plus the benefits for non-compiler-builtins consumers, who are not the main point of this crate but it’s still nice-to-have.)

Amanieu · 2025-01-23T15:44:17Z

For the operations that are used internally, the ideal end state that we want is for libm to use the float methods from core, which will then be lowered by LLVM to the appropriate instructions.

hanna-kruppe · 2025-01-23T16:27:06Z

Is there any harm in taking the improvement now and revisiting once those methods are actually available in core?

tgross35 · 2025-01-24T13:46:38Z

My only motivation here is fma: some of the incoming CORE-math routines rely on it, and I wanted a more accurate icount comparison without soft fma before mul_add is available in core. Nothing else is important; I just included the other simple ops since they are reasonably trivial.
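
As background on why routines rely on fma: a fused multiply-add rounds once, so it can recover low-order bits that a separate multiply and add would lose. The sketch below shows a possible `fmadd`-based version, with the `neon` gate and the `f64::mul_add` fallback as assumptions for portability rather than details confirmed by this PR.

```rust
/// Fused multiply-add, `x * y + z` with a single rounding, via `fmadd`.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn fma(mut x: f64, y: f64, z: f64) -> f64 {
    // SAFETY: `fmadd` only reads and writes the named registers.
    unsafe {
        core::arch::asm!(
            // fmadd Dd, Dn, Dm, Da computes Dd = Da + Dn * Dm.
            "fmadd {x:d}, {x:d}, {y:d}, {z:d}",
            x = inout(vreg) x,
            y = in(vreg) y,
            z = in(vreg) z,
            options(nomem, nostack, pure),
        );
    }
    x
}

/// Fallback (assumption: `f64::mul_add` stands in for a software fma).
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn fma(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}
```

For example, with `eps = f64::EPSILON`, `x = 1.0 + eps` and `y = 1.0 - eps`, the expression `x * y - 1.0` evaluates to `0.0` because the product rounds to `1.0`, whereas `fma(x, y, -1.0)` returns the exact value `-(eps * eps)`.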

tgross35 · 2025-04-09T02:39:04Z

I dropped most of this change but kept:

rint, because the SIMD calls are preexisting and are causing issues with cg_gcc
sqrt and fma, because they are used by a lot of other routines. This is mostly for direct users of libm until math in core is stable.

Includes [1] and [2], which should resolve problems cg_gcc has using scalar math operations as a fallback for vector operations.

[1]: rust-lang/libm#459
[2]: rust-lang/libm#534

tgross35 force-pushed the aarch64-asm branch 4 times, most recently from 5d0075b to 51718a1 Compare January 23, 2025 02:12

Amanieu reviewed Jan 23, 2025

tgross35 force-pushed the aarch64-asm branch 2 times, most recently from 58fec5c to 8703127 Compare January 24, 2025 13:30

tgross35 force-pushed the aarch64-asm branch 2 times, most recently from f851cb5 to 1a13caa Compare April 9, 2025 02:29

tgross35 force-pushed the aarch64-asm branch from 1a13caa to 5d4fec0 Compare April 9, 2025 02:48

tgross35 force-pushed the aarch64-asm branch from 5d4fec0 to cd779df Compare April 9, 2025 04:42

tgross35 merged commit 96d1400 into rust-lang:master Apr 9, 2025
35 checks passed

tgross35 deleted the aarch64-asm branch April 9, 2025 05:46

tgross35 mentioned this pull request Apr 9, 2025

Update the libm submodule rust-lang/compiler-builtins#814

Merged

tgross35 mentioned this pull request May 5, 2025

Use rounding instructions on aarch64 rust-lang/compiler-builtins#903

Closed
