Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations #139277

Open


Labels

A-floating-point, C-bug, T-compiler, T-opsem

DemiMarie

opened

on Apr 2, 2025

Contributor

On some Arm platforms, the FPU is not strictly conformant to IEEE-754. If the FPU is put in strict standards compliance mode, some operations trap to the OS, which must then emulate them. I believe that all operations involving subnormals (before or after rounding) fall into this case. On x86-64, denormals trigger a (very slow) microcode assist on most cores. On at least Metal, the hardware may flush subnormals to zero at its discretion, and I believe GPUs generally allow this.

In these cases, it is impossible to support strict IEEE-754 behavior if one has real-time requirements, or (in the Arm case) if the OS does not include the needed support code. The real-time case is not just theoretical: when doing audio DSP, subnormals correspond to sounds that are so quiet that they can very much safely be flushed to zero, as they are below the threshold of hearing. Violating hard real-time guarantees, however, is extremely noticeable, so (unless I am very mistaken) audio DSP code generally sets the flush-to-zero and denormals-are-zero bits in the FPU. Requiring all audio DSP code to be written in assembler is silly, and nobody is actually going to do that.
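To make the audio case concrete, here is a minimal sketch (not from the thread) of a decaying feedback tail, the kind of signal that drifts into the subnormal range. `flush_subnormal` and `decay_tail` are hypothetical helper names, and the flush is done in software purely for illustration; real DSP code would more likely set the FTZ/DAZ bits in the FPU, as described above.

```rust
/// Hypothetical helper: software flush-to-zero that keeps the sign of the input.
fn flush_subnormal(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Hypothetical helper: repeatedly apply a feedback gain, as a reverb or
/// filter tail would, and count how many steps produce a subnormal value.
/// Returns the final sample and that count.
fn decay_tail(start: f32, gain: f32, steps: usize, flush: bool) -> (f32, usize) {
    let mut y = start;
    let mut subnormal_steps = 0;
    for _ in 0..steps {
        y *= gain;
        if flush {
            y = flush_subnormal(y);
        }
        if y.is_subnormal() {
            subnormal_steps += 1;
        }
    }
    (y, subnormal_steps)
}

fn main() {
    // Halving 1.0 repeatedly falls below f32::MIN_POSITIVE (2^-126) after 126 steps.
    let (last, n) = decay_tail(1.0, 0.5, 140, false);
    println!("IEEE:    last = {last:e}, subnormal steps = {n}");
    let (last, n) = decay_tail(1.0, 0.5, 140, true);
    println!("flushed: last = {last:e}, subnormal steps = {n}");
}
```

Without the flush, every step after the tail crosses `f32::MIN_POSITIVE` operates on a subnormal, which is exactly where the slow paths described above would be hit.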

I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment (it’s part of the platform ABI). It’s worth noting that not having FTZ and DAZ set can be a security vulnerability (denial of service) in code that operates on untrusted input, as it can make processing far, far more expensive than it would be otherwise.


mentioned this

Tracking Issue for float_semantics RFC 3514 #128288


Member

Cc @rust-lang/opsem

All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue.

My first inclination is to say that code which wants "faster" float ops that are e.g. permitted to flush subnormals should use separate operations / types / flags to opt-in to such non-standard behavior. We'll have to emit different LLVM IR for this anyway, using strict FP intrinsics or something like that.

> I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment

Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM).

madsmtm

Contributor

> > I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment
>
> Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM).

FWIW, the only reference to this I could find is https://developer.apple.com/documentation/xcode/writing-armv6-code-for-ios. Rust doesn't support ARMv6, and as I understand it Apple hasn't supported it since iOS 4.2.1. Even then, you might be able to manually enable it, so even if we were to extend support to that platform, we'd "just" have to do this in a dylib entry point or something.

DemiMarie

changed the title from "Some platforms cannot provide strict IEEE-754 conformance due to real-time guarantees and/or hardware limitations" to "Some platforms cannot provide strict IEEE-754 conformant subnormls due to real-time guarantees and/or hardware limitations"

on Apr 3, 2025

DemiMarie

changed the title from "Some platforms cannot provide strict IEEE-754 conformant subnormls due to real-time guarantees and/or hardware limitations" to "Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations"

on Apr 3, 2025

DemiMarie

Contributor, Author

> All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue.

For Metal, Arm, and real-time, I think so. If Rust intends to support Vulkan SPIR-V as a compilation target (which I believe it does), then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values. There is a standardized method to request stronger guarantees, but it can only be used if the implementation supports them, and supporting them is optional.

DemiMarie

changed the title from "Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations" to "Some platforms cannot provide strict IEEE-754 conformant subnormals, infinities, and/or NaNs due to real-time guarantees and/or hardware limitations"

on Apr 3, 2025

DemiMarie

Contributor, Author

Looks like D3D12 does not support infinities and NaNs, so layered implementations of Vulkan on top of it inherit this limitation even when the underlying hardware does not have it.

RalfJung

Member

GPU targets causing everyone a headache, as usual. ;) But those have tons of other problems as well, my understanding is not even pointers work properly there. So in terms of categorizing I would say that is a GPU target issue, not a float semantics issue. I doubt they will ever properly implement Rust semantics so we need some general system for crates to opt-in to support such cursed targets.

bstrie

Contributor

> so we need some general system for crates to opt-in to support such cursed targets

An entirely new set of parallel methods on floats, unsafe { foo.add_cursed(bar) }?

DemiMarie

Contributor, Author

Floating point math is about the most basic thing a GPU can do, so if it is unsafe then so is every GPU kernel. I think it would be better to have a crate-level attribute saying “I’m fine with any of the semantics allowed by SPIR-V,” rather than having to use clumsy operations.

The fraction of computing power on a client system that is not in one of those “cursed targets” (multiple address spaces, weird floating point, etc) is well under 50% (probably more like 20% or less) and dropping fairly quickly. Accelerators are where most of the new compute is nowadays.

CAD97

Contributor

Flush-to-zero at least has a simple definition: whenever a subnormal value would be produced, non-deterministically produce either that value or zero instead. But if we want to say that GPU floating point is not unsafe, we need to be precise as to whether producing an infinity/NaN is undefined (UB, nasal demons, the entire program has no meaning) or merely unspecified (the operation that would produce such a value produces an arbitrary meaningless value instead).

But even if the hardware does something "reasonable," the LLVM semantics for floating point are that they operate in the default IEEE-754 environment, and if the hardware doesn't implement that, it is unsound and can result in arbitrary undefined behavior.
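The non-deterministic flush-to-zero definition above can be read as "each operation has a set of allowed results". The following is a minimal sketch of that reading, not an existing API: `allowed_results` and `is_allowed` are hypothetical names, and keeping the sign of the flushed zero is an assumption, since the definition only says "zero".

```rust
/// Hypothetical model: the results a flush-to-zero-tolerant specification
/// would allow for an operation whose IEEE-754 result is `ieee`.
fn allowed_results(ieee: f32) -> Vec<f32> {
    if ieee.is_subnormal() {
        // Either the exact subnormal, or a zero. Keeping the sign of the
        // zero is an assumption made for this sketch.
        vec![ieee, 0.0_f32.copysign(ieee)]
    } else {
        // Anything that is not subnormal has exactly one allowed result.
        vec![ieee]
    }
}

/// True if `observed` is bit-identical to one of the allowed results.
fn is_allowed(ieee: f32, observed: f32) -> bool {
    allowed_results(ieee)
        .iter()
        .any(|r| r.to_bits() == observed.to_bits())
}

fn main() {
    let tiny = f32::from_bits(0x0040_0000); // 2^-127, a subnormal
    println!("exact subnormal allowed: {}", is_allowed(tiny, tiny));
    println!("flushed zero allowed:    {}", is_allowed(tiny, 0.0));
    println!("1.0 may become 0.0:      {}", is_allowed(1.0, 0.0));
}
```

Under this reading the result is merely unspecified within a two-element set, which is much weaker than undefined behavior.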

> then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values

3.15. FP Fast Math Mode shows that the various operators specify what fast-math contractions are allowed (i.e. NotNaN, NotInf, NSZ, AllowRecip), and this explicitly notes that this enables "fast math operations which are otherwise unsafe." The FunctionFloatControlINTEL capability (SPV_INTEL_float_controls2 extension) also provides the ability to control 3.37. FP Denorm Mode and 3.38. FP Operation Mode.

If the SPIR-V backend isn't specifying -spirv-ext=+SPV_INTEL_float_controls2 to LLVM by default, it probably should be. But other divergences mean that GPU Rust is going to be a nonstandard dialect, because there are other things (like dynamic indirection) that are normal on the CPU but can't be made to work on the GPU without prohibitive compromises.

DemiMarie

changed the title from "Some platforms cannot provide strict IEEE-754 conformant subnormals, infinities, and/or NaNs due to real-time guarantees and/or hardware limitations" to "Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations"

on Apr 5, 2025


DemiMarie

Contributor, Author

Would it make sense to unconditionally set "denormal-fp-math"="dynamic,dynamic"? Does constant folding denormals help on non-esoteric user code? That would avoid the UB.

CAD97

Contributor

The issue generally isn't with known denormal values. It's about values that may or may not be denormal in a context. For example, -0.0 + x can be folded to x under the default fpe, but with "denormal-fp-math"="dynamic,dynamic" this folding should not occur, as the dynamic ftz state will change the result of the addition.

Setting denormal-fp-math to preserve-sign or positive-zero instead seems to allow for either nondeterministic ftz or non-ftz behavior, but the ftz sign mode must match the processor state to avoid UB. preserve-sign would be correct behavior for only setting ftz, but ftz and nsz typically come together.

This is assuming everything works as I have understood the reference document, which isn't a guarantee.
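The folding problem and the two sign modes can be illustrated with a small software model. This is a sketch under stated assumptions, not LLVM's actual semantics: `Flush`, `flush`, and `add_flushed` are hypothetical names, the variants are named after the `preserve-sign` and `positive-zero` attribute values mentioned above, and the model assumes both inputs and the output of an add are flushed.

```rust
/// Hypothetical model of the two flushing modes discussed above.
#[derive(Clone, Copy, Debug)]
enum Flush {
    PreserveSign,
    PositiveZero,
}

/// Replace a subnormal with a zero according to `mode`; pass others through.
fn flush(x: f32, mode: Flush) -> f32 {
    if !x.is_subnormal() {
        return x;
    }
    match mode {
        Flush::PreserveSign => 0.0_f32.copysign(x),
        Flush::PositiveZero => 0.0,
    }
}

/// Assumed model of an add with flushed inputs and a flushed output.
fn add_flushed(a: f32, b: f32, mode: Flush) -> f32 {
    flush(flush(a, mode) + flush(b, mode), mode)
}

fn main() {
    let x = f32::from_bits(1); // smallest positive subnormal

    // Default IEEE environment: -0.0 + x is bit-identical to x,
    // so folding the add away is valid.
    let ieee = -0.0_f32 + x;
    println!("IEEE fold valid: {}", ieee.to_bits() == x.to_bits());

    // With flushing, the add yields a zero, so folding to x changes the result.
    for mode in [Flush::PreserveSign, Flush::PositiveZero] {
        let pos = add_flushed(-0.0, x, mode);
        let neg = add_flushed(-0.0, -x, mode);
        println!("{mode:?}: -0.0 + x = {pos:?}, -0.0 + (-x) = {neg:?}");
    }
}
```

The two modes agree for positive subnormals and disagree for negative ones, which is one way to see why the compiled-in sign mode would have to match the processor state.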

RalfJung

Member

> I’d prefer to avoid a solution that requires littering audio processing code with unsafe or defaulting to a softfloat ABI on targets where subnormals don’t work. In addition, having to save and restore floating point state around every call to libcore would be very bad, especially because so much desugars to libcore function calls.

I agree those are desirable outcomes. However, we also can't penalize code that wants proper subnormal arithmetic on targets that have it -- that must continue to work and receive the full suite of optimizations. So "denormal-fp-math"="dynamic,dynamic" on all code is not an option.

How sound is it to mix code with and without "denormal-fp-math"="dynamic,dynamic"? Hopefully, fully sound. So we could have a -C flag or a per-function or per-crate attribute that compiles to "denormal-fp-math"="dynamic,dynamic". Hopefully, we can get LLVM to agree that setting the ftz flag is fine as long as all code executed while the flag is set is inside functions compiled with "denormal-fp-math"="dynamic,dynamic". That said, since the standard library is not built with that flag, this plan relies on -Zbuild-std (or having a separate ftz-compatible target that we ship a std for).

DemiMarie

Contributor, Author

What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all?

DemiMarie

Contributor, Author

> Setting denormal-fp-math to preserve-sign or positive-zero instead seems to allow for either nondeterministic ftz or non-ftz behavior, but the ftz sign mode must match the processor state to avoid UB. preserve-sign would be correct behavior for only setting ftz, but ftz and nsz typically come together.

Nondeterministic behavior might be okay in at least some applications.

RalfJung

Member

> What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all?

That seems very hard to say, so I'd be uncomfortable making this a stable guarantee. But it'd be for t-libs-api to decide.

DemiMarie

Contributor, Author

Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals, but without the requirement that it match the runtime behavior. Is there any non-contrived situation where this would cause a performance penalty?

RalfJung

Member

> Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals,

You mean constant folding? "compile-time evaluation" sounds like CTFE but I don't see how that would be relevant here.

If we want to allow const-folding of subnormals we have to specify the semantics as non-deterministically doing subnormal flushing or not. I don't know if/how LLVM can represent that.

> Is there any non-contrived situation where this would cause a performance penalty?

I'm the wrong person to answer that question. I know how to make a compiler correct, not how to make it generate fast code. ;)

It could cause correctness issues if code relying on no subnormal flushing calls standard library methods that then do subnormal flushing. So we probably couldn't use just plain non-determinism, we'd have to spell out conditions under which subnormal flushing is guaranteed not to occur.

hanna-kruppe

Contributor

Yeah, just declaring subnormals a non-deterministic free-for-all in the language sucks for code that does want them to work properly.

One ugly solution would be to lift the control register some ISAs have for this to the AM level (as thread-local state). This allows accounting for code that needs a specific mode as well as code that is fine with whatever the current mode is. Changing the mode could be fallible on some platforms where e.g. proper subnormal support would require switching from hard float to soft float. But that AM state opens up the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does. Even ignoring the impact on constant folding etc., you have to start treating all floating point operations as depending on this global state, rather than being pure operations that can be scheduled freely. At least they wouldn’t have side effects in this case (in contrast to non-default exception handling), but LLVM is still poorly prepared for a language where all floating point math works that way.

I still have some hope that Rust will eventually be able to support non-default fpenvs in some way. It’ll require tricky language design decisions, but the blocker of good LLVM support will be resolved eventually, and at least the rounding mode portion is well motivated. Perhaps subnormals can piggy-back off that when it does happen. The challenges at the language level are as similar as those at the LLVM level.
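The idea of lifting the control register into thread-local abstract-machine state can be sketched as a toy model. Everything here is hypothetical and only illustrates the shape of the problem: `SubnormalMode`, `MODE`, `with_mode`, and `mul` are invented names, no real FPU control register is touched, and the model simply always flushes when flushing is enabled.

```rust
use std::cell::Cell;

/// Hypothetical per-thread subnormal mode.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SubnormalMode {
    Ieee,
    FlushToZero,
}

thread_local! {
    // Stands in for the hardware control register in this model.
    static MODE: Cell<SubnormalMode> = Cell::new(SubnormalMode::Ieee);
}

/// Run `f` under `mode`, then restore the previous mode.
/// (A panic inside `f` would skip the restore; a real design would need a guard.)
fn with_mode<R>(mode: SubnormalMode, f: impl FnOnce() -> R) -> R {
    let old = MODE.with(|m| m.replace(mode));
    let result = f();
    MODE.with(|m| m.set(old));
    result
}

/// A multiply that reads the thread-local mode. Because it depends on that
/// state, it is no longer a pure function of its two arguments.
fn mul(a: f32, b: f32) -> f32 {
    let r = a * b;
    let flushing = MODE.with(|m| m.get()) == SubnormalMode::FlushToZero;
    if flushing && r.is_subnormal() {
        0.0_f32.copysign(r)
    } else {
        r
    }
}

fn main() {
    let outside = mul(f32::MIN_POSITIVE, 0.5);
    let inside = with_mode(SubnormalMode::FlushToZero, || mul(f32::MIN_POSITIVE, 0.5));
    println!("outside: {outside:e}, inside: {inside:e}");
}
```

The same call to `mul` gives different results inside and outside `with_mode`, so a compiler could not hoist it across that boundary, which is the scheduling constraint described above.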

RalfJung

Member

> But that AM state opens up the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does.

If the AM state only switches between "guaranteed subnormal preservation" and "non-deterministically either preserve subnormals or flush them", then we can still always const-fold with subnormal preservation. So the only optimizations this affects are the ones that truly need an operation to be deterministic, e.g. scalar evolution. That could still be prohibitive though...

hanna-kruppe

Contributor

That is an interesting idea but yeah I suspect it doesn’t change the calculus because you’d still have to avoid moving “guaranteed subnormals” operations into code regions where the other mode is enabled. That’s probably the biggest social and engineering challenge: migrating the IR and all code touching the IR away from “pure op that can be freely moved around subject only to SSA form’s defs-dominate-uses rule” and towards something like LLVM’s constrained intrinsics (or operand bundles on regular intrinsics) that can express such dependencies at all.
