Description
On some Arm platforms, the FPU is not strictly conformant to IEEE-754. If the FPU is put in strict standards compliance mode, some operations become traps to the OS, which must then emulate the operation. I believe that all operations involving subnormals (before or after rounding) fall into this case. On x86-64, denormals trigger a (very slow) microcode assist on most cores. On at least Metal, the hardware may flush subnormals to zero at its discretion, and I believe GPUs generally allow this.
In these cases, it is impossible to support strict IEEE-754 behavior if one has real-time requirements, or (in the Arm case) if the OS does not include the needed support code. The real-time case is not just theoretical: when doing audio DSP, subnormals correspond to sounds that are so quiet that they can very much safely be flushed to zero, as they are below the threshold of hearing. Violating hard real-time guarantees, however, is extremely noticeable, so (unless I am very mistaken) audio DSP code generally sets the flush-to-zero and denormals-are-zero bits in the FPU. Requiring all audio DSP code to be written in assembler is silly, and nobody is actually going to do that.
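To make the audio case concrete, here is a minimal sketch of flushing subnormals in software, without touching any FPU control bits. The names `flush_subnormal` and `decay` are illustrative assumptions and not part of any proposal in this thread.

```rust
/// Hypothetical helper: replace a subnormal with a zero of the same sign.
/// This is a software stand-in for hardware flush-to-zero, not an FPU mode change.
fn flush_subnormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// Repeatedly attenuate a sample, as a decaying feedback path would,
/// flushing after every step. Returns the final value.
fn decay(mut sample: f32, gain: f32, steps: usize) -> f32 {
    for _ in 0..steps {
        sample = flush_subnormal(sample * gain);
    }
    sample
}

fn main() {
    // Half of the smallest positive normal f32 is a subnormal.
    let tiny = f32::MIN_POSITIVE / 2.0;
    assert!(tiny.is_subnormal());
    assert_eq!(flush_subnormal(tiny), 0.0);
    // Normal values pass through unchanged.
    assert_eq!(flush_subnormal(1.0), 1.0);
    // A decaying signal ends at exactly zero instead of lingering
    // in the subnormal range.
    assert_eq!(decay(1.0, 0.5, 200), 0.0);
}
```

An explicit flush like this costs a branch per sample, which is presumably why DSP code prefers the hardware mode bits; the sketch only shows the intended numerical effect.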
I believe that at least some versions of iOS enable flush-to-zero by default, so any Rust library with a C API must expect to be called in this environment (it’s part of the platform ABI). It’s worth noting that not having FTZ and DAZ set can be a security vulnerability (denial of service) in code that operates on untrusted input, as it can make processing far, far more expensive than it would be otherwise.
Activity
float_semantics
RFC 3514 #128288
RalfJung commented on Apr 3, 2025
Cc @rust-lang/opsem
All of these are about subnormals, right? If so, would be good to reframe the issue (in particular the title) so that we avoid making this yet another float semantics kitchen sink issue.
My first inclination is to say that code which wants "faster" float ops that are e.g. permitted to flush subnormals should use separate operations / types / flags to opt-in to such non-standard behavior. We'll have to emit different LLVM IR for this anyway, using strict FP intrinsics or something like that.
Oh great, they made all C and Rust code that uses floats UB on their platform then (at least when it is built by LLVM).
madsmtm commented on Apr 3, 2025
FWIW, the only reference to this I could find is https://developer.apple.com/documentation/xcode/writing-armv6-code-for-ios, and Rust doesn't support ARMv6 (and Apple hasn't supported that since iOS 4.2.1 as I understand it) (and even then, you might be able to manually enable it, so even if we were to extend support for that platform, we'd "just" have to do this in a dylib entry point or something).
Title changed to: Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations
DemiMarie commented on Apr 3, 2025
For Metal, Arm, and real-time, I think so. If Rust intends to support Vulkan SPIR-V as a compilation target (which I believe it does), then the Vulkan specification applies, which provides weaker guarantees: infinities, the sign of zero, and NaNs may not be preserved, and infinities and NaNs may become undefined values. There is a standardized method to request stronger guarantees, but it can only be used if the implementation supports them, and supporting them is optional.
Title changed to: Some platforms cannot provide strict IEEE-754 conformant subnormals, infinities, and/or NaNs due to real-time guarantees and/or hardware limitations
DemiMarie commented on Apr 3, 2025
Looks like D3D12 does not support infinities and NaNs, so layered implementations of Vulkan on top of it inherit this limitation even when the underlying hardware does not have it.
RalfJung commented on Apr 4, 2025
bstrie commented on Apr 4, 2025
An entirely new set of parallel methods on floats, `unsafe { foo.add_cursed(bar) }`?
DemiMarie commented on Apr 5, 2025
Floating point math is about the most basic thing a GPU can do, so if it is `unsafe` then so is every GPU kernel. I think it would be better to have a crate-level attribute saying “I’m fine with any of the semantics allowed by SPIR-V,” rather than having to use clumsy operations.
The fraction of computing power on a client system that is not in one of those “cursed targets” (multiple address spaces, weird floating point, etc) is well under 50% (probably more like 20% or less) and dropping fairly quickly. Accelerators are where most of the new compute is nowadays.
CAD97 commented on Apr 5, 2025
Flush-to-zero at least has a simple definition: whenever a subnormal value would be produced, non-deterministically produce either that value or zero instead. But if we want to say that GPU floating point is not `unsafe`, we need to be precise as to whether producing an infinity/NaN is undefined (UB, nasal demons, the entire program has no meaning) or merely unspecified (the operation that would produce such a value produces an arbitrary meaningless value instead).
But even if the hardware does something "reasonable," the LLVM semantics for floating point are that they operate in the default IEEE-754 environment, and if the hardware doesn't implement that, it is unsound and can result in arbitrary undefined behavior.
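The non-deterministic flush-to-zero definition above can be sketched as a set of permitted results per operation. This is only a model for illustration: the function names are hypothetical, and the choice that the flushed zero keeps the sign of the subnormal is an assumption, not something the definition states.

```rust
/// Hypothetical model: the results an operation may return when its exact
/// IEEE-754 result is `exact` and flushing subnormals is permitted.
/// Assumption: the flushed zero keeps the sign of the subnormal.
fn permitted_results(exact: f32) -> Vec<f32> {
    if exact.is_subnormal() {
        vec![exact, 0.0_f32.copysign(exact)]
    } else {
        vec![exact]
    }
}

/// Checks whether an observed result is one of the permitted ones.
/// The comparison is bitwise so that +0.0 and -0.0 are told apart.
fn is_permitted(exact: f32, observed: f32) -> bool {
    permitted_results(exact)
        .iter()
        .any(|r| r.to_bits() == observed.to_bits())
}

fn main() {
    let sub = f32::MIN_POSITIVE / 4.0;
    // Either the subnormal itself or a same-signed zero is acceptable.
    assert!(is_permitted(sub, sub));
    assert!(is_permitted(sub, 0.0));
    assert!(is_permitted(-sub, -0.0));
    // A zero of the wrong sign is not, under the stated assumption.
    assert!(!is_permitted(sub, -0.0));
    // Normal results have exactly one permitted value.
    assert!(!is_permitted(1.5, 0.0));
}
```

An "unspecified value" rule for infinities and NaNs would fit the same shape with a larger result set, whereas UB cannot be modelled as a set of results at all.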
3.15. FP Fast Math Mode shows that the various operators specify what fast-math contractions are allowed (i.e. NotNaN, NotInf, NSZ, AllowRecip), and this explicitly notes that this enables "fast math operations which are otherwise unsafe." The FunctionFloatControlINTEL capability (SPV_INTEL_float_controls2 extension) also provides the ability to control 3.37. FP Denorm Mode and 3.38. FP Operation Mode.
If the SPIR-V backend isn't specifying `-spirv-ext=+SPV_INTEL_float_controls2` to LLVM by default, it probably should be. But other divergences mean that GPU Rust is going to be a nonstandard dialect, because there are other things (like dynamic indirection) that are normal on the CPU but can't be made to work on the GPU without prohibitive compromises.
Title changed to: Some platforms cannot provide strict IEEE-754 conformant subnormals due to real-time guarantees and/or hardware limitations
DemiMarie commented on Apr 9, 2025
Would it make sense to unconditionally set `"denormal-fp-math"="dynamic,dynamic"`? Does constant folding denormals help on non-esoteric user code? That would avoid the UB.
CAD97 commented on Apr 9, 2025
The issue generally isn't with known denormal values. It's about values that may or may not be denormal in a context. For example, `-0.0 + x` can be folded to `x` under the default fpe, but with `"denormal-fp-math"="dynamic,dynamic"` this folding should not occur, as the dynamic ftz state will change the result of the addition.
Setting `denormal-fp-math` to `preserve-sign` or `positive-zero` instead seems to allow for either nondeterministic ftz or non-ftz behavior, but the ftz sign mode must match the processor state to avoid UB. `preserve-sign` would be correct behavior for only setting ftz, but ftz and nsz typically come together.
This is assuming everything works as I have understood the reference document, which isn't a guarantee.
RalfJung commented on Apr 9, 2025
I agree those are desirable outcomes. However, we also can't penalize code that wants proper subnormal arithmetic on targets that have it -- that must continue to work and receive the full suite of optimizations. So `"denormal-fp-math"="dynamic,dynamic"` on all code is not an option.
How sound is it to mix code with and without `"denormal-fp-math"="dynamic,dynamic"`? Hopefully, fully sound. So we could have a `-C` flag or a per-function or per-crate attribute that compiles to `"denormal-fp-math"="dynamic,dynamic"`. Hopefully, we can get LLVM to agree that setting the ftz flag is fine as long as all code executed while the flag is set is inside functions compiled with `"denormal-fp-math"="dynamic,dynamic"`. That said, since the standard library is not built with that flag, this plan relies on `-Zbuild-std` (or having a separate ftz-compatible target that we ship a std for).
DemiMarie commented on Apr 9, 2025
What about building the standard library with that flag? Does the standard library include any code that would be penalized by it significantly, or even at all?
DemiMarie commented on Apr 9, 2025
Nondeterministic behavior might be okay in at least some applications.
RalfJung commented on Apr 9, 2025
That seems very hard to say, so I'd be uncomfortable making this a stable guarantee. But it'd be for t-libs-api to decide.
DemiMarie commented on Apr 9, 2025
Given the audio situation I think it would be better to allow compile-time constant-evaluation of subnormals, but without the requirement that it match the runtime behavior. Is there any non-contrived situation where this would cause a performance penalty?
RalfJung commented on Apr 9, 2025
You mean constant folding? "compile-time evaluation" sounds like CTFE but I don't see how that would be relevant here.
If we want to allow const-folding of subnormals we have to specify the semantics as non-deterministically doing subnormal flushing or not. I don't know if/how LLVM can represent that.
I'm the wrong person to answer that question. I know how to make a compiler correct, not how to make it generate fast code. ;)
It could cause correctness issues if code relying on no subnormal flushing calls standard library methods that then do subnormal flushing. So we probably couldn't use just plain non-determinism, we'd have to spell out conditions under which subnormal flushing is guaranteed not to occur.
hanna-kruppe commented on Apr 9, 2025
Yeah, just declaring subnormals a non-deterministic free-for-all in the language sucks for code that does want them to work properly.
One ugly solution would be to lift the control register some ISAs have for this to the AM level (as thread-local state). This allows accounting for code that needs a specific mode as well as code that is fine with whatever the current mode is. Changing the mode could be fallible on some platforms where e.g. proper subnormal support would require switching from hard float to soft float. But that AM state opens up the same Pandora's box for an optimizing compiler as any other deviation from “default fpenv everywhere, changing it is UB” does. Even ignoring the impact on constant folding etc., you have to start treating all floating point operations as depending on this global state, rather than being pure operations that can be scheduled freely. At least they wouldn’t have side effects in this case (in contrast to non-default exception handling), but LLVM is still poorly prepared for a language where all floating point math works that way.
I still have some hope that Rust will eventually be able to support non-default fpenvs in some way. It’ll require tricky language design decisions, but the blocker of good LLVM support will be resolved eventually, and at least the rounding mode portion is well motivated. Perhaps subnormals can piggy-back off that when it does happen. The challenges at the language level are as similar as those at the LLVM level.
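The idea of lifting the subnormal mode into thread-local abstract-machine state can be modelled in ordinary Rust. This is a toy sketch under stated assumptions: `SubnormalMode`, `with_mode`, and `mul` are hypothetical names, the flush is done in software and deterministically, and nothing here corresponds to an existing or proposed Rust API.

```rust
use std::cell::Cell;

/// Hypothetical abstract-machine state, one value per thread.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SubnormalMode {
    Preserve,
    Flush,
}

thread_local! {
    static MODE: Cell<SubnormalMode> = Cell::new(SubnormalMode::Preserve);
}

/// Runs `f` with the given mode and restores the previous mode afterwards.
/// (Not panic-safe; a real design would need a guard.)
fn with_mode<R>(mode: SubnormalMode, f: impl FnOnce() -> R) -> R {
    let old = MODE.with(|m| m.replace(mode));
    let result = f();
    MODE.with(|m| m.set(old));
    result
}

/// A multiplication that reads the thread-local mode. It is no longer
/// a pure function of its operands, which is the cost described above.
fn mul(a: f32, b: f32) -> f32 {
    let r = a * b;
    let flush = MODE.with(|m| m.get()) == SubnormalMode::Flush;
    if flush && r.is_subnormal() {
        0.0_f32.copysign(r)
    } else {
        r
    }
}

fn main() {
    let a = f32::MIN_POSITIVE;
    // Same operands, different results depending on the surrounding mode.
    let preserved = mul(a, 0.5);
    let flushed = with_mode(SubnormalMode::Flush, || mul(a, 0.5));
    assert!(preserved.is_subnormal());
    assert_eq!(flushed, 0.0);
    // The previous mode is back in effect after the closure returns.
    assert!(mul(a, 0.5).is_subnormal());
}
```

Because `mul` depends on `MODE`, a compiler could not move a call across the `with_mode` boundary without changing its result, which is the scheduling restriction the comment describes.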
RalfJung commented on Apr 9, 2025
If the AM state only switches between "guaranteed subnormal preservation" and "non-deterministically either preserve subnormals or flush them", then we can still always const-fold with subnormal preservation. So the only optimizations this affects are the ones that truly need an operation to be deterministic, e.g. scalar evolution. That could still be prohibitive though...
hanna-kruppe commented on Apr 9, 2025
That is an interesting idea but yeah I suspect it doesn’t change the calculus because you’d still have to avoid moving “guaranteed subnormals” operations into code regions where the other mode is enabled. That’s probably the biggest social and engineering challenge: migrating the IR and all code touching the IR away from “pure op that can be freely moved around subject only to SSA form’s defs-dominate-uses rule” and towards something like LLVM’s constrained intrinsics (or operand bundles on regular intrinsics) that can express such dependencies at all.