Skip to content

Commit 5ef7fac

Browse files
aaronfranketgross35
andcommitted
Minor tweaks from tgross35
Co-authored-by: Trevor Gross <[email protected]>
1 parent 757e62c commit 5ef7fac

File tree

1 file changed

+9
-6
lines changed

1 file changed

+9
-6
lines changed

text/3453-f16-and-f128.md

+9-6
Original file line numberDiff line numberDiff line change
@@ -13,14 +13,14 @@ This RFC proposes adding new IEEE-compliant floating point types `f16` and `f128
1313

1414
The IEEE 754 standard defines many binary floating point formats. The most common of these types are the binary32 and binary64 formats, available in Rust as `f32` and `f64`. However, other formats are useful in various uncommon scenarios. The binary16 format is useful for situations where storage compactness is important and low precision is acceptable, such as HDR images, mesh quantization, and AI neural networks.[^1] The binary128 format is useful for situations where high precision is needed, such as scientific computing contexts.
1515

16-
The proposal is to add `f16` and `f128` types in Rust to represent IEEE 754 binary16 and binary128 respectively. Having `f16` and `f128` types in the Rust language would make Rust an optimal environment for more advanced use cases. Unlike third-party crates, this enables the compiler to perform optimizations for hardware with native support for these types, allows defining literals for these types, and would provide one single canonical data type for these floats, making it easier to exchange data between libraries.
16+
This RFC proposes adding `f16` and `f128` primitive types in Rust to represent IEEE 754 binary16 and binary128, respectively. Having `f16` and `f128` types in the Rust language would allow Rust to better support the above mentioned use cases, allowing for optimizations and native support that may not be possible in a third party crate. Additionally, providing a single canonical data type for these floating point representations will make it easier to exchange data between libraries.
1717

1818
This RFC does not have the goal of covering the entire IEEE 754 standard, since it does not include `f256` and the decimal-float types. This RFC also does not have the goal of adding existing platform-specific float types such as x86's 80-bit double-extended-precision. This RFC does not make a judgement of whether those types should be added in the future, such discussion can be left to a future RFC, but it is not the goal of this RFC.
1919

2020
# Guide-level explanation
2121
[guide-level-explanation]: #guide-level-explanation
2222

23-
`f16` and `f128` are primitive floating types, they can be used like `f32` or `f64`. They always conform to binary16 and binary128 formats defined in the IEEE 754 standard, which means the size of `f16` is always 16-bit, the size of `f128` is always 128-bit, the amount of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant.
23+
`f16` and `f128` are primitive floating types, they can be used like `f32` or `f64`. They always conform to binary16 and binary128 formats defined in the IEEE 754 standard, which means the size of `f16` is always 16-bit, the size of `f128` is always 128-bit, the amount of exponent and mantissa bits follows the standard, and all operations are IEEE 754-compliant. Float literals of these sizes have `f16` and `f128` suffixes respectively.
2424

2525
```rust
2626
let val1 = 1.0; // Default type is still f64
@@ -83,7 +83,7 @@ impl From<i64> for f128 { /* ... */ }
8383

8484
Conversions from `i128`/`u128` to `f128` will also be available with `as` casts, which allow for truncated conversions.
8585

86-
`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes.
86+
`f128` will generate the `fp128` type in LLVM IR. It is also equivalent to C++ `std::float128_t`, C `_Float128`, and GCC `__float128`. `f128` is ABI-compatible with all of these. `f128` values must be aligned in memory on a multiple of 128 bits, or 16 bytes. LLVM provides support for 128-bit float math operations.
8787

8888
On the hardware level, `f128` can be accelerated on RISC-V via [the Q extension](https://five-embeddev.com/riscv-isa-manual/latest/q.html), on IBM [S/390x G5 and later](https://doi.org/10.1147%2Frd.435.0707), and on PowerISA via [BFP128, an optional part of PowerISA v3.0C and later](https://files.openpower.foundation/s/XXFoRATEzSFtdG8). Most platforms do not have hardware support and therefore will need to use a software implementation.
8989

@@ -111,9 +111,9 @@ Crates may define their type on top of a C binding, but extended float type defi
111111

112112
As noted above, there are crates that provide these types, one for `f16` and one for `f128`. Another prior art to reference is [RFC 1504 for int128](https://rust-lang.github.io/rfcs/1504-int128.html).
113113

114-
Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128`, and C++ has `std::float16_t` and `std::float128_t`. Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143).
114+
Many other languages and compilers have support for these proposed float types. As mentioned above, C has `_Float16` and `_Float128` ([IEC 60559 WG 14 N2601](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2601.pdf)), and C++ has `std::float16_t` and `std::float128_t` ([P1467R9](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html)). Glibc supports 128-bit floats in software on [many architectures](https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=81325b12b14c44887f1633a2c180a413afc2b504#l143). GCC also provides the `libquadmath` library for 128-bit float math operations.
115115

116-
This RFC is designed as a subset of [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451), which proposes adding a variety of float types, including ones not in this RFC designed for interoperability with other languages.
116+
This RFC was split from [RFC 3451], which proposed adding a variety of float types beyond what is in this RFC including interoperability types like `c_longdouble`. The remaining portions [RFC 3451] has since developed into [RFC 3456].
117117

118118
Both this RFC and RFC 3451 are built upon the discussion in [issue 2629](https://github.com/rust-lang/rfcs/issues/2629).
119119

@@ -129,6 +129,9 @@ Several future questions are intentionally left unresolved, and should be handle
129129
# Future possibilities
130130
[future-possibilities]: #future-possibilities
131131

132-
See [RFC 3451](https://github.com/rust-lang/rfcs/pull/3451) for discussion about adding more float types. RFC 3451 is mostly a superset of this RFC.
132+
See [RFC 3456] for discussion about adding more float types including `f80`, `bf16`, and `c_longdouble`, which is an extension of the discussion in [RFC 3451].
133133

134134
[^1]: Existing AI neural networks often use the [16-bit brain float format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) instead of 16-bit half precision, which is a truncated version of 32-bit single precision. This is done to allow performing operations with 32-bit floats and quickly convert to 16-bit for storage.
135+
136+
[RFC 3451]: https://github.com/rust-lang/rfcs/pull/3451
137+
[RFC 3456]: https://github.com/rust-lang/rfcs/pull/3456

0 commit comments

Comments
 (0)