Rounding mode hacking discussion #2

KloudKoder · 2023-02-15T18:55:03Z

This is a placeholder issue for discussion of tasks required for phase 1 testing of new instructions with embedded rounding mode (RM).

KloudKoder · 2023-03-02T02:03:15Z

Ping @rossberg @sunfishcode @titzer @tlively <- You're all more experienced with WASM than I am. I'd like to hear your recommendations for how to hack this up. Can we maybe decide on some tasks to distribute? For starters, I'm willing to edit the instruction descriptions in the WASM manual if someone can point me to which doc I need to edit. Invite whoever else you see fit.

tlively · 2023-03-02T02:29:38Z

The first step would be to create a new proposals/rounding-mode-control/Overview.md document and link to it from the top-level README.md. It would be good to check out the overview docs on some of the other proposals to get an idea for the kind of content that usually goes in there, but that would be the best place to list the details for the newly proposed instructions at this stage.

rossberg · 2023-03-02T05:49:47Z

Here are pointers to relevant documentation:

See the howto for setting up a proposal repo properly.
As @tlively said, you can peek at other proposals for inspiration on how to write the initial overview doc.
See the phases doc for how to advance a proposal forward and what the deliverables are for each phase.

> Can we maybe decide on some tasks to distribute? For starters, I'm willing to edit the instruction descriptions

Just to set expectations: by default, the champion of a proposal is the main person responsible for most of the work that is described in the above docs. Of course, they can try to get other interested people on board to distribute that work. But everybody is very busy, so don't expect too much. Often the only help champions get is that other folks do code reviews. Either way, better plan to spend substantial amounts of time on it. ;)

The only work for which you usually have to depend on others is the implementation in production engines required in a later phase. For that, you basically need engine vendors prioritising the proposal high enough that they start working on incorporating it. That can take considerable time, but if the proposal is in a good shape then it will happen.

KloudKoder · 2023-03-03T01:49:48Z

@tlively Thanks, got it. I'll post an update here when I have an overview markdown to share. I'll try to follow the format of the existing proposals you linked.

@rossberg Thanks for the links, so at least I have a "map" now. I've literally never compiled anything to WASM so yeah, this could be a long learning curve. So long, in fact, that I think it would be preferable to take up Conrad Watt's offer to hack up a bunch of switch statements for the sake of quickly turning every float arithmetic instruction into an RM-sensitive one.

You there, @conrad-watt ? If so, what do you suggest? Have I misunderstood what you were proposing?

conrad-watt · 2023-03-03T02:23:59Z

To be explicit, I was imagining supporting a source/IR-level "ambient mode + op" set of intrinsics by lowering to a Wasm-level

global $rounding_mode i32

function $fesetround ...
  set $rounding_mode

function $add_float
  if $rounding_mode = 1
    then add_ceil
  elif $rounding_mode = 2
    then add_floor
  ...

In this example (with apologies for the suspicious pseudo-code), source-level fesetround + add_float intrinsics are compiled to Wasm by effectively software-emulating the fpenv.
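To make the lowering concrete, here is a minimal C++ sketch of the same idea. The helpers `add_ceil`, `add_floor` and `add_trunc` are hypothetical stand-ins for the proposed per-instruction rounding ops (emulated here with the host's `<cfenv>`), and `g_rounding_mode` / `set_rounding_mode` are hypothetical names playing the role of `$rounding_mode` / `$fesetround`. The numbering follows the pseudo-code (1 = ceil, 2 = floor); 0 = nearest and 3 = trunc are assumptions added for completeness.

```cpp
#include <cfenv>

// Hypothetical stand-in for a proposed per-instruction rounding op:
// temporarily installs `mode`, and volatile keeps the compiler from
// folding the addition or moving it outside that window.
static float add_with_mode(float a, float b, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile float va = a, vb = b;
  volatile float r = va + vb;
  std::fesetround(saved);
  return r;
}

float add_ceil(float a, float b)  { return add_with_mode(a, b, FE_UPWARD); }
float add_floor(float a, float b) { return add_with_mode(a, b, FE_DOWNWARD); }
float add_trunc(float a, float b) { return add_with_mode(a, b, FE_TOWARDZERO); }

// Software-emulated "ambient" mode, mirroring the global $rounding_mode:
// 0 = nearest, 1 = ceil, 2 = floor, 3 = trunc (hypothetical numbering).
static int g_rounding_mode = 0;

// Plays the role of the source-level fesetround intrinsic.
void set_rounding_mode(int mode) { g_rounding_mode = mode; }

// Plays the role of $add_float: dispatch on the ambient mode.
float add_float(float a, float b) {
  switch (g_rounding_mode) {
    case 1:  return add_ceil(a, b);
    case 2:  return add_floor(a, b);
    case 3:  return add_trunc(a, b);
    default: return a + b;  // plain f32.add, round to nearest even
  }
}
```

After `set_rounding_mode(1)`, `add_float(1.0f, 1.4e-45f)` returns the float just above 1.0 (bit pattern 0x3F800001). The cost of this design is a branch on the ambient mode in every arithmetic op, which is the price of emulating the fpenv in software.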

This was meant to be a relatively low-effort suggestion to get programs with such intrinsics compiling to Wasm if we went with instruction-level rounding modes as currently proposed. However, this isn't something I have the bandwidth or expertise to engineer myself.

KloudKoder · 2023-03-04T02:24:37Z

@conrad-watt Thanks, I understand. I think it's a good hack for a proof-of-concept. I could just copy the rounding mode from the opcode byte directly into your $rounding_mode.

But in order to do that... is there any particular assembler which any of you would recommend for use with WASM, mainly for this sort of low-level test hacking? It would be easier if I could just bypass the usual runtime bloat of C++, Rust, etc. Specifically, I'm looking for the ability to write individual WASM instructions (and, ultimately, modify the assembler in order to encode the proposed new ones so that a hacked version of WASM itself could execute them). An internet search basically just pointed me back to WASM.

tlively · 2023-03-06T05:53:49Z

A couple options would be to add the new instructions to WABT to use its wat2wasm tool or Binaryen to use its wasm-as tool. You could also implement the instructions in the reference interpreter itself, right here in this repo. Here's a PR adding instructions to Binaryen that you could look at to get a feel for what changes would be necessary in that code base: WebAssembly/binaryen#5214.

KloudKoder · 2023-03-07T02:35:31Z

@tlively Thanks! Seems like I ought to be able to cobble something together from all that.

conrad-watt · 2023-03-09T13:00:32Z

Just one more thought - to reduce your exposure to the Wasm-specific parts of the compilation it may make sense to directly expose the hypothetical "per-instruction rounding" Wasm instructions as intrinsics, and then do the strategy sketched above (#2 (comment)) as more of a source-to-source or generic IR-to-IR translation on code written in the fesetround style.

KloudKoder · 2023-03-11T02:08:39Z

@conrad-watt Thanks, I think I see what you mean. So basically connect the source code to the compiled code with macro middleware that implements a "set rounding mode" instruction. That makes sense.

conrad-watt · 2023-03-11T09:03:58Z

Of course, if you know of any code that is already directly written in terms of "per-instruction rounding mode" intrinsics, that would be even better!

KloudKoder · 2023-04-24T13:50:02Z

Progress update... Finally got started on this in earnest. I've decided to go the WABT route, hoping to implement RMs in the interpreter. At present I'm still trying to figure out where the relevant instructions are actually implemented. For example, if I search for "f64.sqrt", I see some tests for it along with some SIMD implementation stuff, but I don't see the basic floating-point implementation either as an interpretation function (which literally executes it) or a compiler function (which injects machine code into an output stream). Surely it's here somewhere but it's probably obscured by renaming. Same goes for other instructions impacted by RM. If I can't make progress on this soon, I guess I'll give up and try my luck with Binaryen.

dschuff · 2023-04-24T20:38:27Z

For adding new instructions to the interpreter, there are a few layers to go through. For instructions like unary operators, the dispatching is done in https://github.com/WebAssembly/wabt/blob/main/src/interp/interp.cc#L1141 with some of the implementations bottoming out in the corresponding header (https://github.com/WebAssembly/wabt/blob/b335131b5983c055bae8e7d0e6724ea556aa843d/include/wabt/interp/interp-math.h#L250) From the top end, you'd also need to add opcodes (https://github.com/WebAssembly/wabt/blob/main/include/wabt/opcode.h) and fill in some layers in between. I'd suggest posting an issue in the WABT repo and you can get some more specific help.

KloudKoder · 2023-04-27T00:49:33Z

@dschuff Thanks! Based on those filenames you provided, I was able to track down the problem: my search script was silently truncating the result list so they never showed up. Problem between keyboard and chair. So, yeah, I think I have enough to chew on now, and will inquire with WABT folks as needed.

whirlicote · 2023-08-02T19:23:06Z

I have a first attempt at implementing a WebAssembly module that realizes the proposed instructions in WebAssembly/design#1456 (comment).

https://gitlab.com/pauldennis/rounding-fiasco/-/blob/main/RoundingFiasco.wasm?ref_type=heads

> wasmtime RoundingFiasco.wasm --invoke f32_add_floor 1 1065353216 # 1.0e-45 + 1.0
1
> wasmtime RoundingFiasco.wasm --invoke f32_add_ceil 1 1065353216 # 1.0e-45 + 1.0
1.0000001

KloudKoder · 2023-08-06T21:16:01Z

@whirlicote All I see is a 13 MB WASM file. Care to elaborate? If you've really implemented all those instructions, even in a mockup form, that would be a very big deal, so I'm probably misunderstanding.

whirlicote · 2023-08-07T08:56:23Z

Sure, @KloudKoder.

WebAssembly has a binary format and a text format for WebAssembly modules. The file extension for binary files is *.wasm, and RoundingFiasco.wasm is such a WebAssembly module. A WebAssembly module has exports and imports; with the wasm2wat tool you can have a look at the exports (and imports). In the browser, built-in JavaScript functionality can load this file and create a so-called WebAssembly module instance. This instance exports JavaScript functions according to the export statements in the wasm file.

The exported functions have the same signatures as the proposed intrinsics, and the module exports all of the proposed rounding functions. I also tried out some of the exported functions with a tool called wasmtime.

The module implements the hard part of the instructions: if you call the exports with ordinary finite numbers (e.g. 23.023 in double precision format), the exported functions work.

The whole thing is a "first try" because:

- There are edge cases missing, e.g. NaN + NaN won't work, and x / 0.0 does not work currently.
- Currently I do not have tests for truncation, as I did not need them for my projects (yet?).
- I am going to only use the floor variations of the binary operators for performance reasons (changing the rounding mode is very expensive on consumer hardware).
  - I need square-root rounding up though, since I don't know how to emulate that with square-root rounding down.
- The first 100000 times or so, you can call the exported functions just fine. After that the module wants to do some internal garbage collection. In this case the module calls some imports to find out the current time and the like to do some statistics. For my projects I am going to just throw away the instance and compile the module again, as the module has no state of interest.

KloudKoder · 2023-08-08T01:38:27Z

@whirlicote Well cool! Unlike some other folks lurking around here, I'm distinctly unqualified to evaluate your code. However, I'm willing and able to help test it (and especially to assist with weird cases like square roots, which in the interval world emerge from series truncation). But this would require extensive communication that's probably too much for this thread. If that's of interest to you, then reply with some throwaway email or chat ID where I might contact you, as GitHub doesn't seem to allow me to do that on the platform. If not, feel free to continue your discussion here.

whirlicote · 2023-08-08T16:47:22Z

> Unlike some other folks lurking around here, I'm distinctly unqualified to evaluate your code.

Several questions arise:

How should the rounding instructions be implemented in the reference implementation? There are several possible approaches. One approach is to only allow the reference implementation to compile for CPUs that have the corresponding instructions. For example, in C++ adding two floats with rounding down is as "easy" as something like this:

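#include <cfenv>
#include <stdexcept>
#pragma STDC FENV_ACCESS ON

#ifndef FE_DOWNWARD
  #error "FE_DOWNWARD not supported"
#endif

float f32_add_floor(float left, float right)
{
  const int originalRoundingMode = std::fegetround();

  // fesetround() returns 0 on success
  if (std::fesetround(FE_DOWNWARD) != 0)
  {
    throw std::runtime_error("could not set rounding mode");
  }

  // Some compilers (e.g. GCC) ignore FENV_ACCESS; volatile keeps the
  // addition from being constant-folded or moved outside the FE_DOWNWARD region
  volatile float l = left;
  volatile float r = right;
  volatile float result = l + r;

  std::fesetround(originalRoundingMode);

  return result;
}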

Another option might be to find an expert who is familiar with a suitable softfloat implementation that supports rounding modes and use that. Also, certain compilers have flags that emulate the floating-point instructions in software. These flags are usually used for hardware that has no floating-point arithmetic at all.

Then there are questions concerning the behavior of NaNs. According to Wikipedia, a possible anatomy of a single-precision NaN is as follows:

s111 1111 1axx xxxx xxxx xxxx xxxx xxxx

- s is the sign bit (yes, NaNs have a sign).
- a==1: the NaN is a quiet NaN.
- a==0: the NaN is a signaling NaN.
- x is the payload. The payload cannot be all zero when a==0 (as then the number would be one of ±Infinity).
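A small sketch of how a test harness might pull these fields out of a raw f32 bit pattern, following the diagram above (sign, 8 exponent bits, the quiet bit `a`, then 22 payload bits `x`). The names `F32Fields`, `decode_f32`, `is_nan` and `f32_bits` are illustrative only.

```cpp
#include <cstdint>
#include <cstring>

// Fields of the layout s | eeeeeeee | a | xxxxxxxxxxxxxxxxxxxxxx
struct F32Fields {
  bool sign;               // s
  std::uint32_t exponent;  // 8 bits, all ones for NaN and Infinity
  bool quiet;              // a: 1 = quiet NaN, 0 = signaling NaN
  std::uint32_t payload;   // x: the low 22 fraction bits
};

F32Fields decode_f32(std::uint32_t bits) {
  F32Fields f;
  f.sign = (bits >> 31) != 0;
  f.exponent = (bits >> 23) & 0xFFu;
  f.quiet = ((bits >> 22) & 1u) != 0;
  f.payload = bits & 0x3FFFFFu;
  return f;
}

// NaN: all-ones exponent and a nonzero fraction. With a == 0 the payload
// must be nonzero, otherwise the pattern is +/-Infinity.
bool is_nan(const F32Fields& f) {
  return f.exponent == 0xFFu && (f.quiet || f.payload != 0);
}

// Reinterpret a float as its raw IEEE 754 bits.
std::uint32_t f32_bits(float x) {
  std::uint32_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}
```

For example, 0x7FC00008 decodes as a positive quiet NaN with payload 8, while 0x7F800000 (quiet bit clear, zero payload) is +Infinity.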

The inputs of a binary operation could both be NaN. Should the left or the right payload be chosen for the payload of the resulting NaN? How should signaling NaNs be handled?

To answer these questions there are several strategies:

- Check what current hardware is doing. Standardize all important variations. Make the behavior implementation-defined. Define a recommendation.
- Check what current hardware is doing. Find the best approach. Make it so that the most important/interesting hardware implements the instruction with native performance. This strategy influences future decisions about standardizing the currently implementation-defined behavior of the existing floating-point operations.

As for the circle of blame:

- hardware manufacturer: How can a (not necessarily fast) implementation of this proposal be made as easy as possible for the authors of the production engines? Having a working WebAssembly module like RoundingFiasco.wasm might help. Are there better options?
- users: How can a use case be implemented faster (on popular consumer hardware) than with the otherwise superior softfloat implementations? One approach might be to ask production engines for performance improvements when a module uses only rounding-down (or only rounding-up) floating-point instructions. Interval arithmetic might be sped up that way.
- compiler authors: currently no hurdle here

> However, I'm willing and able to help test it

I will try to fix any testcase you provide.

> [...] But this would require extensive communication [...] GitHub doesn't seem to allow me to do that on the platform [...]

The repository for the development of RoundingFiasco.wasm is hosted on GitLab (https://gitlab.com/pauldennis/rounding-fiasco). It has an issue tracker (https://gitlab.com/pauldennis/rounding-fiasco/-/issues), and I think anyone with a GitLab account can open tickets there.

KloudKoder · 2023-08-09T02:48:51Z

@whirlicote So I can provide a few hints with respect to your questions:

Regarding your f32_add_floor(), seems like you meant FE_DOWNWARD rather than FE_TOWARDZERO.

The NaN corner cases are currently the subject of debate. Therefore I don't think this needs to be a blocking issue for the first iteration of your code.

As to performance optimization, first of all, any silicon such as Intel that doesn't integrate the rounding mode (RM) into the instruction itself is just badly bottlenecked, so there's only so much that can be done by way of mitigation. One option is to reorder instructions so as to minimize the total number of RM switches. But this implies potentially inefficient reordering of memory accesses in a manner which might confuse stride detection logic, resulting in performance losses that exceed the gains, especially when most of the data ends up uncached. So not reordering might be superior overall, and even in such cases, we could still elide all redundant RM switching (like setting FE_DOWNWARD over and over again). And in the long term, once threading is commonplace and the frontside bus is often saturated, we'll have plenty more time to fiddle with the RM control register. Fortunately, the vast majority of functions will never change RM at all, in which case we won't see any negative impact from these new instructions. All considered, this is going to end up being a complex optimization problem requiring extensive profiling of what's actually being run in the wild (which itself is a moving target influenced by our own optimization). We don't necessarily need to hit maximum performance on the first pass.
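The "elide all redundant RM switching" idea can be sketched as a tiny cache that remembers the last mode it installed. This is a hypothetical helper (not taken from any engine), and it assumes nothing else touches the floating-point environment while it is in use.

```cpp
#include <cfenv>

// Hypothetical helper: tracks the rounding mode it last installed so that a
// run of operations sharing one RM (e.g. FE_DOWNWARD set over and over
// again) pays for a single fesetround() call.
class RoundingModeCache {
 public:
  RoundingModeCache() : current_(std::fegetround()) {}

  // Returns false only if the host rejects the requested mode.
  bool set(int mode) {
    if (mode == current_) return true;  // redundant switch elided
    if (std::fesetround(mode) != 0) return false;
    current_ = mode;
    ++switches_;
    return true;
  }

  // Number of real fesetround() calls issued so far.
  int switches() const { return switches_; }

 private:
  int current_;
  int switches_ = 0;
};
```

A code generator would call `set()` before each affected instruction; only the first of several consecutive same-mode requests reaches the control register.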

As to the proposal of only implementing those instructions which are actually supported in hardware, I don't think you have to worry. I'm pretty sure all silicon which can implement WASM code is capable of handling the 4 proposed RMs without having to resort to full-blown software emulation. (Some SIMD FP operations had to be removed in order to ensure this.)

As to testing, I'm thinking of something like this: There's a test mode (even a lousy command line app or the equivalent in the browser) which allows the user to input a pair of f32s or f64s and apply an instruction to them in the context of a given RM. So for example I could take the ratio of a pair of f64s and round the result toward positive infinity. Hex in, hex out. Then one could write a rather repetitive bash script to do the hundreds of required corner case tests and check every single corresponding result down to the last bit. Vastly better than ping-ponging case by case here in the discussion. But it would, obviously, require some crude UI to be constructed upfront. GitLab tickets would be a viable but last resort.
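A minimal sketch of the "hex in, hex out" core such a test mode could wrap, assuming the host's `<cfenv>` as the reference for correctly rounded results. `run_f64_case` and its single-character operator codes are hypothetical; only the four binary f64 operations are covered (sqrt would be analogous).

```cpp
#include <cfenv>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

static double bits_to_f64(std::uint64_t b) { double d; std::memcpy(&d, &b, sizeof d); return d; }
static std::uint64_t f64_to_bits(double d) { std::uint64_t b; std::memcpy(&b, &d, sizeof b); return b; }

// Applies one f64 binary op under the host rounding mode `fe_mode`
// (FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO, FE_TONEAREST). Operands and the
// result are raw IEEE 754 bit patterns written as 16 hex digits.
std::string run_f64_case(char op, int fe_mode, const std::string& lhs_hex, const std::string& rhs_hex) {
  const double lhs = bits_to_f64(std::strtoull(lhs_hex.c_str(), nullptr, 16));
  const double rhs = bits_to_f64(std::strtoull(rhs_hex.c_str(), nullptr, 16));
  const int saved = std::fegetround();
  std::fesetround(fe_mode);
  volatile double a = lhs, b = rhs;  // volatile: evaluate at run time, inside the mode window
  volatile double r = 0.0;
  switch (op) {
    case '+': r = a + b; break;
    case '-': r = a - b; break;
    case '*': r = a * b; break;
    case '/': r = a / b; break;
    default:  std::fesetround(saved); return "unknown op";
  }
  std::fesetround(saved);
  char out[17];
  std::snprintf(out, sizeof out, "%016llx", static_cast<unsigned long long>(f64_to_bits(r)));
  return std::string(out);
}
```

For instance, 1.0 / 3.0 (`3ff0000000000000` / `4008000000000000`) gives `3fd5555555555555` when rounded down and `3fd5555555555556` when rounded up, which is exactly the kind of last-bit difference a script would check.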

whirlicote · 2023-08-09T11:18:09Z

> Regarding your f32_add_floor(), seems like you meant FE_DOWNWARD rather than FE_TOWARDZERO.

I corrected the code example in the edit.

@KloudKoder I exported a testcase function that can be used with wasmtime on the command line.

A testcase is invoked as follows:

wasmtime RoundingFiasco.wasm --invoke testcase $FUNCTION_ID $SUPPOSED_TO_BE_OUTPUT $LEFT_ARGUMENT $MAYBE_USED_RIGHT_ARGUMENT

FUNCTION_ID encodes the input types, the used operator, the rounding mode, etc. The mapping is defined here: https://gitlab.com/pauldennis/rounding-fiasco/-/blob/bdf556f1233f07ed5bdcaee8863fc79fdb33b754/RoundingFiasco.hs#L101

All numbers are encoded as i64 values holding the raw IEEE 754 bit pattern of the float, which wasmtime accepts on the command line as ordinary decimal integers. For example, the f32 0.0 is represented as 0, and the smallest positive subnormal number (printed as 1.0e-45) is represented as 1.

-0.0 becomes 2147483648 for float and 9223372036854775808 for double which has bit pattern 10000000000000000000000000000000 for float and 1000000000000000000000000000000000000000000000000000000000000000 for double.
NaN with payload 8 becomes 2143289352
3.7433924e-23 becomes 439682292
1.0 becomes 4607182418800017408
3.0 becomes 4613937818241073152
0.33333333333333337 becomes 4599676419421066582
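Inferred from the examples above (not from RoundingFiasco's source), a test generator could produce these arguments by reinterpreting each float as its raw bit pattern, with f32 bits zero-extended into the i64 slot. `f64_arg` and `f32_arg` are illustrative names.

```cpp
#include <cstdint>
#include <cstring>

// Raw IEEE 754 bits of an f64, as an unsigned 64-bit integer.
std::uint64_t f64_arg(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Raw bits of an f32, zero-extended into the 64-bit argument.
std::uint64_t f32_arg(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}
```

Printing these values in decimal reproduces the list: `f32_arg(-0.0f)` is 2147483648, `f64_arg(1.0)` is 4607182418800017408 and `f64_arg(3.0)` is 4613937818241073152. The round-to-nearest quotient 1.0 / 3.0 encodes one below 4599676419421066582, i.e. 0.33333333333333337 is the upward-rounded result.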

For example taking the squareRoot of 0.0 results in 0.0:

wasmtime RoundingFiasco.wasm --invoke testcase 0 0 0 0

For example taking the squareRoot of 1.0e-45 results in 3.7433924e-23:

wasmtime RoundingFiasco.wasm --invoke testcase 0 439682292 1 0

For example taking the ratio of 1.0 and 3.0 gives 0.33333333333333337:

wasmtime RoundingFiasco.wasm --invoke testcase 27 4599676419421066582 4607182418800017408 4613937818241073152

The result of the testcase is indicated by the exit code: 0 means success, and anything other than 0 means the check failed.

wasmtime RoundingFiasco.wasm --invoke testcase 27 0 4607182418800017408 4613937818241073152
echo $?

A simple test script could then be written like this:

#!/bin/sh
set -e  # stop at the first testcase that exits with a nonzero code

wasmtime RoundingFiasco.wasm --invoke testcase 0 0 0 0
# the expected value 0 is wrong here, so this line fails and set -e ends the script
wasmtime RoundingFiasco.wasm --invoke testcase 27 0 4607182418800017408 4613937818241073152
wasmtime RoundingFiasco.wasm --invoke testcase 27 4599676419421066582 4607182418800017408 4613937818241073152

hope that helps

KloudKoder · 2023-08-11T02:11:42Z

@whirlicote Really impressive! And yes, I get the representation as decimal from i32 or i64, as well as your packed function type representation. That'll do. I'm thinking the best way to find out what results your tests should return is to go back to ancient 8087 assembly language and implement all of the relevant WASM instructions using some obvious corner cases (denormals, signed zeroes, infinities, etc.). Then manually set the precision control (PC) and rounding control (RC) fields of the control word before each instruction. Then finally print out all the results using your same decimal format, and decorate them into commands of the above format. So probably a C wrapper on 8087 assembly that dumps tons of test cases in your particular format. Then literally just post them here so you can run them on your end and see if anything fails.
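As a lighter-weight variant of that plan, here is a sketch of a generator that emits one command line in whirlicote's testcase format. It swaps the 8087 for the host's default floating-point unit via `<cfenv>` (typically SSE2 on x86-64, which already rounds to f64 precision, so no PC setting is needed). `f64_div_testcase` is a hypothetical name, and the function ID is passed through untouched: the real mapping lives in RoundingFiasco.hs, and 27 below is only reused from the 1.0 / 3.0 example in the previous comment.

```cpp
#include <cfenv>
#include <cstdint>
#include <cstring>
#include <string>

static std::uint64_t bits_of(double x) {
  std::uint64_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

// Computes lhs / rhs under the host rounding mode `fe_mode` and formats a
// wasmtime command: testcase <function_id> <expected> <left> <right>,
// with every number written as its raw bit pattern in decimal.
std::string f64_div_testcase(int function_id, int fe_mode, double lhs, double rhs) {
  const int saved = std::fegetround();
  std::fesetround(fe_mode);
  volatile double a = lhs, b = rhs;  // volatile: divide at run time, inside the mode window
  volatile double q = a / b;
  std::fesetround(saved);
  return "wasmtime RoundingFiasco.wasm --invoke testcase " + std::to_string(function_id) + " " +
         std::to_string(bits_of(q)) + " " + std::to_string(bits_of(lhs)) + " " +
         std::to_string(bits_of(rhs));
}
```

Looping such a generator over a table of corner-case operands would produce the long list of commands described above.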

Unfortunately I'm slammed for the next couple weeks but I will work on this as I can, unless you have an easier approach.

whirlicote · 2023-08-20T19:57:43Z

I wrote a C++ implementation of the proposed instructions using fesetround() from <fenv.h>, similar to #2 (comment).

KloudKoder · 2023-08-25T14:55:58Z

For the record I'm working offline on this with whirlicote.

whirlicote · 2023-08-30T22:03:39Z

I added rounded instructions into the reference interpreter.

The repository is here https://github.com/whirlicote/rounding-mode-control

The diff can be seen here:

whirlicote@7789e4b

KloudKoder · 2023-09-02T02:29:48Z

I'm working on automated test cases for whirlicote's code. In the process I realized that, for clarity, I should consolidate the proposed opcode list here, imported from the original issue linked above. I removed opcodes 0x31 and 0x32 because they were accidental duplicates of 0x2C and 0x2D.

1C f32.sgn (32-Bit Get Sign Bit (to i32))
1D i32.asgn_f32 (32-Bit Get Arithmetic Sign (-1/0/1))
1E f64.sgn (64-Bit Get Sign Bit (to i32))
1F i32.asgn_f64 (64-Bit Get Arithmetic Sign (-1/0/1))
20 00 f32.sqrt_ceil
20 01 f32.sqrt_floor
20 02 f32.sqrt_trunc
21 00 f32.add_ceil
21 01 f32.add_floor
21 02 f32.add_trunc
22 00 f32.sub_ceil
22 01 f32.sub_floor
22 02 f32.sub_trunc
23 00 f32.mul_ceil
23 01 f32.mul_floor
23 02 f32.mul_trunc
24 00 f32.div_ceil
24 01 f32.div_floor
24 02 f32.div_trunc
25 00 f64.sqrt_ceil
25 01 f64.sqrt_floor
25 02 f64.sqrt_trunc
26 00 f64.add_ceil
26 01 f64.add_floor
26 02 f64.add_trunc
27 00 f64.sub_ceil
27 01 f64.sub_floor
27 02 f64.sub_trunc
28 00 f64.mul_ceil
28 01 f64.mul_floor
28 02 f64.mul_trunc
29 00 f64.div_ceil
29 01 f64.div_floor
29 02 f64.div_trunc
2A 00 f32.convert_i32_ceil_s
2A 01 f32.convert_i32_floor_s
2A 02 f32.convert_i32_trunc_s
2B 00 f32.convert_i32_ceil_u
2B 01 f32.convert_i32_floor_u
2B 02 f32.convert_i32_trunc_u
2C 00 f64.convert_i64_ceil_s
2C 01 f64.convert_i64_floor_s
2C 02 f64.convert_i64_trunc_s
2D 00 f64.convert_i64_ceil_u
2D 01 f64.convert_i64_floor_u
2D 02 f64.convert_i64_trunc_u
2E 00 f32.demote_f64_ceil
2E 01 f32.demote_f64_floor
2E 02 f32.demote_f64_trunc
2F 00 f64.convert_i32_ceil_s
2F 01 f64.convert_i32_floor_s
2F 02 f64.convert_i32_trunc_s
30 00 f64.convert_i32_ceil_u
30 01 f64.convert_i32_floor_u
30 02 f64.convert_i32_trunc_u

KloudKoder · 2023-09-02T22:55:55Z

Sorry, let's try that again. whirlicote realized that the real problem was that opcodes 2C and 2D were showing an f64 output when in fact it should have been f32. Given that, the corresponding f64 forms are in fact distinct instructions. However, if you think about it, opcodes 2F and 30 in the foregoing comment are redundant because i32, whether signed or unsigned, will always convert to f64 with no loss of information (on account of 53-bit precision), so rounding is irrelevant. Putting it all together, the corrected map would look like this:

1C f32.sgn (32-Bit Get Sign Bit (to i32))
1D i32.asgn_f32 (32-Bit Get Arithmetic Sign (-1/0/1))
1E f64.sgn (64-Bit Get Sign Bit (to i32))
1F i32.asgn_f64 (64-Bit Get Arithmetic Sign (-1/0/1))
20 00 f32.sqrt_ceil
20 01 f32.sqrt_floor
20 02 f32.sqrt_trunc
21 00 f32.add_ceil
21 01 f32.add_floor
21 02 f32.add_trunc
22 00 f32.sub_ceil
22 01 f32.sub_floor
22 02 f32.sub_trunc
23 00 f32.mul_ceil
23 01 f32.mul_floor
23 02 f32.mul_trunc
24 00 f32.div_ceil
24 01 f32.div_floor
24 02 f32.div_trunc
25 00 f64.sqrt_ceil
25 01 f64.sqrt_floor
25 02 f64.sqrt_trunc
26 00 f64.add_ceil
26 01 f64.add_floor
26 02 f64.add_trunc
27 00 f64.sub_ceil
27 01 f64.sub_floor
27 02 f64.sub_trunc
28 00 f64.mul_ceil
28 01 f64.mul_floor
28 02 f64.mul_trunc
29 00 f64.div_ceil
29 01 f64.div_floor
29 02 f64.div_trunc
2A 00 f32.convert_i32_ceil_s
2A 01 f32.convert_i32_floor_s
2A 02 f32.convert_i32_trunc_s
2B 00 f32.convert_i32_ceil_u
2B 01 f32.convert_i32_floor_u
2B 02 f32.convert_i32_trunc_u
2C 00 f32.convert_i64_ceil_s
2C 01 f32.convert_i64_floor_s
2C 02 f32.convert_i64_trunc_s
2D 00 f32.convert_i64_ceil_u
2D 01 f32.convert_i64_floor_u
2D 02 f32.convert_i64_trunc_u
2E 00 f32.demote_f64_ceil
2E 01 f32.demote_f64_floor
2E 02 f32.demote_f64_trunc
2F 00 f64.convert_i64_ceil_s
2F 01 f64.convert_i64_floor_s
2F 02 f64.convert_i64_trunc_s
30 00 f64.convert_i64_ceil_u
30 01 f64.convert_i64_floor_u
30 02 f64.convert_i64_trunc_u
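The argument for dropping the i32-to-f64 variants can be checked mechanically: a conversion is exact (so no rounding mode can affect it) exactly when converting back recovers the original integer. This sketch uses illustrative function names; it shows that every 32-bit integer fits in f64's 53-bit significand, while f32's 24 bits do not, which is why `f32.convert_i32_*` keeps its rounding variants.

```cpp
#include <cstdint>

// Round-trip checks. The way back goes through int64_t so that values
// rounded just past the 32-bit range (e.g. 2147483647 -> 2147483648.0f)
// do not overflow during the check.
bool i32_exact_in_f64(std::int32_t v) { return static_cast<std::int64_t>(static_cast<double>(v)) == v; }
bool u32_exact_in_f64(std::uint32_t v) { return static_cast<std::int64_t>(static_cast<double>(v)) == v; }
bool i32_exact_in_f32(std::int32_t v) { return static_cast<std::int64_t>(static_cast<float>(v)) == v; }
bool u32_exact_in_f32(std::uint32_t v) { return static_cast<std::int64_t>(static_cast<float>(v)) == v; }
```

For example, 16777217 (2^24 + 1) converts exactly to f64 but not to f32, where it lands on 16777216 under round-to-nearest.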

whirlicote · 2023-09-11T21:21:01Z

Thanks to the test cases provided by KloudKoder, the RoundingFiasco.wasm WebAssembly module now considers edge cases for rounded instructions such as:

squareRoot (-0.0) == (-0.0)
biggest_finite_float +_ceil smallest_positive_float == +Infinity
1.0 /_ceil (-0.0) == -Infinity but not == smallest_finite_float
123.0 +_not_floor (-123.0) == 0.0 but 123.0 +_floor (-123.0) == -0.0
123.0 -_not_floor 123.0 == 0.0 but 123.0 -_floor 123.0 == -0.0
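These edge cases can be cross-checked against the host's own IEEE 754 arithmetic. The sketch below assumes `<cfenv>` as the reference; `f32_rounded`, `Op` and the zero-sign helpers are hypothetical names, and `biggest_finite_float` / `smallest_positive_float` model the names used in the list.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

const float biggest_finite_float = std::numeric_limits<float>::max();
const float smallest_positive_float = std::numeric_limits<float>::denorm_min();

enum class Op { Add, Sub, Div, Sqrt };

// Evaluates one f32 operation with the host rounding mode temporarily set to
// `mode`. volatile keeps the compiler from constant-folding the operation or
// moving it outside that window. `b` is ignored for Op::Sqrt.
float f32_rounded(Op op, float a, float b, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile float va = a, vb = b;
  volatile float r = 0.0f;
  switch (op) {
    case Op::Add:  r = va + vb; break;
    case Op::Sub:  r = va - vb; break;
    case Op::Div:  r = va / vb; break;
    case Op::Sqrt: r = std::sqrt(static_cast<float>(va)); break;
  }
  std::fesetround(saved);
  return r;
}

bool is_pos_zero(float x) { return x == 0.0f && !std::signbit(x); }
bool is_neg_zero(float x) { return x == 0.0f && std::signbit(x); }
```

The sign of an exact zero sum is the subtle one: IEEE 754 gives +0.0 for x + (-x) in every rounding direction except rounding toward negative infinity, which gives -0.0.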

whirlicote · 2023-09-18T17:03:23Z

The test cases provided by KloudKoder pass on the fork of the reference implementation that implements the proposal:

https://github.com/whirlicote/wabt/tree/rounding-proposal

KloudKoder · 2023-11-21T18:35:28Z

From the 11/21/2023 meeting, we have 2 immediate tasks per Deepti and Conrad, respectively:

- Put more detail into the overview document about the performance considerations of the initial implementation and potential subsequent optimization. This mainly comes down to: (a) Hardware with per-instruction RM embedding (looking at RISC-V) can actually utilize its existing instructions, which are presently ignored by WASM. (b) Hardware with global RM registers will be nonperformant with naive RM switching on every affected instruction, but will become more performant with future RM-switch minimization due to (i) better adherence to the Intel recommendation of fewer unique modes, resulting in faster RM switching even with the same number of switches, and (ii) toolchain flags resulting in the negate-ceil-negate trick. We probably won't ever need to do instruction reordering if the latter gets widely implemented, because there will be essentially no RM switches left to remove.
- Get some wide-impact libraries on board, ideally the likes of V8, but CGAL or Inari might be constructive as well. Deepti noted that this isn't a requirement for phase 2 entry, but would become a roadblock to further advancement.
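The thread does not spell out the negate-ceil-negate trick; presumably it refers to the identity floor(a + b) = -ceil((-a) + (-b)). Negation is exact, so a floor-rounded add can be built from a ceil-rounded one and a toolchain only ever has to install one non-default mode. A minimal sketch, with `add_ceil` as a hypothetical `<cfenv>`-based stand-in for the proposed `f32.add_ceil`:

```cpp
#include <cfenv>
#include <cmath>

// Stand-in for a per-instruction "ceil" add, emulated with <cfenv>;
// volatile keeps the addition inside the FE_UPWARD window.
float add_ceil(float a, float b) {
  const int saved = std::fegetround();
  std::fesetround(FE_UPWARD);
  volatile float va = a, vb = b;
  volatile float r = va + vb;
  std::fesetround(saved);
  return r;
}

// Rounding toward -inf mirrors rounding toward +inf:
// floor(a + b) == -ceil((-a) + (-b)).
float add_floor_via_ceil(float a, float b) { return -add_ceil(-a, -b); }

// True for -0.0 only; used to check the sign of exact zero results.
bool is_negative_zero(float x) { return x == 0.0f && std::signbit(x); }
```

The mirror also gets signed zeros right: 123.0 + (-123.0) becomes -(+0.0) = -0.0, as floor requires. Multiplication and division mirror the same way by negating a single operand. As whirlicote noted, the square root has no obvious equivalent, so sqrt still needs both directions.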

whirlicote · 2024-01-17T09:33:53Z

@dtig In the last meeting regarding rounding variants there was a request to get appraisal from the V8 project regarding the rounding variants proposal. Where/how should we get in touch with the V8 developers?

whirlicote · 2024-04-20T12:44:10Z

I did some digging. This is what I have found out so far:

For Firefox:

WebAssembly is part of the JavaScript engine, which is named SpiderMonkey. https://spidermonkey.dev/
There is a "File a bug" link to bugzilla.mozilla.org.

For Chrome:

WebAssembly is part of the JavaScript engine, which is named V8 (apparently after the car engine). https://v8.dev/docs/contribute
The webpage links to a mailing list at https://groups.google.com/g/v8-dev

whirlicote · 2025-01-16T21:01:38Z

Communication with v8:

https://groups.google.com/g/v8-dev/c/J5pHNIKBsGk/m/4m4hx9DyCAAJ

whirlicote · 2025-01-16T21:22:29Z

Here are the relevant diff hunks of a prototype implementation of the rounding variants proposal in V8 (commit b70457462cb22753a011096c9c9be20275dc4437):

diff --git a/src/codegen/x64/assembler-x64.cc b/src/codegen/x64/assembler-x64.cc
index a9f9c2dd447..835f691a015 100644
--- a/src/codegen/x64/assembler-x64.cc
+++ b/src/codegen/x64/assembler-x64.cc
@@ -2267,6 +2267,38 @@ void Assembler::pushq_imm32(int32_t imm32) {
   emitl(imm32);
 }
 
+
+
+void Assembler::prolog_ceil() {
+  EnsureSpace ensure_space(this);
+  emit(0x0F); emit(0xAE); emit(0x15); emit(0x01); emit(0x00); emit(0x00); emit(0x00);
+  emit(0xA9); emit(0xbf); emit(0x5f); emit(0x00); emit(0x00);
+}
+void Assembler::prolog_trunc() {
+  EnsureSpace ensure_space(this);
+  emit(0x0F); emit(0xAE); emit(0x15); emit(0x01); emit(0x00); emit(0x00); emit(0x00);
+  emit(0xA9); emit(0xbf); emit(0x7f); emit(0x00); emit(0x00);
+}
+void Assembler::prolog_floor() {
+  EnsureSpace ensure_space(this);
+  emit(0x0F); emit(0xAE); emit(0x15); emit(0x01); emit(0x00); emit(0x00); emit(0x00);
+  emit(0xA9); emit(0xbf); emit(0x3f); emit(0x00); emit(0x00);
+}
+
+void Assembler::epilog_ceil() {
+  EnsureSpace ensure_space(this);
+  emit(0x0F); emit(0xAE); emit(0x15); emit(0x01); emit(0x00); emit(0x00); emit(0x00);
+  emit(0xA9); emit(0xbf); emit(0x1f); emit(0x00); emit(0x00);
+}
+void Assembler::epilog_floor() {
+  epilog_ceil();
+}
+void Assembler::epilog_trunc() {
+  epilog_ceil();
+}
+
+
+
 void Assembler::pushfq() {
   EnsureSpace ensure_space(this);
   emit(0x9C);
diff --git a/src/codegen/x64/assembler-x64.h b/src/codegen/x64/assembler-x64.h
index 49f03cb3d3a..08fb0405080 100644
--- a/src/codegen/x64/assembler-x64.h
+++ b/src/codegen/x64/assembler-x64.h
@@ -649,6 +649,14 @@ class V8_EXPORT_PRIVATE Assembler : public AssemblerBase {
   void CodeTargetAlign();
   void LoopHeaderAlign();
 
+  // rounding mode
+  void prolog_ceil();
+  void epilog_ceil();
+  void prolog_floor();
+  void epilog_floor();
+  void prolog_trunc();
+  void epilog_trunc();
+
   // Stack
   void pushfq();
   void popfq();
@@ -1839,6 +1847,28 @@ class V8_EXPORT_PRIVATE Assembler : public AssemblerBase {
 
 #undef AVX_3
 
+  void vaddss_floor(XMMRegister dst, XMMRegister src1, XMMRegister src2) {
+    prolog_floor();
+    vaddss(dst, src1, src2);
+    epilog_floor();
+  }
+  void vaddss_floor(XMMRegister dst, XMMRegister src1, Operand src2) {
+    prolog_floor();
+    vaddss(dst, src1, src2);
+    epilog_floor();
+  }
+
+  void vaddsd_floor(XMMRegister dst, XMMRegister src1, XMMRegister src2) {
+    prolog_floor();
+    vaddsd(dst, src1, src2);
+    epilog_floor();
+  }
+  void vaddsd_floor(XMMRegister dst, XMMRegister src1, Operand src2) {
+    prolog_floor();
+    vaddsd(dst, src1, src2);
+    epilog_floor();
+  }
+
 #define AVX_SSE2_SHIFT_IMM(instr, prefix, escape, opcode, extension)   \
   void v##instr(XMMRegister dst, XMMRegister src, uint8_t imm8) {      \
     XMMRegister ext_reg = XMMRegister::from_code(extension);           \
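Decoding the emitted bytes (our reading, not stated in the diff): each prolog is `ldmxcsr [rip+1]` (0F AE 15 01 00 00 00) followed by `test eax, imm32` (A9 + 4 bytes). RIP-relative addressing counts from the end of `ldmxcsr`, so `rip+1` skips the A9 opcode and the 32-bit immediate doubles as the MXCSR value to load. Per the Intel SDM, MXCSR bits 13-14 are the rounding control, 0x1F80 masks all exceptions and 0x3F are the sticky exception flags. That makes 0x5FBF round up, 0x3FBF round down, 0x7FBF round toward zero, and the shared epilog (0x1FBF) appears to restore round-to-nearest rather than the previously saved MXCSR. Note that `test` also overwrites EFLAGS. The sketch below, with illustrative names, checks those constants.

```cpp
#include <cstdint>

// MXCSR rounding-control encodings (bits 13-14).
enum RoundingControl : std::uint32_t { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

constexpr std::uint32_t kMxcsrRcMask = 3u << 13;

// Replace the RC field of an MXCSR value.
constexpr std::uint32_t with_rc(std::uint32_t mxcsr, RoundingControl rc) {
  return (mxcsr & ~kMxcsrRcMask) | (static_cast<std::uint32_t>(rc) << 13);
}

// Extract the RC field of an MXCSR value.
constexpr RoundingControl rc_of(std::uint32_t mxcsr) {
  return static_cast<RoundingControl>((mxcsr >> 13) & 3u);
}

// Bytes 0-6: ldmxcsr [rip+1]; byte 7: A9 (test eax, imm32);
// bytes 8-11: the little-endian immediate that ldmxcsr actually loads.
std::uint32_t mxcsr_from_prolog(const std::uint8_t (&code)[12]) {
  return std::uint32_t(code[8]) | std::uint32_t(code[9]) << 8 |
         std::uint32_t(code[10]) << 16 | std::uint32_t(code[11]) << 24;
}
```

Feeding in the `prolog_ceil` bytes yields 0x5FBF, whose RC field is "round up", as the name suggests.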

KloudKoder mentioned this issue Feb 15, 2023

Floating-point rounding mode control prototyping WebAssembly/design#1456
