Initial riscv64 vector support (uses standard vector intrinsics for rvv 1.0. Presently VLEN=256 only.) #1037

mjosaarinen · 2025-05-19T14:57:15Z

Summary:
rv64v support (risc-v vector extension 1.0, which is available on newer application-class silicon.)

Steps:
If your pull request consists of multiple sequential changes, please describe them here:

Performed local tests:

lint passing
tests all passing
tests bench passing
tests cbmc passing

Do you expect this change to impact performance: Yes/No
yes (risc-v only)

If yes, please provide local benchmarking results.
Roughly 2.5x perf on silicon with vector hardware.

hanno-becker · 2025-05-19T18:42:47Z

@mjosaarinen If you have nix set up, running autogen should hopefully resolve the linting issues.

rod-chapman · 2025-05-19T20:51:09Z

Note that on the "fastntt3" branch, there are layer-merged implementations of the NTT and INTT that are highly amenable to auto-vectorization with compilers like GCC 14. Benchmarks of that code on an RV64v target were encouraging, so they might provide some inspiration for a fully vectorized, hand-written back-end.

mjosaarinen · 2025-05-19T21:05:08Z

> Note that on the "fastntt3" branch, there are layer-merged implementations of the NTT and INTT that are highly amenable to auto-vectorization with compilers like GCC 14. Benchmarks of that code on an RV64v target were encouraging, so they might provide some inspiration for a fully vectorized, hand-written back-end.

Yeah, you can easily double the speed with autovectorization alone, and some Google folks were of the opinion that they wanted to rely on that entirely in BoringSSL (RISC-V Android etc.), rather than maintain a hand-optimized version. The resulting code is pretty wild; I looked at that when considering RISC-V ISA extensions (see slide 17, for example, in https://mjos.fi/doc/20240325-rwc-riscv.pdf). It was almost "too good" -- I suspect that Google has used those NTTs as a microbenchmark when developing LLVM autovectorizers :)

mjosaarinen · 2025-05-19T21:24:06Z

> @mjosaarinen If you have nix set up, running autogen should hopefully resolve the linting issues.

Yeah, sorry for abusing your CI like that (I wasn't expecting it to be that extensive), I could have just read the documentation. I'll set up this nix thing.

hanno-becker · 2025-05-20T07:20:41Z

@mjosaarinen Sorry, we should have pointed that out earlier. With the nix environment, you should not need to waste any more time making the linter happy. Just run format && autogen before pushing.

hanno-becker · 2025-05-21T03:39:58Z

mlkem/native/riscv64/src/rv64v_settings.h

+/* check-magic: off */
+
+/*  Montgomery reduction constants */
+/*  n   = 256; q   = 3329; r   = 2^16 */
+/*  qi  = lift(Mod(-q, r)^-1) */
+#define MLKEM_QI 3327
+
+/*  r1  = lift(Mod(r, q)) */
+#define MLK_MONT_R1 2285
+
+/*  r2  = lift(Mod(r, q)^2) */
+#define MLK_MONT_R2 1353
+
+/*  in  = lift(Mod(n / 2, q)^-1) */
+/*  nr  = (in * r^2) % q */
+#define MLK_MONT_NR 1441
+
+/* check-magic: on */


Suggested change

-/* check-magic: off */
-
-/*  Montgomery reduction constants */
-/*  n   = 256; q   = 3329; r   = 2^16 */
-/*  qi  = lift(Mod(-q, r)^-1) */
-#define MLKEM_QI 3327
-
-/*  r1  = lift(Mod(r, q)) */
-#define MLK_MONT_R1 2285
-
-/*  r2  = lift(Mod(r, q)^2) */
-#define MLK_MONT_R2 1353
-
-/*  in  = lift(Mod(n / 2, q)^-1) */
-/*  nr  = (in * r^2) % q */
-#define MLK_MONT_NR 1441
-
-/* check-magic: on */
+/* check-magic: 3327 == pow(-MLKEM_Q, -1, 2^16) */
+#define MLKEM_QI 3327
+/* check-magic: 2285 == unsigned_mod(2^16, MLKEM_Q) */
+#define MLK_MONT_R1 2285
+/* check-magic: 1353 == pow(MLK_MONT_R1, 2, MLKEM_Q) */
+#define MLK_MONT_R2 1353
+/* check-magic: 1441 == pow(2,32 - 7,MLKEM_Q) */
+#define MLK_MONT_NR 1441

This auto-checks the magic number explanations in CI.
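
For readers who want to confirm the four values by hand, here is a minimal C sketch that recomputes them from n = 256, q = 3329 and r = 2^16 as described in the original comments. The function names (`compute_qi`, `compute_mont_r1`, `compute_mont_r2`, `compute_mont_nr`) are hypothetical and not part of the PR; the inverse modulo 2^16 is found by brute-force search purely for clarity, and the inverse modulo q uses Fermat's little theorem, which applies because q is prime.

```c
#include <stdint.h>

#define Q 3329u     /* ML-KEM modulus q */
#define R 65536u    /* Montgomery radix r = 2^16 */

/* base^exp mod m; operands stay below 2^32, so 64-bit products cannot overflow */
static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t m)
{
  uint64_t result = 1, b = base % m;
  while (exp > 0)
  {
    if (exp & 1u)
    {
      result = (result * b) % m;
    }
    b = (b * b) % m;
    exp >>= 1;
  }
  return (uint32_t)result;
}

/* qi = (-q)^-1 mod r, by exhaustive search over the 16-bit range */
uint32_t compute_qi(void)
{
  uint32_t neg_q = R - Q; /* -q mod 2^16 */
  uint32_t x;
  for (x = 1; x < R; x++)
  {
    if (((neg_q * x) & (R - 1u)) == 1u)
    {
      return x;
    }
  }
  return 0; /* not reached: -q is odd, so an inverse exists */
}

/* r1 = r mod q */
uint32_t compute_mont_r1(void) { return R % Q; }

/* r2 = r^2 mod q */
uint32_t compute_mont_r2(void)
{
  uint32_t r1 = compute_mont_r1();
  return (r1 * r1) % Q;
}

/* nr = ((n/2)^-1 * r^2) mod q with n = 256; the inverse is 128^(q-2) mod q */
uint32_t compute_mont_nr(void)
{
  uint32_t inv128 = pow_mod(128u, Q - 2u, Q);
  return (inv128 * compute_mont_r2()) % Q;
}
```

Calling the four functions should reproduce 3327, 2285, 1353 and 1441 respectively, matching both the PARI-style comments and the check-magic expressions above.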

hanno-becker · 2025-05-21T03:43:27Z

mlkem/native/riscv64/src/rv64v_poly.c

+{
+  /* zetas can be compiled into vector constants; don't pass as a pointer */
+  /* check-magic: off */
+  const int16_t zeta[0x80] = {


Those should ultimately be autogenerated via autogen, similar to the other twiddle tables (e.g. zetas.inc). You will be able to copy, paste, and adjust most of it, I think -- from a cursory look, this differs from zetas.inc only in the order of the twiddles.

Do you have time to look into that, or shall Matthias or I do it as a follow-up?

mjosaarinen · 2025-05-21

I will be returning to this code in a couple of weeks, and I don't mind if someone scripts them in the meantime. All three tables were generated with throw-away gp-pari statements but are obvious to "reverse engineer", as you note; they are just some combination of various orderings of roots-of-unity powers and Montgomery constants. Anyway, I understand what you're after here.
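
As an illustration of what "roots-of-unity powers and Montgomery constants" means in practice, the following C sketch generates a 128-entry twiddle table in the convention of the Kyber reference implementation: entry i is 17^bitrev7(i) * 2^16 mod q, stored as a centered signed value. This is only an assumption-laden sketch. The function names (`bitrev7`, `centered`, `gen_zetas`) are hypothetical, and the ordering used by the tables in this PR is different and is not reproduced here.

```c
#include <stdint.h>

#define Q 3329     /* ML-KEM modulus */
#define ROOT 17    /* primitive 256th root of unity mod q */
#define MONT 2285  /* 2^16 mod q (Montgomery factor) */

/* Reverse the low 7 bits of i */
unsigned bitrev7(unsigned i)
{
  unsigned r = 0;
  int b;
  for (b = 0; b < 7; b++)
  {
    r |= ((i >> b) & 1u) << (6 - b);
  }
  return r;
}

/* Map x in [0, q) to its centered representative in (-q/2, q/2] */
int16_t centered(int32_t x)
{
  return (int16_t)(x > Q / 2 ? x - Q : x);
}

/* zetas[i] = ROOT^bitrev7(i) * MONT mod q, in centered signed form */
void gen_zetas(int16_t zetas[128])
{
  unsigned i, k;
  for (i = 0; i < 128; i++)
  {
    int32_t z = MONT;
    unsigned e = bitrev7(i);
    for (k = 0; k < e; k++)
    {
      z = (z * ROOT) % Q; /* repeated multiplication keeps z in [0, q) */
    }
    zetas[i] = centered(z);
  }
}
```

A generator for the PR's tables would presumably differ only in the index permutation applied before exponentiation, which is why scripting them in autogen should be mostly mechanical.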

mjosaarinen requested a review from a team as a code owner May 19, 2025 14:57

mjosaarinen force-pushed the rv64v-dev branch 3 times, most recently from 0637941 to 6b4f845 Compare May 19, 2025 18:35

mjosaarinen force-pushed the rv64v-dev branch 2 times, most recently from dcab861 to 38e79bb Compare May 19, 2025 20:40

mjosaarinen mentioned this pull request May 19, 2025: Add RISC-V backend #1035 (Open)

hanno-becker reviewed May 21, 2025

Commit cb6bb10: Initial risc-v vector extension support (using standard vector intrinsics.) Signed-off-by: Markku-Juhani O. Saarinen <[email protected]>

mjosaarinen force-pushed the rv64v-dev branch from 38e79bb to cb6bb10 Compare May 21, 2025 11:41
