
Verified Bedrock2 code for Number-Theoretic Transform #1997

Open · wants to merge 8 commits into master
Conversation

@atrieu (Contributor) commented Dec 31, 2024

The PR is still in a bit of a rough state, but I'm opening it to see if there is interest in adding it.

The Number-Theoretic Transform (NTT) is a technique for accelerating the polynomial multiplications used in recent lattice-based post-quantum cryptography.

This PR defines:

  • a theory of polynomials and its Chinese Remainder Theorem in Polynomial.v
  • the definition of the NTT in CyclotomicDecomposition.v, which defines a homomorphism from the type Pquotl (cyclotomic_decomposition n 0) to Pquotl (cyclotomic_decomposition n k), where Pquotl ql is defined as Pquotl (ql: list P): Type := { pl: list P | List.Forall2 (fun p q => Peq p (Pmod p q)) pl ql } and cyclotomic_decomposition n i is the i-th layer of the decomposition of X^n + 1. The file also defines various optimizations of the NTT.
  • lower-level Gallina code for the NTT in RupicolaNTT.v
  • verified Bedrock2 code in BedrockNTT.v; I initially tried to synthesize the code automatically using Rupicola, but ended up doing the proof manually
  • Bedrock2 code for Barrett and Montgomery reduction when a field element fits in one word, in RupicolaBarrettReduction.v and RupicolaMontgomeryArithmetic.v
  • examples using all of the above to synthesize C code for the (inverse) NTT of ML-KEM and ML-DSA, in MLKEM.v and MLDSA.v

I believe the C code should look like what someone would write after reading the NIST standards with no other reference. In terms of performance, it is slower than the handwritten C reference implementations for Kyber/Dilithium, which use a so-called centered signed representation for field elements and delay reduction of the coefficients to the end of the NTT, instead of systematically reducing at each step as the synthesized code does.
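The performance gap described above can be sketched in Python (an illustrative toy, not the actual generated C or the reference code): the synthesized code reduces mod q after every butterfly, while handwritten implementations let intermediate sums grow and reduce later.

```python
Q = 3329  # the ML-KEM/Kyber modulus, as in FIPS 203

def butterfly_eager(a, b, zeta):
    """Cooley-Tukey butterfly that reduces mod Q at every step,
    mirroring the strategy of the synthesized code."""
    t = (zeta * b) % Q
    return (a + t) % Q, (a - t) % Q

def butterfly_lazy(a, b, zeta):
    """Same butterfly, but only the product is reduced; the sum and
    difference are allowed to grow and are reduced later, as in
    handwritten reference implementations."""
    t = (zeta * b) % Q
    return a + t, a - t
```

Both variants agree modulo Q; the lazy one simply trades per-step reductions for a wider coefficient range.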

@andres-erbsen (Contributor) commented Feb 6, 2025

In principle, I am interested in a theory of polynomials and NTTs, as well as a Bedrock2 implementation. The main reason I have not undertaken this myself is simply the amount of work I expect it to take. The polynomial CRT alone is impressive, let alone the rest of the PR.

FYI, there is another exploration of NTT in Bedrock2 by @lukaszobernig at master...lukaszobernig:fiat-crypto:ntt#diff-2ba131d0bae59c3f1d7bf36175bb93917df54b968bb82545a3ad793cbecee00f

As for what to merge, and in what shape, I think the main considerations are that (1) I want to only merge things I can see myself recommending, either for use or study and (2) it would be really cool to have something on NTTs, and fiat-crypto seems like a good place for this work to live. In that spirit, let me raise some high-level questions

  • I am unsure about cyclotomic decomposition as a central abstraction. Relations in terms of evaluation at roots of unity seem more appealing to me. The file is also over 2k lines, and while some of it is generic, I have not managed to read through it. Do you think a more direct use of polynomial remainder arithmetic, understood in terms of its effect on polynomial evaluation, would work here? I know Lukas's branch doesn't do the all-but-last-layer NTT yet, but I am hopeful it could be adapted -- please take a look at it and share what you think.
  • Whether to use our own polynomials or math-comp is something I could go either way on. What I like about math-comp is that it defines one canonical notion of polynomials for each field, not a technical abstraction. At the same time, it seems desirable to have most of the operations be defined in terms of other operations in a manner that is blind to the representation. I believe a compromise could be found here, and what you define does look attractive. But perhaps a higher-level question is, are you looking to polish up and then maintain a new polynomial library? Doing this myself had already been on my mind for whenever ZmodZ is off my plate.
  • Bedrock2 code would be very nice to have. Ideally it would have a relatively direct proof in terms of tail-recursion interpretation of the loop and algebraic properties, as opposed to mirroring a Gallina definition, but either would be acceptable.
  • Concrete examples on standard algorithms are very appealing, and the instantiation structure looks reasonable.
  • For word-sized reductions, I would have naively thought what we have in https://github.com/mit-plv/fiat-crypto/tree/master/src/Arithmetic/BarrettReduction and https://github.com/mit-plv/fiat-crypto/tree/master/src/Arithmetic/MontgomeryReduction is close enough to be usable for proving bedrock2 code directly? Is there a serious incompatibility?
  • I am separately trying to make up my mind about what to do with Rupicola; it seems that the main authors are not working on that repository anymore and I'm wondering how much to build new stuff on it. Do you think using Rupicola has been helpful and worthwhile here?

Overall, this is an ambitious and impressive project. I would like to get something like this in, but it would require a fair bit of work and more tight collaboration between us. Please let me know what you think; we could also do a call if that sounds helpful to you.

@atrieu (Contributor, Author) commented Feb 10, 2025

> I am unsure about cyclotomic decomposition as a central abstraction. Relations in terms of evaluation at roots of unity seem more appealing to me. The file is also over 2k lines, and while some of it is generic, I have not managed to read through it. Do you think a more direct use of polynomial remainder arithmetic, understood in terms of its effect on polynomial evaluation, would work here? I know Lukas's branch doesn't do the all-but-last-layer NTT yet, but I am hopeful it could be adapted -- please take a look at it and share what you think.

My math is quite rusty, so apologies if I'm missing something. If the aim is to prove that the NTT is a ring isomorphism using evaluation at roots of unity, then the proof works because, as far as I remember, the interpolating polynomial is unique given enough distinct evaluation points. With an "incomplete" NTT, there are not enough points for that.
Another point I'm not sure about is what the type of an incomplete NTT would be. Currently Lukas types it as something like ntt: {p: poly F | degree p < 2^i} -> list F. Would the codomain type still be list F, or something else?

That is what I was trying to solve with the cyclotomic decomposition: how to give a type to something like the following (taken from the ML-KEM specification, FIPS 203).
[image from the ML-KEM specification]
I think the idea is simple enough, though the lack of comments in the code clearly does not help. Given parameters n, m and w such that w^{2^m} = -1, we have X^{2^n} + 1 = X^{2^n} - w^{2^m} = (X^{2^{n-1}} - w^{2^{m-1}})(X^{2^{n-1}} + w^{2^{m-1}}) = (X^{2^{n-1}} - w^{2^{m-1}})(X^{2^{n-1}} - w^{2^m + 2^{m-1}}), etc.
cyclotomic_decompose i computes the list of the exponents of w at layer i, i.e., [2^m] for i = 0, [2^{m-1}; 2^m + 2^{m-1}] for i = 1, etc.
cyclotomic_decomposition i does the same with the polynomials, i.e., [X^{2^n} + 1] for i = 0, [X^{2^{n-1}} - w^{2^{m-1}}; X^{2^{n-1}} - w^{2^m + 2^{m-1}}] for i = 1, etc.
So the NTT can roughly be given a type like { pl: list P | Forall2 (fun p q => p = p mod q) pl (cyclotomic_decomposition 0) } -> { pl: list P | Forall2 (fun p q => p = p mod q) pl (cyclotomic_decomposition (min n m)) }, with the former corresponding to F[X]/(X^{2^n} + 1) and the latter to the product ring from the ML-KEM spec.
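The splitting step above can be checked concretely on a toy instance (my own illustration, with made-up small parameters q = 17 and w = 4, so that w^2 = 16 = -1 mod 17, i.e. m = 1): one layer factors X^4 + 1 into (X^2 - w)(X^2 - w^3).

```python
Q = 17  # toy modulus; w = 4 satisfies w^2 = 16 = -1 (mod 17), i.e. m = 1
W = 4

def polymul(p, q, mod):
    """Schoolbook product of coefficient lists (lowest degree first), mod `mod`."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % mod
    return out

# One splitting step with 2^n = 4: X^4 + 1 = (X^2 - w)(X^2 - w^3) mod Q
lhs = polymul([-W % Q, 0, 1], [-pow(W, 3, Q) % Q, 0, 1], Q)
# lhs is the coefficient list [1, 0, 0, 0, 1], i.e. X^4 + 1
```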

Another small concern: Lukas assumes a primitive root of unity, so when instantiating the theorem one will need to prove that the root is primitive, and I don't know whether that is difficult to check. In my formalization, I only need to know that w^{2^m} = -1, which can be tested easily by computation.
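Indeed, the hypothesis w^{2^m} = -1 amounts to a single modular exponentiation. For instance, with the standard roots from the NIST specs (ζ = 17 for ML-KEM with q = 3329 in FIPS 203, ζ = 1753 for ML-DSA with q = 8380417 in FIPS 204):

```python
# w^(2^m) = -1 for the standard parameters:
# ML-KEM (FIPS 203): q = 3329, w = 17, m = 7
assert pow(17, 2**7, 3329) == 3329 - 1
# ML-DSA (FIPS 204): q = 8380417, w = 1753, m = 8
assert pow(1753, 2**8, 8380417) == 8380417 - 1
```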

> Whether to use our own polynomials or math-comp is something I could go either way on. What I like about math-comp is that it defines one canonical notion of polynomials for each field, not a technical abstraction. At the same time, it seems desirable to have most of the operations be defined in terms of other operations in a manner that is blind to the representation. I believe a compromise could be found here, and what you define does look attractive. But perhaps a higher-level question is, are you looking to polish up and then maintain a new polynomial library? Doing this myself had already been on my mind for whenever ZmodZ is off my plate.

I only formalized enough polynomial theory to prove the NTT, so a lot is lacking; I did not even define polynomial evaluation, for instance. I don't think I want to develop a full-fledged polynomial library, so I would only add things if specific needs arise. Regarding the representation of polynomials, I also like how much of it is blind to the representation. However, I ended up having to duplicate quite a bit of code to operate on lists of coefficients rather than general polynomials, which is partly why the cyclotomic decomposition file is so big, and I wonder whether I could have done better to avoid that, and/or whether representing polynomials directly as lists of coefficients would have avoided it.
I also defined R[X]/Q as a dependent type, similar to how F q is defined. I don't know whether defining it as {p: P | degree p < degree Q} could have made things easier.

> For word-sized reductions, I would have naively thought what we have in https://github.com/mit-plv/fiat-crypto/tree/master/src/Arithmetic/BarrettReduction and https://github.com/mit-plv/fiat-crypto/tree/master/src/Arithmetic/MontgomeryReduction is close enough to be usable for proving bedrock2 code directly? Is there a serious incompatibility?

There is no problem with proving the bedrock2 code directly; I just wanted to finally manage to get something working with Rupicola.

> I am separately trying to make up my mind about what to do with Rupicola; it seems that the main authors are not working on that repository anymore and I'm wondering how much to build new stuff on it. Do you think using Rupicola has been helpful and worthwhile here?

I really like the idea of automatically producing verified code from Gallina code, so I tried making it work. From a usability point of view, though, I often couldn't make sense of why it wouldn't work, so after a while I ended up proving the bedrock2 code directly. I think the library also lacks lemmas useful for cryptography, such as multi-word arithmetic. That's a bit of a chicken-and-egg issue, since I could also have contributed those lemmas. But if Rupicola is not going to be worked on, it's probably not worthwhile to keep using it, imo.

Happy to discuss how to proceed further with this.

@lukaszobernig

> Another point I'm not sure about is what would be the type of an incomplete NTT, currently Lukas types it as something like ntt: {p: poly F | degree p < 2^i} -> list F. Would the codomain type still be list F or something else?

That's a choice I made since it was easier to state the spec in terms of lists, but the underlying NTT algorithm should be easy to adapt to output a polynomial instead (and the spec will then have to talk about coefficients instead of list indices).

> Another small concern, Lukas assumes a primitive root of unity. So when instantiating the theorem, one will need to prove that the root is primitive, I don't know if that is difficult to check? In my formalization, I only need to know that w^{2^m} = -1 which can be tested easily by computation.

From what I remember, showing that w^{2^m} = -1 is also the only place where I use that w is a "primitive" root of unity, so it should be possible to weaken the hypothesis accordingly.

@andres-erbsen (Contributor) commented Feb 10, 2025

@atrieu your explanation of cyclotomic decomposition is persuasive, and I am coming around to the perspective that polynomial evaluation matching remainders of linear-polynomial divisors is a coincidence for the purposes of this work. So, yes, for the codomain of a partial NTT, I think it would indeed be a list of polynomials -- your cyclotomic decomposition. However, what I find appealing about @lukaszobernig's approach is that the NTT itself serves as the definition of the decomposition into remainders by (linear/cyclotomic) polynomials. What I am hoping for is a development structured similarly around the NTT recursion, but proven to materialize remainders with cyclotomics of degree 2^r for any r passed as an argument, rather than just r = 0, where this matches polynomial evaluation.

It's possible that I am naively optimistic here, but it seems worth thinking about whether the appropriate partial-NTT generalizations from @atrieu's work could be integrated into the recursion structure of @lukaszobernig's work. I imagine that would require finding all uses of relationships between polynomial remainder and polynomial evaluation, and replacing them with more general lemmas about remainders of the same polynomial with different divisors. Maybe that's what all the theory I haven't worked through in @atrieu's work is about, but it also seems plausible to me that using carefully-chosen recursion to avoid materializing intermediate layers just avoids a lot of incidental complexity. I'm curious what you both think.

@atrieu (Contributor, Author) commented Feb 11, 2025

Below is how I think the forward NTT could look using cyclotomic decomposition and @lukaszobernig's recursion structure.

The moduli function is basically the equivalent of my cyclotomic_decompose. I think this works great and should avoid a lot of the administrative list manipulation I had to do and prove in my version.

Some remaining worries: I'm not sure how one would write the inverse NTT in the same style, nor whether the isomorphism proof will also be easier.

mathcomp_ntt.v
From mathcomp Require Import all_ssreflect all_algebra.

Set Implicit Arguments.
Unset Strict Implicit.
Unset Printing Implicit Defensive.

Import GRing.Theory.
Local Open Scope ring_scope.

Section NTT.

Variable F : fieldType.
Variable w : F.

Definition modulus (k l : nat) :=
  'X^(2 ^ k) - (w ^+ l)%:P.

Section WithMN.

(* Modulus is X^{2^n} + 1 = modulus n (2^m) *)
Context (m n : nat).

Context (w_neg1 : w ^+ (2 ^ m) = -1).

Section InductiveCase.
Context (char_F_neq2 : 2 != 0 :> F).
(* Recurse on r *)
Context
  (rec_ntt : nat->nat->nat->{poly F}->seq {poly F})
  (rec_moduli : nat->nat->seq nat)
  (r' k l i : nat)
  (r := r'.+1)
  (p : {poly F}).
Let m1 := modulus (k - 1) (l%/2)%nat.
Let m2 := modulus (k - 1) ((2^m) + (l%/2))%nat.
Let p1 := p %% m1.
Let p2 := p %% m2.
Let ntt_lhs := rec_ntt r' (k - 1) (l%/2) p1.
Let ntt_rhs := rec_ntt r' (k - 1) ((2^m) + (l%/2)) p2.
Definition ntt_body' := ntt_lhs ++ ntt_rhs.
Let moduli_lhs := rec_moduli r' (l%/2).
Let moduli_rhs := rec_moduli r' ((2^m) + (l%/2)).
Definition moduli_body' := moduli_lhs ++ moduli_rhs.

Context (r_leq_k : (r <= k)%nat).
Context (r_leq_l : (l %% (2^r) = 0)%nat).
Context (p_ok : p %% (modulus k l) == p).
(* Maybe missing hypotheses but you get the idea *)
Context (h_rec_ntt :
          forall (k l: nat) (p : {poly F}),
            (r' <= k)%nat ->
            (l %% (2^r') = 0)%nat ->
            p %% (modulus k l) == p ->
            all2 (fun q l => q == p %% (modulus (k - r') l)) (rec_ntt r' k l p) (rec_moduli r' l)).

Lemma ntt_body_spec':
  all2 (fun q l => q == p %% (modulus (k - r) l)) ntt_body' moduli_body'.
Proof. Admitted.

(* Other lemmas that are probably needed about
   map (fun l => (modulus (k - r) l)) moduli_body':
   - The polynomials are pairwise coprime
   - The product of the polynomials is equal to (modulus k l)
 *)

End InductiveCase.

Definition ntt_body rec_ntt (r k l : nat) (p : {poly F}) : seq {poly F} :=
  if r is r'.+1 then
    ntt_body' rec_ntt r' k l p
  else
    [:: p].

Definition moduli_body rec_moduli (r l : nat) : seq nat :=
  if r is r'.+1 then
    moduli_body' rec_moduli r' l
  else
    [:: l].

Fixpoint ntt' r k l p := ntt_body ntt' r k l p.

Definition ntt p := ntt' (min m n) n (2^m) p.

Fixpoint moduli' r l := moduli_body moduli' r l.

Definition moduli := moduli' (min m n) (2^m).

Lemma ntt_spec p:
  p == p %% ('X^(2 ^ n) + 1%:P) ->
  all2 (fun q l => q == p %% (modulus (n - (min n m)) l)) (ntt p) moduli.
Proof. Admitted.

End WithMN.

Section SanityCheck.

(* The FIPS standards specify that the output ring of the ntt is something like
   \Prod_{i = 0}^{2^{min(n,m)} - 1} Z_q[X]/(X^{n - min(n, m)} - w^{2bitrev_m(i) + 1})
   We check below that we have the same order for the powers of w.
*)
Local Definition bitrev (n: nat) (i: nat): nat :=
    let fix aux k := match k with
                     | O => if Nat.testbit i 0%nat then PeanoNat.Nat.setbit 0%nat (n - 1)%nat else 0%nat
                     | S k' => if Nat.testbit i k then PeanoNat.Nat.setbit (aux k') (n - 1 - k)%nat else aux k'
                     end in
    aux (n - 1)%nat.

Local Notation bitrev8 := (bitrev 8%nat). (* Dilithium *)
Local Notation bitrev7 := (bitrev 7%nat). (* Kyber *)

(* Dilithium, X^{256} + 1 = X^{2^8} - w^{2^8} *)
Goal moduli 8 8 = List.map (fun k => (2 * (bitrev8 k) + 1)%nat) (List.seq 0 256%nat). reflexivity. Qed.

(* Kyber, X^{256} + 1 = X^{2^8} - w^{2^7} *)
Goal moduli 7 8 = List.map (fun k => (2 * (bitrev7 k) + 1)%nat) (List.seq 0 128%nat). reflexivity. Qed.

End SanityCheck.

End NTT.
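For readers less familiar with the Coq encoding, the bitrev used in the sanity check is ordinary bit reversal over n bits. A Python restatement (my own paraphrase, not taken from the PR):

```python
def bitrev(n, i):
    """Reverse the low n bits of i, e.g. bitrev(3, 0b001) = 0b100."""
    out = 0
    for k in range(n):
        # shift the accumulated bits up and append bit k of i
        out = (out << 1) | ((i >> k) & 1)
    return out
```

This is the index permutation relating the natural order of the moduli to the 2*bitrev(i) + 1 ordering in the FIPS standards.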

@lukaszobernig

I like to think of the NTT/inverse NTT as a constructive version of the Chinese remainder theorem for rings. Take a principal ideal domain R and coprime ideals I, J of R such that I + J = R. Then "going down" for an element x \in R corresponds to (u, v) = (x \mod I, x \mod J). And "going up" corresponds to the computational CRT, where x can be recovered (using Bezout's identity 1 = ai + bj, with i, j generating I, J respectively) as x = ubj + vai.

This generalises to the situation of pairwise coprime ideals I_1, ..., I_n, and has a nice recursive description as pairwise mod-splitting when n is a power of 2. What I proved is essentially a "going down" version specialised to R = Z_q[X]/(X^{2^m} - 1) that goes all the way down to Z_q terms, but stopping earlier, as you outlined, is possible too.
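The "going down"/"going up" picture can be sketched with integers standing in for polynomials (hypothetical helper names, my own illustration): the Bezout coefficients a*m1 + b*m2 = 1 play the role of the identity 1 = ai + bj above.

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def going_down(x, m1, m2):
    """Split x into its residues modulo the two coprime moduli."""
    return x % m1, x % m2

def going_up(u, m1, v, m2):
    """Recover x from its residues via Bezout: a*m1 + b*m2 = 1,
    so x = u*b*m2 + v*a*m1 (mod m1*m2)."""
    g, a, b = ext_gcd(m1, m2)
    assert g == 1, "moduli must be coprime"
    return (u * b * m2 + v * a * m1) % (m1 * m2)
```

Checking mod m1: b*m2 = 1 - a*m1 ≡ 1, so x ≡ u, and symmetrically x ≡ v mod m2; going_up inverts going_down, just as the inverse NTT inverts one splitting layer.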

> Some remaining worries, I'm not sure how one would write the inverse NTT in the same style, and whether the isomorphism proof is going to be also easier.

The "going up" version I expect to look very similar: a pairwise Bezout reconstruction that uses the computational CRT to go up one level at each recursive step. I have proved a version of Bezout reconstruction in Poly.v.

@atrieu (Contributor, Author) commented Feb 21, 2025

I finished another version of the decomposition at https://github.com/atrieu/fiat-crypto/blob/ntt2/src/NTT/CyclotomicDecomposition2.v

https://github.com/atrieu/fiat-crypto/blob/ntt2/src/NTT/CyclotomicDecomposition2.v#L441-L450 defines the split/recombine functions ntt2: Pquot m0 -> Pquot m1 * Pquot m2 and intt2: Pquot m1 * Pquot m2 -> Pquot m0.

I then prove that they are ring homomorphisms:

Lemma ntt_isomorphism2:
      @Hierarchy.ring _ (EQ2 m1 m2) (ZERO2 m1 m2) (ONE2 m1 m2) (OPP2 m1 m2) (ADD2 m1 m2) (SUB2 m1 m2) (MUL2 m1 m2)
      /\ @Ring.is_homomorphism _ eq1 one add mul _ (EQ2 m1 m2) (ONE2 m1 m2) (ADD2 m1 m2) (MUL2 m1 m2) ntt2
      /\ @Ring.is_homomorphism _ (EQ2 m1 m2) (ONE2 m1 m2) (ADD2 m1 m2) (MUL2 m1 m2) _ eq1 one add mul intt2.

The recursive calls are defined here as function compositions https://github.com/atrieu/fiat-crypto/blob/ntt2/src/NTT/CyclotomicDecomposition2.v#L477-L481

Definition ntt_body' (p: Pquot m0): Pquotl decomposition_body' :=
  Pquotl_convert decomposition_body_spec'
    (Pquotl_app
      (Ring.apply_unop_pair
        (rec_ntt r' (k - 1) (Nat.div l 2) r_leq_k' r_leq_m' r_leq_l_lhs)
        (rec_ntt r' (k - 1) (Nat.pow 2 m + Nat.div l 2) r_leq_k' r_leq_m' r_leq_l_rhs)
        (ntt2 p))).

Definition intt_body' (pl: Pquotl decomposition_body'): Pquot m0 :=
  intt2
    (Ring.apply_unop_pair
      (rec_intt r' (k - 1) (Nat.div l 2) r_leq_k' r_leq_m' r_leq_l_lhs)
      (rec_intt r' (k - 1) (Nat.pow 2 m + Nat.div l 2) r_leq_k' r_leq_m' r_leq_l_rhs)
      (Pquotl_split (Pquotl_convert decomposition_body_spec'' pl))).

These then form ring homomorphisms as compositions of homomorphisms.

The final theorem is https://github.com/atrieu/fiat-crypto/blob/ntt2/src/NTT/CyclotomicDecomposition2.v#L607-L610

  Lemma ntt_isomorphism:
    forall n,
      @Ring.is_homomorphism _ eq1 one add mul _ eql onel addl mull (ntt n)
      /\ @Ring.is_homomorphism _ eql onel addl mull _ eq1 one add mul (intt n).

Not sure if I like all the dependent types though.

@andres-erbsen (Contributor) commented

Yes, I like this!

For the dependent types, do you need them just so you can state theorems using the Hierarchy definitions, or do you also use nontrivial lemmas from there? (Especially for Pquotl; Pquot does make sense on its own, I guess.) It is cool that the current form gives us appropriate relations for all ring operations on the residue form, though, and I don't see a nice way to keep that without the dependent types.

For the polynomial library, do you think there is an induction principle that would fit your use cases, maybe something along the lines of "degree induction", or maybe even of_list induction or induction on canonical minimal decompositions? I'm trying to figure out whether I like abstracting over the polynomial_ops. I am thinking of this as opposed to giving one implementation, proving the currently assumed properties as lemmas, and treating it as opaque thereafter -- there ought not to be two different notions of polynomials over the same coefficient ring anyway.

Are posicyclic and negacyclic genuinely different, or would it be a simplification to treat one as the other with negated input?

@atrieu (Contributor, Author) commented Feb 28, 2025

> For the dependent types, do you need them just so you can state theorems using Hierarchy definitions, or do you also use nontrivial lemmas from there? (Especially for Pquotl; Pquot does make sense on its own I guess)

It's only to use the definitions, which makes things quite short to write. My only grief is that we have definitions like ntt_body' and ntt_body, which I find not very nice to read. I think the dependent types can be avoided until the very end, where we state the full specification of the NTT by introducing the right predicates anyway.

> For the polynomial library, do you think there is an induction principle that would fit your use cases, maybe something along the lines of "degree induction" or maybe even of_list induction or induction on canonical minimal decompositions?

As far as I remember, I only had to use a "degree induction" to prove the specifications of euclidean division and the like. The to_list/of_list stuff is only useful as an isomorphism between Pquot p and coefficient lists of length degree p when proving lower-level code; I don't remember having used any induction on it.

> I'm trying to figure out whether I like the abstracting over treatment of polynomial_ops. I am thinking of this as opposed to giving one implementation, proving the current assumed properties as lemmas, and treating it opaque thereafter -- there ought not to be two different notions of polynomials over the same coefficient ring anyway.

I don't have a strong opinion on this. I can't see either option having a strong advantage over the other; even an implementation of polynomials where forall p q, (forall i, coeff p i = coeff q i) <-> p = q would only avoid writing some Proper instances, I think.

> Are posicyclic and negacyclic genuinely different, or would it be a simplification to treat one as the other with negated input?

I haven't thought about it before, but I think you are right that defining one as the other with negated input might simplify things.
