Commit 2b37dfd

Update documentation, update python model and expected results for contest modulus, prepare for test portal.
1 parent ae1e197 commit 2b37dfd

9 files changed: +45 -17 lines changed

README.md

+17-5
@@ -2,7 +2,7 @@
 
 This repository contains the modular squaring multiplier baseline design for the VDF (Verifiable Delay Function) low latency multiplier FPGA competition. For more information about the research behind VDFs see <https://vdfresearch.org/>.
 
-The goal of the competition is to create the fastest (lowest latency) 1024 bit modular squaring circuit possible targeting the AWS F1 FPGA platform. Up to $100k in prizes is available across two rounds of the competition. For additional detail see [FPGA Contest](https://supranational.atlassian.net/wiki/spaces/VA/pages/36569208/FPGA+Contest) on the [VDF Alliance](https://supranational.atlassian.net/wiki/spaces/VA/overview) page.
+The goal of the competition is to create the fastest (lowest latency) 1024 bit modular squaring circuit possible targeting the AWS F1 FPGA platform. Up to $100k in prizes is available across two rounds of the competition. For additional detail see [FPGA Contest Wiki](https://supranational.atlassian.net/wiki/spaces/VA/pages/36569208/FPGA+Contest) on the [VDF Alliance](https://supranational.atlassian.net/wiki/spaces/VA/overview) page.
 
 Official competition rules can be found in [FPGA_Competition_Official_Rules_and_Disclosures.pdf](FPGA_Competition_Official_Rules_and_Disclosures.pdf).
 
@@ -20,13 +20,25 @@ x, N are 1024 bits
 t = 2^30
 
 x = random
+
+Decimal:
+N = 12406669568412474139879892740481443274469842712573568412813185506
+49768953373091389100150712146576743094431494074574934345790638408
+41220334555160125016331040933690674569571217337630239191517205721
+31019760838723984636436085022089677296497856968322944926681990341
+4117058030106528073928633017118689826625594484331
+
+Hex:
+N = 0xb0ad4555c1ee34c8cb0577d7105a475171760330d577a0777ddcb955b302ad0
+803487d78ca267e8e9f5e3f46e35e10ca641a27e622b2d04bb09f3f5e3ad274b1
+744f34aeaf90fd45129a02a298dbc430f404f9988c862d10b58c91faba2aa2922
+f079229b0c8f88d86bfe6def7d026294ed9dee2504b5d30466f7b0488e2666b
 ```
 
 Here is a sample implementation in Python:
 ```
 #!/usr/bin/python3
 
-from Crypto.PublicKey import RSA
 from random import getrandbits
 
 # Competition is for 1024 bits
@@ -36,10 +48,10 @@ NUM_ITERATIONS = 1000
 
 # Rather than being random each time, we will provide randomly generated values
 x = getrandbits(NUM_BITS)
-N = RSA.generate(NUM_BITS).n
+N = 124066695684124741398798927404814432744698427125735684128131855064976895337309138910015071214657674309443149407457493434579063840841220334555160125016331040933690674569571217337630239191517205721310197608387239846364360850220896772964978569683229449266819903414117058030106528073928633017118689826625594484331
 
 # t should be small for testing purposes.
-# For the final FPGA runs, t will be around 1 billion
+# For the final FPGA runs, t will be 2^30
 t = NUM_ITERATIONS
 
 # Iterative modular squaring t times
@@ -182,7 +194,7 @@ The following are some potential optimization paths.
 * Try other algorithms such as Chinese Remainder Theorem, Montgomery/Barrett, etc.
 * Shorten the pipeline - we believe a 4-5 cycle pipeline is possible with this design
 * Lengthen the pipeline - insert more pipe stages, run with a faster clock
-* Change the partial product multiplier size. The DSPs are 26x17 bit multipliers and the modular squaring circuit supports using either by changing a define at the top.
+* Change the partial product multiplier size. The DSPs are 26x17 bit unsigned multipliers. The Ozturk modular squaring circuit supports using either 17x17 or 26x17 bit multipliers by changing a define at the top of the file.
 * This design uses lookup tables stored in BlockRAM for the reduction step. These are easy to change to distributed memory and there is support in the model to use UltraRAM. For an example using UltraRAM see https://github.com/supranational/vdf-fpga/tree/f72eb8c06eec94a09142f675cde8d1514fb72e60
 * Optimize the compression trees and accumulators to make the best use of FPGA LUTs and CARRY8 primitives.
 * Floorplan the design.
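
Reviewer note on the partial product multiplier size: the sketch below is a plain software illustration of why the word size matters, not a model of the Ozturk circuit. It splits a 1024 bit operand into words, forms every word-by-word product, and counts how many products are needed. The function names and the simple full (non-symmetric) product loop are assumptions made for this example; the real design uses its own coefficient format and can exploit the symmetry of squaring.

```python
def split_words(value, word_bits, total_bits):
    """Split value into little-endian words of word_bits bits each."""
    num_words = -(-total_bits // word_bits)  # ceiling division
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(num_words)]


def square_with_multiplier(x, a_bits, b_bits, total_bits=1024):
    """Square x using a_bits x b_bits partial products.

    Returns (x * x, number of partial products used).
    """
    a_words = split_words(x, a_bits, total_bits)
    b_words = split_words(x, b_bits, total_bits)
    acc = 0
    for i, a in enumerate(a_words):
        for j, b in enumerate(b_words):
            # Each a * b fits one a_bits x b_bits unsigned multiplier;
            # shift it to its weight before accumulating.
            acc += (a * b) << (i * a_bits + j * b_bits)
    return acc, len(a_words) * len(b_words)


if __name__ == "__main__":
    x = (1 << 1024) - 189  # arbitrary 1024 bit test value
    for a_bits, b_bits in ((17, 17), (26, 17)):
        square, count = square_with_multiplier(x, a_bits, b_bits)
        assert square == x * x
        print(f"{a_bits}x{b_bits}: {count} partial products")
```

Under these assumptions the 26x17 split needs fewer products than the 17x17 split (40 x 61 words instead of 61 x 61), at the cost of wider, unevenly aligned terms in the accumulation.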

docs/aws_f1.md

+4-4
@@ -136,11 +136,11 @@ sudo su
 source $AWS_FPGA_REPO_DIR/sdaccel_runtime_setup.sh
 
 # Run a short test and verify the result in software
-./host -e -u 0 -f 100
+./host -e -f 100
 
 # Run a billion iterations starting with an input of 2
-./host -u 0 -s 0x2 -f 1000000000
+./host -s 0x2 -f 1073741824
 ```
 
-The expected result of 2^2^1B using the default 1k (64 coefficient) modulus in the Makefile is:
-`305939394796769797811431929207587607176284037479412924905827147439718856946037842431593490055940763973150879770720223457997191020439404083394702653096083649807090448385799021330059496823106654989629199132438283594347957634468046231084628857389350823217443926925454895121571284954146032303555585511855910526`
+The expected result of 2^2^2^30 using the default 1k (64 coefficient) modulus in the Makefile is:
+`9782776834334634490446343758704728706980122657033141222406929631982781114105293252444979173994924549755313289718816652420124314107449156688222852673024696927113240716169907514261823484008194829047317452425855361884165852504086556390349991640188347831084926001670580437428161157316196941905575574310934275893`

docs/test_portal.md

+1-1
@@ -2,7 +2,7 @@
 
 The online test portal dramatically lowers the bar to testing your design in AWS F1 environment.
 
-Rather than go through the process of enabling AWS, the F1 environment, etc., you can design, test and tune your multiplier and Vivado and submit it to the portal to make sure the results are what you expect.
+Rather than go through the process of enabling AWS, the F1 environment, etc., you can design, test and tune your multiplier in Vivado and submit it to the portal to make sure the results are what you expect.
 
 Once you submit your design, the test portal will clone your repo, run simulation, hardware emulation, synthesis/place and route, and provide the results back to you in an encrypted file on S3.

docs/verilator.md

+1-1
@@ -2,7 +2,7 @@
 
 The Ozturk design supports verilator as a simulator.
 
-While we're big fans of verilator, it unfortunately doesn't support 1024 bit modular squaring using * and %. As a result the default bitwidth for this design when using verilator is 128 bits. We found it can also be finicky with large bitwidths. Unpacked arrays of
+While we're big fans of verilator, it unfortunately doesn't support 1024 bit modular squaring using * and %. As a result the default bitwidth for this design when using verilator is 128 bits. We found it can also be finicky with large bitwidths. Unpacked arrays of smaller words seems more stable.
 
 Enabling verilator takes just a few steps on Ubuntu 18 and AWS F1 CentOS. The setup script requires sudo access to install dependencies.

modular_square/model/vdf_basic.py

+2-3
@@ -1,6 +1,5 @@
 #!/usr/bin/python3
 
-from Crypto.PublicKey import RSA
 from random import getrandbits
 
 # Competition is for 1024 bits
@@ -10,10 +9,10 @@
 
 # Rather than being random each time, we will provide randomly generated values
 x = getrandbits(NUM_BITS)
-N = RSA.generate(NUM_BITS).n
+N = 124066695684124741398798927404814432744698427125735684128131855064976895337309138910015071214657674309443149407457493434579063840841220334555160125016331040933690674569571217337630239191517205721310197608387239846364360850220896772964978569683229449266819903414117058030106528073928633017118689826625594484331
 
 # t should be small for testing purposes.
-# For the final FPGA runs, t will be around 1 billion
+# For the final FPGA runs, t will be 2^30
 t = NUM_ITERATIONS
 
 # Iterative modular squaring t times
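
Reviewer note: the hunk stops at the comment that introduces the squaring loop, so the loop itself is not visible here. As a minimal sketch of what "iterative modular squaring t times" means with the fixed contest modulus, the snippet below repeats `x = x*x mod N` and cross-checks the result against Python's built-in `pow`. The function name `modular_square_chain` is hypothetical and the loop is an assumption about the model's shape, not a copy of `vdf_basic.py`.

```python
# Contest modulus N, written as the five decimal chunks listed in the README.
N = int(
    "12406669568412474139879892740481443274469842712573568412813185506"
    "49768953373091389100150712146576743094431494074574934345790638408"
    "41220334555160125016331040933690674569571217337630239191517205721"
    "31019760838723984636436085022089677296497856968322944926681990341"
    "4117058030106528073928633017118689826625594484331"
)


def modular_square_chain(x, t, n):
    """Return x^(2^t) mod n by squaring t times in sequence."""
    for _ in range(t):
        x = (x * x) % n
    return x


if __name__ == "__main__":
    assert N.bit_length() == 1024
    x, t = 2, 1000  # keep t small for software tests
    result = modular_square_chain(x, t, N)
    # Same value computed in one call with a 2^t exponent.
    assert result == pow(x, 1 << t, N)
    print(hex(result))
```

A small `t` keeps the check fast; the full contest value of `t = 2^30` is intended for the FPGA, and a pure Python loop of that length would take a long time.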

msu/Makefile

+6-2
@@ -42,15 +42,19 @@ judge:
 
 hw_emu:
 	make clean
-	MOD_LEN=1024 SIMPLE_SQ=0 $(MAKE) -C $(SDACCEL_DIR) hw_emu
+	OBJ=obj_hw_emu MOD_LEN=1024 SIMPLE_SQ=0 $(MAKE) -C $(SDACCEL_DIR) hw_emu
 
 hw_emu_simple:
 	make clean
-	MOD_LEN=1024 SIMPLE_SQ=1 $(MAKE) -C $(SDACCEL_DIR) hw_emu
+	OBJ=obj_hw_emu MOD_LEN=1024 SIMPLE_SQ=1 $(MAKE) -C $(SDACCEL_DIR) hw_emu
 
 hw:
 	MOD_LEN=1024 SIMPLE_SQ=0 $(MAKE) -C $(SDACCEL_DIR) hw
 
+synthesis:
+	make clean
+	OBJ=obj_hw_emu MOD_LEN=1024 SIMPLE_SQ=0 $(MAKE) -C $(SDACCEL_DIR) hw_emu
+	OBJ=obj_hw MOD_LEN=1024 SIMPLE_SQ=0 $(MAKE) -C $(SDACCEL_DIR) hw
 
 
 # Additional, mostly verilator, targets

msu/rtl/sdaccel/Makefile.sdaccel

-1
@@ -69,7 +69,6 @@ LDCLFLAGS += --xp "vivado_prop:run.impl_1.{STEPS.PLACE_DESIGN.TCL.PRE}=\
 
 LDCLFLAGS += --kernel_frequency 161
 
-
 ############################################################################
 # AWS/SDAccel configuration
 ############################################################################

msu/sw/MSU.cpp

+11
@@ -110,7 +110,18 @@ void MSU::prepare_random_job(bool rrandom) {
 void MSU::compute_job() {
     struct timespec start_ts;
     start_ts = timer_start();
+
+    //////////////////////////////////////////////////////////////////////
+    // PREPROCESSING goes below this line (Montgomery conversion, etc)
+    //
+
+    // Perform the computation
     device.compute_job(t_start, t_final, sq_in, sq_out);
+
+    //
+    // POSTPROCESSING goes above this line (Montgomery conversion, etc)
+    //////////////////////////////////////////////////////////////////////
+
     compute_time = timer_end(start_ts);
 
     if(!quiet) {
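
Reviewer note on the new PREPROCESSING and POSTPROCESSING markers: a design that squares in Montgomery form would convert the input before the device call and convert the output back afterwards, both inside the timed region. The Python sketch below shows that flow using the textbook definition only (multiply by R on the way in, by R^-1 on the way out). All names are hypothetical, the loop merely stands in for `device.compute_job`, and a hardware implementation would use a division-free Montgomery reduction rather than `%`.

```python
def to_montgomery(x, n, r):
    """Preprocessing: map x to its Montgomery form x*R mod n."""
    return (x * r) % n


def from_montgomery(x_m, n, r_inv):
    """Postprocessing: map a Montgomery-form value back to x mod n."""
    return (x_m * r_inv) % n


def montgomery_square(x_m, n, r_inv):
    """Square in Montgomery form: (xR)*(xR)*R^-1 = (x^2)R mod n."""
    return (x_m * x_m * r_inv) % n


def vdf_in_montgomery_form(x, t, n, r_bits=1024):
    """Compute x^(2^t) mod n with the squarings done in Montgomery form."""
    r = 1 << r_bits
    r_inv = pow(r, -1, n)  # needs Python 3.8+ and an odd modulus n
    x_m = to_montgomery(x, n, r)  # PREPROCESSING
    for _ in range(t):  # stands in for the device computation
        x_m = montgomery_square(x_m, n, r_inv)
    return from_montgomery(x_m, n, r_inv)  # POSTPROCESSING


if __name__ == "__main__":
    n = (1 << 61) - 1  # small odd modulus, for illustration only
    assert vdf_in_montgomery_form(2, 100, n, r_bits=64) == pow(2, 1 << 100, n)
```

Because the conversions happen once per job rather than once per squaring, their cost is amortized over all `t` iterations.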

vdf_portal_config.json

+3
@@ -0,0 +1,3 @@
+{
+    "target": "liveness"
+}
