Memory benchmark for Intel FPGAs to measure memory bandwidth of OpenCL-supported boards. Supports different blocking shapes, block overlapping and padding.
Refer to the following publication:
- Hamid Reza Zohouri, Satoshi Matsuoka, “The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface” in 5th International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'19), Denver, CO, USA, Nov. 2019.
Results used for this publication are included in the "results" folder in the repository. If you use this benchmark suite, please cite the above publication.
Hamid Reza Zohouri
https://www.linkedin.com/in/hamid-reza-zohouri-phd/
http://github.com/zohourih
Command:
make *make_target* *make_options*
Make targets | Description |
---|---|
std | Standard (1D blocking) kernel. |
chstd | Channelized version of the above kernel. |
blk2d | 2D overlapped (i.e. 1.5D) kernel. |
chblk2d | Channelized version of the above kernel. |
blk3d | 3D overlapped (i.e. 2.5D) kernel. |
chblk3d | Channelized version of the above kernel. |
sch | Serial channel kernel designed for the Nallatech 510T board. |
Make options | Description | Default |
---|---|---|
INTEL_FPGA=1 | Compile for Intel FPGAs. Relevant environment variables should be set as defined in Intel's documentation. | Disabled |
AMD=1 | Compile for AMD's OpenCL SDK. Requires the "AMDAPPSDKROOT" environment variable to be defined. Channelized kernels are not supported. | Disabled |
NVIDIA=1 | Compile for NVIDIA's OpenCL SDK. Requires the "CUDA_DIR" environment variable to be defined. Channelized kernels are not supported. | Disabled |
HOST_ONLY=1 | Only compile host code without compiling kernel code. | Disabled |
KERNEL_ONLY=1 | Only compile kernel code without compiling host code. | Disabled |
EMULATOR=1 | Compile for emulation. | Disabled |
FOLDER=VALUE | Override compilation folder. | Same folder as the makefile. |
BOARD=VALUE | Override board name. If BSP supports only one board/hardware, that board will be automatically chosen without needing to supply this option. | Disabled |
NDR=1 | Compile the NDRange variation of the kernel. | Unset (compiles the Single Work-item variation) |
BSIZE=VALUE | Override block size. Overrides both dimensions of the block size for 2.5D blocking kernels. | 1024 (1024x1024 for 2.5D) |
VEC=VALUE | Override vector size for global memory accesses. | 1 |
FMAX=VALUE | Override the post-place-and-route operating frequency. Requires the Fmax hack to be enabled (see below). | Disabled |
TFMAX=VALUE | Override the target Fmax for the OpenCL compiler. Can help increase (or decrease) the post-place-and-route operating frequency by 10-50 MHz and meet timing when coupled with the FMAX option. Comes at a modest increase in logic and Block RAM usage, and might increase loop II if set too high. When not overridden, the target is determined by the OpenCL compiler; check the HTML report. | Disabled |
SEED=VALUE | Override placement and routing seed. Can help increase (or decrease) the operating frequency by 10-30 MHz and meet timing when coupled with FMAX option. Comes at no extra area cost. | Disabled |
DEPTH=VALUE | Override channel depth for channelized kernels. | 16 |
NO_INTER=1 | Disable interleaving of global memory arrays between external memory banks. | Disabled |
NO_CACHE=1 | Disable the cache automatically generated by the OpenCL compiler in certain cases when burst coalesced global memory ports are inferred. | Disabled |
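As an illustration of how targets and options combine, the sketch below prints a few make invocations rather than running them, so they can be inspected first. The `build_cmd` helper is hypothetical and not part of the repository, and the option values are arbitrary examples, not recommended settings.

```shell
# Hypothetical dry-run helper: prints the make command for a target and its options.
build_cmd() {
    target="$1"
    shift
    echo "make ${target} $*"
}

# 2D-blocked Single Work-item kernel for an Intel FPGA, 512-wide blocks, vector size 8.
build_cmd blk2d INTEL_FPGA=1 BSIZE=512 VEC=8

# Host code only for the channelized standard kernel.
build_cmd chstd INTEL_FPGA=1 HOST_ONLY=1

# NDRange variation of the 3D-blocked kernel, compiled for emulation.
build_cmd blk3d INTEL_FPGA=1 NDR=1 EMULATOR=1
```

To actually build, run the printed command from the folder that contains the makefile.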
Command:
./fpga-mem-bench *run_options*
Run options | Description | Default |
---|---|---|
-id VALUE | Target OpenCL device ID for systems with multiple OpenCL devices. | 0 |
-s VALUE | Buffer size in MiB for each global array. Only applicable to [ch]std and sch implementations. | 100 |
-x VALUE | Row width in indexes. Only applicable to [ch]blk2d and [ch]blk3d implementations. | [ch]blk2d: 5120, [ch]blk3d: 320 |
-y VALUE | Column height. Only applicable to [ch]blk2d and [ch]blk3d implementations. | [ch]blk2d: 5120, [ch]blk3d: 320 |
-z VALUE | Number of planes. Only applicable to [ch]blk3d implementations. | 256 |
-n VALUE | Number of iterations. Performance and run time are averaged over this number of iterations. | 1 |
-pad VALUE | Number of floats added to the start of all arrays as padding. Affects memory access alignment. | 0 |
-pad_x VALUE | Number of floats added to the start of all rows in the arrays as padding. Affects memory access alignment. Only applicable to [ch]blk2d and [ch]blk3d implementations. | 0 |
-pad_y VALUE | Number of floats added to the start of all columns in the arrays as padding. Affects memory access alignment. Only applicable to [ch]blk3d implementations. | 0 |
-hw VALUE | Halo/overlapping width. Memory accesses start at index -VALUE, i.e. VALUE floats outside of the grid, and blocks are overlapped by 2 * VALUE. Affects memory access alignment and the amount of redundant memory accesses. | 0 |
--verbose | Print what the benchmark is doing at each step. Will also print details of incorrect output values if coupled with --verify. | Disabled |
--verify | Verify correctness of values in output buffers against expected values calculated on the host CPU. | Disabled |
-h/--help | Print benchmark help and exit. | Disabled |
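The default dimensions of the blocked kernels can be related to the 100 MiB default of `-s` with a little arithmetic. The sketch below assumes 4-byte single-precision floats and ignores padding and halo; `array_mib` is a hypothetical helper, not a benchmark option.

```shell
# Size in MiB of one array of 4-byte floats with the given dimensions.
# Assumes no padding or halo; integer division truncates any fraction.
array_mib() {
    cells=1
    for dim in "$@"; do
        cells=$(( cells * dim ))
    done
    echo $(( cells * 4 / 1048576 ))
}

array_mib 5120 5120      # [ch]blk2d defaults (-x 5120 -y 5120): prints 100
array_mib 320 320 256    # [ch]blk3d defaults (-x 320 -y 320 -z 256): prints 100
```

Under these assumptions, all three kernel families work on arrays of the same size by default.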
Bash-based benchmark scripts are provided in the repository for ease of benchmarking. They are unlikely to work in your environment out of the box; in particular, the variables set at the top of each script almost always need to be changed.
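For orientation, a parameter sweep of this kind might be structured as in the sketch below. This is not one of the repository's scripts: the variable names, values and the `sweep_pad` function are all placeholders, and the commands are only printed, not executed.

```shell
# Environment-specific settings, kept at the top so they are easy to adjust.
# All values are placeholders.
BENCH=./fpga-mem-bench   # path to the compiled host binary
ITER=5                   # iterations per run (-n)
SIZE_MIB=100             # buffer size per array (-s, [ch]std and sch only)

# Print one run command per padding value; remove "echo" to execute them.
sweep_pad() {
    for pad in "$@"; do
        echo "$BENCH" -s "$SIZE_MIB" -n "$ITER" -pad "$pad" --verify
    done
}

sweep_pad 0 1 16
```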
Quartus Prime Standard:
- Back up *install_dir*/hld/ip/board/bsp/adjust_plls.tcl
- Edit as follows:
Replace
for {set i 0} {$i < $argc} {incr i} {
    set v [lindex $argv $i]
    if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
        set k_fmax [lindex $argv [expr $i + 1]]
        post_message "Forcing kernel frequency to $k_fmax"
    } elseif {[string compare $v "-skipmif"] == 0} {
        set do_update_mif 0
    } elseif {[string compare $v "-skipasm"] == 0} {
        set do_asm 0
    } elseif {[string compare $v "-skipsta"] == 0} {
        set do_sta 0
    } elseif {[string compare $v "-testpll"] == 0} {
        set do_sta 0
        set do_asm 0
        set do_update_mif 0
        set do_plltest 1
        set k_fmax 100.0
    }
}
if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name]
    set k_fmax [ lindex $x 0 ]
    set fmax1 [ lindex $x 1 ]
    set k_clk_name_full [ lindex $x 2 ]
    set fmax2 [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}
post_message "Kernel Fmax determined to be $k_fmax\n";
With:
if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name]
    set k_fmax [ lindex $x 0 ]
    set fmax1 [ lindex $x 1 ]
    set k_clk_name_full [ lindex $x 2 ]
    set fmax2 [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}
post_message "Kernel Fmax determined to be $k_fmax\n";
for {set i 0} {$i < $argc} {incr i} {
    set v [lindex $argv $i]
    if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
        set k_fmax [lindex $argv [expr $i + 1]]
        post_message "Forcing kernel frequency to $k_fmax"
    } elseif {[string compare $v "-skipmif"] == 0} {
        set do_update_mif 0
    } elseif {[string compare $v "-skipasm"] == 0} {
        set do_asm 0
    } elseif {[string compare $v "-skipsta"] == 0} {
        set do_sta 0
    } elseif {[string compare $v "-testpll"] == 0} {
        set do_sta 0
        set do_asm 0
        set do_update_mif 0
        set do_plltest 1
        set k_fmax 100.0
    }
}
Quartus Prime Pro:
- Back up *install_dir*/hld/ip/board/bsp/adjust_plls_a10.tcl for versions below v18.1, and *install_dir*/hld/ip/board/bsp/adjust_plls.tcl for versions above.
- Edit as follows:
Replace
if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $iteration]
    set k_fmax [ lindex $x 0 ]
    set fmax1 [ lindex $x 1 ]
    set k_clk_name_full [ lindex $x 2 ]
    set fmax2 [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}
post_message "Kernel Fmax determined to be $k_fmax";
With:
if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $iteration]
    set k_fmax [ lindex $x 0 ]
    set fmax1 [ lindex $x 1 ]
    set k_clk_name_full [ lindex $x 2 ]
    set fmax2 [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}
post_message "Kernel Fmax determined to be $k_fmax";
for {set i 0} {$i < $argc} {incr i} {
    set v [lindex $argv $i]
    if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
        set k_fmax [lindex $argv [expr $i + 1]]
        post_message "Forcing kernel frequency to $k_fmax"
    } elseif {[string compare $v "-skipmif"] == 0} {
        set do_update_mif 0
    } elseif {[string compare $v "-skipasm"] == 0} {
        set do_asm 0
    } elseif {[string compare $v "-skipsta"] == 0} {
        set do_sta 0
    } elseif {[string compare $v "-testpll"] == 0} {
        set do_sta 0
        set do_asm 0
        set do_update_mif 0
        set do_plltest 1
        set k_fmax 100.0
    }
}
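The backup step that precedes each edit can be scripted so that a pristine copy of the vendor file is never overwritten by a later run. The sketch below is one way to do it; `backup_once` and the `.orig` suffix are arbitrary choices, and `install_dir` stands for your own installation path.

```shell
# Copy a file to <file>.orig, but only if no backup exists yet,
# so the original vendor script is preserved across repeated runs.
backup_once() {
    src="$1"
    if [ ! -e "${src}.orig" ]; then
        cp -p "$src" "${src}.orig"
    fi
}

# Example (set install_dir to your own installation path first):
# backup_once "${install_dir}/hld/ip/board/bsp/adjust_plls.tcl"
```

Restoring the original is then a matter of copying the `.orig` file back over the edited script.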