Memory Benchmark for OpenCL-supported Intel FPGAs

zohourih/FPGAMemBench

FPGAMemBench

Memory benchmark for Intel FPGAs that measures the memory bandwidth of OpenCL-supported boards. Supports different blocking shapes, block overlapping, and padding.

Detailed info

Refer to the following publication:

  • Hamid Reza Zohouri, Satoshi Matsuoka, “The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface” in 5th International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'19), Denver, CO, USA, Nov. 2019. [Paper] [Slides]

Results used for this publication are included in the "results" folder in the repository. If you use this benchmark suite, please cite the above publication.

Contact

Hamid Reza Zohouri
https://www.linkedin.com/in/hamid-reza-zohouri-phd/
http://github.com/zohourih

Make

Command:

make *make_target* *make_options*

| Make target | Description |
|---|---|
| std | Standard (1D blocking) kernel. |
| chstd | Channelized version of the above kernel. |
| blk2d | 2D overlapped (i.e. 1.5D) kernel. |
| chblk2d | Channelized version of the above kernel. |
| blk3d | 3D overlapped (i.e. 2.5D) kernel. |
| chblk3d | Channelized version of the above kernel. |
| sch | Serial channel kernel designed for the Nallatech 510T board. |

| Make option | Description | Default |
|---|---|---|
| INTEL_FPGA=1 | Compile for Intel FPGAs. The relevant environment variables should be set as described in Intel's documentation. | Disabled |
| AMD=1 | Compile for AMD's OpenCL SDK. Requires the "AMDAPPSDKROOT" environment variable to be defined. Channelized kernels are not supported. | Disabled |
| NVIDIA=1 | Compile for NVIDIA's OpenCL SDK. Requires the "CUDA_DIR" environment variable to be defined. Channelized kernels are not supported. | Disabled |
| HOST_ONLY=1 | Compile only the host code, without compiling the kernel code. | Disabled |
| KERNEL_ONLY=1 | Compile only the kernel code, without compiling the host code. | Disabled |
| EMULATOR=1 | Compile for emulation. | Disabled |
| FOLDER=VALUE | Override the compilation folder. | Same folder as the makefile |
| BOARD=VALUE | Override the board name. If the BSP supports only one board, that board is chosen automatically and this option is not needed. | Disabled |
| NDR=1 | Compile the NDRange variation of the kernel. | Disabled (the Single Work-item variation is compiled) |
| BSIZE=VALUE | Override the block size. For 2.5D blocking kernels, this overrides both dimensions of the block. | 1024 (1024x1024 for 2.5D) |
| VEC=VALUE | Override the vector size for global memory accesses. | 1 |
| FMAX=VALUE | Override the post-place-and-route operating frequency. Requires the Fmax hack to be enabled (see below). | Disabled |
| TFMAX=VALUE | Override the target Fmax for the OpenCL compiler. Can help increase (or decrease) the post-place-and-route operating frequency by 10-50 MHz and meet timing when coupled with the FMAX option. Comes at a modest increase in logic and Block RAM usage, and might increase the loop II if set too high. | Disabled (the OpenCL compiler determines the target Fmax; check the HTML report) |
| SEED=VALUE | Override the placement and routing seed. Can help increase (or decrease) the operating frequency by 10-30 MHz and meet timing when coupled with the FMAX option. Comes at no extra area cost. | Disabled |
| DEPTH=VALUE | Override the channel depth for channelized kernels. | 16 |
| NO_INTER=1 | Disable interleaving of global memory arrays between external memory banks. | Disabled |
| NO_CACHE=1 | Disable the cache that the OpenCL compiler generates automatically in certain cases when burst-coalesced global memory ports are inferred. | Disabled |
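The targets and options above combine into a single make invocation. The Python sketch below is not part of the repository; it assembles such a command line and rejects the one combination the table rules out explicitly (channelized targets with AMD=1 or NVIDIA=1). The helper name `make_command`, its `sdk` parameter, and its default of INTEL_FPGA=1 are hypothetical conveniences; only the target and option names come from the tables.

```python
# Hypothetical helper (not part of FPGAMemBench): builds a make invocation
# from the targets and options documented in the tables above.
TARGETS = {"std", "chstd", "blk2d", "chblk2d", "blk3d", "chblk3d", "sch"}
CHANNELIZED = {"chstd", "chblk2d", "chblk3d"}


def make_command(target, sdk="INTEL_FPGA", **options):
    """Return the make argv as a list of strings.

    sdk is one of "INTEL_FPGA", "AMD" or "NVIDIA"; every other keyword
    argument is passed through as OPTION=VALUE (e.g. VEC=8, SEED=3).
    """
    if target not in TARGETS:
        raise ValueError(f"unknown make target: {target}")
    if sdk not in ("INTEL_FPGA", "AMD", "NVIDIA"):
        raise ValueError(f"unknown SDK option: {sdk}")
    # The table states that channelized kernels are not supported
    # with the AMD and NVIDIA OpenCL SDKs.
    if sdk != "INTEL_FPGA" and target in CHANNELIZED:
        raise ValueError(f"{target} is not supported with {sdk}=1")
    argv = ["make", target, f"{sdk}=1"]
    # Sorted only for a deterministic order; make does not depend on the
    # order of distinct variable assignments on its command line.
    argv += [f"{key}={value}" for key, value in sorted(options.items())]
    return argv


if __name__ == "__main__":
    # Single Work-item 1D kernel with 8-wide vectors and a custom seed.
    print(" ".join(make_command("std", VEC=8, SEED=3)))
```

Running it prints `make std INTEL_FPGA=1 SEED=3 VEC=8`. Options such as EMULATOR=1 or NDR=1 pass through the same keyword mechanism.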

Run

Command:

./fpga-mem-bench *run_options*

| Run option | Description | Default |
|---|---|---|
| -id VALUE | Target OpenCL device ID, for systems with multiple OpenCL devices. | 0 |
| -s VALUE | Buffer size in MiB for each global array. Only applicable to the [ch]std and sch implementations. | 100 |
| -x VALUE | Row width in indexes. Only applicable to the [ch]blk2d and [ch]blk3d implementations. | [ch]blk2d: 5120, [ch]blk3d: 320 |
| -y VALUE | Column height. Only applicable to the [ch]blk2d and [ch]blk3d implementations. | [ch]blk2d: 5120, [ch]blk3d: 320 |
| -z VALUE | Number of planes. Only applicable to the [ch]blk3d implementations. | 256 |
| -n VALUE | Number of iterations. Performance and run time are averaged over this number of iterations. | 1 |
| -pad VALUE | Number of floats added to the start of all arrays as padding. Affects memory access alignment. | 0 |
| -pad_x VALUE | Number of floats added to the start of every row in the arrays as padding. Affects memory access alignment. Only applicable to the [ch]blk2d and [ch]blk3d implementations. | 0 |
| -pad_y VALUE | Number of floats added to the start of every column in the arrays as padding. Affects memory access alignment. Only applicable to the [ch]blk3d implementations. | 0 |
| -hw VALUE | Halo (overlap) width. Memory accesses start VALUE floats before the grid (at index -VALUE), and adjacent blocks overlap by 2 * VALUE floats. Affects memory access alignment and the amount of redundant memory accesses. | 0 |
| --verbose | Print what the benchmark is doing at each step. When combined with --verify, also print details of incorrect output values. | Disabled |
| --verify | Verify the values in the output buffers against expected values calculated on the host CPU. | Disabled |
| -h/--help | Print the benchmark help and exit. | Disabled |
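To illustrate why -hw, -pad and -pad_x affect alignment, the Python sketch below models the 1.5D ([ch]blk2d) case. It assumes a row-major layout where each row occupies pad_x + x floats and the whole array is shifted by pad floats, and that blocks of BSIZE columns start at column -hw and advance by BSIZE - 2 * hw. These are readings of the option descriptions above, not code taken from the benchmark; the function names are hypothetical, and -pad_y and the 2.5D kernels are not modeled.

```python
import math

# Hypothetical model (not taken from the FPGAMemBench sources) of a 1.5D
# ([ch]blk2d) layout: rows of x floats, each row prefixed by pad_x floats,
# the whole array prefixed by pad floats, and blocks of bsize columns that
# start hw columns before the grid and overlap by 2 * hw.


def block_starts(x, bsize, hw):
    """Starting column of each block along a row (the first may be negative)."""
    step = bsize - 2 * hw  # new (non-halo) columns contributed by each block
    if step <= 0:
        raise ValueError("halo width too large for the block size")
    num_blocks = math.ceil(x / step)
    return [-hw + b * step for b in range(num_blocks)]


def float_offset(col, row, x, pad=0, pad_x=0):
    """Offset in floats of element (col, row) from the start of the buffer."""
    return pad + row * (pad_x + x) + pad_x + col


def misalignment(col, row, x, vec, pad=0, pad_x=0):
    """Floats past the previous vec-float boundary (0 means aligned)."""
    return float_offset(col, row, x, pad, pad_x) % vec


def access_ratio(x, bsize, hw):
    """Floats read per row divided by x, assuming every block reads its full
    bsize-wide window (including columns past the grid edge)."""
    return len(block_starts(x, bsize, hw)) * bsize / x


if __name__ == "__main__":
    x, bsize, vec = 5120, 1024, 8  # -x default, BSIZE default, VEC=8
    for hw, pad in [(0, 0), (4, 0), (4, 4)]:
        starts = block_starts(x, bsize, hw)
        off = [misalignment(s, 0, x, vec, pad=pad) for s in starts]
        print(f"hw={hw} pad={pad} starts={starts} misalign={off} "
              f"ratio={access_ratio(x, bsize, hw):.2f}")
```

In this model, with the default -x 5120 and BSIZE 1024 and an illustrative VEC=8, a halo of 4 leaves every block start 4 floats past an 8-float boundary, and -pad 4 realigns them. The same halo raises the floats read per row by a factor of 1.2 under the full-window assumption.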

Benchmark scripts

Bash-based benchmark scripts are provided in the repository for ease of benchmarking. However, they may not work in your environment out of the box and will very likely need modification to run correctly. In particular, the variables set at the top of each benchmark script almost always need to be changed.
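As an alternative starting point to the bash scripts, the Python sketch below generates one benchmark invocation per buffer size and padding value, using only the documented -s, -n, -pad and --verify run options. It is hypothetical and not part of the repository: the binary path, the choice of 5 iterations, and the `execute` switch are assumptions, and it does not parse the benchmark's output, whose format is not described here.

```python
import shlex
import subprocess

# Hypothetical sweep driver, a sketch in place of the repository's bash
# scripts. Only documented run options (-s, -n, -pad, --verify) are used.
BINARY = "./fpga-mem-bench"


def run_argv(size_mib, iterations=1, pad=0, verify=False):
    """argv for one run; -s applies only to the [ch]std and sch kernels."""
    argv = [BINARY, "-s", str(size_mib), "-n", str(iterations), "-pad", str(pad)]
    if verify:
        argv.append("--verify")
    return argv


def floats_per_array(size_mib):
    """Number of 4-byte floats in one global array of size_mib MiB."""
    return size_mib * 1024 * 1024 // 4


def sweep(sizes_mib, iterations=5, pads=(0,), execute=False):
    """Print (and optionally run) one benchmark invocation per configuration."""
    commands = [run_argv(s, iterations, p) for s in sizes_mib for p in pads]
    for argv in commands:
        print(shlex.join(argv))
        if execute:
            # Raises CalledProcessError if the benchmark exits non-zero.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only prints the commands.
    sweep([100, 200, 400], pads=(0, 1, 4))
```

By default the sweep only prints the commands; passing `execute=True` runs them one after another from the current directory. For reference, the default `-s 100` corresponds to `floats_per_array(100)`, i.e. 26,214,400 floats per array.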

To enable the Fmax override (required by the FMAX make option):

Quartus Prime Standard:

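  • Back up *install_dir*/hld/ip/board/bsp/adjust_plls.tcl
  • Edit it as follows, moving the argument-parsing loop below the automatic Fmax determination so that a frequency passed with "-fmax" overrides the automatically determined one: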

Replace

for {set i 0} {$i < $argc} {incr i} {
   set v [lindex $argv $i]

   if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
      set k_fmax [lindex $argv [expr $i + 1]]
      post_message "Forcing kernel frequency to $k_fmax"
   } elseif {[string compare $v "-skipmif"] == 0} {
      set do_update_mif 0
   } elseif {[string compare $v "-skipasm"] == 0} {
      set do_asm 0
   } elseif {[string compare $v "-skipsta"] == 0} {
      set do_sta 0
   } elseif {[string compare $v "-testpll"] == 0} {
      set do_sta 0
      set do_asm 0
      set do_update_mif 0
      set do_plltest 1
      set k_fmax 100.0
   }
}

if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name]
    set k_fmax       [ lindex $x 0 ]
    set fmax1        [ lindex $x 1 ]
    set k_clk_name_full   [ lindex $x 2 ]
    set fmax2        [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}

post_message "Kernel Fmax determined to be $k_fmax\n";

With:

if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name]
    set k_fmax       [ lindex $x 0 ]
    set fmax1        [ lindex $x 1 ]
    set k_clk_name_full   [ lindex $x 2 ]
    set fmax2        [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}

post_message "Kernel Fmax determined to be $k_fmax\n";

for {set i 0} {$i < $argc} {incr i} {
   set v [lindex $argv $i]

   if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
      set k_fmax [lindex $argv [expr $i + 1]]
      post_message "Forcing kernel frequency to $k_fmax"
   } elseif {[string compare $v "-skipmif"] == 0} {
      set do_update_mif 0
   } elseif {[string compare $v "-skipasm"] == 0} {
      set do_asm 0
   } elseif {[string compare $v "-skipsta"] == 0} {
      set do_sta 0
   } elseif {[string compare $v "-testpll"] == 0} {
      set do_sta 0
      set do_asm 0
      set do_update_mif 0
      set do_plltest 1
      set k_fmax 100.0
   }
}

Quartus Prime Pro:

  • Back up *install_dir*/hld/ip/board/bsp/adjust_plls_a10.tcl for versions below v18.1, or *install_dir*/hld/ip/board/bsp/adjust_plls.tcl for versions above.
  • Edit as follows:

Replace

if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $iteration]
    set k_fmax       [ lindex $x 0 ]
    set fmax1        [ lindex $x 1 ]
    set k_clk_name_full   [ lindex $x 2 ]
    set fmax2        [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}

post_message "Kernel Fmax determined to be $k_fmax";

With:

if {$k_fmax == -1} {
    set x [get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $iteration]
    set k_fmax       [ lindex $x 0 ]
    set fmax1        [ lindex $x 1 ]
    set k_clk_name_full   [ lindex $x 2 ]
    set fmax2        [ lindex $x 3 ]
    set k_clk2x_name_full [ lindex $x 4 ]
}

post_message "Kernel Fmax determined to be $k_fmax";

for {set i 0} {$i < $argc} {incr i} {
   set v [lindex $argv $i]

   if {[string compare $v "-fmax"] == 0 && $i < [expr $argc - 1]} {
      set k_fmax [lindex $argv [expr $i + 1]]
      post_message "Forcing kernel frequency to $k_fmax"
   } elseif {[string compare $v "-skipmif"] == 0} {
      set do_update_mif 0
   } elseif {[string compare $v "-skipasm"] == 0} {
      set do_asm 0
   } elseif {[string compare $v "-skipsta"] == 0} {
      set do_sta 0
   } elseif {[string compare $v "-testpll"] == 0} {
      set do_sta 0
      set do_asm 0
      set do_update_mif 0
      set do_plltest 1
      set k_fmax 100.0
   }
}
