This repository is based on original work by Naoya Maruyama on implementing and optimizing first-order 2D and 3D diffusion stencils for CPUs, GPUs and Xeon Phi. The original repository can be found here. That implementation has been extended with an OpenCL version targeting Intel FPGAs using the Intel FPGA SDK for OpenCL. This version implements a highly optimized FPGA design with combined spatial and temporal blocking and numerous FPGA-specific optimizations. The OpenCL version has also been extended to support stencils of up to fourth order.
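For readers new to the term, a first-order (radius-1) 2D diffusion stencil updates each cell from its own value and its four nearest neighbors. The following is a minimal plain-C sketch of one time step, not the repository's kernel: the function name `diffusion2d_step`, the coefficient parameters, and the clamped boundary handling are assumptions made for illustration, and no spatial or temporal blocking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* One time step of a first-order (radius-1), 5-point 2D diffusion stencil.
 * Hypothetical helper for illustration only.
 * in/out : row-major grids of nx * ny cells (must not alias)
 * cc     : weight of the center cell
 * cw/ce  : weights of the west/east neighbors
 * cn/cs  : weights of the north/south neighbors
 * Assumption: a neighbor that falls outside the grid is replaced by the
 * boundary cell itself (clamping). */
void diffusion2d_step(const float *in, float *out, size_t nx, size_t ny,
                      float cc, float cw, float ce, float cn, float cs)
{
    assert(in != NULL && out != NULL && in != out);
    assert(nx > 0 && ny > 0);

    for (size_t y = 0; y < ny; y++) {
        for (size_t x = 0; x < nx; x++) {
            size_t c = y * nx + x;
            /* Clamp neighbor indices at the grid edges. */
            size_t w = (x == 0)      ? c : c - 1;
            size_t e = (x == nx - 1) ? c : c + 1;
            size_t n = (y == 0)      ? c : c - nx;
            size_t s = (y == ny - 1) ? c : c + nx;

            out[c] = cc * in[c] + cw * in[w] + ce * in[e]
                   + cn * in[n] + cs * in[s];
        }
    }
}
```

Running several time steps amounts to calling this function repeatedly while swapping the `in` and `out` pointers. A higher-order stencil would read more neighbors along each axis.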
Common make configuration, plus timer, helper and power functions for Intel FPGAs, Nvidia GPUs and Intel CPUs.
Diffusion 2D/3D implementations for different hardware. Only the OpenCL version supports high-order stencils and can run on FPGAs. Refer to the main README in each folder for details on compiling and running the FPGA version, and to the original README for the other versions.
Refer to the following publications:
- Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka, "Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL," Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'18), Feb. 2018. [arXiv][ACM][slides]
- Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka, "High-Performance High-Order Stencil Computation on FPGAs Using OpenCL," 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW'18), May 2018. [arXiv][IEEE]
- Hamid Reza Zohouri, "High Performance Computing with FPGAs and OpenCL," PhD thesis, Tokyo Institute of Technology, Tokyo, Japan, Aug. 2018. [PDF]
The thesis has the most up-to-date results.
Hamid Reza Zohouri
https://www.linkedin.com/in/hamid-reza-zohouri-phd/
http://github.com/zohourih