oneAPI backend update: kernel and layer optimizations #1246
base: main
// and send to the sink. Adaptive to SYCL HLS and hardware acceleration flow.
template <class src_T, class dest_pipe> struct DMA_convert_data {
#if !defined(IS_BSP)
// When targeting a device family, we instantiate an Avalon Memory Mapped Host for
I wonder if all the DMA_convert_data things should be moved to a different file. In the SYCL HLS style they are effectively part of the testbench, so I think they should be in a different file. In the accelerator flow they are still different kernels, utility kernels in a way, so I think they should be separate.
@@ -13,22 +13,28 @@
namespace nnet {

template <class srcType, class dest_pipe, size_t SIZE> void convert_data(sycl::queue &q, srcType *src) {
We should discuss what happens with this function vs the new DMA versions of these.
I noticed, by the way, that ReLU uses blocking reads, and all the components use blocking writes. Is there a requirement to use non-blocking reads and writes? Note, we do need to handle back-pressure, which is much more natural to do with blocking I/O.
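To make the back-pressure point concrete, here is a minimal host-only sketch in plain C++. It does not use the SYCL pipe API: `BoundedPipe`, `try_write`, `try_read`, and `simulate` are hypothetical names invented for illustration. It models a producer writing into a fixed-capacity pipe while a slower consumer drains it. With non-blocking writes the producer has to check the success flag and retry, which is what a blocking write would do implicitly by stalling.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a fixed-capacity hardware pipe (NOT the SYCL pipe API).
template <typename T> class BoundedPipe {
  public:
    explicit BoundedPipe(std::size_t capacity) : capacity_(capacity) {}

    // Non-blocking write: returns false (and stores nothing) when the pipe is full.
    bool try_write(const T &value) {
        if (fifo_.size() >= capacity_)
            return false;
        fifo_.push_back(value);
        return true;
    }

    // Non-blocking read: returns false when the pipe is empty.
    bool try_read(T &value) {
        if (fifo_.empty())
            return false;
        value = fifo_.front();
        fifo_.pop_front();
        return true;
    }

  private:
    std::size_t capacity_;
    std::deque<T> fifo_;
};

// Simulates a producer that offers one value per cycle and a consumer that
// reads only on every second cycle. When honor_backpressure is false, the
// producer ignores the success flag and moves on, so rejected values are lost.
std::vector<int> simulate(const std::vector<int> &input, bool honor_backpressure) {
    BoundedPipe<int> pipe(2);
    std::vector<int> received;
    std::size_t next = 0;
    const std::size_t max_cycles = 10 * input.size() + 10; // safety bound for the lossy case
    for (std::size_t cycle = 0; received.size() < input.size() && cycle < max_cycles; ++cycle) {
        if (next < input.size()) {
            const bool ok = pipe.try_write(input[next]);
            // Retrying on failure emulates the stall a blocking write provides.
            if (ok || !honor_backpressure)
                ++next;
        }
        if (cycle % 2 == 1) {
            int value = 0;
            if (pipe.try_read(value))
                received.push_back(value);
        }
    }
    return received;
}
```

Calling `simulate(data, true)` delivers every element in order, while `simulate(data, false)` silently drops whatever arrived while the pipe was full. With non-blocking I/O that retry bookkeeping has to be written into each kernel by hand.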
Description
This is a replacement of #1218, moving the branch to the main repository for easier contribution by others.
Type of change
This PR introduces improvements to the oneAPI inference backend, focusing on:
Sideband Signal Support
Updated Dense and ReLU Layer for Always-Running Execution
- sop/eop sideband signals for synchronization.
- while loop for always-on kernel execution.
Added DMA Kernels for Hardware Execution
- DMA_convert_data and DMA_convert_data_back move data between host and FPGA efficiently.
Utility Functions for Compile-Time Type Extraction
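The items above can be illustrated with a small host-only sketch in standard C++. It is not the code from this PR and does not use SYCL. `Beat`, `ExtractDataType`, `Pipe`, and the `*_sketch` functions are hypothetical names, and `std::queue` stands in for a hardware pipe. The sketch assumes the host-to-device conversion marks the first element with sop and the last with eop, and that a streaming layer forwards those flags unchanged.

```cpp
#include <cstddef>
#include <queue>
#include <type_traits>
#include <vector>

// Hypothetical beat type: a payload plus start-of-packet / end-of-packet flags.
template <typename T> struct Beat {
    T data;
    bool sop;
    bool eop;
};

// Compile-time extraction of the payload type from a beat type.
template <typename B> struct ExtractDataType; // primary template intentionally undefined
template <typename T> struct ExtractDataType<Beat<T>> {
    using type = T;
};

// std::queue stands in for a hardware pipe in this host-only sketch.
template <typename B> using Pipe = std::queue<B>;

// Stand-in for a host-to-device DMA kernel: wraps each element in a beat,
// marking the first beat with sop and the last beat with eop.
template <typename T> void convert_data_sketch(const std::vector<T> &src, Pipe<Beat<T>> &dest) {
    for (std::size_t i = 0; i < src.size(); ++i) {
        dest.push(Beat<T>{src[i], i == 0, i + 1 == src.size()});
    }
}

// ReLU over a stream of beats. An always-running kernel would loop forever
// (while (true)) and block on the input pipe; here the loop ends when the
// queue is empty so that the sketch terminates on the host.
template <typename B> void relu_stream_sketch(Pipe<B> &in, Pipe<B> &out) {
    using data_t = typename ExtractDataType<B>::type;
    while (!in.empty()) {
        B beat = in.front();
        in.pop();
        beat.data = beat.data > data_t(0) ? beat.data : data_t(0);
        out.push(beat); // sop/eop are forwarded unchanged
    }
}

// Stand-in for a device-to-host DMA kernel: collects payloads up to and
// including the beat flagged with eop.
template <typename T> std::vector<T> convert_data_back_sketch(Pipe<Beat<T>> &src) {
    std::vector<T> result;
    while (!src.empty()) {
        const Beat<T> beat = src.front();
        src.pop();
        result.push_back(beat.data);
        if (beat.eop)
            break;
    }
    return result;
}
```

Chaining `convert_data_sketch`, `relu_stream_sketch`, and `convert_data_back_sketch` on a short vector reproduces the host, kernel, host round trip. Because the kernel takes its payload type from the beat type, it does not need a separate template parameter for it.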
Tests
Tested the updated layers in emulation, in simulation, and on hardware. Tests were conducted by generating the project files with the oneAPI backend code generator and compiling the binary with CMake.
Test Configuration:
setvars script.
Checklist
pre-commit on the files I edited or added.