oneAPI backend update: kernel and layer optimizations #1246

jmitrevs · 2025-03-26T19:27:57Z

Description

This is a replacement of #1218, moving the branch to the main repository for easier contribution by others.

Type of change

Breaking change (fix or feature that would cause existing functionality to not work as expected)

This PR introduces improvements to the oneAPI inference backend, focusing on:

Utilizing sideband signals (sop and eop) in StreamingBeat for multi-kernel synchronization.
Refactoring core layers (Dense & ReLU) to employ always-run kernels and non-blocking I/O.
Introducing compile-time type extraction utilities for streamlined template handling.
Adding DMA-based data movement for generic execution.
Automated code generation.

Sideband Signal Support

Added start-of-packet (sop) and end-of-packet (eop) signals for kernel synchronization.

The following alias declaration is generated for each inter-kernel pipe and host pipe, so that multiple kernels can stay in sync.

using InputBeatT = sycl::ext::intel::experimental::StreamingBeat<
    data_T, // Data type
    true,   // Enable the packet sideband signals (sop and eop)
    true>;  // Enable the empty sideband signal

Updated Dense and ReLU Layers for Always-Running Execution

Uses sop/eop sideband signals for synchronization.
Implements non-blocking reads so that a kernel keeps polling instead of stalling when no data is available.
Uses a while loop for always-on kernel execution.
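
The three points above can be sketched in plain C++ without the SYCL headers. This is only an illustration of the control flow, not the code in this PR: Beat, MockPipe, relu_stage and run_demo are hypothetical names, the pipe is a small host-side FIFO, and the polls argument exists only so that the emulation terminates (on hardware the loop would be an unbounded while). Everything is constexpr purely so that the behaviour can be checked at compile time.

```cpp
#include <cstddef>

// Hypothetical stand-in for a StreamingBeat with packets enabled: payload plus sop/eop.
struct Beat {
    int data;
    bool sop; // start of packet
    bool eop; // end of packet
};

// Hypothetical stand-in for a pipe: a fixed-capacity FIFO with non-blocking access.
template <std::size_t Capacity> struct MockPipe {
    Beat slots[Capacity] = {};
    std::size_t head = 0;
    std::size_t count = 0;

    // Non-blocking read: sets success to false and returns an empty beat when there is no data.
    constexpr Beat read(bool &success) {
        success = count > 0;
        if (!success) {
            return Beat{0, false, false};
        }
        Beat b = slots[head];
        head = (head + 1) % Capacity;
        --count;
        return b;
    }

    // Non-blocking write: sets success to false when the FIFO is full.
    constexpr void write(const Beat &b, bool &success) {
        success = count < Capacity;
        if (success) {
            slots[(head + count) % Capacity] = b;
            ++count;
        }
    }
};

// Always-running ReLU stage. Returns the number of complete packets (eop beats) seen.
template <std::size_t N>
constexpr std::size_t relu_stage(MockPipe<N> &in, MockPipe<N> &out, std::size_t polls) {
    std::size_t packets_done = 0;
    while (polls > 0) { // stands in for while (true) on hardware
        --polls;
        bool got = false;
        Beat b = in.read(got);
        if (!got) {
            continue; // nothing arrived in this iteration; keep polling
        }
        b.data = b.data > 0 ? b.data : 0; // ReLU on the payload; sop/eop pass through unchanged
        bool sent = false;
        out.write(b, sent); // this sketch assumes the output never fills up
        if (b.eop) {
            ++packets_done;
        }
    }
    return packets_done;
}

struct DemoResult {
    int first;
    int second;
    bool first_sop;
    bool second_eop;
    std::size_t packets;
};

// Feeds one two-beat packet through the stage and collects what comes out.
constexpr DemoResult run_demo() {
    MockPipe<4> in{};
    MockPipe<4> out{};
    bool ok = false;
    in.write(Beat{-3, true, false}, ok);
    in.write(Beat{5, false, true}, ok);
    std::size_t packets = relu_stage(in, out, 10);
    Beat a = out.read(ok);
    Beat b = out.read(ok);
    return DemoResult{a.data, b.data, a.sop, b.eop, packets};
}
```

The sideband flags travel with the data, so a downstream stage can find packet boundaries without a separate control channel.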

Added DMA Kernels for Hardware Execution

DMA-based data movement for improved memory transfer:

DMA_convert_data and DMA_convert_data_back move data between host and FPGA efficiently.

template <class srcType, class dest_pipe, size_t num_iterations> struct DMA_convert_data {};
template <class src_pipe, class dstType, size_t num_iterations> struct DMA_convert_data_back {};

Modification to how the testbench starts

q.single_task(DMA_convert_data<float, Conv1DInputPipe, num_iterations>{vals_ptr});
q.single_task(Myproject{});
q.single_task(DMA_convert_data_back<Layer4OutPipe, float, num_iterations>{output_ptr}).wait();
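
A rough host-only sketch of the data path that these three launches set up is shown below. It assumes that the input DMA marks the first element of a transfer with sop and the last with eop, and that the output DMA copies beats back until it sees eop; the description does not spell this out. MockStream, MockDmaIn, MockDmaOut and num_elements are hypothetical stand-ins, and the stream is passed by reference here, whereas the real kernels name their pipe as a template parameter.

```cpp
#include <cstddef>

// Hypothetical stand-in for a beat carrying one float plus packet flags.
struct Beat {
    float data;
    bool sop;
    bool eop;
};

// Hypothetical stand-in for a pipe: a simple in-order buffer of beats.
template <std::size_t Capacity> struct MockStream {
    Beat beats[Capacity] = {};
    std::size_t write_pos = 0;
    std::size_t read_pos = 0;

    constexpr void write(const Beat &b) { beats[write_pos++] = b; }
    constexpr Beat read() { return beats[read_pos++]; }
};

// Analogue of the input DMA: holds a source pointer and packetizes the buffer.
template <class srcType, std::size_t num_elements> struct MockDmaIn {
    const srcType *src;

    template <class Stream> constexpr void operator()(Stream &dest) const {
        for (std::size_t i = 0; i < num_elements; ++i) {
            // sop on the first element, eop on the last one
            dest.write(Beat{static_cast<float>(src[i]), i == 0, i + 1 == num_elements});
        }
    }
};

// Analogue of the output DMA: copies beats to the destination until eop arrives.
template <class dstType, std::size_t num_elements> struct MockDmaOut {
    dstType *dst;

    template <class Stream> constexpr std::size_t operator()(Stream &source) const {
        std::size_t n = 0;
        bool last = false;
        while (!last && n < num_elements) {
            Beat b = source.read();
            dst[n++] = static_cast<dstType>(b.data);
            last = b.eop;
        }
        return n; // number of elements copied back
    }
};

// Round trip: host buffer -> beats -> host buffer. Returns the sum of the output.
constexpr float round_trip_sum() {
    float host_in[3] = {1.5f, -2.0f, 4.0f};
    float host_out[3] = {};
    MockStream<3> stream{};
    MockDmaIn<float, 3> to_device{host_in};
    MockDmaOut<float, 3> to_host{host_out};
    to_device(stream);
    std::size_t n = to_host(stream);
    return n == 3 ? host_out[0] + host_out[1] + host_out[2] : -1.0f;
}

// Checks that only the first beat has sop and only the last beat has eop.
constexpr bool packet_flags_ok() {
    float host_in[3] = {1.5f, -2.0f, 4.0f};
    MockStream<3> stream{};
    MockDmaIn<float, 3> to_device{host_in};
    to_device(stream);
    return stream.beats[0].sop && !stream.beats[0].eop && !stream.beats[1].sop && !stream.beats[1].eop &&
           !stream.beats[2].sop && stream.beats[2].eop;
}
```

In the testbench above the network kernel sits between the two DMA kernels; it is left out here so that the sketch only shows the packet framing.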

Utility Functions for Compile-Time Type Extraction

Added helper structs to extract data types from pipes and StreamingBeat.
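
The helper structs themselves are not reproduced in this description. As a hedged illustration of the general technique, the sketch below uses partial template specialization to peel the element type off a pipe and then the payload type off a beat. MockBeat, MockPipe, ExtractPipeType, ExtractDataType, PayloadOf and InputPipeID are hypothetical names chosen for the example; the real helpers operate on the SYCL pipe and StreamingBeat types and may be shaped differently.

```cpp
#include <array>
#include <type_traits>

// Hypothetical stand-in for a StreamingBeat-like wrapper: payload plus sideband flags.
template <typename T, bool UsePackets, bool UseEmpty> struct MockBeat {
    T data;
    bool sop;
    bool eop;
};

// Hypothetical stand-in for a pipe class: a unique tag type plus the element type it carries.
template <class Name, class DataT> struct MockPipe {};

// Primary template: anything that is not a pipe is returned unchanged.
template <typename T> struct ExtractPipeType {
    using value_type = T;
};

// Specialization: pull the element type out of a pipe.
template <class Name, class DataT> struct ExtractPipeType<MockPipe<Name, DataT>> {
    using value_type = DataT;
};

// Primary template: anything that is not a beat is returned unchanged.
template <typename T> struct ExtractDataType {
    using value_type = T;
};

// Specialization: pull the payload type out of a beat.
template <typename T, bool UsePackets, bool UseEmpty> struct ExtractDataType<MockBeat<T, UsePackets, UseEmpty>> {
    using value_type = T;
};

// Convenience alias: pipe -> element type -> payload type.
template <class Pipe>
using PayloadOf = typename ExtractDataType<typename ExtractPipeType<Pipe>::value_type>::value_type;

// Example: a pipe of beats that each carry four floats.
struct InputPipeID;
using InputBeatT = MockBeat<std::array<float, 4>, true, true>;
using InputPipe = MockPipe<InputPipeID, InputBeatT>;

static_assert(std::is_same<ExtractPipeType<InputPipe>::value_type, InputBeatT>::value,
              "the pipe carries InputBeatT");
static_assert(std::is_same<PayloadOf<InputPipe>, std::array<float, 4>>::value,
              "the beat carries an array of four floats");
```

With helpers of this kind a layer template only needs the pipe type as a parameter and can derive the data type from it.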

Tests

Tested the updated layers in emulation, in simulation, and on hardware. For each test, the project was generated with the oneAPI backend code generator and the binary was built with CMake.

Test Configuration:

Set up the Quartus Prime Pro software with its environment variables configured correctly (needed for simulation and bitstream generation).
Configure the oneAPI environment with the Environment Configurator for oneAPI Toolkits extension.
Source the setvars script.

Checklist

I have read the guidelines for contributing.
I have commented my code, particularly in hard-to-understand areas.
I have made corresponding changes to the documentation.
My changes generate no new warnings.
I have installed and run pre-commit on the files I edited or added.
I have added tests that prove my fix is effective or that my feature works.

hls4ml/backends/oneapi/oneapi_backend.py

oneAPI BSP Support

jmitrevs · 2025-04-17T01:26:21Z

hls4ml/templates/oneapi/firmware/myproject.h

+// and send to the sink. Adaptive to SYCL HLS and hardware acceleration flow.
+template <class src_T, class dest_pipe> struct DMA_convert_data {
+#if !defined(IS_BSP)
+    // When targeting a device family, we instantiate an Avalon Memory Mapped Host for


I wonder if all the DMA_convert_data things should be moved to a different file. In the SYCL HLS style they are effectively part of the testbench, so I think they should be in a different file. In the accelerator flow they are still different kernels (utility kernels, in a way), so I think they should be separate there as well.

jmitrevs · 2025-04-17T14:41:09Z

hls4ml/templates/oneapi/firmware/nnet_utils/nnet_helpers.h

@@ -13,22 +13,28 @@
 namespace nnet {

 template <class srcType, class dest_pipe, size_t SIZE> void convert_data(sycl::queue &q, srcType *src) {


We should discuss what happens with this function vs the new DMA versions of these.

jmitrevs · 2025-05-02T21:22:47Z

I noticed, by the way, that ReLU uses blocking reads, and all the components use blocking writes. Is there a requirement to use nonblocking reads and writes? Note, we do need to handle back-pressure, which is much more natural to do with blocking I/O.
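
To make the back-pressure point concrete, here is a small host-only sketch, not taken from the PR. MockFifo and both functions are hypothetical; the FIFO has capacity 2 and the consumer reads only every other cycle. A producer that ignores the result of a non-blocking write loses data when the FIFO is full, while a producer that holds its value until the write is accepted (which is what a blocking write does implicitly) delivers everything in order.

```cpp
#include <cstddef>

// Hypothetical stand-in for a pipe: a fixed-capacity FIFO with non-blocking access.
template <std::size_t Capacity> struct MockFifo {
    int slots[Capacity] = {};
    std::size_t head = 0;
    std::size_t count = 0;

    // Returns false (and stores nothing) when the FIFO is full.
    constexpr bool try_write(int value) {
        if (count == Capacity) {
            return false;
        }
        slots[(head + count) % Capacity] = value;
        ++count;
        return true;
    }

    // Returns false (and leaves value untouched) when the FIFO is empty.
    constexpr bool try_read(int &value) {
        if (count == 0) {
            return false;
        }
        value = slots[head];
        head = (head + 1) % Capacity;
        --count;
        return true;
    }
};

// Producer ignores the success flag: values written while the FIFO is full are lost.
// Returns how many values reach the consumer.
constexpr std::size_t delivered_when_ignoring_flag(std::size_t n_values) {
    MockFifo<2> fifo{};
    std::size_t delivered = 0;
    int value = 0;
    for (std::size_t i = 0; i < n_values; ++i) {
        fifo.try_write(static_cast<int>(i)); // result deliberately ignored
        if (i % 2 == 1 && fifo.try_read(value)) {
            ++delivered; // slow consumer: one read every other cycle
        }
    }
    while (fifo.try_read(value)) {
        ++delivered; // drain what is left
    }
    return delivered;
}

// Producer holds each value until the write succeeds, as a blocking write would.
// Returns true when every value arrives, in order.
constexpr bool retry_delivers_all_in_order(std::size_t n_values) {
    MockFifo<2> fifo{};
    std::size_t next = 0;
    std::size_t delivered = 0;
    std::size_t cycle = 0;
    int value = 0;
    bool in_order = true;
    while (delivered < n_values) {
        if (next < n_values && fifo.try_write(static_cast<int>(next))) {
            ++next; // advance only once the FIFO has accepted the value
        }
        if (cycle % 2 == 1 && fifo.try_read(value)) {
            in_order = in_order && value == static_cast<int>(delivered);
            ++delivered;
        }
        ++cycle;
    }
    return in_order;
}
```

With eight values, the first variant delivers only five of them in this setup. The second variant delivers all eight, at the cost of an explicit retry around every write.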

haoyanwa and others added 13 commits February 20, 2025 14:36

Init: add examples

70323c9

Input and output DMA.

4162599

Added streaming beat control signal.

34f0d82

Restartable kernel for io_parallel.

951a1f6

Updated oneAPI backend testbench.

8445de7

Updated oneAPI template: io_stream kernel template.

0d21e99

Remove temp files.

257385a

Refactoring oneAPI backend myproject_test.

0b8ef13

Merge branch 'fastmachinelearning:main' into oneapi_backend/experiment

cf98216

Cosmetic change.

70054aa

oneAPI backend simulation support.

c307715

Merge branch 'main' into oneapi_backend/experiment

454d556

pre-commit fixes

7e028e6

jmitrevs changed the title ~~oneAPI backend update: kernel and layer optimizations (replace #1218)~~ oneAPI backend update: kernel and layer optimizations Mar 26, 2025

Merge branch 'main' into oneapi_backend/experiment

97c187d

jmitrevs marked this pull request as draft March 26, 2025 19:28

jmitrevs mentioned this pull request Mar 26, 2025

oneAPI backend update: kernel and layer optimizations #1218

Closed

7 tasks

jmitrevs commented Mar 26, 2025

haoyanwa added 2 commits April 1, 2025 13:36

oneAPI BSP support.

00f82a3

User API and documentation.

496846d

haoyanwa mentioned this pull request Apr 1, 2025

oneAPI BSP Support #1254

Merged

jmitrevs and others added 3 commits April 2, 2025 08:40

Merge pull request #1254 from haoyanwa/oneapi_backend/experiment

84ad787

oneAPI BSP Support

pre-commit fixes

120c2e4

Merge branch 'main' into oneapi_backend/experiment

e2cec76

jmitrevs commented Apr 17, 2025

update convert_data and convert_data_back to use packets

d869a5c

jmitrevs commented Apr 17, 2025

jmitrevs added 3 commits May 1, 2025 15:56

consolidate convert_data and DMA_convert_data in nnet_data_movement.h

7e2e747

update all the activations

0b3dbeb

migrate batchnorm to restartatabe

36881e0

jmitrevs added 2 commits June 11, 2025 18:22

Merge remote-tracking branch 'upstream/main' into jm_oneAPI_experiment

60c0f42

pre-commit fix

44ee08f
