oneAPI backend update: kernel and layer optimizations #1218

Closed

@haoyanwa haoyanwa commented Mar 6, 2025

Description

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

This PR introduces improvements to the oneAPI inference backend, focusing on:

  • Utilizing sideband signals (sop and eop) in StreamingBeat for multi-kernel synchronization.
  • Refactoring core layers (Dense & ReLU) to employ always-run kernels and non-blocking I/O.
  • Introducing compile-time type extraction utilities for streamlined template handling.
  • Adding DMA-based data movement for generic execution.
  • Automated code generation.

Sideband Signal Support

  • Added start-of-packet (sop) and end-of-packet (eop) signals for kernel synchronization.
  • The following type alias is generated for each inter-kernel pipe and host pipe, so that multiple kernels can operate in sync.
    using InputBeatT = sycl::ext::intel::experimental::StreamingBeat<
        data_T, // Data type
        true,    // Enable start-of-packet
        true>;   // Enable end-of-packet
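
To illustrate how sop and eop can delimit a packet, here is a minimal host-side sketch in plain C++. BeatSketch and frame_packet are hypothetical names introduced for illustration; BeatSketch is a plain struct that mirrors the three fields of the alias above, not the Intel StreamingBeat type, and the framing rule (first beat carries sop, last beat carries eop) is an assumption about how the signals are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a beat with both sideband signals enabled.
// This is NOT sycl::ext::intel::experimental::StreamingBeat.
template <typename T>
struct BeatSketch {
    T data;
    bool sop; // start-of-packet
    bool eop; // end-of-packet
};

// Wrap one packet's payload in beats: the first beat is marked sop,
// the last beat is marked eop (a one-element packet carries both).
template <typename T>
std::vector<BeatSketch<T>> frame_packet(const std::vector<T> &payload) {
    std::vector<BeatSketch<T>> beats;
    beats.reserve(payload.size());
    for (std::size_t i = 0; i < payload.size(); ++i) {
        beats.push_back({payload[i], i == 0, i + 1 == payload.size()});
    }
    // Postcondition: a non-empty packet opens with sop and closes with eop.
    assert(beats.empty() || (beats.front().sop && beats.back().eop));
    return beats;
}
```

A downstream consumer can then detect packet boundaries from the flags alone, without knowing the packet length in advance.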

Updated Dense and ReLU Layer for Always-Running Execution

  • Uses sop/eop sideband signals for synchronization.
  • Implements non-blocking reads for seamless streaming.
  • Uses a while loop for always-on kernel execution.
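
The three points above can be sketched on the host in plain C++. Everything here is an assumption made for illustration: Beat, MockPipe, and relu_always_run are hypothetical names, the pipe is a simple FIFO rather than a SYCL pipe, and the loop is bounded by max_polls so the sketch terminates (on hardware the loop would run forever).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical beat: payload plus the two sideband flags.
struct Beat {
    float data;
    bool sop;
    bool eop;
};

// Hypothetical FIFO stand-in for a pipe with a non-blocking read.
struct MockPipe {
    std::deque<Beat> fifo;

    // Non-blocking read: reports an empty FIFO via `success` instead of waiting.
    Beat read(bool &success) {
        if (fifo.empty()) {
            success = false;
            return Beat{0.0f, false, false};
        }
        success = true;
        Beat b = fifo.front();
        fifo.pop_front();
        return b;
    }

    void write(const Beat &b) { fifo.push_back(b); }
};

// ReLU in the always-run style: poll the input, skip empty polls,
// apply max(x, 0), and forward sop/eop unchanged. Returns the number
// of complete packets seen (counted on eop).
inline std::size_t relu_always_run(MockPipe &in, MockPipe &out, std::size_t max_polls) {
    std::size_t packets_done = 0;
    bool in_packet = false;
    for (std::size_t poll = 0; poll < max_polls; ++poll) {
        bool ok = false;
        Beat beat = in.read(ok);
        if (!ok) continue; // nothing available yet: keep polling
        if (beat.sop) in_packet = true;
        assert(in_packet && "data beat arrived outside a packet");
        beat.data = beat.data > 0.0f ? beat.data : 0.0f;
        out.write(beat);
        if (beat.eop) {
            in_packet = false;
            ++packets_done;
        }
    }
    return packets_done;
}
```

Because the read never blocks, the kernel never has to be relaunched per inference; packet boundaries come from the sideband flags rather than from a fixed iteration count.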

Added DMA Kernels for Hardware Execution

  • DMA-based data movement for improved memory transfer:
    • DMA_convert_data and DMA_convert_data_back move data between host and FPGA efficiently.
    template <class srcType, class dest_pipe, size_t num_iterations> struct DMA_convert_data {};
    template <class src_pipe, class dstType, size_t num_iterations> struct DMA_convert_data_back {};
  • The testbench now launches the DMA kernels around the main kernel:
    q.single_task(DMA_convert_data<float, Conv1DInputPipe, num_iterations>{vals_ptr});
    q.single_task(Myproject{});
    q.single_task(DMA_convert_data_back<Layer4OutPipe, float, num_iterations>{output_ptr}).wait();
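
The PR text shows only the DMA struct declarations, so the following is a rough host-side sketch of what a host-to-pipe functor of this shape might do, not the PR's implementation. DmaToPipeSketch, MockInputPipe, Beat, and the extra packet_size template parameter are all hypothetical; the only property borrowed from SYCL pipes is that a pipe is addressed through static members of a type rather than through an instance.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical beat: payload plus sop/eop sideband flags.
struct Beat {
    float data;
    bool sop;
    bool eop;
};

// Hypothetical pipe stand-in, addressed by type via static members.
struct MockInputPipe {
    static std::deque<Beat> &fifo() {
        static std::deque<Beat> q;
        return q;
    }
    static void write(const Beat &b) { fifo().push_back(b); }
};

// Sketch of a host-to-pipe DMA functor: reads num_iterations packets of
// packet_size elements from a raw pointer and writes them as framed beats.
// packet_size is an assumption added here for illustration.
template <class srcType, class dest_pipe, std::size_t num_iterations, std::size_t packet_size>
struct DmaToPipeSketch {
    const srcType *src;

    void operator()() const {
        assert(src != nullptr);
        for (std::size_t i = 0; i < num_iterations; ++i) {
            for (std::size_t j = 0; j < packet_size; ++j) {
                const float value = static_cast<float>(src[i * packet_size + j]);
                dest_pipe::write(Beat{value, j == 0, j + 1 == packet_size});
            }
        }
    }
};
```

Constructing the functor with a brace-initialized pointer, as in DmaToPipeSketch<float, MockInputPipe, 2, 2>{vals}, mirrors the way the testbench lines above pass vals_ptr and output_ptr.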

Utility Functions for Compile-Time Type Extraction

  • Added helper structs to extract data types from pipes and StreamingBeat:
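
The helper definitions themselves are not reproduced in this description. As a hedged sketch of the general technique (partial specialization to recover a template argument), the following uses stand-in types; BeatSketch, PipeSketch, ExtractBeatData, and ExtractPipeData are hypothetical names, and the assumption that a pipe exposes its element type as value_type is made only for this example.

```cpp
#include <type_traits>

// Hypothetical stand-in for a beat type parameterized like StreamingBeat.
template <typename T, bool UseSop, bool UseEop>
struct BeatSketch {
    T data;
    bool sop;
    bool eop;
};

// Hypothetical stand-in for a pipe that exposes its element type.
template <typename T>
struct PipeSketch {
    using value_type = T;
};

// Primary template is declared but not defined, so passing anything
// other than a BeatSketch fails at compile time.
template <typename BeatT>
struct ExtractBeatData;

// Partial specialization pulls the payload type out of the beat.
template <typename T, bool UseSop, bool UseEop>
struct ExtractBeatData<BeatSketch<T, UseSop, UseEop>> {
    using type = T;
};

// Pipe -> beat -> payload type, resolved entirely at compile time.
template <typename PipeT>
struct ExtractPipeData {
    using type = typename ExtractBeatData<typename PipeT::value_type>::type;
};

using InPipeSketch = PipeSketch<BeatSketch<float, true, true>>;
static_assert(std::is_same<ExtractPipeData<InPipeSketch>::type, float>::value,
              "payload type should be recovered from the pipe");
```

With helpers of this kind, a layer template can take only the pipe types as parameters and derive the data types itself.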

Tests

Tested the updated layers in emulation, in simulation, and on hardware. The tests were conducted by generating the project files with the oneAPI backend code generator and building the binaries with CMake.

Test Configuration:

  • Set up Quartus Prime Pro with its environment variables configured correctly (needed for simulation and bitstream generation).
  • Configure the oneAPI environment with the extension Environment Configurator for oneAPI Toolkits.
  • Source the setvars script.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label Mar 6, 2025
@haoyanwa haoyanwa marked this pull request as draft March 6, 2025 15:01
@@ -153,6 +153,9 @@ def create_initial_config(self, part='Arria10', clock_period=5, io_type='io_para
# TODO: add namespace
'WriteTar': write_tar,
}

if 'use_bsp' in _:

We should probably extract this as an actual parameter (and add info about it in the docstring). Arguments named _ are generally meant to be ignored, I think. We should also use the same capitalization convention. Also, I am not sure I see how this is used. I saw the define in the C++, but not whether this is what sets the define (though I may have missed it).

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Mar 6, 2025
@jmitrevs

Replaced by #1246, with the branch in the main repository

@jmitrevs jmitrevs closed this Mar 26, 2025
@haoyanwa haoyanwa mentioned this pull request Apr 1, 2025