[SYCL][Fusion] Kernel Fusion support for CUDA backend #8747

sommerlukas · 2023-03-23T08:39:56Z

Extend kernel fusion for the CUDA backend.

In contrast to the existing SPIR-V based backends, the default binary format for the CUDA backend (PTX or CUBIN) is not suitable as input for the kernel fusion JIT compiler.

This PR therefore extends the driver to additionally embed LLVM IR in the fat binary if the user specifies the -fsycl-embed-ir flag during compilation. The embedded IR is taken from the output of the sycl-post-link step for the CUDA backend.

The JIT compiler has been extended to handle LLVM IR as an input format and PTX assembly as an output format (including translation via the NVPTX backend). Target-specific parts of the fusion process have been refactored into TargetFusionInformation.
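The pairing of input and output formats described above can be summarized in a small sketch. All names here (`Backend`, `BinaryFormat`, `JITFormats`, `formatsFor`) are hypothetical and do not correspond to the actual jit-compiler interfaces. The SPIR-V output for SPIR-V based backends is an assumption, since the description only states the input side for those backends.

```cpp
#include <cassert>

// Hypothetical enums for illustration only; not the jit-compiler's types.
enum class Backend { SPIRVBased, CUDA };
enum class BinaryFormat { SPIRV, LLVMIR, PTX };

// Which format the fusion JIT reads and which one it emits.
struct JITFormats {
  BinaryFormat Input;
  BinaryFormat Output;
};

// CUDA: LLVM IR in (embedded via -fsycl-embed-ir), PTX out.
// SPIR-V based backends: SPIR-V in; SPIR-V out is an assumption.
constexpr JITFormats formatsFor(Backend B) {
  return B == Backend::CUDA
             ? JITFormats{BinaryFormat::LLVMIR, BinaryFormat::PTX}
             : JITFormats{BinaryFormat::SPIRV, BinaryFormat::SPIRV};
}
```

Keeping this choice in one place is one way to read the refactoring: the rest of the fusion pipeline can stay target-agnostic while only the lookup differs per backend.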

The connecting logic to the JIT compiler in the SYCL RT has been extended to produce valid PI device binaries for the CUDA backend/PI.

Heterogeneous ND ranges are not yet supported for the CUDA backend.

sommerlukas · 2023-03-23T08:42:24Z

/verify with intel/llvm-test-suite#1683

sommerlukas · 2023-03-23T08:47:48Z

@sergey-semenov: The changes to graph_builder.cpp and commands.hpp are necessary to avoid deletion of dependencies.

With fusion, the individual commands for each kernel are replaced by a single command for the fused kernel, and the original commands are deleted without execution. Without this modification, the destructor of the command would call cleanDepEventsThroughOneLevel, which would not only delete the dependency edges of the original command, but also its dependencies.

This would yield an incomplete dependency graph. Instead, the dependencies of the original command are now deleted before removal, but only when a command is removed without execution during fusion.
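The intent of this change can be illustrated with a toy dependency graph. This is only a sketch, not the SYCL runtime's scheduler: `Command`, `addDep`, and `detachFromDeps` are hypothetical names, and it assumes that "deleted" above means dropping the original command's outgoing dependency edges so that the commands it depended on remain intact for their other users.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model of a scheduler command; NOT the SYCL runtime's Command class.
struct Command {
  std::string Name;
  std::vector<Command *> Deps;  // commands this one depends on
  std::vector<Command *> Users; // commands that depend on this one
};

// Hypothetical helper: record that User depends on Dep.
inline void addDep(Command &User, Command &Dep) {
  User.Deps.push_back(&Dep);
  Dep.Users.push_back(&User);
}

// Hypothetical helper: drop only the edges leaving Cmd.
// The dependencies themselves are untouched and keep their other users,
// so removing Cmd afterwards cannot tear a hole into the graph.
inline void detachFromDeps(Command &Cmd) {
  for (Command *Dep : Cmd.Deps) {
    std::vector<Command *> &U = Dep->Users;
    U.erase(std::remove(U.begin(), U.end(), &Cmd), U.end());
  }
  Cmd.Deps.clear();
}
```

In this model, an original kernel command that is replaced by a fused command would be passed to `detachFromDeps` before it is destroyed, while the fused command keeps its own edge to the shared dependency.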


steffenlarsen

Both the runtime and design doc changes LGTM. Only a couple minor nits.


bader · 2023-05-04T22:49:44Z

@intel/dpcpp-clang-driver-reviewers, could you review the driver changes, please?

mdtoguchi

OK for driver

sommerlukas · 2023-05-05T15:39:01Z

@intel/llvm-gatekeepers This is now approved, could someone please merge this?

aelovikov-intel · 2023-05-08T22:57:14Z

I'm seeing

Unexpectedly Passed Tests (1):
  SYCL :: KernelFusion/device_info_descriptor.cpp

in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas would you take a look at it, please?

sommerlukas · 2023-05-09T07:27:01Z

> I'm seeing
>
> Unexpectedly Passed Tests (1):
>   SYCL :: KernelFusion/device_info_descriptor.cpp
>
> in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas would you take a look at it, please?

@aelovikov-intel Is pre-commit CI running with the latest version of the e2e tests, specifically the device_info_descriptor test? Prior to this PR, the test was marked XFAIL for cuda, but that was removed in this PR (only XFAIL on hip now).

If the version of the e2e tests in pre-commit CI still expects XFAIL for cuda, but the test now passes, that would explain the unexpected pass on CUDA.
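The reasoning above follows from how lit-style test runners label results: a test marked XFAIL that fails is an expected failure, while one that passes is reported as an unexpected pass. A minimal model of that labelling, where `classify` is a hypothetical helper and not part of lit:

```cpp
#include <cassert>
#include <string>

// Simplified model of lit result codes; `classify` is illustrative only.
// MarkedXFail: the test file carries an XFAIL marker for this target.
// TestPassed:  the test's commands all succeeded.
inline std::string classify(bool MarkedXFail, bool TestPassed) {
  if (MarkedXFail)
    return TestPassed ? "XPASS" : "XFAIL"; // XPASS = unexpectedly passed
  return TestPassed ? "PASS" : "FAIL";
}
```

With a stale copy of the test that still marks cuda as XFAIL, a passing run corresponds to `classify(true, true)`, which is the unexpected pass seen in CI. With the updated test, the same run corresponds to `classify(false, true)`.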

sommerlukas requested a review from sergey-semenov March 23, 2023 08:39

sommerlukas requested a review from victor-eds as a code owner March 23, 2023 08:39

sommerlukas self-assigned this Mar 23, 2023

sommerlukas requested review from Naghasan and a team as code owners March 23, 2023 08:39

This was referenced Mar 23, 2023

[SYCL][Fusion] Kernel Fusion support for CUDA backend intel/llvm-test-suite#1683

Open

Kernel Fusion support for CUDA backend sommerlukas/llvm#2

Closed

sommerlukas requested review from mdtoguchi and hchilama March 23, 2023 12:08

victor-eds reviewed Mar 27, 2023

sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp

sycl-fusion/passes/target/TargetFusionInfo.cpp

sycl-fusion/passes/target/TargetFusionInfo.cpp

sommerlukas force-pushed the experiments/cuda-fusion branch from d323beb to 2737182 Compare March 28, 2023 14:31

mdtoguchi reviewed Mar 28, 2023

clang/lib/Driver/Driver.cpp

victor-eds reviewed Mar 29, 2023

sycl-fusion/jit-compiler/lib/translation/KernelTranslation.cpp

sycl-fusion/passes/target/TargetFusionInfo.cpp

sommerlukas force-pushed the experiments/cuda-fusion branch from 2737182 to e865d0b Compare April 5, 2023 14:35

sommerlukas requested a review from a team as a code owner April 5, 2023 14:35

sommerlukas requested review from mdtoguchi and victor-eds April 5, 2023 14:36

Naghasan approved these changes Apr 11, 2023

victor-eds approved these changes Apr 11, 2023

sommerlukas force-pushed the experiments/cuda-fusion branch from 15a6928 to 574763b Compare April 19, 2023 17:19

sommerlukas added 14 commits May 4, 2023 09:08

[SYCL][Fusion] Cache and groom input binaries

1559b85

Parse each input binary only once. Groom the nvvm annotations for functions deleted before fusion. Signed-off-by: Lukas Sommer <[email protected]>

[SYCL][Fusion] Disable heterogeneous ND ranges on CUDA

fd34124

Signed-off-by: Lukas Sommer <[email protected]>

[SYCL][Fusion] Enable JIT caching for CUDA fusion

fc5efbc

Signed-off-by: Lukas Sommer <[email protected]>

[SYCL][Fusion] Catch empty standard arguments

6c14311

Signed-off-by: Lukas Sommer <[email protected]>

[SYCL][Fusion] Rebase and address feedback

980d36d

Signed-off-by: Lukas Sommer <[email protected]>

[SYCL][Fusion] Update linkage graph diagram

4ba8e44

Signed-off-by: Lukas Sommer <[email protected]>

Don't compile NVPTX-specifics if not supported

8fcd4c7

Signed-off-by: Lukas Sommer <[email protected]>

Migrate test changes from intel/llvm-test-suite

75a77fd

Signed-off-by: Lukas Sommer <[email protected]>

Address more PR feedback

a8afe1d

Signed-off-by: Lukas Sommer <[email protected]>

Add test for kernel fusion with math function

f7df423

Signed-off-by: Lukas Sommer <[email protected]>

Document CUDA kernel fusion in design documentation

bc32fad

Signed-off-by: Lukas Sommer <[email protected]>

Update kernel fusion design document

b4d3968

Signed-off-by: Lukas Sommer <[email protected]>

Fix formatting for test

a7e1369

Signed-off-by: Lukas Sommer <[email protected]>

Rebase on branch 'sycl'

88b4ada

Signed-off-by: Lukas Sommer <[email protected]>

sommerlukas force-pushed the experiments/cuda-fusion branch from 9ae366b to 88b4ada Compare May 4, 2023 09:11

steffenlarsen approved these changes May 4, 2023

sycl/doc/design/KernelFusionJIT.md (outdated)

sycl/source/detail/scheduler/commands.hpp

Address PR feedback and formatting

4877a40

Signed-off-by: Lukas Sommer <[email protected]>

steffenlarsen approved these changes May 4, 2023

mdtoguchi approved these changes May 5, 2023

aelovikov-intel merged commit a93e59d into intel:sycl May 8, 2023

This was referenced May 9, 2023

[SYCL][Test E2E] Use %{build}/%{run} in DeviceLib tests #9337

Merged

[SYCL][Test E2E] Use %{build}/%{run} in Basic tests #9332

Merged

[SYCL][Test E2E] Use %{build}/%{run} in Plugin tests #9331

Merged

ayylol mentioned this pull request Mar 25, 2025

[SYCL] Fix 'move instead of copy' Coverity hits #17619

Merged
