
[SYCL][Fusion] Kernel Fusion support for CUDA backend #8747


Merged
merged 25 commits into from
May 8, 2023

Conversation

sommerlukas

Extend kernel fusion for the CUDA backend.

In contrast to the existing SPIR-V based backends, the default binary format for the CUDA backend (PTX or CUBIN) is not suitable as input for the kernel fusion JIT compiler.

This PR therefore extends the driver to additionally embed LLVM IR in the fat binary if the user specifies the -fsycl-embed-ir flag during compilation. The embedded IR is taken from the output of the sycl-post-link step for the CUDA backend.

The JIT compiler has been extended to handle LLVM IR as input format and PTX assembly as output format (including translation via the NVPTX backend). Target-specific parts of the fusion process have been refactored to TargetFusionInformation.

The connecting logic to the JIT compiler in the SYCL RT has been extended to produce valid PI device binaries for the CUDA backend/PI.

Heterogeneous ND ranges are not yet supported for the CUDA backend.

@sommerlukas sommerlukas requested a review from victor-eds as a code owner March 23, 2023 08:39
@sommerlukas sommerlukas self-assigned this Mar 23, 2023
@sommerlukas sommerlukas requested review from Naghasan and a team as code owners March 23, 2023 08:39
@sommerlukas

/verify with intel/llvm-test-suite#1683

@sommerlukas

@sergey-semenov: The changes to graph_builder.cpp and commands.hpp are necessary to avoid deletion of dependencies.

With fusion, the individual commands for each kernel are replaced by a single command for the fused kernel, and the original commands are deleted without execution. Without this modification, the destructor of the command would call cleanDepEventsThroughOneLevel, which would not only delete the dependency edges of the original command, but also its dependencies.

This would yield an incomplete dependency graph. Instead, the dependency edges of the original command are removed before the command itself is deleted, but only when a command is removed without execution during fusion.
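To illustrate the difference between the two removal strategies, here is a minimal sketch on a toy graph. It is not the SYCL runtime's code: `DepGraph`, `removeWithDependencies`, `removeUnexecuted`, `fuse`, `isComplete`, and `makeExample` are hypothetical names introduced only for this illustration, and the cascade in `removeWithDependencies` is a simplified model of the destructor behaviour described above.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy graph: maps a command id to the ids it depends on.
// This is NOT the data structure used by the SYCL runtime.
using DepGraph = std::map<int, std::set<int>>;

// Simplified model of the problematic behaviour: removing a command also
// removes the commands it directly depends on.
void removeWithDependencies(DepGraph &g, int cmd) {
  auto it = g.find(cmd);
  if (it == g.end()) return;
  std::set<int> deps = it->second;  // copy, the entry is erased below
  g.erase(it);
  for (int d : deps) g.erase(d);
}

// Model of the fix: drop the command's dependency edges first, so that
// removing the command cannot touch the commands it used to depend on.
void removeUnexecuted(DepGraph &g, int cmd) {
  auto it = g.find(cmd);
  if (it == g.end()) return;
  it->second.clear();
  g.erase(it);
}

// Replaces `originals` with a single fused command that inherits the union
// of their dependencies on commands outside the fused set.
void fuse(DepGraph &g, const std::vector<int> &originals, int fusedId,
          void (*removeCmd)(DepGraph &, int)) {
  std::set<int> fusedDeps;
  for (int c : originals)
    for (int d : g.at(c))
      if (std::find(originals.begin(), originals.end(), d) == originals.end())
        fusedDeps.insert(d);
  for (int c : originals) removeCmd(g, c);
  g[fusedId] = fusedDeps;
}

// A graph is complete if every dependency refers to a command still present.
bool isComplete(const DepGraph &g) {
  for (const auto &entry : g)
    for (int d : entry.second)
      if (g.count(d) == 0) return false;
  return true;
}

// Command 0 is a producer; commands 1 and 2 are the kernels to be fused.
DepGraph makeExample() {
  DepGraph g;
  g[0] = std::set<int>();
  g[1].insert(0);
  g[2].insert(0);
  g[2].insert(1);
  return g;
}
```

In this model, fusing commands 1 and 2 with `removeUnexecuted` leaves command 0 in the graph and the fused command depending on it. Fusing with `removeWithDependencies` erases command 0 as a side effect, so the fused command ends up pointing at a command that no longer exists.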

@sommerlukas sommerlukas force-pushed the experiments/cuda-fusion branch from d323beb to 2737182 Compare March 28, 2023 14:31
@sommerlukas sommerlukas force-pushed the experiments/cuda-fusion branch from 2737182 to e865d0b Compare April 5, 2023 14:35
@sommerlukas sommerlukas requested a review from a team as a code owner April 5, 2023 14:35
@sommerlukas sommerlukas force-pushed the experiments/cuda-fusion branch from 15a6928 to 574763b Compare April 19, 2023 17:19
sommerlukas added 14 commits May 4, 2023 09:08
Parse each input binary only once.

Groom the nvvm annotations for functions deleted before fusion.

Signed-off-by: Lukas Sommer <[email protected]>

@sommerlukas sommerlukas force-pushed the experiments/cuda-fusion branch from 9ae366b to 88b4ada Compare May 4, 2023 09:11

@steffenlarsen steffenlarsen left a comment


Both the runtime and design doc changes LGTM. Only a couple minor nits.

@bader

bader commented May 4, 2023

@intel/dpcpp-clang-driver-reviewers, could you review the driver changes, please?


@mdtoguchi mdtoguchi left a comment


OK for driver

@sommerlukas

@intel/llvm-gatekeepers This is now approved, could someone please merge this?

@aelovikov-intel aelovikov-intel merged commit a93e59d into intel:sycl May 8, 2023
@aelovikov-intel

I'm seeing

Unexpectedly Passed Tests (1):
  SYCL :: KernelFusion/device_info_descriptor.cpp

in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas would you take a look at it, please?

@sommerlukas

@aelovikov-intel Is pre-commit CI running with the latest version of the e2e tests, specifically the device_info_descriptor test? Prior to this PR, the test was marked XFAIL for cuda, but this PR removed that marking (it is now XFAIL only on hip).

If the version of the e2e tests in pre-commit CI still expects XFAIL for cuda, but the test now passes, that would explain the unexpected pass on CUDA.
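For reference, the reasoning above follows from how an expected-failure marking combines with the actual test result. The sketch below is a small model of that mapping, not code from lit or from the e2e test suite; `Outcome`, `classify`, and `countsAsFailure` are hypothetical names used only here.

```cpp
#include <cassert>

// Result categories as they appear in a lit summary.
enum class Outcome { Pass, Fail, XFail, XPass };

// `expectedToFail` stands for "the test carries an XFAIL marking that matches
// the current backend"; `passed` is the actual result of the run.
Outcome classify(bool expectedToFail, bool passed) {
  if (expectedToFail) return passed ? Outcome::XPass : Outcome::XFail;
  return passed ? Outcome::Pass : Outcome::Fail;
}

// An unexpected pass is reported as a failure, just like a plain failure.
bool countsAsFailure(Outcome o) {
  return o == Outcome::Fail || o == Outcome::XPass;
}
```

With a stale test file that still marks the test XFAIL for cuda, a passing run maps to `classify(true, true)`, which is the "Unexpectedly Passed" category. With the updated file, the same run maps to `classify(false, true)`, a plain pass.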


7 participants