[SYCL][Fusion] Kernel Fusion support for CUDA backend #8747
Conversation
/verify with intel/llvm-test-suite#1683
@sergey-semenov: With fusion, the individual commands for each kernel are replaced by a single command for the fused kernel, and the original commands are deleted without execution. Without this modification, the destructor of the command would yield an incomplete dependency graph. Instead, the dependencies of the original command are deleted before removal, but only in case a command is removed without execution during fusion.
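The cleanup described above can be sketched as follows. This is a simplified illustration, not the actual SYCL runtime scheduler code; the `Command` struct and `removeWithoutExecution` helper are hypothetical names:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified model of a scheduler command node. The real
// SYCL RT scheduler has richer state; only the dependency edges matter here.
struct Command {
  std::vector<Command *> Deps;  // commands this command depends on
  std::vector<Command *> Users; // commands that depend on this command
};

// When fusion replaces per-kernel commands with a single fused command,
// the replaced commands are deleted without ever executing. Their
// dependency edges must be detached first, so that no dangling edge to
// the deleted command is left in the graph.
void removeWithoutExecution(Command *Cmd) {
  for (Command *Dep : Cmd->Deps) {
    auto &Users = Dep->Users;
    Users.erase(std::remove(Users.begin(), Users.end(), Cmd), Users.end());
  }
  Cmd->Deps.clear();
  delete Cmd;
}
```

The key design point is that this detachment only happens for commands removed without execution during fusion; commands that actually ran go through the normal completion path.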
Parse each input binary only once. Groom the nvvm annotations for functions deleted before fusion. Signed-off-by: Lukas Sommer <[email protected]>
Both the runtime and design doc changes LGTM. Only a couple minor nits.
@intel/dpcpp-clang-driver-reviewers, could you review the driver changes, please?
OK for driver
@intel/llvm-gatekeepers This is now approved, could someone please merge this?
I'm seeing failures in CUDA pre-commit CI tasks on unrelated PRs. @sommerlukas, would you take a look at it, please?
@aelovikov-intel Is pre-commit CI running with the latest version of the e2e tests? If the version of the e2e tests in pre-commit CI is not up to date, the tests may still expect the old behavior.
Extend kernel fusion for the CUDA backend.

In contrast to the existing SPIR-V based backends, the default binary format for the CUDA backend (PTX or CUBIN) is not suitable as input for the kernel fusion JIT compiler.

This PR therefore extends the driver to additionally embed LLVM IR in the fat binary if the user specifies the `-fsycl-embed-ir` flag during compilation, taking the output of the `sycl-post-link` step for the CUDA backend.

The JIT compiler has been extended to handle LLVM IR as input format and PTX assembly as output format (including translation via the NVPTX backend). Target-specific parts of the fusion process have been refactored into `TargetFusionInformation`.

The connecting logic to the JIT compiler in the SYCL RT has been extended to produce valid PI device binaries for the CUDA backend/PI.

Heterogeneous ND ranges are not yet supported for the CUDA backend.
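Based on the description above, a build for the CUDA backend with IR embedding enabled might look like the following. This is a sketch: the source file name is illustrative, and the exact set of additional flags needed for a given setup may differ:

```shell
# Compile a SYCL application for the CUDA backend, additionally
# embedding LLVM IR (output of sycl-post-link) in the fat binary
# so the kernel fusion JIT compiler can consume it at runtime.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
        -fsycl-embed-ir \
        app.cpp -o app
```

Without `-fsycl-embed-ir`, the fat binary only carries PTX/CUBIN, which the fusion JIT compiler cannot use as input.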