
Conversation

@LuFinch
Contributor

@LuFinch LuFinch commented Sep 11, 2025

This is a draft PR to enable the SYCL-TLA build in torch-xpu-ops so that we can test SYCL-TLA kernels' accuracy and performance in PyTorch once the SDPA/GEMM kernels are ready.

After discussion with Eikan, we decided to put the build logic in torch-xpu-ops while keeping the kernel source code in PyTorch in-tree. Please put your SYCL-TLA kernel source code in PyTorch and add its path to ATen_XPU_SYCLTLA_SRCS in torch-xpu-ops/src/ATen/CMakeLists.txt.
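
For illustration, a minimal sketch of how the kernel sources could be wired into src/ATen/CMakeLists.txt; the flash_attn path mirrors the CMakeLists.txt change shown later in this thread, while the exact way the glob result feeds ATen_XPU_SYCLTLA_SRCS is just a sketch, not the final code:

# Sketch: glob the SYCL-TLA kernel sources living in the PyTorch tree
# and hand them to the torch-xpu-ops build as ATen_XPU_SYCLTLA_SRCS.
file(GLOB xpu_sycltla "${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/sycltla/*.cpp")
list(APPEND ATen_XPU_SYCLTLA_SRCS ${xpu_sycltla})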

Since SYCL-TLA needs different compilation options than the normal SYCL kernels in torch-xpu-ops, I turned the logic in cmake/BuildFlags.cmake into a macro so that the common compilation options can be reused.
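
A rough sketch of the macro idea (the macro name, flags, and variables below are placeholders, not the actual BuildFlags.cmake content):

# Sketch: keep the shared SYCL compile options in one macro so both the regular
# SYCL kernels and the SYCL-TLA kernels start from the same list, then let the
# SYCL-TLA target append whatever extra options it needs.
macro(sycl_common_compile_options out_var)
  set(${out_var} -fsycl)  # placeholder for the real common option list
endmacro()

sycl_common_compile_options(SYCL_KERNEL_OPTIONS)
sycl_common_compile_options(SYCLTLA_KERNEL_OPTIONS)
list(APPEND SYCLTLA_KERNEL_OPTIONS -DSYCLTLA_PLACEHOLDER_OPTION)  # SYCL-TLA-specific extras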

Since there is no settled plan yet for how to import the sycl-tla repo, I git clone the main branch in CMake for debugging convenience. We can pin a commit after sycl-tla has its first release tag.
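
As a sketch of what that temporary clone could look like (using FetchContent and this repo URL are assumptions on my part):

# Sketch: fetch the sycl-tla main branch at configure time; once sycl-tla has a
# release tag, GIT_TAG can be pinned to that tag or a specific commit instead.
include(FetchContent)
FetchContent_Declare(
  sycl-tla
  GIT_REPOSITORY https://github.com/intel/sycl-tla.git
  GIT_TAG        main
)
FetchContent_MakeAvailable(sycl-tla)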

This depends on g++ being upgraded to GCC 13; otherwise the sycltla kernels won't build.

@LuFinch LuFinch force-pushed the lfq/cutlass branch 6 times, most recently from aa10375 to e9800af on October 20, 2025 07:51
@LuFinch LuFinch changed the title [Cutlass] Enable Cutlass with host compiler [SYCL-TLA] Enable SYCL-TLA build with host compiler Oct 20, 2025
@LuFinch LuFinch marked this pull request as ready for review October 21, 2025 02:33


@LuFinch LuFinch requested a review from EikanWang October 21, 2025 02:34
@LuFinch LuFinch changed the title [SYCL-TLA] Enable SYCL-TLA build with host compiler [SYCL-TLA] Enable SYCL-TLA build Oct 21, 2025
file(GLOB xpu_native_cpp "native/xpu/*.cpp" "native/sparse/*.cpp" "native/sparse/xpu/*.cpp" "native/nested/*.cpp" "native/nested/xpu/*.cpp" "native/transformers/*.cpp" "native/quantized/*.cpp")
file(GLOB xpu_native_cpp "native/xpu/*.cpp" "native/sparse/*.cpp" "native/sparse/xpu/*.cpp" "native/nested/*.cpp" "native/nested/xpu/*.cpp" "native/transformers/*.cpp" "native/quantized/*.cpp" ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/*.cpp)
file(GLOB xpu_sycl "native/xpu/sycl/*.cpp" "native/sparse/xpu/sycl/*.cpp" "native/nested/xpu/sycl/*.cpp" "native/transformers/sycl/*.cpp" "native/quantized/sycl/*.cpp")
file(GLOB xpu_sycltla "${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/sycltla/*.cpp")
Contributor

I can't find the folder under ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu in https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/transformers. Is there any dependency?

Contributor Author

@LuFinch LuFinch Oct 21, 2025

The PR containing ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu depends on this build-enablement PR. I will create a new PR to add the sycl-tla flash attention kernel to PyTorch after this PR is merged.

Contributor

It’s a bit unusual that we are building some files from PyTorch and some from torch-xpu-ops into a single .so.
Could we decouple them? For example, we could keep the implementations in torch-xpu-ops and provide a header file for PyTorch. PyTorch would then only use the APIs exposed in the header file.

Contributor Author

@LuFinch LuFinch Oct 22, 2025

For example, we could keep the implementations in torch-xpu-ops and provide a header file for PyTorch. PyTorch would then only use the APIs exposed in the header file.

=> I previously implemented the sycltla SDPA that way: put the kernel in torch-xpu-ops, expose a header file, and call it from PyTorch. However, the code is hard to maintain after upstreaming. For example, we would need to prepare both a PyTorch PR and a torch-xpu-ops PR whenever we want to change the APIs after the first upstream. The torch-xpu-ops PR can't be built in CI because the PyTorch PR hasn't been merged yet, and the PyTorch PR can't be built in CI because the torch-xpu-ops PR hasn't been merged yet.

It is more reasonable to put all the code either in PyTorch only or in torch-xpu-ops only. Since SDPA_overrideable is already registered in PyTorch, I think putting the sycltla kernel in PyTorch in-tree is more convenient.

Contributor

It doesn’t seem reasonable that a file located in PyTorch is being built into a third-party library. This design doesn’t make much sense to me.

@LuFinch
Contributor Author

LuFinch commented Oct 22, 2025

@fengyuan14 @EikanWang Could you help review and give some comments?
