[SYCL-TLA] Enable SYCL-TLA build #2030
Conversation
file(GLOB xpu_native_cpp "native/xpu/*.cpp" "native/sparse/*.cpp" "native/sparse/xpu/*.cpp" "native/nested/*.cpp" "native/nested/xpu/*.cpp" "native/transformers/*.cpp" "native/quantized/*.cpp")
file(GLOB xpu_native_cpp "native/xpu/*.cpp" "native/sparse/*.cpp" "native/sparse/xpu/*.cpp" "native/nested/*.cpp" "native/nested/xpu/*.cpp" "native/transformers/*.cpp" "native/quantized/*.cpp" ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/*.cpp)
file(GLOB xpu_sycl "native/xpu/sycl/*.cpp" "native/sparse/xpu/sycl/*.cpp" "native/nested/xpu/sycl/*.cpp" "native/transformers/sycl/*.cpp" "native/quantized/sycl/*.cpp")
file(GLOB xpu_sycltla "${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/sycltla/*.cpp")
I can't find the folder ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu in https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/transformers. Is there any dependency?
The PR that adds ${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu depends on this build-enabling PR. I will create a new PR to put the sycl-tla flash attention kernel into PyTorch after this PR is merged.
It’s a bit unusual that we are building some files from PyTorch and some from torch-xpu-ops into a single .so.
Could we decouple them? For example, we could keep the implementations in torch-xpu-ops and provide a header file for PyTorch. PyTorch would then only use the APIs exposed in the header file.
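A minimal sketch of what that decoupling could look like on the torch-xpu-ops side (the target name, directory layout, and header path here are hypothetical, not the actual project structure):

```cmake
# Hypothetical decoupled layout: the SYCL-TLA implementation stays in
# torch-xpu-ops; PyTorch only sees a public header with the exposed APIs.
file(GLOB sycltla_impl "native/transformers/sycltla/*.cpp")

# Build the implementation into the existing torch-xpu-ops library target
# (target name assumed for illustration).
target_sources(torch_xpu_ops PRIVATE ${sycltla_impl})

# Publish the include directory holding e.g. flash_attn_xpu.h; PyTorch
# would include that header and call only the APIs declared in it.
target_include_directories(torch_xpu_ops PUBLIC
  $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
  $<INSTALL_INTERFACE:include>)
```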
> For example, we could keep the implementations in torch-xpu-ops and provide a header file for PyTorch. PyTorch would then only use the APIs exposed in the header file.

I originally implemented the sycltla SDPA that way: keep the kernel in torch-xpu-ops, expose a header file, and call it from PyTorch. However, that split is hard to maintain after upstreaming. For example, once the first version is upstreamed, any API change requires both a PyTorch PR and a torch-xpu-ops PR. The torch-xpu-ops PR can't build in CI because the PyTorch PR hasn't merged yet, and the PyTorch PR can't build in CI because the torch-xpu-ops PR hasn't merged yet.
It is more reasonable to put all of the code in PyTorch only or in torch-xpu-ops only. Since SDPA_overrideable is already registered in PyTorch, I think putting the sycltla kernel in-tree in PyTorch is more convenient.
It doesn’t seem reasonable that a file located in PyTorch is being built into a third-party library. This design doesn’t make much sense to me.
@fengyuan14 @EikanWang Could you help review and give some comments?
This is a draft PR to enable the SYCL-TLA build in torch-xpu-ops so that we can test SYCL-TLA kernels' accuracy and performance in PyTorch once the SDPA/GEMM kernels are ready.
After discussion with Eikan, we decided to keep the build logic in torch-xpu-ops while putting the kernel source code in-tree in PyTorch. Please put your SYCL-TLA kernel source code in PyTorch and add its path to `ATen_XPU_SYCLTLA_SRCS` in `torch-xpu-ops/src/ATen/CMakeLists.txt`.
Since SYCL-TLA needs different compilation options than the normal SYCL kernels in torch-xpu-ops, I turned the logic in `cmake/BuildFlags.cmake` into a macro so that the common compilation options can be reused.
Since there is no settled plan yet for how to import the sycl-tla repo, I git clone its main branch in CMake for debugging convenience. We can pin a commit after sycl-tla has its first release tag.
This depends on upgrading g++ to GCC 13; otherwise the sycltla kernel won't build.
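A rough sketch of the build wiring described above. Only `ATen_XPU_SYCLTLA_SRCS`, `cmake/BuildFlags.cmake`, and the glob path come from this PR; the macro name, flags, define, and repository variable are assumptions for illustration:

```cmake
# In cmake/BuildFlags.cmake: wrap the common SYCL options in a macro so
# they can be reused for SYCL-TLA sources (macro and flag set assumed).
macro(setup_common_sycl_flags flags_var)
  set(${flags_var} -fsycl -O2)
endmacro()

# SYCL-TLA kernels reuse the common flags plus their own extra options.
setup_common_sycl_flags(SYCLTLA_COMPILE_OPTIONS)
list(APPEND SYCLTLA_COMPILE_OPTIONS -DSYCL_TLA_BUILD)  # illustrative define

# In src/ATen/CMakeLists.txt: kernel sources live in the PyTorch tree,
# only the build logic is kept in torch-xpu-ops.
file(GLOB ATen_XPU_SYCLTLA_SRCS
  "${TORCH_ROOT}/aten/src/ATen/native/transformers/xpu/flash_attn/sycltla/*.cpp")
set_source_files_properties(${ATen_XPU_SYCLTLA_SRCS}
  PROPERTIES COMPILE_OPTIONS "${SYCLTLA_COMPILE_OPTIONS}")

# Until sycl-tla has a release tag to pin, fetch its main branch at
# configure time (repository URL is a placeholder, not a real pin).
include(FetchContent)
FetchContent_Declare(sycl_tla
  GIT_REPOSITORY ${SYCL_TLA_REPO_URL}
  GIT_TAG main)
FetchContent_MakeAvailable(sycl_tla)
```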