feat(CUDA): add new target for testing #30
Conversation
I'm still a little confused about which pipelines we want to run now. In KernelConfig I could find the following getting set: CodeGenPipeline::LLVMGPUMatmulTensorCore. My assumption was we only want CodeGenPipeline::LLVMGPUTileAndFuse. I'm also pretty sure that these two pipelines don't work for NVIDIA (but I could be wrong): CodeGenPipeline::LLVMGPUVectorDistribute
@ziereis so we decided to go for
I thought we agreed on not using LLVMGPUVectorDistribute, as it's super complex and more or less "deprecated"; upstream IREE also wants to move to using only TileAndFuse for matmuls (e.g. iree-org#19854, https://github.com/iree-org/iree/blob/00e88733e6b8c8cdb351d4516509f56daebdf604/compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp#L2391). I might be misunderstanding something, but I thought the main goal was to use TileAndFuse for almost everything. Also, in this case we now still have 4 pipelines that are supposed to handle matmuls: CodeGenPipeline::LLVMGPUMatmulTensorCore
I don't mind reducing the complexity by sticking to 2 pipelines (we need GPUDistribute as a fallback). These pipelines are starting points; we will probably hard-fork them as well and create our own.
Nice first step! Looking even more into it, forking is the only sane choice. We're doing quite a radical trimming.
We want to try a simplified GPU pipeline for CUDA and test on top of it
This reverts commit f39b1f1.
This adds a new `cellar-cuda` target backend that implements a simplified CUDA pipeline. Most of the code is copied from the original target, so I would suggest reviewing how everything is structured rather than looking at the details themselves.
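Assuming the new backend registers itself under the name `cellar-cuda` the same way the stock HAL targets do, selecting it at compile time would go through IREE's standard `--iree-hal-target-backends` flag. The module name and invocation below are illustrative, not taken from this PR:

```shell
# Hypothetical invocation: compile an MLIR module with the new simplified
# CUDA target instead of the stock "cuda" backend. Only the backend name
# differs from a normal IREE compile.
iree-compile \
  --iree-hal-target-backends=cellar-cuda \
  matmul.mlir -o matmul.vmfb
```

If the fork keeps the upstream registration mechanism, everything downstream (runtime loading of the `.vmfb`, driver selection) should behave exactly as it does for the original target.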