forked from iree-org/iree
feat(CUDA): add new target for testing #30

Merged: chrsmcgrr merged 1 commit into integrate-iree-20250217 from chris/roo-201-new-cuda-target on Mar 12, 2025

chrsmcgrr (Collaborator) commented Feb 28, 2025

We want to try a simplified GPU pipeline for CUDA and build our tests on top of it.

This adds a new cellar-cuda target backend that implements a simplified CUDA pipeline. The key changes include:

  • A new CUDA CodeGen library that houses our new CUDA pipeline.
  • CUDAKernelConfig contains all the current strategies for lowering to PTX. For now, our three main strategies are TileAndFuse, VectorDistribute, and GPUDistribute.

Most of the code is copied from the original target, so please review how everything is structured rather than the details of the code itself.

@chrsmcgrr chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch 4 times, most recently from dcbdc5e to bd809d2 Compare March 7, 2025 08:06
@chrsmcgrr chrsmcgrr marked this pull request as ready for review March 10, 2025 09:20
@chrsmcgrr chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch from bd809d2 to d7bf5df Compare March 11, 2025 09:53

ziereis commented Mar 12, 2025

I'm still a little confused about which pipelines we want to run now.

In the KernelConfig, I could find the following pipelines being set:

CodeGenPipeline::LLVMGPUMatmulTensorCore
CodeGenPipeline::LLVMGPUMatmulTensorCoreMmaSync
CodeGenPipeline::LLVMGPUVectorDistribute
CodeGenPipeline::LLVMGPUPadAndVectorDistribute
CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUDistribute
CodeGenPipeline::LLVMGPUVectorize
CodeGenPipeline::LLVMGPUBaseLowering

My assumption was that we only want:

CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUDistribute
CodeGenPipeline::LLVMGPUVectorize

I'm also pretty sure that these two pipelines don't work on NVIDIA (but I could be wrong):

CodeGenPipeline::LLVMGPUVectorDistribute
CodeGenPipeline::LLVMGPUPadAndVectorDistribute

chrsmcgrr (Collaborator, Author) commented

@ziereis we decided to go with LLVMGPUTileAndFuse, LLVMGPUVectorDistribute, and LLVMGPUDistribute, as per our ticket. If they don't work, the idea is that we figure out why and fix them :D


ziereis commented Mar 12, 2025

I thought we agreed on not using LLVMGPUVectorDistribute, as it's super complex and more or less "deprecated": upstream IREE also wants to move to using only TileAndFuse for matmuls (e.g. iree-org#19854, https://github.com/iree-org/iree/blob/00e88733e6b8c8cdb351d4516509f56daebdf604/compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp#L2391).

I might be misunderstanding something, but I thought the main goal was to use TileAndFuse for almost everything.

Also, in this case we still have 4 pipelines that are supposed to handle matmuls:

CodeGenPipeline::LLVMGPUMatmulTensorCore
CodeGenPipeline::LLVMGPUMatmulTensorCoreMmaSync
CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUVectorDistribute

chrsmcgrr (Collaborator, Author) commented

I don't mind reducing the complexity by sticking to 2 pipelines (we need GPUDistribute as a fallback). These pipelines are starting points; we will probably hard-fork them as well and create our own.
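
To make the two-pipeline idea concrete, here is a minimal sketch of "try TileAndFuse first, fall back to GPUDistribute". Everything in it is hypothetical: the `Pipeline` enum, the `OpInfo` struct, and the `tryTileAndFuse` / `selectPipeline` helpers are illustrative stand-ins and not the real IREE `CodeGenPipeline` or KernelConfig API. It also assumes, purely for illustration, that TileAndFuse applies to matmul-like ops only.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the two pipelines discussed above; this is NOT
// the real IREE CodeGenPipeline enum.
enum class Pipeline { TileAndFuse, Distribute };

// Hypothetical summary of an op. Real kernel configuration inspects the IR.
struct OpInfo {
  bool isMatmulLike;  // assumption: TileAndFuse is tried for matmul-like ops
};

// Returns TileAndFuse when the (assumed) precondition holds, otherwise
// std::nullopt so the caller can fall through to the next strategy.
std::optional<Pipeline> tryTileAndFuse(const OpInfo &op) {
  if (op.isMatmulLike)
    return Pipeline::TileAndFuse;
  return std::nullopt;
}

// Try the preferred strategy first; use Distribute as the catch-all fallback.
Pipeline selectPipeline(const OpInfo &op) {
  if (std::optional<Pipeline> chosen = tryTileAndFuse(op))
    return *chosen;
  return Pipeline::Distribute;
}
```

With this shape, adding or removing a strategy only touches `selectPipeline`, which would keep a later hard fork of either pipeline local to its own `try...` helper.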


@maxbartel maxbartel left a comment

Nice first step! Having looked into it further, forking is the only sane choice. We are doing quite a radical trimming.

@chrsmcgrr chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch from d7bf5df to a658bcb Compare March 12, 2025 12:05
@chrsmcgrr chrsmcgrr merged commit f39b1f1 into integrate-iree-20250217 Mar 12, 2025
1 check passed
chrsmcgrr added a commit that referenced this pull request Mar 18, 2025
chrsmcgrr added a commit that referenced this pull request Mar 18, 2025
chrsmcgrr added a commit that referenced this pull request Mar 21, 2025
3 participants