feat(CUDA): add new target for testing #30

chrsmcgrr · 2025-02-28T16:41:14Z

We want to try a simplified GPU pipeline for CUDA and test on top of it

This adds a new cellar-cuda target backend that implements a simplified CUDA pipeline. The key changes are:

- A new CUDA CodeGen library that houses our new pipeline for CUDA.
- CUDAKernelConfig, which contains all the current strategies for lowering down to PTX. For now, the three main strategies are TileAndFuse, VectorDistribute and GPUDistribute.

Most of the code is copied from the original target. I would suggest reviewing how everything is structured rather than looking at the details themselves.

ziereis · 2025-03-12T08:59:27Z

I'm still a little confused about which pipelines we want to run now.

In the KernelConfig I could find the following being set:

CodeGenPipeline::LLVMGPUMatmulTensorCore
CodeGenPipeline::LLVMGPUMatmulTensorCoreMmaSync
CodeGenPipeline::LLVMGPUVectorDistribute
CodeGenPipeline::LLVMGPUPadAndVectorDistribute
CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUDistribute
CodeGenPipeline::LLVMGPUVectorize
CodeGenPipeline::LLVMGPUBaseLowering

My assumption was that we only want:

CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUDistribute
CodeGenPipeline::LLVMGPUVectorize

I'm also pretty sure that these two pipelines don't work for NVIDIA (but I could be wrong):

CodeGenPipeline::LLVMGPUVectorDistribute
CodeGenPipeline::LLVMGPUPadAndVectorDistribute
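
The narrowing proposed in this comment could be expressed as an allowlist check over the pipelines that the copied KernelConfig sets. The following is only a minimal sketch under stated assumptions: the `Pipeline` enum is a hypothetical stand-in that mirrors the names listed above (it is not the real `CodeGenPipeline` type), and `isWantedPipeline` is an illustrative helper that does not exist in the PR.

```cpp
#include <algorithm>
#include <array>

// Hypothetical stand-in for the pipeline enum. Only the names are taken
// from the discussion; the real type lives in IREE's codegen dialect.
enum class Pipeline {
  MatmulTensorCore,
  MatmulTensorCoreMmaSync,
  VectorDistribute,
  PadAndVectorDistribute,
  TileAndFuse,
  Distribute,
  Vectorize,
  BaseLowering,
};

// Assumed allowlist: the three pipelines this comment proposes to keep.
static const std::array<Pipeline, 3> kWantedPipelines = {
    Pipeline::TileAndFuse, Pipeline::Distribute, Pipeline::Vectorize};

// Returns true if the pipeline is on the allowlist above.
inline bool isWantedPipeline(Pipeline p) {
  return std::find(kWantedPipelines.begin(), kWantedPipelines.end(), p) !=
         kWantedPipelines.end();
}
```

A check like this would make it easy to flag any place in the copied config that still selects one of the other five pipelines.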

chrsmcgrr · 2025-03-12T09:02:31Z

@ziereis so we decided to go for LLVMGPUTileAndFuse, LLVMGPUVectorDistribute and LLVMGPUDistribute, as per our ticket. If they don't work, the idea is that we figure out why and fix them :D

ziereis · 2025-03-12T09:24:15Z

I thought we agreed on not using LLVMGPUVectorDistribute, as it's super complex and more or less "deprecated": upstream IREE also wants to move to only using TileAndFuse for matmuls (e.g. iree-org#19854, https://github.com/iree-org/iree/blob/00e88733e6b8c8cdb351d4516509f56daebdf604/compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp#L2391).

I might be misunderstanding something, but I thought the main goal was to use TileAndFuse for almost everything.

Also, in this case we still have four pipelines that are supposed to handle matmuls:

CodeGenPipeline::LLVMGPUMatmulTensorCore
CodeGenPipeline::LLVMGPUMatmulTensorCoreMmaSync
CodeGenPipeline::LLVMGPUTileAndFuse
CodeGenPipeline::LLVMGPUVectorDistribute

chrsmcgrr · 2025-03-12T09:37:38Z

I don't mind reducing the complexity by sticking to two pipelines (we need GPUDistribute as a fallback). These pipelines are starting points; we will probably hard-fork them as well and create our own.
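
A two-pipeline setup with GPUDistribute as the fallback might look roughly like the sketch below. Everything in it is an assumption made for illustration: `Strategy`, `OpInfo`, `tileAndFuseApplies` and `selectStrategy` are hypothetical names, the choice of TileAndFuse as the primary pipeline is inferred from the discussion rather than stated in this comment, and the applicability rule is a placeholder, not the logic in CudaKernelConfig.cpp.

```cpp
// Hypothetical: the two strategies that would remain after trimming.
enum class Strategy { TileAndFuse, GPUDistribute };

// Hypothetical summary of the op being configured.
struct OpInfo {
  bool isContraction;   // e.g. a matmul-like op
  bool hasStaticShape;  // all dimensions known at compile time
};

// Placeholder applicability rule (an assumption, not the real heuristic).
inline bool tileAndFuseApplies(const OpInfo &op) {
  return op.isContraction && op.hasStaticShape;
}

// Try the primary strategy first; otherwise fall back to GPUDistribute,
// which is assumed here to accept any op.
inline Strategy selectStrategy(const OpInfo &op) {
  if (tileAndFuseApplies(op))
    return Strategy::TileAndFuse;
  return Strategy::GPUDistribute;
}
```

Keeping the fallback unconditional means every op gets some configuration, so a rejected primary strategy degrades performance rather than failing compilation.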

maxbartel

Nice first step! Looking even more into it, forking is the only sane choice. We are doing quite a radical trimming.

compiler/plugins/target/CellarCUDA/CUDATarget.cpp

compiler/plugins/target/CellarCUDA/CodeGen/CudaKernelConfig.cpp

chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch 4 times, most recently from dcbdc5e to bd809d2 Compare March 7, 2025 08:06

chrsmcgrr marked this pull request as ready for review March 10, 2025 09:20

chrsmcgrr requested review from maxbartel, ziereis and devtbi March 10, 2025 09:20

chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch from bd809d2 to d7bf5df Compare March 11, 2025 09:53

maxbartel approved these changes Mar 12, 2025

feat(CUDA): add new target for testing

a658bcb

We want to try a simplified GPU pipeline for CUDA and test on top of it

chrsmcgrr force-pushed the chris/roo-201-new-cuda-target branch from d7bf5df to a658bcb Compare March 12, 2025 12:05

chrsmcgrr merged commit f39b1f1 into integrate-iree-20250217 Mar 12, 2025
1 check passed

chrsmcgrr added a commit that referenced this pull request Mar 18, 2025

Revert "feat(CUDA): add new target for testing (#30)"

fb4b26e

This reverts commit f39b1f1.

chrsmcgrr added a commit that referenced this pull request Mar 18, 2025

Revert "feat(CUDA): add new target for testing (#30)"

860c83b

This reverts commit f39b1f1.

chrsmcgrr added a commit that referenced this pull request Mar 21, 2025

Revert "feat(CUDA): add new target for testing (#30)" (#42)

64d99de

This reverts commit f39b1f1.
