forked from iree-org/iree
fix(iree,cpu): put linalg producers for iree_linalg_ext.scan into separate dispatch #40

Merged

AGindinson

This fix is necessitated by the lack of fusion support for LinalgExt Scan.
Since the default LLVMCPU tiling respects neither the fusability of producer
ops nor the limit for stack-bound allocations, we force this non-fusable
"cumulative reduction"-style operation to be dispatched separately from
non-trivial linalg operations.

Further enhancements should include fine-tuning the tiling config for LinalgExt
operations, deliberately restricting work-group level tiling based on the
predicted size of stack-bound allocas within a dispatch, and further adopting
`LinalgFusionOpInterface` for LinalgExt operations that cannot be expressed
through simple reduction iterators. Possible adjustments to the
FormDispatchRegion algorithm itself are noted as TODO items.
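
For readers unfamiliar with dispatch formation, the effect of the workaround described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in this PR: the `Op` class, `TRIVIAL_KINDS`, `can_fuse_producer`, and `form_dispatch_regions` are hypothetical names, the greedy consumer-first walk is a simplification, and treating `linalg.fill` as "trivial" is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    kind: str  # e.g. "linalg.generic", "iree_linalg_ext.scan"
    operands: list = field(default_factory=list)  # producer Ops


# Assumption: which producers count as "trivial" is made up for this sketch.
TRIVIAL_KINDS = {"linalg.fill"}


def can_fuse_producer(producer, consumer):
    """Hypothetical predicate: keep non-trivial linalg producers out of a
    dispatch whose root is a scan."""
    if (consumer.kind == "iree_linalg_ext.scan"
            and producer.kind.startswith("linalg.")
            and producer.kind not in TRIVIAL_KINDS):
        return False
    return True


def form_dispatch_regions(ops):
    """Toy greedy region formation over ops given in program order.

    Walks consumers from last to first and pulls each producer into the
    consumer's region when the predicate allows it. Returns regions in
    program order, each as a list of op names.
    """
    order = {op.name: i for i, op in enumerate(ops)}
    region_of = {}
    regions = []
    for op in reversed(ops):
        if op.name not in region_of:
            region_of[op.name] = len(regions)
            regions.append([op.name])
        idx = region_of[op.name]
        for producer in op.operands:
            if producer.name not in region_of and can_fuse_producer(producer, op):
                region_of[producer.name] = idx
                regions[idx].append(producer.name)
    return [sorted(r, key=order.get) for r in reversed(regions)]


# A scan fed by a fill (trivial) and an elementwise generic (non-trivial).
fill = Op("fill", "linalg.fill")
gen = Op("gen", "linalg.generic")
scan = Op("scan", "iree_linalg_ext.scan", [gen, fill])
scan_regions = form_dispatch_regions([fill, gen, scan])

# Control case: the same producer feeding an ordinary linalg consumer.
gen2 = Op("gen2", "linalg.generic")
red = Op("red", "linalg.generic", [gen2])
plain_regions = form_dispatch_regions([gen2, red])
```

In this model the generic producer ends up in its own region while the fill stays with the scan, whereas the control case fuses into a single region. The real pass operates on MLIR operations and has many more constraints than this sketch captures.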

Author

This stack of pull requests is managed by Graphite.

@AGindinson marked this pull request as ready for review on March 17, 2025, 10:47
@AGindinson force-pushed the artem/cpu-stack/dispatch-region-wa branch 6 times, most recently from 6242331 to 4d8702f, on March 17, 2025, 15:00
Collaborator

@chrsmcgrr left a comment

Looks good, just nits

@AGindinson force-pushed the artem/cpu-stack/dispatch-region-wa branch from 52737ce to eacf16f on March 17, 2025, 17:19
@AGindinson merged commit 8daa67e into integrate-iree-20250217 on Mar 18, 2025
1 check passed