fix(iree,cpu): put linalg producers for iree_linalg_ext.scan into separate dispatch
#40
This fix is needed because LinalgExt Scan lacks fusion support. The default
LLVMCPU tiling respects neither the fusability of producer ops nor the limit
on stack-bound allocations, so we force this non-fusable "cumulative
reduction"-style operation to be dispatched separately from non-trivial linalg
operations.
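For readers unfamiliar with the "cumulative reduction" shape of a scan, the following is a minimal Python sketch, not IREE code, of the dependency a scan carries along its scan dimension. All function names are hypothetical and exist only for illustration. It assumes an inclusive prefix sum and shows that tiles cannot be computed independently the way tiles of an elementwise op can: each tile needs a running value from the previous one.

```python
from itertools import accumulate


def inclusive_scan(xs):
    """Reference inclusive prefix sum over the whole input."""
    return list(accumulate(xs))


def tiled_scan_no_carry(xs, tile):
    """Scan each tile independently, as one could for an elementwise op.

    This drops the running total between tiles, so it is wrong for a scan.
    """
    out = []
    for start in range(0, len(xs), tile):
        out.extend(accumulate(xs[start:start + tile]))
    return out


def tiled_scan_with_carry(xs, tile):
    """Scan tile by tile while threading the running total through."""
    out = []
    carry = 0
    for start in range(0, len(xs), tile):
        partial = [carry + v for v in accumulate(xs[start:start + tile])]
        out.extend(partial)
        carry = partial[-1]  # value the next tile depends on
    return out


data = [1, 2, 3, 4, 5, 6]
reference = inclusive_scan(data)
independent = tiled_scan_no_carry(data, tile=2)
carried = tiled_scan_with_carry(data, tile=2)
```

Here `carried` matches `reference`, while `independent` does not. This only illustrates the data dependency; it makes no claim about how IREE tiles or lowers the op.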
Further enhancements should include fine-tuning of the tiling config for
LinalgExt operations, deliberate restriction of work-group level tiling based
on the predicted size of stack-bound allocas within a dispatch, and wider
adoption of LinalgFusionOpInterface for LinalgExt operations that cannot be
expressed through simple reduction iterators. Possible adjustments to the
FormDispatchRegion algorithm itself are noted as TODO items.