Fallback SVE depending on vector lanes #9084
Open
stevesuzuki-arm wants to merge 2 commits into halide:main from
While Halide accepts an arbitrary vectorization factor, compiling it for an SVE target with LLVM's scalable vector types has some challenges:

- Only vectors whose lane count is a multiple of vscale are representable
- The backend compiler tends to crash on odd lane counts

This problem shows up more often when running the existing Halide unit tests on a target with vector_bits longer than 128 bits, because the vectorization factor is too short in some cases, or is an unusual value chosen for corner-case testing.

Lowering everything with predicates might be an option; however, that would require invasive changes, and its feasibility is unknown. The other option is to convert to and from fixed-size vectors, but this leads to mixing fixed and scalable vectors within a single intrinsic, incurs a performance penalty, and is infeasible on targets without NEON.

To work around this problem, we inspect the vector lanes used in a function, and if we find any lane count that is problematic for the given target.vector_bits, we stop using scalable vectors entirely (i.e. set effective_vscale = 0). The intrinsic map for Call is created for both NEON and SVE at init_module, and is selected dynamically for each function being compiled.

When this fallback happens, the user is warned that SVE is disabled. Note that this situation can typically be resolved by vectorizing with a longer, power-of-two factor.
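The per-function check described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `effective_vscale_for`, its signature, and the exact conditions are assumptions derived from the two challenges listed in the description (lane counts that are not a multiple of vscale, and odd lane counts), with vscale taken as vector_bits / 128.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, NOT Halide's actual API.
// Returns the vscale to use for one function, or 0 to signal
// "stop using scalable vectors and fall back to fixed-width codegen".
int effective_vscale_for(const std::vector<int> &lanes_used, int vector_bits) {
    // Assumption: vscale counts 128-bit granules, so vector_bits must be
    // a non-negative multiple of 128 (0 means SVE was not requested).
    assert(vector_bits >= 0 && vector_bits % 128 == 0);
    const int vscale = vector_bits / 128;
    if (vscale == 0) {
        return 0;
    }
    for (int lanes : lanes_used) {
        if (lanes == 1) {
            continue;  // Scalars do not need a vector type.
        }
        if (lanes % vscale != 0) {
            return 0;  // Not representable as a scalable vector.
        }
        if (lanes % 2 != 0) {
            return 0;  // Odd lane counts are treated as problematic.
        }
    }
    return vscale;
}
```

Under these assumptions, with vector_bits = 256 a function that uses 3-lane or 6-lane vectors would fall back (result 0), while one that uses only 8-lane and 16-lane vectors would keep vscale = 2.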
Member
I'm hoping the vector width legalization pass in #8629 might make this unnecessary. Let's try to get that merged.