Fallback SVE depending on vector lanes #9084

Open
stevesuzuki-arm wants to merge 2 commits into halide:main from stevesuzuki-arm:pr-fallback_sve

@stevesuzuki-arm
Contributor

Halide accepts an arbitrary vectorization factor, but compiling it
for an SVE target using LLVM scalable vector types poses some challenges:

  • Only vectors whose lane count is a multiple of vscale are representable
  • The backend compiler tends to crash on odd lane counts

This problem shows up more often when running the existing
Halide unit tests on a target with vector_bits longer than 128 bits,
because in some cases the vectorization factor is too short, or is an
unusual value chosen for corner-case testing.

Lowering everything with predicates might be an option; however,
that would require invasive changes, and its feasibility is unknown.
The other option is to convert to/from fixed-size vectors, but
this mixes fixed and scalable vectors in a single intrinsic,
incurs a performance penalty, and is infeasible on targets without NEON.

To work around this problem, we inspect the vector lanes used in a function,
and if we find any lane count that is problematic for the given target.vector_bits,
we stop using scalable vectors entirely (i.e. set effective_vscale = 0).
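
As a rough illustration of the check described above, here is a minimal standalone sketch. It is not the code in this PR: the helper names (`vscale_for`, `is_problematic_lanes`, `compute_effective_vscale`) are hypothetical, and the exact criterion is an assumption derived only from the two challenges listed earlier (lane count not a multiple of vscale, or odd lane count), with vscale taken as vector_bits / 128.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not Halide's API): vscale is the SVE vector
// length expressed in units of 128 bits.
int vscale_for(int vector_bits) {
    assert(vector_bits > 0 && vector_bits % 128 == 0);
    return vector_bits / 128;
}

// Assumed criterion, based on the two challenges listed in the description.
bool is_problematic_lanes(int lanes, int vscale) {
    if (lanes <= 1) return false;          // scalars are unaffected
    if (lanes % vscale != 0) return true;  // not a multiple of vscale
    if (lanes % 2 != 0) return true;       // odd lane count
    return false;
}

// Returns the vscale to use for a function, or 0 to disable
// scalable vectors when any lane count in the function is problematic.
int compute_effective_vscale(const std::vector<int> &lanes_used, int vector_bits) {
    const int vscale = vscale_for(vector_bits);
    for (int lanes : lanes_used) {
        if (is_problematic_lanes(lanes, vscale)) return 0;
    }
    return vscale;
}
```

Under these assumptions, a function using lane counts {8, 16} with vector_bits = 256 keeps vscale = 2, while a single use of 3 lanes anywhere in the function drops it to 0.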

The intrinsic maps for Call are created for both NEON and SVE at init_module,
and the appropriate one is selected dynamically for each function being compiled.
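
The per-function selection could look roughly like the sketch below. This is an assumption about the shape of the design, not the PR's actual data structures: `IntrinsicMaps`, `build_intrinsic_maps`, and `select_intrinsics` are hypothetical names, and the map values are placeholder strings rather than real LLVM intrinsic names.

```cpp
#include <cassert>
#include <map>
#include <string>

using IntrinsicMap = std::map<std::string, std::string>;

// Hypothetical container holding both maps, built once per module.
struct IntrinsicMaps {
    IntrinsicMap neon;
    IntrinsicMap sve;
};

// Stand-in for the work done at module initialization.
// The values are placeholders, not real intrinsic names.
IntrinsicMaps build_intrinsic_maps() {
    IntrinsicMaps maps;
    maps.neon["absd"] = "neon_absd_placeholder";
    maps.sve["absd"] = "sve_absd_placeholder";
    return maps;
}

// Chosen per function: SVE when scalable vectors are enabled,
// NEON when the fallback set effective_vscale to 0.
const IntrinsicMap &select_intrinsics(const IntrinsicMaps &maps, int effective_vscale) {
    assert(effective_vscale >= 0);
    return effective_vscale > 0 ? maps.sve : maps.neon;
}
```

Building both maps up front keeps the per-function decision cheap: the fallback only changes which map is consulted.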

When this fallback happens, the user is warned that SVE is disabled.
Note that this situation can typically be resolved by vectorizing with
a longer, power-of-two factor.
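
For illustration only, one way to pick such a factor is sketched below. The helper `suggest_vector_factor` is hypothetical and not part of Halide; it assumes that "longer" means at least one full vector of vector_bits / element_bits lanes, which the description does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>

// Smallest power of two that is >= x (for x >= 1).
int next_power_of_two(int x) {
    int p = 1;
    while (p < x) p *= 2;
    return p;
}

// Hypothetical helper: round the requested factor up to a power of two
// that covers at least one full vector (an assumption, see lead-in).
int suggest_vector_factor(int requested, int vector_bits, int element_bits) {
    assert(requested > 0 && vector_bits > 0 && element_bits > 0);
    assert(vector_bits % element_bits == 0);
    const int natural_lanes = vector_bits / element_bits;
    return next_power_of_two(std::max(requested, natural_lanes));
}
```

For example, with 32-bit elements and vector_bits = 256, a requested factor of 3 would be raised to 8 under these assumptions.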

Checklist

  • Tests added or updated (not required for docs, CI config, or typo fixes)
  • Documentation updated (if public API changed)
  • Python bindings updated (if public API changed)
  • Benchmarks are included here if the change is intended to affect performance.
  • Commits include AI attribution where applicable (see Code of Conduct)

@abadams
Member

abadams commented Mar 30, 2026

I'm hoping the vector width legalization pass in #8629 might make this unnecessary. Let's try to get that merged.
