Got a report of a simple loop that should autovectorize but does not do so on aarch64 (but does on x86 / AVX512). Repro:
#include <stdint.h>
#include <stdlib.h>

void noAutovec(uint32_t* __restrict ip, float* __restrict src, float* __restrict dst, size_t n) {
  // If you encourage the compiler with the `#pragma` this does autovectorize.
  // #pragma clang loop vectorize(enable)
  for (size_t i=0; i<n; ++i) {
    uint32_t idx = ip[i];
    dst[i] = src[idx];
  }
}
This vectorizes on x86 (`clang -march=haswell -mavx512f -O3`) but does not on aarch64 in my experiments (`clang -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16`).
Using `#pragma clang loop vectorize(enable)` makes vectorization work on aarch64. This suggests the cost model is rejecting the vectorized loop as unprofitable, rather than a legality check failing (I assume vectorization should be beneficial when SVE is available).
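For reference, here is a minimal sketch of the variant with the pragma enabled. The function name `gatherForcedVec` is made up for this example; the loop body is the same gather as in the repro. If I have the flags right, compiling both versions with clang's optimization remarks (`-Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`) should show what the vectorizer reports for each loop, though I have not included that output here.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical name; same gather loop as noAutovec, but with the
// vectorization hint enabled instead of commented out.
void gatherForcedVec(const uint32_t* __restrict ip, const float* __restrict src,
                     float* __restrict dst, size_t n) {
// Asks clang to vectorize this loop even if the cost model
// would otherwise consider it unprofitable.
#pragma clang loop vectorize(enable)
  for (size_t i = 0; i < n; ++i) {
    uint32_t idx = ip[i];  // index load (contiguous)
    dst[i] = src[idx];     // indexed load from src (gather), contiguous store
  }
}
```

The pragma only changes whether the loop gets vectorized, not what it computes, so both versions should produce identical output for the same inputs.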
(This mirrors Meta T222824954.)