Missed autovectorization opportunity (aarch64, SVE, gather load)

Got a report of a simple loop that should autovectorize but does not on aarch64 (it does on x86 with AVX-512). Repro:

```c
#include <stdint.h>
#include <stdlib.h>

void noAutovec(uint32_t* __restrict ip, float* __restrict src, float* __restrict dst, size_t n) {
    // If you encourage the compiler with the `#pragma`, this does autovectorize:
    // #pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; ++i) {
        uint32_t idx = ip[i];
        dst[i] = src[idx];  // indexed (gather) load from src, contiguous store to dst
    }
}
```

This vectorizes on x86 (`clang -march=haswell -mavx512f -O3`) but does not on aarch64 in my experiments (`clang -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16`).

Adding `#pragma clang loop vectorize(enable)` makes the loop vectorize on aarch64, which hints that the cost model is rejecting it. I would expect vectorization to be beneficial when SVE is available, since SVE provides gather-load instructions.
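For context on what the vectorizer has to produce for this loop: each vector iteration loads a contiguous chunk of indices from `ip`, gathers `src` at those indices, and stores contiguously to `dst`, with the remainder handled by a per-lane predicate (the SVE style) rather than a scalar epilogue. The plain-C sketch below only emulates that shape to illustrate the transformation; it is not what clang emits. `gather_strip_mined` and `LANES` are hypothetical names, and the fixed lane count is an assumption standing in for SVE's runtime-determined vector length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed lane count standing in for SVE's runtime vector
   length (e.g. 4 x 32-bit lanes on a 128-bit implementation). */
#define LANES 4

/* Plain-C emulation of a predicated, strip-mined gather loop.
   Each outer iteration models one vector iteration. */
void gather_strip_mined(const uint32_t *restrict ip, const float *restrict src,
                        float *restrict dst, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        /* Predicate: lane l is active iff i + l < n (similar in spirit to SVE's whilelt). */
        int active[LANES];
        for (int l = 0; l < LANES; ++l)
            active[l] = (i + (size_t)l) < n;

        /* Contiguous load of the indices, active lanes only. */
        uint32_t idx[LANES];
        for (int l = 0; l < LANES; ++l)
            if (active[l]) idx[l] = ip[i + (size_t)l];

        /* Gather: each active lane loads src at its own index. */
        float vals[LANES];
        for (int l = 0; l < LANES; ++l)
            if (active[l]) vals[l] = src[idx[l]];

        /* Contiguous store of the results, active lanes only. */
        for (int l = 0; l < LANES; ++l)
            if (active[l]) dst[i + (size_t)l] = vals[l];
    }
}
```

With `n = 7` and `LANES = 4`, the second outer iteration runs with only three active lanes, which is the case a predicated SVE loop handles without a separate scalar tail.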

(This mirrors Meta T222824954.)

Missed autovectorization opportunity (aarch64, SVE, gather load) #137894

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Missed autovectorization opportunity (aarch64, SVE, gather load) #137894

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions