Missed autovectorization opportunity (aarch64, SVE, gather load) #137894

Open
@MatzeB

Got a report of a simple loop that should autovectorize but does not on aarch64 (it does on x86 with AVX512). Repro:

#include <stdint.h>
#include <stdlib.h>

void noAutovec(uint32_t* __restrict ip, float* __restrict src, float* __restrict dst, size_t n) {
    // If you encourage the compiler with the `#pragma`, this does autovectorize.
    // #pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; ++i) {
        uint32_t idx = ip[i];
        dst[i] = src[idx];
    }
}

This vectorizes on x86 (clang -march=haswell -mavx512f -O3) but does not on aarch64 in my experiments (clang -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16).

Adding #pragma clang loop vectorize(enable) makes the loop vectorize on aarch64. This suggests the cost model is rejecting vectorization, even though I would expect it to be beneficial when SVE is available.
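For comparison, here is a minimal sketch of the same loop with the hint enabled. The function name `gatherForced` is hypothetical and only serves to distinguish it from the repro; the loop body is unchanged apart from folding the index load into the subscript.

```c
#include <stddef.h>
#include <stdint.h>

/* Same gather loop as the repro, but with the vectorization hint active.
   `gatherForced` is a made-up name for this sketch. */
void gatherForced(const uint32_t* __restrict ip, const float* __restrict src,
                  float* __restrict dst, size_t n) {
    /* Asks clang to vectorize this loop even if the cost model would decline.
       Compilers that do not know this pragma ignore it, possibly with a warning. */
#pragma clang loop vectorize(enable)
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[ip[i]];  /* indexed (gather) load from src */
    }
}
```

Building both variants with clang's optimization remarks (`-Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize`) should show whether each loop was vectorized and, for the unhinted one, may give the reason it was skipped.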

(This mirrors Meta T222824954.)
