AArch64 SVE: Multiple ptrue instructions not merged #137040

Open
@MatzeB

SVE code that operates on different vector element types tends to produce multiple ptrue instructions.

For example, I got this from an internal user:

#include <arm_sve.h>
svuint32_t getSveVec(const uint32_t* inputPtr) {
    svuint64_t vec = svld1uw_u64(svptrue_b64(), inputPtr);
    svuint32_t clzV1 = svclz_u32_x(svptrue_b32(), svreinterpret_u32_u64(vec));
    return clzV1;
}

Producing something like this:

$ clang++ -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16 -O3 -S -o - dup2.cpp
...
        ptrue   p0.d
        ld1w    { z0.d }, p0/z, [x0]
        ptrue   p0.s
        clz     z0.s, p0/m, z0.s
        ret

My understanding is that a single ptrue p0.b would suffice here, since an all-true byte predicate is also all-true for .s and .d elements, and in fact GCC produces that code.
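To illustrate the reasoning, here is a small portable C++ model of the predicate layout. It is not SVE code, only a sketch. It assumes a 128-bit vector, one predicate bit per vector byte, and that an instruction with a given element size only looks at the lowest predicate bit of each element. The names `Pred`, `ptrue`, `allLanesActive`, and `kVecBytes` are made up for this example.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical model: a 128-bit vector has 16 bytes, so 16 predicate bits.
constexpr std::size_t kVecBytes = 16;
using Pred = std::bitset<kVecBytes>;

// Models `ptrue pN.<T>`: sets the lowest predicate bit of every element
// of size elemBytes and leaves the remaining bits clear.
Pred ptrue(std::size_t elemBytes) {
    Pred p;
    for (std::size_t i = 0; i < kVecBytes; i += elemBytes) {
        p.set(i);
    }
    return p;
}

// Returns true if an instruction operating on elements of size elemBytes
// would see every lane as active under predicate p. Only the lowest bit
// of each element is consulted.
bool allLanesActive(const Pred& p, std::size_t elemBytes) {
    for (std::size_t i = 0; i < kVecBytes; i += elemBytes) {
        if (!p.test(i)) {
            return false;
        }
    }
    return true;
}
```

In this model `allLanesActive(ptrue(1), 8)` and `allLanesActive(ptrue(1), 4)` both hold, so the byte predicate can feed both the ld1w on .d elements and the clz on .s elements. `allLanesActive(ptrue(8), 4)` does not hold, which is presumably why the second ptrue is emitted when the first one is p0.d.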
