SVE code that mixes vector element types tends to produce multiple ptrue
instructions.
For example, I got this from an internal user:
#include <arm_sve.h>
svuint32_t getSveVec(const uint32_t* inputPtr) {
  svuint64_t vec = svld1uw_u64(svptrue_b64(), inputPtr);
  svuint32_t clzV1 = svclz_u32_x(svptrue_b32(), svreinterpret_u32_u64(vec));
  return clzV1;
}
This produces something like:
$ clang++ -target aarch64-redhat-linux-gnu -march=armv9-a+sve2+fp16 -O3 -S -o - dup2.cpp
...
        ptrue   p0.d
        ld1w    { z0.d }, p0/z, [x0]
        ptrue   p0.s
        clz     z0.s, p0/m, z0.s
        ret
My understanding is that a single ptrue p0.b would suffice here, since an
all-true byte predicate is also all-true when read at any wider element size,
and in fact GCC produces that code.
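
To illustrate why one predicate should be enough, here is a rough host-side model in plain C++ (no SVE required). It is only a sketch: the names Predicate, ptrue and allActive are made up for this illustration, and the 16-byte vector length is an assumption (the minimum SVE vector length). It models a predicate register as one bit per vector byte, where an instruction operating on N-byte elements only looks at the lowest bit of each N-byte group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical names): one predicate bit per vector byte.
using Predicate = std::vector<bool>;

// Models `ptrue pN.<T>` with the default ALL pattern: the lowest bit of
// every elemBytes-wide group is set, all other bits are cleared.
Predicate ptrue(std::size_t vectorBytes, std::size_t elemBytes) {
    Predicate p(vectorBytes, false);
    for (std::size_t i = 0; i < vectorBytes; i += elemBytes) {
        p[i] = true;
    }
    return p;
}

// An instruction on elemBytes-wide elements treats element k as active
// when predicate bit k * elemBytes is set. Returns true if all are active.
bool allActive(const Predicate& p, std::size_t elemBytes) {
    for (std::size_t i = 0; i < p.size(); i += elemBytes) {
        if (!p[i]) {
            return false;
        }
    }
    return true;
}

// Checks the claim from this report on an assumed 16-byte vector.
void selfCheck() {
    const std::size_t vl = 16;               // assumed vector length in bytes
    const Predicate pb = ptrue(vl, 1);       // ptrue p0.b
    assert(allActive(pb, 8));                // usable by ld1w { z0.d }
    assert(allActive(pb, 4));                // usable by clz z0.s
    assert(!allActive(ptrue(vl, 8), 4));     // ptrue p0.d is not enough for .s
}
```

In this model the p0.b predicate is all-active for both the .d load and the .s clz, while the p0.d predicate is not all-active for .s, which matches the need for a second ptrue in the current output when the first one is p0.d.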