The following code is not auto-vectorized by Clang trunk when compiled with `-O3 -march=znver4`:
```cpp
#include <cstdint>

struct BitBoard
{
    alignas(64) std::uint64_t chunks[4];
    BitBoard& operator&=(const BitBoard& other);
};

BitBoard& BitBoard::operator&=(const BitBoard& other)
{
    chunks[0] &= other.chunks[0];
    chunks[1] &= other.chunks[1];
    chunks[2] &= other.chunks[2];
    chunks[3] &= other.chunks[3];
    return *this;
}
```
Clang instead emits four scalar 64-bit AND operations:
```asm
BitBoard::operator&=(BitBoard const&):
        mov     rcx, qword ptr [rsi]
        mov     rax, rdi
        and     qword ptr [rdi], rcx
        mov     rcx, qword ptr [rsi + 8]
        and     qword ptr [rdi + 8], rcx
        mov     rcx, qword ptr [rsi + 16]
        and     qword ptr [rdi + 16], rcx
        mov     rcx, qword ptr [rsi + 24]
        and     qword ptr [rdi + 24], rcx
        ret
```
GCC does auto-vectorize the same code; see https://godbolt.org/z/8rKccboe1
The same gap between Clang and GCC appears when targeting Arm: https://godbolt.org/z/9rT66P4hx
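Until Clang handles this pattern, one possible workaround is to spell out the 256-bit AND by hand. The sketch below is an assumption and not part of the original report: it uses the GCC/Clang `vector_size` extension (non-standard C++) with a hypothetical `U64x4` typedef, and its generated code has not been checked against Clang trunk. With `-march=znver4` one would expect a single 256-bit load/AND/store sequence, but that should be confirmed on Compiler Explorer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: four 64-bit lanes in one 256-bit vector,
// using the GCC/Clang vector extension (not standard C++).
typedef std::uint64_t U64x4 __attribute__((vector_size(32)));

struct BitBoard
{
    alignas(64) std::uint64_t chunks[4];
    BitBoard& operator&=(const BitBoard& other);
};

// The vector type must cover exactly the four chunks.
static_assert(sizeof(U64x4) == sizeof(BitBoard::chunks), "size mismatch");

BitBoard& BitBoard::operator&=(const BitBoard& other)
{
    U64x4 a;
    U64x4 b;
    // memcpy sidesteps strict-aliasing concerns; compilers typically
    // fold these copies into plain vector loads and stores.
    std::memcpy(&a, chunks, sizeof a);
    std::memcpy(&b, other.chunks, sizeof b);
    a &= b;  // lane-wise bitwise AND of all four 64-bit chunks
    std::memcpy(chunks, &a, sizeof a);
    return *this;
}
```

The semantics match the original scalar version, so the two can be compared side by side on Compiler Explorer. Because no vector values cross a function boundary, the calling convention of `operator&=` is unchanged.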