## Description
I'm investigating performance differences between `packed_simd` and this crate. My assumption was that the separate bitmask representation used for AVX-512 could lead to better performance, but instead I see some rather odd assembly for mask handling. This seems to be specific to masks with more than 8 lanes. The following is a reduced example; the same behavior can also be seen with other comparison operations, such as `simd_eq` instead of `is_nan`.
```rust
use std::simd::{f32x8, f32x16, SimdFloat, ToBitMask};

#[inline(never)]
fn nan_bitmask_16(data: &[f32; 16]) -> u16 {
    let chunk = f32x16::from_slice(data);
    let is_nan = chunk.is_nan();
    is_nan.to_bitmask()
}

#[inline(never)]
fn nan_bitmask_8(data: &[f32; 8]) -> u8 {
    let chunk = f32x8::from_slice(data);
    let is_nan = chunk.is_nan();
    is_nan.to_bitmask()
}
```
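For reference, the expected bitmask semantics can be sketched in plain stable Rust without `std::simd`; this is a hypothetical scalar equivalent (the helper name `nan_bitmask_scalar` is my own), useful for checking what either codegen should compute:

```rust
/// Scalar reference: bit i of the result is set iff data[i] is NaN.
fn nan_bitmask_scalar(data: &[f32; 16]) -> u16 {
    data.iter()
        .enumerate()
        .fold(0u16, |mask, (i, x)| mask | ((x.is_nan() as u16) << i))
}

fn main() {
    let mut data = [0.0f32; 16];
    data[3] = f32::NAN;
    data[9] = f32::NAN;
    // Bits 3 and 9 set, all others clear.
    assert_eq!(nan_bitmask_scalar(&data), (1 << 3) | (1 << 9));
}
```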
The generated code for 8 lanes looks very good:
```asm
vxorps xmm0,xmm0,xmm0
vcmpunordps k0,ymm0,YMMWORD PTR [rdi]
kmovd eax,k0
vzeroupper
ret
```
But for 16 lanes something strange is happening:
```asm
push rax
vxorps xmm0,xmm0,xmm0
vcmpunordps k0,zmm0,ZMMWORD PTR [rdi]
kmovw WORD PTR [rsp],k0
kmovd eax,k0
movzx ecx,BYTE PTR [rsp+0x1]
shl ecx,0x8
movzx eax,al
or eax,ecx
pop rcx
vzeroupper
ret
```
The mask appears to be spilled to the stack, the high byte is reloaded, and then both halves are recombined into a 16-bit word, even though the `kmovd` already placed the full mask in `eax`.
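In other words, the extra instructions re-derive the 16-bit result from its two bytes. The recombination the assembly performs is equivalent to this sketch (the helper name `combine_halves` is hypothetical, purely for illustration):

```rust
/// What the movzx/shl/or sequence computes: low byte in bits 0..8,
/// high byte (reloaded from the stack spill) in bits 8..16.
fn combine_halves(lo: u8, hi: u8) -> u16 {
    (lo as u16) | ((hi as u16) << 8)
}

fn main() {
    assert_eq!(combine_halves(0xCD, 0xAB), 0xABCD);
}
```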
This was observed using `portable_simd` as a crate (actually as an example binary inside a repo checkout), since that made it easier to reduce the example. I observed similar code in a larger example using the version from the standard library with `-Zbuild-std`.
## Meta
```
$ rustc +nightly --version
rustc 1.66.0-nightly (81f391930 2022-10-09)
```

`portable_simd` commit aad8f0a