Bad codegen for boolean reductions on thumbv7neon #146

Both examples below were built with `cargo build --release --target thumbv7neon-unknown-linux-gnueabihf`.

With `packed_simd_2`, this:

```rust
#[no_mangle]
#[inline(never)]
fn packed_any(s: m8x16) -> bool {
    s.any()
}
```
compiles to this:
```asm
00003e66 <packed_any>:
    3e66:       f960 0acf       vld1.64 {d16-d17}, [r0]
    3e6a:       ff40 0aa1       vpmax.u8        d16, d16, d17
    3e6e:       ec51 0b30       vmov    r0, r1, d16
    3e72:       4308            orrs    r0, r1
    3e74:       bf18            it      ne
    3e76:       2001            movne   r0, #1
    3e78:       4770            bx      lr
        ...
```

With `core_simd`, this:

```rust
#[no_mangle]
#[inline(never)]
fn core_any(s: mask8x16) -> bool {
    s.any()
}
```
compiles to this:
```asm
00003e66 <core_any>:
    3e66:       b5f0            push    {r4, r5, r6, r7, lr}
    3e68:       f960 0acf       vld1.64 {d16-d17}, [r0]
    3e6c:       eed0 0bb0       vmov.u8 r0, d16[1]
    3e70:       eed0 1b90       vmov.u8 r1, d16[0]
    3e74:       eed0 2bd0       vmov.u8 r2, d16[2]
    3e78:       eed0 3bf0       vmov.u8 r3, d16[3]
    3e7c:       eef0 cb90       vmov.u8 ip, d16[4]
    3e80:       eef0 ebb0       vmov.u8 lr, d16[5]
    3e84:       eef0 4bd0       vmov.u8 r4, d16[6]
    3e88:       eef0 7bf0       vmov.u8 r7, d16[7]
    3e8c:       eed1 5bf0       vmov.u8 r5, d17[3]
    3e90:       eef1 6b90       vmov.u8 r6, d17[4]
    3e94:       4308            orrs    r0, r1
    3e96:       eed1 1b90       vmov.u8 r1, d17[0]
    3e9a:       4310            orrs    r0, r2
    3e9c:       eed1 2bb0       vmov.u8 r2, d17[1]
    3ea0:       4318            orrs    r0, r3
    3ea2:       eed1 3bd0       vmov.u8 r3, d17[2]
    3ea6:       ea40 000c       orr.w   r0, r0, ip
    3eaa:       ea40 000e       orr.w   r0, r0, lr
    3eae:       4320            orrs    r0, r4
    3eb0:       eef1 4bb0       vmov.u8 r4, d17[5]
    3eb4:       4338            orrs    r0, r7
    3eb6:       eef1 7bd0       vmov.u8 r7, d17[6]
    3eba:       4308            orrs    r0, r1
    3ebc:       eef1 1bf0       vmov.u8 r1, d17[7]
    3ec0:       4310            orrs    r0, r2
    3ec2:       4318            orrs    r0, r3
    3ec4:       4328            orrs    r0, r5
    3ec6:       4330            orrs    r0, r6
    3ec8:       4320            orrs    r0, r4
    3eca:       4338            orrs    r0, r7
    3ecc:       4308            orrs    r0, r1
    3ece:       f000 0001       and.w   r0, r0, #1
    3ed2:       bdf0            pop     {r4, r5, r6, r7, pc}
```
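
To make the difference between the two listings easier to see, here is a rough scalar model of what each sequence computes. It is only a sketch for illustration, not the implementation in either crate: the `Mask` alias and both function names are hypothetical, and it assumes every mask lane is either `0x00` or `0xFF`.

```rust
/// Hypothetical stand-in for a 16-lane byte mask.
/// Assumption: each lane is 0x00 (false) or 0xFF (true).
type Mask = [u8; 16];

/// Models the `core_any` listing: every lane is extracted to a
/// general-purpose register (`vmov.u8`), OR-ed into an accumulator
/// (`orrs`), and the result is masked with 1 (`and.w r0, r0, #1`).
fn any_scalar_chain(m: &Mask) -> bool {
    let mut acc: u32 = 0;
    for &lane in m.iter() {
        acc |= lane as u32;
    }
    // Masking with 1 is only correct because lanes are 0x00 or 0xFF.
    (acc & 1) != 0
}

/// Models the `packed_any` listing: one pairwise max (`vpmax.u8`)
/// folds 16 lanes into 8, then the 64-bit result is tested for
/// being nonzero (`vmov r0, r1, d16` followed by `orrs`).
fn any_pairwise_max(m: &Mask) -> bool {
    let mut folded = [0u8; 8];
    for i in 0..8 {
        // Adjacent lanes are combined, as vpmax does within d16 and d17.
        folded[i] = m[2 * i].max(m[2 * i + 1]);
    }
    u64::from_le_bytes(folded) != 0
}
```

Both functions return the same answer for any valid mask; the difference is that the first needs sixteen lane extractions while the second needs a single vector operation before leaving the NEON register file.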

### Additional info

For `encoding_rs`, this [seriously regresses performance](https://bug1719896.bmoattachments.org/attachment.cgi?id=9230565) when comparing `core_simd` against `packed_simd_2`.

A similar problem was [reported previously](https://github.com/rust-lang/packed_simd/issues/215) when migrating from `simd` to `packed_simd`.

### Meta

`rustc --version --verbose`:
```
rustc 1.55.0-nightly (b1f8e27b7 2021-07-15)
binary: rustc
commit-hash: b1f8e27b74c541d3d555149c8efa4bfe9385cd56
commit-date: 2021-07-15
host: armv7-unknown-linux-gnueabihf
release: 1.55.0-nightly
LLVM version: 12.0.1
```

`stdsimd` rev 715f9ac4e36ee303c3d464121ebb65df8f92416e
