Skip to content

Bad codegen for boolean reductions on thumbv7neon #146

Open
@hsivonen

Description

@hsivonen

Using cargo build --release --target thumbv7neon-unknown-linux-gnueabihf:

With packed_simd_2, this:

#[no_mangle]
#[inline(never)]
fn packed_any(s: m8x16) -> bool {
    s.any()
}

compiles to this:

00003e66 <packed_any>:
    3e66:       f960 0acf       vld1.64 {d16-d17}, [r0]
    3e6a:       ff40 0aa1       vpmax.u8        d16, d16, d17
    3e6e:       ec51 0b30       vmov    r0, r1, d16
    3e72:       4308            orrs    r0, r1
    3e74:       bf18            it      ne
    3e76:       2001            movne   r0, #1
    3e78:       4770            bx      lr
        ...

With core_simd, this:

#[no_mangle]
#[inline(never)]
fn core_any(s: mask8x16) -> bool {
    s.any()
}

compiles to this:

00003e66 <core_any>:
    3e66:       b5f0            push    {r4, r5, r6, r7, lr}
    3e68:       f960 0acf       vld1.64 {d16-d17}, [r0]
    3e6c:       eed0 0bb0       vmov.u8 r0, d16[1]
    3e70:       eed0 1b90       vmov.u8 r1, d16[0]
    3e74:       eed0 2bd0       vmov.u8 r2, d16[2]
    3e78:       eed0 3bf0       vmov.u8 r3, d16[3]
    3e7c:       eef0 cb90       vmov.u8 ip, d16[4]
    3e80:       eef0 ebb0       vmov.u8 lr, d16[5]
    3e84:       eef0 4bd0       vmov.u8 r4, d16[6]
    3e88:       eef0 7bf0       vmov.u8 r7, d16[7]
    3e8c:       eed1 5bf0       vmov.u8 r5, d17[3]
    3e90:       eef1 6b90       vmov.u8 r6, d17[4]
    3e94:       4308            orrs    r0, r1
    3e96:       eed1 1b90       vmov.u8 r1, d17[0]
    3e9a:       4310            orrs    r0, r2
    3e9c:       eed1 2bb0       vmov.u8 r2, d17[1]
    3ea0:       4318            orrs    r0, r3
    3ea2:       eed1 3bd0       vmov.u8 r3, d17[2]
    3ea6:       ea40 000c       orr.w   r0, r0, ip
    3eaa:       ea40 000e       orr.w   r0, r0, lr
    3eae:       4320            orrs    r0, r4
    3eb0:       eef1 4bb0       vmov.u8 r4, d17[5]
    3eb4:       4338            orrs    r0, r7
    3eb6:       eef1 7bd0       vmov.u8 r7, d17[6]
    3eba:       4308            orrs    r0, r1
    3ebc:       eef1 1bf0       vmov.u8 r1, d17[7]
    3ec0:       4310            orrs    r0, r2
    3ec2:       4318            orrs    r0, r3
    3ec4:       4328            orrs    r0, r5
    3ec6:       4330            orrs    r0, r6
    3ec8:       4320            orrs    r0, r4
    3eca:       4338            orrs    r0, r7
    3ecc:       4308            orrs    r0, r1
    3ece:       f000 0001       and.w   r0, r0, #1
    3ed2:       bdf0            pop     {r4, r5, r6, r7, pc}

Additional info

This seriously regresses performance (packed_simd_2 vs. core_simd) for encoding_rs.

Previously when migrating from simd to packed_simd.

Meta

rustc --version --verbose:

rustc 1.55.0-nightly (b1f8e27b7 2021-07-15)
binary: rustc
commit-hash: b1f8e27b74c541d3d555149c8efa4bfe9385cd56
commit-date: 2021-07-15
host: armv7-unknown-linux-gnueabihf
release: 1.55.0-nightly
LLVM version: 12.0.1

stdsimd rev 715f9ac

Metadata

Metadata

Labels

A-LLVMArea: LLVMA-codegenArea: Code generationC-bugCategory: Bug

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions