Skip to content

Testing Github actions as a contributor #52

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 13, 2025

Conversation

PlaidCat
Copy link
Collaborator

THIS IS BREAKING AND NOT INTENDED TO BE SHIPPED

This is to test the PR checker and the builders

THIS IS BREAKING AND NOT INTENDED TO BE SHIPPED
@PlaidCat PlaidCat merged commit 935c85c into ctrliq:{jmaple}_ciqlts8_8 Jan 13, 2025
1 check passed
github-actions bot pushed a commit that referenced this pull request May 1, 2025
JIRA: https://issues.redhat.com/browse/RHEL-68940

commit 031af50
Author: Mark Rutland <[email protected]>
Date:   Wed, 4 Jan 2023 15:16:26 +0000

    arm64: cmpxchg_double*: hazard against entire exchange variable

    The inline assembly for arm64's cmpxchg_double*() implementations use a
    +Q constraint to hazard against other accesses to the memory location
    being exchanged. However, the pointer passed to the constraint is a
    pointer to unsigned long, and thus the hazard only applies to the first
    8 bytes of the location.

    GCC can take advantage of this, assuming that other portions of the
    location are unchanged, leading to a number of potential problems.

    This is similar to what we fixed back in commit:

      fee960b ("arm64: xchg: hazard against entire exchange variable")

    ... but we forgot to adjust cmpxchg_double*() similarly at the same
    time.

    The same problem applies, as demonstrated with the following test:

    | struct big {
    |         u64 lo, hi;
    | } __aligned(128);
    |
    | unsigned long foo(struct big *b)
    | {
    |         u64 hi_old, hi_new;
    |
    |         hi_old = b->hi;
    |         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
    |         hi_new = b->hi;
    |
    |         return hi_old ^ hi_new;
    | }

    ... which GCC 12.1.0 compiles as:

    | 0000000000000000 <foo>:
    |    0:   d503233f        paciasp
    |    4:   aa0003e4        mov     x4, x0
    |    8:   1400000e        b       40 <foo+0x40>
    |    c:   d2800240        mov     x0, #0x12                       // #18
    |   10:   d2800681        mov     x1, #0x34                       // #52
    |   14:   aa0003e5        mov     x5, x0
    |   18:   aa0103e6        mov     x6, x1
    |   1c:   d2800ac2        mov     x2, #0x56                       // #86
    |   20:   d2800f03        mov     x3, #0x78                       // #120
    |   24:   48207c82        casp    x0, x1, x2, x3, [x4]
    |   28:   ca050000        eor     x0, x0, x5
    |   2c:   ca060021        eor     x1, x1, x6
    |   30:   aa010000        orr     x0, x0, x1
    |   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
    |   38:   d50323bf        autiasp
    |   3c:   d65f03c0        ret
    |   40:   d2800240        mov     x0, #0x12                       // #18
    |   44:   d2800681        mov     x1, #0x34                       // #52
    |   48:   d2800ac2        mov     x2, #0x56                       // #86
    |   4c:   d2800f03        mov     x3, #0x78                       // #120
    |   50:   f9800091        prfm    pstl1strm, [x4]
    |   54:   c87f1885        ldxp    x5, x6, [x4]
    |   58:   ca0000a5        eor     x5, x5, x0
    |   5c:   ca0100c6        eor     x6, x6, x1
    |   60:   aa0600a6        orr     x6, x5, x6
    |   64:   b5000066        cbnz    x6, 70 <foo+0x70>
    |   68:   c8250c82        stxp    w5, x2, x3, [x4]
    |   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
    |   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
    |   74:   d50323bf        autiasp
    |   78:   d65f03c0        ret

    Notice that at the lines with "BANG" comments, GCC has assumed that the
    higher 8 bytes are unchanged by the cmpxchg_double() call, and that
    `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
    LL/SC versions of cmpxchg_double().

    This patch fixes the issue by passing a pointer to __uint128_t into the
    +Q constraint, ensuring that the compiler hazards against the entire 16
    bytes being modified.

    With this change, GCC 12.1.0 compiles the above test as:

    | 0000000000000000 <foo>:
    |    0:   f9400407        ldr     x7, [x0, #8]
    |    4:   d503233f        paciasp
    |    8:   aa0003e4        mov     x4, x0
    |    c:   1400000f        b       48 <foo+0x48>
    |   10:   d2800240        mov     x0, #0x12                       // #18
    |   14:   d2800681        mov     x1, #0x34                       // #52
    |   18:   aa0003e5        mov     x5, x0
    |   1c:   aa0103e6        mov     x6, x1
    |   20:   d2800ac2        mov     x2, #0x56                       // #86
    |   24:   d2800f03        mov     x3, #0x78                       // #120
    |   28:   48207c82        casp    x0, x1, x2, x3, [x4]
    |   2c:   ca050000        eor     x0, x0, x5
    |   30:   ca060021        eor     x1, x1, x6
    |   34:   aa010000        orr     x0, x0, x1
    |   38:   f9400480        ldr     x0, [x4, #8]
    |   3c:   d50323bf        autiasp
    |   40:   ca0000e0        eor     x0, x7, x0
    |   44:   d65f03c0        ret
    |   48:   d2800240        mov     x0, #0x12                       // #18
    |   4c:   d2800681        mov     x1, #0x34                       // #52
    |   50:   d2800ac2        mov     x2, #0x56                       // #86
    |   54:   d2800f03        mov     x3, #0x78                       // #120
    |   58:   f9800091        prfm    pstl1strm, [x4]
    |   5c:   c87f1885        ldxp    x5, x6, [x4]
    |   60:   ca0000a5        eor     x5, x5, x0
    |   64:   ca0100c6        eor     x6, x6, x1
    |   68:   aa0600a6        orr     x6, x5, x6
    |   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
    |   70:   c8250c82        stxp    w5, x2, x3, [x4]
    |   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
    |   78:   f9400480        ldr     x0, [x4, #8]
    |   7c:   d50323bf        autiasp
    |   80:   ca0000e0        eor     x0, x7, x0
    |   84:   d65f03c0        ret

    ... sampling the high 8 bytes before and after the cmpxchg, and
    performing an EOR, as we'd expect.

    For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
    that linux-4.9.y is oldest currently supported stable release, and
    mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
    on my machines due to library incompatibilities.

    I've also used a standalone test to check that we can use a __uint128_t
    pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
    3.9.1.

    Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
    Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
    Reported-by: Boqun Feng <[email protected]>
    Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
    Reported-by: Peter Zijlstra <[email protected]>
    Link: https://lore.kernel.org/lkml/[email protected]/
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: [email protected]
    Cc: Arnd Bergmann <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Steve Capper <[email protected]>
    Cc: Will Deacon <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Will Deacon <[email protected]>

Signed-off-by: Waiman Long <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant