Skip to content

Inefficient implementation of PartialEq for nested (fieldless) enums #132628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zaneduffield opened this issue Nov 5, 2024 · 12 comments
Open
Assignees
Labels
A-codegen Area: Code generation A-enum Area: Enums (discriminated unions, or more generally ADTs (algebraic data types)) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@zaneduffield
Copy link

zaneduffield commented Nov 5, 2024

I'm not sure if this is the right place for this (might be LLVM to blame), just a bit of inefficient code that I noticed.

https://godbolt.org/z/K57orYj5h

The only difference between the two functions eq and matches is the use of == and matches! for the enum comparison. The generated code for eq includes two calls to PartialEq for Outer, whereas the code for matches has a much simpler (inline) comparison.

Also, the generated code for PartialEq seems very inefficient, given the enum is just a two-byte value that can be directly compared.

If you tinker with the enum definitions it's not hard to cause eq to optimise exactly like matches.

Example copied here

#[derive(PartialEq)]
pub enum InnerInner {
    One,
    Two,
}

#[derive(PartialEq)]
pub enum Inner {
    One,
    Two,
    Const(InnerInner),
}

#[derive(PartialEq)]
pub enum Outer {
    One(Inner),
    Two,
    Three,
    Four,
    Five(Inner),
    Six(Inner),
    Seven(Inner),
    Eight(Inner),
}

#[no_mangle]
pub fn eq(t: Outer, b: &mut bool) {
    let token_type = if t == Outer::One(Inner::One) {
        Outer::One(Inner::One)
    } else {
        Outer::Two
    };

    *b = token_type == Outer::Two;
}

#[no_mangle]
pub fn matches(t: Outer, b: &mut bool) {
    let token_type = if matches!(t, Outer::One(Inner::One)) {
        Outer::One(Inner::One)
    } else {
        Outer::Two
    };

    *b = matches!(token_type, Outer::Two);
}
@zaneduffield zaneduffield added the C-bug Category: This is a bug. label Nov 5, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Nov 5, 2024
@zaneduffield
Copy link
Author

This missed optimisation appears to have been introduced in Rust 1.65.

@workingjubilee
Copy link
Member

Rust 1.65 was an LLVM upgrade.

@workingjubilee workingjubilee added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. I-heavy Issue: Problems and improvements with respect to binary size of generated code. A-enum Area: Enums (discriminated unions, or more generally ADTs (algebraic data types)) and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 5, 2024
@workingjubilee
Copy link
Member

I was going to say "gonna need a benchmark" but then I actually looked at the size of that regression. Damn.

@dianqk
Copy link
Member

dianqk commented Nov 5, 2024

One difference is that in MIR, Rust 1.65 inlined <InnerInner as PartialEq>::eq.

@workingjubilee
Copy link
Member

cc @saethlin

@saethlin
Copy link
Member

saethlin commented Nov 5, 2024

One difference is that in MIR, Rust 1.65 inlined <InnerInner as PartialEq>::eq.

It looks to me like that function is correctly inlined in 1.65, 1.66, and current nightly. I don't see any problematic MIR inliner behavior in any version. Not that the MIR is neat and tidy or anything like that, but it looks like the inliner is behaving right.

@saethlin
Copy link
Member

saethlin commented Nov 5, 2024

Oh I see, you meant that the problem is that we are inlining <InnerInner as PartialEq>::eq and maybe shouldn't be.

I can confirm that indeed making that function not inline in MIR fixes the regression. Though I have no idea how/why: https://godbolt.org/z/8j3MhjdW4

@saethlin
Copy link
Member

saethlin commented Nov 5, 2024

@scottmcm I wonder if this is a case where the MIR inliner is blowing up the size of the caller and that's making LLVM optimizations (perhaps just one key optimization) give up on the caller because there's too much IR in it. Note that this reproducer has been properly minimized; deleting any of the variants causes it to optimize correctly. I think you were working on an inliner tweak that considered the size of the caller?

@dianqk
Copy link
Member

dianqk commented Nov 5, 2024

Oh I see, you meant that the problem is that we are inlining <InnerInner as PartialEq>::eq and maybe shouldn't be.

I can confirm that indeed making that function not inline in MIR fixes the regression. Though I have no idea how/why: https://godbolt.org/z/8j3MhjdW4

I'm documenting what little progress I've made. It's weird, but inline shouldn't be a problem.

@dianqk dianqk self-assigned this Nov 6, 2024
@clubby789
Copy link
Contributor

Seems fixed as of current nightly:

eq:
        xor     sil, 2
        or      sil, dil
        setne   byte ptr [rdx]
        ret
define void @eq(i8 noundef range(i8 0, 8) %0, i8 %1, ptr noalias nocapture noundef writeonly align 1 dereferenceable(1) initializes((0, 1)) %b) unnamed_addr {
start:
  %_5.i = icmp ne i8 %0, 0
  %_68.i = icmp ne i8 %1, 2
  %or.cond.not = select i1 %_5.i, i1 true, i1 %_68.i
  %_0.sroa.0.0.i6 = zext i1 %or.cond.not to i8
  store i8 %_0.sroa.0.0.i6, ptr %b, align 1
  ret void
}

@clubby789 clubby789 added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label May 3, 2025
@zaneduffield
Copy link
Author

Seems fixed as of current nightly:

So it does! I wonder what has changed 🤔

@moxian
Copy link
Contributor

moxian commented May 3, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-enum Area: Enums (discriminated unions, or more generally ADTs (algebraic data types)) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants