Inefficient implementation of PartialEq for nested (fieldless) enums #132628
Comments
This missed optimisation appears to have been introduced in Rust 1.65.
Rust 1.65 was an LLVM upgrade.
I was going to say "gonna need a benchmark" but then I actually looked at the size of that regression. Damn.
One difference is that in MIR, Rust 1.65 inlined
cc @saethlin
It looks to me like that function is correctly inlined in 1.65, 1.66, and current nightly. I don't see any problematic MIR inliner behavior in any version. Not that the MIR is neat and tidy or anything like that, but it looks like the inliner is behaving right.
Oh I see, you meant that the problem is that we are inlining I can confirm that indeed making that function not inline in MIR fixes the regression. Though I have no idea how/why: https://godbolt.org/z/8j3MhjdW4
@scottmcm I wonder if this is a case where the MIR inliner is blowing up the size of the caller and that's making LLVM optimizations (perhaps just one key optimization) give up on the caller because there's too much IR in it. Note that this reproducer has been properly minimized; deleting any of the variants causes it to optimize correctly. I think you were working on an inliner tweak that considered the size of the caller?
I'm documenting what little progress I've made. It's weird, but inline shouldn't be a problem.
Seems fixed as of current nightly:

eq:
    xor sil, 2
    or sil, dil
    setne byte ptr [rdx]
    ret

define void @eq(i8 noundef range(i8 0, 8) %0, i8 %1, ptr noalias nocapture noundef writeonly align 1 dereferenceable(1) initializes((0, 1)) %b) unnamed_addr {
start:
  %_5.i = icmp ne i8 %0, 0
  %_68.i = icmp ne i8 %1, 2
  %or.cond.not = select i1 %_5.i, i1 true, i1 %_68.i
  %_0.sroa.0.0.i6 = zext i1 %or.cond.not to i8
  store i8 %_0.sroa.0.0.i6, ptr %b, align 1
  ret void
}
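To make the optimised output easier to read, here is a small sketch of the two equivalent forms of that check. It assumes, based only on the quoted IR, that the first `i8` is the outer discriminant and the second is the inner value; the function names `differs_select` and `differs_bitwise` are hypothetical and do not come from the issue.

```rust
/// Mirrors the `select` in the quoted LLVM IR:
/// true if the first byte is non-zero, otherwise true if the second byte is not 2.
fn differs_select(tag: u8, payload: u8) -> bool {
    tag != 0 || payload != 2
}

/// Mirrors the quoted `xor sil, 2` / `or sil, dil` / `setne` sequence:
/// `payload ^ 2` is zero only when `payload == 2`, and OR-ing in `tag`
/// keeps the result zero only when `tag == 0` as well.
fn differs_bitwise(tag: u8, payload: u8) -> bool {
    ((payload ^ 2) | tag) != 0
}
```

Both functions agree on every pair of byte values, which is why the whole comparison can collapse into three instructions with no branches or calls.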
So it does! I wonder what has changed 🤔
I'm not sure if this is the right place for this (might be LLVM to blame), just a bit of inefficient code that I noticed.

https://godbolt.org/z/K57orYj5h

The only difference between the two functions `eq` and `matches` is the use of `==` and `matches!` for the enum comparison. The generated code for `eq` includes two calls to `PartialEq` for `Outer`, whereas the code for `matches` has a much simpler (inline) comparison.

Also, the generated code for `PartialEq` seems very inefficient, given the enum is just a two-byte value that can be directly compared.

If you tinker with the enum definitions it's not hard to cause `eq` to optimise exactly like `matches`.
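For readers without access to the godbolt link, the shape of the comparison can be sketched roughly as follows. This is not the original reproducer: the variants of `Inner` and `Outer`, the helper `all_outer`, and the function signatures are all assumptions made for illustration, and this smaller enum is not expected to reproduce the regression (the thread notes that the real reproducer is sensitive to the exact set of variants).

```rust
// Hypothetical fieldless inner enum; the variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Inner {
    A,
    B,
    C,
}

// Hypothetical outer enum nesting the fieldless one.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outer {
    First(Inner),
    Second(Inner),
    Third,
}

/// Compares through the derived `PartialEq` implementation.
fn eq(a: Outer) -> bool {
    a == Outer::First(Inner::C)
}

/// Compares through pattern matching, bypassing `PartialEq` entirely.
fn matches(a: Outer) -> bool {
    matches!(a, Outer::First(Inner::C))
}

/// Every value of the hypothetical `Outer`, for exhaustive checking.
fn all_outer() -> [Outer; 7] {
    [
        Outer::First(Inner::A),
        Outer::First(Inner::B),
        Outer::First(Inner::C),
        Outer::Second(Inner::A),
        Outer::Second(Inner::B),
        Outer::Second(Inner::C),
        Outer::Third,
    ]
}
```

The two functions are semantically identical for every input, so any difference in the generated code comes down to how the derived `PartialEq` is inlined and optimised rather than to the logic itself.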