Match on two-variant enum optimizes poorly #122734


Open
clubby789 opened this issue Mar 19, 2024 · 15 comments
Assignees
Labels
- A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
- I-slow: Issue: Problems and improvements with respect to performance of generated code.
- llvm-fixed-upstream: Issue expected to be fixed by the next major LLVM upgrade, or backported fixes

Comments

@clubby789
Contributor

clubby789 commented Mar 19, 2024

https://godbolt.org/z/f3TMe3rcW

`good` gets optimal codegen since the LLVM 18 bump, but `bad` has had poor codegen for some time. Codegen also becomes optimal if `a` and `b` are references rather than passed by value.
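
The exact `good`/`bad` functions are behind the Compiler Explorer link; as a rough, illustrative sketch only (the enum, function names, and bodies below are assumptions, not the linked code), the shape being discussed is a match over two two-variant enums, passed by value versus by reference:

```rust
// Hypothetical reconstruction for illustration; the real `good`/`bad` are in
// the godbolt link above.
pub enum Two {
    A,
    B,
}

// Passed by value: the case reported to optimize poorly.
#[no_mangle]
pub fn by_value(a: Two, b: Two) -> bool {
    match (a, b) {
        (Two::A, Two::A) | (Two::B, Two::B) => true,
        _ => false,
    }
}

// Passed by reference: reported to get optimal codegen (a single compare).
#[no_mangle]
pub fn by_ref(a: &Two, b: &Two) -> bool {
    match (a, b) {
        (Two::A, Two::A) | (Two::B, Two::B) => true,
        _ => false,
    }
}
```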

@rustbot added the needs-triage label Mar 19, 2024
@clubby789 added the I-heavy, A-LLVM, and I-slow labels and removed the I-heavy and needs-triage labels Mar 19, 2024
@clubby789
Contributor Author

clubby789 commented Mar 19, 2024

Not sure if this is necessarily A-LLVM actually; it looks like we don't give enough information to codegen to improve this

We actually do:

```llvm
%0 = alloca ...
%1 = load i8, ptr %0, align 1, !range <range info>
```

Somewhere in SROA that range information seems to get lost

@erikdesjardins
Contributor

> Codegen also becomes optimal if `a` and `b` are references

Looks like this is because of the range metadata on the load: https://godbolt.org/z/bPfezGhW6.

So this may be fixed after LLVM 19, since we'll be able to put range metadata on the by-value arguments.

(range on arguments has been merged upstream, but optimization support is being implemented right now, so the following isn't optimized yet, but it may be soon: https://godbolt.org/z/17s6jTfMv)

@dianqk
Member

dianqk commented Mar 21, 2024

#122726 should be a similar issue.

@dianqk
Member

dianqk commented Mar 21, 2024

> So this may be fixed after LLVM 19, since we'll be able to put range metadata on the by-value arguments.
>
> (range on arguments has been merged upstream, but optimization support is being implemented right now, so the following isn't optimized yet, but it may be soon: https://godbolt.org/z/17s6jTfMv)

@erikdesjardins Are you adding the relevant support, or has someone already started?

I'm curious why we don't put the range metadata on the non-load instruction.

@dianqk
Member

dianqk commented Mar 21, 2024

Reference: llvm/llvm-project#83171.

@erikdesjardins
Contributor

Yeah, that's the original PR that added support, and there have been a few follow-ups so far to have different passes check for range attributes. Since there's still quite a while until LLVM 19, I imagine that support will be added to every pass or analysis that currently uses range metadata by then.

> I'm curious why we don't put the range metadata on the non-load instruction.

LLVM generally only checks for such metadata on loads and calls, so I think adding it to other instructions won't have much of an effect.

The only usage that allows any kind of instruction appears to be in InstSimplify; it's used only to optimize comparisons that are always true or always false.

@dianqk
Member

dianqk commented Mar 21, 2024

> I'm curious why we don't put the range metadata on the non-load instruction.
>
> LLVM generally only checks for such metadata on loads and calls, so I think adding it to other instructions won't have much of an effect.

IIUC, we could add range metadata to other instructions; we just don't do that now. But when the source value provides range metadata, we can quickly derive the range of arbitrary dependent values, so adding range metadata to other instructions doesn't seem necessary, if that's correct.

@scottmcm
Member

> Somewhere in SROA that range information seems to get lost

I'm pretty sure that's because we store to the alloca then load from it again. So a very early pass replaces the load-store with just using the argument, but in doing so doesn't preserve the !range metadata anywhere.

It'd need to insert an assume of some sort to keep it, and that has enough other issues that LLVM usually doesn't do that, AFAIK.

@dianqk
Member

dianqk commented Mar 21, 2024

> > Somewhere in SROA that range information seems to get lost
>
> I'm pretty sure that's because we store to the alloca then load from it again. So a very early pass replaces the load-store with just using the argument, but in doing so doesn't preserve the !range metadata anywhere.
>
> It'd need to insert an assume of some sort to keep it, and that has enough other issues that LLVM usually doesn't do that, AFAIK.

I've tried it on #122728. I got a huge change in the perf result. 🫠

@scottmcm
Member

> I got a huge change in the perf result. 🫠

I'd been thinking about the possibility of LLVM doing the `!range` → `assume` translation as part of removing the load. But yeah, that may well have the same perf result problem.

@workingjubilee added the C-optimization label Feb 14, 2025
@veera-sivarajan
Contributor

@rustbot claim

veera-sivarajan added a commit to llvm/llvm-project that referenced this issue Mar 12, 2025
Proof: https://alive2.llvm.org/ce/z/U-G7yV

Helps: rust-lang/rust#72646 and rust-lang/rust#122734

Rust compiler's current output: https://godbolt.org/z/7E3fET6Md

IPSCCP can do this transform but it does not help the motivating issue
since it runs only once early in the optimization pipeline.

Reimplementing this in CVP folds the motivating issue into a simple
`icmp eq` instruction.

Fixes #130100
@veera-sivarajan
Contributor

@rustbot label llvm-fixed-upstream

@rustbot added the llvm-fixed-upstream label Mar 12, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Mar 12, 2025
frederik-h pushed a commit to frederik-h/llvm-project that referenced this issue Mar 18, 2025
@scottmcm marked this as a duplicate of #139078 Mar 28, 2025
@scottmcm
Member

`bad` is still poor as of rustc 1.87.0-nightly (a2e63569f 2025-03-26), even though rustc itself adds `nuw`s now.

But with llvm/llvm-project#133344 fixed upstream, after the LLVM 21 version upgrade we should revisit this and see whether we can enable this test:

```rust
// FIXME: This should work too
// // FIXME-CHECK-LABEL: @bool_eq
// #[no_mangle]
// pub fn bool_eq(l: Option<bool>, r: Option<bool>) -> bool {
//     // FIXME-CHECK: start:
//     // FIXME-CHECK-NEXT: icmp eq i8
//     // FIXME-CHECK-NEXT: ret i1
//     l == r
// }
```

@scottmcm marked this as a duplicate of #72646 Mar 28, 2025
@scottmcm
Member

Looks like it'll be enough! With trunk LLVM and the current no-prepopulate-passes IR from rustc, it all collapses: https://llvm.godbolt.org/z/h7TW1qnzT

Still can't go back to the derive, though, since with the derived `eq` (rather than the custom one on `Option`) we still get https://llvm.godbolt.org/z/x151E9j3T, which is improved from current nightly but not yet fully optimized.
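
For context, a rough sketch (not the actual library source; the type and impl below are illustrative stand-ins) of the two shapes of `eq` being compared here, the derive versus a hand-written match of the kind `Option` currently uses:

```rust
// Illustrative only; `MyOption` stands in for the real `Option`, and the
// manual impl below approximates the hand-written comparison being discussed.
#[derive(PartialEq)] // the derived impl: compare discriminants, then payloads
pub enum MyOptionDerived {
    None,
    Some(bool),
}

pub enum MyOption {
    None,
    Some(bool),
}

// Hand-written equivalent of the kind of manual `eq` referred to above.
impl PartialEq for MyOption {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (MyOption::Some(l), MyOption::Some(r)) => l == r,
            (MyOption::None, MyOption::None) => true,
            _ => false,
        }
    }
}
```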

@scottmcm
Member

scottmcm commented Apr 2, 2025

I looked into this some more, and found another LLVM bug: llvm/llvm-project#134024

Looks like it needs at least three variants before it impacts us, so it's not the solution to fixing the derive either :(


EDIT: ...and another one for the >2 variants case, llvm/llvm-project#134028
