Skip to content

Missed optimization: re-checking enum variants after std::mem::replace is not eliminated (somtimes) #142951

@oxalica

Description

@oxalica
Contributor

Minified example: (godbolt)

use std::task::{Poll, Waker};

pub enum State<T> {
    Inactive,
    Active(Waker),
    Signalled(T),
}

#[unsafe(no_mangle)]
pub fn poll_state(st: &mut State<String>, w: &Waker) -> Poll<String> { // a
    match st {
        State::Signalled(_) => {
            // Just checked the variant, take the value out.
            let State::Signalled(v) = std::mem::replace(st, State::Inactive) else {
                unreachable!() // This panic should be eliminated.
            };
            Poll::Ready(v)
        }
        _ => {
            *st = State::Active(w.clone()); // b
            Poll::Pending
        }
    }
}

The optimization is fragile. If we remove the state assignment on the second branch (b), or change function signature to operate on State<u8>, then the unreachable panic in the first branch will be correctly eliminated.

Activity

added
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on Jun 24, 2025
Berrysoft

Berrysoft commented on Jun 24, 2025

@Berrysoft
Contributor

Seems that the panic won't be eliminated if assignment (b) is removed.

ashivaram23

ashivaram23 commented on Jun 24, 2025

@ashivaram23
Contributor

I've come across something similar. I think MIR optimizations don't propagate known enum discriminants across blocks so it's left to LLVM, which might be unable to do more with it. In the case I've seen, LLVM's inability to optimize further could be traced down to how it handles the enum's niche optimization, but it could be something totally different here.

A MIR pass that propagates enum discriminants (through dataflow analysis or just by checking the direct predecessors of each block) would probably fix all these cases.

oxalica

oxalica commented on Jun 25, 2025

@oxalica
ContributorAuthor

known enum discriminants

So is it related to that known value ranges are not propagated through some boundaries (functions, for example)? I also encountered an issue about the discriminant of Option<NonZero<usize>> in #49572 (comment)? Not sure if they are related.

ashivaram23

ashivaram23 commented on Jun 25, 2025

@ashivaram23
Contributor

I'm mainly thinking of propagating across blocks within a single function. Some kind of logic that would allow the first match arm block to consider the panic block unreachable since it can only be reached if (a copy of) st is not Signalled, which can't possibly be true at that point.

The example in the comment you linked should also be fixed by that logic since you set to Some and unwrap within a single block. If the function pointer call had range metadata saying it's not zero, or if it was a separate function with a return value range attribute, then LLVM would probably eliminate the unwrap failure call on its own. But I think the benefit of doing this optimization in MIR is that it wouldn't depend on the frontend giving LLVM the right metadata for all these cases and LLVM managing to understand all the different kinds of niches.

nikic

nikic commented on Jun 25, 2025

@nikic
Contributor

This fails to optimize because LLVM currently fails to track loads through memcpys.

added
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
I-slowIssue: Problems and improvements with respect to performance of generated code.
C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing such
and removed
needs-triageThis issue may need triage. Remove it if it has been sufficiently triaged.
on Jun 25, 2025
ashivaram23

ashivaram23 commented on Jun 29, 2025

@ashivaram23
Contributor

This fails to optimize because LLVM currently fails to track loads through memcpys.

In the very similar case I encountered (second example in #142705 but actually unrelated to the rest of the issue), this wasn't a problem because SROA replaced a load after a memcpy with a load of the original. I think here it doesn't do that because the pointer being memcpy'd into is used later to call Drop. It also was in my case, but the drop function got inlined beforehand and the way the inlined code used the memcpy'd memory is apparently okay for SROA?

If SROA created a new load of the original anyway and used that for the conditional br that's currently failing to be optimized into an unconditional br, and still kept the memcpy around for the later drop, then maybe this optimization could go through (assuming it doesn't hit another obstacle like it did in my case). Would that be reasonable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @nikic@oxalica@Berrysoft@rustbot@ashivaram23

        Issue actions

          Missed optimization: re-checking enum variants after `std::mem::replace` is not eliminated (somtimes) · Issue #142951 · rust-lang/rust