Open
Labels
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
Description
Minified example: (godbolt)
use std::task::{Poll, Waker};

pub enum State<T> {
    Inactive,
    Active(Waker),
    Signalled(T),
}

#[unsafe(no_mangle)]
pub fn poll_state(st: &mut State<String>, w: &Waker) -> Poll<String> { // a
    match st {
        State::Signalled(_) => {
            // Just checked the variant, take the value out.
            let State::Signalled(v) = std::mem::replace(st, State::Inactive) else {
                unreachable!() // This panic should be eliminated.
            };
            Poll::Ready(v)
        }
        _ => {
            *st = State::Active(w.clone()); // b
            Poll::Pending
        }
    }
}
The optimization is fragile. If we remove the state assignment in the second branch (b), or change the function signature (a) to operate on State<u8>, then the unreachable panic in the first branch is correctly eliminated.
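For comparison, the State<u8> variant described above can be written out as follows. This is only a sketch for pasting into a compiler explorer next to the original: the names poll_state_u8 and noop_waker are hypothetical, the helper waker exists only so the function can be exercised, and no claim is made here about the exact assembly either version produces.

```rust
use std::sync::Arc;
use std::task::{Poll, Wake, Waker};

pub enum State<T> {
    Inactive,
    Active(Waker),
    Signalled(T),
}

// Same body as the reported function; only the payload type differs (u8 instead of String).
pub fn poll_state_u8(st: &mut State<u8>, w: &Waker) -> Poll<u8> {
    match st {
        State::Signalled(_) => {
            // The variant was just checked, so the `else` branch can never run.
            let State::Signalled(v) = std::mem::replace(st, State::Inactive) else {
                unreachable!()
            };
            Poll::Ready(v)
        }
        _ => {
            *st = State::Active(w.clone());
            Poll::Pending
        }
    }
}

// Hypothetical helper: a waker that does nothing, used only to call the function.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}
```

Calling poll_state_u8 on a Signalled(7) state returns Poll::Ready(7) and leaves the state Inactive; calling it again returns Poll::Pending and stores the waker.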
Activity
Berrysoft commented on Jun 24, 2025
It seems that the panic won't be eliminated if assignment (b) is removed.
ashivaram23 commented on Jun 24, 2025
I've come across something similar. I think MIR optimizations don't propagate known enum discriminants across blocks so it's left to LLVM, which might be unable to do more with it. In the case I've seen, LLVM's inability to optimize further could be traced down to how it handles the enum's niche optimization, but it could be something totally different here.
A MIR pass that propagates enum discriminants (through dataflow analysis or just by checking the direct predecessors of each block) would probably fix all these cases.
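To make the "check the direct predecessors" idea concrete, here is a toy sketch over a made-up control-flow graph. Nothing in it is rustc's MIR API: Terminator, Block, and fold_known_switches are hypothetical names, places are plain strings, and the only statements modeled are copies that preserve the discriminant. It records the discriminant implied by a switch edge when that edge is the only way into a block, carries the fact through copies, and folds a second switch on a known value into an unconditional jump.

```rust
use std::collections::HashMap;

/// How a toy basic block ends (hypothetical model, not rustc's MIR).
#[derive(Clone, Debug, PartialEq)]
pub enum Terminator {
    /// Branch on the discriminant of `place`: `(value, block)` pairs plus a fallback.
    SwitchDiscr {
        place: &'static str,
        targets: Vec<(u32, usize)>,
        otherwise: usize,
    },
    Goto(usize),
    Return,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    /// `(dst, src)` pairs meaning `dst = copy src`; these keep the discriminant.
    pub copies: Vec<(&'static str, &'static str)>,
    pub term: Terminator,
}

fn successors(term: &Terminator) -> Vec<usize> {
    match term {
        Terminator::SwitchDiscr { targets, otherwise, .. } => {
            let mut out: Vec<usize> = targets.iter().map(|&(_, t)| t).collect();
            out.push(*otherwise);
            out
        }
        Terminator::Goto(t) => vec![*t],
        Terminator::Return => Vec::new(),
    }
}

pub fn fold_known_switches(blocks: &mut [Block]) {
    // Count incoming edges; block 0 is the entry and has one implicit edge.
    let mut pred_edges = vec![0usize; blocks.len()];
    if !blocks.is_empty() {
        pred_edges[0] = 1;
    }
    for block in blocks.iter() {
        for succ in successors(&block.term) {
            pred_edges[succ] += 1;
        }
    }

    // A block entered only through one `value` edge of a switch knows that discriminant.
    let mut entry_fact: Vec<Option<(&'static str, u32)>> = vec![None; blocks.len()];
    for block in blocks.iter() {
        if let Terminator::SwitchDiscr { place, targets, .. } = &block.term {
            for &(value, target) in targets {
                if pred_edges[target] == 1 {
                    entry_fact[target] = Some((*place, value));
                }
            }
        }
    }

    for (block, fact) in blocks.iter_mut().zip(entry_fact) {
        let Some((place, value)) = fact else { continue };
        let mut known: HashMap<&'static str, u32> = HashMap::new();
        known.insert(place, value);
        // Carry the fact through copies; a copy from an unknown source clears the destination.
        for &(dst, src) in &block.copies {
            match known.get(src).copied() {
                Some(v) => {
                    known.insert(dst, v);
                }
                None => {
                    known.remove(dst);
                }
            }
        }
        // If this block switches on a place whose discriminant is known, pick the edge now.
        let folded = match &block.term {
            Terminator::SwitchDiscr { place, targets, otherwise } => known.get(place).map(|v| {
                targets
                    .iter()
                    .find(|&&(d, _)| d == *v)
                    .map_or(*otherwise, |&(_, t)| t)
            }),
            _ => None,
        };
        if let Some(target) = folded {
            block.term = Terminator::Goto(target);
        }
    }
}
```

Modeling the reported function as a switch on st in block 0, followed by a block that copies st into a temporary and switches on the temporary, this sketch rewrites the second switch into a jump that bypasses the panic block. A real pass would also have to handle writes, borrows, and merges of facts from several predecessors, none of which are modeled here.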
oxalica commented on Jun 25, 2025
So is it related to known value ranges not being propagated through some boundaries (functions, for example)? I also encountered an issue about the discriminant of Option<NonZero<usize>> in #49572 (comment). Not sure if they are related.
ashivaram23 commented on Jun 25, 2025
I'm mainly thinking of propagating across blocks within a single function. Some kind of logic that would allow the first match arm block to consider the panic block unreachable, since it can only be reached if (a copy of) st is not Signalled, which can't possibly be true at that point.
The example in the comment you linked should also be fixed by that logic, since you set to Some and unwrap within a single block. If the function pointer call had range metadata saying it's not zero, or if it was a separate function with a return value range attribute, then LLVM would probably eliminate the unwrap failure call on its own. But I think the benefit of doing this optimization in MIR is that it wouldn't depend on the frontend giving LLVM the right metadata for all these cases and LLVM managing to understand all the different kinds of niches.
nikic commented on Jun 25, 2025
This fails to optimize because LLVM currently fails to track loads through memcpys.
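In the reported function, the second check reads the discriminant from the value that std::mem::replace copied out. As a source-level illustration only, the function can be restructured so the discriminant is checked once, on the moved-out value, which leaves no unreachable!() in the source at all. The names poll_state_single_check and noop_waker are hypothetical, and this is not a drop-in equivalent: the state is briefly Inactive while w.clone() runs, and the previous value is dropped after the new one is stored rather than during the assignment.

```rust
use std::sync::Arc;
use std::task::{Poll, Wake, Waker};

pub enum State<T> {
    Inactive,
    Active(Waker),
    Signalled(T),
}

// Sketch: take the old state out first, then branch on it exactly once.
pub fn poll_state_single_check(st: &mut State<String>, w: &Waker) -> Poll<String> {
    match std::mem::replace(st, State::Inactive) {
        State::Signalled(v) => Poll::Ready(v),
        // Bind the old value so it is dropped at the end of this arm.
        _previous => {
            *st = State::Active(w.clone());
            Poll::Pending
        }
    }
}

// Hypothetical helper: a waker that does nothing, used only to call the function.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}
```

Whether this form produces better machine code than the original has not been measured here; it only shows that the redundant second check is a property of how the source is written, not of the logic.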
ashivaram23 commented on Jun 29, 2025
In the very similar case I encountered (second example in #142705 but actually unrelated to the rest of the issue), this wasn't a problem because SROA replaced a load after a memcpy with a load of the original. I think here it doesn't do that because the pointer being memcpy'd into is used later to call Drop. It also was in my case, but the drop function got inlined beforehand and the way the inlined code used the memcpy'd memory is apparently okay for SROA?
If SROA created a new load of the original anyway and used that for the conditional br that's currently failing to be optimized into an unconditional br, and still kept the memcpy around for the later drop, then maybe this optimization could go through (assuming it doesn't hit another obstacle like it did in my case). Would that be reasonable?