Closed
Description
Consider the following functions:
pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| t >= b)
        .unwrap_or(false)
}
pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}
The first pattern is what we often write, and the second is the most efficient, manually unrolled version. Surprisingly, rustc fails to optimize the former into the latter, as you can see in the godbolt listing:
example::unwrap_combinators:
        xor     eax, eax
        cmp     edx, esi
        setle   al
        test    edi, edi
        mov     ecx, 2
        cmovne  ecx, eax
        cmp     cl, 2
        setne   al
        and     al, cl
        ret
example::unwrap_manual:
        test    edi, edi
        setne   cl
        cmp     esi, edx
        setge   al
        and     al, cl
        ret
P.S. Yes, I'm aware of map_or
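For readers who want to confirm that the variants are interchangeable in behavior (the report is only about generated code, not results), here is a minimal sketch. The names unwrap_map_or and variants_agree are hypothetical helpers added for illustration and are not part of the original report; the map_or form is just one plausible way to write what the P.S. alludes to.

```rust
// The two functions from the description, written out again so the
// snippet is self-contained.
pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| t >= b).unwrap_or(false)
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false,
    }
}

// Hypothetical third variant using Option::map_or (assumed to be what the
// P.S. refers to; it does not appear in the original listing).
pub fn unwrap_map_or(a: Option<i32>, b: i32) -> bool {
    a.map_or(false, |t| t >= b)
}

/// Returns true if all three variants agree on a small grid of inputs,
/// including `None` and the extremes of `i32`.
pub fn variants_agree() -> bool {
    let samples = [i32::MIN, -1, 0, 1, i32::MAX];
    let mut options: Vec<Option<i32>> = vec![None];
    options.extend(samples.iter().copied().map(Some));

    options.iter().all(|&a| {
        samples.iter().all(|&b| {
            let expected = unwrap_manual(a, b);
            unwrap_combinators(a, b) == expected && unwrap_map_or(a, b) == expected
        })
    })
}
```

Calling variants_agree() from a test or from main should return true. This says nothing about the assembly each variant produces, which is the actual subject of this issue.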
Metadata
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Area: Memory layout of types
Area: MIR optimizations
Area: MIR inlining
Category: This is a bug.
Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
Issue: Problems and improvements with respect to performance of generated code.
Relevant to the compiler team, which will review and decide on the PR/issue.
Activity
Aaron1011 commented on Jan 30, 2020
It looks like Rust's niche-filling optimization is defeating LLVM's optimizations.
In unwrap_combinators, the temporary Option<bool> uses the niche in bool for the discriminant, resulting in the following layout:
LLVM ends up inlining the calls to map and unwrap_or, but is unable to optimize the particular combination of icmp and select that ends up in the LLVM IR.
This can be seen by changing the type to a tuple of (bool, ()), which inhibits Rust's niche-filling optimization (playground):
which generates the following ASM:
LLVM has all of the information it needs to optimize the original IR - however, it doesn't seem to have an instcombine special-case that would allow it to do so.
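To make the niche-filling point concrete, the following sketch inspects how an Option<bool> is stored. The helper names sizes and raw_byte are hypothetical, and the layout of Option<bool> is not guaranteed by the language, so treat this as an observation about current rustc behavior rather than a contract. On the compilers this was written against, None is expected to show up as the byte 2, which would match the mov ecx, 2 and cmp cl, 2 instructions in the listing from the description.

```rust
use std::mem::{size_of, transmute};

/// Sizes in bytes of `bool` and `Option<bool>`.
/// With niche filling, the Option needs no separate discriminant byte.
pub fn sizes() -> (usize, usize) {
    (size_of::<bool>(), size_of::<Option<bool>>())
}

/// Returns the raw byte used to store an `Option<bool>`.
/// For inspection only: the exact encoding is an implementation detail.
pub fn raw_byte(v: Option<bool>) -> u8 {
    // `transmute` only compiles if both types have the same size, so this
    // line doubles as a compile-time check that Option<bool> is one byte.
    // Every bit pattern is a valid u8, so reading the byte is sound.
    unsafe { transmute::<Option<bool>, u8>(v) }
}
```

For example, raw_byte(Some(false)) and raw_byte(Some(true)) should give 0 and 1, while raw_byte(None) gives a value outside the valid range of bool. That out-of-range value is the niche being used as the discriminant.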
Unfortunately, neither map nor unwrap_or gets MIR-inlined with -Z mir-opt-level=3, because both have too high a computed cost:
The fact that LLVM decides to inline these functions suggests that we might be overly conservative in how we calculate inlining cost.
Hopefully, this situation will be improved by #68528, which specifically calls out unwrap_or as having improved MIR generation.
ecstatic-morse commented on Jan 30, 2020
@Aaron1011 I can confirm that, with #68528, unwrap_or becomes eligible for MIR inlining in unwrap_combinators, and the two functions compile to the same assembly with -Z mir-opt-level=3.
felix91gr commented on Apr 27, 2020
Should this issue be closed now, since it's been solved by #68528? Or, since the fix is still under mir-opt-3 and therefore unstable, is it still worth leaving open? :)
ecstatic-morse commented on Apr 27, 2020
This won't be fixed until MIR inlining becomes more usable and should remain open. I would like to see an "I-slow-fixed-by-MIR-inlining" tag so issues like this and #66234 can be triaged more efficiently.
felix91gr commented on Apr 28, 2020
That makes a lot of sense 🙂
Kobzol commented on Aug 9, 2022
It looks like the code is now optimized properly in recent nightly: https://rust.godbolt.org/z/sMhr83EMo, maybe because MIR inlining is now enabled. Maybe we could add a codegen test for this?
add codegen test for rust-lang#68667
Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic