Skip to content

Rustc fails to optimize a common option usage pattern #68667

Closed
@Pzixel

Description

@Pzixel

Consider following functions

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| t >= b)
     .unwrap_or(false)
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

The first pattern is what we often write and the second one is the most efficient manually unrolled version. Surprisingly rustc fails to optimize the former one into the latter as you can see in godbolt listing:

example::unwrap_combinators:
        xor     eax, eax
        cmp     edx, esi
        setle   al
        test    edi, edi
        mov     ecx, 2
        cmovne  ecx, eax
        cmp     cl, 2
        setne   al
        and     al, cl
        ret

example::unwrap_manual:
        test    edi, edi
        setne   cl
        cmp     esi, edx
        setge   al
        and     al, cl
        ret

P.S. Yes, I'm aware of map_or

Activity

added
C-bugCategory: This is a bug.
I-slowIssue: Problems and improvements with respect to performance of generated code.
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Jan 30, 2020
Aaron1011

Aaron1011 commented on Jan 30, 2020

@Aaron1011
Member

It looks like Rust's niche-filling optimization is defeating LLVM's optimizations.

In unwrap_combinators, the temporary Option<bool> uses the niche in bool for the discriminant, resulting in the following layout:

0 == Some(false)
1 === Some(true)
2 == None

LLVM ends up inlining the calls to map and unwrap_or, but is unable to optimize the particular combination of icmp and select that ends up in the LLVM IR.

This can be seen by changing the type to a tuple of (bool, ()), which inhibits Rust's niche-filling optimization: (playground):

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| (t >= b, ()))
     .unwrap_or((false, ()))
     .0
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

which generates the following ASM:

playground::unwrap_combinators:
	testl	%edi, %edi
	setne	%cl
	cmpl	%esi, %edx
	setle	%al
	andb	%cl, %al
	retq

playground::unwrap_manual:
	testl	%edi, %edi
	setne	%cl
	cmpl	%edx, %esi
	setge	%al
	andb	%cl, %al
	retq

LLVM has all of the information it needs to optimize the original IR - however, it doesn't seem to have an instcombine special-case that would allow it to do so.

Unfortunately, neither map nor unwrap_or gets MIR-inlined with -Z mir-opt-level=3, due to both of them having too high of a computed cost:

[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } } [cost=204 > threshold=100]
[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } } [cost=123 > threshold=100]

The fact that LLVM decides to inline these functions suggets that we might be overly conservative in how we calculating inlining cost.

Hopefully, this situation will be improved by #68528, which specifically calls out unwrap_or as having improved MIR generation.

ecstatic-morse

ecstatic-morse commented on Jan 30, 2020

@ecstatic-morse
Contributor

@Aaron1011 I can confirm that, with #68528, unwrap_or becomes eligible for MIR inlining in unwrap_combinators, and the two functions compile to the same assembly with -Z mir-opt-level=3.

added
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-mir-optArea: MIR optimizations
A-layoutArea: Memory layout of types
on Mar 29, 2020
felix91gr

felix91gr commented on Apr 27, 2020

@felix91gr
Contributor

Should this issue be closed now? Since it's been solved by #68528. Or maybe since it's still under mir-opt-3, and therefore unstable, it's still worth left open? :)

ecstatic-morse

ecstatic-morse commented on Apr 27, 2020

@ecstatic-morse
Contributor

This won't be fixed until MIR inlining becomes more usable and should remain open. I would like to see an "I-slow-fixed-by-MIR-inlining" tag so issues like this and #66234 can be triaged more efficiently.

felix91gr

felix91gr commented on Apr 28, 2020

@felix91gr
Contributor

That makes a lot of sense 🙂

Kobzol

Kobzol commented on Aug 9, 2022

@Kobzol
Contributor

It looks like the code is now optimized properly in recent nightly: https://rust.godbolt.org/z/sMhr83EMo, maybe because of enabled MIR inlining. Maybe we could add a codegen test for this?

added
E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.
on Jul 19, 2023
added 5 commits that reference this issue on Jun 9, 2024

Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic

b7edafe

Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic

a1b4415

Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic

b1761a2

Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic

4936869

Auto merge of rust-lang#125347 - tesuji:needtests, r=nikic

7ac6c2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-layoutArea: Memory layout of typesA-mir-optArea: MIR optimizationsA-mir-opt-inliningArea: MIR inliningC-bugCategory: This is a bug.E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @Aaron1011@jonas-schievink@cjgillot@Kobzol@Pzixel

      Issue actions

        Rustc fails to optimize a common option usage pattern · Issue #68667 · rust-lang/rust