Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rustc fails to optimize a common option usage pattern #68667

Closed
Pzixel opened this issue Jan 30, 2020 · 6 comments · Fixed by #125347
Closed

Rustc fails to optimize a common option usage pattern #68667

Pzixel opened this issue Jan 30, 2020 · 6 comments · Fixed by #125347
Labels
A-layout Area: Memory layout of types A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Pzixel
Copy link

Pzixel commented Jan 30, 2020

Consider following functions

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| t >= b)
     .unwrap_or(false)
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

The first pattern is what we often write and the second one is the most efficient manually unrolled version. Surprisingly rustc fails to optimize the former one into the latter as you can see in godbolt listing:

example::unwrap_combinators:
        xor     eax, eax
        cmp     edx, esi
        setle   al
        test    edi, edi
        mov     ecx, 2
        cmovne  ecx, eax
        cmp     cl, 2
        setne   al
        and     al, cl
        ret

example::unwrap_manual:
        test    edi, edi
        setne   cl
        cmp     esi, edx
        setge   al
        and     al, cl
        ret

P.S. Yes, I'm aware of map_or

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 30, 2020
@Aaron1011
Copy link
Member

Aaron1011 commented Jan 30, 2020

It looks like Rust's niche-filling optimization is defeating LLVM's optimizations.

In unwrap_combinators, the temporary Option<bool> uses the niche in bool for the discriminant, resulting in the following layout:

0 == Some(false)
1 === Some(true)
2 == None

LLVM ends up inlining the calls to map and unwrap_or, but is unable to optimize the particular combination of icmp and select that ends up in the LLVM IR.

This can be seen by changing the type to a tuple of (bool, ()), which inhibits Rust's niche-filling optimization: (playground):

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| (t >= b, ()))
     .unwrap_or((false, ()))
     .0
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

which generates the following ASM:

playground::unwrap_combinators:
	testl	%edi, %edi
	setne	%cl
	cmpl	%esi, %edx
	setle	%al
	andb	%cl, %al
	retq

playground::unwrap_manual:
	testl	%edi, %edi
	setne	%cl
	cmpl	%edx, %esi
	setge	%al
	andb	%cl, %al
	retq

LLVM has all of the information it needs to optimize the original IR - however, it doesn't seem to have an instcombine special-case that would allow it to do so.

Unfortunately, neither map nor unwrap_or gets MIR-inlined with -Z mir-opt-level=3, due to both of them having too high of a computed cost:

[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [closure@slow.rs:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } } [cost=204 > threshold=100]
[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } } [cost=123 > threshold=100]

The fact that LLVM decides to inline these functions suggets that we might be overly conservative in how we calculating inlining cost.

Hopefully, this situation will be improved by #68528, which specifically calls out unwrap_or as having improved MIR generation.

@ecstatic-morse
Copy link
Contributor

@Aaron1011 I can confirm that, with #68528, unwrap_or becomes eligible for MIR inlining in unwrap_combinators, and the two functions compile to the same assembly with -Z mir-opt-level=3.

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-layout Area: Memory layout of types labels Mar 29, 2020
@felix91gr
Copy link
Contributor

Should this issue be closed now? Since it's been solved by #68528. Or maybe since it's still under mir-opt-3, and therefore unstable, it's still worth left open? :)

@ecstatic-morse
Copy link
Contributor

This won't be fixed until MIR inlining becomes more usable and should remain open. I would like to see an "I-slow-fixed-by-MIR-inlining" tag so issues like this and #66234 can be triaged more efficiently.

@ecstatic-morse ecstatic-morse added the A-mir-opt-inlining Area: MIR inlining label Apr 27, 2020
@felix91gr
Copy link
Contributor

That makes a lot of sense 🙂

@Kobzol
Copy link
Contributor

Kobzol commented Aug 9, 2022

It looks like the code is now optimized properly in recent nightly: https://rust.godbolt.org/z/sMhr83EMo, maybe because of enabled MIR inlining. Maybe we could add a codegen test for this?

@cjgillot cjgillot added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Jul 19, 2023
tesuji added a commit to tesuji/rustc that referenced this issue May 20, 2024
tesuji added a commit to tesuji/rustc that referenced this issue May 20, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 8, 2024
tesuji added a commit to tesuji/rustc that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 9, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 10, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 11, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 13, 2024
@bors bors closed this as completed in 7ac6c2f Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: Memory layout of types A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-inlining Area: MIR inlining C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants