Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed optimization: extra branch with slicing and integer overflow checks #58692

Closed
alex opened this issue Feb 24, 2019 · 13 comments · Fixed by #72547
Closed

Missed optimization: extra branch with slicing and integer overflow checks #58692

alex opened this issue Feb 24, 2019 · 13 comments · Fixed by #72547
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. WG-llvm Working group: LLVM backend code generation

Comments

@alex
Copy link
Member

alex commented Feb 24, 2019

Godbolt: https://rust.godbolt.org/z/PXY1lr

Basically the issue is that you compute s.position + n, check that it doesn't overflow, and then you verify that s.position + n is greater than s.position, which it of course always is, since you checked for an overflow! This is probably an LLVM optimization bug, but I'm starting here :-)

Command line arguments: -C opt-level=3 -C debug-assertions

Source:

pub struct S1<'a> {
    data: &'a [u8],
    position: usize,
}

pub fn f1<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
    let d = &s.data[s.position..s.position+n];
    s.position += n;
    return d;
}

ASM:

example::f1:
        push    rax
        mov     rax, qword ptr [rdi + 16]
        add     rsi, rax
        jb      .LBB0_4
        mov     rdx, rsi
        sub     rdx, rax
        jb      .LBB0_5
        mov     rcx, qword ptr [rdi + 8]
        cmp     rcx, rsi
        jb      .LBB0_6
        add     rax, qword ptr [rdi]
        mov     qword ptr [rdi + 16], rsi
        pop     rcx
        ret
.LBB0_4:
        lea     rdi, [rip + .L__unnamed_1]
        call    qword ptr [rip + _ZN4core9panicking5panic17h8f422ca74d186618E@GOTPCREL]
        ud2
.LBB0_5:
        mov     rdi, rax
        call    qword ptr [rip + _ZN4core5slice22slice_index_order_fail17h7ab7c9c6113ae4b8E@GOTPCREL]
        ud2
.LBB0_6:
        mov     rdi, rsi
        mov     rsi, rcx
        call    qword ptr [rip + _ZN4core5slice20slice_index_len_fail17h3a0f2c4c66447e54E@GOTPCREL]
        ud2

str.0:
        .ascii  "/tmp/compiler-explorer-compiler119124-62-359u5j.hrbs4/example.rs"

str.1:
        .ascii  "attempt to add with overflow"

.L__unnamed_1:
        .quad   str.1
        .quad   28
        .quad   str.0
        .quad   64
        .long   8
        .long   33
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Feb 24, 2019
@nikic
Copy link
Contributor

nikic commented Feb 24, 2019

Slightly reduced testcase:

define i64 @test(i64* %p, i64 %n) nounwind {
  %x = load i64, i64* %p
  %y = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 %n)
  %z = extractvalue { i64, i1 } %y, 1
  br i1 %z, label %bb3, label %bb1

bb1:
  %a = extractvalue { i64, i1 } %y, 0
  %b = icmp ult i64 %a, %x
  br i1 %b, label %bb3, label %bb2

bb2:
  store i64 %a, i64* %p
  ret i64 0

bb3:
  ret i64 1
}

declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

LLVM does not realize that %z and %b are the same here.

@alex
Copy link
Member Author

alex commented Feb 24, 2019

Ok, looks like the thing to do is file an LLVM bug -- do you know off hand if they're ok with Rust reproducers, or do they prefer C ones?

@alex
Copy link
Member Author

alex commented Feb 24, 2019

Ok, ended up converting to C because that seemed easy: https://bugs.llvm.org/show_bug.cgi?id=40845

@nikic
Copy link
Contributor

nikic commented Feb 24, 2019

Submitted https://bugs.llvm.org/show_bug.cgi?id=40846 additionally for the IR pattern above (which is an issue separate from the missing add.with.overflow+sub.with.overflow combining). Just verified that fixing this does indeed eliminate the slice_index_order_fail check in this example.

@scottmcm scottmcm added the WG-llvm Working group: LLVM backend code generation label Feb 26, 2019
@alex
Copy link
Member Author

alex commented Nov 3, 2019

@nikic am I correct in reading that this has been waiting on a review of the LLVM patch?

@nikic
Copy link
Contributor

nikic commented Nov 3, 2019

@alex Needs a rebase. More generally though, I think this is fighting a losing battle. uadd.with.overflow and usub.with.overflow are not considered canonical IR for a reason: They optimize badly in the middle end.

I think we'd be better off with generating the canonical variants instead (which are a + b; (a + b) < a and a - b; a < b). LLVM will still form uadd.with.overflow/usub.with.overflow during CGP, but this should optimize better in the middle end (including avoiding the issue here).

As we already have a good benchmark for the overhead of overflow checks in #58475, I think it might be worth trying to rerun that without using the unsigned overflow intrinsics.

@comex
Copy link
Contributor

comex commented Dec 11, 2019

From the second LLVM bug above:

Fixed in llvm/llvm-project@8db5143.

I'm keeping this issue open, because the negated variation of this is not handled yet.

🎉

@alex
Copy link
Member Author

alex commented Dec 12, 2019

🎉 once this makes it's way into Rust's LLVM (is there a good way to track this?) I'll verify that the assembly in this bug was improved, and if so I'll rebase the rustc patch and we can get some new numbers!

@alex
Copy link
Member Author

alex commented May 22, 2020

Ok, confirmed that the recent LLVM10 update removes this branch: https://rust.godbolt.org/z/B38vDX

Shall this be closed?

@Mark-Simulacrum
Copy link
Member

Actually, that was probably premature. We should add a codegen test.

@Mark-Simulacrum Mark-Simulacrum added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label May 24, 2020
@alex
Copy link
Member Author

alex commented May 24, 2020

@Mark-Simulacrum is there a good example of what a test looks like? I can spin one up pretty easily if someone can point me at how to add one.

@tmiasko
Copy link
Contributor

tmiasko commented May 24, 2020

@alex Codegen tests are located in src/test/codegen. One test that is thematically related: slice-position-bounds-check.rs. They use LLVM FileCheck to verify generated LLVM IR.

Codegen tests can be run with:

./x.py test --stage 1 src/test/codegen

@alex
Copy link
Member Author

alex commented May 24, 2020

Thanks much, that's exactly what I needed! PR is up now.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 27, 2020
Added a codegen test for a recent optimization for overflow-checks=on

Closes rust-lang#58692
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 27, 2020
Added a codegen test for a recent optimization for overflow-checks=on

Closes rust-lang#58692
JohnTitor added a commit to JohnTitor/rust that referenced this issue May 28, 2020
Added a codegen test for a recent optimization for overflow-checks=on

Closes rust-lang#58692
JohnTitor added a commit to JohnTitor/rust that referenced this issue May 29, 2020
Added a codegen test for a recent optimization for overflow-checks=on

Closes rust-lang#58692
@bors bors closed this as completed in cd5f228 May 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants