Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vec::reserve(n) followed by n calls to push should only check capacity once #105156

Open
jhorstmann opened this issue Dec 1, 2022 · 4 comments
Open
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jhorstmann
Copy link
Contributor

I tried this code in compiler explorer (https://rust.godbolt.org/z/W35vT4osM):

pub fn push1(v: &mut Vec<u32>, x: u32) {
    v.reserve(1);
    v.push(x);
}

pub fn push2(v: &mut Vec<u32>, x: u32) {
    v.reserve(2);
    v.push(x);
    v.push(x);
}

I expected the generated assembly to only check the capacity once, and reallocate if needed, and then write the value. Instead, for each push there is another check of remaining capacity and potential call to reserve_for_push.

Adding explicit assume calls leads to much shorter code with only a single call to do_reserve_and_handle.

pub fn push2_assume(v: &mut Vec<u32>, x: u32) {
    v.reserve(2);

    unsafe {
        core::intrinsics::assume(v.len() < v.capacity());
    }
    v.push(x);

    unsafe {
        core::intrinsics::assume(v.len() < v.capacity());
    }
    v.push(x);
}

This seems similar to #82801 but that issue seems specific to the initial vector capacity and code for that case looks ok now.

Meta

rustc --version --verbose:

rustc 1.67.0-nightly (e0098a5cc 2022-11-29)
binary: rustc
commit-hash: e0098a5cc3a87d857e597af824d0ce1ed1ad85e0
commit-date: 2022-11-29
host: x86_64-unknown-linux-gnu
release: 1.67.0-nightly
LLVM version: 15.0.4
Compiler returned: 0
@jhorstmann jhorstmann added the C-bug Category: This is a bug. label Dec 1, 2022
@matthiaskrgr matthiaskrgr added I-slow Issue: Problems and improvements with respect to performance of generated code. and removed C-bug Category: This is a bug. labels Dec 2, 2022
@estebank estebank added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 21, 2023
@SUPERCILEX
Copy link
Contributor

Just ran into this too. Here's the generalization:

pub fn insert(v: &mut Vec<usize>, n: usize) {
    v.reserve(n);
    for i in 0..n {
        v.push(i);
    }
}

pub fn create(n: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        v.push(i);
    }
    v
}

@SUPERCILEX
Copy link
Contributor

Fixed with this:

pub fn insert(v: &mut Vec<usize>, n: usize) {
    v.reserve(n);
    for i in 0..n {
        if v.len() >= v.capacity() {
            unsafe { std::hint::unreachable_unchecked() }
        }
        v.push(i);
    }
}

pub fn create(n: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        if v.len() >= v.capacity() {
            unsafe { std::hint::unreachable_unchecked() }
        }
        v.push(i);
    }
    v
}

So should be handled by #82801

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 19, 2024
[Experiment] Eliminate possible `Vec::push` branches

Related to rust-lang#105156. Requesting a perf run.

```rust
pub fn push(v: &mut Vec<u8>) {
    let _ = v.reserve(4);
    v.push(1);
    v.push(2);
    v.push(3);
    v.push(4);
}
```

AFAICT, the codegen backend should infer the infallibility of these `push`s but unfortunately with LLVM 18 we still have unnecessary `reserve_for_push` branches.

For the sake of curiosity, `assert_unchecked` was included in `push` to see any potential impact of such change. Take a look at the generated assembly at https://godbolt.org/z/b5jjPhsf8.

AFAICT (again), the assumption of more available capacity for each `push` is not valid for all situations.
@c410-f3r
Copy link
Contributor

c410-f3r commented Feb 20, 2024

An example that eliminates branches: https://godbolt.org/z/TTEv73o44

Change v.allocate(4); to any number less than 4 to see the compiler insert guards accordingly.

For whatever reason, the trick is in the use of unchecked_add in allocate as hinted at #121300 (comment).

@cynecx
Copy link
Contributor

cynecx commented Feb 20, 2024

I'd like to leave some more observations I've noticed while playing around with this: https://gist.github.com/cynecx/24e3a4ae8756fdf9257a6cbf50005e08

tl;dr: A single assume in Vec's reserve method should've been enough to actually optimize out all further bound checks. But the big caveat here is that llvm isn't actually able to apply these optimizations with its default pipeline of passes. The constraint elimination pass must be run several times interleaved with other optimizations. This could be a limitation in the constraint elimination pass itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants