Iterating with step_by(1) is much slower than without #57517

mvlabat · 2019-01-11T16:01:18Z

Greetings!

I'd like to report that I get some significant (negative) performance impact when calling step_by(1) on an iterator. I created a repository with a detailed description of the issue and benchmarks to reproduce it: https://github.com/mvlabat/step_by_one

These are the functions I tested:

pub fn iter_default_step(mut arr: Vec<i32>) -> Vec<i32> {
    for e in arr.iter_mut() {
        *e = e.wrapping_add(3);
    }
    arr
}

pub fn iter_step(mut arr: Vec<i32>, iter_step: usize) -> Vec<i32> {
    for e in arr.iter_mut().step_by(iter_step) {
        *e = e.wrapping_add(3);
    }
    arr
}

Calling iter_step(vec![1; LARGE_ENOUGH], 1) computes 1.75x slower than iter_default_step(vec![1; LARGE_ENOUGH])
with const LARGE_ENOUGH: usize = 10_000_000;.

I'm running Macbook Pro 2015 with Intel i5-5257U CPU (2.70GHz).

My rustc version: 1.33.0-nightly (c2d381d39 2019-01-10).

These are the exact benchmark results I got:

running 4 tests
test tests::bench_iter_default_step ... bench:   8,702,963 ns/iter (+/- 1,648,782)
test tests::bench_iter_step_1       ... bench:  15,267,083 ns/iter (+/- 1,236,220)
test tests::bench_iter_step_16      ... bench:   9,053,772 ns/iter (+/- 380,422)
test tests::bench_iter_step_64      ... bench:   8,711,169 ns/iter (+/- 327,562)

In the repository README there are also links to the generated asm code of these two functions.

The text was updated successfully, but these errors were encountered:

nikic · 2019-01-11T22:32:16Z

Godbolt link: https://rust.godbolt.org/z/scU_bY

I've slightly changed the code to actually modify the array (wrapping_add returns the value -- it does not operate in-place).

One bit that stands out especially to me is this:

bb4: ; preds = %bb8, %start
  %iter.sroa.0.0 = phi i64 [ %1, %start ], [ %iter.sroa.0.1, %bb8 ]
  %iter.sroa.13.0 = phi i1 [ false, %start ], [ true, %bb8 ]
  br i1 %iter.sroa.13.0, label %bb2.i, label %bb1.i

I would have expected something (possibly jump threading?) to optimize away the %iter.sroa.13.0 phi.

mvlabat · 2019-01-11T22:34:36Z

Yeah, I forgot to use the returned addition value. Changed that in my code already too )

scottmcm · 2019-01-12T01:12:53Z

I wonder if this would be fixed with the change discussed in #52065. Also, try using .for_each() instead of a for loop; this might just be one of the many places where internal iteration is just faster.

nikic · 2019-09-09T20:14:57Z

I don't think #64121 fixes this issue, as it affects internal iteration only, while this one is about external iteration.

timvermeulen · 2020-02-16T12:11:26Z

@nikic You're right, but now it's just a matter of "for_each is faster than a for loop for step_by" which is not at all surprising if you consider that StepBy::next branches based on the first_take field. It's not something we're able to fix in the general case.

mlindner · 2020-05-18T17:26:14Z

There's another example of this coming up here in this thread: https://www.reddit.com/r/learnrust/comments/glxa5r/rust_vs_cpp_speed_in_implementation/

Rust vs C++ example where C++ takes 600-700ms and Rust takes 4300ms.

the8472 · 2023-05-23T07:42:36Z

Looks like it's fixed on beta and nightly. It hasn't reached stable yet.

https://rust.godbolt.org/z/6qjvoE719

nikic · 2023-05-23T16:09:57Z

I believe step_by was one of the iterators with improved optimization in LLVM 16. I thought I added a codegen test for that, but apparently not.

We should add a test for this, but probably not the one from OP, but something with simpler IR.

nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jan 11, 2019

nikic mentioned this issue Mar 25, 2019

Iter with step_by(2) performs slowly #59281

Closed

lambda mentioned this issue May 7, 2019

Useless while loop with a step of 2 doesn't optimize away #60616

Closed

timvermeulen mentioned this issue Sep 3, 2019

Override StepBy::{try_fold, try_rfold} #64121

Merged

bors closed this as completed in 7e98ec5 Sep 9, 2019

nikic reopened this Sep 9, 2019

nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label May 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterating with step_by(1) is much slower than without #57517

Iterating with step_by(1) is much slower than without #57517

mvlabat commented Jan 11, 2019 •

edited

Loading

nikic commented Jan 11, 2019

mvlabat commented Jan 11, 2019

scottmcm commented Jan 12, 2019

nikic commented Sep 9, 2019

timvermeulen commented Feb 16, 2020

mlindner commented May 18, 2020 •

edited

Loading

the8472 commented May 23, 2023

nikic commented May 23, 2023

Iterating with step_by(1) is much slower than without #57517

Iterating with step_by(1) is much slower than without #57517

Comments

mvlabat commented Jan 11, 2019 • edited Loading

nikic commented Jan 11, 2019

mvlabat commented Jan 11, 2019

scottmcm commented Jan 12, 2019

nikic commented Sep 9, 2019

timvermeulen commented Feb 16, 2020

mlindner commented May 18, 2020 • edited Loading

the8472 commented May 23, 2023

nikic commented May 23, 2023

mvlabat commented Jan 11, 2019 •

edited

Loading

mlindner commented May 18, 2020 •

edited

Loading