RangeInclusive internal iteration performance improvement. #58122

matthieu-m · 2019-02-03T16:24:22Z

Specialize Iterator::try_fold and DoubleEndedIterator::try_rfold to improve code generation in all internal iteration scenarios.

This change brings the performance of internal iteration with RangeInclusive on par with that of iteration with Range:

- Single conditional jump in the hot loop,
- Unrolling and vectorization,
- And even Closed Form substitution.
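To illustrate what specializing `try_fold` buys, here is a simplified sketch on a hypothetical `InclRange` over `u8`. The type, its `exhausted` field, and the `try_fold_sketch` method are assumptions for illustration, not the libcore code in this PR, and it uses `Result` rather than the unstable `Try` trait so that it compiles on stable Rust. The hot loop tests only `start < end`, as `Range` would, and the final element is handled once after the loop, so the increment can never overflow.

```rust
/// Hypothetical stand-in for `RangeInclusive<u8>` (names are assumptions).
pub struct InclRange {
    start: u8,
    end: u8,
    exhausted: bool,
}

impl InclRange {
    pub fn new(start: u8, end: u8) -> Self {
        InclRange { start, end, exhausted: false }
    }

    /// Empty once exhausted, or if the bounds are inverted.
    pub fn is_empty(&self) -> bool {
        self.exhausted || self.start > self.end
    }

    /// Folds over the remaining elements, stopping at the first `Err`.
    /// Elements already passed to `f` are consumed, so a later call
    /// resumes after the element that caused the early exit.
    pub fn try_fold_sketch<B, E, F>(&mut self, init: B, mut f: F) -> Result<B, E>
    where
        F: FnMut(B, u8) -> Result<B, E>,
    {
        if self.is_empty() {
            return Ok(init);
        }
        let mut acc = init;
        // Hot loop: a single comparison per element, as with `Range`.
        while self.start < self.end {
            let n = self.start;
            self.start += 1; // cannot overflow because start < end
            acc = f(acc, n)?;
        }
        // Here start == end. Mark the range exhausted *before* yielding the
        // last element, so an early exit on it still leaves the range empty.
        self.exhausted = true;
        f(acc, self.start)
    }
}
```

Keeping the `start == end` case out of the loop leaves the loop with one exit condition; the cost is a little extra code after it.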

Unfortunately, this only applies to internal iteration. Despite various attempts at streamlining the implementation of next and next_back, LLVM has stubbornly refused to optimize external iteration appropriately, leaving me with a choice between:

- The current implementation, for which Closed Form substitution is performed, but which uses 2 conditional jumps in the hot loop when optimization fails.
- An implementation using an is_done boolean, which uses 1 conditional jump in the hot loop when optimization fails, allowing unrolling and vectorization, but for which Closed Form substitution fails.
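For comparison, the `is_done` alternative for external iteration might look roughly like this, again on a hypothetical `DoneFlagRange` over `u8` (type and field names are assumptions). It only illustrates the shape of the state machine: the last element sets a flag instead of stepping past `end`. It makes no claim about the generated code; how many conditional jumps LLVM emits depends on the compiler version and the surrounding loop.

```rust
/// Hypothetical inclusive range over `u8` that tracks exhaustion with a flag.
pub struct DoneFlagRange {
    start: u8,
    end: u8,
    is_done: bool,
}

impl DoneFlagRange {
    pub fn new(start: u8, end: u8) -> Self {
        // An inverted range (start > end) is done from the outset.
        DoneFlagRange { start, end, is_done: start > end }
    }
}

impl Iterator for DoneFlagRange {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        if self.is_done {
            return None;
        }
        let n = self.start;
        if n == self.end {
            // Last element: set the flag instead of stepping past `end`,
            // which would overflow when `end == u8::MAX`.
            self.is_done = true;
        } else {
            self.start += 1;
        }
        Some(n)
    }
}
```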

In the absence of any conclusive evidence as to which use case matters most, and with no assurance that the missing Closed Form substitution is not a sign that other optimizations are being foiled, there is no principled way to pick one implementation over the other, so I defer to the status quo as far as next and next_back are concerned.

rust-highfive · 2019-02-03T16:24:32Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

matthieu-m · 2019-02-03T16:25:15Z

Take 2 on #57378.

src/libcore/iter/range.rs

matthieu-m · 2019-02-10T14:04:19Z

@dtolnay: I've addressed the concerns raised by scottmcm, so this should be ready for review.

dtolnay

The implementation looks good to me, but since this is intended to be a performance improvement only, do you have a benchmark that other people could run to confirm the improvement?

matthieu-m · 2019-02-15T18:30:51Z

During development, I used the following benchmark: https://gist.github.com/matthieu-m/3a9ab9afc4eee80565c8d95f94db6fa4, where FixedRangeInclusive is a copy of the current implementation with the performance improvements of this PR applied.

On my machine, I get:

| Benchmark | Exclusive | Chain | Current | Proposed (change vs Current) |
|---|---|---|---|---|
| sum | 1.1315 ns | 4.8270 ns | 972.33 ps | 1.5776 ns (+62%) |
| triangle foreach | 1.1667 ns | 1.7881 ns | 978.75 ps | 1.5791 ns (+61%) |
| triangle loop | 1.1299 ns | 1.1643 ms | 968.94 ps | 975.43 ps (-0.34%) |
| addmul foreach | 1.1611 ms | 1.1633 ms | 1.1799 ms | 1.1644 ms (-1.3%) |
| addmul loop | 1.1689 ms | 1.1785 ms | 1.1658 ms | 1.1638 ms (-0.17%) |
| triples | 960.00 µs | 889.76 µs | 1.8435 ms | 968.05 µs (-47%) |

We can see that:

- The sum and triangle foreach benchmarks are transformed into a closed form in every variant, but not always into the same closed form.
- The triangle loop and addmul loop benchmarks use the same next() implementation in Current and Proposed, so they mostly illustrate the imprecision of such measurements.
- The addmul foreach benchmark is simple enough that LLVM was already optimizing it well, so the small improvement is within the noise.
- The triples benchmark, which originally prompted this work, shows a 2x improvement.
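The gist is not reproduced here, but the foreach/loop distinction in the benchmark names can be sketched as follows. The function names are hypothetical and the bodies are guesses at the benchmark shapes, not the gist's code. `for_each` lets the range drive the iteration itself (internal iteration, the path this PR targets), whereas a `for` loop calls `next()` on every step (external iteration). `triangle_closed_form` is the formula n(n+1)/2 that a Closed Form substitution would compute in place of the loop.

```rust
/// Internal iteration: the range drives the loop via `for_each`.
pub fn triangle_foreach(n: u64) -> u64 {
    let mut total = 0;
    (1..=n).for_each(|i| total += i);
    total
}

/// External iteration: the `for` loop calls `next()` on each step,
/// so a `try_fold` specialization does not apply here.
pub fn triangle_loop(n: u64) -> u64 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
    }
    total
}

/// The closed form an optimizer may substitute for either loop.
pub fn triangle_closed_form(n: u64) -> u64 {
    n * (n + 1) / 2
}
```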

Overall, the new RangeInclusive implementation performs on par with Range (modulo closed form), and the performance cliff on triples is smoothed out.

dtolnay

Thanks!

dtolnay · 2019-02-15T18:59:20Z

@bors r+

bors · 2019-02-15T18:59:21Z

📌 Commit 4fed67f has been approved by dtolnay

bors · 2019-02-15T19:48:31Z

⌛ Testing commit 4fed67f with merge 74beb94d93bb043326dc2b1c80d53bbc054031d1...

bors · 2019-02-15T19:52:08Z

💔 Test failed - checks-travis

rust-highfive · 2019-02-15T19:52:08Z

Your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

pietroalbini · 2019-02-15T20:22:59Z

@bors retry -- apparently we can't clone the repo anymore on macOS

@ghost

Rollup of 16 pull requests

Successful merges:

- #58100 (Transition librustdoc to Rust 2018)
- #58122 (RangeInclusive internal iteration performance improvement.)
- #58199 (Add better error message for partial move)
- #58227 (Updated RELEASES.md for 1.33.0)
- #58353 (Check the Self-type of inherent associated constants)
- #58453 (SGX target: fix panic = abort)
- #58476 (Remove `LazyTokenStream`.)
- #58526 (Special suggestion for illegal unicode curly quote pairs)
- #58595 (Turn duration consts into associated consts)
- #58609 (Allow Self::Module to be mutated.)
- #58628 (Optimise vec![false; N] to zero-alloc)
- #58643 (Don't generate minification variables if minification disabled)
- #58648 (Update tests to account for cross-platform testing and miri.)
- #58654 (Do not underflow after resetting unmatched braces count)
- #58658 (Add expected/provided byte alignments to validation error message)
- #58667 (Reduce Miri-related Code Repetition `like (n << amt) >> amt`)

Failed merges:

r? @ghost

rust-highfive assigned dtolnay Feb 3, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 3, 2019

scottmcm requested changes Feb 3, 2019

src/libcore/iter/range.rs

scottmcm mentioned this pull request Feb 4, 2019

What degree of panic safety is expected for iterator adapters? #58170

Closed

dtolnay added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 6, 2019

Fix exhaustion of inclusive range try_fold and try_rfold

4fed67f
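The exhaustion fix concerns the case where the closure short-circuits on the range's final element: that element has been consumed, so the range must report itself as empty afterwards. The sketch below checks this on the current standard library using `Iterator::try_fold` with `Option` as the short-circuiting type. `sum_until` is a hypothetical helper, and the assertions describe present-day std behavior rather than the exact 2019 code.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: sums the range, bailing out with `None`
/// as soon as `stop` is encountered (that element is still consumed).
fn sum_until(r: &mut RangeInclusive<u32>, stop: u32) -> Option<u32> {
    r.try_fold(0u32, |acc, x| if x == stop { None } else { Some(acc + x) })
}
```

An early exit in the middle leaves the rest of the range available to `next()`, while an early exit on the last element leaves the range exhausted.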

dtolnay added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 11, 2019

dtolnay reviewed Feb 14, 2019

dtolnay approved these changes Feb 15, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2019

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 15, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2019

Centril mentioned this pull request Feb 20, 2019

Rollup of 12 pull requests #58594

Closed

Centril mentioned this pull request Feb 23, 2019

Rollup of 16 pull requests #58664

Closed

Centril mentioned this pull request Feb 23, 2019

Rollup of 16 pull requests #58665

Closed

Centril mentioned this pull request Feb 23, 2019

Rollup of 15 pull requests #58666

Closed

Centril mentioned this pull request Feb 23, 2019

Rollup of 16 pull requests #58668

Closed

Centril mentioned this pull request Feb 23, 2019

Rollup of 16 pull requests #58669

Merged

bors merged commit 4fed67f into rust-lang:master Feb 23, 2019

matthieu-m deleted the range_incl_perf branch February 24, 2019 09:21

frol mentioned this pull request Feb 28, 2019

Big performance problem with closed intervals looping #45222

Open
