Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::array_chunks() #92393

Closed
wants to merge 2 commits into from

Conversation

rossmacarthur
Copy link
Contributor

@rossmacarthur rossmacarthur commented Dec 29, 2021

This has been similarly implemented as .tuples() in itertools as Itertools::tuples(). But it makes more sense with arrays since all elements are the same type.

I will update stability attributes with a tracking issue if accepted.

See also #92394

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 29, 2021
@the8472
Copy link
Member

the8472 commented Dec 29, 2021

See #87776 for a previous PR attempting that, some of the discussion might be relevant.

@rossmacarthur
Copy link
Contributor Author

@the8472 Thanks, I will check it out. Compared the benchmarks and that implementation is definitely a lot more performant 😅. I will look into improving this one using some ideas from that one.

@rossmacarthur
Copy link
Contributor Author

rossmacarthur commented Jan 2, 2022

Okay so I have basically taken #87776 and improved it, most notably the following:

  • Handle remainder correctly in DoubleEndedIterator implementation by requiring ExactSizeIterator implementation to remove the correct amount elements. It behaves like this now which I believe should be the correct behavior.
    let mut it = (0..5).array_chunks();
    assert_eq!(it.next_back(), Some([2, 3])):
    assert_eq!(it.next_back(), Some([0, 1])):
    assert_eq!(it.remainder(), &[5)):
  • Fill array in reverse in the DoubleEndedIterator implementation, this improves performance over the previous PR.

Using the benchmark functions the performance is now the following

test iter::bench_iter_array_chunks_fold                         ... bench:     749,479 ns/iter (+/- 91,429)
test iter::bench_iter_array_chunks_loop                         ... bench:     751,425 ns/iter (+/- 74,197)
test iter::bench_iter_array_chunks_rev_loop                     ... bench:     751,228 ns/iter (+/- 74,905)
test iter::bench_iter_array_chunks_rfold                        ... bench:     697,941 ns/iter (+/- 61,837)

Compared to previously

test iter::bench_iter_array_chunks_fold                         ... bench:     688,546 ns/iter (+/- 50,318)
test iter::bench_iter_array_chunks_loop                         ... bench:   1,039,287 ns/iter (+/- 40,049)
test iter::bench_iter_array_chunks_rev_loop                     ... bench:   1,202,417 ns/iter (+/- 79,646)
test iter::bench_iter_array_chunks_rfold                        ... bench:     850,933 ns/iter (+/- 20,082)


/// Returns a reference to the remaining elements of the original iterator
/// that are not going to be returned by this iterator. The returned slice
/// has at most `N-1` elements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to mention that it is only useful after next() is None ?

Copy link
Member

@the8472 the8472 Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Nevermind, my comment was nonsense, I forgot that this doesn't have access to the underlying source.

Copy link
Contributor Author

@rossmacarthur rossmacarthur Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to mention that it is only useful after next() is None ?

Yeah, this is a good point. I was thinking it might also be useful for this to return an Option so that it easy to tell the difference between an empty remainder and a not yet known remainder. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably suggest a single fn into_remainder(self) -> Option<array::IntoIter> method, and that'll also let us get rid of the custom remainder struct entirely I think? (We can directly store the Option<array::IntoIter> as the second field).

library/core/src/iter/traits/iterator.rs Outdated Show resolved Hide resolved

/// Returns a reference to the remaining elements of the original iterator
/// that are not going to be returned by this iterator. The returned slice
/// has at most `N-1` elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably suggest a single fn into_remainder(self) -> Option<array::IntoIter> method, and that'll also let us get rid of the custom remainder struct entirely I think? (We can directly store the Option<array::IntoIter> as the second field).

struct FrontGuard<T, const N: usize> {
/// A pointer to the array that is being filled. We need to use a raw
/// pointer here because of the lifetime issues in the fold implementations.
ptr: *mut T,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could store the array directly inside the guard?

}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> DoubleEndedIterator for ArrayChunks<I, N>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may make sense to not have double ended iterators? It feels a little unclear that the semantics of cutting tail elements are the ones users will expect. Do we have a use case for backwards chunk iteration?

It seems plausible that we could expose a dedicated .skip_remainder() -> ArrayChunksRev<I, N> or similar, perhaps, to more explicitly skip it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think because .rev().array_chunks() is different to .array_chunks().rev() it is worth having. It is also worth noting that the following APIs have a DoubleEndedIterator implementation that behave in this way and cut the tail elements.

However I do realize since iterators are often used in a consuming way this might be strange. Also it does require ExactSizeIterator bound. Additionally, I haven't personally found a use case for this so I am happy to remove this implementation.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2022
// SAFETY: `array` will still be valid if `guard` is dropped.
let mut guard = unsafe { FrontGuard::new(&mut array) };

for slot in array.iter_mut() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: for any next method where there's a loop, consider just using self.try_for_each(ControlFlow::Break).break_value() instead. That's simpler code, less unsafe, gives better coverage of your try_fold override, and should be just as fast -- or even better for inner iterators that have their own try_fold overrides.

let (lower, upper) = self.iter.size_hint();
// Keep infinite iterator size hint lower bound as `usize::MAX`. This
// is required to implement `TrustedLen`.
if lower == usize::MAX {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. The lower bound can be usize::MAX without being infinite -- take .repeat(x).take(usize::MAX), for example, or just [(); usize::MAX].into_iter(). Nor is (usize::MAX, None) certain to be infinite -- especially on 16-bit platforms.

Adapters that make things shorter can propagate ExactSizeIterator, but not TrustedLen. (And, conversely but not relevant here, adapters that makes things longer, like Chain, can propagate TrustedLen but not ExactSizeIterator.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks, so you are saying we can't implement TrustedLen for this at all because it makes the iterator shorter? I think I got a little confused by this explanation in the TrustedLen documentation. It might need to be extended a little.

/// The iterator reports a size hint where it is either exact
/// (lower bound is equal to upper bound), or the upper bound is [`None`].
/// The upper bound must only be [`None`] if the actual iterator length is
/// larger than [`usize::MAX`]. In that case, the lower bound must be
/// [`usize::MAX`], resulting in an [`Iterator::size_hint()`] of
/// `(usize::MAX, None)`.
///
/// The iterator must produce exactly the number of elements it reported
/// or diverge before reaching the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks, so you are saying we can't implement TrustedLen for this at all because it makes the iterator shorter?

That's correct.

For a specific example, if the actual length is usize::MAX + 3 or usize::MAX + 5, the size_hint will be (usize::MAX, None).

But array_chunks::<2> would need to return a length of exactly usize::MAX/2 + 2 or usize::MAX/2 + 3 respectively for them, which it can't, so thus it can't be TrustedLen.

(Similar logic explains why https://doc.rust-lang.org/std/iter/struct.Skip.html doesn't implement TrustedLen.)

// SAFETY: `array` will still be valid if `guard` is dropped.
let mut guard = unsafe { FrontGuard::new(&mut array) };

let result = self.iter.try_fold(init, |mut acc, item| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really useful functionality, so consider whether there's a way to expose it on Iterator directly without the adapter, and then make the adapter really simple. (Vague analogy: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.array_chunks is a simple wrapper around https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks )

For example, imagine you had

pub trait Iterator {
    fn next_chunk<const N: usize>(&mut self) -> Result<[T; N], array::IntoIter<T, N>>;
}

That's usable independently, overridable for array and slice iterators that can do it more efficiently, and would be exactly the piece you need to make the ArrayChunks iterator easy to write in safe code.

(It'd be nice if it were -> Result<[T; N], array::IntoIter<T, N-1>>, but that's probably not writable right now.)

Copy link
Member

@the8472 the8472 Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be useful to manually unroll an iterator. E.g. we know that a vec goes from 0 to 8 capacity with its default allocation behavior (for small T) and then grows in multiples from there. So we could try unrolling the loop in batches of N <= 8. But this might still be difficult to optimize for Chain adapters if a middle part straddles the chain boundary.

A try_fold_chunked might be easier to optimize because it would split the loop into 3 parts. Going from try_fold_chunked to chunked_next is easier than the other way around.


// SAFETY: The array will still be valid if `guard` is dropped and
// it is forgotten otherwise.
let mut guard = unsafe { FrontGuard::new(&mut array) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's lots of unsafe here, as has often been the case in PRs for new methods using const generic arrays.

Please re-use the stuff from core::array for this (see

fn try_collect_into_array<I, T, R, const N: usize>(iter: &mut I) -> Option<R::TryType>
for example). That might mean adding a new crate-local or unstable method to the array module, or something, but that's better than making more versions of these Guard types in more places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it makes sense to have this stuff in core::array 👍

@JohnCSimon
Copy link
Member

ping from triage:
@rossmacarthur Is this PR ready for review or are you still working in it?

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR appears in the reviewer's backlog.

@bors
Copy link
Contributor

bors commented Mar 10, 2022

☔ The latest upstream changes (presumably #94787) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC
Copy link
Member

closing this as inactive

@Dylan-DPC Dylan-DPC closed this Apr 8, 2022
@Dylan-DPC Dylan-DPC removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 8, 2022
@Dylan-DPC Dylan-DPC added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Apr 8, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 25, 2022
…-ou-se

Add `Iterator::next_chunk`

See also rust-lang#92393

### Prior art

-  [`Itertools::next_tuple()`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.next_tuple)

### Unresolved questions

- Should we also add `next_chunk_back` to `DoubleEndedIterator`?
- Should we rather call this `next_array()` or `next_array_chunk`?
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Aug 13, 2022
Add `Iterator::array_chunks` (take N+1)

A revival of rust-lang#92393.

r? `@Mark-Simulacrum`
cc `@rossmacarthur` `@scottmcm` `@the8472`

I've tried to address most of the review comments on the previous attempt. The only thing I didn't address is `try_fold` implementation, I've left the "custom" one for now, not sure what exactly should it use.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Aug 14, 2022
Add `Iterator::array_chunks` (take N+1)

A revival of rust-lang#92393.

r? `@Mark-Simulacrum`
cc `@rossmacarthur` `@scottmcm` `@the8472`

I've tried to address most of the review comments on the previous attempt. The only thing I didn't address is `try_fold` implementation, I've left the "custom" one for now, not sure what exactly should it use.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Add `Iterator::next_chunk`

See also rust-lang/rust#92393

### Prior art

-  [`Itertools::next_tuple()`](https://docs.rs/itertools/latest/itertools/trait.Itertools.html#method.next_tuple)

### Unresolved questions

- Should we also add `next_chunk_back` to `DoubleEndedIterator`?
- Should we rather call this `next_array()` or `next_array_chunk`?
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Add `Iterator::array_chunks` (take N+1)

A revival of rust-lang/rust#92393.

r? `@Mark-Simulacrum`
cc `@rossmacarthur` `@scottmcm` `@the8472`

I've tried to address most of the review comments on the previous attempt. The only thing I didn't address is `try_fold` implementation, I've left the "custom" one for now, not sure what exactly should it use.
@rossmacarthur rossmacarthur deleted the ft/array-chunks branch November 27, 2022 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.