Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make BufferedChannelIterator#hasNext idempotent #4065

Merged
merged 3 commits into from
Apr 11, 2024

Conversation

gitpaxultek
Copy link
Contributor

The rationale in favor of the change includes:

  • in the 1.6.x series Channel().iterator().hasNext() was idempotent
  • despite the lack of requirement for the Iterator#hasNext method implementations to be idempotent, it is generally safer to make them such. The same applies to ChannelIterator#hasNext

Here is an example where it matters. Consider the following code snippet:

val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE || !iter.hasNext()) {
    yield(chunk.toList())
    chunk.clear()
  }
}

We address two concerns with one block of code - we run the if-block on reaching the chunk size limit or on the last chunk.

If #hasNext isn't idempotent, I see a couple of options. The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop:

val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE) {
    yield(chunk.toList())
    chunk.clear()
  }
}

if (chunk.isNotEmpty) {
    yield(chunk.toList())
}

Alternatively, we can do memoization of #hasNext ourselves e.g.:

val chunk = mutableListOf<Int>()
var hasNextMemo = iter.hasNext()
while(hasNextMemo) {
  val v = iter.next()
  chunk.add(v)
  hasNextMemo = iter.hasNext()
  if (chunk.size >= MAX_BATCH_SIZE || !hasNextMemo) {
    yield(chunk.toList())
    chunk.clear()
  }
}

Either way, the lack of idempotency of #hasNext spills the concern of knowing the state of the iterator over into the caller's code.

The rationale in favor of the change includes:
* in the `1.6.x` series `Channel().iterator().hasNext()` was idempotent
* despite the lack of requirement for the `Iterator#hasNext` method implementations
 to be idempotent, it is generally safer to make them such. The same applies
 to `ChannelIterator#hasNext`

Here is an example where it matters. Consider the following code snippet:

```
val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE || !iter.hasNext()) {
    yield(chunk.toList())
    chunk.clear()
  }
}
```

We address two concerns with one block of code - we run the if-block on reaching the chunk size
limit or on the last chunk.

If `#hasNext` isn't idempotent, I see a couple of options. The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop:

```
val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE) {
    yield(chunk.toList())
    chunk.clear()
  }
}

if (chunk.isNotEmpty) {
    yield(chunk.toList())
}
```

Alternatively, we can do memoization of `#hasNext` ourselves e.g.:

```
val chunk = mutableListOf<Int>()
var hasNextMemo = iter.hasNext()
while(hasNextMemo) {
  val v = iter.next()
  chunk.add(v)
  hasNextMemo = iter.hasNext()
  if (chunk.size >= MAX_BATCH_SIZE || !hasNextMemo) {
    yield(chunk.toList())
    chunk.clear()
  }
}
```

Either way, the lack of idempotency of `#hasNext` spills the concern of knowing
the state of the iterator over into the caller's code.
@dkhalanskyjb
Copy link
Collaborator

The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop

Yep, that's the best one. This way, you aren't wasting time on checking "is this the end already?" on every iteration, which even looks awkward. Also, this version can be adapted to the more idiomatic for-loop more easily:

import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.flow

const val MAX_BATCH_SIZE = 100

suspend fun main() {
    val channel = Channel<Int>()
    val chunk = mutableListOf<Int>()
    flow {
        for (v in channel) {
            chunk.add(v)
            if (chunk.size >= MAX_BATCH_SIZE) {
                emit(chunk.toList())
                chunk.clear()
            }
        }
        if (chunk.isNotEmpty()) {
            emit(chunk.toList())
        }
    }
}

@gitpaxultek
Copy link
Contributor Author

@dkhalanskyjb In my opinion either solution is not great because we are addressing the concern that the iterator could have easily addressed.

I guess in case one writes some generic code that never knows what iterator it receives, one should guard against the possibility of an iterator not having #hasNext implemented as idempotent. For that case, the solution with the memorization could end up being a better option in some cases. For example, let's say that content of the if-block isn't just a mere emit or yield call. That it might be a few lines of code that deal with the chunk immediately and the block uses some vars created before the loop. So instead of creating a lamda and calling it twice, you might choose to do memorization.

I think the code that uses Channel().iterator() isn't limited to that. It knows it deals with the specific iterator implementation of ChannelIterator. Since it's currently the only implementation, it seems, the KDoc of ChannelIterator can also state its #hasNext implementations are idempotent

@dkhalanskyjb dkhalanskyjb merged commit c1a16f8 into Kotlin:develop Apr 11, 2024
@dkhalanskyjb
Copy link
Collaborator

Thanks!

Corje pushed a commit that referenced this pull request May 10, 2024
The rationale in favor of the change includes:
* in the `1.6.x` series `Channel().iterator().hasNext()` was idempotent
* despite the lack of requirement for the `Iterator#hasNext` method implementations
 to be idempotent, it is generally safer to make them such. The same applies
 to `ChannelIterator#hasNext`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants