Make `BufferedChannelIterator#hasNext` idempotent #4065

gitpaxultek · 2024-03-14T18:09:55Z

The rationale in favor of the change includes:

in the 1.6.x series Channel().iterator().hasNext() was idempotent
despite the lack of requirement for the Iterator#hasNext method implementations to be idempotent, it is generally safer to make them such. The same applies to ChannelIterator#hasNext

Here is an example where it matters. Consider the following code snippet:

val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE || !iter.hasNext()) {
    yield(chunk.toList())
    chunk.clear()
  }
}

We address two concerns with one block of code - we run the if-block on reaching the chunk size limit or on the last chunk.

If #hasNext isn't idempotent, I see a couple of options. The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop:

val chunk = mutableListOf<Int>()
while(iter.hasNext()) {
  val v = iter.next()
  chunk.add(v)
  if (chunk.size >= MAX_BATCH_SIZE) {
    yield(chunk.toList())
    chunk.clear()
  }
}

if (chunk.isNotEmpty) {
    yield(chunk.toList())
}

Alternatively, we can do memoization of #hasNext ourselves e.g.:

val chunk = mutableListOf<Int>()
var hasNextMemo = iter.hasNext()
while(hasNextMemo) {
  val v = iter.next()
  chunk.add(v)
  hasNextMemo = iter.hasNext()
  if (chunk.size >= MAX_BATCH_SIZE || !hasNextMemo) {
    yield(chunk.toList())
    chunk.clear()
  }
}

Either way, the lack of idempotency of #hasNext spills the concern of knowing the state of the iterator over into the caller's code.

The rationale in favor of the change includes: * in the `1.6.x` series `Channel().iterator().hasNext()` was idempotent * despite the lack of requirement for the `Iterator#hasNext` method implementations to be idempotent, it is generally safer to make them such. The same applies to `ChannelIterator#hasNext` Here is an example where it matters. Consider the following code snippet: ``` val chunk = mutableListOf<Int>() while(iter.hasNext()) { val v = iter.next() chunk.add(v) if (chunk.size >= MAX_BATCH_SIZE || !iter.hasNext()) { yield(chunk.toList()) chunk.clear() } } ``` We address two concerns with one block of code - we run the if-block on reaching the chunk size limit or on the last chunk. If `#hasNext` isn't idempotent, I see a couple of options. The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop: ``` val chunk = mutableListOf<Int>() while(iter.hasNext()) { val v = iter.next() chunk.add(v) if (chunk.size >= MAX_BATCH_SIZE) { yield(chunk.toList()) chunk.clear() } } if (chunk.isNotEmpty) { yield(chunk.toList()) } ``` Alternatively, we can do memoization of `#hasNext` ourselves e.g.: ``` val chunk = mutableListOf<Int>() var hasNextMemo = iter.hasNext() while(hasNextMemo) { val v = iter.next() chunk.add(v) hasNextMemo = iter.hasNext() if (chunk.size >= MAX_BATCH_SIZE || !hasNextMemo) { yield(chunk.toList()) chunk.clear() } } ``` Either way, the lack of idempotency of `#hasNext` spills the concern of knowing the state of the iterator over into the caller's code.

dkhalanskyjb · 2024-03-15T09:55:57Z

The first solution is, the code would essentially have to execute the content of the if-block in two places - in the loop and after the loop

Yep, that's the best one. This way, you aren't wasting time on checking "is this the end already?" on every iteration, which even looks awkward. Also, this version can be adapted to the more idiomatic for-loop more easily:

import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.flow

const val MAX_BATCH_SIZE = 100

suspend fun main() {
    val channel = Channel<Int>()
    val chunk = mutableListOf<Int>()
    flow {
        for (v in channel) {
            chunk.add(v)
            if (chunk.size >= MAX_BATCH_SIZE) {
                emit(chunk.toList())
                chunk.clear()
            }
        }
        if (chunk.isNotEmpty()) {
            emit(chunk.toList())
        }
    }
}

gitpaxultek · 2024-03-15T14:41:52Z

@dkhalanskyjb In my opinion either solution is not great because we are addressing the concern that the iterator could have easily addressed.

I guess in case one writes some generic code that never knows what iterator it receives, one should guard against the possibility of an iterator not having #hasNext implemented as idempotent. For that case, the solution with the memorization could end up being a better option in some cases. For example, let's say that content of the if-block isn't just a mere emit or yield call. That it might be a few lines of code that deal with the chunk immediately and the block uses some vars created before the loop. So instead of creating a lamda and calling it twice, you might choose to do memorization.

I think the code that uses Channel().iterator() isn't limited to that. It knows it deals with the specific iterator implementation of ChannelIterator. Since it's currently the only implementation, it seems, the KDoc of ChannelIterator can also state its #hasNext implementations are idempotent

kotlinx-coroutines-core/common/src/channels/BufferedChannel.kt

kotlinx-coroutines-core/common/test/channels/BufferedChannelTest.kt

dkhalanskyjb · 2024-04-11T13:19:29Z

Thanks!

The rationale in favor of the change includes: * in the `1.6.x` series `Channel().iterator().hasNext()` was idempotent * despite the lack of requirement for the `Iterator#hasNext` method implementations to be idempotent, it is generally safer to make them such. The same applies to `ChannelIterator#hasNext`

qwwdfsad requested a review from dkhalanskyjb March 15, 2024 10:04

dkhalanskyjb reviewed Apr 11, 2024

View reviewed changes

kotlinx-coroutines-core/common/src/channels/BufferedChannel.kt Outdated Show resolved Hide resolved

Use identity comparison

9394535

dkhalanskyjb reviewed Apr 11, 2024

View reviewed changes

kotlinx-coroutines-core/common/test/channels/BufferedChannelTest.kt Outdated Show resolved Hide resolved

Fix a typo

9e77058

dkhalanskyjb merged commit c1a16f8 into Kotlin:develop Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `BufferedChannelIterator#hasNext` idempotent #4065

Make `BufferedChannelIterator#hasNext` idempotent #4065

gitpaxultek commented Mar 14, 2024

dkhalanskyjb commented Mar 15, 2024

gitpaxultek commented Mar 15, 2024

dkhalanskyjb commented Apr 11, 2024

Make BufferedChannelIterator#hasNext idempotent #4065

Make BufferedChannelIterator#hasNext idempotent #4065

Conversation

gitpaxultek commented Mar 14, 2024

dkhalanskyjb commented Mar 15, 2024

gitpaxultek commented Mar 15, 2024

dkhalanskyjb commented Apr 11, 2024

Make `BufferedChannelIterator#hasNext` idempotent #4065

Make `BufferedChannelIterator#hasNext` idempotent #4065