-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calling CompletionStage.asDeferred() on the same future many times causes stack overflows #4156
Comments
Root cause The root cause seems to be a bad interaction between
Normally CompletableFuture runs all the callbacks iteratively, but due to triggering the |
Workaround The above issue can be worked around by breaking the recursive fun <T> CompletableFuture<T>.safeAsDeferred(): Deferred<T> {
val parent = this
val safeCompletableFuture = object : CompletableFuture<T>() {
override fun cancel(mayInterruptIfRunning: Boolean): Boolean {
return if (!parent.isDone) {
parent.cancel(mayInterruptIfRunning)
} else {
parent.isCancelled
}
}
}
parent.handle {
result, exception ->
if (exception == null) {
safeCompletableFuture.complete(result)
} else {
safeCompletableFuture.completeExceptionally(exception)
}
}
return safeCompletableFuture.asDeferred()
} When used in place of
And if you don't care about cancelling the future.thenApply { it }.asDeferred() |
Great job figuring out the problem! I think this is worth reporting to the JDK maintainers as well. As far as I can see, nothing in the docs indicates that this may happen. Though nothing says that it won't, I think:
I think this should be fixed on the side of the JDK as well by detecting the same thread entering the cancellation procedure more than once. That said, we can work around this explicitly on our side: #4173 The downside of this workaround is that when several threads cancel coroutines related to one future, only one of them will deal with the work related to the cancellation now (as opposed to many threads sharing the load), but I don't think that's a significant issue. |
Describe the bug
Given a shared
CompletableFuture
objectfuture
:future
, e.g.val didRun = AtomicBoolean(); future.whenComplete { _, _ -> didRun.set(true) }
future.asDeferred()
many times. Generally, 10,000 times is sufficient to trigger the failure. The resultingDeferred<*>
don't have to be used.future
exceptionally, e.g.future.completeExceptionally(RuntimeException())
.What happened? What should have happened instead?
Actual: Some of the
Deferred<*>
results would not complete. Some of the extra transformations in (1) do not run.The exception thrown from the completed
Deferred<*>
objects may contain a suppressedStackOverflowError
. In testing, in most of the cases there is no indication that anything failed, and the StackOverflowError can be observed by setting a breakpoint inside CompletableFuture's internal machinery.Expected: all the
Deferred<*>
results should complete exceptionally, and additional transforms attached in (1) should run.Why would you do this?
The above condition can happen when you use an asynchronous cache, e.g. Caffeine.newBuilder()...buildAsync().
In such use cases, an underlying
CompletableFuture
-based cache would return a cached (and shared)CompletableFuture
object, and code that is using the cache would callCompletionStage.asDeferred()
to consume the cached future asynchronously.In exceptional conditions, e.g. when the backend computation gated behind the cache slows to a crawl and then starts failing, the above described condition can happen - a single
CompletableFuture
is returned to a bunch of cache consumers while everybody is waiting for the computation to complete. If the incoming request rate is high enough, you can easily end up with 10,000 waiters on the sameCompletableFuture
.Then, when backend computation finally fails, this bug is triggered.
Note: this bug is particular insidious because it is very hard to trigger normally, and when it does, caches can behave very unexpectedly.
In the case of Caffeine, because some transforms fail to run (1 above), Caffeine does not get notified that the
CompletableFuture
fail and keeps returning the same failedCompletableFuture
to future callers.Why not CompletionStage.await()
Because the
CompletableFuture
is shared between multiple callers. UsingCompletionStage.await()
will allow one caller's cancellation to break all other concurrent callers waiting on the same future.Provide a Reproducer
This is a
*.main.kts
. Run withkotlin <something>.main.kts <num calls>
.Versions:
Kotlin version 2.0.0-release-341 (JRE 21.0.3)
Results:
$ kotlin test.main.kts 100
: no bug$ kotlin test.main.kts 10000
: bug, with various failure modesThe text was updated successfully, but these errors were encountered: