Pass along the received item to the next receiver if the task was cancelled #147

Merged: 2 commits merged into master from fix-146 on Aug 16, 2020

Conversation

@agronholm (Owner) commented Aug 16, 2020

Fixes #146.

@agronholm requested a review from njsmith on August 16, 2020 at 10:59
@njsmith (Collaborator) commented Aug 16, 2020

At a quick skim I didn't notice any reasons why this can't work. It does seem more complicated than the "un-cancel" approach discussed in the issue thread though, and also less correct (because of the thing where this allows the buffer to overflow).

@mjwestcott (Contributor)

At a quick skim I didn't notice any reasons why this can't work. It does seem more complicated than the "un-cancel" approach discussed in the issue thread though, and also less correct (because of the thing where this allows the buffer to overflow).

The "un-cancel" approach is more accurately an "ignore cancel" approach though, isn't it? At least, that would be a fair way to describe the version I posted, it just continues past the cancellation — so I don't think it is fully correct either.

@agronholm (Owner, Author)

Thanks for the review. I feel that the un-cancel option is less correct since it actually breaks the contract imposed by cancellation – a cancelled task should not proceed unless it explicitly catches that exception.

@agronholm (Owner, Author)

I thought about having the sender task check if the receiver task has been cancelled, but that won't work either since this can happen:

  1. The receiver task starts blocking on receive()
  2. The sender task sends an item
  3. Another task cancels the receiver task's scope (or the scope's deadline is reached)
  4. The receiver task is resumed and receives the cancellation exception
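
For illustration, here is a minimal sketch of that race using anyio's memory object streams and task groups. The names follow current anyio spellings (create_memory_object_stream, start_soon, synchronous move_on_after), the timings are arbitrary, and since this is a race it may not reproduce on any given run:

import anyio

async def receiver(receive_stream):
    # 1. the receiver starts blocking on receive()
    # 3./4. if the deadline fires around the same time a sender hands over an
    #       item, the task ends up holding both the item and a pending cancellation
    with anyio.move_on_after(0.1):
        item = await receive_stream.receive()
        print("received:", item)

async def main():
    send_stream, receive_stream = anyio.create_memory_object_stream(0)
    async with anyio.create_task_group() as tg:
        tg.start_soon(receiver, receive_stream)
        await anyio.sleep(0.1)            # 2. the send races with the receiver's deadline
        with anyio.move_on_after(1):      # don't hang if the receiver is already gone
            await send_stream.send("hello")

anyio.run(main)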

@njsmith (Collaborator) commented Aug 16, 2020

It's common that a cancellation can arrive "too late" to take effect on the current operation, and has to wait to be delivered on the next operation instead. That's what happens in the case where the cancel arrives after the send but before receive returns.

....I guess I am assuming that anyio's cancellation is stateful, like Trio's, and I'm not sure if you managed to implement that on asyncio/curio or not.

@agronholm (Owner, Author)

....I guess I am assuming that anyio's cancellation is stateful, like Trio's, and I'm not sure if you managed to implement that on asyncio/curio or not.

I'm not entirely sure what you mean, but anyio's cancel scopes should work (at least within anyio code) the same way as trio's do. When a scope is cancelled, any code that hits a checkpoint within that scope gets a cancellation exception raised.

@mjwestcott (Contributor)

It's common that a cancellation can arrive "too late" to take effect on the current operation, and has to wait to be delivered on the next operation instead. That's what happens in the case where the cancel arrives after the send but before receive returns.

In this scenario, I don't think we want to give the receiving task until the next operation before then being re-cancelled. I think we want the receive to succeed and take an item or to be cancelled and not take an item. Once we're in a state where we've got an item and been cancelled, it's already too late isn't it?

Of course, the task that is calling receive can be cancelled at any point after the receive, but I'm not sure whether that is relevant.

@agronholm (Owner, Author)

In this scenario, I don't think we want to give the receiving task until the next operation before then being re-cancelled. I think we want the receive to succeed and take an item or to be cancelled and not take an item. Once we're in a state where we've got an item and been cancelled, it's already too late isn't it?

The problem we faced here was that the task was both cancelled and given an item before it could be resumed to deal with either event. And since the sender is long gone by then, we can't have it retry the operation with a new receiver. Thus, the receiver may end up in a situation where it's holding a sent item but its task has been cancelled. I did the only thing I could think of: pass the item along to the next receiver in line (and if that one was already about to receive an item, pass it to the next receiver from there). The really awkward situation arises when there are no waiting receivers in the queue. The only sensible thing to do then is to put the item in the buffer, even if that causes the buffer size to go over the limit. The only alternative is dropping the item. Fortunately this should be a rare case, so I don't foresee it causing much trouble in practice.

I may come up with a better solution with enough time.
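
For reference, the pass-along strategy described above boils down to roughly the following hypothetical sketch; the names and data structures (StreamState, waiting_receivers, dict "slots") are illustrative only, not anyio's actual internals:

from collections import deque

class StreamState:
    # illustrative stand-in for the shared state of a memory object stream
    def __init__(self, max_buffer_size):
        self.max_buffer_size = max_buffer_size
        self.buffer = deque()              # buffered items
        self.waiting_receivers = deque()   # per-call "slots" of tasks blocked in receive()

def pass_along(state, item):
    # Called when a receiver was handed `item` but its task was cancelled before
    # it could return from receive().
    if state.waiting_receivers:
        # Hand the orphaned item to the next receiver in line (the real code
        # would also wake that task up).
        next_slot = state.waiting_receivers.popleft()
        next_slot["item"] = item
    else:
        # Nobody is waiting: park the item in the buffer, even if that pushes it
        # past max_buffer_size -- the overflow concern raised in the review.
        state.buffer.append(item)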

Of course, the task that is calling receive can be cancelled at any point after the receive, but I'm not sure whether that is relevant.

Yeah, it's not relevant then.

@agronholm merged commit bd9a310 into master on Aug 16, 2020
@agronholm deleted the fix-146 branch on August 16, 2020 at 18:40
@njsmith (Collaborator) commented Aug 16, 2020

I'm not entirely sure what you mean, but anyio's cancel scopes should work (at least within anyio code) the same way as trio's do. When a scope is cancelled, any code that hits a checkpoint within that scope gets a cancellation exception raised.

"Stateful" cancellation means that if the code keeps executing checkpoints inside a cancelled scope, then it keeps getting repeated cancellation exceptions.

with trio.CancelScope() as cscope:
    cscope.cancel()
    try:
        await trio.lowlevel.checkpoint()  # raises trio.Cancelled
    finally:
        await trio.lowlevel.checkpoint()  # *also* raises trio.Cancelled, because we're still in a cancelled scope

So if you discard the cancellation in receive, then it doesn't really discard the cancellation, it just defers it until the next checkpoint. Which is the same thing that both Trio and asyncio do if the receive task gets cancelled on the same tick as it succeeds, and the cancellation happens just after the send: they try to deliver a cancellation to receive, but discover that it's too late, so they let receive return normally and then deliver the cancellation later.

(I guess this whole thread also suggests you might want some better primitives to implement things like this?)

@mjwestcott (Contributor) commented Aug 16, 2020

Thanks, that's useful. What if there are no more checkpoints within the cancel scope? Won't that result in the cancellation being discarded? Based on #146:

async with anyio.move_on_after(1):
    await r.receive()
# Continue, meaning cancellation is discarded...

@agronholm (Owner, Author)

Indeed it would mean just that.

@agronholm (Owner, Author)

So if you discard the cancellation in receive, then it doesn't really discard the cancellation, it just defers it until the next checkpoint. Which is the same thing that both Trio and asyncio do if the receive task gets cancelled on the same tick as it succeeds, and the cancellation happens just after the send: they try to deliver a cancellation to receive, but discover that it's too late, so they let receive return normally and then deliver the cancellation later.

As @mjwestcott pointed out, I don't think we can just ignore the cancellation because it would mean the cancelled task might do something meaningful with the data it received, and the cancellation may never end up being acted on if there are no more checkpoints left in the code.

@mjwestcott (Contributor) commented Aug 16, 2020

The strange thing is that in the move_on_after case with only one checkpoint (or on the last checkpoint), dropping the cancellation seems fine — no invariant is broken as far as I can tell. We do "move on" after 1 second, right?

With move_on_after and multiple checkpoints (not the last), then discarding the cancellation is perhaps also fine — the cancellation will be delivered next (so it is late).

Is there another case — maybe not based on move_on_after — where discarding cancellation is a problem?

@njsmith (Collaborator) commented Aug 17, 2020

It's a general fact about cancellations that every operation has a "point of no return", where if the cancellation arrives after that then it gets deferred until the next checkpoint, or dropped if there is no next checkpoint. That's fine and correct – if you try to cancel something that already completed, then the cancellation should be a no-op.

@smurfix (Collaborator) commented Aug 17, 2020

… for now.
The problem is one of timing. If I understand this problem correctly, the sequence of events

  • receiver gets cancelled
  • sender queues something, which ends up in the just-cancelled receiver's slot
  • receiver gets around to processing its own cancellation

could happen in this order. If so, that's a problem and re-queuing is probably the best you can do. If not, letting the receiver proceed until it hits its next cancellation point is perfectly OK.

@mjwestcott (Contributor)

It's a general fact about cancellations that every operation has a "point of no return", where if the cancellation arrives after that then it gets deferred until the next checkpoint, or dropped if there is no next checkpoint. That's fine and correct – if you try to cancel something that already completed, then the cancellation should be a no-op.

@njsmith Thanks. In that case, as you said earlier, the "un-cancel" approach seems preferable to this PR: it turns a cancellation into a no-op if there are no more checkpoints, whereas this PR can break the semantics of the stream by overflowing the buffer (worst case: with max_buffer_size=0, a send can succeed, but the item is left in the buffer with no receiver).

@agronholm would you consider reverting this change in favour of the other approach?

If so, that's a problem and re-queuing is probably the best you can do. If not, letting the receiver proceed until it hits its next cancellation point is perfectly OK.

@smurfix The sequence you mentioned is exactly what we're discussing. I think the first two bullets can happen in either order. Unless I'm missing something, letting the receiver proceed until the next cancellation point (or to the end, if none exists) solves the problem, so there's no need for re-queueing.

@mjwestcott (Contributor)

I think we want the receive to succeed and take an item or to be cancelled and not take an item

It's surprising to me, but deferring the cancellation achieves this.

@agronholm (Owner, Author)

@agronholm would you consider reverting this change in favour of the other approach?

I've just released v2.0.0b1 but since it's a pre-release version, I can change the semantics at will before the final. I will think about it.

@agronholm (Owner, Author)

I've made the changes locally but my latest test seems to hang on the trio backend. Investigating.

@agronholm (Owner, Author)

The test just fails due to task scheduling differences between backends: it expected the scheduling to work in a particular way. The exact situation I want to test against is pretty hard to reproduce in a controlled fashion.

agronholm added a commit that referenced this pull request Aug 17, 2020
As per the discussion on #147, it's better to ignore the cancellation exception now and have it triggered at the next checkpoint than to push the item to the buffer, potentially going over the buffer's limit.
@agronholm (Owner, Author)

Ok, I've got it now. I've pushed the change to master. It should be noted that your original bug test script still raises curio.errors.TaskTimeout, but that is something I have to deal with separately.
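
For reference, the behaviour described in the commit message above has roughly the following shape. This is a hypothetical sketch rather than the actual code, with wait_for_item and slot standing in for the stream's internal wait machinery:

import anyio

async def receive_with_late_cancel_ignored(wait_for_item, slot):
    try:
        await wait_for_item()
    except anyio.get_cancelled_exc_class():
        if "item" not in slot:
            raise                  # genuinely cancelled before anything arrived
        # An item was already delivered, so it's too late to cancel cleanly:
        # swallow the exception here. The scope is still cancelled, so the
        # cancellation is re-raised at the next checkpoint (if there is one).
    return slot["item"]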


Merging this pull request closes issue #146: Object stream randomly drops items.