Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QUIC] Fixed disabled receives if buffer gets drained before RECEIVE returns #72439

Merged
merged 2 commits into from
Jul 19, 2022

Conversation

ManickaP
Copy link
Member

@ManickaP ManickaP commented Jul 19, 2022

Root cause: we're reading in a tight loop, so when RECEIVE event arrived, we'd copy some data (based on buffer capacity) and return SUCCESS with partial receive. But the reading loop thread was faster and it would drain the buffer before the event handler returned to MsQuic. Now we have ReadAsync waiting on the _receiveTcs (waiting on another RECEIVE event) and MsQuic waiting on StreamReceiveSetEnabled since we did partial data receive.

The fix: check if the receive buffer has a capacity (i.e.: someone read from it in the mean time) and if so immediately re-enable receives with QUIC_STATUS_CONTINUE

Interestingly, this manifested only on Windows, so I ran the offending test in a tight loop for ~10 minutes without any failures (normally it'd manifest in less than 1 minute).

Fixes #72196
Fixes #72422

@ghost
Copy link

ghost commented Jul 19, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Root cause: we're reading in a tight loop, so when RECEIVE event arrived, we'd copy some data (based on buffer capacity) and return SUCCESS with partial receive. But the reading loop thread was faster and it would drain the buffer before the event handler returned to MsQuic. Now we have ReadAsync waiting on the _receiveTcs (waiting on another RECEIVE event) and MsQuic waiting on StreamReceiveSetEnabled since we did partial data receive.

The fix: check if the receive buffer has a capacity (i.e.: someone read from it in the mean time) and if so immediately re-enable receives with QUIC_STATUS_CONTINUE

Interestingly, this manifested only on Windows, so I ran the offending test in a tight loop for ~10 minutes without any failures (normally it'd manifest in less than 1 minute).

Fixes #72196

Author: ManickaP
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@@ -509,7 +509,7 @@ private unsafe int HandleEventReceive(ref RECEIVE data)
_receiveTcs.TrySetResult();

data.TotalBufferLength = totalCopied;
return QUIC_STATUS_SUCCESS;
return _receiveBuffers.HasCapacity() ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if the buffers are consumed from a read thread just a moment after you check HasCapasity? Will we still have - however small but still - a probability of a deadlock (if I understood your description correctly)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read thread will see _receivedNeedsEnable is true and call StreamReceiveSetEnabled. It'll get queued in msquic and processed after the RECEIVE event finished, re-enabling the reads.

if (totalCopied > 0 && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1)
{
unsafe
{
ThrowHelper.ThrowIfMsQuicError(MsQuicApi.Api.ApiTable->StreamReceiveSetEnabled(
_handle.QuicHandle,
1),
"StreamReceivedSetEnabled failed");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The (initial) problem happens when this line on Read thread

if (totalCopied > 0 && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1)

executes earlier than this line on MsQuic thread

Volatile.Write(ref _receivedNeedsEnable, 1);

meaning we are able to catch that situation any time after Volatile.Write, including on return.

I have one more question:
Is it ok when we both return CONTINUE and also schedule EnableReceive afterwards? What do you think about including the same CompareExchange here as well:

Suggested change
return _receiveBuffers.HasCapacity() ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS;
return (_receiveBuffers.HasCapacity() && Interlocked.CompareExchange(ref _receivedNeedsEnable, 0, 1) == 1) ? QUIC_STATUS_CONTINUE : QUIC_STATUS_SUCCESS;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, let me check if that works without breaking anything.
That would prevent queuing StreamReceiveSetEnabled if we've already re-enabled receives via CONTINUE status --> -1 mindless P/Invoke.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, looks good, I'll update the PR.

Copy link
Member

@CarnaViire CarnaViire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ManickaP
Copy link
Member Author

Failures look like #72429, i.e.: -1073740791 exit code on remote executors.

@ManickaP ManickaP merged commit 518d6cb into dotnet:main Jul 19, 2022
@ManickaP ManickaP deleted the mapichov/72196-fix branch July 19, 2022 16:00
@karelz karelz added this to the 7.0.0 milestone Aug 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants