Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WebSocketOptions.Timeout to ClientWebSocket #48729

Closed
bjornen77 opened this issue Feb 24, 2021 · 39 comments · Fixed by #105841
Closed

Add WebSocketOptions.Timeout to ClientWebSocket #48729

bjornen77 opened this issue Feb 24, 2021 · 39 comments · Fixed by #105841
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Net in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@bjornen77
Copy link
Contributor

bjornen77 commented Feb 24, 2021

Edited by @CarnaViire

Justification

This API allows the users to identify and close idle WebSocket connections without having to rely on the TCP KeepAlive timeout, which can be way too big (e.g. ~15 min on Linux), and can be impossible or very hard to configure.

// see the original issue in the end of the post

API Proposal

namespace System.Net.WebSockets;

// used to create a base WebSocket (ManagedWebSocket) instance
public class WebSocketCreationOptions
{
    // existing
    //public TimeSpan KeepAliveInterval { get; set; } // default: infinite (off)

    // new
    public TimeSpan KeepAliveTimeout { get; set; } = Timeout.InfiniteTimeSpan;

    // ...
}

// used to configure a ClientWebSocket instance
public class ClientWebSocketOptions
{
    // existing
    //[UnsupportedOSPlatform("browser")]
    //public TimeSpan KeepAliveInterval { get; set; } // default: 30s

    // new
    [UnsupportedOSPlatform("browser")]
    public TimeSpan KeepAliveTimeout { get; set; } = Timeout.InfiniteTimeSpan;

    // ...
}

Usage

For ClientWebSocket:

var ws = new ClientWebSocket();
ws.Options.KeepAliveTimeout = TimeSpan.FromSeconds(10);
await ws.ConnectAsync(uri, cts.Token);

For ManagedWebSocket:

var options = new WebSocketCreationOptions()
{
    KeepAliveInterval = WebSocket.DefaultKeepAliveInterval,
    KeepAliveTimeout = TimeSpan.FromSeconds(10)
};
var ws = WebSocket.CreateFromStream(stream, options);

Notes on behavior

Timeout.InfiniteTimeSpan or TimeSpan.Zero would disable the timeout logic and fall back to the existing "unsolicited" Pong keep-alive (per understanding in #35473 (comment) we cannot send Pings without waiting for the reply)

Any other positive timeout value would turn on Ping-Pong logic; we will be sending Pings instead of Pongs and track whether we've received Pongs in time; if not, the connection is treated as stale and aborted. The behavior and implementation should be similar to HTTP/2 KeepAlivePingTimeout.

Alternative Naming

We could name the new property KeepAlivePingTimeout. This will "break" it's similarity to KeepAliveInterval, but it might help highlight that Pings, not Pongs, are being sent (if we care enough about this distinction).

namespace System.Net.WebSockets;

public class WebSocketCreationOptions
{
    //public TimeSpan KeepAliveInterval { get; set; } // existing
    public TimeSpan KeepAlivePingTimeout { get; set; } // new

    // ...
}

public class ClientWebSocketOptions
{
    //public TimeSpan KeepAliveInterval { get; set; } // existing
    public TimeSpan KeepAlivePingTimeout { get; set; } // new

    // ...
}

Existing API shape for other keep-alives

For comparison:

// HTTP/2 in runtime:
namespace System.Net.Http;
public class SocketsHttpHandler
{
    public TimeSpan KeepAlivePingDelay { get; set; } // default: infinite (off)
    public TimeSpan KeepAlivePingTimeout { get; set; } // default: 20s (N/A when delay=inf)
    public HttpKeepAlivePingPolicy KeepAlivePingPolicy { get; set; } // default: Always
    // ...
}

// HTTP/2 in Kestrel:
namespace Microsoft.AspNetCore.Server.Kestrel.Core;
public class Http2Limits
{
    public TimeSpan KeepAlivePingDelay { get; set; } // default: infinite (off)
    public TimeSpan KeepAlivePingTimeout { get; set; } // default: 20s (N/A when delay=inf)
    // ...
}

// Sockets in runtime:
namespace System.Net.Sockets;
public enum SocketOptionName
{
    TcpKeepAliveInterval, // between probe retries, if no answer received
    TcpKeepAliveRetryCount, // probe retries before deemed idle, if no answer received
    TcpKeepAliveTime, // before the first probe, after the last received byte
    // ...
}

Original issue by @bjornen77

I think that adding a WebSocketOptions.Timeout property would be a good thing.

Today, the ClientWebSocket is closed when the TCP Retransmission times out(if the connection to the server drops for some reason). This timeout seems to be different on Windows and Linux. By default, on Windows it takes about 21 seconds and on Linux it takes about 15 minutes before the WebSocket is closed. It would be good if we could have a separate Timeout setting, so that we get the same behavior regardless of the platform. And also that this could be configured by the application without having to modify the TCP Timeout settings in the OS... The websocket should still be disconnected if the OS TCP Timeout is reached.

The ClientWebSocket today only sends "Pong" to the server, and therefore it does not wait for a "Pong" response to arrive. I think that the ClientWebSocket shall send a "Ping" instead, and then verify that a "Pong" is received within the Timeout set. If not received, the connection shall be closed. Or at least it should expose the "received "Pong" message in the WebSocketReceiveResult.MessageType. It would be good if an received "Ping" message from a server is also exposed. Then the application could use these Ping/Pong message types to implement its own timeout logic.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Feb 24, 2021
@ghost
Copy link

ghost commented Feb 24, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

I think that adding a WebSocketOptions.Timeout property would be a good thing.

Today, the ClientWebSocket is closed when the TCP Retransmission times out. This timeout seems to be different on Windows and Linux. By default, on Windows it takes about 21 seconds and on Linux it takes about 15 minutes before the WebSocket is closed. It would be good if we could have a separate Timeout setting, so that we get the same behavior regardless of the platform.

The ClientWebSocket today only sends "Pong" to the server, and therefore it does not wait for a "Pong" response to arrive. I think that the ClientWebSocket shall send a "Ping" instead, and then verify that a "Pong" is received within the Timeout set. If not received, the connection shall be closed. Or at least it should expose the "Ping" and "Pong" messages in the WebSocketReceiveResult.MessageType. Then the application could use these Ping/Pong message types to implement its own timeout logic.

Author: bjornen77
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@bjornen77
Copy link
Contributor Author

Related: dotnet/aspnetcore#24614

@ManickaP
Copy link
Member

This is a duplicate of: #35473
We expose KeepAliveInterval but we use it only for sending Pongs to keep the connection open:

// This exists purely to keep the connection alive; don't wait for the result, and ignore any failures.
// The call will handle releasing the lock. We send a pong rather than ping, since it's allowed by
// the RFC as a unidirectional heartbeat and we're not interested in waiting for a response.
ValueTask t = SendFrameLockAcquiredNonCancelableAsync(MessageOpcode.Pong, true, ReadOnlyMemory<byte>.Empty);

We might revisit the original decision to not to send Pings.

@bjornen77
Copy link
Contributor Author

bjornen77 commented Feb 25, 2021

I think it should be revisited. Today's implementation relies on the OS TCP/IP timeout setting to be able to determine when the WebSocket has been disconnected. It is not always wanted to change this setting in the OS as it effect all applications running there.

Also. this OS setting differs by default between Windows and Linux. By default on Linux it takes about 15 minutes before the client realizes that the server is not there anymore. I think that it would be good to be able to set a timeout value on the ClientWebSocket itself(and use the PING/PONG frames internally to detect if a timeout occurs). Exposing the PONG messages is not needed if this is handled internally(preferred).

@ManickaP ManickaP removed the untriaged New issue has not been triaged by the area owner label Feb 25, 2021
@ManickaP ManickaP added this to the Future milestone Feb 25, 2021
@ManickaP
Copy link
Member

ManickaP commented Feb 25, 2021

Triage: we are open to adding proper ping/pong sending. At the moment, there's only this one issue asking for it so we'll leave it to future.
We should consider starting with fixed timeout before adding API.

@ManickaP ManickaP added the enhancement Product code improvement that does NOT require public API changes/additions label Feb 25, 2021
@bjornen77
Copy link
Contributor Author

If adding a new Timeout setting/property maybe the following would be something to consider when implementing:

  • Timeout value is set to <=0, the old behavior is used, where only PONGs are sent.
  • Timeout value is set to > 0, the new PING/PONG sequence, with timeout detection is used.

@osexpert
Copy link

osexpert commented Feb 26, 2021

Triage: we are open to adding proper ping/pong sending. At the moment, there's only this one issue asking for it so we'll leave it to future.

Not so. Issue dotnet/aspnetcore#24614 is already asking for the same (and duplicate dotnet/aspnetcore#26117)

@aroman
Copy link

aroman commented Jun 24, 2021

Strong +1 for this issue. This is an important feature for applications like games where detecting a broken connection quickly is important for user experience. The WebSocket API does not expose the ability to send ping or pong messages, so there isn't even really a way to add this functionality in our applications without creating our own works-like ping/pong messages (which aren't really ping/pong messages but regular websocket messages).

Just one point of clarification in the original issue — OP mentions that the current implementation only sends pong, rather than ping. I don't think that's true, from what I can see in ManagedWebSocket:518. I think the issue is simply that the implementation doesn't wait for a pong in response to the ping.

The WebSocket RFC does not explicitly specify that clients should disconnect in the case a pong is not received in response to a ping within a certain timeframe. I think this would be a good thing to do though, as a user-configurable option. It should probably be default-off, so the change doesn't impact any applications which are somehow depending on very long retry times. Just adding an extra WebSocketOptions for this as OP suggests is the right approach IMHO.

@CarnaViire
Copy link
Member

Note: there's another ask for this feature #60620

@karelz
Copy link
Member

karelz commented Nov 1, 2021

@gdalsnes @aroman can you please upvote top post? That will help us track number of people who need this behavior. Thanks!

@karelz karelz added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed enhancement Product code improvement that does NOT require public API changes/additions labels Nov 1, 2021
@osexpert
Copy link

osexpert commented Nov 1, 2021

I guess SignalR uses its own pingpong, and is covering over for this flaw for most use cases? But I use it with vs-streamjsonrpc and ended up implementing my own pingpong, and used 99.9% time on that and the rest on actual work... My testing involved killing processes, pulling network cables, supending/resuming/killing VM's, "pulling" network "cables" within VM's etc. I easily got into situations with infinite hang (waited over night).

@CarnaViire
Copy link
Member

CarnaViire commented Nov 1, 2021

I've updated the root description with the API proposal

@osexpert
Copy link

osexpert commented Nov 1, 2021

I've updated the root description with the API proposal

But will this cover aspnetcore websocket as well? It should also have the same timeout option, else it can hang forever and stall request pipelines (potentional DOS problem?)

@karelz
Copy link
Member

karelz commented Nov 1, 2021

@gdalsnes what do you mean by "aspnetcore WebSocket"?
ASP.NET WebSocket is based on our implementation to our best knowledge. Unless there is some wrapper around our APIs, all should be good. We do not restrict what frameworks/apps above us do.

@CarnaViire
Copy link
Member

@gdalsnes good point. This will need an additional API from ASP.NET Core side as well. I suspect it should be added to ExtendedWebSocketAcceptContext the way WebSocket compression was added... @BrennanConroy will you be able to help with ASP.NET Core side please?

@BrennanConroy
Copy link
Member

WebSocketAcceptContext would likely need a new property, something like KeepAliveTimeout. I can help out if this feature gets added.

@michaldobrodenka
Copy link

We need to ping from client and server every few seconds to make sure connection is still alive and restart asap if needed. Since this is not possible with official ClientWebSocket, I use modified websocket-sharp for now

@Madguy93
Copy link

I would also like to see this implemented. Our setup is running a websoket server with lots of embedded devices as clients. We run a standardized protocol based on websocket and therefore cannot implement our own ping mechanism or use signalR. It's important since if you run linux for your servers it will take about 15 min for it to detect a disconnect as the default TCP RTO setting in linux.

@CarnaViire
Copy link
Member

Triage: given that the customer ask is high, and it should be relatively easy to implement, optimistically putting it to 9.0.

@CarnaViire CarnaViire removed this from the Future milestone Jan 15, 2024
@CarnaViire CarnaViire added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jan 18, 2024
@BrennanConroy
Copy link
Member

We'll likely have to manually test with the browser. These kinds of tests aren't great to add to automation since they by design would need to last 20+ seconds due to no APIs in the browser that let us influence the keep alive timeout.

Assuming we add good enough unit tests the one-off manual test should be fine.

@bjornen77 bjornen77 changed the title Suggestion: Add WebSocketOptions.Timeout to ClientWebSocket Add WebSocketOptions.Timeout to ClientWebSocket Jan 22, 2024
@mar1u50
Copy link

mar1u50 commented Feb 28, 2024

It's unfortunate that we can't implement this effectively because ManagedWebSocket.SendFrameAsync is private, preventing us from sending the PING frames ourselves in the application. Additionally, ClientWebSocket.ReceiveAsync doesn't allow the reception of PONG frames in the application.

@CarnaViire
Copy link
Member

@mar1u50 is there a reason you would still need this access on the application level, after the proposed API gets implemented?

@mar1u50
Copy link

mar1u50 commented Feb 28, 2024

I think the only missing part (or maybe I missed this from the API change description) is a sequence number sent in the PING to be able to pair it with the PONG.
We may want to disconnect also if the PING-PONG took too long time.

@CarnaViire
Copy link
Member

The proposed WebSocketCreationOptions.KeepAliveTimeout is exactly the "too long time" you mention -- that will be tracked and applied from within the WebSocket, so it will get disconnected automatically in case of the response timing out. The pairing process will also become just a part of the internal implementation. You still won't be able to send any arbitrary data in the PING frame, but given that its main purpose is for pairing with PONGs, I am not sure that the exact values should matter on the application level.

@mar1u50
Copy link

mar1u50 commented Mar 1, 2024

In systems where there may be other machines between the client and server, such as a CloudFlare node, it is possible that if the client-CloudFlare connection dies, the CloudFlare-server connection will not also die. Likewise, it is possible for the CloudFlare-server connection to die while the client-CloudFlare connection remains active. To prevent either end from remaining in a zombie state, I was thinking that the server could also check if pings are still coming and force a close if they are not. For this, the server should be able to detect if the ping is the one requested by the client (through its payload). It is true that this could be implemented in another way too: the client initiates pings and waits for pongs, and at the same time, the server also initiates pings and waits for pongs.

@CarnaViire
Copy link
Member

CarnaViire commented Mar 1, 2024

It is true that this could be implemented in another way too: the client initiates pings and waits for pongs, and at the same time, the server also initiates pings and waits for pongs.

Yes, I believe this would be the expected approach by RFC. If the server is on Kestrel, the timeout APIs will be introduced as a follow-up to this one (see #48729 (comment)), within the same release.

@BrennanConroy is there a tracking issue for this?

@BrennanConroy
Copy link
Member

I've been using this as the tracking issue, until there is something to do on the Kestrel side.

@CarnaViire CarnaViire added the blocking Marks issues that we want to fast track in order to unblock other important work label Mar 4, 2024
@bartonjs
Copy link
Member

bartonjs commented Mar 5, 2024

Video

Looks good as proposed.

namespace System.Net.WebSockets;

public partial class WebSocketCreationOptions
{
    public TimeSpan KeepAliveTimeout { get; set; } = Timeout.InfiniteTimeSpan;
}

public partial class ClientWebSocketOptions
{
    [UnsupportedOSPlatform("browser")]
    public TimeSpan KeepAliveTimeout { get; set; } = Timeout.InfiniteTimeSpan;
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 5, 2024
@ken-goldstein-igt
Copy link

The proposal is exactly what's needed for my company's servers. With .NET 5-8, the replacement of the unidirectional Ping with a unidirectional Pong broke tens of thousands of clients that didn't have a 3rd party implementation that could understand Pong frames as a heartbeat. The result was chaos, with disconnections about every minute or so.

@Madguy93
Copy link

Madguy93 commented May 6, 2024

Will this be ready for dotnet 9?

@CarnaViire
Copy link
Member

@Madguy93 it is currently planned for .NET 9, yes.

@osexpert
Copy link

I wonder... Maybe you could also add two new DateTimes, so we can better get a view of what is happening internally in the WS?


DateTime? DataLastTransmittedUtc
DateTime? DataLastRecievedUtc

These would be updated whenever data (including ping\pong) is transmitted\recieved.

@CarnaViire CarnaViire self-assigned this Jun 5, 2024
@chrsin
Copy link

chrsin commented Jun 26, 2024

@CarnaViire any estimate on which preview it would make? (I won't hold you to it)

Right now our options seem to be as follows and would like to make an informed decision.

  1. start Development with a preview version of dotnet 9 (unless its far out in the future)
  2. Use something else than AspNet for handling the Websocket part of our application.
  3. Use custom messages to track when the connection is lost

@CarnaViire
Copy link
Member

any estimate on which preview it would make?

Most possibly Preview 7, scheduled for the middle of August @chrsin

@CarnaViire
Copy link
Member

I wonder... Maybe you could also add two new DateTimes, so we can better get a view of what is happening internally in the WS?


DateTime? DataLastTransmittedUtc
DateTime? DataLastRecievedUtc

These would be updated whenever data (including ping\pong) is transmitted\recieved.

@osexpert these timestamps, or rather, the difference between UtcNow and these timestamps, will always be less or equal to KeepAliveInterval + KeepAliveTimeout. And you will need to go and regularly check these values manually, let's say in a loop with some interval, e.g. KeepAliveTimeout/2.

BUT if you already need go and check the WebSocket instance in a regular fashion, and your goal is to notify when the connection is torn down -- instead of the timestamps, you can just check the WebSocket.State property, and raise the notification when it becomes Aborted (or Closed, or CloseReceived, etc)

It seems redundant to check the timestamps, "do the math" (and manual comparison with a timeout) to deduce the state, when the state would be already computed and available in Websocket.State.

@CarnaViire
Copy link
Member

any estimate on which preview it would make?

Most possibly Preview 7, scheduled for the middle of August @chrsin

Update: this didn't make it in time for Preview 7; it is now targeting RC1 instead (September)

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 1, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Net in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.