Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore error if can't write back echoed protocol in negotiate #87

Merged

Conversation

MarcoPolo
Copy link
Contributor

@MarcoPolo MarcoPolo commented Jun 16, 2022

We would fail to negotiate if we couldn't echo back the protocol id. It should be valid to fail to write back the protocol ID and return the negotiate stream so that the caller may read data from the stream.

This pattern is common when a dialer opens a new stream, sends some data, then closes the stream immediately. (bitswap, graphsync). Without this there's really nothing a caller can do besides sleep enough to let the other side send it's multistream protocol response.

Fixes:

This is a regression from #85. NegotiateLazy would not fail to negotiate for the listener.

r? @marten-seemann cc @Stebalien

@MarcoPolo MarcoPolo marked this pull request as ready for review June 17, 2022 05:41
@marten-seemann
Copy link
Contributor

We would fail to negotiate if we couldn't echo back the protocol id. It should be valid to fail to write back the protocol ID and return the negotiate stream so that the caller may read data from the stream.

Agreed.

This pattern is common when a dialer opens a new stream, sends some data, then closes the stream immediately. (bitswap, graphsync). Without this there's really nothing a caller do besides sleep.

I'm trying to understand what this means in the QUIC stream state machine, where every stream is basically composed of two unidirectional streams that can separately be closed / reset. If we fail to write our multistream response, that must mean that the peer reset the write side of our stream. That also means that while we'll be able to read what the peer sent on the stream, we will never be able to write anything.
Is my interpretation correct? In what cases would the peer's behavior make sense? It sounds like a "fire and forget" kind of message.

@marten-seemann marten-seemann self-requested a review June 17, 2022 07:40
@MarcoPolo
Copy link
Contributor Author

I'm trying to understand what this means in the QUIC stream state machine, where every stream is basically composed of two unidirectional streams that can separately be closed / reset.
If we fail to write our multistream response, that must mean that the peer reset the write side of our stream.

Yep

That also means that while we'll be able to read what the peer sent on the stream, we will never be able to write anything.

Yep.

Is my interpretation correct?

Yep.

In what cases would the peer's behavior make sense? It sounds like a "fire and forget" kind of message.

There's nothing that says a peer can't do this. As far as I've seen we allow a peer to:

  1. write data to a stream
  2. close that stream and have the writes flushed and delivered to the peer.

Nothing says that a peer has to wait for an ack response before closing the stream. That wouldn't make sense when you already have the reliability gaurantees of tcp or quic.

While I'm not sure the graphsync/bitswap use case is an efficient use of resources (opening a transient stream to send a single message doesn't seem great), I don't think they are doing anything wrong as clients.

It is like a fire and forget message, but they want to make sure the message gets to the destination (as opposed to a unreliable datagram).


The issues we've seen around this only happen when there is really low latency between the nodes. What actually causes the bug is:

  1. Peer A writes MULTISTREAM PROTOCOL
  2. Peer B echos MULTISTREAM PROTOCOL
  3. Peer A writes
  4. Peer A writes
  5. Peer A attempts to close their stream (for their reading and writing).
    1. At the same time, Peer B tries to send back
    2. Peer B's negotiate fails to send this back since the write stream to peer A is closed.
    3. Peer B's negotiate bubbles up this error. The caller resets the stream.
  6. Peer A gets the stream reset error and returns it to the caller.
  7. Both sides end up with an error and no one is happy :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants