Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ipfs/go-bitswap] Reevaluate MessageQueue dial logic #123

Open
dirkmc opened this issue Feb 3, 2020 · 3 comments
Open

[ipfs/go-bitswap] Reevaluate MessageQueue dial logic #123

dirkmc opened this issue Feb 3, 2020 · 3 comments

Comments

@dirkmc
Copy link
Contributor

dirkmc commented Feb 3, 2020

Currently the MessageQueue will wait up to 10 minutes for a dial to a peer to succeed:
https://github.com/ipfs/go-bitswap/blob/5030b8dd39738778c2cc621a1d7b35cca010ba28/internal/messagequeue/messagequeue.go#L471-L477

However IIUC we only ever try to send messages to peers that we are already connected to.

Proposal:

  • remove the retry logic in MessageQueue
  • if the dial times out, remove the peer from the session
@Stebalien
Copy link
Member

Do we remove peers from sessions when they disconnect?

Note: We're still probably going to want a retry in the message sender as the first connection we try may actually be dead, even if we have a separate connection. Unfortunately, we can easily get into a state where:

  1. Our connection gets cut (e.g., some nasty middlebox sens a reset or the other peer just decides to kill it).
  2. The other peer reconnects before we notice (1). We now have two connections.
  3. The other peer asks us for a block using the new connection.
  4. We try to send it back on the old connection.

Currently, we'll fail at step 4 and retry. If we removed the retry, we'd just drop the message entirely.


Really, we need to re-evaluate how we handle "connections" and state. (ipfs/specs#201)

@dirkmc
Copy link
Contributor Author

dirkmc commented Feb 5, 2020

Do we remove peers from sessions when they disconnect?

Yes, the PeerManager tells the Session when a peer disconnects

  1. The other peer asks us for a block using the new connection.
  2. We try to send it back on the old connection.

Note that the reconnect code referred to above is on the client side: the 10 minute dial timeout is for when we're sending a want.

MessageQueue will attempt to send the message 10 times. So in the scenario you're describing of our connection failing, and the other side opening a new connection, we would expect:

  • the first send attempt to fail
  • the second attempt to reconnect (using the new connection)
  • successful send

Is that correct?

@Stebalien
Copy link
Member

Note that the reconnect code referred to above is on the client side: the 10 minute dial timeout is for when we're sending a want.

Oh. Never mind.

Is that correct?

Probably? Technically, we could have a couple of failures (multiple semi-dead connections). Really, I'd try several times while we think we're connected.

At the very least, we can remove the connect logic. Really, we can (and probably should?) use https://godoc.org/github.com/libp2p/go-libp2p-core/network#WithNoDial to avoid dialing.

@Jorropo Jorropo changed the title Reevaluate MessageQueue dial logic [ipfs/go-bitswap] Reevaluate MessageQueue dial logic Jan 27, 2023
@Jorropo Jorropo transferred this issue from ipfs/go-bitswap Jan 27, 2023
@aschmahmann aschmahmann reopened this Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants