Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect should wait for an acceptable connection #1603

Closed
Tracked by #2523
Stebalien opened this issue Jun 17, 2022 · 8 comments · Fixed by #1604, #2542 or #2547
Closed
Tracked by #2523

Connect should wait for an acceptable connection #1603

Stebalien opened this issue Jun 17, 2022 · 8 comments · Fixed by #1604, #2542 or #2547
Labels
effort/weeks Estimated to take multiple weeks exp/expert Having worked on the specific codebase is important kind/architecture Core architecture of project P1 High: Likely tackled by core team if no one steps up

Comments

@Stebalien
Copy link
Member

Currently, it waits for any connection, then returns. Unfortunately, this means that if the user calls host.NewStream(peer) to try to get a new stream to a non-connected peer, we'll fail every single time if we try to create a stream behind a NAT.

The difficulty here is that we aren't "in control", and don't know if the remote peer will actually directly connect to us. It's up to the receiver to establish a direct connection.

I think the solution here is to wait a bit after we establish a transient connection to see if a non-transient connection comes through. The tricky part will be actually implementing that...

@marten-seemann marten-seemann added exp/expert Having worked on the specific codebase is important P1 High: Likely tackled by core team if no one steps up kind/architecture Core architecture of project effort/weeks Estimated to take multiple weeks labels Jun 20, 2022
@marten-seemann
Copy link
Contributor

Notes after syncing with @Stebalien:

  • The dial worker loop currently only cares about addresses we’re dialing, not incoming connections. Hole punched connections are incoming connections though, so we might end up waiting for any dial to finish, while we already have a perfectly usable direct connection.
  • The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an Add(ma.Multiaddr) function that joins a new dial job, and then Get() <-chan network.Conn function that returns a channel with all successfully established functions (maybe pass a filter function to Get).

@BigLep
Copy link
Contributor

BigLep commented Aug 26, 2022

@marten-seemann : should this be opened or closed? I wasn't sure given the PR was merged but it was reopened.

@BigLep
Copy link
Contributor

BigLep commented May 11, 2023

@marten-seemann @p-shahi : is this still an issue? (I'm asking because I'm trying to figure out if can close ipfs/kubo#9751 ). Thanks.

@marten-seemann
Copy link
Contributor

Yes, we haven't fixed this yet.

@lidel
Copy link
Member

lidel commented Jul 24, 2023

Gentle bump, just for the record ipfs/kubo#9751 is still a problem.

@Jorropo
Copy link
Contributor

Jorropo commented Jul 24, 2023

I'll submit a repro later this week.

@sukunrt
Copy link
Member

sukunrt commented Sep 1, 2023

The PR I've raised changes host.NewStream and swarm.NewStream to wait for a direct connection.

Why to do this in NewStream as opposed to DialPeer.

  • I don't want to change the current behaviour of DialPeer returning a relayed connection when network.WithUseTransient is not used. Maybe this the intention of the implementation was to not return a relayed connection but this behaviour has been there for a while and we'll end up subtly breaking those clients.
  • This PR with swarm: fix DialPeer behaviour for transient connections #2547 provides us with reasonable behaviour of DialPeer depending on network.ForceDirectDial and NewStream depending on network.WithUseTransient.
  • I want to avoid modifying the dial worker loop to wait for direct connections. In future I see us changing the dial worker loop to take a list of address as input and providing a connection on one of those addresses. This would help us provide an option like this Allow encouraging (or forcing) dials with specific properties #2412. This is specifically for not modifying the worker loop as suggested here

The dial worker loop might be simplified by uncoupling the dial result from the specific dial call. Maybe by having an Add(ma.Multiaddr) function that joins a new dial job, and then Get() <-chan network.Conn function that returns a channel with all successfully established functions (maybe pass a filter function to Get).

@sukunrt
Copy link
Member

sukunrt commented Sep 18, 2023

Need to merge #2542 to close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/weeks Estimated to take multiple weeks exp/expert Having worked on the specific codebase is important kind/architecture Core architecture of project P1 High: Likely tackled by core team if no one steps up
Projects
Status: 🎉 Done
6 participants