
swarm/: Support generic connection management #2824

Closed
mxinden opened this issue Aug 17, 2022 · 26 comments · Fixed by #3254
@mxinden
Member

mxinden commented Aug 17, 2022

Description

Today a user can:

  1. Set a limit for incoming/outgoing/pending connections, global and per peer.
  2. Close a connection via NetworkBehaviour.
  3. Keep a connection alive via ConnectionHandler::keep_alive.

A user is restricted in (1): they cannot do anything beyond setting upper-bound limits. E.g. they cannot decide whether to accept or deny an incoming connection based on the IP address, the PeerId, or their current CPU or memory utilization.

(Taken from #2118 (comment).)

I see three ways we could enable users to do advanced connection management:

  1. Make the pool::ConnectionCounter generic and allow users to provide their own.
    • Yet another moving piece.
    • When not Boxed, requires an additional generic parameter.
    • Can only do connection management; not as powerful as the NetworkBehaviour trait and thus limited in its decision making.
  2. Extend NetworkBehaviour with methods called to review a pending or established connection (see the sketch after this list).
    • Yet another set of methods on NetworkBehaviour.
    • Has knowledge of the full state, thus allows powerful decision making.
    • Users are already familiar with it.
  3. Emit a SwarmEvent::ReviewConnection requiring the user to manually accept each pending/established connection via e.g. Swarm::accept().
    • Complicates the getting-started experience. User has to explicitly call a method to make progress. Not intuitive.
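For illustration, here is a minimal sketch of what the review methods of proposal (2) could look like. All names and signatures below are hypothetical and only illustrate the shape of the idea; returning an error denies the connection.

```rust
use libp2p::{Multiaddr, PeerId};

/// Hypothetical error type; returning it denies the connection.
struct Denied;

/// Hypothetical review methods that proposal (2) would add to `NetworkBehaviour`.
/// Default implementations accept everything, so existing behaviours would not
/// have to change.
trait ConnectionReview {
    /// Review a pending inbound connection before it is upgraded.
    fn review_pending_inbound(&mut self, _remote_addr: &Multiaddr) -> Result<(), Denied> {
        Ok(())
    }

    /// Review a connection once it is established and the remote `PeerId` is known.
    fn review_established(&mut self, _peer: &PeerId, _remote_addr: &Multiaddr) -> Result<(), Denied> {
        Ok(())
    }
}
```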

Motivation

Allows downstream users to do advanced connection management and can simplify existing implementations working around this today.

Are you planning to do it yourself in a pull request?

Yes

@mxinden
Member Author

mxinden commented Aug 17, 2022

I am currently working on proposal (2), i.e. extending NetworkBehaviour. Hope to publish a first proof-of-concept this week.

@thomaseizinger
Contributor

How is (2) going to work in terms of composing behaviours? Does one NetworkBehaviour declining a connection mean it will be denied, i.e. do we need full agreement across all NetworkBehaviours?

@mxinden
Member Author

mxinden commented Aug 18, 2022

How is (2) going to work in terms of composing behaviours? Does one NetworkBehaviour declining a connection mean it will be denied, i.e. do we need full agreement across all NetworkBehaviours?

Correct.
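To make that rule concrete, a minimal sketch (hypothetical names, mirroring the review-method sketch above) of how a composed behaviour could forward such a fallible callback: the `?` operator short-circuits on the first sub-behaviour that declines, so a connection is only accepted if every sub-behaviour agrees.

```rust
use libp2p::Multiaddr;

struct Denied;

// Stand-ins for two composed sub-behaviours, each with its own policy.
struct BehaviourA;
struct BehaviourB;

impl BehaviourA {
    fn review_pending_inbound(&mut self, _addr: &Multiaddr) -> Result<(), Denied> {
        Ok(())
    }
}

impl BehaviourB {
    fn review_pending_inbound(&mut self, _addr: &Multiaddr) -> Result<(), Denied> {
        Ok(())
    }
}

struct Composed {
    behaviour_a: BehaviourA,
    behaviour_b: BehaviourB,
}

impl Composed {
    // Denial by any sub-behaviour denies the connection for all of them.
    fn review_pending_inbound(&mut self, addr: &Multiaddr) -> Result<(), Denied> {
        self.behaviour_a.review_pending_inbound(addr)?;
        self.behaviour_b.review_pending_inbound(addr)?;
        Ok(())
    }
}
```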

@mxinden
Member Author

mxinden commented Aug 19, 2022

A draft implementation of (2) is happening in #2828. It is still a work in progress, but it might help in understanding the idea behind (2). Happy to expand on any of this in written form; no need to dig through the code if you don't want to.

@thomaseizinger
Contributor

Going to start on this once #3011 is merged to avoid the churn.

@mxinden
Member Author

mxinden commented Nov 4, 2022

I think we should design this with a concrete use case in mind. @divagant-martian, would you volunteer to oversee this from the lighthouse user perspective?

@thomaseizinger can you keep @divagant-martian in the loop?

@divagant-martian
Contributor

@mxinden for sure! I'm also relatively familiar with substrate's peer management and can probably get in touch with the iroh guys to check which needs we have in common and where they differ.

@thomaseizinger
Contributor

NetworkBehaviour already has quite a few callbacks, although we are getting rid of some of them in #3011. Thus, I am a bit hesitant to continue in the direction of #2828.

So here is an alternative idea:

Make IntoConnectionHandler fallible

NetworkBehaviour::new_handler is already invoked for every incoming connection. If a NetworkBehaviour would like to impose a policy on which connections should be established, it can pass the required data to a prototype implementation of IntoConnectionHandler and make the decision as soon as into_handler is called with the PeerId etc.
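A rough sketch of that idea follows. The names (`Prototype`, `MyHandler`, `DenyConnection`) are made up for illustration and this is not the actual `IntoConnectionHandler` trait; the point is that the prototype carries the data it needs (here, a set of denied peers copied in by `new_handler`) and that the conversion into the real handler becomes fallible.

```rust
use std::collections::HashSet;

use libp2p::PeerId;

/// Stand-in for a real `ConnectionHandler` implementation.
struct MyHandler;

/// Hypothetical error signalling that the connection should be closed.
struct DenyConnection;

/// Handler prototype produced by `NetworkBehaviour::new_handler`, carrying a
/// snapshot of the data needed for the decision.
struct Prototype {
    denied_peers: HashSet<PeerId>,
}

impl Prototype {
    /// Fallible counterpart of `IntoConnectionHandler::into_handler`, called
    /// once the remote `PeerId` is known.
    fn into_handler(self, remote: &PeerId) -> Result<MyHandler, DenyConnection> {
        if self.denied_peers.contains(remote) {
            return Err(DenyConnection);
        }
        Ok(MyHandler)
    }
}
```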

There are a few drawbacks with the current design but I think those are all fixable:

  1. If somehow possible, I'd like to get rid of IntoConnectionHandler and delay calling new_handler until we have established the connection. This would allow us to make new_handler directly fallible, which would make this feature much easier to use. Additionally, it means we won't run into race conditions with simple connection-limit managers, as new_handler is called with &mut self and will thus be called sequentially for each connection.
  2. NetworkBehaviourAction::Dial currently also emits an entire handler. To make sure connection limits are enforced in a single place, I'd replace this mechanism with an "OpenInfo" style where Dial can carry some "OpenInfo" struct that is passed back in new_handler.

I see the following benefits of this design:

  • Good use of the type system: We need a ConnectionHandler for each connection. If we fail to construct one, we can never actually establish the connection.
  • Synchronous decision making: Async decision making would require some kind of timeout so we would potentially hold every connection for some amount of time before we can consider it to be established.
  • Small API surface: Very little change to the existing API of NetworkBehaviour.

cc @divagant-martian @dignifiedquire @rkuhn @mxinden

@divagant-martian
Contributor

The proposal would work very well for us and save a lot of time spent on disconnecting peers we were not interested in having in the first place 👍

@mxinden
Member Author

mxinden commented Nov 8, 2022

  1. If somehow possible, I'd like to get rid of IntoConnectionHandler and delay calling new_handler until we have established the connection. This would allow us to make new_handler directly fallible, which would make this feature much easier to use. Additionally, it means we won't run into race conditions with simple connection-limit managers, as new_handler is called with &mut self and will thus be called sequentially for each connection.

I am in favor of getting rid of IntoConnectionHandler as in my eyes it adds a lot of complexity.

NetworkBehaviourAction::Dial currently also emits an entire handler. To make sure connection limits are enforced in a single place, I'd replace this mechanism with an "OpenInfo" style where Dial can carry some "OpenInfo" struct that is passed back in new_handler.

It would be wonderful to have only a single mechanism to create a new handler.

Synchronous decision making: Async decision making would require some kind of timeout so we would potentially hold every connection for some amount of time before we can consider it to be established.

In my eyes this is the way to go. We also achieved consensus on this in #2118 (comment).

@thomaseizinger
Contributor

I've put up a draft PR here that aims to build the foundation for the above idea.

@thomaseizinger
Contributor

The proposal would work very well for us and save a lot of time spent on disconnecting peers we were not interested in having in the first place +1

A note on this: From the other party's perspective, it is still a regular disconnect, it just happens automatically within the Swarm.

@mxinden
Member Author

mxinden commented Nov 9, 2022

We do still need a mechanism for a NetworkBehaviour to allow or deny pending inbound or outbound connections.

(In #2828 that happens via new methods on NetworkBehaviour.)

@divagant-martian
Contributor

@mxinden wouldn't it be enough to make the new-handler function fallible? As I understand it, this is part of the proposal. Am I missing something?

@thomaseizinger
Contributor

Unless I am missing something, the only thing not possible with my proposal is blocking a new incoming connection before the upgrade process is finished.

For any policy that purely operates on the number of connections, these upgrades would be wasted (plus they consume resources).

@thomaseizinger
Contributor

We could retain a prototype-based system where creating a handler is a two-step process for inbound connections:

  • Replace new_inbound_handler with on_new_pending_connection that returns a Result.
  • The Ok of said result is a handler prototype.
  • The handler prototype has a fallible conversion into the actual handler.

The first failure point can be used for pending inbound connections before the upgrade.
The second failure point can be used for established connections and policies like banning by peer ID.
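A sketch of that two-step flow, again with purely illustrative names: a counting policy fails at the first step, before any upgrade work is spent, while a ban-list policy fails at the second step, once the `PeerId` is known.

```rust
use libp2p::{Multiaddr, PeerId};

struct Denied;

/// Illustrative behaviour enforcing an inbound-connection limit plus a ban list.
struct Behaviour {
    current_inbound: usize,
    max_inbound: usize,
    banned_peers: Vec<PeerId>,
}

/// Prototype returned by the first step, carrying the data for the second.
struct HandlerPrototype {
    banned_peers: Vec<PeerId>,
}

struct Handler; // stand-in for a real `ConnectionHandler`

impl Behaviour {
    /// Step 1: hypothetical `on_new_pending_connection`, called before the
    /// upgrade. Counting policies fail here, so no upgrade work is wasted.
    fn on_new_pending_connection(&mut self, _remote: &Multiaddr) -> Result<HandlerPrototype, Denied> {
        if self.current_inbound >= self.max_inbound {
            return Err(Denied);
        }
        Ok(HandlerPrototype {
            banned_peers: self.banned_peers.clone(),
        })
    }
}

impl HandlerPrototype {
    /// Step 2: fallible conversion into the actual handler, called once the
    /// connection is established and the remote `PeerId` is known.
    fn into_handler(self, remote: &PeerId) -> Result<Handler, Denied> {
        if self.banned_peers.contains(remote) {
            return Err(Denied);
        }
        Ok(Handler)
    }
}
```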

@divagant-martian
Contributor

Right, that makes sense and would tackle both cases. Would that be reasonable to add to your current work @thomaseizinger or do you think it should be an effort for a second iteration?

@mxinden
Member Author

mxinden commented Nov 9, 2022

The second failure point can be used for established connections and policies like banning by peer ID.

But that second failure point needs some synchronization mechanism back to a single point of truth, likely living in the NetworkBehaviour, i.e. some central place e.g. counting the number of inbound connections.

Removing the IntoConnectionHandler indirection, adding a NetworkBehaviour::on_pending_connection and returning a ConnectionHandler via NetworkBehaviour::new_handler would resolve the need for a synchronization mechanism.

@thomaseizinger
Contributor

The second failure point can be used for established connections and policies like banning by peer ID.

But that second failure point needs some synchronization mechanism back to a single point of truth, likely living in the NetworkBehaviour, i.e. some central place e.g. counting the number of inbound connections.

Counting policies would use the first failure point; data-driven policies like allow/ban lists would use the latter one.

List-based policies don't need synchronisation.

Removing the IntoConnectionHandler indirection, adding a NetworkBehaviour::on_pending_connection and returning a ConnectionHandler via NetworkBehaviour::new_handler would resolve the need for a synchronization mechanism.

The argument for synchronisation with NB stems from it having more knowledge, right? But a pending connection doesn't provide any information apart from the incoming multiaddress. This makes me think that policies that need to take pending connections into account are likely going to be resource-based, e.g. RAM usage, number of connections, etc.

Pushing these policies into NB doesn't feel right.

  1. NBs are composed but resource-based policies are likely global.
  2. The owner of Swarm likely knows better how constrained the resources are.

This makes me think that policing pending connections should either happen in the Swarm or some other, non-composed module that a Swarm can be configured with. Perhaps it should even happen on a Transport level as a Transport wrapper?

@thomaseizinger
Contributor

I've put up a draft PR here that aims to build the foundation for the above idea.

Removing IntoConnectionHandler is proving difficult (see PR comments).

I am tempted to build a version of this that doesn't remove IntoConnectionHandler but makes the conversion fallible instead.

@divagant-martian Would that be sufficient for your use case?

A connection management behaviour would have to create ConnectionHandler prototypes that have the necessary data embedded to make a decision about the incoming connection.

For example, if you have a list of banned peers, you'd copy this list into the prototype upon constructing it in new_handler.

@thomaseizinger
Contributor

I put up a draft PR here for what this could look like: #3118

@thomaseizinger
Contributor

I now have a fully working version of the current codebase without the IntoConnectionHandler abstraction: #3099

I think this can be merged as an independent improvement. We can then later decide how we want to manage pending connections.

mergify bot pushed a commit that referenced this issue Jan 26, 2023
Previously, we used the full reference to the `OutEvent` of the `ConnectionHandler` in all implementations of `NetworkBehaviour`. Not only is this very verbose, it is also more brittle to changes. With the current implementation plan for #2824, we will be removing the `IntoConnectionHandler` abstraction. Using a type-alias to refer to the `OutEvent` makes the migration much easier.
mergify bot closed this as completed in #3254 on Feb 23, 2023
mergify bot pushed a commit that referenced this issue Feb 23, 2023
Previously, a `ConnectionHandler` was immediately requested from the `NetworkBehaviour` as soon as a new dial was initiated or a new incoming connection accepted.

With this patch, we delay the creation of the handler until the connection is actually established and fully upgraded, i.e. authenticated and multiplexed.

As a consequence, `NetworkBehaviour::new_handler` is now deprecated in favor of a new set of callbacks:

- `NetworkBehaviour::handle_pending_inbound_connection`
- `NetworkBehaviour::handle_pending_outbound_connection`
- `NetworkBehaviour::handle_established_inbound_connection`
- `NetworkBehaviour::handle_established_outbound_connection`

All callbacks are fallible, allowing the `NetworkBehaviour` to abort the connection either immediately or after it is fully established. All callbacks also receive a `ConnectionId` parameter which uniquely identifies the connection. For example, in case a `NetworkBehaviour` issues a dial via `NetworkBehaviourAction::Dial`, it can unambiguously detect this dial in these lifecycle callbacks via the `ConnectionId`.

Finally, `NetworkBehaviour::handle_pending_outbound_connection` also replaces `NetworkBehaviour::addresses_of_peer` by allowing the behaviour to return more addresses to be used for the dial.

Resolves #2824.

Pull-Request: #3254.
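To make the merged design concrete, here is a sketch of a deny-list style policy expressed against these callbacks. It is not a complete `NetworkBehaviour` implementation (the handler type, `poll` and the event callbacks are omitted), and the shapes below follow the PR as merged; exact signatures may differ between libp2p versions.

```rust
use std::collections::HashSet;

use libp2p::swarm::{dummy, ConnectionDenied, ConnectionId};
use libp2p::{Multiaddr, PeerId};

struct DenyList {
    blocked_addrs: HashSet<Multiaddr>,
    blocked_peers: HashSet<PeerId>,
}

impl DenyList {
    /// Mirrors `NetworkBehaviour::handle_pending_inbound_connection`: the
    /// `PeerId` is not known yet, so only address-based checks are possible.
    fn handle_pending_inbound_connection(
        &mut self,
        _id: ConnectionId,
        _local_addr: &Multiaddr,
        remote_addr: &Multiaddr,
    ) -> Result<(), ConnectionDenied> {
        if self.blocked_addrs.contains(remote_addr) {
            return Err(ConnectionDenied::new("address is on the deny list"));
        }
        Ok(())
    }

    /// Mirrors `NetworkBehaviour::handle_established_inbound_connection`: the
    /// connection is fully upgraded, so the `PeerId` is available and a
    /// handler must be returned on success.
    fn handle_established_inbound_connection(
        &mut self,
        _id: ConnectionId,
        peer: PeerId,
        _local_addr: &Multiaddr,
        _remote_addr: &Multiaddr,
    ) -> Result<dummy::ConnectionHandler, ConnectionDenied> {
        if self.blocked_peers.contains(&peer) {
            return Err(ConnectionDenied::new("peer is on the deny list"));
        }
        Ok(dummy::ConnectionHandler)
    }
}
```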
@mxinden
Member Author

mxinden commented Feb 28, 2023

We had multiple requests for rust-libp2p to be able to prevent dialing any private IP addresses. Thus far we have pointed them to the following Transport wrapper:

https://github.com/mxinden/kademlia-exporter/blob/master/src/exporter/client/global_only.rs

Somehow I was operating under the assumption that we can now use the generic NetworkBehaviour connection-management system to implement the above. More specifically, one would return an Error from NetworkBehaviour::handle_pending_outbound_connection.

Unfortunately this is not quite possible. Say a NetworkBehaviour emits a dial request with one private and one global IP address. A connection-management NetworkBehaviour can either deny all of the addresses or allow all of them via NetworkBehaviour::handle_pending_outbound_connection, but it cannot deny the private address while allowing the global one.

Should we support the above use-case, e.g. by returning both a set of additional addresses and a set of blocked addresses from NetworkBehaviour::handle_pending_outbound_connection? Or should we just continue recommending the above Transport wrapper, thus blocking private IP addresses at the Transport layer instead?

I am leaning towards the latter, i.e. not extending NetworkBehaviour::handle_pending_outbound_connection but instead providing the GlobalIpOnly Transport implementation in the rust-libp2p mono-repository.

@thomaseizinger any thoughts on this?

@thomaseizinger
Contributor

That is something that would be nice to support. It is a bit tricky though.

Even if we pass a mutable reference of the existing addresses into the behaviour, you can never guarantee that another behaviour doesn't add more non-global addresses.

How effective such global-IP address management is depends on how you compose your behaviours, which is quite the foot-gun.

In this case, refusing the dial on the Transport layer is much more effective.

@thomaseizinger
Contributor

Maybe we've mixed too many concerns here by merging addresses_of_peer into handle_pending_outbound_connection. If we were to undo that, we could gather all addresses beforehand and then pass a custom collection type to handle_pending_outbound_connection that only allows removing addresses, not adding new ones (a &mut [Multiaddr] would be too permissive).
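A sketch of such a removal-only collection; the type and method names below are made up for illustration.

```rust
use libp2p::Multiaddr;

/// Hypothetical collection passed to `handle_pending_outbound_connection`:
/// behaviours can inspect and remove candidate addresses, but the API offers
/// no way to add new ones.
pub struct DialAddresses(Vec<Multiaddr>);

impl DialAddresses {
    /// Inspect the remaining candidate addresses.
    pub fn iter(&self) -> impl Iterator<Item = &Multiaddr> + '_ {
        self.0.iter()
    }

    /// Keep only the addresses for which `keep` returns `true`.
    pub fn retain(&mut self, keep: impl FnMut(&Multiaddr) -> bool) {
        self.0.retain(keep);
    }
}
```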

@mxinden
Member Author

mxinden commented Mar 2, 2023

Agreed with the concerns above. I suggest continuing with the custom Transport implementation for now.
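For reference, the core of such a Transport wrapper is an address check before dialing. A simplified sketch of that check follows (the helper name is made up, and the classification is intentionally incomplete, e.g. it ignores IPv6 unique-local ranges); a wrapper Transport would refuse to dial any address for which it returns true.

```rust
use libp2p::multiaddr::{Multiaddr, Protocol};

/// Simplified check: does the address start with a non-global IP?
/// A real implementation covers more ranges than the few common ones below.
fn is_non_global(addr: &Multiaddr) -> bool {
    match addr.iter().next() {
        Some(Protocol::Ip4(ip)) => {
            ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
        }
        Some(Protocol::Ip6(ip)) => ip.is_loopback() || ip.is_unspecified(),
        _ => false,
    }
}
```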

mergify bot pushed a commit that referenced this issue Mar 21, 2023
This patch deprecates the existing connection limits within `Swarm` and uses the new `NetworkBehaviour` APIs to implement it as a plugin instead.

Related #2824.

Pull-Request: #3386.
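For context, the plugin introduced in that PR is composed into the user's behaviour roughly like this (builder names as of the release that shipped it; they may differ in later versions):

```rust
use libp2p::connection_limits::{self, ConnectionLimits};

fn main() {
    // Build the limits and wrap them in the plugin behaviour; it denies new
    // connections in its `handle_*` callbacks once a limit is exceeded.
    let limits = ConnectionLimits::default()
        .with_max_pending_incoming(Some(10))
        .with_max_established_incoming(Some(50))
        .with_max_established_per_peer(Some(2));
    let _behaviour = connection_limits::Behaviour::new(limits);
    // In a real application, this behaviour becomes a field of the derived
    // `NetworkBehaviour`, alongside the other protocol behaviours.
}
```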
mergify bot pushed a commit that referenced this issue Mar 21, 2023
Currently, banning peers is a first-class feature of `Swarm`. With the new connection management capabilities of `NetworkBehaviour`, we can now implement allow and block lists as a separate module.

We introduce a new crate `libp2p-allow-block-list` and deprecate `Swarm::ban_peer_id` in favor of that.

Related #2824.

Pull-Request: #3590.
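And a sketch of the replacement for `Swarm::ban_peer_id` using the new crate (API names as of its initial release; they may differ in later versions):

```rust
use libp2p::allow_block_list::{self, BlockedPeers};
use libp2p::PeerId;

fn main() {
    // The block-list behaviour starts out empty and can be updated at runtime.
    let mut blocked = allow_block_list::Behaviour::<BlockedPeers>::default();

    let peer = PeerId::random();
    blocked.block_peer(peer); // connections to/from `peer` are now denied
    blocked.unblock_peer(peer);
    // As with connection limits, this behaviour is composed into the
    // application's derived `NetworkBehaviour`.
}
```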