Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm/: Never call Behaviour::inject_event without a previous Behaviour::inject_connection_established #2337

Closed
mxinden opened this issue Nov 11, 2021 · 10 comments · Fixed by #2350

Comments

@mxinden
Copy link
Member

mxinden commented Nov 11, 2021

Imagine the following scenario:

  1. The user calls Swarm::ban_peer_id for peer X.
  2. The Swarm receives a NetworkEvent::IncomingConnection for a not-yet authenticated peer.
  3. The Swarm calls this.network.accept to accept the connection and thus starts the upgrade process.
  4. The Swarm receives a NetworkEvent::ConnectionEstablished from the network for the previously accepted connection. The connection appears to be to peer X which is marked as banned and thus the Swarm starts closing the connection. The Swarm does not call NetworkBehaviour::inject_connection_established.
  5. The Swarm receives a NetworkEvent::ConnectionEvent from the network from the now closing connection to peer X. The Swarm calls NetworkBehaviour::inject_event.

The above leads to NetworkBehaviour::inject_event being called without a previous NetworkBehaviour::inject_connection_established. This violates the API contract between Swarm and NetworkBehaviour. E.g. libp2p-identify expects NetworkBehaviour::inject_event to never be called without a previous NetworkBehaviour::inject_connection_established.

.expect(
"`inject_event` is only called with an established connection \
and `inject_connection_established` ensures there is an entry; qed",
);

I see multiple ways moving forward, ordered by preference:

  1. Track the connection state (Established and Closing) in EstablishedConnectionInfo (pool.rs) and ignore any task::EstablishedConnectionEvent::Notify from the connection in case it is in Closing state.
  2. Check Swarm::banned_peers before forwarding an event from the network to the behaviour in Swarm (NetworkEvent::ConnectionEvent).
  3. Drop the promise of Swarm to never call NetworkBehaviour::inject_event without a previous NetworkBehaviour::inject_connection_established and adjust libp2p-identify accordingly.

Credit goes to @divagant-martian for associating the libp2p-identify panic with the Swarm banning system.

//CC @AgeManning

What do folks think?

@AgeManning
Copy link
Contributor

My preference is not to do 3. I think it would be counter-intuitive and lead to more bugs when people build new behaviours.

I agree with your preferences :).

@divagant-martian
Copy link
Contributor

Agree on crossing opt 3 out.
Once we receive an event, we check that the connection is present in the pool here I think the most straightforward and correct way to handle it is to do 2) and check here that the connection is (semantically) open/not closed

@divagant-martian
Copy link
Contributor

divagant-martian commented Nov 16, 2021

Also, doing some tests I see that inject_connection_closed is called too for the banned peer, for which inject_connection_established was never called. I find this one a bit more complicated since the ban state can come and go, and for a currently connected peer, there can be connections for which inject_connection_established was called and some for which not. What do you think?

@MarcoPolo
Copy link
Contributor

@divagant-martian Are you working on this? If not, do you mind if I pick this up? It sounds like we have a rough consensus on the solution here. But to be a bit clearer here's what I'm thinking:

Laws (not sure if this is currently true, but I'd endeavor to make them true):

  • Banning a peer will result in a disconnect from the peer if you are currently connected.
  • Banning has no effect on a currently connected peer, except for the disconnect mentioned above. This means if you ban an active peer, you will get inject_connection_closed for that peer when the connection is closed.
  • Once a peer is banned, a network behavior will never receive any events from that peer.

@AgeManning
Copy link
Contributor

@MarcoPolo

This issue is rather time-sensitive for us, because the current master-libp2p branch panics when using identify. I think @divagant-martian has made progress (i'll let her chime in) but this is something we'd like to remedy quickly (I'm not sure about how quickly you think you would resolve this one).

@MarcoPolo
Copy link
Contributor

That sounds good. This seems pretty important so I also wanted to make sure someone was working on it

@mxinden
Copy link
Member Author

mxinden commented Nov 18, 2021

Capturing a conversation with @divagant-martian out-of-band.

Most favorable option at the moment.

When banning a peer:

  1. Track the connections that have already been reported as open to the NetworkBehaviour for said peer.
    Basically:
    diff --git a/swarm/src/lib.rs b/swarm/src/lib.rs
    index 0059a21a..ad4ee426 100644
    --- a/swarm/src/lib.rs
    +++ b/swarm/src/lib.rs
    @@ -273,7 +273,7 @@ where
         external_addrs: Addresses,
     
         /// List of nodes for which we deny any incoming connection.
    -    banned_peers: HashSet<PeerId>,
    +    banned_peers: HashMap<PeerId, HashSet<ConnectionId>>,
  2. Close the connections.
  3. When receiving an event from a banned peer, drop it.
  4. When receiving a connection close event for a connection of a banned peer, report it to the NetworkBehaviour if and only if it is tracked as one of the previously reported as open connections. Remove the connection from Swarm::banned_peers.

This should uphold all laws listed above by @MarcoPolo.

@divagant-martian will work on this 🙏. Please keep us posted on the progress here @divagant-martian.

@divagant-martian
Copy link
Contributor

Since closing a connection is not immediate, tracking allowed connections for banned peers is still open to at least one other race condition that I can identify:

  1. Peer is not banned, it's connected with connection 1. This connection was reported.
  2. Peer is banned. It's added to banned peers and it's connection 1 is stored as previously allowed. start_close is called for 1.
  3. A connection for the peer is received, and not reported, since it's banned. Let's call this one 2
  4. Peer is un-banned. The entry for the peer is removed. Here we lost the information about 1 being allowed and 2 not being allowed.
  5. Both connections finish their closing. The swarm is notified about this. We can now no longer decide correctly if the connection was allowed or not.

When suggesting to keep track of allowed connections, I would suggest doing it on a per-connection basis, instead on a per peer one. Tracking this can be done in a synchronous manner and it targets directly the root of the issue.

Thoughts?

@MarcoPolo
Copy link
Contributor

I think that makes sense. To extend the laws above by one more constraint:

  • unbanning has no effect on currently active connections. Just like banning has no effect on currently active connects, except for the start of the disconnect.

Tracking this per connection is probably the simplest way to implement these laws correctly. Abstractly:

  • If a peer is banned, all active connections are still in the "unbanned state", but the connections are made to disconnect.
  • If a peer is banned, all new connections will be in the "banned state" and thus not emit any events to the network behavior.
  • If a peer is unbanned (and previously banned), all active connections are still in the "banned state".
  • If a peer is unbanned, all new connections will be in the "unbanned state".

Does that align with your thoughts?

@divagant-martian
Copy link
Contributor

Yeah, I think we are on the same page. Seems straightforward to implement. I'll send a PR in a couple of hours

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants