-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Make non-validators listen on /ws by default #8609
Conversation
To clarify, this means that non-validators listen on both TCP and WebSockets by default, right? |
No, only WebSocket. |
Would there be an advantage to having both? Like, are any programs that interact with the node using raw TCP going to break? I think you should at least clarify this change. Maybe add a TLDR too as the changelog is pretty wordy |
Yes, a program that only supports libp2p using plain TCP connections and not WebSocket would break. I'm not aware of the existence of any such program. The problem with listening on both is that it requires node operators to open two ports instead of one, meaning two firewall rules. We currently raise an alert if no incoming connection arrives, in order to potentially detect firewall issues, but with our current tooling it isn't possible to raise an alert if for example only the WebSocket port is unreachable. Listening on the same port for either plain TCP connections or WebSocket is also generally not a good idea because it would be too magic. In general, I'm currently not aware of any drawback of listening only for WebSocket, apart from a very slight overhead, and what I describe in the section about DoS attacks. As for the TL;DR, I don't know what else to say than what the title of the PR doesn't say. This change is supposed to be transparent for users. |
Would this affect the way bootnodes get specified? ie they would need to have a |
Yes in theory. |
bot merge |
Trying merge. |
Bot will approve on the behalf of @tomaka, since they are a team lead, in an attempt to reach the minimum approval count |
Good bot |
Hi! @tomaka |
You need to use two different ports:
|
cc paritytech/polkadot-sdk#547
Context
The peer-to-peer endpoint of the client supports two modes: plain TCP connections, and WebSockets.
Multiaddresses that look like
/ip4/<ip>/tcp/<port>
use plain TCP connections, while multiaddresses that look like/ip4/<ip>/tcp/<port>/ws
use WebSockets.When using WebSockets, the Noise and Yamux protocols are applied on top of each WebSocket connection, like they are on plain TCP connections. The WebSockets "layer" is not a replacement for anything, but is inserted in the middle of the protocols stack.
The reason why WebSockets are supported in the first place is: light clients in browsers. It is part of the Web3 vision to directly connect to the network, without using an untrusted JSON-RPC endpoint, and for user-friendliness reasons we would like to be able to host web pages that directly access the blockchain network (as opposed to asking for users to launch a software on their desktop). By making nodes listen for incoming WebSockets connections, we make it possible for JavaScript in browsers to directly connect to them.
What this PR does
While WebSockets have been supported for a long time, they weren't enabled by default until now. This PR modifies the address the node is listening on by default. If
--validator
is passed, then things stay the same as right now, but if--validator
is not passed, then nodes will automatically listen for incoming WebSocket connections instead of plain TCP connections.(see below for the reason why
--validator
is taken into account)The port the nodes will listen on isn't changed, so any firewall rule that node operators might have put in place would still be relevant, and the node will still work. Switching from non-WebSocket to WebSocket basically only modifies the format of the data being sent and expected to be received.
Nodes support establishing outgoing WebSocket connections as well, so the network should continue to work without any issue. There will be a small disruption in the incoming connections for up to an hour after restarting your node to listen on WebSockets, as other nodes will have to refresh their DHT entries.
About certificates and DoS attacks
One concern you might have is: doesn't this open the door to easier DoS attacks?
Here are the rules that browsers have in place:
This PR doesn't add any certificate to nodes(*), and thus nodes are not reachable through Secure WebSocket connections.
In other words, the only situation where a webpage can connect to a node is if the website is 127.0.0.1, or if it's a browser extension requesting it.
For the light-client-in-browser use case (the reason for this PR), we are aiming for the latter: it's a browser extension that establishes outgoing connections to the network (and syncs the chain).
(*): It can be done by adding an nginx reverse proxy in front of the node. It has been done by our Parity devops and works.
The traditional DoS scenario is this: someone hacks a website with a moderate amount of user traffic, and injects some JavaScript that aggressively tries to connect to the network. Users would innocently visit the site and try to connect to the network, and this excess of traffic would cause problems for the targets. This, however, can't happen as browsers would refuse to establish the non-secure WebSocket connection necessary to establish a connection.
It is still possible to try this DoS attack with a large number of connections that we know will be refused by the target, but this is already a risk right now, and this PR doesn't change this in any way.
Instead of hacking a random website, someone could instead hack our browser extension to aggressively connect to the network. But for many reasons the risk is way smaller.
A risk that this PR introduces, however, is simply that a lot of genuine users accidentally overload the network through the connections from their browser extensions. Right now, our approach, maybe a bit naive, is to spread the load amongst the network and hope for the best. The more full nodes that are running the better, but at the moment full nodes aren't incentivized to serve these browser light clients.
Another risk is that light clients accidentally occupy all the incoming slots of a node. Again, I'm talking about accidents. "Attacking" a node by occupying all its slots is already possible before this PR, and should generally have no effect apart from degrading the connectivity between nodes.
What this PR modifies, however, is that the situation could happen accidentally with a large number of users using up most of the slots of most nodes, degrading connectivity between the nodes that matter.
We could imagine having slots for full nodes and slots for light clients, and I'll open an issue for that.
It is the reason (in addition to a bit of unjustified conservatism) why this PR doesn't change the behaviour when you pass
--validator
.Other drawback
If a node operator was starting their node with
--public-addr
but without--listen-addr
, then this PR is a breaking change and they will stop receiving incoming connections until they update the value passed to--public-addr
to include the missing/ws
.