Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict Dials From Discovery #14052

Merged
merged 11 commits into from
May 30, 2024
Merged

Restrict Dials From Discovery #14052

merged 11 commits into from
May 30, 2024

Conversation

nisdas
Copy link
Member

@nisdas nisdas commented May 28, 2024

What type of PR is this?

Bug Fix

What does this PR do? Why is it needed?

There have been longstanding reports of windows users randomly losing peers over the years, this had been chalked down to an improperly configured firewall, timeserver or antivirus. Unfortunately over the years as the network has grown along with more intensive requirements in peer search for nodes, this has continued to happen more. It appears that a recent windows update has exacerbated this problem by quite a bit. This PR tries to fix this once and for all:

  • For discovery, we have an aggressive peer search algorithm. This was very useful in finding peers quickly for a recently booted node. However windows as an OS doesn't like the aggressive discovery search too much and randomly decides to block outgoing discovery packets from prysm. This PR makes it less aggressive, so it will take longer for nodes to find their desired number of peers

  • Add a flag to restrict the maximum amount of concurrent dials(-max-concurrent-dials). In the event users continue running into this, they can run with this flag at a smaller value(5,10, etc). This should stop triggering the OS at the cost of discovery being a lot slower along with possible reduced validator performance.

Which issues(s) does this PR fix?

Fixes #14025 #13936 #13431 #8144

Other notes for review

@nisdas nisdas added Ready For Review A pull request ready for code review Priority: High High priority item Networking P2P related items Windows Anything related only to Windows OS labels May 28, 2024
@nisdas nisdas requested a review from a team as a code owner May 28, 2024 13:04
nisdas and others added 2 commits May 29, 2024 12:30
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
func (s *Service) wantedPeerDials() int {
maxPeers := int(s.cfg.MaxPeers)

activePeers := len(s.Peers().Active())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would'nt be safer to convert activePeers to uint instead converting s.cfg.MaxPeers to int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result would be the same as these are all preconfigured values at startup. Any edge cases we hit here would require a user wildly misconfiguring their node.

// Do not excessively perform lookups if we constantly receive non-viable peers.
if lookupCounter > backOffCounter {
lookupCounter = 0
runtime.Gosched()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call still useful with latest versions of golang?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why wouldn't it be useful with current versions of golang ?

@nisdas nisdas added this pull request to the merge queue May 30, 2024
Merged via the queue into develop with commit 7a4ecb6 May 30, 2024
17 checks passed
@nisdas nisdas deleted the addSubnetDialBackoff branch May 30, 2024 07:04