bug: discovery cannot find peers after network disconnect #2258

Closed
walldiss opened this issue May 26, 2023 · 7 comments · Fixed by #2263
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@walldiss
Member

walldiss commented May 26, 2023

Problem

My network went offline for a few seconds while I was running a node. This caused two problems related to discovery:

  • The network disconnect caused discovery to lose all peers, and the peer count never recovered until the node was restarted.
  • After the restart, discovery found enough peers but lost them shortly afterwards to peer-enforced disconnects. Even though the discovered peers were disconnected, shrinking the discovery set, no new discovery search was triggered.
[Screenshots: 2023-05-26 at 16:25:47 and 2023-05-26 at 16:27:59]

log.txt

The problem in discovery is that the trigger for a new search does not work correctly. Possible solutions:

  • A quick fix would be to trigger discovery periodically.
  • Debug the triggering condition logic and find the root cause.
@walldiss added the bug and enhancement labels May 26, 2023
@Wondertan
Member

I remember @distractedm1nd had a similar issue. We concluded that we were out of peers as they all got disconnected and in the backoff. Could this explain this issue?

@Wondertan
Member

> Even tho discovered peers has been disconnected, lowering size of discovery set, the new discovery search was not triggered.

How do you know the discovery was not triggered?

@Wondertan
Member

> peer enforced disconnects shortly after

What do you mean by "peer enforced"? Remote peers disconnected from your node?

@walldiss
Member Author

> I remember @distractedm1nd had a similar issue. We concluded that we were out of peers as they all got disconnected and in the backoff. Could this explain this issue?

I think the answer is no. I've made a metric for it.
[image: screenshot of the metric]

> How do you know the discovery was not triggered?

There are no discovery events in the logs, which are attached to the issue.

> What do you mean by "peer enforced"? Remote peers disconnected from your node?

Yes, peers disconnected from my node. Check the removed-peer metric.

@Wondertan
Member

Ok. Makes sense. I am inclined to investigate the code, and if we find no answer by Monday, we can do a quick fix.

Wondertan added a commit that referenced this issue May 30, 2023
…peration (#2263)

Closes #2258

Two cases were possible:
* Sometimes, the discovery is not triggered, and memorizing triggers might help
	* It's an unconfirmed theory, and we are still determining if it fixes anything.
	* We considered a case with @walldiss, but it should not happen.
* FindPeers can sometimes get stuck indefinitely, which stops the whole discovery. We should stop and restart it.
@Wondertan
Member

Reopening, as we are not sure that it's fixed: we removed the esoteric channel buffer fix in #2258 because there was no explanation of why it would work.

@Wondertan reopened this May 30, 2023
Wondertan added a commit that referenced this issue May 30, 2023
…peration (#2263)
@ramin
Contributor

ramin commented Mar 12, 2024

no longer happening

@ramin closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 12, 2024