Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer: send reestablish, shutdown messages before starting writeHandler #8186

Merged
merged 1 commit into from
Nov 16, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 23 additions & 17 deletions peer/brontide.go
Original file line number Diff line number Diff line change
Expand Up @@ -703,6 +703,23 @@ func (p *Brontide) Start() error {

p.startTime = time.Now()

// Before launching the writeHandler goroutine, we send any channel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we abstract out loadActiveChannels, or split it into: load chans and load messages, then we can write a unit test here to try to trigger the panic. IIUC, it relies on concurrent access to the brontide state machine, so the two goroutines need to line up directly. This may be easier to trigger with the -race flag on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a unit test that detects the race. Will send a PR once I clean it up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// sync messages that must be resent for borked channels. We do this to
// avoid data races with WriteMessage & Flush calls.
if len(msgs) > 0 {
p.log.Infof("Sending %d channel sync messages to peer after "+
"loading active channels", len(msgs))

// Send the messages directly via writeMessage and bypass the
// writeHandler goroutine.
for _, msg := range msgs {
if err := p.writeMessage(msg); err != nil {
return fmt.Errorf("unable to send "+
"reestablish msg: %v", err)
}
}
}

err = p.pingManager.Start()
if err != nil {
return fmt.Errorf("could not start ping manager %w", err)
Expand All @@ -717,23 +734,6 @@ func (p *Brontide) Start() error {
// Signal to any external processes that the peer is now active.
close(p.activeSignal)

// Now that the peer has started up, we send any channel sync messages
// that must be resent for borked channels.
if len(msgs) > 0 {
p.log.Infof("Sending %d channel sync messages to peer after "+
"loading active channels", len(msgs))

// Send the messages directly via writeMessage and bypass the
// writeHandler goroutine to avoid cases where writeHandler
// may exit and cause a deadlock.
for _, msg := range msgs {
if err := p.writeMessage(msg); err != nil {
return fmt.Errorf("unable to send reestablish"+
"msg: %v", err)
}
}
}

// Node announcements don't propagate very well throughout the network
// as there isn't a way to efficiently query for them through their
// timestamp, mostly affecting nodes that were offline during the time
Expand Down Expand Up @@ -2093,6 +2093,12 @@ func (p *Brontide) logWireMessage(msg lnwire.Message, read bool) {
// message buffered on the connection. It is safe to call this method again
// with a nil message iff a timeout error is returned. This will continue to
// flush the pending message to the wire.
//
// NOTE:
// Besides its usage in Start, this function should not be used elsewhere
// except in writeHandler. If multiple goroutines call writeMessage at the same
// time, panics can occur because WriteMessage and Flush don't use any locking
// internally.
func (p *Brontide) writeMessage(msg lnwire.Message) error {
// Only log the message on the first attempt.
if msg != nil {
Expand Down
Loading