
write: Protect against sending messages after close #476

Open

wants to merge 5 commits into base: master
Conversation

@FrauElster commented Sep 5, 2024

Closes #448

@mafredri (Member) commented Sep 5, 2024

Hey, thanks for the PR. I do think we should take a slightly different approach, though. Instead of piggy-backing on closeMutex in writeClose and potentially introducing deadlocks due to multiple consumers of the mutex, we should move the logic into writeFrame instead. Then we can rely on the already existing writeFrameMu and set a boolean for the close when opcode == opClose.

This has another nice benefit of catching the close message wherever it may originate from.

Edit: Looks like CI failed with a test timeout so this did indeed introduce a deadlock: https://github.com/coder/websocket/actions/runs/10717638649/job/29717749114?pr=476#step:4:97
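To sketch the idea in isolation: a self-contained, illustrative stand-in, not the library's actual internals (frameWriter, errAlreadyClosed and the opcode constants below are made up for this example):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Illustrative stand-ins; not the library's actual identifiers.
var errAlreadyClosed = errors.New("websocket: close frame already sent")

type opcode int

const (
	opText  opcode = 1
	opClose opcode = 8
)

// frameWriter sketches the idea: a single mutex already guards every frame
// write, so the same critical section can record when a close frame goes out.
type frameWriter struct {
	mu        sync.Mutex // plays the role of writeFrameMu
	closeSent bool
}

func (w *frameWriter) writeFrame(op opcode, payload []byte) error {
	w.mu.Lock()
	defer w.mu.Unlock()

	// Reject any frame after a close frame has been sent,
	// no matter where the write originated.
	if w.closeSent {
		return errAlreadyClosed
	}
	if op == opClose {
		w.closeSent = true
	}
	// ... actual frame serialization would happen here ...
	_ = payload
	return nil
}

func main() {
	w := &frameWriter{}
	fmt.Println(w.writeFrame(opClose, nil))         // <nil>
	fmt.Println(w.writeFrame(opText, []byte("hi"))) // close frame already sent
}
```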

@mafredri self-requested a review September 5, 2024 10:18
@mafredri changed the title from "adds a lock on writeClose. closes #448" to "close: Add a lock on writeClose" on Sep 5, 2024
@FrauElster (Author)

Sure, great idea. I just hit this bug and needed it fixed, and in my experience fixing it and opening a PR is the fastest way. I did not read or understand the whole codebase; I just stepped through it with my debugger and implemented this quick-and-dirty solution.

Just for clarification: when you say "we", do you mean you or me? Clearly you already have the solution in mind, but if you are busy I can also do it.

@FrauElster (Author)

As suggested, I moved the writeSent read/write into writeFrame.
However, I felt bad about failing silently if a double Close occurred, and it is not really a net.ErrClosed since the connection could still be open waiting for the close response, so I introduced a new ErrAlreadyClosed, though currently it is not handled anywhere.
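For illustration, this is roughly how I imagine a caller handling it. Note that ErrAlreadyClosed exists only on this branch and is not part of the released API, so this is a hypothetical sketch:

```go
package example

import (
	"errors"
	"log"

	"github.com/coder/websocket"
)

// closeQuietly shows the caller-side handling I had in mind.
// ErrAlreadyClosed exists only on this PR's branch; it is not part of the
// released coder/websocket API, so this will not compile against a release.
func closeQuietly(conn *websocket.Conn) {
	err := conn.Close(websocket.StatusNormalClosure, "")
	if err == nil || errors.Is(err, websocket.ErrAlreadyClosed) {
		return // closed now, or a close frame was already sent earlier
	}
	log.Printf("close failed: %v", err)
}
```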

@mafredri (Member) left a comment

Thanks for making the changes! I took the liberty of pushing a few myself to your branch; hope you don't mind. (See review comments.)

> Just for clarification: when you say "we", do you mean you or me? Clearly you already have the solution in mind, but if you are busy I can also do it.

My thinking is that I'm happy to let you work on this since you started the PR, and hopefully land a contribution to the project.

Side-note: I typically write/talk in "we"s, as I see coding as a group effort; this way both successes and failures are shared and not the responsibility of any one individual. My hope is that it also sparks inclusivity by making it feel like a group effort instead of one person making decisions.

TBH I think this PR is in pretty good shape now, but we should add at least one test covering this new behavior. Would you want to tackle that?

conn.go (review comment, resolved)
write.go (review comment, resolved)
@mafredri changed the title from "close: Add a lock on writeClose" to "write: Protect against sending messages after close" on Sep 6, 2024
@mitar commented Sep 8, 2024

Very interesting. I came to this PR because I am debugging our tests. In them we measure how many bytes we send over a websocket (using this library for both the client and the server in Go) and how many we receive, for a simple exchange where the client sends the text message hi and the server sends hi back.

With v1.8.10, before closing was rewritten, we had {"wsFromClient":16,"wsToClient":8} bytes transmitted. With v1.8.12, this changed to {"wsFromClient":16,"wsToClient":12}, which I understood to mean the server is now properly closing (hence the increase in wsToClient). But then I noticed it is not always like that: sometimes it is {"wsFromClient":24,"wsToClient":12}, apparently because the client sends close twice. So I found this PR, tried it, and with it I am back to {"wsFromClient":16,"wsToClient":8}, always. Not sure why not {"wsFromClient":16,"wsToClient":12}.
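For reference, one way to collect numbers like these is a thin net.Conn wrapper around the underlying connection. This is a minimal illustrative sketch (type and field names are made up, not our actual test code):

```go
package wscount

import (
	"net"
	"sync/atomic"
)

// countingConn wraps a net.Conn and tallies the bytes flowing in each
// direction, which is how figures like wsFromClient/wsToClient can be
// collected without touching the websocket library itself.
type countingConn struct {
	net.Conn
	read    atomic.Int64
	written atomic.Int64
}

func (c *countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.read.Add(int64(n))
	return n, err
}

func (c *countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	c.written.Add(int64(n))
	return n, err
}
```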

@FrauElster (Author)

Nice to hear that it makes things more predictable. :)

@mafredri Yeah, a test would be great. Sadly I am on a business trip until next Monday, so until then this PR will be stalled, unless you wouldn't mind picking it up.

@FrauElster (Author)

So I had a moment to have a look and wrote several approaches, but I have to admit that I find it kind of hard to test. I am not familiar enough with the code, so I'll describe some assumptions here.

My general idea is to open a server and a client connection, close one, and count the number of close messages received on both ends. Both counts are asserted to be one.

Assumption 1:
The problem is that the exposed methods are high-level and manage the closing handshake themselves without surfacing it to the caller (e.g. conn.Read()), so I cannot count close messages there.

Assumption 2:
A connection ignores any messages received after it was closed. They just get discarded, so I cannot count more than one received close message. (This behaviour differs, e.g., from Chrome's websocket client.)

I am tempted to add a private callback to the connection options, but introducing private dependency injection just to be able to test this seems like overkill and introduces too much complexity for the benefit (IMHO).

A second approach would be to write a separate minimal websocket client/server that can count, but that seems like a lot of work for such a simple test.

I saw that there are some helpers like EchoServer; one could introduce something similar that counts opcodes, but this would also need a separate implementation since it is not in the websocket package.

For now I am using my fork (thanks to go mod replace, that's pretty easy) to fix the problems I am currently experiencing in production.

Looking forward to getting some feedback on the testing issue.
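The closest I got while staying within the public API is a rough sketch like the following. It only checks that a double Close on the client is harmless and cannot actually count close frames on the wire, so names and structure are illustrative only:

```go
package websocket_test

import (
	"context"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
	"time"

	"github.com/coder/websocket"
)

// Rough sketch only: it checks that a second Close does not error out or
// hang, but it cannot observe how many close frames actually hit the wire.
func TestDoubleCloseSketch(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c, err := websocket.Accept(w, r, nil)
		if err != nil {
			return
		}
		// Block until the client's close frame arrives; the library then
		// completes the close handshake internally and Read returns an error.
		_, _, _ = c.Read(r.Context())
	}))
	defer srv.Close()

	c, _, err := websocket.Dial(ctx, "ws"+strings.TrimPrefix(srv.URL, "http"), nil)
	if err != nil {
		t.Fatal(err)
	}

	if err := c.Close(websocket.StatusNormalClosure, ""); err != nil {
		t.Fatalf("first close: %v", err)
	}
	// With this PR the second call should not write another close frame;
	// whether it returns nil or ErrAlreadyClosed is still an open question.
	_ = c.Close(websocket.StatusNormalClosure, "")
}
```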

@FrauElster (Author)

Another note:

With your last commit, my test run (go test -count=10 ./...) gives me "conn_test.go:623: failed to close WebSocket: failed to read frame header: EOF" again (sometimes). So the race condition seems to be back in the current state.

@mafredri (Member) commented Sep 11, 2024

> Another note:
>
> With your last commit, my test run (go test -count=10 ./...) gives me "conn_test.go:623: failed to close WebSocket: failed to read frame header: EOF" again (sometimes). So the race condition seems to be back in the current state.

Thanks @FrauElster, I can reproduce this. I took a look and I'm pretty sure the fix simply surfaced a new issue. Calling (*Conn).CloseRead(ctx) seems to interfere with the close handshake: it hijacks the reader so that the close handshake isn't propagated during the actual close. This in turn closes the connection early, so that (*Conn).Close() runs into io.EOF. (Parts of this could be wrong; I'm still investigating exactly what is happening.)
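For context, the pattern I mean looks roughly like this write-only consumer (illustrative only, not the exact failing test):

```go
package example

import (
	"context"

	"github.com/coder/websocket"
)

// writeOnly sketches the CloseRead pattern: CloseRead starts a goroutine
// that reads and discards incoming frames, which is what appears to get in
// the way of the close handshake during (*Conn).Close().
func writeOnly(ctx context.Context, c *websocket.Conn) error {
	ctx = c.CloseRead(ctx)

	if err := c.Write(ctx, websocket.MessageText, []byte("hi")); err != nil {
		return err
	}
	return c.Close(websocket.StatusNormalClosure, "")
}
```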

I appreciate you taking a look and thinking about ways to test for this. If you want, though, I can take over, as I want to look more closely at the way we close connections.

PS. Are you using CloseRead(ctx) yourself? And have you tried this branch as it is right now as a fix for the problems you're experiencing? If we disregard the failing test, have you seen any indication that this fix doesn't work in your use case?

@FrauElster (Author)

No, I don't use CloseRead(ctx) in my application. I just use the simple

msgType, data, err := conn.Read(ctx)
if websocket.CloseStatus(err) != -1 { ... }

or

conn.Close(websocket.StatusInternalError, string(reason))

@mafredri (Member)

@FrauElster I pushed a fix for the "failed to close WebSocket: failed to read frame header: EOF" issue. Let me know if you want me to open a separate PR since I'm pushing to your fork (I don't mind, but I don't want to cause conflicts for you).

@mitar I'm curious whether c3613fc causes any change in the behavior you're seeing. I have noticed that there are still cases where the writeTimeout and readTimeout contexts aren't cleared, which could potentially cause an abrupt closure of the connection. I'm not saying that's what's happening to you, just a potential cause.

@FrauElster (Author)

Sure, go ahead. My production go.mod is pinned to a commit, so no worries. ;)

Successfully merging this pull request may close these issues:

failed: Close received after close