Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHANGED] MQTT s.clear() do not wait for JS responses when disconnecting the session #5575

Merged
merged 6 commits into from
Jun 21, 2024

Conversation

levb
Copy link
Contributor

@levb levb commented Jun 20, 2024

MQTT s.clear(): do not wait for JS responses when disconnecting the session

Related to #5471

Previously we were making jsa.NewRequest as it is needed when connecting a clean session. On disconnect, there is no reason to wait for the response (and tie up the MQTT read loop of the client).

This should specifically help situations when a client app with many MQTT connections and QOS subscriptions disconnects suddenly, causing a flood of JSAPI deleteConsumer requests.

Test: n/a, not sure how to instrument for it.

@levb levb requested a review from a team as a code owner June 20, 2024 22:16
@levb levb marked this pull request as draft June 20, 2024 22:39
Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test TestMQTTSessionNotDeletedOnDeleteConsumerError fails because it was expecting an error that is no longer reported due to this change.

@@ -1000,7 +1000,7 @@ func (s *Server) mqttHandleClosedClient(c *client) {

// This needs to be done outside of any lock.
if doClean {
if err := sess.clear(); err != nil {
if err := sess.clear(true); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be invoked for "clean" sessions, which means sessions which state should be discarded and not reused in any subsequent session. Could there be a situation where a create/close/create causes an issue since some assets may not have been completely removed from the JS server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do anything with the error other than logging it anyway, so the logic doesn't change. Also, if the session is re-created as clean, we'd clean out its state on CONNECT. If it is re-created as persistent... 0/5 we could wipe out and re-create the consumers if there was a session message; but if we failed to delete the session message on the "clean" DISCONNECT... well, it still would not change the existing logic, would it?

@@ -3823,7 +3829,7 @@ CHECK:
// This Session lasts as long as the Network Connection. State data
// associated with this Session MUST NOT be reused in any subsequent
// Session.
if err := es.clear(); err != nil {
if err := es.clear(false); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least here we make sure that the delete is complete, which I think is the right thing to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I understand why checking the success is important here, thus preserved as before.

server/mqtt.go Show resolved Hide resolved
server/mqtt.go Outdated Show resolved Hide resolved
server/mqtt.go Outdated Show resolved Hide resolved
@levb levb requested a review from kozlovic June 20, 2024 23:15
@levb levb marked this pull request as ready for review June 20, 2024 23:19
server/mqtt.go Outdated Show resolved Hide resolved
server/mqtt.go Outdated Show resolved Hide resolved
@levb
Copy link
Contributor Author

levb commented Jun 21, 2024

@kozlovic Sorry, I misunderstood the failing test. I thought the error in the log was supposed to come from JetStream itself, where it is actually the very MQTT error I skipped that the test was expecting.

I need a little time to re-think how to do this, there can still be errors sending the request to delete the consumer, so the test is still valid.

@levb
Copy link
Contributor Author

levb commented Jun 21, 2024

@kozlovic here's my thinking.

0/5 for this PR we can keep the disconnect cleanup as opportunistic, and remove the failing TestMQTTSessionNotDeletedOnDeleteConsumerError. There really aren't any plausible errors in the "no wait" case, we just push a message onto the jsapi queue. So we will always attempt to delete the session message, just like before this PR.

Longer term:

  1. For clean sessions, can we make subscription consumers ephemeral, with a short inactive threshold, so that they get auto-collected if we fail to delete?
  2. For all new sessions, even clean, try to load the existing session message first. If the previous persisted session was clean, then synchronously delete its subs' consumers, we know the names. Ditto the pubrel consumer.
  3. PubRel consumers can be consolidated, like the retained messages into one, account-specific consumer. There’s no need to match the messages to subs, just to the sessions, by client ID.

What do you think?

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kozlovic
Copy link
Member

Longer term:
For clean sessions, can we make subscription consumers ephemeral, with a short inactive threshold, so that they get auto-collected if we fail to delete?

But is not an ephemeral consumer supposed to be told that the "client" side is still active through pings? Meaning that would that mean that the MQTT code would then need to be more involved in maintaining this consumer? And if so, it means that if there was a disconnect between where the MQTT sub lives and the JS consumer leader, we could end-up with the consumer being deleted while the MQTT code does not know that this is the case and end-up with a stop in message delivery?

@levb
Copy link
Contributor Author

levb commented Jun 21, 2024

@derekcollison This can be merged now.

@derekcollison derekcollison merged commit ee9af0f into main Jun 21, 2024
4 checks passed
@derekcollison derekcollison deleted the lev-leak-clean-nowait branch June 21, 2024 19:11
neilalexander pushed a commit that referenced this pull request Jun 21, 2024
…ing the session (#5575)

MQTT s.clear(): do not wait for JS responses when disconnecting the
session

Related to #5471

Previously we were making `jsa.NewRequest` as it is needed when
connecting a clean session. On disconnect, there is no reason to wait
for the response (and tie up the MQTT read loop of the client).

This should specifically help situations when a client app with many
MQTT connections and QOS subscriptions disconnects suddenly, causing a
flood of JSAPI deleteConsumer requests.

Test: n/a, not sure how to instrument for it.
wallyqs added a commit that referenced this pull request Jun 21, 2024
Includes:

* #5571 
* #5082
* #5573
* #5574
* #5575
* #5577
* #5578
* #5580 
* #5559 

Signed-off-by: Neil Twigg <neil@nats.io>
derekcollison pushed a commit that referenced this pull request Sep 7, 2024
Built on top of #5575. Tries to prevent the stream `msgs` queue from
becoming overwhelmed by core NATS publishes. In this case, where a reply
subject is known, the sender will receive a 429 "Too Many Requests".
Otherwise it's rate-logged.

Two new configuration options are added to the JetStream block:
`max_buffered_size` and `max_buffered_msgs`. If not configured, defaults
are used.

Signed-off-by: Neil Twigg <neil@nats.io>

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants