Connection re-establishment behaviour #149

Closed · FZambia opened this issue Nov 29, 2022 · 12 comments

FZambia (Contributor) commented Nov 29, 2022

Hello @rueian!

I just came across one thing I'd like to solve somehow. Let's imagine we have a Client and the connection to Redis is lost – for example, we stop the Redis server and then quickly start it again.

In this case, even though Redis is back, the next command issued over rueidis.Client (for some specific slot, possibly issued an hour after the Redis restart) currently always returns an EOF error, and only the second command succeeds.

It seems to me that the connection is marked for re-establishment inside the mux.pipeline method, after we issue a command:

if resp = wire.Do(ctx, cmd); isBroken(resp.NonRedisError(), wire) {
	m.wire[slot].CompareAndSwap(wire, m.init)
}

At the same time, in pipe._backgroundRead:

for {
	if msg, err = readNextMessage(p.r); err != nil {
		return
	}

– we get a read error when the connection is closed (upon Redis stop), but it seems the connection is not marked for re-establishment, so we still get an EOF when issuing the next command over rueidis.Client, and no errors on subsequent commands.

One more thing I do not understand: it seems that rueidis does not PING all connections, only some of them, so a closed connection cannot be discovered automatically soon after Redis becomes available again. Dedicated connections are periodically PINGed, but general pipeline connections seem not to be, so those connections are not re-established automatically.

This makes me think there are a few possible ways to solve this:

  1. When we receive an error from readNextMessage, mark the connection so that the next attempt to write a command over it re-establishes the connection; that way we avoid the extra error on the first command the application issues some time later (see the sketch after this list).
  2. PING all connections in the background to detect closed ones and re-establish them, so that from the application's side a temporary network loss is transparent and the internal connections of rueidis quickly come back to life once Redis is available again.
  3. Probably combine 1 and 2.
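
To illustrate option 1, here is a minimal sketch of what I have in mind (hypothetical names, not the actual rueidis internals): the background reader records the fatal error, so the connection can be treated as broken before the next write instead of after it.

package sketch

import "sync/atomic"

// wire is a hypothetical stand-in for a rueidis pipe/connection.
type wire struct {
	err atomic.Value // first fatal read error (e.g. io.EOF), if any
}

// backgroundRead sketches the reader loop: on a read error it marks the
// wire as broken instead of only returning, so the mux can swap it out
// before the next command is written.
func (w *wire) backgroundRead(readNext func() error) {
	for {
		if err := readNext(); err != nil {
			w.err.Store(err)
			return
		}
	}
}

// broken reports whether the wire has been marked broken by the reader;
// the mux could check this before writing and re-establish the connection.
func (w *wire) broken() bool {
	err, _ := w.err.Load().(error)
	return err != nil
}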

Theoretically, I could add retries at the app level when I get an EOF from rueidis – but I am not sure that is really required if it can be handled internally by the library.

rueian (Collaborator) commented Nov 29, 2022

Hi @FZambia,

One more thing I do not understand: it seems that rueidis does not PING all connections, only some of them.

Currently, rueidis only does periodic PINGs after it starts pipelining, because they are treated as an alternative to ReadTimeout: once pipelining starts, there is no obvious way to set a timeout on the net.Conn, as we have discussed here.

In other words, currently, if there are no concurrent requests, client-side caching requests, or pubsub subscriptions, rueidis just writes the sole request to the net.Conn directly without batching and simply applies SetDeadline to it.
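
Conceptually, that synchronous path looks something like this (a sketch with hypothetical names, not the actual rueidis code):

package sketch

import (
	"net"
	"time"
)

// syncDo sketches a synchronous request: one write and one read on the
// net.Conn, both bounded by a single deadline instead of background PINGs.
func syncDo(conn net.Conn, timeout time.Duration, cmd []byte) ([]byte, error) {
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return nil, err
	}
	if _, err := conn.Write(cmd); err != nil {
		return nil, err
	}
	buf := make([]byte, 4096)
	n, err := conn.Read(buf) // an io.EOF here surfaces directly to the caller
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}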

This makes me think there are a few possible ways to solve this:

  1. When we receive an error from readNextMessage, mark the connection so that the next attempt to write a command over it re-establishes the connection; that way we avoid the extra error on the first command the application issues some time later.
  2. PING all connections in the background to detect closed ones and re-establish them, so that from the application's side a temporary network loss is transparent and the internal connections of rueidis quickly come back to life once Redis is available again.
  3. Probably combine 1 and 2.

Theoretically, I could add retries at the app level when I get an EOF from rueidis – but I am not sure that is really required if it can be handled internally by the library.

I agree. This should be handled better in this library, and combining both ways you proposed is the only solution, I think.
Though I already have some idea of how to implement it, it may still take some time.

FZambia (Contributor, Author) commented Dec 3, 2022

I tried 3748f2e, which looks close to making re-establishment automatic. It did not work for me initially, though: it seems that for general pipeline connections the func (p *pipe) background() function is not called at all, so a connection close is not detected. As soon as I did something like this:

+       p.background()
        if !r2 && !r2ps {
                if ver, ok := p.info["version"]; ok {
                        if v := strings.Split(ver.string, "."); len(v) != 0 {
@@ -169,9 +173,6 @@ func _newPipe(connFn func() (net.Conn, error), option *ClientOption, r2ps bool)
                                p.version = int32(vv)
                        }
                }
-               if p.onInvalidations = option.OnInvalidations; p.onInvalidations != nil {
-                       p.background()
-               }

Everything works fine – the background routines start, and connection closing/re-establishment works as expected. It seems I am still lacking some understanding of rueidis internals to properly reason about this change (I am not sure how things work without the background reading running – probably there is synchronous reading from the connection after issuing a command somewhere else).

rueian (Collaborator) commented Dec 3, 2022

Hi @FZambia,

3748f2e only addresses the first part of your proposal:

  1. When we receive an error from readNextMessage, mark the connection so that the next attempt to write a command over it re-establishes the connection; that way we avoid the extra error on the first command the application issues some time later.

I am still working on the second part:

  2. PING all connections in the background to detect closed ones and re-establish them, so that from the application's side a temporary network loss is transparent and the internal connections of rueidis quickly come back to life once Redis is available again.

When the second part is finished, rueidis will be able to handle reconnection proactively without user engagement.
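
A rough sketch of what that proactive detection could look like (a hypothetical keepalive loop, not the actual implementation):

package sketch

import (
	"context"
	"time"
)

// pinger abstracts anything that can answer a PING; hypothetical interface.
type pinger interface {
	Ping(ctx context.Context) error
}

// keepalive periodically PINGs the connection and invokes reconnect when a
// PING fails. A real implementation would also coordinate with in-flight
// requests and stop when the client is closed.
func keepalive(ctx context.Context, conn pinger, interval time.Duration, reconnect func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := conn.Ping(ctx); err != nil {
				reconnect()
				return
			}
		}
	}
}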

probably there is synchronous reading from the connection after issuing a command somewhere else

You are right. rueidis uses synchronous writing and reading until the user fires concurrent requests or subscribes to pubsub channels. The p.background() invocation is the trigger that makes the *pipe actually start pipelining in the background.

FZambia (Contributor, Author) commented Dec 3, 2022

rueidis uses synchronous writing and reading until the user fires concurrent requests or subscribes to pubsub channels.

I see, thanks for explaining! Do you think it's beneficial, though? I mean, always starting a background reading/pinging goroutine seems sufficient – it will start in the hot path anyway once we have concurrent requests, so trying to avoid goroutines while the request rate is small or sequential should not make a big difference. Also, connection closing could be detected immediately in many cases. Or am I missing something here?

rueian (Collaborator) commented Dec 3, 2022

Indeed, it won't make a big difference. The only benefit is that per-request latency for sequential requests will be smaller than with pipelining in the background, due to fewer goroutine switches.

FZambia (Contributor, Author) commented Dec 3, 2022

Then do you think it's a good idea to reduce code complexity and the time to detect closed connections by always running the background read and background ping? It seems logical to me, since under concurrent usage the application moves to that state anyway. Plus, entering that state as early as possible should make the behaviour slightly more predictable right from the app start.

rueian (Collaborator) commented Dec 4, 2022

Hi @FZambia,

It seems logical to me, since under concurrent usage the application moves to that state anyway.

Yes, an application can trigger the underlying pipe to actually start pipelining very easily. Especially now: v0.0.87 has just been released, and I made it send background PINGs even in synchronous mode to detect closed connections proactively.

Therefore, synchronous mode can only be considered a tiny latency optimization for sparse traffic or sequential usage.
But I think this optimization is still worth having, because sequential requests are not an uncommon use case.

Currently, the code complexity of the optimization is acceptable, I believe. It just uses the p.waits counter to detect concurrent requests. Once they are detected, it triggers the pipe to actually start pipelining and never goes back to synchronous mode.
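
A simplified sketch of that trigger (hypothetical names, not the actual rueidis code):

package sketch

import "sync/atomic"

// pipe is a hypothetical stand-in for rueidis's *pipe; only the
// concurrency-detection trigger is sketched here.
type pipe struct {
	waits   int32 // requests currently in flight on this pipe
	started int32 // set to 1 once background pipelining has started
}

// do detects overlap via the waits counter: if another request is already
// in flight, background pipelining starts once and is never turned off.
func (p *pipe) do(exec func()) {
	if atomic.AddInt32(&p.waits, 1) > 1 && atomic.CompareAndSwapInt32(&p.started, 0, 1) {
		go p.backgroundLoop()
	}
	exec() // write the command and collect the reply
	atomic.AddInt32(&p.waits, -1)
}

func (p *pipe) backgroundLoop() { /* continuously read replies for waiters */ }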

Actually, I am wondering whether it is possible to go back to synchronous mode automatically to provide lower latencies. I don't yet have an idea of how to implement such switching, and it might then really not be worth having in terms of code complexity.

FZambia (Contributor, Author) commented Dec 4, 2022

Actually, I am not worried much about code complexity; that would just be a bonus of the change. I'd say my main concern is the possibility of detecting closed connections immediately – which is what we get when the reading goroutine is always running – so that a quick connection glitch is unnoticeable to the app in most cases, reducing errors.

Yes, an application can trigger the underlying pipe to actually start pipelining very easily

I see your point about optimizing that case – but perhaps we could add an option to control this, so we could inherit the closing behaviour and still have the fast sequential path? Something like ForceBackgroundRead bool.

FZambia closed this as completed Dec 4, 2022
FZambia reopened this Dec 4, 2022
rueian (Collaborator) commented Dec 4, 2022

Sure, maybe AlwaysPipelining bool?

FZambia (Contributor, Author) commented Dec 4, 2022

Sure, maybe AlwaysPipelining bool?

Thanks, works for me! Maybe also PipeliningMode: auto, always, if you can imagine additional modes in the future; theoretically it could also make more sense from the perspective of a user who expects rueidis to pipeline requests by default anyway, since that is its main feature. But AlwaysPipelining is OK too.

rueian (Collaborator) commented Dec 4, 2022

Hi @FZambia,

AlwaysPipelining has been added in ba8001d.

And v0.0.88 has been released with a fix for the performance issue introduced in #151.

Please use the latest version if you have downloaded v0.0.87.

if you can imagine additional modes in the future

Many Redis features rely on pipelining, such as client-side caching and pubsub, so I believe turning off pipelining completely is not an option. Therefore, there are probably no additional modes I can imagine.
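
For reference, enabling the new option should look something like this (a sketch based on the option name above, using the module path from the time of this issue; check the rueidis docs for the exact ClientOption fields):

package main

import "github.com/rueian/rueidis"

func main() {
	// AlwaysPipelining makes every connection start background pipelining
	// (and therefore background PINGs) immediately.
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:      []string{"127.0.0.1:6379"},
		AlwaysPipelining: true,
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()
}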

FZambia (Contributor, Author) commented Dec 4, 2022

Awesome, many thanks – I updated the rueidis version, and it seems I now have an open road forward.

FZambia closed this as completed Dec 4, 2022