[go/mysql] use sync.Pool for reusing bufio readers and writers #4186
Conversation
go/mysql/bufio_pool.go
func (pbr *poolBufioReader) Read(b []byte) (int, error) {
	pbr.getReader()
	return pbr.br.Read(b)
}
shouldn't we check if Buffered() == 0 and release the reader in that case?
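A minimal sketch of that suggestion, assuming the snippet's getReader checks a *bufio.Reader out of a sync.Pool and that a putReader counterpart exists to return it (putReader is an illustrative name, not necessarily the PR's):

func (pbr *poolBufioReader) Read(b []byte) (int, error) {
	pbr.getReader()
	n, err := pbr.br.Read(b)
	// Once nothing is buffered, the bufio.Reader carries no state this
	// connection needs, so it can go back to the pool until the next Read.
	if pbr.br.Buffered() == 0 {
		pbr.putReader()
	}
	return n, err
}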
@@ -0,0 +1,113 @@
package mysql
tests please :)
feed in lots of random data and make sure we get the right string out the other side.
also i'm curious about error conditions etc.
Edit: happens here https://github.com/vitessio/vitess/blob/master/go/mysql/server.go#L397
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
So, I really forgot to recycle on Read.
It's encouraging that random queries are so much better, but other benchmarks are weird.
Is it possible that garbage collection is causing the weird benchmarks?
@derekperkins yeah, it's possible. Will check later.
@derekperkins we had that happen in the previous diff, and it's obviously even worse here (i.e. heap gets even smaller since we're pooling everything) golang/go#27545
-	reader   *bufio.Reader
-	writer   *bufio.Writer
+	reader   bufioReader
+	writer   bufioWriter
 	sequence uint8
github won't let me comment on other lines, so here i go. (https://github.com/vitessio/vitess/blob/master/go/mysql/conn.go#L195)
as mentioned, since most connections sit waiting for read, "pooling" doesn't help much.
however, we can note that every read must read the 4 byte length header, and profiles validate the intuition that most idle connections are blocking on the Read(..) from an empty network buffer.
this leaves us a few options:
1. write a read header function, which does a direct read from the base connection (a sketch follows after this comment).
2. initialize all conns with a 4 byte buffered reader.
3. have a pool of 4 byte reader buffers for reading headers (this is pointless).
only once we've read the header do we then use the pooled buffered reader.
in both cases i think factoring out the header read is useful, so we don't risk someone messing up the header read by using the bufio.
@sougou thoughts or other ideas? option 1 is the simplest, as we are already allocating the 4 byte arrays -- and no implementation of buffering can avoid the degenerate case problem of 1 byte being sent at a time. i don't like option 2 as we'll get funky behavior with the two wrapped bufios.
the cost of all of this is 1 extra syscall per request served; for how we are using the mysql server, the trade is a huge boon. it's obviously a trade between throughput and tail latencies, but i think removing the unexpected and hard-to-debug GC problem and lowering steady-state memory usage makes that tradeoff worthwhile. It's easier to reason about a consistent QoS IMO, and frankly if anyone actually hits throughput problems from the double syscall, i'd love to chat with them and see how we can make this faster :)
all that said, there is an asymmetry between the needs of those using the mysql connection as a client and those using it as part of the server. for that reason it might be worth making the connection take an option, and have the server set that option for its connections (which then makes it instantiate pooled bufios and do the header read directly from the base connection) but the client leave it off by default.
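A minimal sketch of option 1, under these assumptions: readHeader is an illustrative name rather than the PR's code, and the 4-byte header layout (3-byte little-endian payload length followed by a 1-byte sequence number) is the standard MySQL packet header.

package mysql

import (
	"io"
	"net"
)

// readHeader reads the 4-byte MySQL packet header with a direct read from
// the base connection, bypassing bufio entirely, so that idle connections
// block here without holding a pooled buffer.
func readHeader(conn net.Conn) (length int, sequence uint8, err error) {
	var hdr [4]byte
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return 0, 0, err
	}
	// 3-byte little-endian payload length, then the sequence byte.
	length = int(hdr[0]) | int(hdr[1])<<8 | int(hdr[2])<<16
	return length, hdr[3], nil
}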
In spite of the benchmarks, I don't think the pooling will give us anything in real life. This is because most connections will generally be idle, and during that time they will be holding a buffer while waiting on a read.
If we went with the 4-byte approach, we'd need to perform some real-life benchmarks to evaluate the impact of the extra syscall. If we find that it's worth it, then we should not use bufio at all:
All waiting reads can fetch into a 4-byte buffer, and then we can read the rest of the data directly into the target slice. There is no benefit to using bufio in this case.
Another option is to reduce the size of the bufio to 4K, which is what Go recommends. With the removal of the other two buffers, the overhead of a connection would drop to 4K, which can be more easily accommodated.
We should just look at dropping connBuffSize and using Go's default anyway.
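A rough sketch of that no-bufio read path, reusing the hypothetical readHeader above; multi-packet payloads (16MB and up) are ignored here for brevity:

// readPacketDirect reads one packet with no bufio anywhere: one read for
// the header, then io.ReadFull straight into the destination slice, so no
// intermediate buffer is held while the connection sits idle.
func readPacketDirect(conn net.Conn) ([]byte, error) {
	length, _, err := readHeader(conn)
	if err != nil {
		return nil, err
	}
	data := make([]byte, length)
	if _, err := io.ReadFull(conn, data); err != nil {
		return nil, err
	}
	return data, nil
}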
So one note of clarification here is that from our perspective this isn't a typical perf or throughput play -- it's a GC improvement and will help tail latencies. The benchmarks are just to show no (or acceptable levels of) throughput regression.
This is why I bring up the asymmetry between a client and a server. A client is really only hurt by anything we're doing here -- pooling is overkill, and introducing a second syscall reduces throughput. Most client applications will have no more than a few hundred connections.
In contrast, a server has to pay the buffer cost for every inbound connection, and these can number in the hundreds of thousands. If you've been following along with the golang GC issue, this results in multi-hundred-second stop-the-world sweeps (and high concurrent mark). Lowering the buffer to 4kb will help a tad but doesn't avoid the fundamental problem of effectively immortal allocations clogging up concurrent mark and concurrent sweep. This is what pooling and removing buffers aims to solve.
Hence my proposed solution is to have a forked read path for the server where we just don't use bufio at all (your suggestion) and do two syscalls, leaving the buffering for the client.
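One sketch of what such a forked read path could look like, building on the readHeader/readPacketDirect sketches above (and a bufio import); the packetReader type, its unbuffered field, and the split itself are illustrative, not the PR's actual design:

// packetReader forks the read path on a per-connection option: the server
// would set unbuffered for its inbound connections, clients leave it off.
type packetReader struct {
	conn       net.Conn
	br         *bufio.Reader // used only on the buffered (client) path
	unbuffered bool          // hypothetical option set by the server
}

func (p *packetReader) readPacket() ([]byte, error) {
	if p.unbuffered {
		// Server path: two syscalls per packet, but no buffer held while
		// the connection idles.
		return readPacketDirect(p.conn)
	}
	// Client path: same wire format, but header and payload usually come
	// out of the bufio.Reader together, costing a single syscall.
	var hdr [4]byte
	if _, err := io.ReadFull(p.br, hdr[:]); err != nil {
		return nil, err
	}
	length := int(hdr[0]) | int(hdr[1])<<8 | int(hdr[2])<<16
	data := make([]byte, length)
	if _, err := io.ReadFull(p.br, data); err != nil {
		return nil, err
	}
	return data, nil
}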
Looping in some offline conversation -- @sougou agrees with the assessment that the additional syscall per mysql packet could pose problems for latency-sensitive clients, in particular folks at Slack. It would be great if we could have @demmer or @rafael (or whoever is appropriate) benchmark whatever client-side solution we come up with. In light of the need to minimize impact on those using the connection as a mysql client (vs as a server), I'd propose forking the code paths to have a buffered path and an unbuffered path. A naive implementation of this would invite a lot of complexity -- casing out the header read or
Looks like this can be closed now?
@sougou thanks for the reminder :)
Readers recycle objects on Read if there are no buffered bytes. Writers recycle objects on Flush. Both recycle on Reset.
Maybe we should also recycle writers on Write, in a manner similar to Read.
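A condensed sketch of those recycling rules on the writer side, assuming a package-level sync.Pool of *bufio.Writer; poolBufioWriter, writerPool, and pooledBufSize are illustrative names, not necessarily the PR's:

import (
	"bufio"
	"io/ioutil"
	"net"
	"sync"
)

const pooledBufSize = 4096 // Go's default bufio size, per the discussion above

var writerPool = sync.Pool{
	// ioutil.Discard is a placeholder target; Reset rebinds before use.
	New: func() interface{} { return bufio.NewWriterSize(ioutil.Discard, pooledBufSize) },
}

type poolBufioWriter struct {
	conn net.Conn
	bw   *bufio.Writer // nil whenever the buffer is parked in the pool
}

func (w *poolBufioWriter) Write(b []byte) (int, error) {
	if w.bw == nil {
		w.bw = writerPool.Get().(*bufio.Writer)
		w.bw.Reset(w.conn)
	}
	return w.bw.Write(b)
}

// Flush writes out any buffered bytes, then immediately returns the
// bufio.Writer to the pool, so an idle connection holds no write buffer.
func (w *poolBufioWriter) Flush() error {
	if w.bw == nil {
		return nil
	}
	err := w.bw.Flush()
	w.bw.Reset(ioutil.Discard) // drop the conn reference before pooling
	writerPool.Put(w.bw)
	w.bw = nil
	return err
}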
/cc @danieltahara @sougou
benchmark results: