[go/mysql] use sync.Pool for reusing bufio readers and writers #4186

Closed
wants to merge 1 commit into from

Conversation

@LK4D4 (Contributor) commented Sep 6, 2018

Readers recycle their objects on Read if there are no buffered bytes left. Writers recycle their objects on Flush. Both recycle on Reset.
Maybe we should also recycle writers on Write, in a manner similar to Read.
/cc @danieltahara @sougou
benchmark results:

benchmark                            old ns/op      new ns/op      delta
BenchmarkParallelShortQueries-8      3443           3439           -0.12%
BenchmarkParallelMediumQueries-8     19907          23057          +15.82%
BenchmarkParallelRandomQueries-8     1098953101     1077136194     -1.99%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkParallelShortQueries-8      6.10         6.10         1.00x
BenchmarkParallelMediumQueries-8     823.82       711.25       0.86x

benchmark                            old allocs     new allocs     delta
BenchmarkParallelShortQueries-8      24             24             +0.00%
BenchmarkParallelMediumQueries-8     24             25             +4.17%
BenchmarkParallelRandomQueries-8     32292          31851          -1.37%

benchmark                            old bytes     new bytes     delta
BenchmarkParallelShortQueries-8      867           873           +0.69%
BenchmarkParallelMediumQueries-8     63241         69361         +9.68%
BenchmarkParallelRandomQueries-8     147128744     142483592     -3.16%


func (pbr *poolBufioReader) Read(b []byte) (int, error) {
	pbr.getReader()
	return pbr.br.Read(b)
}

shouldn't we check if Buffered() == 0 and release the reader in that case?

go/mysql/bufio_pool.go
@@ -0,0 +1,113 @@
package mysql


tests please :)

throw lots of random stuff at it and make sure we get the right string out the other side.

also i'm curious about error conditions etc

@danieltahara commented Sep 7, 2018

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
@LK4D4 (Contributor, Author) commented Sep 7, 2018

So, I really did forget to recycle on Read.
With recycling on Read, the benchmarks look like this:

benchmark                            old ns/op     new ns/op     delta
BenchmarkParallelShortQueries-8      3407          3566          +4.67%
BenchmarkParallelMediumQueries-8     18171         28565         +57.20%
BenchmarkParallelRandomQueries-8     11452408      10863779      -5.14%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkParallelShortQueries-8      6.16         5.89         0.96x
BenchmarkParallelMediumQueries-8     902.49       574.12       0.64x

benchmark                            old allocs     new allocs     delta
BenchmarkParallelShortQueries-8      24             24             +0.00%
BenchmarkParallelMediumQueries-8     25             28             +12.00%
BenchmarkParallelRandomQueries-8     83             58             -30.12%

benchmark                            old bytes     new bytes     delta
BenchmarkParallelShortQueries-8      979           1212          +23.80%
BenchmarkParallelMediumQueries-8     69074         89985         +30.27%
BenchmarkParallelRandomQueries-8     66810632      62952562      -5.77%

It's encouraging that random queries are so much better, but the other benchmarks look weird.

@derekperkins (Member) commented:
Is it possible that garbage collection is causing the weird benchmarks?
https://blog.cloudflare.com/go-dont-collect-my-garbage/

@LK4D4 (Contributor, Author) commented Sep 7, 2018

@derekperkins yeah, it's possible. Will check later.

@danieltahara commented:

@derekperkins we had that happen in the previous diff, and it's obviously even worse here (i.e. heap gets even smaller since we're pooling everything) golang/go#27545

-	reader *bufio.Reader
-	writer *bufio.Writer
+	reader bufioReader
+	writer bufioWriter
 	sequence uint8

@danieltahara commented Sep 7, 2018

github won't let me comment on other lines, so here i go. (https://github.com/vitessio/vitess/blob/master/go/mysql/conn.go#L195)

as mentioned, since most connections sit waiting for read, "pooling" doesn't help much.
however, we can note that every read must read the 4 byte length header, and profiles validate the intuition that most idle connections are blocking on the Read(..) from an empty network buffer.

this leaves us a few options:

1. write a read header function, which does a direct read from the base connection
2. initialize all conns with a 4 byte buffered reader
3. have a pool of 4 byte reader buffers for reading headers (this is pointless)

only once we've read the header do we then use the pooled buffered reader.

in both cases i think factoring out the header read is useful so then we don't risk someone messing up the header read by using the bufio.

@sougou thoughts or other ideas? option 1 is the simplest, as we are already allocating the 4 byte arrays -- and no implementation of buffering can avoid the degenerate case problem of 1 byte being sent at a time. i don't like option 2 as we'll get funky behavior with the two wrapped bufios.

the cost to all of this is 1 extra syscall per request served, which, for how we are using the mysql server is a huge boon. it's obviously a trade between throughput and tail latencies, but i think removing the unexpected and hard to debug GC problem and lowering steady state memory usage makes that tradeoff worthwhile. It's easier to reason about a consistent QoS IMO, and frankly if anyone actually hits throughput problems from the double syscall, i'd love to chat with them and see how we can make this faster :)

all that said, there is an asymmetry between the needs of those using the mysql connection as a client and those using it as part of the server. for that reason it might be worth making the connection take an option, and have the server set that option for its connections (which then makes it instantiate pooled bufios and do the header read directly from the base connection) but the client leave it off by default.

Contributor

In spite of the benchmarks, I don't think the pooling will give us anything in real life. This is because most connections will generally be idle, and during that time, they will be possessing a buffer waiting on a read.

If we went with the 4-byte approach, we'll need to perform some real-life benchmarks to evaluate the impact of the extra syscall. If we find that it's worth it, then we should not use bufio at all:
All waiting reads can fetch into a 4-byte buffer, and then we can directly read the rest of the data into the target slice. There is no benefit to using bufio in this case.

Another option is to reduce the size of bufio to 4K, which is what go recommends. With the removal of the other two buffers, the overhead of a connection would drop to 4K, which can be more easily accommodated.

We should just look at dropping connBuffSize and using go's default anyway.

@danieltahara commented Sep 8, 2018 via email

@danieltahara commented:

Looping in some offline conversation --

@sougou agrees in the assessment that the additional syscall per mysql packet could pose problems for latency sensitive clients, in particular folks at Slack. It would be great if we could have @demmer or @rafael (or whoever is appropriate) benchmark whatever client-side solution we come up with.

In light of the need to minimize impact on those using the connection as a mysql client (vs as a server), I'd propose forking the code paths to have a buffered path and an unbuffered path. A naive implementation of this would invite a lot of complexity -- casing out the header read or ReadQueryResult, but I believe there's a simpler way to achieve the same end: offer an additional constructor for the connection object newConnServer, which doesn't instantiate a buffered reader and uses the connection directly (I don't even think we need to bother with pooling the bufio readers, although we could do it if we want). This removes the buffer problem for a mysql listener/server while preserving existing behavior for clients.

@sougou (Contributor) commented Sep 24, 2018

Looks like this can be closed now?

@LK4D4 (Contributor, Author) commented Sep 24, 2018

@sougou thanks for the reminder :)

@LK4D4 LK4D4 closed this Sep 24, 2018
@LK4D4 LK4D4 deleted the recycle_bufio branch September 24, 2018 18:55
systay pushed a commit that referenced this pull request Jul 22, 2024