[go/mysql] use sync.Pool for reusing bufio readers and writers #4186
Conversation
go/mysql/bufio_pool.go
func (pbr *poolBufioReader) Read(b []byte) (int, error) {
	pbr.getReader()
	return pbr.br.Read(b)
}
shouldn't we check if Buffered() == 0 and release the reader in that case?
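A minimal sketch of that suggestion, assuming the snippet's getReader checks a *bufio.Reader out of a sync.Pool and that a putReader counterpart exists to return it (putReader is an illustrative name, not necessarily the PR's):

func (pbr *poolBufioReader) Read(b []byte) (int, error) {
	pbr.getReader()
	n, err := pbr.br.Read(b)
	// Once nothing is buffered, the bufio.Reader carries no state this
	// connection needs, so it can go back to the pool until the next Read.
	if pbr.br.Buffered() == 0 {
		pbr.putReader()
	}
	return n, err
}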
@@ -0,0 +1,113 @@
package mysql
tests please :)
feed in lots of random data and make sure we get the right string out the other side.
also i'm curious about error conditions etc.
Edit: happens here https://github.com/vitessio/vitess/blob/master/go/mysql/server.go#L397
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
So, I really forgot to recycle on Read.
It's encouraging that random queries are so much better, but other benchmarks are weird.
Is it possible that garbage collection is causing the weird benchmarks?
@derekperkins yeah, it's possible. Will check later.
@derekperkins we had that happen in the previous diff, and it's obviously even worse here (i.e. heap gets even smaller since we're pooling everything) golang/go#27545
-	reader   *bufio.Reader
-	writer   *bufio.Writer
+	reader   bufioReader
+	writer   bufioWriter
 	sequence uint8
github won't let me comment on other lines, so here i go. (https://github.com/vitessio/vitess/blob/master/go/mysql/conn.go#L195)
as mentioned, since most connections sit waiting for read, "pooling" doesn't help much.
however, we can note that every read must read the 4 byte length header, and profiles validate the intuition that most idle connections are blocking on the Read(..) from an empty network buffer.
this leaves us a few options:
1. write a read header function, which does a direct read from the base connection (a sketch follows after this comment).
2. initialize all conns with a 4 byte buffered reader.
3. have a pool of 4 byte reader buffers for reading headers (this is pointless).
only once we've read the header do we then use the pooled buffered reader.
in both cases i think factoring out the header read is useful, so we don't risk someone messing up the header read by using the bufio.
@sougou thoughts or other ideas? option 1 is the simplest, as we are already allocating the 4 byte arrays -- and no implementation of buffering can avoid the degenerate case problem of 1 byte being sent at a time. i don't like option 2 as we'll get funky behavior with the two wrapped bufios.
the cost of all of this is 1 extra syscall per request served; for how we are using the mysql server, the trade is a huge boon. it's obviously a trade between throughput and tail latencies, but i think removing the unexpected and hard-to-debug GC problem and lowering steady-state memory usage makes that tradeoff worthwhile. It's easier to reason about a consistent QoS IMO, and frankly if anyone actually hits throughput problems from the double syscall, i'd love to chat with them and see how we can make this faster :)
all that said, there is an asymmetry between the needs of those using the mysql connection as a client and those using it as part of the server. for that reason it might be worth making the connection take an option, and have the server set that option for its connections (which then makes it instantiate pooled bufios and do the header read directly from the base connection) but the client leave it off by default.
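A minimal sketch of option 1, under these assumptions: readHeader is an illustrative name rather than the PR's code, and the 4-byte header layout (3-byte little-endian payload length followed by a 1-byte sequence number) is the standard MySQL packet header.

package mysql

import (
	"io"
	"net"
)

// readHeader reads the 4-byte MySQL packet header with a direct read from
// the base connection, bypassing bufio entirely, so that idle connections
// block here without holding a pooled buffer.
func readHeader(conn net.Conn) (length int, sequence uint8, err error) {
	var hdr [4]byte
	if _, err := io.ReadFull(conn, hdr[:]); err != nil {
		return 0, 0, err
	}
	// 3-byte little-endian payload length, then the sequence byte.
	length = int(hdr[0]) | int(hdr[1])<<8 | int(hdr[2])<<16
	return length, hdr[3], nil
}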
In spite of the benchmarks, I don't think the pooling will give us anything in real life. This is because most connections will generally be idle, and during that time they will be holding a buffer while waiting on a read.
If we went with the 4-byte approach, we'd need to perform some real-life benchmarks to evaluate the impact of the extra syscall. If we find that it's worth it, then we should not use bufio at all:
All waiting reads can fetch into a 4-byte buffer, and then we can read the rest of the data directly into the target slice. There is no benefit to using bufio in this case.
Another option is to reduce the size of the bufio to 4K, which is what Go recommends. With the removal of the other two buffers, the overhead of a connection would drop to 4K, which can be more easily accommodated.
We should just look at dropping connBuffSize and using Go's default anyway.
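A rough sketch of that no-bufio read path, reusing the hypothetical readHeader above; multi-packet payloads (16MB and up) are ignored here for brevity:

// readPacketDirect reads one packet with no bufio anywhere: one read for
// the header, then io.ReadFull straight into the destination slice, so no
// intermediate buffer is held while the connection sits idle.
func readPacketDirect(conn net.Conn) ([]byte, error) {
	length, _, err := readHeader(conn)
	if err != nil {
		return nil, err
	}
	data := make([]byte, length)
	if _, err := io.ReadFull(conn, data); err != nil {
		return nil, err
	}
	return data, nil
}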
So one note of clarification here is that from our perspective this isn't a typical perf or throughput play -- it's a GC improvement and will help tail latencies. The benchmarks are just to show no (or acceptable levels of) throughput regression.
This is why I bring up the asymmetry between a client and a server. A client is really only hurt by anything we're doing here -- pooling is overkill, and introducing a second syscall reduces throughput. Most client applications will have no more than a few hundred connections.
In contrast, a server has to pay the buffer cost for every inbound connection, and these can number in the hundreds of thousands. If you've been following along with the golang GC issue, this results in multi-hundred-second stop-the-world sweeps (and high concurrent mark). Lowering the buffer to 4kb will help a tad but doesn't avoid the fundamental problem of effectively immortal allocations clogging up concurrent mark and concurrent sweep. This is what pooling and removing buffers aims to solve.
Hence my proposed solution is to have a forked read path for the server where we just don't use bufio at all (your suggestion) and do two syscalls, leaving the buffering for the client.
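One sketch of what such a forked read path could look like, building on the readHeader/readPacketDirect sketches above (and a bufio import); the packetReader type, its unbuffered field, and the split itself are illustrative, not the PR's actual design:

// packetReader forks the read path on a per-connection option: the server
// would set unbuffered for its inbound connections, clients leave it off.
type packetReader struct {
	conn       net.Conn
	br         *bufio.Reader // used only on the buffered (client) path
	unbuffered bool          // hypothetical option set by the server
}

func (p *packetReader) readPacket() ([]byte, error) {
	if p.unbuffered {
		// Server path: two syscalls per packet, but no buffer held while
		// the connection idles.
		return readPacketDirect(p.conn)
	}
	// Client path: same wire format, but header and payload usually come
	// out of the bufio.Reader together, costing a single syscall.
	var hdr [4]byte
	if _, err := io.ReadFull(p.br, hdr[:]); err != nil {
		return nil, err
	}
	length := int(hdr[0]) | int(hdr[1])<<8 | int(hdr[2])<<16
	data := make([]byte, length)
	if _, err := io.ReadFull(p.br, data); err != nil {
		return nil, err
	}
	return data, nil
}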
Looping in some offline conversation -- @sougou agrees with the assessment that the additional syscall per mysql packet could pose problems for latency-sensitive clients, in particular folks at Slack. It would be great if we could have @demmer or @rafael (or whoever is appropriate) benchmark whatever client-side solution we come up with. In light of the need to minimize impact on those using the connection as a mysql client (vs as a server), I'd propose forking the code paths to have a buffered path and an unbuffered path. A naive implementation of this would invite a lot of complexity -- casing out the header read or
Looks like this can be closed now?
@sougou thanks for the reminder :)
Readers recycle objects on Read if there are no buffered bytes. Writers recycle objects on Flush. Both recycle on Reset.
Maybe we should also recycle writers on Write, in a manner similar to Read.
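A condensed sketch of those recycling rules on the writer side, assuming a package-level sync.Pool of *bufio.Writer; poolBufioWriter, writerPool, and pooledBufSize are illustrative names, not necessarily the PR's:

import (
	"bufio"
	"io/ioutil"
	"net"
	"sync"
)

const pooledBufSize = 4096 // Go's default bufio size, per the discussion above

var writerPool = sync.Pool{
	// ioutil.Discard is a placeholder target; Reset rebinds before use.
	New: func() interface{} { return bufio.NewWriterSize(ioutil.Discard, pooledBufSize) },
}

type poolBufioWriter struct {
	conn net.Conn
	bw   *bufio.Writer // nil whenever the buffer is parked in the pool
}

func (w *poolBufioWriter) Write(b []byte) (int, error) {
	if w.bw == nil {
		w.bw = writerPool.Get().(*bufio.Writer)
		w.bw.Reset(w.conn)
	}
	return w.bw.Write(b)
}

// Flush writes out any buffered bytes, then immediately returns the
// bufio.Writer to the pool, so an idle connection holds no write buffer.
func (w *poolBufioWriter) Flush() error {
	if w.bw == nil {
		return nil
	}
	err := w.bw.Flush()
	w.bw.Reset(ioutil.Discard) // drop the conn reference before pooling
	writerPool.Put(w.bw)
	w.bw = nil
	return err
}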
/cc @danieltahara @sougou
benchmark results: