buffer: optimize Buffer#toString() #2027

Merged
merged 1 commit into from
Jun 25, 2015

Conversation

bnoordhuis
Member

Break up Buffer#toString() into a fast and slow path. The fast path
optimizes for zero-length buffers and no-arg method invocation.

The speedup for zero-length buffers is a satisfying 700%. The no-arg
toString() operation gets faster by about 13% for a one-byte buffer.

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls. Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.
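
A minimal sketch of the fast/slow split described above, written as free-standing functions so it runs without patching Buffer.prototype. The name slowToString appears in this PR's diff, but its body here is an assumption that simply defers to the built-in Buffer#toString(); toStringSketch is a hypothetical name, and it uses the documented toString('utf8', 0, length) where the patch calls the internal utf8Slice().

```javascript
'use strict';

// Slow path (assumed body): handles explicit encoding/start/end arguments by
// deferring to the built-in implementation.
function slowToString(encoding, start, end) {
  return Buffer.prototype.toString.call(this, encoding, start, end);
}

// Fast path: declares zero formal parameters, so a plain no-arg call passes
// exactly as many arguments as the function declares.
function toStringSketch() {
  const length = this.length | 0;
  if (length === 0)
    return '';                                   // zero-length buffer
  if (arguments.length === 0)
    return Buffer.prototype.toString.call(this, 'utf8', 0, length);
  return slowToString.apply(this, arguments);    // everything else
}

const buf = Buffer.from('hello');
console.log(toStringSketch.call(buf));           // no-arg fast path
console.log(toStringSketch.call(buf, 'hex'));    // slow path
console.log(toStringSketch.call(Buffer.alloc(0)) === '');
```

Keeping the fast path this small is also what makes it a plausible inlining candidate; whether a given V8 version actually inlines it is not something this sketch demonstrates.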

R=@trevnorris?

CI: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/64/

bnoordhuis added the buffer and benchmark labels Jun 21, 2015
    return '';
  if (arguments.length === 0)
    return this.utf8Slice(0, length);
  return slowToString(this, arguments[0], arguments[1], arguments[2]);
Member Author

This seems to be a tiny bit faster than slowToString.apply(this, arguments) but that may change when we upgrade to V8 4.4.

Member Author

Okay, seems 4.2 does a pretty good job too, once it warms up. I'll switch this over to .apply().

    return '';
  if (arguments.length === 0)
    return this.utf8Slice(0, length);
  return slowToString.apply(this, arguments);
Contributor

Wouldn't it be faster to do return slowToString.call(this, arguments[0], arguments[1], arguments[2]);? Or maybe pass this as the first argument and avoid .call()/.apply() altogether?

Member Author

See this comment. The initial version called slowToString(this, arguments[0], ...) but when I ran more benchmarks, it turned out that .apply() is faster by about 25-30% once the optimizing compiler kicks in.

Member

Won't function(encoding, start, end) { and return slowToString.apply(this, [encoding, start, end]); work here? There seems to be no reason to use arguments. Could you test that, please?

Contributor

Why create a new array every time when there is already arguments?

Member

Ok, true, an array is slow.

In my local microbenchmark, function(encoding, start, end) { with return slowToString.call(this, encoding, start, end) wins for every argument count (except three, where .apply(this, arguments) is just as fast).

The problem with return slowToString.call(this, arguments[0], arguments[1], arguments[2]); is in arguments, not in .call().

Contributor

Hm. Can't get my test to perform accurately. Seems the true performance hit is using undefined arguments[N] values. Welp, seems we have some cleaning up to do in places like: https://github.com/nodejs/io.js/blob/v2.3.0/src/node.js#L339-L350

Contributor

@trevnorris What do you mean by cleaning up that particular section of code? That code is switching on the arguments length and only passing that many arguments.

FWIW I already benchmarked various alternative function calling methods for this patch on top of the next branch (v8 4.3):

Replacing apply() with .call(this, arguments[0], arguments[1], ...) slows down the cases when there are arguments passed, and there is a slight performance hit in the zero argument case (with apply I saw ~510% increase, but call showed ~470%).

Replacing apply() with a direct function call, passing in the context as an extra argument performs about the same as using .call().

Replacing apply() with a switch on arguments.length and using either .call() or passing the context in the < 3 cases (using .apply() as default), the zero argument case is a bit lower IIRC (~470% increase), but now the non-zero argument cases are no longer affected.

So just using .apply() instead of several-line switch is shorter and even a tad faster on the zero argument case. I haven't tested these scenarios on the master branch (v8 4.2) though.
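
The forwarding strategies compared in this thread can be timed with a rough harness like the one below. This is only a sketch: target is a hypothetical stand-in for slowToString(), the harness and its names are assumptions, and the results depend heavily on the V8 version, so it will not necessarily reproduce the figures quoted above for V8 4.2/4.3.

```javascript
'use strict';

// Hypothetical callee standing in for slowToString().
function target(encoding, start, end) {
  return (encoding === undefined ? 0 : 1) +
         (start === undefined ? 0 : 1) +
         (end === undefined ? 0 : 1);
}

// Strategy 1: forward the arguments object untouched.
function viaApply() {
  return target.apply(this, arguments);
}

// Strategy 2: index into arguments, possibly reading past its length.
function viaCallIndexed() {
  return target.call(this, arguments[0], arguments[1], arguments[2]);
}

// Strategy 3: named parameters, no arguments object at all.
function viaCallNamed(encoding, start, end) {
  return target.call(this, encoding, start, end);
}

// Time n one-argument calls after an equally long warm-up loop.
function time(label, fn, n) {
  let sink = 0;
  for (let i = 0; i < n; i++) sink += fn('utf8');   // warm-up
  const t0 = process.hrtime.bigint();
  for (let i = 0; i < n; i++) sink += fn('utf8');
  const ns = Number(process.hrtime.bigint() - t0);
  console.log(`${label}: ${(ns / n).toFixed(1)} ns/call (sink=${sink})`);
}

const N = 1e6;
time('apply(this, arguments)     ', viaApply, N);
time('call(this, arguments[0..2])', viaCallIndexed, N);
time('call(this, named params)   ', viaCallNamed, N);
```

Microbenchmarks like this are easy to distort through inlining and dead-code elimination, so the benchmark script included with the PR is the better reference for real numbers.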

Contributor

@mscdex Is args there an arguments object or an Array? I'm aware that referencing undefined values on an arguments object does have significant overhead, but my benchmarks show that that is not the case for a real array.

Contributor

@trevnorris I did not test with an array, just arguments.

Contributor

@mscdex neither had I before this PR. Some testing showed that referencing undefined members in an array doesn't have any performance impact. Only side effect is the argument length being too long on the called function.
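
The side effect mentioned above, where the called function sees more arguments than the caller supplied, can be shown with a small example. The function names are hypothetical, and the example forwards from an arguments object rather than an array; the effect on the callee's argument count is the same in both cases.

```javascript
'use strict';

// Reports how many arguments it actually received.
function callee() {
  return arguments.length;
}

function forwardWithApply() {
  return callee.apply(this, arguments);
}

function forwardWithIndexing() {
  // arguments[1] and arguments[2] are undefined when only one value came in,
  // but they are still passed along as explicit arguments.
  return callee.call(this, arguments[0], arguments[1], arguments[2]);
}

console.log(forwardWithApply('hex'));     // 1
console.log(forwardWithIndexing('hex'));  // 3
```

A callee that branches on arguments.length would therefore behave differently under the two forwarding styles.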

@mscdex
Contributor

mscdex commented Jun 21, 2015

LGTM with one style nit

@dcousens

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls. Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.

Do we have data to back that up? Most of my calls to toString() are typically hex, but maybe that is offset by stream encoding?
Why couldn't the compiler do something to know which path is being [continually] re-used? This shouldn't be an issue here...

Also, why did you split up the function, is this purely something for the compiler to avoid unpacking the arguments until they are used?
If so, this optimization is in the wrong place.

@bnoordhuis
Member Author

Do we have data to back that up? Most of my calls to toString() are typically hex, but maybe that is offset by stream encoding?

I'm basing it off the number of implicit toString() calls you get in scripts that do string += buf. If there is compelling evidence that e.g. .toString('hex') calls are more prevalent, then it makes sense to optimize for that. That was actually my initial hunch but a (quick, small, non-scientific) sampling of modules didn't bear that out.
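
As an illustration of the implicit calls mentioned above, the following sketch shadows toString() on a single Buffer instance to record how string concatenation invokes it. The recording wrapper is purely for demonstration and is not part of the patch.

```javascript
'use strict';

const buf = Buffer.from('abc');
let seenArgs = -1;

// Shadow toString() on this one instance to record how it is invoked.
buf.toString = function() {
  seenArgs = arguments.length;
  return Buffer.prototype.toString.apply(this, arguments);
};

let s = 'data: ';
s += buf;                 // implicit conversion, no explicit toString() call

console.log(s);           // data: abc
console.log(seenArgs);    // 0
```

The concatenation reaches toString() with zero arguments, which is the case the fast path targets.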

Why couldn't the compiler do something to know which path is being [continually] re-used?

V8 is a method JIT, not a tracing JIT. It optimizes whole methods, not individual code paths.

Also, why did you split up the function, is this purely something for the compiler to avoid unpacking the arguments until they are used?

Two reasons: small methods are more likely to get inlined at the call site and generally result in tighter machine code.

@@ -379,6 +378,16 @@ Buffer.prototype.toString = function(encoding, start, end) {
};


Buffer.prototype.toString = function() {
  const length = this.length | 0;
Contributor

Is the int32 conversion on the length a safeguard in case the length has been altered or this isn't actually a Buffer instance?

Member Author

Neither, really. I just like my variables to have the type I expect them to have. I can remove it if you want.

Contributor

Just wanted to make sure there wasn't something I didn't see. Don't bother taking it out.
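
For readers unfamiliar with the idiom discussed here, x | 0 coerces its operand to a 32-bit signed integer. The helper name toInt32 below is an assumption for illustration; the example only shows what the operator yields and makes no claim about its effect on V8's optimizer.

```javascript
'use strict';

// `x | 0` applies the ToInt32 conversion to x.
const toInt32 = (x) => x | 0;

console.log(toInt32(16));          // 16
console.log(toInt32(3.7));         // 3  (fraction truncated)
console.log(toInt32(undefined));   // 0  (e.g. a missing length property)
console.log(toInt32('5'));         // 5
console.log(toInt32(NaN));         // 0
console.log(toInt32(2 ** 32));     // 0  (wraps modulo 2^32)
```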

@trevnorris
Contributor

One question, but code LGTM.

@dcousens

I'm basing it off the number of implicit toString() calls you get in scripts that do string += buf. If there is compelling evidence that e.g. .toString('hex') calls are more prevalent, then it makes sense to optimize for that. That was actually my initial hunch but a (quick, small, non-scientific) sampling of modules didn't bear that out.

This is my only concern going forward, I feel like this 'optimization' is entirely speculative at this point, and it does introduce some more complexity to maintaining this code in the future.

If we can put together some actual statistics that show this will be more beneficial, then I think that'd be great.

However, from a different view point, if we are going to do this code path style optimization, why not offer the same for the other toString methods?

If you really want to do this, why not offer toUTF8, toHex, among others?
I'm not aware of the reasons the all-encompassing toString was opted for 'in the beginning', but I'm sure they exist.

@bnoordhuis
Member Author

I feel like this 'optimization' is entirely speculative at this point

It's a win no matter how you slice it: it makes the default case faster without regressing the non-default case.

I don't find the complexity argument convincing. You should see some of the other code in the lib/ directory!

@tellnes
Contributor

tellnes commented Jun 23, 2015

I've not run the benchmark, but otherwise LGTM.

@dcousens

without regressing the non-default case.

Do we have stats for that? Or are we just taking each other's word on these things :)

I don't find the complexity argument convincing. You should see some of the other code in the lib/ directory!

Hardly a convincing rebuttal either, "look, it's as bad everywhere else!".

Not meaning to be a PITA, it's just that this potentially touches on a lot of code, and I'm just trying to help :)

@trevnorris
Contributor

Change in complexity is minimal at best, and there's an included benchmark to verify the result.

And in terms of allowed complexity: we allow some crazy stuff to be done, far beyond what this patch does, for even minimal performance gains. I'm responsible for some of those myself. This patch is doing nothing out of the ordinary.

@Fishrock123
Contributor

LGTM

@dcousens

@trevnorris sorry, I missed that benchmark.
Anyway, LGTM, IMHO this optimization is a lame compromise, but, if it works it works.
In a perfect world... haha.

@Fishrock123
Contributor

@dcousens such is the world of JIT compiled languages. :)

Break up Buffer#toString() into a fast and slow path.  The fast path
optimizes for zero-length buffers and no-arg method invocation.

The speedup for zero-length buffers is a satisfying 700%.  The no-arg
toString() operation gets faster by about 13% for a one-byte buffer.

This change exploits the fact that most Buffer#toString() calls are
plain no-arg method calls.  Rewriting the method to take no arguments
means a call doesn't go through an ArgumentsAdaptorTrampoline stack
frame in the common case.

PR-URL: nodejs#2027
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Christian Tellnes <christian@tellnes.no>
Reviewed-By: Daniel Cousens <email@dcousens.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
bnoordhuis closed this Jun 25, 2015
bnoordhuis deleted the optimize-buffer-tostring branch June 25, 2015 16:33
bnoordhuis merged commit 8350f3a into nodejs:master Jun 25, 2015
rvagg mentioned this pull request Jun 30, 2015
@rvagg
Member

rvagg commented Jul 2, 2015

@bnoordhuis I'm running your new benchmark script against master and v2.3.1 and it seems slower now:

buffers/buffer-tostring.js arg=true len=0 n=10000000      57467728.45114  67528289.82904  85.10%
buffers/buffer-tostring.js arg=true len=1 n=10000000      13827075.37094  14473678.67815  95.53%
buffers/buffer-tostring.js arg=true len=64 n=10000000     10174151.11764  10710577.4752   94.99%
buffers/buffer-tostring.js arg=true len=1024 n=10000000   4867748.07704   5020575.83599   96.96%
buffers/buffer-tostring.js arg=false len=0 n=10000000     70084364.82509  85137567.23931  82.32%
buffers/buffer-tostring.js arg=false len=1 n=10000000     13126817.325    14294504.30314  91.83%
buffers/buffer-tostring.js arg=false len=64 n=10000000    10044569.75487  10748428.7662   93.45%
buffers/buffer-tostring.js arg=false len=1024 n=10000000  4760963.57608   5092279.38181   93.49%

@trevnorris
Contributor

@rvagg That may actually be a byproduct of a patch I'm responsible for and landed after this one about preventing Buffer methods from aborting.

@rvagg
Member

rvagg commented Jul 2, 2015

@trevnorris 700 steps forward, 800 steps back?

@rvagg
Member

rvagg commented Jul 2, 2015

backing up to this commit it looks like you're right @trevnorris, you've ruined great gains!

buffers/buffer-tostring.js arg=true len=0 n=10000000      57467728.45114  67528289.82904  429491405.29717  636.02%
buffers/buffer-tostring.js arg=true len=1 n=10000000      13827075.37094  14473678.67815  14508241.08644   100.24%
buffers/buffer-tostring.js arg=true len=64 n=10000000     10174151.11764  10710577.4752   10397595.66746   97.08%
buffers/buffer-tostring.js arg=true len=1024 n=10000000   4867748.07704   5020575.83599   4967084.6993     98.93%
buffers/buffer-tostring.js arg=false len=0 n=10000000     70084364.82509  85137567.23931  455011800.27603  534.44%
buffers/buffer-tostring.js arg=false len=1 n=10000000     13126817.325    14294504.30314  13154697.29727   92.03%
buffers/buffer-tostring.js arg=false len=64 n=10000000    10044569.75487  10748428.7662   10844602.80418   100.89%
buffers/buffer-tostring.js arg=false len=1024 n=10000000  4760963.57608   5092279.38181   5012971.76375    98.44%

636% and 534% perf improvement at this commit, but going back below 100% @ master
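
The percentage columns in both tables are consistent with dividing a throughput figure by the second numeric column of the same row. The sketch below recomputes the len=0 rows under that reading; the field names first, second and third are placeholders for the column positions, not labels taken from the benchmark tool.

```javascript
'use strict';

// Throughput figures copied from the len=0 rows above.
const rows = [
  { name: 'arg=true len=0',
    first: 57467728.45114, second: 67528289.82904, third: 429491405.29717 },
  { name: 'arg=false len=0',
    first: 70084364.82509, second: 85137567.23931, third: 455011800.27603 },
];

// Express `value` as a percentage of `baseline`, rounded to two decimals.
function percentOf(value, baseline) {
  return (value / baseline * 100).toFixed(2) + '%';
}

for (const r of rows) {
  console.log(r.name,
              percentOf(r.first, r.second),   // first table's percentage
              percentOf(r.third, r.second));  // second table's percentage
}
```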

@trevnorris
Contributor

Okay, the len=0 case I knew would take a big hit. But realistically how often is that happening? Assumptions aside, I had to drop the quick return in order to properly check the instance on the native side to ensure throwing was consistent despite length.

@rvagg
Member

rvagg commented Jul 2, 2015

yeah, I'm not overly concerned since I don't think len=0 is a particularly common case, I just needed to know whether this was a notable item for the changelog and the answer is no

@dcousens

dcousens commented Jul 2, 2015

So is this commit still useful?

@trevnorris
Contributor

Probably. Will need some massaging to regain as much perf as possible.

mscdex pushed a commit to mscdex/io.js that referenced this pull request Jul 9, 2015