Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: speed up fs.WriteStream#_write #2944

Closed
wants to merge 1 commit into from

Conversation

ronkorving
Copy link
Contributor

This avoids a number of type checks and a maybeCallback call that was being done by fs.write before.

Benchmarking this is not an exact science. I tend to see some improvement, especially when dealing with the small and countless buffers (like in #2167, it's probably best to focus on size=2 and size=1024 for benchmark consistency). In any case, looking at the code change, it should be fairly obvious that it should be at least somewhat faster.

Previous benchmark:

fs/write-stream-throughput.js dur=5 type=buf size=2: 4.27074
fs/write-stream-throughput.js dur=5 type=buf size=1024: 231.85246
fs/write-stream-throughput.js dur=5 type=buf size=65535: 548.05857
fs/write-stream-throughput.js dur=5 type=buf size=1048576: 524.05158
fs/write-stream-throughput.js dur=5 type=asc size=2: 1.30060
fs/write-stream-throughput.js dur=5 type=asc size=1024: 182.64069
fs/write-stream-throughput.js dur=5 type=asc size=65535: 491.32727
fs/write-stream-throughput.js dur=5 type=asc size=1048576: 513.10271
fs/write-stream-throughput.js dur=5 type=utf size=2: 1.22798
fs/write-stream-throughput.js dur=5 type=utf size=1024: 132.25211
fs/write-stream-throughput.js dur=5 type=utf size=65535: 306.84604
fs/write-stream-throughput.js dur=5 type=utf size=1048576: 375.21379

With this PR:

fs/write-stream-throughput.js dur=5 type=buf size=2: 4.54269
fs/write-stream-throughput.js dur=5 type=buf size=1024: 274.99219
fs/write-stream-throughput.js dur=5 type=buf size=65535: 517.75836
fs/write-stream-throughput.js dur=5 type=buf size=1048576: 524.53301
fs/write-stream-throughput.js dur=5 type=asc size=2: 1.55818
fs/write-stream-throughput.js dur=5 type=asc size=1024: 198.47340
fs/write-stream-throughput.js dur=5 type=asc size=65535: 466.72804
fs/write-stream-throughput.js dur=5 type=asc size=1048576: 506.89006
fs/write-stream-throughput.js dur=5 type=utf size=2: 1.57304
fs/write-stream-throughput.js dur=5 type=utf size=1024: 183.35674
fs/write-stream-throughput.js dur=5 type=utf size=65535: 335.97045
fs/write-stream-throughput.js dur=5 type=utf size=1048576: 374.15673

After #2167, you may wonder why this would have any affect on this benchmark at all. But that's because pretty much each _writev is followed by a _write and vice versa. When tracking whether _writev or _write is called, you see both pop up almost equally. For that reason I thought it would be worthwhile to optimize fs.WriteStream#_write. Independently, it would be interesting to try and figure out if we can optimize Stream.Writable to chunk more through _writev when possible.

@@ -1859,7 +1873,8 @@ WriteStream.prototype._write = function(data, encoding, cb) {
});

var self = this;
fs.write(this.fd, data, 0, data.length, this.pos, function(er, bytes) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is what this PR is all about: avoiding all the stuff in fs.write that we already know does not apply to this use-case.

@ronkorving ronkorving changed the title fs: inline only the required logic into fs.WriteStream#_write fs: speed up fs.WriteStream#_write Sep 18, 2015
@mscdex mscdex added the fs Issues and PRs related to the fs subsystem / file system. label Sep 18, 2015
@ronkorving
Copy link
Contributor Author

This currently breaks test/parallel/test-fs-write-stream-err.js because it overwrites fs.write which the fs.WriteStream no longer depends on. I hope someone can advise me how to deal with this problem.

function wrapper(err, written) {
// Retain a reference to buffer so that it can't be GC'ed too soon.
callback(err, written || 0, buffer);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevnorris Any news on the solution to these wrappers? If we could avoid a closure for each of these write calls everywhere, I think that would improve performance (or lower memory usage) a bit more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't see much if any gain in terms of performance. The change is more for code cleanliness. Will have to make the change of passing buffer to wrapper() so it can be hoisted. Though that shouldn't be difficult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the plan is, but are you saying we would still need a wrapper? Perhaps you can elaborate a bit on what this is really all about? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the buffer has left JS scope then GC will clean it up. Even though it's still being used. One alternative is that the buffer bubbles through the oncomplete callback. I still need to look into it. While the wrapper is ugly, it doesn't add any noticeable performance overhead in this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks.

@trevnorris
Copy link
Contributor

Thanks for starting on this. fs.js needs some love and cleanup.

CI: https://ci.nodejs.org/job/node-test-pull-request/345/

return;
}

this.fd = fd;
this.emit('open', fd);
}.bind(this));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good place to put an arrow function. I'm curious about the perf vs this bind and const self = this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik bind is notoriously slow (both at bind time and at call time), so wherever we can remove it (which is everywhere), we most definitely should. Arrow functions, I don't believe perform quite well enough (or prevents optimization?) to start using all over the place I believe. Perhaps @mraleph can chime in on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wary about them as well, but after a bunch of testing found that they're just as fast as normal functions in the same situation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, bind() is still the devil's tool. Fortunately arrow functions don't play by the same rules, thus aren't placed under the same performance constraints.

@ronkorving
Copy link
Contributor Author

@trevnorris Thanks. I'm quite invested in good file system performance right now, so you can expect me to continue to contribute here. Btw, CI is too soon, as one test is currently broken because it replaces fs.write with its own version (but our stream no longer depends on fs.write). Advice on this would be appreciated (especially if possible today, as I'm about to go offline for 10 days on vacation).

@trevnorris
Copy link
Contributor

@ronkorving Is the offending test still relevant? If not then just delete it. If it is then we'll figure out a way to rewrite it. If you have to leave before finishing the test then I can look into finishing that for you.

@ronkorving
Copy link
Contributor Author

@trevnorris To be honest, I fear I'm running short on time. Travel preparations will take up most of my time tomorrow (typing this from a phone, won't see my computer again today), so by all means have a look at it and do as you please :) Alternatively, advise me but allow this to sit until I'm back on Oct 3rd. Either is fine by me.

@trevnorris
Copy link
Contributor

The failing test was introduced in d65832c. Reason for it failing isn't obvious to me yet. Will take a look later.

@ronkorving
Copy link
Contributor Author

@trevnorris it fails because the test depends on fs.write being used by WriteStream which is no longer the case in this PR.

@trevnorris
Copy link
Contributor

@ronkorving The description of the original commit is "Close the file descriptor when a write operation fails." Does that still apply? If so, can a similar test be created?

Either way it seems like this failing test will need to be removed.

@reqshark
Copy link

"Close the file descriptor when a write operation fails."

why not do a quick backoff and retry (maybe after a small setTimeout) first before closing the fd?

@@ -1836,30 +1836,42 @@ fs.FileWriteStream = fs.WriteStream; // support the legacy name


WriteStream.prototype.open = function() {
const self = this;

fs.open(this.path, this.flags, this.mode, function(er, fd) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest node supports arrow functions so why not use them?
Ref: http://jsperf.com/arrow-functions-vs-bind
Ref: http://jsperf.com/arrow-vs-bind

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's my call to make. I would like to see the Node core members formally embrace it before throwing my warm and fuzzy ES6 feelings into each PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know about "embracing" it, but we're not going to enforce it. It's a stylistic call to save a few chars in a scenario like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so we're clear on future PRs, are arrow functions permitted in core now? I know they're supported by V8, but is there any reason NOT to use them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're permitted. Performance is on par w/ normal functions. Only place I'd suggest they be used is if this needs to propagate. Other than that it's nothing more than a style thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my benchmark, arrow functions actually seem to be reliably faster than regular functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brendanashworth Try reordering the runs. ;)

$ node /tmp/arrow-bench.js 
function took 87.64109233 ns / op       
regular took 93.22382201 ns / op        

$ node /tmp/arrow-bench.js 
regular took 87.63654512 ns / op        
function took 93.23333455 ns / op       

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh, v8 is confusing sometimes.

@trevnorris
Copy link
Contributor

Is fixing test-fs-write-stream-err.js the last outstanding issue?

@ronkorving
Copy link
Contributor Author

I think so. I'll try to give that some love soon, but have been a bit occupied I'm afraid.

@ronkorving
Copy link
Contributor Author

The test ensures that if there are errors passed to the callback (write failures), the stream closes the file descriptor and emits the error. That seems like a good test to have, but it achieves this by creating a fake write error in fs.write (which is public, so it can do that). In the new WriteStream implementation, it no longer uses fs.write, so I'm not sure how I could inject a fake error. Suggestions welcome (as I continue looking into this).

@trevnorris
Copy link
Contributor

Here's a diff for a fix:

diff --git a/test/parallel/test-fs-write-stream-err.js b/test/parallel/test-fs-write-stream-err.js
index bc289d4..7d64014 100644
--- a/test/parallel/test-fs-write-stream-err.js
+++ b/test/parallel/test-fs-write-stream-err.js
@@ -2,6 +2,7 @@
 var common = require('../common');
 var assert = require('assert');
 var fs = require('fs');
+var binding = process.binding('fs');

 common.refreshTmpDir();

@@ -10,21 +11,18 @@ var stream = fs.createWriteStream(common.tmpDir + '/out', {
 });
 var err = new Error('BAM');

-var write = fs.write;
+var writeBuffer = binding.writeBuffer;
 var writeCalls = 0;
-fs.write = function() {
+binding.writeBuffer = function() {
   switch (writeCalls++) {
     case 0:
-      console.error('first write');
       // first time is ok.
-      return write.apply(fs, arguments);
+      console.error('first write');
+      return writeBuffer.apply(this, arguments);
     case 1:
       // then it breaks
       console.error('second write');
-      var cb = arguments[arguments.length - 1];
-      return process.nextTick(function() {
-        cb(err);
-      });
+      return process.nextTick(arguments[5].oncomplete, err);
     default:
       // and should not be called again!
       throw new Error('BOOM!');

@ronkorving
Copy link
Contributor Author

Smart! Why didn't I think of overriding the methods on the binding object? Thanks @trevnorris :)

@ronkorving
Copy link
Contributor Author

@trevnorris I'm a bit new to this, so please bear with me. I find this very strange, with your patch I can make the test fail or succeed, depending on how I run it:

Success:

$ ./node test/parallel/test-fs-write-stream-err.js && echo $?
first write
first cb
second write
fs.close 17 17
second cb
error handler
0

Failure:

$ python tools/test.py -v --mode=release parallel/test-fs-write-stream-err
[00:00|%   0|+   0|-   0]: release test-fs-write-stream-err # out/Release/node /Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js
=== release test-fs-write-stream-err ===                   
Path: parallel/test-fs-write-stream-err
/Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js:28
      throw new Error('BOOM!');
      ^

Error: BOOM!
    at Object.binding.writeBuffer (/Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js:28:13)
    at Object.fs.writeSync (fs.js:663:20)
    at SyncWriteStream.write (fs.js:1990:6)
    at Console.warn (console.js:44:16)
    at Object.binding.writeBuffer (/Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js:24:15)
    at Object.fs.writeSync (fs.js:663:20)
    at SyncWriteStream.write (fs.js:1990:6)
    at Console.warn (console.js:44:16)
    at Object.binding.writeBuffer (/Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js:20:15)
    at write (fs.js:1866:11)
Command: out/Release/node /Users/rk/Projects/node/test/parallel/test-fs-write-stream-err.js
[00:00|% 100|+   0|-   1]: Done 

Any idea how this could happen or how the test can behave in 2 completely different ways depending on how its run?

PS: I've pushed the test update to this PR, so you can try it out.

This avoids a number of type checks and a maybeCallback call that
was being done by fs.write before.
@ronkorving
Copy link
Contributor Author

friendly poke @trevnorris

@trevnorris
Copy link
Contributor

I'm not sure.

@bnoordhuis I believe you were the one to introduce this test. have any thoughts?

@bnoordhuis
Copy link
Member

What happens with ./node test/parallel/test-fs-write-stream-err.js | cat? The test monkey-patches fs.write and expects it to be called n times. I speculate there is something in core calling fs.write when stdout/stderr is a pipe.

@trevnorris
Copy link
Contributor

@bnoordhuis thanks. that doesn't trigger it directly but the following seems to:

./node test/parallel/test-fs-write-stream-err.js 2> /tmp/blah.out

@trevnorris
Copy link
Contributor

By changing console.error() to process._rawDebug() I get the following output from the file:

first write
first cb
second write
fs.close 11 11
/var/projects/node/test/parallel/test-fs-write-stream-err.js:28
      throw new Error('BOOM!');
      ^

Error: BOOM!
    at ...

@trevnorris
Copy link
Contributor

Ah yes. binding.writeBuffer() is being called an additional time with a different file descriptor. Here's a patch to work around that case:

diff --git a/test/parallel/test-fs-write-stream-err.js b/test/parallel/test-fs-write-stream-err.js
index 7d64014..b97fef9 100644
--- a/test/parallel/test-fs-write-stream-err.js
+++ b/test/parallel/test-fs-write-stream-err.js
@@ -14,6 +14,9 @@ var err = new Error('BAM');
 var writeBuffer = binding.writeBuffer;
 var writeCalls = 0;
 binding.writeBuffer = function() {
+  if (arguments[0] != stream.fd)
+    return writeBuffer.apply(this, arguments);
+
   switch (writeCalls++) {
     case 0:
       // first time is ok.

Thanks much @bnoordhuis for the help.

@ronkorving
Copy link
Contributor Author

Wow, thanks so much guys. I wouldn't have been able to figure that one out on my own.

@ChALkeR ChALkeR added the performance Issues and PRs related to the performance of Node.js. label Feb 16, 2016
@jasnell jasnell added the stalled Issues and PRs that are stalled. label Mar 22, 2016
@jasnell
Copy link
Member

jasnell commented Mar 22, 2016

@ronkorving ... any updates on this one?

@ronkorving
Copy link
Contributor Author

I need to get back on this one. No updates at this time, hope to get back to it soon.

@ronkorving
Copy link
Contributor Author

Because of the changes that have happened to fs, I'll restart this later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging this pull request may close these issues.