Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: optimize decoding wrapped base64 data #12146

Closed
wants to merge 1 commit into from

Conversation

aqrln
Copy link
Contributor

@aqrln aqrln commented Mar 31, 2017

The fast base64 decoder used to switch to the slow one permanently when
it saw a whitespace or other garbage character. Since the most common
situation such characters may be encountered in is line-wrapped base64
data, a more profitable strategy is to decode a single 24-bit group with
the slow decoder and then continue running the fast algorithm.

Refs: #12114

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

buffer

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. labels Mar 31, 2017
@aqrln
Copy link
Contributor Author

aqrln commented Mar 31, 2017

I haven't run the whole benchmark suite yet, only symlinked the exiting and the new benchmarks into a new directory locally:

                                             improvement confidence      p.value
 base64/buffer-base64-decode-wrapped.js n=32     11.04 %        *** 4.413651e-27
 base64/buffer-base64-decode.js n=32             -1.49 %            5.613341e-02

@Fishrock123
Copy link
Contributor

// eslint-disable-next-line no-unescaped-regexp-dot
data.match(/./); // Flatten the string
const buffer = Buffer.allocUnsafe(bytesCount);
buffer.write(data, 0, bytesCount, 'base64');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit but this can be simplified just a bit by doing...

const line = 'abcd'.repeat(charsPerLine / 4) + '\n';
const buffer = Buffer.alloc(bytesCount, line);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion!

The fast base64 decoder used to switch to the slow one permanently when
it saw a whitespace or other garbage character.  Since the most common
situation such characters may be encountered in is line-wrapped base64
data, a more profitable strategy is to decode a single 24-bit group with
the slow decoder and then continue running the fast algorithm.

Refs: nodejs#12114
@aqrln
Copy link
Contributor Author

aqrln commented Mar 31, 2017

Rebased to incorporate changes from #11995.

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@aqrln
Copy link
Contributor Author

aqrln commented Apr 4, 2017

Can I get a fresh CI run?

@trevnorris
Copy link
Contributor

Copy link
Contributor

@trevnorris trevnorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I'm qualified to sign off on this but LGTM

@aqrln
Copy link
Contributor Author

aqrln commented Apr 4, 2017

/cc @bnoordhuis (based on Git history)

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM but it would be really nice to have some cctests for this header file

@jasnell
Copy link
Member

jasnell commented Apr 4, 2017

(to be clear, the cctest can be added separately :-) ...)

jasnell pushed a commit that referenced this pull request Apr 4, 2017
The fast base64 decoder used to switch to the slow one permanently when
it saw a whitespace or other garbage character.  Since the most common
situation such characters may be encountered in is line-wrapped base64
data, a more profitable strategy is to decode a single 24-bit group with
the slow decoder and then continue running the fast algorithm.

PR-URL: #12146
Ref: #12114
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@jasnell
Copy link
Member

jasnell commented Apr 4, 2017

Landed in e77a83f

@jasnell jasnell closed this Apr 4, 2017
@aqrln aqrln deleted the base64-optimization branch April 4, 2017 16:46
@jasnell jasnell mentioned this pull request Apr 4, 2017
aqrln added a commit to aqrln/node that referenced this pull request Apr 5, 2017
This commit adds C++ tests for `base64_encode()` and `base64_decode()`
functions defined in `base64.h`.  The functionality is already being
tested indirectly in JavaScript tests for Buffer, but it won't hurt to
test the low-level functions too, especially given that they aren't only
used in the internal Buffer implementation, Chrome inspector protocol
support relies upon them too.

Refs: nodejs#12146 (comment)
@aqrln aqrln mentioned this pull request Apr 5, 2017
3 tasks
aqrln added a commit that referenced this pull request Apr 8, 2017
This commit adds C++ tests for `base64_encode()` and `base64_decode()`
functions defined in `base64.h`.  The functionality is already being
tested indirectly in JavaScript tests for Buffer, but it won't hurt to
test the low-level functions too, especially given that they aren't only
used in the internal Buffer implementation, Chrome inspector protocol
support relies upon them too.

PR-URL: #12238
Refs: #12146 (comment)
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
italoacasas pushed a commit to italoacasas/node that referenced this pull request Apr 10, 2017
The fast base64 decoder used to switch to the slow one permanently when
it saw a whitespace or other garbage character.  Since the most common
situation such characters may be encountered in is line-wrapped base64
data, a more profitable strategy is to decode a single 24-bit group with
the slow decoder and then continue running the fast algorithm.

PR-URL: nodejs#12146
Ref: nodejs#12114
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@italoacasas italoacasas mentioned this pull request Apr 10, 2017
2 tasks
@MylesBorins
Copy link
Contributor

should we backport?

Assuming if so we should wait a bit

@MylesBorins MylesBorins added the baking-for-lts PRs that need to wait before landing in a LTS release. label May 15, 2017
@addaleax
Copy link
Member

should we backport?

It seems to be the root cause of #13657, so no.

@addaleax addaleax added dont-land-on-v4.x and removed baking-for-lts PRs that need to wait before landing in a LTS release. lts-watch-v6.x labels Jun 13, 2017
gibfahn pushed a commit that referenced this pull request Jun 18, 2017
This commit adds C++ tests for `base64_encode()` and `base64_decode()`
functions defined in `base64.h`.  The functionality is already being
tested indirectly in JavaScript tests for Buffer, but it won't hurt to
test the low-level functions too, especially given that they aren't only
used in the internal Buffer implementation, Chrome inspector protocol
support relies upon them too.

PR-URL: #12238
Refs: #12146 (comment)
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
gibfahn pushed a commit that referenced this pull request Jun 20, 2017
This commit adds C++ tests for `base64_encode()` and `base64_decode()`
functions defined in `base64.h`.  The functionality is already being
tested indirectly in JavaScript tests for Buffer, but it won't hurt to
test the low-level functions too, especially given that they aren't only
used in the internal Buffer implementation, Chrome inspector protocol
support relies upon them too.

PR-URL: #12238
Refs: #12146 (comment)
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 11, 2017
This commit adds C++ tests for `base64_encode()` and `base64_decode()`
functions defined in `base64.h`.  The functionality is already being
tested indirectly in JavaScript tests for Buffer, but it won't hurt to
test the low-level functions too, especially given that they aren't only
used in the internal Buffer implementation, Chrome inspector protocol
support relies upon them too.

PR-URL: #12238
Refs: #12146 (comment)
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants