Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: added big-endian check and support to UTF-16 encode #3410

Closed
wants to merge 1 commit into from

Conversation

exinfinitum
Copy link
Contributor

Added for loop to Encode function to support big-endian machines.
Used stack allocation instead of heap allocation to improve speed of endianness correction.
Resolves #3409

@mscdex mscdex added the string_decoder Issues and PRs related to the string_decoder subsystem. label Oct 16, 2015
@jasnell
Copy link
Member

jasnell commented Oct 17, 2015

@srl295

@@ -857,7 +857,17 @@ Local<Value> StringBytes::Encode(Isolate* isolate,
const uint16_t* buf,
size_t buflen) {
Local<String> val;

uint16_t dst[buflen];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this. buflen may be on the order of 100s of megabytes.

@bnoordhuis
Copy link
Member

Instead of copy/pasting the byte-swapping code, it would be much better to move it into a small utility class.

// http://nodejs.org/api/buffer.html regarding Node's "ucs2"
// encoding specification
for (size_t i = 0; i < buflen; i++) {
dst[i] = (buf[i] << 8) | (buf[i] >> 8);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth using intrinsics to ensure efficient byteswap instructions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need. gcc and clang recognize the byte swap just fine.

@brendanashworth brendanashworth added the c++ Issues and PRs that require attention from people who are familiar with C++. label Oct 19, 2015
@jasnell
Copy link
Member

jasnell commented Oct 19, 2015

agree with @bnoordhuis ... there are a few places in the code now where we do this kind of reordering. We should be refactoring it down rather than cutting and pasting it elsewhere.

@jasnell
Copy link
Member

jasnell commented Oct 20, 2015

@srl295 @bnoordhuis ...

@exinfinitum ... just an fyi... we don't get notified automatically when a PR has been updated with a new push. It's helpful if you comment and mention us after the new push so we can get notified and come back to review.

@@ -23,7 +24,6 @@ using v8::String;
using v8::Value;
using v8::MaybeLocal;


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No whitespace changes please.

@bnoordhuis
Copy link
Member

Commit log style nit: please see CONTRIBUTING.md for how it should be written.

@exinfinitum
Copy link
Contributor Author

PR Updated

@@ -3,9 +3,11 @@
#include "node.h"
#include "node_buffer.h"
#include "v8.h"
#include "util.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check to see if we need to include util.h here?

@exinfinitum
Copy link
Contributor Author

Updated

@@ -857,7 +856,15 @@ Local<Value> StringBytes::Encode(Isolate* isolate,
const uint16_t* buf,
size_t buflen) {
Local<String> val;

std::vector<uint16_t> dst(buflen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd declare an empty std::vector here and .resize() it inside the if block. That avoids the memory allocation on little-endian architectures.

@exinfinitum
Copy link
Contributor Author

Updated

static_assert(sizeof(TypeName) <= 8
&& (sizeof(TypeName) & 0x1) == 0, "Unsupported type");

// reinterpret src as unsigned because we want unsigned shifts

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no shift operation anymore :)

MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
Versions of Node.js after v0.12 have relocated byte-swapping away from
the StringBytes::Encode function, thereby causing a nan test (which
accesses this function directly) to fail on big-endian machines.

This change re-introduces byte swapping in StringBytes::Encode,
done via a call to a function in util-inl. Another change in
NodeBuffer::StringSlice was necessary to avoid double byte swapping
in big-endian function calls to StringSlice.

PR-URL: #3410
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 18, 2016
MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
Notable changes:

This update to the LTS line includes a number of semver minor changes
that have been staged for a number of months. This includes:

  * deps: backport 9da3ab6 from V8 upstream (Ali Ijaz Sheikh)
    - #3609
  * http: handle errors on idle sockets (José F. Romaniello)
    - #4482
  * src: add BE support to StringBytes::Encode() (Bryon Leung)
    - #3410
  * tls: add `options` argument to createSecurePair (Коренберг Марк)
    - #2441

There are also quite a large number of semver patch changes
including over 20 doc fixes and almost 50 test fixes.

Notable semver patch changes include:

  * deps: upgrade to npm 2.14.18 (Kat Marchán)
    - #5245
  * https: evict cached sessions on error (Fedor Indutny)
    - #4982
  * process: support symbol events (cjihrig)
    - #4798
  * querystring: improve parse() performance (Brian White)
    - #4675

PR-URL: #5301
MylesBorins pushed a commit that referenced this pull request Feb 23, 2016
In December we announced that we would be doing a minor release in
order to get a number of voted on SEMVER-MINOR changes into LTS.
Our ability to release this was delayed due to the unforeseen security
release v4.3. We are quickly bumping to v4.4 in order to bring you the
features that we had committed to releasing.

Notable changes:

The SEMVER-MINOR changes include:
  * **deps**:
    - An update to v8 that introduces a new flag
      --perf_basic_prof_only_functions (Ali Ijaz Sheikh)
      #3609
  * **http**:
    - A new feature in http(s) agent that catches errors on
      *keep alived* connections (José F. Romaniello)
      #4482
  * **src**:
    - Better support for Big-Endian systems (Bryon Leung)
      #3410
  * **tls**:
    - A new feature that allows you to pass common SSL options to
      `tls.createSecurePair` (Коренберг Марк)
      #2441

Notable semver patch changes include:

  * **npm**
    - upgrade to npm 2.14.19 (Kat Marchán)
      #5335
  * **https**:
    - A potential fix for #3692)
      HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny)
      #4982
  * **process**:
    - Add support for symbols in event emitters. Symbols didn't exist
      when it was written ¯\_(ツ)_/¯ (cjihrig)
      #4798
  * **querystring**:
    - querystring.parse() is now 13-22% faster! (Brian White)
      #4675

PR-URL: #5301
MylesBorins pushed a commit that referenced this pull request Mar 1, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes security updates to openssl. More information can be found [on nodejs.org](https://nodejs.org/en/blog/vulnerability/openssl-march-2016/)

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) #3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) #4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) #3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) #2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) #4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) #4841
  * https:
    - A potential fix for #3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) #4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) #3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) #5510
  * *openssl:
    - upgrade openssl to 1.0.2g (Ben Noordhuis) #5507
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) #4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) #4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) #4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) #5214

PR-URL: #5301
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Mar 1, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes security updates to openssl. More information can be found [on nodejs.org](https://nodejs.org/en/blog/vulnerability/openssl-march-2016/)

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) nodejs#3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) nodejs#4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) nodejs#3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) nodejs#2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) nodejs#4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) nodejs#4841
  * https:
    - A potential fix for nodejs#3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) nodejs#4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) nodejs#3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) nodejs#5510
  * *openssl:
    - upgrade openssl to 1.0.2g (Ben Noordhuis) nodejs#5507
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) nodejs#4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) nodejs#4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) nodejs#4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) nodejs#5214

PR-URL: nodejs#5301
MylesBorins pushed a commit that referenced this pull request Mar 2, 2016
Versions of Node.js after v0.12 have relocated byte-swapping away from
the StringBytes::Encode function, thereby causing a nan test (which
accesses this function directly) to fail on big-endian machines.

This change re-introduces byte swapping in StringBytes::Encode,
done via a call to a function in util-inl. Another change in
NodeBuffer::StringSlice was necessary to avoid double byte swapping
in big-endian function calls to StringSlice.

PR-URL: #3410
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
MylesBorins pushed a commit that referenced this pull request Mar 2, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) #3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) #4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) #3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) #2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) #4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) #4841
  * https:
    - A potential fix for #3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) #4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) #3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) #5510
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) #4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) #4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) #4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) #5214

PR-URL: #5301
MylesBorins pushed a commit that referenced this pull request Mar 3, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) #3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) #4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) #3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) #2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) #4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) #4841
  * https:
    - A potential fix for #3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) #4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) #3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) #5510
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) #4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) #4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) #4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) #5214

PR-URL: #5301
MylesBorins pushed a commit that referenced this pull request Mar 8, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) #3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) #4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) #3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) #2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) #4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) #4841
  * https:
    - A potential fix for #3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) #4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) #3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) #5510
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) #4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) #4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) #4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) #5214

PR-URL: #5301
MylesBorins pushed a commit that referenced this pull request Mar 9, 2016
In December we announced that we would be doing a minor release in order to
get a number of voted on SEMVER-MINOR changes into LTS. Our ability to release this
was delayed due to the unforeseen security release v4.3. We are quickly bumping to
v4.4 in order to bring you the features that we had committed to releasing.

This release also includes over 70 fixes to our docs and over 50 fixes to tests.

The SEMVER-MINOR changes include:
  * deps:
    - An update to v8 that introduces a new flag --perf_basic_prof_only_functions (Ali Ijaz Sheikh) #3609
  * http:
    - A new feature in http(s) agent that catches errors on *keep alived* connections (José F. Romaniello) #4482
  * src:
    - Better support for Big-Endian systems (Bryon Leung) #3410
  * tls:
    - A new feature that allows you to pass common SSL options to `tls.createSecurePair` (Коренберг Марк) #2441
  * tools
    - a new flag `--prof-process` which will execute the tick processor on the provided isolate files (Matt Loring) #4021

Notable semver patch changes include:

  * buld:
    - Support python path that includes spaces. This should be of particular interest to our Windows users who may have python living in `c:/Program Files` (Felix Becker) #4841
  * https:
    - A potential fix for #3692 HTTP/HTTPS client requests throwing EPROTO (Fedor Indutny) #4982
  * installer:
    - More readable profiling information from isolate tick logs (Matt Loring) #3032
  * *npm:
    - upgrade to npm 2.14.20 (Kat Marchán) #5510
  * process:
    - Add support for symbols in event emitters. Symbols didn't exist when it was written ¯\_(ツ)_/¯ (cjihrig) #4798
  * querystring:
    - querystring.parse() is now 13-22% faster! (Brian White) #4675
  * streams:
    - performance improvements for moving small buffers that shows a 5% throughput gain. IoT projects have been seen to be as much as 10% faster with this change! (Matteo Collina) #4354
  * tools:
    - eslint has been updated to version 2.1.0 (Rich Trott) #5214

PR-URL: #5301
for (size_t i = 0; i < nchars; i++)
dst[i] = dst[i] << 8 | dst[i] >> 8;
break;
SwapBytes(dst, dst, nchars);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was pointed out in #7645 that this change introduces a bug: the break statement should have been retained because now the bytes are swapped back by the fallback path below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out! Does anyone already develop a fix? if not, I can do the favour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7645 should fix this when it lands but I've made a note in case that PR stalls.

zbjornson added a commit to zbjornson/node that referenced this pull request Sep 30, 2016
Between nodejs#3410 and nodejs#7645, bytes were swapped twice on bigendian
platforms if buffer was not two-byte aligned. See comment in nodejs#7645.
gibfahn pushed a commit that referenced this pull request Oct 4, 2016
Removes use of builtins that are unavailable for older clang. Per
benchmarks, only uses builtins on Windows, where speedup is
significant.

Also adds test for unaligned ucs2 buffer write. Between #3410
and #7645, bytes were swapped twice on bigendian platforms if buffer
was not two-byte aligned. See comment in #7645.

PR-URL: #7645
Fixes: #7618
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
jasnell pushed a commit that referenced this pull request Oct 6, 2016
Removes use of builtins that are unavailable for older clang. Per
benchmarks, only uses builtins on Windows, where speedup is
significant.

Also adds test for unaligned ucs2 buffer write. Between #3410
and #7645, bytes were swapped twice on bigendian platforms if buffer
was not two-byte aligned. See comment in #7645.

PR-URL: #7645
Fixes: #7618
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Fishrock123 pushed a commit that referenced this pull request Oct 11, 2016
Removes use of builtins that are unavailable for older clang. Per
benchmarks, only uses builtins on Windows, where speedup is
significant.

Also adds test for unaligned ucs2 buffer write. Between #3410
and #7645, bytes were swapped twice on bigendian platforms if buffer
was not two-byte aligned. See comment in #7645.

PR-URL: #7645
Fixes: #7618
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>

 Conflicts:
	src/node_buffer.cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. semver-minor PRs that contain new features and should be released in the next minor version. string_decoder Issues and PRs related to the string_decoder subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.