Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: implement minimal v8 snapshot integration #27321

Closed
wants to merge 10 commits into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Apr 20, 2019

This PR adds minimal support for embedded v8 snapshot. The generated snapshot currently includes:

  • Per-isolate data (symbols and strings)
  • The main context snapshotted after running the per-context scripts

At the moment, we do not create any external reference at the point when the snapshot is captured.
To snapshot more of the bootstrap result, we'll need to implement external reference bookkeeping
in a future PR.
In this PR the setup of the error handlers is delayed until after the context is created, because
attaching the error message handler creates an external reference to the C++ function.

This means any errors thrown during the snapshot generation will be handled by the default
handler and won't have the fancy error stack traces generated by node:: PrintException,
but that should be rare and should only happen during local development (it was broken until
#27236 anyway).

src: allow creating NodeMainInstance that does not own the isolate

Allows instantiating a NodeMainInstance with an isolate
whose initialization and disposal are controlled by the caller.

src: implement IsolateData serialization and deserialization

This patch allows serializing per-isolate data into an isolate
snapshot and deserializing them from an isolate snapshot.

tools: implement node_mksnapshot

Implements a node_mksnapshot target that generates a snapshot blob
from a Node.js main instance's isolate, and serializes the data blob
with other additional data into a C++ file that can be embedded into
the Node.js binary.

src: enable snapshot with per-isolate data

Enable serializing the isolate from an isolate snapshot generated
by node_mksnapshot with per-isolate data.

src: enable context snapshot after running per-context scripts

At build time, snapshot the context after running per-context scripts
in the main instance, and in the final build, deserialize the
context in the main instance.

This provides a ~5% in the misc/startup benchmark when the instance
is launched within a process that runs test/fixtures/semicolon.js.

Refs: #17058

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Apr 20, 2019
@nodejs-github-bot

This comment has been minimized.

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 20, 2019

Benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/350/

Results:

confidence improvement accuracy (*)   (**)  (***)
19:03:25  misc/startup.js mode='process' script='benchmark/fixtures/require-cachable' dur=1        ***      2.49 %       ±0.90% ±1.20% ±1.56%
19:03:25  misc/startup.js mode='process' script='test/fixtures/semicolon' dur=1                    ***      4.83 %       ±0.74% ±0.99% ±1.30%
19:03:25  misc/startup.js mode='worker' script='benchmark/fixtures/require-cachable' dur=1           *     -2.06 %       ±1.68% ±2.24% ±2.91%
19:03:25  misc/startup.js mode='worker' script='test/fixtures/semicolon' dur=1                              0.58 %       ±2.80% ±3.73% ±4.85%

@joyeecheung joyeecheung marked this pull request as ready for review April 20, 2019 11:22
@joyeecheung
Copy link
Member Author

CI is green. PTAL, thanks!

@hashseed @nodejs/build-files @nodejs/process

@addaleax
Copy link
Member

This looks good, although it’ll take some time for me to get used to spelling it “indexes” 😄

@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 20, 2019

@addaleax wait..is it "indices"? 🤦‍♀️

(Funny enough there are 1206 "indexes" in the code base and 950 "indices")

@devsnek
Copy link
Member

devsnek commented Apr 20, 2019

indices is linguistically correct, but indexes is acceptable as well

Both "indexes" and "indices" are acceptable plural forms of the word "index" or to refer to more than one index. Index is one of those rare words that have two different plurals in English. "Indices" is originally a Latin plural, while "Indexes" has taken the English way of making plurals, using –s or –es.

Allows instantiating a NodeMainInstance with an isolate
whose initialization and disposal are controlled by the caller.
This patch allows serializing per-isolate data into an isolate
snapshot and deserializing them from an isolate snapthot.
Implements a node_mksnapshot target that generates a snapshot blob
from a Node.js main instance's isolate, and serializes the data blob
with other additional data into a C++ file that can be embedded into
the Node.js binary.
Enable serializing the isolate from an isolate snapshot generated
by node_mksnapshot with per-isolate data.
At build time, snapshot the context after running per-context scripts
in the main instance, and in the final build, deserialize the
context in the main instance.

This provides a ~5% in the misc/startup benchmark when the instance
is launched within a process that runs test/fixtures/semicolon.js.
@nodejs-github-bot

This comment has been minimized.

@joyeecheung
Copy link
Member Author

Rebased now that the first commit landed in #27276

@@ -885,8 +885,25 @@ int Start(int argc, char** argv) {
}

{
NodeMainInstance main_instance(
uv_default_loop(), result.args, result.exec_args);
Isolate::CreateParams params;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move this bit into the NodeMainInstance::main_instance overload? Sine 2 out of 3 params are anyway calculated using NodeMainInstance:: static methods, and the third is {nullptr};
i.e. something like:

    NodeMainInstance main_instance({nullptr},
                                   uv_default_loop(),
                                   per_process::v8_platform.Platform(),
                                   result.args,
                                   result.exec_args);
...
    NodeMainInstance main_instance(const std::vector<intptr_t> external_references&&,
                                   uv_loop_t* event_loop,
                                   MultiIsolatePlatform* platform,
                                   const std::vector<std::string>& args,
                                   const std::vector<std::string>& exec_args);
      Isolate::CreateParams params;
      v8::StartupData* blob = NodeMainInstance::GetEmbeddedSnapshotBlob();
      const NodeMainInstance::IndexArray* indexes =
          NodeMainInstance::GetIsolateDataIndexes();
      if (blob != nullptr) {
        params.external_references = external_references.data();
        params.snapshot_blob = blob;
      }
      ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about moving the GetEmbeddedSnapshotBlob() etc. out of NodeMainInstance at some point when we add support for other snapshots, e.g. workers, and maybe restructuring the class somehow so we could reuse it in the cctest, it would be easier to experiment if the deserialization can be switched off by the caller even when the binary is built with embedded snapshot.

src/node_main_instance.h Outdated Show resolved Hide resolved
node.gyp Show resolved Hide resolved
@refack
Copy link
Contributor

refack commented Apr 20, 2019

YAAAS! Great work!

Two build related comments:

  1. Optimally we should merge this with mkcodecache (and even with js2c.py) since linking a second binary is time-expensive. Also have 2 build time actions that each create 1 file but must run together is cumbersome to for make as such to track.
  2. If we are creating a snapshot anyway, could we drop V8's mksnapshot? IIUC the bootstrap is now essentially:
    1. V8 loads snapshot.
    2. We load our snapshot, overwriting V8's.

@refack refack added the process Issues and PRs related to the process subsystem. label Apr 20, 2019
node.gyp Outdated Show resolved Hide resolved
@joyeecheung
Copy link
Member Author

joyeecheung commented Apr 21, 2019

@refack

Optimally we should merge this with mkcodecache (and even with js2c.py) since linking a second binary is time-expensive. Also have 2 build time actions that each create 1 file but must run together is cumbersome to for make as such to track.

I have thought about that, but as a start I'd like to pay a little bit more build time and keep the two targets separate, since those are both pretty new and we may need to turn them on/off when new bugs come in (e.g. #27307 is still a confirmed bug, and it's not really fixed yet - pending a upstream CL - it only shows up in a specific case in the CI because it's very GC sensitive but it could still reproduce in the wild).

I think we could observe for a while and when they are stable enough, merge the two into one - at that point it's unlikely that we'd want to go back and revert them.

If we are creating a snapshot anyway, could we drop V8's mksnapshot? IIUC the bootstrap is now essentially:
V8 loads snapshot.
We load our snapshot, overwriting V8's.

The first is always done by the cross-compiled builds right? But second step is not currently done for the them. Before we enable this for the cross-compiled builds, keeping V8's snapshot at least gives them one.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot
Copy link
Collaborator

isolate->SetPromiseRejectCallback(task_queue::PromiseRejectCallback);
v8::CpuProfiler::UseDetailedSourcePositionsForProfiling(isolate);
SetIsolateUpForNode(isolate, IsolateSettingCategories::kErrorHandlers);
SetIsolateUpForNode(isolate, IsolateSettingCategories::kMisc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a bad API (although it's not a public API so it matters less) in that order might be important but that's easy to get wrong for callers. A single call where you pass in an options bitmask would be more robust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the APIs that are called here...I don't think the order actually matters for now? The only dangerous bit is that you could call with one category twice, but that's also true before this change.

I think the current categories is not good as final for sure, but I don't really know the use cases for this other than this patch, so we might as well polish it as we add more use cases (I tried to organize the setup a bit better in another cctest attempt before but that didn't really go anywhere).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the APIs that are called here...I don't think the order actually matters for now?

No, but as a caller I don't know that unless I (closely) read the function's source. It would just be a bit easier all around if I could just say "do this and that" without having to worry about order. Declarative vs. imperative if you will.

Copy link
Member Author

@joyeecheung joyeecheung Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but as a caller I don't know that unless I (closely) read the function's source. It would just be a bit easier all around if I could just say "do this and that" without having to worry about order. Declarative vs. imperative if you will.

This is more like a temporary DRY thing here, when we figure out other use cases we could just tweak it for them, but it’s probably too early to spend time designing it. In the initial prototype I actually copy pasted this into SetIsolateUpForSnapshot() but it looked worse..

src/node_main_instance.h Outdated Show resolved Hide resolved
src/node_main_instance.h Outdated Show resolved Hide resolved
tools/snapshot/node_mksnapshot.cc Outdated Show resolved Hide resolved
tools/snapshot/node_mksnapshot.cc Show resolved Hide resolved
tools/snapshot/node_mksnapshot.cc Outdated Show resolved Hide resolved
tools/snapshot/snapshot_builder.cc Outdated Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member Author

Landed in d66c7e3...d5d9c34

joyeecheung added a commit that referenced this pull request Apr 23, 2019
Allows instantiating a NodeMainInstance with an isolate
whose initialization and disposal are controlled by the caller.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
This patch allows serializing per-isolate data into an isolate
snapshot and deserializing them from an isolate snapthot.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
Implements a node_mksnapshot target that generates a snapshot blob
from a Node.js main instance's isolate, and serializes the data blob
with other additional data into a C++ file that can be embedded into
the Node.js binary.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
Enable serializing the isolate from an isolate snapshot generated
by node_mksnapshot with per-isolate data.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
At build time, snapshot the context after running per-context scripts
in the main instance, and in the final build, deserialize the
context in the main instance.

This provides a ~5% in the misc/startup benchmark when the instance
is launched within a process that runs test/fixtures/semicolon.js.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
joyeecheung added a commit that referenced this pull request Apr 23, 2019
This currently breaks `test/pummel/test-hash-seed.js`

PR-URL: #27365
Refs: #27321
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ujjwal Sharma <usharma1998@gmail.com>
targos pushed a commit that referenced this pull request Apr 27, 2019
Allows instantiating a NodeMainInstance with an isolate
whose initialization and disposal are controlled by the caller.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
This patch allows serializing per-isolate data into an isolate
snapshot and deserializing them from an isolate snapthot.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
Implements a node_mksnapshot target that generates a snapshot blob
from a Node.js main instance's isolate, and serializes the data blob
with other additional data into a C++ file that can be embedded into
the Node.js binary.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
Enable serializing the isolate from an isolate snapshot generated
by node_mksnapshot with per-isolate data.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
At build time, snapshot the context after running per-context scripts
in the main instance, and in the final build, deserialize the
context in the main instance.

This provides a ~5% in the misc/startup benchmark when the instance
is launched within a process that runs test/fixtures/semicolon.js.

PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
PR-URL: #27321
Refs: #17058
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
targos pushed a commit that referenced this pull request Apr 27, 2019
This currently breaks `test/pummel/test-hash-seed.js`

PR-URL: #27365
Refs: #27321
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ujjwal Sharma <usharma1998@gmail.com>
@targos targos mentioned this pull request Apr 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. process Issues and PRs related to the process subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants