
src: limit foreground tasks draining loop #19987

Closed · wants to merge 5 commits

Conversation

@ulan (Contributor) commented Apr 12, 2018

Foreground tasks that repost themselves can force the draining loop
to run indefinitely without giving other tasks a chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the task queue at the beginning of the loop.

Fixes: #19937
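
For illustration, here is a minimal sketch of the queue-swapping idea behind the fix; ForegroundQueue, Post, and FlushOnce are made-up names for this sketch, not the actual node_platform.cc API:

#include <deque>
#include <functional>
#include <iostream>
#include <utility>

// Illustrative queue: a flush runs only the tasks that were already queued
// when it started, so a task that reposts itself cannot keep a single flush
// running forever.
class ForegroundQueue {
 public:
  using Task = std::function<void()>;

  void Post(Task task) { tasks_.push_back(std::move(task)); }

  // Returns true if at least one task ran. Tasks posted while draining are
  // left in the queue for the next flush instead of running immediately.
  bool FlushOnce() {
    std::deque<Task> batch;
    batch.swap(tasks_);        // take a snapshot of the queue's current contents
    for (Task& task : batch)
      task();                  // reposted tasks land in tasks_, not in batch
    return !batch.empty();
  }

  bool HasPending() const { return !tasks_.empty(); }

 private:
  std::deque<Task> tasks_;
};

int main() {
  ForegroundQueue queue;
  std::function<void()> reposting_task = [&queue, &reposting_task]() {
    queue.Post(reposting_task);  // this task always reposts itself
  };
  queue.Post(reposting_task);

  queue.FlushOnce();
  // Only the original task ran; its repost waits for the next flush, so
  // other work gets a chance to run in between.
  std::cout << std::boolalpha
            << "task pending for next flush: " << queue.HasPending() << "\n";
  return 0;
}

A caller that really does want to drain everything can still loop on the return value, which is how the inspector's pause loop ends up being handled later in this PR.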

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot added the "build" and "c++" labels Apr 12, 2018
@@ -66,6 +67,7 @@ class PerIsolatePlatformData :
int unref();

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are not run.
Member

Tiny nit: Maybe be more specific here and say that they are not run as part of this iteration, rather than implying that they aren’t run at all?

Contributor Author

Thanks. Done.

@jasnell (Member) commented Apr 12, 2018

hmmm... definitely +1 on this but should this be semver-major?

@jasnell (Member) commented Apr 14, 2018

There are some CI failures; it's not clear whether they are related.

@BridgeAR added the "author ready" label Apr 15, 2018
@BridgeAR (Member)

Not sure but there might be issues with Windows.

@BridgeAR removed the "author ready" label Apr 16, 2018
@BridgeAR (Member)

Windows failures seem related.

@ulan (Contributor Author) commented Apr 16, 2018

Thanks. I'll try to reproduce the Windows failure and debug it.

@ulan (Contributor Author) commented Apr 16, 2018

The Windows failure should be fixed now. The problem was caused by an incorrect repost count in the new test, which resulted in one leftover task that tried to access the Environment after its destruction.
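
For illustration, here is a hypothetical reconstruction of that failure mode (not the actual cctest added in this PR; FakeEnvironment and the hand-rolled queue below are made up): a self-reposting task whose repost count does not match the number of flushes performed by the test survives past teardown and touches freed state.

#include <cassert>
#include <deque>
#include <functional>
#include <memory>

struct FakeEnvironment { int value = 42; };

int main() {
  std::deque<std::function<void()>> queue;
  auto env = std::make_unique<FakeEnvironment>();

  int remaining_runs = 2;  // the loop below flushes twice; a value of 3 here
                           // would leave one task behind after the last flush
  std::function<void()> task = [&]() {
    assert(env != nullptr);  // would blow up if run after env is destroyed
    if (--remaining_runs > 0) queue.push_back(task);  // repost itself
  };
  queue.push_back(task);

  for (int flush = 0; flush < 2; ++flush) {
    std::deque<std::function<void()>> batch;
    batch.swap(queue);                 // drain one generation per flush
    for (auto& t : batch) t();
  }

  assert(queue.empty());  // correct repost count: nothing is left over...
  env.reset();            // ...so tearing down the environment is safe
  return 0;
}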

@ulan (Contributor Author) commented Apr 17, 2018

sequential/test-inspector-stop-profile-after-done is failing on ubuntu1710-x64 on all three CI jobs.
https://ci.nodejs.org/job/node-test-commit-linux/18054/nodes=ubuntu1710-x64/consoleFull

Unfortunately it doesn't reproduce on my machine. I will have to find an Ubuntu 17.10 machine.

11:36:59 not ok 2197 sequential/test-inspector-stop-profile-after-done
11:36:59   ---
11:36:59   duration_ms: 1.293
11:36:59   severity: fail
11:36:59   exitcode: 1
11:36:59   stack: |-
11:36:59     [test] Connecting to a child Node process
11:36:59     [test] Testing /json/list
11:36:59     [err] Debugger listening on ws://127.0.0.1:38959/3e238875-261d-468a-ae06-ed825beac8af
11:36:59     [err] 
11:36:59     [err] For help, see: https://nodejs.org/en/docs/inspector
11:36:59     [err] 
11:36:59     [err] Debugger attached.
11:36:59     [err] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [err] Waiting for the debugger to disconnect...
11:36:59     [err] 
11:36:59     { AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
11:36:59     + expected - actual
11:36:59     
11:36:59     - null
11:36:59     + 0
11:36:59         at runTests (/mnt/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1710-x64/test/sequential/test-inspector-stop-profile-after-done.js:28:10)
11:36:59         at process._tickCallback (internal/process/next_tick.js:178:7)
11:36:59       generatedMessage: true,
11:36:59       name: 'AssertionError [ERR_ASSERTION]',
11:36:59       code: 'ERR_ASSERTION',
11:36:59       actual: null,
11:36:59       expected: 0,
11:36:59       operator: 'strictEqual' }
11:36:59     1
11:36:59   ...

@BridgeAR (Member) commented Apr 17, 2018

@ulan that is a known flake. It is independent of this PR and you can safely ignore it :-)

@hashseed added the "author ready" label Apr 17, 2018
@hashseed (Member)

Guess this is ready then?

@ulan (Contributor Author) commented Apr 17, 2018

@BridgeAR, thank you!
@hashseed, yes, it is ready.

@bnoordhuis (Member) left a comment

@apapirovski (Member)

OK, I know that sequential/test-inspector-stop-profile-after-done is a flake, but this PR might have exacerbated it. In the last CI run it was triggered 5 times, and I'm pretty sure it's highly uncommon to hit that one more than once in a CI run. Anyone have any thoughts?

@ulan (Contributor Author) commented Apr 19, 2018

@apapirovski yes, that seems strange. Could it be caused by an "unlucky" base revision? (I don't know if CI rebases the patch onto the tip of the tree for each run.)

@ulan (Contributor Author) commented Apr 19, 2018

After further inspection, I think it might be related, because the test uses the sampling heap profiler, which internally relies on V8's allocation observer. The incremental marker also uses the allocation observer.

There is probably some timing change that is causing the test to fail more frequently. I will investigate.

@ulan (Contributor Author) commented Apr 20, 2018

The sampling heap profiler theory was incorrect; the test is about the CPU profiler.

I found a bug in the initial patch: NodeInspectorClient::runMessageLoopOnPause uses FlushForegroundTasks, so it now has to be called in a loop to preserve the old behaviour. I uploaded a patch that will hopefully fix the failure on the bot.
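
For illustration, a small sketch of why the pause loop now has to iterate, assuming FlushForegroundTasks reports whether any task ran; the counter-based fake below stands in for the real platform call:

#include <iostream>

// Stand-in for the real platform call; assumed to return true iff at least
// one task was executed, and to drain only one "generation" per call.
static int pending_generations = 3;

static bool FlushForegroundTasks() {
  if (pending_generations == 0) return false;
  --pending_generations;  // pretend one generation of tasks just ran
  return true;
}

int main() {
  // Equivalent of the changed inspector line:
  //   while (platform_->FlushForegroundTasks(env_->isolate())) {}
  // Looping on the return value restores the old "drain everything before
  // waiting for the next frontend message" behaviour.
  int flushes = 0;
  while (FlushForegroundTasks()) ++flushes;
  std::cout << "needed " << flushes << " flushes\n";  // prints 3
  return 0;
}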

@addaleax (Member) left a comment

@apapirovski (Member) left a comment

Thank you for following up on this.

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are postponed until
// the next flushing.
// Returns true iff work was dispatched or executed. New tasks that are
Member

Nit: s/iff/if.

Member

@apapirovski I guess these were intentional?

@apapirovski (Member), Apr 20, 2018

Yeah but I know the meaning and I didn't even think of it until you mentioned it... but maybe that just reflects badly on me... 🤔 😆

@apapirovski (Member), Apr 20, 2018

I guess what I'm saying: I don't think this is the place for iff. It doesn't make it clearer.

But I'm not going to object if someone prefers this. It's a low priority thing for me.

Contributor Author

Done. The iff was there before my change. I agree that if seems slightly better.

@@ -133,7 +133,10 @@ class NodePlatform : public MultiIsolatePlatform {
double CurrentClockTimeMillis() override;
v8::TracingController* GetTracingController() override;

void FlushForegroundTasks(v8::Isolate* isolate);
// Returns true iff work was dispatched or executed. New tasks that are
Member

Same here, s/iff/if

Contributor Author

Done.

@ulan (Contributor Author) commented Apr 20, 2018

There are two Windows failures in the recent CI job:
1)

not ok 246 parallel/test-tls-server-verify
  ---
  duration_ms: 60.164
  severity: fail
  stack: |-
    timeout
  ...
FATAL: command execution failed
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:18 

To me, both failures look unrelated to the PR.

@BridgeAR (Member)

CI https://ci.nodejs.org/job/node-test-pull-request/14440/

@ulan there are often some flakes.

@apapirovski (Member)

Landed in d3edf2f

Thank you for the contribution @ulan and congrats on becoming a Contributor! 🎉

apapirovski pushed a commit that referenced this pull request Apr 25, 2018
Foreground tasks that repost themselves can force the draining loop
to run indefinitely long without giving other tasks chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the tasks queue at the beginning of the loop.

PR-URL: #19987
Fixes: #19937
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Khaidi Chu <i@2333.moe>
@ulan (Contributor Author) commented Apr 25, 2018

@apapirovski thanks for landing!

@@ -316,7 +316,7 @@ class NodeInspectorClient : public V8InspectorClient {
     terminated_ = false;
     running_nested_loop_ = true;
     while (!terminated_ && channel_->waitForFrontendMessage()) {
-      platform_->FlushForegroundTasks(env_->isolate());
+      while (platform_->FlushForegroundTasks(env_->isolate())) {}

Doesn't this line preserve the bug? Even though you swap the queue before draining it, you're still running this loop forever if new tasks are constantly added to the original queue.

Contributor Author

The inspector posts foreground tasks and requires that they are all processed before going to the outer loop.

AFAIK this code runs only when the inspector is active and the program is paused. The normal libuv tasks are not processed here. Latency shouldn't be an issue.

My understanding is that the original bug this is trying to fix (#19937) is that there are cases where foreground tasks can add additional tasks to the queue. The bug was fixed by freezing the queue inside FlushForegroundTasks, but the line of code I'm commenting on appears to loop THAT CALL, so the freeze fix doesn't actually help anything. It will still run forever if foreground tasks keep adding themselves.

Unless there's more than one place where FlushForegroundTasks is being called, and that's not an issue for this line?

Member

There is certainly more than one place. This particular line addresses a very specific interaction.

MylesBorins pushed a commit that referenced this pull request May 4, 2018
@MylesBorins mentioned this pull request May 8, 2018
blattersturm pushed a commit to citizenfx/node that referenced this pull request Nov 3, 2018
@addaleax added the "v8 platform" label Feb 18, 2020
Labels
  • author ready: PRs that have at least one approval, no pending requests for changes, and a CI started.
  • build: Issues and PRs related to build files or the CI.
  • c++: Issues and PRs that require attention from people who are familiar with C++.
  • v8 platform: Issues and PRs related to Node's v8::Platform implementation.
Development

Successfully merging this pull request may close these issues:
  • Limit foreground task draining loop in NodePlatform