
src: limit foreground tasks draining loop #19987

Closed · wants to merge 5 commits

Conversation

@ulan (Contributor) commented Apr 12, 2018

Foreground tasks that repost themselves can force the draining loop
to run indefinitely without giving other tasks a chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the task queue at the beginning of the loop.

Fixes: #19937
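
For illustration, here is a minimal sketch of the queue-swapping idea behind the fix; ForegroundQueue, Post, and FlushOnce are made-up names for this sketch, not the actual node_platform.cc API:

#include <deque>
#include <functional>
#include <iostream>
#include <utility>

// Illustrative queue: a flush runs only the tasks that were already queued
// when it started, so a task that reposts itself cannot keep a single flush
// running forever.
class ForegroundQueue {
 public:
  using Task = std::function<void()>;

  void Post(Task task) { tasks_.push_back(std::move(task)); }

  // Returns true if at least one task ran. Tasks posted while draining are
  // left in the queue for the next flush instead of running immediately.
  bool FlushOnce() {
    std::deque<Task> batch;
    batch.swap(tasks_);        // take a snapshot of the queue's current contents
    for (Task& task : batch)
      task();                  // reposted tasks land in tasks_, not in batch
    return !batch.empty();
  }

  bool HasPending() const { return !tasks_.empty(); }

 private:
  std::deque<Task> tasks_;
};

int main() {
  ForegroundQueue queue;
  std::function<void()> reposting_task = [&queue, &reposting_task]() {
    queue.Post(reposting_task);  // this task always reposts itself
  };
  queue.Post(reposting_task);

  queue.FlushOnce();
  // Only the original task ran; its repost waits for the next flush, so
  // other work gets a chance to run in between.
  std::cout << std::boolalpha
            << "task pending for next flush: " << queue.HasPending() << "\n";
  return 0;
}

A caller that really does want to drain everything can still loop on the return value, which is how the inspector's pause loop ends up being handled later in this PR.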

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot added the "build" and "c++" labels Apr 12, 2018
@@ -66,6 +67,7 @@ class PerIsolatePlatformData :
int unref();

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are not run.
Member

Tiny nit: Maybe be more specific here and say that they are not run as part of this iteration, rather than implying that they aren’t run at all?

Contributor Author

Thanks. Done.

@jasnell (Member) commented Apr 12, 2018

hmmm... definitely +1 on this but should this be semver-major?

@jasnell (Member) commented Apr 14, 2018

There are some CI failures; it's not clear whether they are related.

@BridgeAR added the "author ready" label Apr 15, 2018
@BridgeAR (Member)

Not sure but there might be issues with Windows.

@BridgeAR removed the "author ready" label Apr 16, 2018
@BridgeAR (Member)

Windows failures seem related.

@ulan (Contributor Author) commented Apr 16, 2018

Thanks. I'll try to reproduce the Windows failure and debug it.

@ulan (Contributor Author) commented Apr 16, 2018

The Windows failure should be fixed now. The problem was caused by an incorrect repost count in the new test, which resulted in one leftover task that tried to access the Environment after its destruction.
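
For illustration, here is a hypothetical reconstruction of that failure mode (not the actual cctest added in this PR; FakeEnvironment and the hand-rolled queue below are made up): a self-reposting task whose repost count does not match the number of flushes performed by the test survives past teardown and touches freed state.

#include <cassert>
#include <deque>
#include <functional>
#include <memory>

struct FakeEnvironment { int value = 42; };

int main() {
  std::deque<std::function<void()>> queue;
  auto env = std::make_unique<FakeEnvironment>();

  int remaining_runs = 2;  // the loop below flushes twice; a value of 3 here
                           // would leave one task behind after the last flush
  std::function<void()> task = [&]() {
    assert(env != nullptr);  // would blow up if run after env is destroyed
    if (--remaining_runs > 0) queue.push_back(task);  // repost itself
  };
  queue.push_back(task);

  for (int flush = 0; flush < 2; ++flush) {
    std::deque<std::function<void()>> batch;
    batch.swap(queue);                 // drain one generation per flush
    for (auto& t : batch) t();
  }

  assert(queue.empty());  // correct repost count: nothing is left over...
  env.reset();            // ...so tearing down the environment is safe
  return 0;
}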

@ulan (Contributor Author) commented Apr 17, 2018

sequential/test-inspector-stop-profile-after-done is failing on ubuntu1710-x64 on all three CI jobs.
https://ci.nodejs.org/job/node-test-commit-linux/18054/nodes=ubuntu1710-x64/consoleFull

Unfortunately it doesn't reproduce on my machine. I will have to find an Ubuntu 17.10 machine.

11:36:59 not ok 2197 sequential/test-inspector-stop-profile-after-done
11:36:59   ---
11:36:59   duration_ms: 1.293
11:36:59   severity: fail
11:36:59   exitcode: 1
11:36:59   stack: |-
11:36:59     [test] Connecting to a child Node process
11:36:59     [test] Testing /json/list
11:36:59     [err] Debugger listening on ws://127.0.0.1:38959/3e238875-261d-468a-ae06-ed825beac8af
11:36:59     [err] 
11:36:59     [err] For help, see: https://nodejs.org/en/docs/inspector
11:36:59     [err] 
11:36:59     [err] Debugger attached.
11:36:59     [err] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [out] {}
11:36:59     [out] 
11:36:59     [err] Waiting for the debugger to disconnect...
11:36:59     [err] 
11:36:59     { AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
11:36:59     + expected - actual
11:36:59     
11:36:59     - null
11:36:59     + 0
11:36:59         at runTests (/mnt/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1710-x64/test/sequential/test-inspector-stop-profile-after-done.js:28:10)
11:36:59         at process._tickCallback (internal/process/next_tick.js:178:7)
11:36:59       generatedMessage: true,
11:36:59       name: 'AssertionError [ERR_ASSERTION]',
11:36:59       code: 'ERR_ASSERTION',
11:36:59       actual: null,
11:36:59       expected: 0,
11:36:59       operator: 'strictEqual' }
11:36:59     1
11:36:59   ...

@BridgeAR (Member) commented Apr 17, 2018

@ulan that is a known flake. It is independent of this PR and you can safely ignore it :-)

@hashseed added the "author ready" label Apr 17, 2018
@hashseed (Member)

Guess this is ready then?

@ulan (Contributor Author) commented Apr 17, 2018

@BridgeAR, thank you!
@hashseed, yes, it is ready.

@bnoordhuis (Member) left a comment

@apapirovski (Member)

OK, I know that sequential/test-inspector-stop-profile-after-done is a flake, but this PR might have exacerbated it. In the last CI run it was triggered 5 times, and I'm pretty sure it's highly uncommon to hit that one more than once in a CI run. Anyone have any thoughts?

@ulan (Contributor Author) commented Apr 19, 2018

@apapirovski yes, that seems strange. Could it be caused by an "unlucky" base revision? (I don't know if CI rebases the patch onto the tip of the tree for each run.)

@ulan (Contributor Author) commented Apr 19, 2018

After further inspection, I think it might be related, because the test uses the sampling heap profiler, which internally relies on V8's allocation observer. The incremental marker also uses the allocation observer.

There is probably some timing change that is causing the test to fail more frequently. I will investigate.

@ulan (Contributor Author) commented Apr 20, 2018

The sampling heap profiler theory was incorrect; the test is about the CPU profiler.

I found a bug in the initial patch: NodeInspectorClient::runMessageLoopOnPause uses FlushForegroundTasks, so it now has to be called in a loop to preserve the old behaviour. I uploaded a patch that will hopefully fix the failure on the bot.
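
For illustration, a small sketch of why the pause loop now has to iterate, assuming FlushForegroundTasks reports whether any task ran; the counter-based fake below stands in for the real platform call:

#include <iostream>

// Stand-in for the real platform call; assumed to return true iff at least
// one task was executed, and to drain only one "generation" per call.
static int pending_generations = 3;

static bool FlushForegroundTasks() {
  if (pending_generations == 0) return false;
  --pending_generations;  // pretend one generation of tasks just ran
  return true;
}

int main() {
  // Equivalent of the changed inspector line:
  //   while (platform_->FlushForegroundTasks(env_->isolate())) {}
  // Looping on the return value restores the old "drain everything before
  // waiting for the next frontend message" behaviour.
  int flushes = 0;
  while (FlushForegroundTasks()) ++flushes;
  std::cout << "needed " << flushes << " flushes\n";  // prints 3
  return 0;
}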

@addaleax (Member) left a comment

@apapirovski (Member) left a comment

Thank you for following up on this.

// Returns true iff work was dispatched or executed.
// New tasks that are posted during flushing of the queue are postponed until
// the next flushing.
// Returns true iff work was dispatched or executed. New tasks that are
Member

Nit: s/iff/if.

Member

@apapirovski I guess these were intentional?

@apapirovski (Member), Apr 20, 2018

Yeah but I know the meaning and I didn't even think of it until you mentioned it... but maybe that just reflects badly on me... 🤔 😆

@apapirovski (Member), Apr 20, 2018

I guess what I'm saying: I don't think this is the place for iff. It doesn't make it clearer.

But I'm not going to object if someone prefers this. It's a low priority thing for me.

Contributor Author

Done. The iff was there before my change. I agree that if seems slightly better.

@@ -133,7 +133,10 @@ class NodePlatform : public MultiIsolatePlatform {
double CurrentClockTimeMillis() override;
v8::TracingController* GetTracingController() override;

void FlushForegroundTasks(v8::Isolate* isolate);
// Returns true iff work was dispatched or executed. New tasks that are
Member

Same here, s/iff/if

Contributor Author

Done.

@ulan (Contributor Author) commented Apr 20, 2018

There are two Windows failures in the recent CI job:
1)

not ok 246 parallel/test-tls-server-verify
  ---
  duration_ms: 60.164
  severity: fail
  stack: |-
    timeout
  ...
FATAL: command execution failed
java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:18 

To me, both failures look unrelated to the PR.

@BridgeAR (Member)

CI https://ci.nodejs.org/job/node-test-pull-request/14440/

@ulan there are often some flakes.

@apapirovski (Member)

Landed in d3edf2f

Thank you for the contribution @ulan and congrats on becoming a Contributor! 🎉

apapirovski pushed a commit that referenced this pull request Apr 25, 2018
Foreground tasks that repost themselves can force the draining loop
to run indefinitely long without giving other tasks chance to run.

This limits the foreground task draining loop to run only the tasks
that were in the tasks queue at the beginning of the loop.

PR-URL: #19987
Fixes: #19937
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Yang Guo <yangguo@chromium.org>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Khaidi Chu <i@2333.moe>
@ulan (Contributor Author) commented Apr 25, 2018

@apapirovski thanks for landing!

@@ -316,7 +316,7 @@ class NodeInspectorClient : public V8InspectorClient {
     terminated_ = false;
     running_nested_loop_ = true;
     while (!terminated_ && channel_->waitForFrontendMessage()) {
-      platform_->FlushForegroundTasks(env_->isolate());
+      while (platform_->FlushForegroundTasks(env_->isolate())) {}

Doesn't this line preserve the bug? Even though you swap the queue before draining it, you're still running this loop forever if new tasks are constantly added to the original queue.

Contributor Author

The inspector posts foreground tasks and requires that they are all processed before going to the outer loop.

AFAIK this code runs only when the inspector is active and the program is paused. The normal libuv tasks are not processed here. Latency shouldn't be an issue.

My understanding is that the original bug this is trying to fix (#19937) is that there are cases where foreground tasks can add additional tasks to the queue. The bug was fixed by freezing the queue inside FlushForegroundTasks, but the line of code I'm commenting on appears to loop THAT CALL, so the freeze fix doesn't actually help anything. It will still run forever if foreground tasks keep adding themselves.

Unless there's more than one place where FlushForegroundTasks is being called, and that's not an issue for this line?

Member

There is certainly more than one place. This particular line addresses a very specific interaction.

MylesBorins pushed a commit that referenced this pull request May 4, 2018
@MylesBorins mentioned this pull request May 8, 2018
blattersturm pushed a commit to citizenfx/node that referenced this pull request Nov 3, 2018
@addaleax added the "v8 platform" label Feb 18, 2020
Labels
  • author ready: PRs that have at least one approval, no pending requests for changes, and a CI started.
  • build: Issues and PRs related to build files or the CI.
  • c++: Issues and PRs that require attention from people who are familiar with C++.
  • v8 platform: Issues and PRs related to Node's v8::Platform implementation.
Development

Successfully merging this pull request may close these issues:
  • Limit foreground task draining loop in NodePlatform