Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: schedule unref immediate for uv_close in destructors #18307

Closed

Conversation

apapirovski
Copy link
Member

Calling uv_close directly in destructors seems to be problematic as it can bring a loop back alive after EmitBeforeExit. The problem is that this only happens sometimes as it depends on GC timing, making the behaviour of these two classes unpredictable.

This PR should fix that behaviour and also (I think?) fix the flaky sequential/test-async-wrap-getasyncid.js test.

Fixes: #18190

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src

@apapirovski apapirovski added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. cares Issues and PRs related to the c-ares dependency or the cares_wrap binding. labels Jan 23, 2018
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. cares Issues and PRs related to the c-ares dependency or the cares_wrap binding. fs Issues and PRs related to the fs subsystem / file system. labels Jan 23, 2018
@apapirovski
Copy link
Member Author

apapirovski commented Jan 23, 2018

@apapirovski
Copy link
Member Author

CI & stress test are both clean.

@jasnell jasnell requested a review from addaleax January 23, 2018 19:43
@addaleax
Copy link
Member

Can we also make this deterministic by adding explicit gc to the test?

That feels less complex and I'd like to be careful about delaying resource cleanup in an unrefed resource; I think this would e.g. lead to a memory leak in embedders that just wait for the loop to finish?

@apapirovski
Copy link
Member Author

apapirovski commented Feb 1, 2018

@addaleax Sorry, missed your comment. The problem is that these uv_close callbacks might or might not run in their code. The fact that they're triggered by GC means that they might get triggered after the loop has already shutdown which means they won't execute. It seems tricky either way.

(For example, we also don't call uv_close on a bunch of internal handles like the ones for Immediates because by the time we know to do that it's already too late.)

To be clear, I agree this is far from ideal but I also don't like papering over the issue by calling gc explicitly in the test. I'll think on this a bit longer.

@addaleax
Copy link
Member

addaleax commented Feb 1, 2018

@apapirovski An embedder would run the loop until its empty if it wants proper cleanup … which is not really exposed by Node right now, that’s true.

I know on some level this doesn’t “count”, but it’s what my Workers implementation (in Ayo) does, and I’d like to PR that some time in the next 1 or 2 months or so; for that I would basically have to re-do this PR with that approach. (Which is fine! I can’t expect you to do work just because my PR might need it. 😄)

@apapirovski
Copy link
Member Author

apapirovski commented Feb 1, 2018

@addaleax Just so I'm not misunderstanding, wouldn't the cleanup code run in that case since the unrefed Immediate would run? (Just asking so I have full clarity on this, not trying to poke holes or something. I might be misunderstanding what we're discussing.)

But either way, I'll keep thinking on this. I wasn't super happy with this solution but I'm also not super happy with the current approach.

@addaleax
Copy link
Member

addaleax commented Feb 1, 2018

@apapirovski I guess you are right… my line of thinking was that there could be a situation where only the unrefed immediate was would be active, so the loop would stop. But if we’re spinning it anyway afterwards until everything is closed that doesn’t really matter…

Maybe the real question here is what the API contract for beforeExit is? I.e., whether the thing that resurrects the loop always needs to be visible to userland?

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if you want to land this

I think I’d still prefer the other approach, but I’ll keep thinking about it as well

@apapirovski apapirovski added the blocked PRs that are blocked by other issues or PRs. label Feb 1, 2018
@apapirovski
Copy link
Member Author

apapirovski commented Feb 1, 2018

Maybe the real question here is what the API contract for beforeExit is? I.e., whether the thing that resurrects the loop always needs to be visible to userland?

I don't think it needs to be strictly "visible" but we should minimize arbitrary awakening via stuff like GC (since it's so heavily tied to timing and can also differ from platform to platform). Even just for the sake of avoiding false positives or negatives in our own tests. (For example, something like this can lead to a test involving an unintentionally unref'd handle succeeding because the GC keeps the loop alive for one more run.)

Anyway, I'm marking as blocked for now. I appreciate the approval but I do think this needs to be thought through a bit more on my part.

@@ -74,7 +69,12 @@ StatWatcher::StatWatcher(Environment* env, Local<Object> wrap)

StatWatcher::~StatWatcher() {
Stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it... calling Stop() here is probably a minor (if harmless) bug as it in turn calls MakeWeak() when the destructor is almost certainly invoked from a weak callback.

You didn't introduce it so it's fine to leave it be for now but I figured I'd point it out.

@Fishrock123
Copy link
Contributor

I'm curious - was this caused by the uv_run changes in 9e08695?

@apapirovski
Copy link
Member Author

@Fishrock123 It's likely but we would've had other issues due to the implementation before. That change was before my time but on the whole it seems positive after reviewing it in detail.

Ultimately, we just need to be more mindful of GC-triggered operations. But still not sure of a good solution. 🤔

@addaleax
Copy link
Member

Ultimately, we just need to be more mindful of GC-triggered operations. But still not sure of a good solution. 🤔

By the way, I think for some things we could also use a GCEpilogueCallback instead of an event-loop-based deferral … for cleanup from destructors that might be okay?

I think we’d even be allowed to call into JS in such a callback, but I would really prefer not to; having async_hooks callbacks fire at what is essentially a completely arbitrary point in your program doesn’t seem like a good idea…

@BridgeAR
Copy link
Member

BridgeAR commented Mar 2, 2018

What shall we do here to progress further?

@BridgeAR
Copy link
Member

Ping @apapirovski

@apapirovski
Copy link
Member Author

@BridgeAR Your guess is as good as mine. I don't love this solution. The issue is an extreme edge case and I haven't seen it on our CI lately. I would prefer to keep it open, if nothing else then as a reminder to myself to continue investigating.

@BridgeAR
Copy link
Member

Investigating further would be great :)

@BridgeAR
Copy link
Member

I am going to close this for now. I guess it would be best to open an issue if a reminder is still necessary.

@apapirovski please feel free to reopen in case you want to work on it again.

@BridgeAR BridgeAR closed this May 15, 2018
apapirovski added a commit that referenced this pull request Jun 10, 2018
Instead of relying on garbage collection to close the timer handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
apapirovski added a commit that referenced this pull request Jun 10, 2018
Instead of relying on garbage collection to close the handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
apapirovski added a commit that referenced this pull request Jun 10, 2018
PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 10, 2018
Instead of relying on garbage collection to close the timer handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 10, 2018
Instead of relying on garbage collection to close the handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 10, 2018
PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 13, 2018
Instead of relying on garbage collection to close the timer handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 13, 2018
Instead of relying on garbage collection to close the handle,
manage its state more explicitly.

PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
targos pushed a commit that referenced this pull request Jun 13, 2018
PR-URL: #21093
Fixes: #18190
Refs: #18307
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked PRs that are blocked by other issues or PRs. c++ Issues and PRs that require attention from people who are familiar with C++. cares Issues and PRs related to the c-ares dependency or the cares_wrap binding. fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky sequential/test-async-wrap-getasyncid
6 participants