test-child-process-fork-dgram Intermittent timeout on AIX #8271

Closed
mhdawson opened this issue Aug 25, 2016 · 8 comments
Labels
child_process Issues and PRs related to the child_process subsystem. dgram Issues and PRs related to the dgram subsystem / UDP. test Issues and PRs related to the tests.

Comments

@mhdawson
Member

  • Version: master
  • Platform: AIX
  • Subsystem: process

I noticed this failure in a recent AIX run while investigating an intermittent addon issue:

https://ci.nodejs.org/job/node-test-commit-aix/472/

test-child-process-fork-dgram failed with a timeout

not ok 55 parallel/test-child-process-fork-dgram
# TIMEOUT

Likely a pre-existing intermittent failure as nothing in the PR looks related to the failure.

@mhdawson changed the title from "test-child-process-fork-dgram on AIX - intermittent?" to "test-child-process-fork-dgram Intermittent timeout on AIX" on Aug 25, 2016
@mhdawson
Member Author

Will look to mark this as flaky for AIX ASAP

@mhdawson
Member Author

From the test description, it sounds like a good candidate for being flaky:

Because it's not really possible to predict how the messages will be
distributed among the parent and the child processes, we keep sending
messages until both the parent and the child received at least one
message. The worst case scenario is when either one never receives
a message. In this case the test runner will timeout after 60 secs
and the test will fail.
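
The loop described above can be modelled without any sockets. The following is only a minimal sketch, not the actual test: `sendUntilBothReceive`, `deliver`, and `maxMessages` are hypothetical names introduced here for illustration. `deliver` stands in for the operating system deciding which process receives each datagram, and `maxMessages` stands in for the test runner's timeout budget.

```javascript
'use strict';

// Simplified, socket-free model of the retry loop in the test description.
// `deliver` (hypothetical) returns 'parent' or 'child' for each message sent.
// `maxMessages` (hypothetical) stands in for the test runner's timeout.
function sendUntilBothReceive(deliver, maxMessages) {
  let parentGot = false;
  let childGot = false;
  let sent = 0;

  // Keep sending until both sides have received at least one message,
  // or until the budget is exhausted.
  while (sent < maxMessages && !(parentGot && childGot)) {
    sent++;
    const receiver = deliver(sent);
    if (receiver === 'parent') parentGot = true;
    else if (receiver === 'child') childGot = true;
  }

  return { sent, parentGot, childGot, timedOut: !(parentGot && childGot) };
}

// Messages alternate between the two processes: the loop ends after two sends.
const balanced = sendUntilBothReceive((n) => (n % 2 ? 'parent' : 'child'), 60000);
console.log(balanced);

// Worst case: one side never receives anything, so the whole budget is spent.
const starved = sendUntilBothReceive(() => 'child', 60000);
console.log(starved);
```

In this model, the loop only terminates early if the distribution eventually favours both sides. When one side is starved, nothing in the loop itself can recover, which is consistent with the test description's worst case.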

@mhdawson
Member Author

PR to mark as flaky here: #8274

@mscdex mscdex added child_process Issues and PRs related to the child_process subsystem. dgram Issues and PRs related to the dgram subsystem / UDP. test Issues and PRs related to the tests. labels Aug 25, 2016
@gibfahn
Member

gibfahn commented Sep 9, 2016

In passing test runs on both AIX and Linux, usually the vast majority of the messages go to the parent, so the test continues until the first message is sent to the child.

Every time the test fails on the community machines, all the messages (1 per ms for 60s, or 60,000 messages) go to the child, which almost certainly means that the parent isn't able to receive messages.

I have been unable to get this to fail on non-community AIX machines (in about 40,000 runs).

@gibfahn
Member

gibfahn commented Sep 14, 2016

This change seems to fix the problem, though I'm not sure why it would. I'll investigate further.

diff --git a/test/parallel/test-child-process-fork-dgram.js b/test/parallel/test-child-process-fork-dgram.js
index 5a00dca..eaca6e9 100644
--- a/test/parallel/test-child-process-fork-dgram.js
+++ b/test/parallel/test-child-process-fork-dgram.js
@@ -32,10 +32,10 @@ if (process.argv[2] === 'child') {

       server.on('message', function() {
         process.send('gotMessage');
+        server.close();
       });

     } else if (msg === 'stop') {
-      server.close();
       process.removeListener('message', removeMe);
     }
   });

@mhdawson
Member Author

I investigated this because it was leaving lingering processes, and I did not want to have to clean them up regularly.

There were 2 problems:

  1. test left lingering processes on failures
  2. no guarantee that the test would pass.

PR to resolve here: #8697

500 runs on AIX without a failure.

@AndreasMadsen, as the original author, can you take a look at the PR?

@AndreasMadsen
Member

please see #8549


I wrote it back in 2012, and haven't worked in that field since. I really don't feel qualified to review it.

@mhdawson
Member Author

mhdawson commented Sep 21, 2016

I missed #8549 (I must really be blind), so there is some discussion there as well.

Trott added a commit to Trott/io.js that referenced this issue Oct 14, 2016
`test-child-process-fork-dgram` is unreliable on some platforms,
especially FreeBSD and AIX within the project's continuous integration
testing. It has also been observed to be flaky on macos.

* Confirm child has received the server before sending packets
* Close the server instance on the parent or child after receiving a

Refs: nodejs#8697
Fixes: nodejs#8949
Fixes: nodejs#8271
@Trott Trott closed this as completed in 03afecd Oct 18, 2016
jasnell pushed a commit that referenced this issue Oct 18, 2016
`test-child-process-fork-dgram` is unreliable on some platforms,
especially FreeBSD and AIX within the project's continuous integration
testing. It has also been observed to be flaky on macos.

* Confirm child has received the server before sending packets
* Close the server instance on the parent or child after receiving a

Refs: #8697
Fixes: #8949
Fixes: #8271
PR-URL: #9098
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Trott added a commit to Trott/io.js that referenced this issue Nov 12, 2016
`test-child-process-fork-dgram` is unreliable on some platforms,
especially FreeBSD and AIX within the project's continuous integration
testing. It has also been observed to be flaky on macos.

* Confirm child has received the server before sending packets
* Close the server instance on the parent or child after receiving a

Refs: nodejs#8697
Fixes: nodejs#8949
Fixes: nodejs#8271
PR-URL: nodejs#9098
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Nov 14, 2016
`test-child-process-fork-dgram` is unreliable on some platforms,
especially FreeBSD and AIX within the project's continuous integration
testing. It has also been observed to be flaky on macos.

* Confirm child has received the server before sending packets
* Close the server instance on the parent or child after receiving a

Refs: #8697
Fixes: #8949
Fixes: #8271
PR-URL: #9098
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
MylesBorins pushed a commit that referenced this issue Nov 15, 2016
`test-child-process-fork-dgram` is unreliable on some platforms,
especially FreeBSD and AIX within the project's continuous integration
testing. It has also been observed to be flaky on macos.

* Confirm child has received the server before sending packets
* Close the server instance on the parent or child after receiving a

Refs: #8697
Fixes: #8949
Fixes: #8271
PR-URL: #9098
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>