Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawn a new worker when an old one is stopped #62

Merged
merged 6 commits into from
Jun 13, 2023

Conversation

yurynix
Copy link
Contributor

@yurynix yurynix commented Jun 12, 2023

Hi 👋
Thank you for your work in this great library 🙏

In my scenario, I run a task in a process that contaminates global process space, I would like each task to have a separate process (workerEndurance = 1).

When worker is exhausted at the moment, no other worker replaces it if the queue is empty, that makes the worker pool to go to zero and then start spawning workers only when new tasks arrive, that's not so ideal IMO.

This PR changes the behavior in such a way that for each exited worker, if we can accommodate it, we spawn a new one immediately.

Have a great day!

await wait(200);

// then
t.is(workerNodes.workersQueue.storage.length, 0);
t.is(workerNodes.workersQueue.storage.filter(worker => worker.id === executingWorkerId).length, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion wasn't entirely accurate IMO, since

should kill the worker that was involved in processing the task

Previous assertion checks that pool is empty, while the correct assertion is that the worker in question was replaced.

@bgalek
Copy link
Member

bgalek commented Jun 12, 2023

hi @yurynix, LGTM! will be happy to release next version - but please check out failing tests ;)

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

Thank you for taking the time to review this @bgalek 🙏

The tests were passing on my machine, so I think the problem is on CI it takes more time to spawn the workers back,
so the wait(200) in the test is not the best approach for that.

I've added a utility function eventually() that will wait for a predicate, hope that's ok by you.

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

It seems there's an issue indeed with node@11 (can reproduce locally), the tests take much longer to execute than on newer node versions.
I'm trying to understand why...

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

On my machine (arm64 macOS 13.4 (22F66)) it reproduces:

➜  node-worker-nodes git:(spawn-new-worker-when-old-one-dies) ✗ node -v       
v12.22.12
➜  node-worker-nodes git:(spawn-new-worker-when-old-one-dies) ✗ time npx ava  

start profiling server worker #61861 for 200 ms
write profiler result to file - ./Profile-61861-1686653265838.cpuprofile
  58 tests passed
  2 tests skipped
npx ava  7.06s user 1.27s system 137% cpu 6.085 total

And on node@11:

➜  node-worker-nodes git:(spawn-new-worker-when-old-one-dies) ✗ nvm use 11
Now using node v11.15.0 (npm v6.7.0)
➜  node-worker-nodes git:(spawn-new-worker-when-old-one-dies) ✗ time npx ava

start profiling server worker #62136 for 200 ms
write profiler result to file - ./Profile-62136-1686653330009.cpuprofile
  58 tests passed
  2 tests skipped
npx ava  197.67s user 47.56s system 376% cpu 1:05.13 total

@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

@yurynix I've updated supported versions to incldue most popular ones, rebase your branch with master ;)

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

@bgalek Sure thing 🙏 , nevertheless I want to understand what happened with node@11, there might be an actual issue it exposed.

@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

@yurynix totally agree - we should get to the bottom of this to know what was the cause ;)

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

Ok, so it seems that on all versions before node v12.6.3, the code change proposed in this PR will fail.

I'm unsure which one of those actually fixed the issue:

  • [7e5e34d01e] - src: simplify node_worker.cc using new KVStore API (Denys Otrishko) #31773
  • [0adc867d59] - test: add Worker initialization failure test case (Harshitha KP) #31929
  • [32ab30cc35] - test: use common.mustCall in test-worker-esm-exit (himself65) #32544
  • [aeea7d9c1f] - worker: do not emit 'exit' events during process.exit() (Anna Henningsen) #32546
  • [28cb7e78ff] - worker: improve MessagePort performance (Anna Henningsen) #31605

This is happens because worker_thread would emit exit for some unknown to me reason before doing any initialzation,
This will cause us to be stuck in a loop of creating a failed worker that will exit before initialzation and so on.

There are 2 solutions that can work:

Solution1:

diff --git a/lib/pool.js b/lib/pool.js
index c75bbf5..575add6 100644
--- a/lib/pool.js
+++ b/lib/pool.js
@@ -120,7 +120,11 @@ class WorkerNodes extends EventEmitter {
 
         this.workersQueue.remove(worker);
 
-        if (this.canStartWorker()) this.startWorker();
+        if (!worker.isFailedToInit()) {
+            if (this.canStartWorker()) this.startWorker();
+        } else {
+            console.warn(`Worker exited before finished initialization!`);
+        }
 
         this.processQueue();
     }
diff --git a/lib/worker.js b/lib/worker.js
index 9512892..c0be0a3 100644
--- a/lib/worker.js
+++ b/lib/worker.js
@@ -20,6 +20,7 @@ class Worker extends EventEmitter {
         this.endurance = endurance;
         this.isTerminating = false;
         this.isProcessAlive = false;
+        this.failedToInit = false;
 
         const process = this.process = new WorkerProcess(srcFilePath, { stopTimeout, asyncWorkerInitialization, resourceLimits });
 
@@ -35,6 +36,7 @@ class Worker extends EventEmitter {
 
         process.once('exit', code => {
             this.exitCode = code;
+            this.failedToInit = !this.isProcessAlive;
             this.isProcessAlive = false;
             this.emit('exit', code);
         });
@@ -91,6 +93,14 @@ class Worker extends EventEmitter {
         return this.activeCalls > 0;
     }
 
+    /**
+     * Worker process exited before initialization
+     * @returns {boolean}
+     */
+    isFailedToInit() {
+        return this.failedToInit;
+    }
+
     /**
      *
      * @returns {boolean}

Solution2 (which i commited):

index c75bbf5..5d14c5e 100644
--- a/lib/pool.js
+++ b/lib/pool.js
@@ -120,7 +120,9 @@ class WorkerNodes extends EventEmitter {
 
         this.workersQueue.remove(worker);
 
-        if (this.canStartWorker()) this.startWorker();
+        setImmediate(() => {
+            if (this.canStartWorker()) this.startWorker();
+        });
 
         this.processQueue();
     }

@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

nice! setImmediate() seems less disrupting

@yurynix yurynix force-pushed the spawn-new-worker-when-old-one-dies branch from 3a0a6fd to 9cc42ed Compare June 13, 2023 12:27
@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

@yurynix it's still not working, I'm ok with dropping node12 support, LTS in on 18 right now

@yurynix
Copy link
Contributor Author

yurynix commented Jun 13, 2023

@bgalek it's now not working because it seems eslint-plugin-ava/ does not support node 12, the error is:

/home/runner/work/node-worker-nodes/node-worker-nodes/node_modules/eslint-plugin-ava/rules/assertion-arguments.js:214
	const options = context.options[0] ?? {};

It started to break in this commit IMO...

I'm fine with dropping node 12, do you want me to change it in this PR?

@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

Yes please, you can już remove v12 from matrix :)

@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

@yurynix thank you for your contribution!

@bgalek bgalek merged commit caa47b5 into allegro:master Jun 13, 2023
@bgalek
Copy link
Member

bgalek commented Jun 13, 2023

@yurynix v2.5.0 released!

@yurynix
Copy link
Contributor Author

yurynix commented Jun 14, 2023

Thank you for the help on this one @bgalek 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants