feat: Custom Balancer #590
base: next
Conversation
Still need to figure out a set of things, but it should be in good shape to bring up the idea of what the balancer can look like. Let me know what you think, and I'll adjust over the week 👍
* [Symbol.iterator] () {
  yield * this.pendingItems;
  yield * this.readyItems;
}

get size () {
  return this.pendingItems.size + this.readyItems.size;
}

maybeAvailable (item : T) {
  /* istanbul ignore else */
  if (item.currentUsage() < this.maximumUsage) {
    for (const listener of this.onAvailableListeners) {
      listener(item);
    }
  }
}
What is the purpose of:
- iterator
- maybeAvailable
maybeAvailable
-- It is meant to hint that a worker is available, either because it has been bootstrapped recently or because it has finished executing a task
iterator
-- to let the pool iterate over the workers, e.g. to process pending messages (on teardown)
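A minimal sketch of how the availability hint described above could be consumed; the `WorkerList` and `onAvailable` names here are illustrative, not the actual Piscina internals:

```typescript
// Sketch of the availability-hint mechanism (assumed names, not Piscina API).
interface WorkerLike {
  currentUsage(): number;
}

class WorkerList<T extends WorkerLike> {
  private onAvailableListeners: Array<(item: T) => void> = [];

  constructor(private readonly maximumUsage: number) {}

  // The pool subscribes here to be told when a worker can accept work.
  onAvailable(fn: (item: T) => void): void {
    this.onAvailableListeners.push(fn);
  }

  // Hint that `item` may accept more work, e.g. after bootstrap or task end.
  maybeAvailable(item: T): void {
    if (item.currentUsage() < this.maximumUsage) {
      for (const listener of this.onAvailableListeners) {
        listener(item);
      }
    }
  }
}
```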
};
export type PiscinaLoadBalancer = (
  task: PiscinaTask,
  workers: PiscinaWorker[]
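For illustration, a round-robin balancer against the proposed signature could look like the sketch below. The shapes of `PiscinaTask` and `PiscinaWorker` here are assumptions drawn from this thread, not the final API:

```typescript
// Assumed shapes, for the sketch only.
type PiscinaTask = { taskId: number };
type PiscinaWorker = { id: number; currentUsage: number };
type PiscinaLoadBalancer = (
  task: PiscinaTask,
  workers: PiscinaWorker[]
) => PiscinaWorker | null;

// Simplest possible custom balancer: cycle through the workers in order.
function createRoundRobinBalancer(): PiscinaLoadBalancer {
  let next = 0;
  return (_task, workers) => {
    if (workers.length === 0) return null; // let the pool enqueue or spawn
    const worker = workers[next % workers.length];
    next += 1;
    return worker;
  };
}
```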
Are the PiscinaWorker[] exposing their usage statistics? Without them, the load balancer code is not able to take any decision beyond simple algorithms like round robin. And since the usage statistics are currently dispatched across different thread pool private properties, adding them to the API without refactoring will end with a quite ugly LB API.
I was assessing whether or not to provide that detailed information; currently my goal was to provide the overall usage (max capacity, tasks running, etc.) to help the LB make decisions.
IMO it provides enough for simple algorithms like round robin, but more complex ones based on usage over time will require further data.
@ronag is contacting worker_threads pool maintainers one after the other to ask for load balancing features :)
Since I've already spent some time on them in poolifier: without per-worker usage stats, designing a useful LB will be impossible.
While thinking of it: non-ready workers must be removed from the array supplied to the LB, to avoid every LB implementation having to test for them. Is this.workers safe regarding worker readiness?
Plenty of work for the months to come :)
While thinking of it: non-ready workers must be removed from the array supplied to the LB, to avoid every LB implementation having to test for them. Is this.workers safe regarding worker readiness?

Partially: the Pool categorizes the workers into ready and non-ready, but when iterating it goes through all of them instead of just the ready ones. Room for improvement for sure.

Plenty of work for the months to come :)
Definitely, but plenty of good ideas and optimizations are being made here; thanks for the insightful discussion!
src/worker_pool/index.ts
Outdated
currentUsage: number;
isRunningAbortableTask: boolean;
[kWorkerData]: WorkerInfo;
// TODO: maybe add histogram data here?
Not a maybe, it's a requirement for implementing load balancing algorithms.
  return false;
}
case 1: { // add
It's not the LB code that must take such a decision. Its only responsibility is to pick a worker, given custom criteria, among the ones available, if any. Worker creation is orthogonal to load balancing (and expected to be done before invoking the LB).
Furthermore, it would make writing custom LBs more complex and error-prone than it should be.
Untangling the worker creation code from the LB code is a prerequisite.
Hmm, this is an interesting point; I was seeing it another way.
My initial idea was to let the balancer hint to the pool whether to instantiate a new worker or not; but overall, if the balancer decides not to pick one, the pool can make the further decision of spawning and enqueuing.
That can make it cleaner.
Since you're targeting a public API, I really think it should be as simple as possible, like the custom queue one.
Regarding worker creation, you can add optional tunables with sensible defaults to the LB API to give hints about the pool behavior, such as the direct use of any newly created worker instead of running the LB algo with it added.
Something like:
{
  useCreatedWorker: boolean,
  ...
}
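One way these hinted tunables could be typed, with defaults applied by the pool; the names and defaults below are illustrative, not an agreed API:

```typescript
// Hypothetical tunables for a custom LB, with pool-applied defaults.
interface LoadBalancerOptions {
  // When true, a freshly created worker receives the task directly
  // instead of being fed back through the balancing algorithm.
  useCreatedWorker?: boolean;
}

const DEFAULTS: Required<LoadBalancerOptions> = {
  useCreatedWorker: true
};

// Merge user-supplied options over the defaults.
function withDefaults(
  opts: LoadBalancerOptions = {}
): Required<LoadBalancerOptions> {
  return { ...DEFAULTS, ...opts };
}
```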
That would definitely be something interesting to explore, but maybe we can start simply and state that if the balancer returns null, Piscina will attempt to enqueue the task or start a new worker.
Later we can explore how a balancer and scheduler can collaborate; but for simplicity, maybe just keep it as is.
I do not think it's a good idea to expect the worker creation decision logic to be part of the LB code. It will end up with the same code in all LBs:
Is there a worker available? → Yes → run the LB algo | → No → (is the pool full? → Yes → end | → No → ask for worker creation)
The LB code will need to be called twice.
That logic can be factored out of the LB code, which makes LB implementations easier.
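The flow above, with worker creation factored out so the balancer runs at most once per dispatch, could be sketched like this; all names here are hypothetical:

```typescript
// Sketch: worker creation lives outside the LB, which runs once at most.
interface Worker { id: number }
type Balancer = (workers: Worker[]) => Worker | null;

function dispatch(
  workers: Worker[],
  poolIsFull: boolean,
  balance: Balancer,
  createWorker: () => Worker
): Worker | null {
  if (workers.length > 0) return balance(workers); // run the LB algo
  if (poolIsFull) return null;                     // caller enqueues the task
  return createWorker();                           // creation stays outside the LB
}
```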
Further we can explore how a balancer and scheduler can collaborate together; but maybe for simplicity just keep it as is
Another point: please version the public API from the beginning. The extra work is worth it; the API will evolve, since it's not like a queue API, it's more like a plugin API.
The LB code will need to be called twice.
Hmm, this is interesting. It will require splitting the handling of a new worker from when a worker finishes a task, which are currently coupled. That will need to be handled differently.
In that way, upon creation we deliver the first task in the queue; but I imagined the LB, although called twice, might have some benefits if the algorithm is based on some affinity.
The extra work is worth it; the API will evolve, since it's not like a queue API, it's more like a plugin API.
Yeah, that's the intention. Thanks for the callout
The LB code will need to be called twice.
Hmm, this is interesting. It will require splitting the handling of a new worker from when a worker finishes a task, which are currently coupled. That will need to be handled differently.
I guess untangling the worker creation logic from the load balancing logic can be part of another PR. I would do it prior to LB API integration. If I remember correctly, in piscina, at task end (message channel response):
- a lock is taken on the shared tasks queue
- the LB is run to find an available worker with a minimum of executing tasks; if none is available, a worker is created if the pool is not full
- the lock is released
I'm not sure about the locking; it's probably taken only if an eligible worker is found. In any case, the logic for testing worker availability and readiness, and creating one if needed, can be factored out and run once in the current code as an optimization. And it will help for that PR.
Yeah, it is not really locking (mostly a side effect). Piscina infers that if you already have a task queue that hasn't been emptied yet, new tasks are enqueued, which ensures the tasks are resolved in the same order they entered (unless a custom task queue has been set).
Nevertheless, the thread is stopped until a further task is picked.
That sounds good; I think that's going to make the implementation easier and let us move forward.
  }
}

return command;
The array index is usually enough. There's a race condition left if a worker crashes: the LB API shall include a method to handle worker removal.
The API also needs to include a method, called at the end of each task execution, to update the LB internals. LB algos might maintain internal statistics to make decisions. The method handling worker removal must then also remove the internal stats relative to that worker.
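The two hooks suggested here could take a shape like the sketch below: one to feed the LB's internal statistics at task end, one to drop them on worker removal. Names and shapes are assumptions, not the Piscina API:

```typescript
// Sketch of LB-internal stats fed by the pool via lifecycle hooks.
class BalancerStats {
  private totalRuntime = new Map<number, number>();

  // Called by the pool at the end of each task execution.
  onTaskDone(workerId: number, runtimeMs: number): void {
    this.totalRuntime.set(
      workerId,
      (this.totalRuntime.get(workerId) ?? 0) + runtimeMs
    );
  }

  // Called by the pool when a worker crashes or is destroyed, so stale
  // stats cannot influence future decisions.
  onWorkerRemoved(workerId: number): void {
    this.totalRuntime.delete(workerId);
  }

  runtimeOf(workerId: number): number {
    return this.totalRuntime.get(workerId) ?? 0;
  }
}
```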
So reduce the communication semantics to just the id of the worker, so the pool and LB can react to it.
This can be interesting: an onClose or onDestroy can be called from the pool whenever a worker goes down, so the LB can update its internal statistics.
Do you have something specific in mind?
An interesting stat is the average task runtime per worker. A custom LB might implement an algo to pick the worker with the minimum.
But it will need:
- a per-worker histogram for runtime stats
- sensible default runtime stats at worker creation to avoid biased decisions in the LB algo
The median runtime is also interesting, and the LB code will need to compute it since the histogram does not provide it.
Using the worker_threads worker id is better than using the array index, but at the cost of doing a lookup to find the worker instance. Array index usage also has drawbacks: the code must be very careful at worker removal and in async task handling, since the index can stay valid while no longer pointing to the correct worker.
Piscina's runtime calculation is also lacking some accuracy: it's not exactly the task execution time on the worker; it also includes worker implementation overhead + message channel latency.
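Since the histogram does not expose a median, a custom LB would have to compute it itself, e.g. from a window of recorded per-worker runtimes. A plain sketch, no Piscina API involved:

```typescript
// Median of recorded task runtimes; a fresh worker with no samples
// gets 0 as a neutral default to avoid biasing the decision.
function medianRuntime(runtimesMs: number[]): number {
  if (runtimesMs.length === 0) return 0;
  const sorted = [...runtimesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```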
The latter point was one of the reasons I've had some concerns about exposing histograms. The time measured includes the message channel overhead plus some extra here and there; it will require some tuning to make it fully valuable for something as sensitive to precision as an LB, though for overall metric gathering it can be enough.

Using the worker_threads worker id is better than using the array index but at the cost of doing a lookup to find the worker instance.

Overall a Map might not impose too much overhead, but we can explore; we can start with that and maybe tune for performance afterwards.
The time measured includes the message channel overhead plus some extra here and there; it will require some tuning to make it fully valuable for something as sensitive to precision as an LB.

Doing accurate runtime measurement will require changing the current worker API: the measurement needs to be done on the worker side. I've not been able to come up with a better solution than extending an existing worker class.
Using the worker_threads worker id is better than using the array id but at a cost of doing a lookup to find the worker instance.
Overall a Map might not impose too much overhead, but we can explore; we can start with that and maybe tune for performance afterwards.
Been there. And finally I've reverted to using an array:
- `.reduce()` was faster than iterating a `Map` to consolidate information from the workers, such as finding a minimum.
- the number of records in the array will stay low, so a `Map` will probably remain overkill
Doing accurate runtime measurement will require changing the current worker API: the measurement needs to be done on the worker side. I've not been able to come up with a better solution than extending an existing worker class.
That should be fine; the result comes packed with some extra data from the Worker, and we can extend that schema to add that calculation.
Been there. And finally I've reverted to using an array:
Good catch.
Overall sticking to an array and leaving the LB to decide should be fine, any optimization can come afterwards
I've already got the list of things to add to this PR. Will work on this over the upcoming months.
If someone is willing to open a PR to tackle this work, happy to have a look 👍
Co-authored-by: Robert Nagy <ronagy@icloud.com>
(`worker` or `null`)