Parallelism in Bevy #2875
Replies: 1 comment
Great diagrams! This was briefly proposed in #2747 as well :) flecs uses a variant of this strategy for its parallelism, basically releasing entities into the next system as soon as their processing is complete. Unfortunately, this doesn't play very nicely with our ownership model. Archetype-components are the fundamental unit of parallelism that we use (which is nice for cache-friendly processing), and right now, if they're accessed mutably, they're locked until the system completes. To bypass this, we'd need a) a way to signal that we're done processing a piece of data, and b) a way to feed that information back into the scheduler before system completion.

The strategy would basically be to release each archetype-component back into the available pool as soon as its batch completes, updating the bitset of current data accesses. Then, allow systems to start as long as at least some of the data they access is available, only completing once all of the relevant data has been handled. In general I think this would be possible, but I have no idea how we could do it ergonomically.

Now, if we were working in an environment where system completion times were measured in minutes and we had more opportunities for parallelism than we knew what to do with (e.g. extract-transform-load in a cloud setting), I could totally see this approach being worth pursuing.
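The early-release idea can be sketched in plain std Rust (this is not Bevy's actual scheduler; the channel here is a hypothetical stand-in for both the "we're done with this data" signal and the feedback path to the scheduler):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical model: "system 1" releases each archetype-component as soon
// as its batch completes, so "system 2" can start before system 1 ends.
fn run_pipelined() -> Vec<Vec<i32>> {
    // Three archetype-components: one column of data per archetype.
    let archetypes: Vec<Vec<i32>> = (0..3).map(|i| vec![i; 4]).collect();
    let (done_tx, done_rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for mut col in archetypes {
            for v in col.iter_mut() {
                *v += 10; // system 1's transformation
            }
            // Release this archetype back into the available pool
            // immediately, instead of holding it until system completion.
            done_tx.send(col).unwrap();
        }
    });

    // System 2 starts as soon as *some* data is available, completing
    // only once every archetype has been handled.
    let mut results = Vec::new();
    for mut col in done_rx {
        for v in col.iter_mut() {
            *v *= 2; // system 2's transformation
        }
        results.push(col);
    }
    producer.join().unwrap();
    results
}

fn main() {
    println!("{:?}", run_pipelined());
}
```

The hard part the sketch glosses over is exactly the ergonomics question: in real Bevy code the release point would have to be expressed somewhere inside the system body, and the scheduler's access bitsets would need updating mid-system.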
Bevy has two general types of parallelism, but there is possibly a missed opportunity to implement a third kind.
Is this third kind useful? Is it possible to implement practically and ergonomically?
Outer parallelism
Systems run in parallel. This is the standard behaviour in the app schedule, and it works for systems whose queries are "disjoint" (they don't conflict over mutable access to the same data).
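A minimal std-only sketch of why disjointness matters (the function and component names are made up for illustration; in Bevy the scheduler infers this from each system's query signature):

```rust
use std::thread;

// Two "systems" whose queries are disjoint: one mutates only the Position
// column, the other only the Health column, so they can run in parallel.
fn run_systems(positions: &mut [f32], healths: &mut [i32]) {
    thread::scope(|s| {
        // movement_system: touches only Position.
        s.spawn(|| {
            for p in positions.iter_mut() {
                *p += 1.0;
            }
        });
        // regen_system: touches only Health.
        s.spawn(|| {
            for h in healths.iter_mut() {
                *h += 5;
            }
        });
    });
}

fn main() {
    let mut positions = vec![0.0_f32; 4];
    let mut healths = vec![100; 4];
    run_systems(&mut positions, &mut healths);
    println!("{:?} {:?}", positions, healths);
}
```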
Inner parallelism
Batches of entities are processed in parallel. This is implemented via the parallel iterators, and works with a single query (at a time).
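A std-only stand-in for batched iteration (Bevy's actual parallel iterators run batches on its task pool; here plain scoped threads and `chunks_mut` play that role):

```rust
use std::thread;

// One query's entities split into batches, each batch processed on its
// own thread -- a simplified model of parallel query iteration.
fn scale_in_batches(values: &mut [i32], batch_size: usize) {
    thread::scope(|s| {
        for batch in values.chunks_mut(batch_size) {
            s.spawn(move || {
                for v in batch.iter_mut() {
                    *v *= 3;
                }
            });
        }
    });
}

fn main() {
    let mut values: Vec<i32> = (1..=8).collect();
    scale_in_batches(&mut values, 3);
    println!("{:?}", values);
}
```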
Just in time? Greedy? For-each?
Systems process entities (or batches of entities) as soon as they can. In theory, this could have been implemented via for-each systems, but that approach had its own problems. Perhaps users could just write bigger systems that handle this in parallel using tasks and parallel iterators, but the ergonomics aren't amazing. For example, as depicted below, you may have three unique "archetypes" of entities that undergo unique transformations (A, B, and C), followed by a fourth transformation (D) that depends on the results of the previous transformations, can take in any of these entities, and processes each one as soon as the relevant preceding transformation is completed.
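This A/B/C-into-D shape can be sketched with plain std channels (the transformations and entity values are invented for illustration; a real implementation would sit inside the scheduler rather than in user code like this):

```rust
use std::sync::mpsc;
use std::thread;

// Entities from three archetypes undergo transformations A, B, and C;
// transformation D picks up each result as soon as it is ready, rather
// than waiting for all of A, B, and C to finish.
fn run_pipeline() -> i32 {
    let (tx, rx) = mpsc::channel();

    // Transformation A on archetype-A entities.
    let tx_a = tx.clone();
    let a = thread::spawn(move || {
        for e in [1, 2] {
            tx_a.send(e + 1).unwrap();
        }
    });
    // Transformation B on archetype-B entities.
    let tx_b = tx.clone();
    let b = thread::spawn(move || {
        for e in [10, 20] {
            tx_b.send(e * 2).unwrap();
        }
    });
    // Transformation C on archetype-C entities (takes the last sender,
    // so the channel closes once all three stages are done).
    let c = thread::spawn(move || {
        for e in [100, 200] {
            tx.send(e - 3).unwrap();
        }
    });

    // Transformation D: runs on each entity as soon as its preceding
    // stage completes, regardless of which archetype it came from.
    let mut total = 0;
    for v in rx {
        total += v; // D just accumulates a sum here
    }
    for h in [a, b, c] {
        h.join().unwrap();
    }
    total
}

fn main() {
    println!("total = {}", run_pipeline());
}
```

Note the result is deterministic here only because D's operation (summing) is order-independent; in general, this pipelining trades determinism of processing order for throughput.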