Use NonMaxUsize for non-component SparseSets #12083

Merged
merged 2 commits into from
Mar 3, 2024

james7132
Member

Objective

Adoption of #2104 and #11843. Storing Option<usize> wastes 3-7 bytes of memory per potential entry, an overhead that scales as the ID space grows.

The goal of this PR is to reduce memory usage without significantly impacting common use cases.

Co-Authored By: @NathanSWard
Co-Authored By: @tygyh

Solution

Replace usize in SparseSet's sparse array with nonmax::NonMaxUsize. NonMaxUsize wraps a NonZeroUsize, and applies a bitwise NOT to the value when accessing it. This allows the compiler to niche the value and eliminate the extra padding used for the Option inside the sparse array, while moving the niche value from 0 to usize::MAX instead.
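As a rough illustration of the niche trick described above, here is a minimal sketch using only the standard library. `NonMaxIndex` and `slot_sizes` are hypothetical names invented for this example; the sketch is a stand-in for the idea behind `nonmax::NonMaxUsize`, not the crate's actual source.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `nonmax::NonMaxUsize` (an assumption, not the
/// crate's code): stores `!value` in a `NonZeroUsize`, so the forbidden bit
/// pattern 0 corresponds to the logical value `usize::MAX`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct NonMaxIndex(NonZeroUsize);

impl NonMaxIndex {
    /// Returns `None` only for `usize::MAX`, whose complement is 0.
    pub fn new(value: usize) -> Option<Self> {
        NonZeroUsize::new(!value).map(Self)
    }

    /// Undoes the bitwise NOT applied in `new`.
    pub fn get(self) -> usize {
        !self.0.get()
    }
}

/// Byte sizes of one sparse-array slot as `(before, after)`.
pub fn slot_sizes() -> (usize, usize) {
    (size_of::<Option<usize>>(), size_of::<Option<NonMaxIndex>>())
}
```

With this layout, `Option<NonMaxIndex>` should occupy the same space as a bare `usize`, because `None` reuses the all-zero bit pattern instead of needing a separate discriminant and padding.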

Checking the diff of the generated x86 assembly, this change actually results in fewer instructions. One potential downside is that it appears to have moved a load before a branch, which means we may incur a cache miss even if the element is not present.

Note: unlike #2104 and #11843, this PR only targets the metadata stores for the ECS and not the component storage itself. Due to #9907 targeting Entity::generation instead of Entity::index, ComponentSparseSet storing only up to u32::MAX elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as there is now a potential for a panic. These costs are really only incurred when constructing a new Table, Archetype, or Resource that has never been seen before by the World. All of these operations are fairly cold and not on any particular hot path, even for command application.
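To make the insertion cost concrete, the following is a minimal sketch of where such a panic could arise. `SparseSlots` and its methods are hypothetical and written for illustration only; they are not Bevy's `SparseSet` API, and the slot encoding is inlined with `NonZeroUsize` to keep the example self-contained.

```rust
use std::num::NonZeroUsize;

/// Hypothetical sparse array mapping a sparse index to a dense index.
/// Each slot stores `!dense` as a `NonZeroUsize`, so `None` costs no extra space.
#[derive(Default)]
pub struct SparseSlots {
    slots: Vec<Option<NonZeroUsize>>,
}

impl SparseSlots {
    /// Records that `sparse` maps to `dense`.
    ///
    /// Panics if `dense == usize::MAX`, the one value that cannot be encoded.
    pub fn insert(&mut self, sparse: usize, dense: usize) {
        let encoded = NonZeroUsize::new(!dense).expect("dense index must be below usize::MAX");
        // Grow the sparse array on demand, filling the gap with empty slots.
        if sparse >= self.slots.len() {
            self.slots.resize(sparse + 1, None);
        }
        self.slots[sparse] = Some(encoded);
    }

    /// Looks up the dense index for `sparse`, if one was inserted.
    pub fn get(&self, sparse: usize) -> Option<usize> {
        self.slots
            .get(sparse)
            .copied()
            .flatten()
            .map(|raw| !raw.get())
    }
}
```

In this sketch the extra check lives only in `insert`; `get` decodes with a single bitwise NOT, which is consistent with the claim that the cost lands on the cold insertion path.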


Changelog

Changed: SparseSet now can only store up to usize::MAX - 1 elements instead of usize::MAX.
Changed: SparseSet now uses 33-50% less memory overhead per stored item.

@james7132 james7132 added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 24, 2024
@james7132
Member Author

james7132 commented Feb 24, 2024

A quick run-through of the micro-benchmarks seems to show no consistent regression in performance:

group                                           main                                    nonmax-sparse-sets
-----                                           ----                                    ------------------
add_remove/sparse_set                           1.00   707.5±20.46µs        ? ?/sec     1.02   722.2±15.01µs        ? ?/sec
add_remove/table                                1.00  1097.2±32.92µs        ? ?/sec     1.00  1100.0±34.44µs        ? ?/sec
add_remove_big/sparse_set                       1.02   714.7±15.63µs        ? ?/sec     1.00   699.6±15.28µs        ? ?/sec
add_remove_big/table                            1.05      2.8±0.13ms        ? ?/sec     1.00      2.7±0.08ms        ? ?/sec
added_archetypes/archetype_count/100            1.00     47.0±0.99µs        ? ?/sec     1.02     48.0±0.55µs        ? ?/sec
added_archetypes/archetype_count/1000           1.06   624.4±29.27µs        ? ?/sec     1.00   587.4±11.28µs        ? ?/sec
added_archetypes/archetype_count/10000          1.17     12.6±0.87ms        ? ?/sec     1.00     10.8±0.75ms        ? ?/sec
added_archetypes/archetype_count/200            1.00     94.4±1.32µs        ? ?/sec     1.02     96.3±1.40µs        ? ?/sec
added_archetypes/archetype_count/2000           1.05  1377.8±70.38µs        ? ?/sec     1.00  1316.3±38.17µs        ? ?/sec
added_archetypes/archetype_count/500            1.01    267.7±5.21µs        ? ?/sec     1.00    266.0±7.65µs        ? ?/sec
added_archetypes/archetype_count/5000           1.24      4.8±0.58ms        ? ?/sec     1.00      3.9±0.23ms        ? ?/sec
build_schedule/1000_schedule                    1.04       3.8±0.06s        ? ?/sec     1.00       3.7±0.04s        ? ?/sec
build_schedule/1000_schedule_noconstraints      1.02     30.8±0.59ms        ? ?/sec     1.00     30.3±0.69ms        ? ?/sec
build_schedule/100_schedule                     1.01     16.1±0.22ms        ? ?/sec     1.00     16.0±0.44ms        ? ?/sec
build_schedule/100_schedule_noconstraints       1.00   558.8±11.74µs        ? ?/sec     1.00   557.8±13.50µs        ? ?/sec
build_schedule/500_schedule                     1.04    647.0±9.90ms        ? ?/sec     1.00   622.7±12.21ms        ? ?/sec
build_schedule/500_schedule_noconstraints       1.00      8.8±0.16ms        ? ?/sec     1.01      8.9±0.18ms        ? ?/sec
busy_systems/01x_entities_03_systems            1.05     33.4±1.16µs        ? ?/sec     1.00     31.9±0.96µs        ? ?/sec
busy_systems/01x_entities_06_systems            1.19     71.9±8.28µs        ? ?/sec     1.00     60.5±1.32µs        ? ?/sec
busy_systems/01x_entities_09_systems            1.17    91.2±10.68µs        ? ?/sec     1.00     78.1±2.25µs        ? ?/sec
busy_systems/01x_entities_12_systems            1.14   109.9±10.24µs        ? ?/sec     1.00     96.7±2.47µs        ? ?/sec
busy_systems/01x_entities_15_systems            1.10    126.2±7.11µs        ? ?/sec     1.00    114.7±1.98µs        ? ?/sec
busy_systems/02x_entities_03_systems            1.06     50.1±2.70µs        ? ?/sec     1.00     47.2±1.51µs        ? ?/sec
busy_systems/02x_entities_06_systems            1.03     94.4±4.19µs        ? ?/sec     1.00     91.9±1.96µs        ? ?/sec
busy_systems/02x_entities_09_systems            1.12   143.9±13.60µs        ? ?/sec     1.00    128.7±5.34µs        ? ?/sec
busy_systems/02x_entities_12_systems            1.06    168.5±9.57µs        ? ?/sec     1.00    159.5±2.18µs        ? ?/sec
busy_systems/02x_entities_15_systems            1.02   202.9±11.67µs        ? ?/sec     1.00    199.8±9.75µs        ? ?/sec
busy_systems/03x_entities_03_systems            1.05     68.3±5.33µs        ? ?/sec     1.00     65.3±3.07µs        ? ?/sec
busy_systems/03x_entities_06_systems            1.07   137.6±12.33µs        ? ?/sec     1.00    128.1±2.31µs        ? ?/sec
busy_systems/03x_entities_09_systems            1.01    179.1±5.25µs        ? ?/sec     1.00    178.2±8.73µs        ? ?/sec
busy_systems/03x_entities_12_systems            1.04   231.3±11.26µs        ? ?/sec     1.00    222.3±3.97µs        ? ?/sec
busy_systems/03x_entities_15_systems            1.05   279.2±16.99µs        ? ?/sec     1.00    266.6±5.15µs        ? ?/sec
busy_systems/04x_entities_03_systems            1.10    86.5±11.25µs        ? ?/sec     1.00     78.9±2.55µs        ? ?/sec
busy_systems/04x_entities_06_systems            1.02    161.0±7.75µs        ? ?/sec     1.00    158.0±7.32µs        ? ?/sec
busy_systems/04x_entities_09_systems            1.03   228.2±14.73µs        ? ?/sec     1.00    222.2±9.10µs        ? ?/sec
busy_systems/04x_entities_12_systems            1.00   282.5±10.31µs        ? ?/sec     1.03   290.0±17.79µs        ? ?/sec
busy_systems/04x_entities_15_systems            1.00   347.2±18.50µs        ? ?/sec     1.07   371.9±27.41µs        ? ?/sec
busy_systems/05x_entities_03_systems            1.00     93.9±3.45µs        ? ?/sec     1.07    100.1±4.35µs        ? ?/sec
busy_systems/05x_entities_06_systems            1.00    182.7±3.60µs        ? ?/sec     1.06    192.9±9.28µs        ? ?/sec
busy_systems/05x_entities_09_systems            1.00   261.7±12.15µs        ? ?/sec     1.05   275.4±11.14µs        ? ?/sec
busy_systems/05x_entities_12_systems            1.00   335.4±22.35µs        ? ?/sec     1.08   362.5±21.86µs        ? ?/sec
busy_systems/05x_entities_15_systems            1.00   415.4±18.57µs        ? ?/sec     1.06   438.3±16.12µs        ? ?/sec
contrived/01x_entities_03_systems               1.00     34.1±1.38µs        ? ?/sec     1.03     35.0±1.83µs        ? ?/sec
contrived/01x_entities_06_systems               1.00     47.2±1.47µs        ? ?/sec     1.03     48.8±1.68µs        ? ?/sec
contrived/01x_entities_09_systems               1.00     62.0±3.77µs        ? ?/sec     1.03     63.8±3.20µs        ? ?/sec
contrived/01x_entities_12_systems               1.00     75.9±4.27µs        ? ?/sec     1.02     77.1±2.10µs        ? ?/sec
contrived/01x_entities_15_systems               1.00     87.3±5.29µs        ? ?/sec     1.05     91.3±2.75µs        ? ?/sec
contrived/02x_entities_03_systems               1.00     36.9±0.95µs        ? ?/sec     1.00     37.0±1.52µs        ? ?/sec
contrived/02x_entities_06_systems               1.00     62.2±1.71µs        ? ?/sec     1.01     63.1±2.21µs        ? ?/sec
contrived/02x_entities_09_systems               1.03     90.5±9.98µs        ? ?/sec     1.00     87.4±4.51µs        ? ?/sec
contrived/02x_entities_12_systems               1.00    108.1±7.10µs        ? ?/sec     1.00    107.9±7.43µs        ? ?/sec
contrived/02x_entities_15_systems               1.00    124.5±2.83µs        ? ?/sec     1.03    128.5±4.53µs        ? ?/sec
contrived/03x_entities_03_systems               1.00     44.1±4.85µs        ? ?/sec     1.02     45.0±1.68µs        ? ?/sec
contrived/03x_entities_06_systems               1.04     76.1±5.36µs        ? ?/sec     1.00     73.0±2.63µs        ? ?/sec
contrived/03x_entities_09_systems               1.00    101.5±5.97µs        ? ?/sec     1.06    107.4±6.70µs        ? ?/sec
contrived/03x_entities_12_systems               1.00    130.8±6.77µs        ? ?/sec     1.09    142.1±8.92µs        ? ?/sec
contrived/03x_entities_15_systems               1.00    155.3±3.72µs        ? ?/sec     1.10    170.9±9.11µs        ? ?/sec
contrived/04x_entities_03_systems               1.00     53.3±3.58µs        ? ?/sec     1.12     59.7±3.73µs        ? ?/sec
contrived/04x_entities_06_systems               1.00     89.5±7.25µs        ? ?/sec     1.02     91.0±5.74µs        ? ?/sec
contrived/04x_entities_09_systems               1.00   123.8±11.26µs        ? ?/sec     1.00    123.2±3.74µs        ? ?/sec
contrived/04x_entities_12_systems               1.01    158.0±7.43µs        ? ?/sec     1.00    156.6±4.05µs        ? ?/sec
contrived/04x_entities_15_systems               1.00    188.6±8.03µs        ? ?/sec     1.02   191.9±10.54µs        ? ?/sec
contrived/05x_entities_03_systems               1.00     58.7±1.37µs        ? ?/sec     1.00     58.8±1.74µs        ? ?/sec
contrived/05x_entities_06_systems               1.00     99.3±3.02µs        ? ?/sec     1.03    102.1±3.98µs        ? ?/sec
contrived/05x_entities_09_systems               1.00    137.5±4.32µs        ? ?/sec     1.02    140.4±5.33µs        ? ?/sec
contrived/05x_entities_12_systems               1.00    173.7±2.49µs        ? ?/sec     1.05   183.2±14.86µs        ? ?/sec
contrived/05x_entities_15_systems               1.00   215.9±13.00µs        ? ?/sec     1.00    216.9±8.52µs        ? ?/sec
empty_commands/0_entities                       1.06      4.3±0.03ns        ? ?/sec     1.00      4.1±0.21ns        ? ?/sec
empty_systems/000_systems                       1.00      6.5±0.22ns        ? ?/sec     1.00      6.5±0.06ns        ? ?/sec
empty_systems/001_systems                       1.07      3.3±0.23µs        ? ?/sec     1.00      3.1±0.10µs        ? ?/sec
empty_systems/002_systems                       1.10      5.3±0.63µs        ? ?/sec     1.00      4.8±0.18µs        ? ?/sec
empty_systems/003_systems                       1.07      6.6±0.51µs        ? ?/sec     1.00      6.2±0.23µs        ? ?/sec
empty_systems/004_systems                       1.05      6.2±0.46µs        ? ?/sec     1.00      5.9±0.19µs        ? ?/sec
empty_systems/005_systems                       1.08      6.9±0.53µs        ? ?/sec     1.00      6.4±0.36µs        ? ?/sec
empty_systems/010_systems                       1.17      9.7±0.96µs        ? ?/sec     1.00      8.3±1.30µs        ? ?/sec
empty_systems/015_systems                       1.22     13.7±1.51µs        ? ?/sec     1.00     11.3±0.59µs        ? ?/sec
empty_systems/020_systems                       1.29     19.6±2.68µs        ? ?/sec     1.00     15.2±1.45µs        ? ?/sec
empty_systems/025_systems                       1.22     25.3±3.12µs        ? ?/sec     1.00     20.7±1.44µs        ? ?/sec
empty_systems/030_systems                       1.05     26.8±2.02µs        ? ?/sec     1.00     25.6±1.56µs        ? ?/sec
empty_systems/035_systems                       1.14     33.2±3.74µs        ? ?/sec     1.00     29.1±2.26µs        ? ?/sec
empty_systems/040_systems                       1.07     34.8±2.46µs        ? ?/sec     1.00     32.7±1.67µs        ? ?/sec
empty_systems/045_systems                       1.03     39.8±2.50µs        ? ?/sec     1.00     38.7±3.65µs        ? ?/sec
empty_systems/050_systems                       1.02     43.0±2.52µs        ? ?/sec     1.00     42.3±2.77µs        ? ?/sec
empty_systems/055_systems                       1.02     46.3±2.27µs        ? ?/sec     1.00     45.4±2.43µs        ? ?/sec
empty_systems/060_systems                       1.07     51.4±2.46µs        ? ?/sec     1.00     47.8±2.52µs        ? ?/sec
empty_systems/065_systems                       1.06     56.1±3.63µs        ? ?/sec     1.00     53.1±2.11µs        ? ?/sec
empty_systems/070_systems                       1.06     61.9±4.66µs        ? ?/sec     1.00     58.2±2.75µs        ? ?/sec
empty_systems/075_systems                       1.01     66.7±3.65µs        ? ?/sec     1.00     66.1±6.45µs        ? ?/sec
empty_systems/080_systems                       1.00     72.3±4.00µs        ? ?/sec     1.02     73.9±3.87µs        ? ?/sec
empty_systems/085_systems                       1.00     78.1±3.77µs        ? ?/sec     1.01     79.1±6.32µs        ? ?/sec
empty_systems/090_systems                       1.20     89.7±7.22µs        ? ?/sec     1.00     74.8±4.70µs        ? ?/sec
empty_systems/095_systems                       1.07     88.0±5.44µs        ? ?/sec     1.00     82.2±4.22µs        ? ?/sec
empty_systems/100_systems                       1.14     95.1±8.69µs        ? ?/sec     1.00     83.5±5.32µs        ? ?/sec
fake_commands/2000_commands                     1.00      7.1±0.02µs        ? ?/sec     1.04      7.4±0.07µs        ? ?/sec
fake_commands/4000_commands                     1.00     14.3±0.14µs        ? ?/sec     1.04     14.8±0.09µs        ? ?/sec
fake_commands/6000_commands                     1.00     21.4±0.16µs        ? ?/sec     1.04     22.2±0.06µs        ? ?/sec
fake_commands/8000_commands                     1.00     28.5±0.27µs        ? ?/sec     1.04     29.6±0.12µs        ? ?/sec
get_or_spawn/batched                            1.01   269.8±14.69µs        ? ?/sec     1.00   266.0±11.74µs        ? ?/sec
get_or_spawn/individual                         1.01   424.2±62.87µs        ? ?/sec     1.00   420.3±60.11µs        ? ?/sec
heavy_compute/base                              1.03    223.3±6.59µs        ? ?/sec     1.00    216.6±2.32µs        ? ?/sec
insert_commands/insert                          1.00   326.8±30.18µs        ? ?/sec     1.00   325.7±27.59µs        ? ?/sec
insert_commands/insert_batch                    1.00   271.4±16.29µs        ? ?/sec     1.03   278.8±36.80µs        ? ?/sec
insert_simple/base                              1.05   397.8±16.94µs        ? ?/sec     1.00    377.1±7.09µs        ? ?/sec
insert_simple/unbatched                         1.12  901.7±124.13µs        ? ?/sec     1.00   806.5±33.89µs        ? ?/sec
iter_fragmented/base                            1.02    484.3±8.28ns        ? ?/sec     1.00    475.8±2.27ns        ? ?/sec
iter_fragmented/foreach                         1.04   182.8±22.04ns        ? ?/sec     1.00   175.0±21.53ns        ? ?/sec
iter_fragmented/foreach_wide                    1.00      3.7±0.07µs        ? ?/sec     1.00      3.8±0.08µs        ? ?/sec
iter_fragmented/wide                            1.03      4.0±0.13µs        ? ?/sec     1.00      3.9±0.09µs        ? ?/sec
iter_fragmented_sparse/base                     1.00      9.6±0.17ns        ? ?/sec     1.07     10.2±0.18ns        ? ?/sec
iter_fragmented_sparse/foreach                  1.00      8.2±0.71ns        ? ?/sec     1.00      8.1±0.13ns        ? ?/sec
iter_fragmented_sparse/foreach_wide             1.02     43.3±5.51ns        ? ?/sec     1.00     42.4±2.85ns        ? ?/sec
iter_fragmented_sparse/wide                     1.00     40.8±1.18ns        ? ?/sec     1.07     43.5±1.08ns        ? ?/sec
iter_simple/base                                1.00      8.4±0.10µs        ? ?/sec     1.00      8.3±0.07µs        ? ?/sec
iter_simple/foreach                             1.00      8.2±0.21µs        ? ?/sec     1.00      8.2±0.15µs        ? ?/sec
iter_simple/foreach_sparse_set                  1.00     25.7±0.41µs        ? ?/sec     1.00     25.7±0.32µs        ? ?/sec
iter_simple/foreach_wide                        1.00     36.0±0.72µs        ? ?/sec     1.05     37.9±0.26µs        ? ?/sec
iter_simple/foreach_wide_sparse_set             1.00    109.8±0.64µs        ? ?/sec     1.13    123.7±0.84µs        ? ?/sec
iter_simple/sparse_set                          1.08     32.8±0.45µs        ? ?/sec     1.00     30.4±0.18µs        ? ?/sec
iter_simple/system                              1.01      8.4±0.19µs        ? ?/sec     1.00      8.4±0.05µs        ? ?/sec
iter_simple/wide                                1.00     37.9±0.45µs        ? ?/sec     1.00     37.8±0.23µs        ? ?/sec
iter_simple/wide_sparse_set                     1.00    120.4±2.14µs        ? ?/sec     1.04    125.0±1.54µs        ? ?/sec
no_archetypes/system_count/0                    1.00      8.4±0.09ns        ? ?/sec     1.03      8.7±0.03ns        ? ?/sec
no_archetypes/system_count/100                  1.04  1057.3±30.52ns        ? ?/sec     1.00   1021.4±3.32ns        ? ?/sec
no_archetypes/system_count/20                   1.00    223.3±3.52ns        ? ?/sec     1.00    222.9±9.86ns        ? ?/sec
no_archetypes/system_count/40                   1.02   428.0±11.51ns        ? ?/sec     1.00    419.6±0.91ns        ? ?/sec
no_archetypes/system_count/60                   1.01    629.4±2.78ns        ? ?/sec     1.00    620.7±3.86ns        ? ?/sec
no_archetypes/system_count/80                   1.02    837.3±7.60ns        ? ?/sec     1.00    817.6±4.03ns        ? ?/sec
query_get/50000_entities_sparse                 1.04    317.1±4.86µs        ? ?/sec     1.00   305.8±12.17µs        ? ?/sec
query_get/50000_entities_table                  1.00    282.4±5.45µs        ? ?/sec     1.00    281.0±3.69µs        ? ?/sec
query_get_many_10/50000_calls_sparse            1.11      5.1±0.58ms        ? ?/sec     1.00      4.6±0.37ms        ? ?/sec
query_get_many_10/50000_calls_table             1.00      3.7±0.15ms        ? ?/sec     1.00      3.7±0.14ms        ? ?/sec
query_get_many_2/50000_calls_sparse             1.05  599.8±116.58µs        ? ?/sec     1.00   572.2±83.08µs        ? ?/sec
query_get_many_2/50000_calls_table              1.00   583.0±57.24µs        ? ?/sec     1.00   582.3±91.52µs        ? ?/sec
query_get_many_5/50000_calls_sparse             1.04      2.6±0.22ms        ? ?/sec     1.00      2.5±0.30ms        ? ?/sec
query_get_many_5/50000_calls_table              1.03      2.1±0.25ms        ? ?/sec     1.00      2.0±0.24ms        ? ?/sec
run_condition/no/001_systems                    1.00    134.9±0.75ns        ? ?/sec     1.08    145.5±1.75ns        ? ?/sec
run_condition/no/006_systems                    1.00    300.9±3.95ns        ? ?/sec     1.00    300.8±2.00ns        ? ?/sec
run_condition/no/011_systems                    1.00    454.2±1.11ns        ? ?/sec     1.00    454.5±2.07ns        ? ?/sec
run_condition/no/016_systems                    1.02    613.2±3.28ns        ? ?/sec     1.00    601.7±5.49ns        ? ?/sec
run_condition/no/021_systems                    1.02    771.6±5.64ns        ? ?/sec     1.00    754.6±2.44ns        ? ?/sec
run_condition/no/026_systems                    1.05   958.0±63.31ns        ? ?/sec     1.00    914.3±7.66ns        ? ?/sec
run_condition/no/031_systems                    1.00  1068.7±13.92ns        ? ?/sec     1.01  1074.4±30.53ns        ? ?/sec
run_condition/no/036_systems                    1.00  1272.4±111.20ns        ? ?/sec    1.00  1267.9±16.42ns        ? ?/sec
run_condition/no/041_systems                    1.10  1568.9±46.05ns        ? ?/sec     1.00   1420.5±6.60ns        ? ?/sec
run_condition/no/046_systems                    1.00  1591.7±47.39ns        ? ?/sec     1.00   1584.1±2.74ns        ? ?/sec
run_condition/no/051_systems                    1.01  1764.8±93.11ns        ? ?/sec     1.00  1751.6±10.92ns        ? ?/sec
run_condition/no/056_systems                    1.00  1894.5±29.01ns        ? ?/sec     1.00   1896.8±4.60ns        ? ?/sec
run_condition/no/061_systems                    1.00      2.1±0.03µs        ? ?/sec     1.00      2.1±0.06µs        ? ?/sec
run_condition/no/066_systems                    1.01      2.3±0.02µs        ? ?/sec     1.00      2.3±0.01µs        ? ?/sec
run_condition/no/071_systems                    1.00      2.4±0.05µs        ? ?/sec     1.00      2.4±0.03µs        ? ?/sec
run_condition/no/076_systems                    1.02      2.6±0.04µs        ? ?/sec     1.00      2.6±0.00µs        ? ?/sec
run_condition/no/081_systems                    1.00      2.8±0.06µs        ? ?/sec     1.00      2.8±0.02µs        ? ?/sec
run_condition/no/086_systems                    1.01      2.9±0.07µs        ? ?/sec     1.00      2.9±0.06µs        ? ?/sec
run_condition/no/091_systems                    1.00      3.0±0.04µs        ? ?/sec     1.01      3.1±0.01µs        ? ?/sec
run_condition/no/096_systems                    1.00      3.2±0.06µs        ? ?/sec     1.00      3.2±0.02µs        ? ?/sec
run_condition/no/101_systems                    1.00      3.4±0.01µs        ? ?/sec     1.00      3.4±0.01µs        ? ?/sec
run_condition/yes/001_systems                   1.03      3.2±0.26µs        ? ?/sec     1.00      3.1±0.15µs        ? ?/sec
run_condition/yes/006_systems                   1.00      6.8±0.39µs        ? ?/sec     1.00      6.8±0.29µs        ? ?/sec
run_condition/yes/011_systems                   1.03      8.9±0.60µs        ? ?/sec     1.00      8.6±0.48µs        ? ?/sec
run_condition/yes/016_systems                   1.04     12.4±0.86µs        ? ?/sec     1.00     11.9±0.49µs        ? ?/sec
run_condition/yes/021_systems                   1.03     15.9±1.15µs        ? ?/sec     1.00     15.5±0.75µs        ? ?/sec
run_condition/yes/026_systems                   1.01     19.8±1.25µs        ? ?/sec     1.00     19.6±1.78µs        ? ?/sec
run_condition/yes/031_systems                   1.09     24.5±1.72µs        ? ?/sec     1.00     22.5±1.89µs        ? ?/sec
run_condition/yes/036_systems                   1.10     28.4±2.48µs        ? ?/sec     1.00     25.9±1.70µs        ? ?/sec
run_condition/yes/041_systems                   1.11     32.4±2.83µs        ? ?/sec     1.00     29.1±1.79µs        ? ?/sec
run_condition/yes/046_systems                   1.04     34.6±2.36µs        ? ?/sec     1.00     33.3±1.74µs        ? ?/sec
run_condition/yes/051_systems                   1.07     38.4±2.45µs        ? ?/sec     1.00     35.9±2.65µs        ? ?/sec
run_condition/yes/056_systems                   1.04     42.2±2.93µs        ? ?/sec     1.00     40.5±2.06µs        ? ?/sec
run_condition/yes/061_systems                   1.07     47.0±3.01µs        ? ?/sec     1.00     43.8±2.71µs        ? ?/sec
run_condition/yes/066_systems                   1.02     53.2±5.44µs        ? ?/sec     1.00     52.2±7.02µs        ? ?/sec
run_condition/yes/071_systems                   1.17     61.5±5.40µs        ? ?/sec     1.00     52.6±2.78µs        ? ?/sec
run_condition/yes/076_systems                   1.02     61.7±6.68µs        ? ?/sec     1.00     60.6±6.41µs        ? ?/sec
run_condition/yes/081_systems                   1.06     65.3±4.08µs        ? ?/sec     1.00     61.5±3.19µs        ? ?/sec
run_condition/yes/086_systems                   1.07     72.0±5.71µs        ? ?/sec     1.00     67.3±3.51µs        ? ?/sec
run_condition/yes/091_systems                   1.07     78.6±5.02µs        ? ?/sec     1.00     73.4±3.48µs        ? ?/sec
run_condition/yes/096_systems                   1.00     80.5±3.71µs        ? ?/sec     1.05    84.7±11.80µs        ? ?/sec
run_condition/yes/101_systems                   1.03     86.3±5.80µs        ? ?/sec     1.00     83.5±3.96µs        ? ?/sec
run_condition/yes_using_query/001_systems       1.24      3.9±1.63µs        ? ?/sec     1.00      3.2±0.34µs        ? ?/sec
run_condition/yes_using_query/006_systems       1.07      7.4±1.12µs        ? ?/sec     1.00      6.9±0.69µs        ? ?/sec
run_condition/yes_using_query/011_systems       1.03      9.3±1.63µs        ? ?/sec     1.00      9.1±0.48µs        ? ?/sec
run_condition/yes_using_query/016_systems       1.10     13.9±2.67µs        ? ?/sec     1.00     12.6±1.29µs        ? ?/sec
run_condition/yes_using_query/021_systems       1.26     19.8±3.91µs        ? ?/sec     1.00     15.8±0.68µs        ? ?/sec
run_condition/yes_using_query/026_systems       1.12     21.5±3.35µs        ? ?/sec     1.00     19.2±2.21µs        ? ?/sec
run_condition/yes_using_query/031_systems       1.20     27.1±4.70µs        ? ?/sec     1.00     22.5±1.21µs        ? ?/sec
run_condition/yes_using_query/036_systems       1.19     31.0±4.80µs        ? ?/sec     1.00     26.1±2.07µs        ? ?/sec
run_condition/yes_using_query/041_systems       1.12     33.5±3.45µs        ? ?/sec     1.00     30.0±1.77µs        ? ?/sec
run_condition/yes_using_query/046_systems       1.23     41.2±2.89µs        ? ?/sec     1.00     33.5±1.27µs        ? ?/sec
run_condition/yes_using_query/051_systems       1.27     48.4±3.81µs        ? ?/sec     1.00     38.1±2.83µs        ? ?/sec
run_condition/yes_using_query/056_systems       1.03     43.8±3.50µs        ? ?/sec     1.00     42.6±1.94µs        ? ?/sec
run_condition/yes_using_query/061_systems       1.01     50.2±4.34µs        ? ?/sec     1.00     49.7±6.22µs        ? ?/sec
run_condition/yes_using_query/066_systems       1.10     55.9±5.09µs        ? ?/sec     1.00     50.7±2.77µs        ? ?/sec
run_condition/yes_using_query/071_systems       1.23     66.5±7.08µs        ? ?/sec     1.00     54.3±3.34µs        ? ?/sec
run_condition/yes_using_query/076_systems       1.22     70.4±7.98µs        ? ?/sec     1.00     58.0±3.51µs        ? ?/sec
run_condition/yes_using_query/081_systems       1.09     71.8±7.71µs        ? ?/sec     1.00     66.1±4.00µs        ? ?/sec
run_condition/yes_using_query/086_systems       1.16     84.8±9.08µs        ? ?/sec     1.00     73.3±4.73µs        ? ?/sec
run_condition/yes_using_query/091_systems       1.09     84.1±7.24µs        ? ?/sec     1.00     77.3±4.70µs        ? ?/sec
run_condition/yes_using_query/096_systems       1.14    92.8±10.56µs        ? ?/sec     1.00     81.7±3.56µs        ? ?/sec
run_condition/yes_using_query/101_systems       1.11     94.6±8.02µs        ? ?/sec     1.00     85.0±3.97µs        ? ?/sec
run_condition/yes_using_resource/001_systems    1.08      3.4±0.41µs        ? ?/sec     1.00      3.2±0.30µs        ? ?/sec
run_condition/yes_using_resource/006_systems    1.17      7.9±0.77µs        ? ?/sec     1.00      6.8±0.29µs        ? ?/sec
run_condition/yes_using_resource/011_systems    1.27     11.4±1.94µs        ? ?/sec     1.00      9.0±0.41µs        ? ?/sec
run_condition/yes_using_resource/016_systems    1.29     15.3±2.14µs        ? ?/sec     1.00     11.9±0.49µs        ? ?/sec
run_condition/yes_using_resource/021_systems    1.25     19.7±2.42µs        ? ?/sec     1.00     15.8±1.21µs        ? ?/sec
run_condition/yes_using_resource/026_systems    1.02     19.8±1.79µs        ? ?/sec     1.00     19.5±3.05µs        ? ?/sec
run_condition/yes_using_resource/031_systems    1.25     27.8±4.13µs        ? ?/sec     1.00     22.2±1.57µs        ? ?/sec
run_condition/yes_using_resource/036_systems    1.08     29.7±3.73µs        ? ?/sec     1.00     27.7±4.27µs        ? ?/sec
run_condition/yes_using_resource/041_systems    1.00     31.7±2.89µs        ? ?/sec     1.00     31.6±3.35µs        ? ?/sec
run_condition/yes_using_resource/046_systems    1.00     36.1±3.38µs        ? ?/sec     1.04     37.5±3.09µs        ? ?/sec
run_condition/yes_using_resource/051_systems    1.07     43.7±4.83µs        ? ?/sec     1.00     41.1±3.00µs        ? ?/sec
run_condition/yes_using_resource/056_systems    1.22     49.2±5.61µs        ? ?/sec     1.00     40.3±2.97µs        ? ?/sec
run_condition/yes_using_resource/061_systems    1.10     52.1±5.87µs        ? ?/sec     1.00     47.4±5.14µs        ? ?/sec
run_condition/yes_using_resource/066_systems    1.15     57.4±5.69µs        ? ?/sec     1.00     50.1±2.97µs        ? ?/sec
run_condition/yes_using_resource/071_systems    1.17     61.5±7.40µs        ? ?/sec     1.00     52.5±4.04µs        ? ?/sec
run_condition/yes_using_resource/076_systems    1.07     61.5±3.85µs        ? ?/sec     1.00     57.3±3.41µs        ? ?/sec
run_condition/yes_using_resource/081_systems    1.14     69.9±7.37µs        ? ?/sec     1.00     61.4±3.57µs        ? ?/sec
run_condition/yes_using_resource/086_systems    1.13     77.1±6.44µs        ? ?/sec     1.00     68.0±2.82µs        ? ?/sec
run_condition/yes_using_resource/091_systems    1.08     81.1±9.73µs        ? ?/sec     1.00     75.0±3.63µs        ? ?/sec
run_condition/yes_using_resource/096_systems    1.09     88.4±6.37µs        ? ?/sec     1.00     80.8±3.71µs        ? ?/sec
run_condition/yes_using_resource/101_systems    1.05     87.6±5.14µs        ? ?/sec     1.00     83.7±3.46µs        ? ?/sec
run_empty_schedule/MultiThreaded                1.00      7.3±0.05ns        ? ?/sec     1.04      7.6±0.06ns        ? ?/sec
run_empty_schedule/Simple                       1.00      8.2±0.12ns        ? ?/sec     1.03      8.4±0.01ns        ? ?/sec
run_empty_schedule/SingleThreaded               1.00      9.2±0.09ns        ? ?/sec     1.06      9.8±0.03ns        ? ?/sec
schedule/base                                   1.00     33.3±1.20µs        ? ?/sec     1.00     33.3±1.07µs        ? ?/sec
sized_commands_0_bytes/2000_commands            1.01      3.9±0.30µs        ? ?/sec     1.00      3.8±0.04µs        ? ?/sec
sized_commands_0_bytes/4000_commands            1.00      7.6±0.06µs        ? ?/sec     1.00      7.6±0.02µs        ? ?/sec
sized_commands_0_bytes/6000_commands            1.00     11.5±0.14µs        ? ?/sec     1.02     11.7±0.07µs        ? ?/sec
sized_commands_0_bytes/8000_commands            1.00     15.4±0.07µs        ? ?/sec     1.01     15.6±0.24µs        ? ?/sec
sized_commands_12_bytes/2000_commands           1.00      4.7±0.05µs        ? ?/sec     1.04      4.9±0.09µs        ? ?/sec
sized_commands_12_bytes/4000_commands           1.00      9.6±0.03µs        ? ?/sec     1.04      9.9±0.05µs        ? ?/sec
sized_commands_12_bytes/6000_commands           1.00     14.3±0.09µs        ? ?/sec     1.03     14.8±0.04µs        ? ?/sec
sized_commands_12_bytes/8000_commands           1.00     19.1±0.11µs        ? ?/sec     1.03     19.7±0.12µs        ? ?/sec
sized_commands_512_bytes/2000_commands          1.00     55.9±1.88µs        ? ?/sec     1.02     56.7±2.19µs        ? ?/sec
sized_commands_512_bytes/4000_commands          1.00   114.8±10.15µs        ? ?/sec     1.02   116.6±15.54µs        ? ?/sec
sized_commands_512_bytes/6000_commands          1.00   177.9±27.12µs        ? ?/sec     1.01   179.8±27.65µs        ? ?/sec
sized_commands_512_bytes/8000_commands          1.00   238.6±43.06µs        ? ?/sec     1.01   240.6±40.02µs        ? ?/sec
spawn_commands/2000_entities                    1.04    178.2±6.17µs        ? ?/sec     1.00    171.0±4.74µs        ? ?/sec
spawn_commands/4000_entities                    1.05   367.0±14.18µs        ? ?/sec     1.00   349.0±13.97µs        ? ?/sec
spawn_commands/6000_entities                    1.04   549.6±23.70µs        ? ?/sec     1.00   529.0±23.34µs        ? ?/sec
spawn_commands/8000_entities                    1.05   744.1±26.86µs        ? ?/sec     1.00   706.7±27.67µs        ? ?/sec
spawn_world/10000_entities                      1.00   901.1±68.32µs        ? ?/sec     1.00   901.7±82.90µs        ? ?/sec
spawn_world/1000_entities                       1.01     89.9±7.67µs        ? ?/sec     1.00     89.4±7.98µs        ? ?/sec
spawn_world/100_entities                        1.01      9.1±0.85µs        ? ?/sec     1.00      9.0±0.78µs        ? ?/sec
spawn_world/10_entities                         1.00   909.9±82.62ns        ? ?/sec     1.00   910.2±86.39ns        ? ?/sec
spawn_world/1_entities                          1.03     93.3±7.97ns        ? ?/sec     1.00     90.8±7.58ns        ? ?/sec
world_entity/50000_entities                     1.00    124.0±0.34µs        ? ?/sec     1.00    124.0±0.79µs        ? ?/sec
world_get/50000_entities_sparse                 1.00    194.1±2.38µs        ? ?/sec     1.00    193.2±2.90µs        ? ?/sec
world_get/50000_entities_table                  1.00    144.4±0.96µs        ? ?/sec     1.01    146.1±3.60µs        ? ?/sec
world_query_for_each/50000_entities_sparse      1.01     51.6±1.92µs        ? ?/sec     1.00     51.3±0.11µs        ? ?/sec
world_query_for_each/50000_entities_table       1.00     27.1±0.19µs        ? ?/sec     1.00     27.1±0.05µs        ? ?/sec
world_query_get/50000_entities_sparse           1.00     81.7±0.59µs        ? ?/sec     1.00     81.9±0.97µs        ? ?/sec
world_query_get/50000_entities_sparse_wide      1.05    273.3±4.97µs        ? ?/sec     1.00    259.1±1.06µs        ? ?/sec
world_query_get/50000_entities_table            1.00     81.9±1.43µs        ? ?/sec     1.00     81.7±0.89µs        ? ?/sec
world_query_get/50000_entities_table_wide       1.00    179.3±4.56µs        ? ?/sec     1.00    179.0±5.14µs        ? ?/sec
world_query_iter/50000_entities_sparse          1.10     68.2±0.56µs        ? ?/sec     1.00     62.1±0.62µs        ? ?/sec
world_query_iter/50000_entities_table           1.00     40.7±0.42µs        ? ?/sec     1.00     40.8±0.76µs        ? ?/sec

@MiniaczQ
Contributor

Looks good, but I can't provide any feedback about the generated instructions

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Mar 2, 2024
@alice-i-cecile
Member

alice-i-cecile commented Mar 2, 2024

I think this is a good set of trade-offs, and the added crate dep seems sensible.

Contributor

@jdm jdm left a comment

Makes sense, and I appreciate the thorough explanation of why this matters!

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Mar 3, 2024
Merged via the queue into bevyengine:main with commit 57733bb Mar 3, 2024
27 of 28 checks passed
spectria-limina pushed a commit to spectria-limina/bevy that referenced this pull request Mar 9, 2024
# Objective
Adoption of bevyengine#2104 and bevyengine#11843. The `Option<usize>` wastes 3-7 bytes of
memory per potential entry, and represents a scaling memory overhead as
the ID space grows.

The goal of this PR is to reduce memory usage without significantly
impacting common use cases.

Co-Authored By: @NathanSWard 
Co-Authored By: @tygyh 

## Solution
Replace `usize` in `SparseSet`'s sparse array with
`nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a
bitwise NOT to the value when accessing it. This allows the compiler to
niche the value and eliminate the extra padding used for the `Option`
inside the sparse array, while moving the niche value from 0 to
usize::MAX instead.
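
The niche trick described above can be sketched with the standard library alone. `NonMaxSketch` below is a hypothetical stand-in written for illustration, not the `nonmax` crate's actual source; it assumes only what the paragraph states (a `NonZeroUsize` holding the bitwise NOT of the value). The size checks reflect current rustc layout behaviour rather than a language guarantee.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `nonmax::NonMaxUsize`: stores `!value`, so the
/// forbidden all-zero bit pattern corresponds to `usize::MAX` instead of 0.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NonMaxSketch(NonZeroUsize);

impl NonMaxSketch {
    /// Returns `None` only for `usize::MAX`, whose complement is 0.
    fn new(value: usize) -> Option<Self> {
        NonZeroUsize::new(!value).map(Self)
    }

    /// Undoes the complement to recover the original value.
    fn get(self) -> usize {
        !self.0.get()
    }
}

fn main() {
    // 0 is representable, unlike with a bare `NonZeroUsize`.
    assert_eq!(NonMaxSketch::new(0).map(NonMaxSketch::get), Some(0));
    assert_eq!(NonMaxSketch::new(41).map(NonMaxSketch::get), Some(41));
    // `usize::MAX` is the one value given up to create the niche.
    assert!(NonMaxSketch::new(usize::MAX).is_none());

    // With a niche, `Option` reuses the all-zero pattern for `None`.
    assert_eq!(size_of::<Option<NonMaxSketch>>(), size_of::<usize>());
    // Without one, the discriminant plus padding doubles the size in practice.
    assert_eq!(size_of::<Option<usize>>(), 2 * size_of::<usize>());
}
```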

Checking the [diff in x86 generated
assembly](james7132/bevy_asm_tests@6e4da65),
this change actually results in fewer instructions generated. One
potential downside is that it seems to have moved a load before a
branch, which means we may be incurring a cache miss even if the element
is not there.

Note: unlike bevyengine#2104 and bevyengine#11843, this PR only targets the metadata stores
for the ECS and not the component storage itself. Due to bevyengine#9907 targeting
`Entity::generation` instead of `Entity::index`, `ComponentSparseSet`
storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as
there is now a potential for a panic. These costs are really only
incurred when constructing a new Table, Archetype, or Resource that has
never been seen before by the World. All of these operations are fairly
cold and not on any particular hot path, even for command application.
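
To illustrate where that panic can arise, here is a minimal sketch of a sparse set whose sparse array stores niche-optimized dense indices. `ToySparseSet` and `DenseIndex` are hypothetical names invented for this example and are not Bevy's actual types; the point is only that the fallible conversion sits on the path that adds a new key, not on lookups.

```rust
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `nonmax::NonMaxUsize` (stores the complement).
#[derive(Clone, Copy)]
struct DenseIndex(NonZeroUsize);

impl DenseIndex {
    fn new(value: usize) -> Option<Self> {
        NonZeroUsize::new(!value).map(Self)
    }
    fn get(self) -> usize {
        !self.0.get()
    }
}

/// Toy sparse set keyed by `usize`; an illustration, not Bevy's `SparseSet`.
struct ToySparseSet<V> {
    dense: Vec<V>,
    sparse: Vec<Option<DenseIndex>>,
}

impl<V> ToySparseSet<V> {
    fn new() -> Self {
        Self { dense: Vec::new(), sparse: Vec::new() }
    }

    fn insert(&mut self, key: usize, value: V) {
        // Existing key: overwrite in place, no conversion needed.
        if let Some(slot) = self.sparse.get(key).copied().flatten() {
            self.dense[slot.get()] = value;
            return;
        }
        // New key: a dense index of `usize::MAX` cannot be represented,
        // so this is the one place a panic becomes possible.
        let slot = DenseIndex::new(self.dense.len())
            .expect("sparse set cannot hold usize::MAX elements");
        if key >= self.sparse.len() {
            self.sparse.resize(key + 1, None);
        }
        self.sparse[key] = Some(slot);
        self.dense.push(value);
    }

    fn get(&self, key: usize) -> Option<&V> {
        let slot = self.sparse.get(key).copied().flatten()?;
        self.dense.get(slot.get())
    }
}

fn main() {
    let mut set = ToySparseSet::new();
    set.insert(5, "a");
    set.insert(2, "b");
    set.insert(5, "c"); // overwrites the value stored for key 5
    assert_eq!(set.get(5), Some(&"c"));
    assert_eq!(set.get(2), Some(&"b"));
    assert_eq!(set.get(3), None);
    assert_eq!(set.dense.len(), 2);
}
```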

---

## Changelog
Changed: `SparseSet` can now only store up to `usize::MAX - 1` elements
instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored
item.
@james7132 james7132 deleted the nonmax-sparse-set branch March 10, 2024 07:29
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it