
@james7132 (Member) commented Aug 19, 2025

Objective

This is a follow-up to #20331 on the path to resolving #1907, and supersedes #12990. Fixes #11849.

async_executor internally locks a Mutex every time a task is spawned and every time one finishes. This adds significant overhead to any operation that interacts with the task pools, even in single-threaded cases. For more information, see smol-rs/async-executor#112.
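To make the per-task cost concrete, here is a toy model of the pattern described above: an executor that registers every live task in a Mutex-guarded slab, so each task pays one lock acquisition when spawned and another when it finishes. TrackingExecutor and its methods are hypothetical names used only for illustration. This is a sketch under those assumptions, not async_executor's actual code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Toy model (hypothetical, not async_executor's code) of an executor that
/// records every live task in a mutex-guarded slab of optional slots.
#[derive(Default)]
pub struct TrackingExecutor {
    // Slot `i` holds `Some(task_id)` while that task is alive, `None` once freed.
    active: Mutex<Vec<Option<usize>>>,
    // How many times `spawn`/`finish` acquired the `active` lock.
    lock_count: AtomicUsize,
}

impl TrackingExecutor {
    /// Spawning registers the task in the slab: one lock acquisition.
    /// Returns the slot index, which acts as the task's handle.
    pub fn spawn(&self, task_id: usize) -> usize {
        self.lock_count.fetch_add(1, Ordering::Relaxed);
        let mut active = self.active.lock().unwrap();
        // Reuse a freed slot if there is one, otherwise grow the slab.
        match active.iter().position(Option::is_none) {
            Some(slot) => {
                active[slot] = Some(task_id);
                slot
            }
            None => {
                active.push(Some(task_id));
                active.len() - 1
            }
        }
    }

    /// Finishing unregisters the task: a second lock acquisition.
    pub fn finish(&self, slot: usize) {
        self.lock_count.fetch_add(1, Ordering::Relaxed);
        self.active.lock().unwrap()[slot] = None;
    }

    /// Number of lock acquisitions made by `spawn` and `finish`.
    pub fn locks_taken(&self) -> usize {
        self.lock_count.load(Ordering::Relaxed)
    }

    /// Number of tasks currently registered (this read is not counted).
    pub fn live_tasks(&self) -> usize {
        self.active.lock().unwrap().iter().filter(|s| s.is_some()).count()
    }
}
```

Spawning and finishing N tasks in this model takes 2N lock acquisitions on the same Mutex, regardless of how many threads are involved.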

A diff against #20331 can be seen here.

Solution

Change Executor to operate entirely on &'static Self. This removes both the need for Arc clones and the active Slab used to track tasks, and with them the Mutex lock and the two Arc counter increments/decrements incurred each time a task is spawned and finishes.
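A minimal sketch of why &'static Self helps, assuming OS threads as a stand-in for async tasks: sharing state through an Arc forces one atomic increment per spawn and one decrement per completion, whereas a &'static reference is Copy and already satisfies the 'static bound on spawned closures. State, run_with_arc, and run_with_static are hypothetical names, not part of Bevy or async_executor.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for executor state that every task needs to reach.
pub struct State {
    pub completed: AtomicUsize,
}

/// Arc-based sharing: each task owns its own clone of the `Arc`, costing an
/// atomic increment at spawn and an atomic decrement when the task finishes.
pub fn run_with_arc(state: &Arc<State>, tasks: usize) {
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let state = Arc::clone(state); // strong count += 1
            thread::spawn(move || {
                state.completed.fetch_add(1, Ordering::Relaxed);
                // the captured clone is dropped when this closure ends: strong count -= 1
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

/// `&'static`-based sharing: the reference is `Copy` and already satisfies the
/// `'static` bound on spawned closures, so no reference counting is needed.
pub fn run_with_static(state: &'static State, tasks: usize) {
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            thread::spawn(move || {
                state.completed.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

A &'static State can come from a static item or from Box::leak(Box::new(...)); either way the reference is simply copied into each task.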

TaskPool now wraps a &'static Executor, taken either from a static variable or from a leaked Box. It is now strongly advised not to create your own TaskPool, and instead to use and configure the pre-existing instances (e.g. ComputeTaskPool).
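One possible shape for a TaskPool that wraps a &'static Executor, sketched with hypothetical stand-in types: Executor here only counts spawns, and COMPUTE_EXECUTOR and compute_pool are illustrative names, not Bevy's API. It shows the two sources mentioned above, a static item and a leaked Box.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical executor placeholder that only counts spawns; a real
/// executor would enqueue and run tasks here.
pub struct Executor {
    spawned: AtomicUsize,
}

impl Executor {
    pub const fn new() -> Self {
        Self { spawned: AtomicUsize::new(0) }
    }

    pub fn spawn(&'static self) {
        self.spawned.fetch_add(1, Ordering::Relaxed);
    }

    pub fn spawned(&self) -> usize {
        self.spawned.load(Ordering::Relaxed)
    }
}

/// Hypothetical pool that holds nothing but a `&'static Executor`, so copying
/// it never touches a reference count.
#[derive(Clone, Copy)]
pub struct TaskPool {
    executor: &'static Executor,
}

impl TaskPool {
    /// Wraps an executor that lives in a `static` (the shared-pool path).
    pub const fn from_static(executor: &'static Executor) -> Self {
        Self { executor }
    }

    /// Builds a standalone pool by leaking a boxed executor; the allocation
    /// is never freed.
    pub fn new_leaked() -> Self {
        Self { executor: Box::leak(Box::new(Executor::new())) }
    }

    pub fn spawn(&self) {
        self.executor.spawn();
    }

    pub fn executor(&self) -> &'static Executor {
        self.executor
    }
}

/// Hypothetical global executor, loosely in the spirit of ComputeTaskPool.
static COMPUTE_EXECUTOR: Executor = Executor::new();

pub fn compute_pool() -> TaskPool {
    TaskPool::from_static(&COMPUTE_EXECUTOR)
}
```

Every call to compute_pool() in this sketch hands out a handle to the same static executor, while each new_leaked() call permanently allocates a separate one.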

NOTE: The static TaskPools are currently never dropped, because they live in static variables. As a result, tasks scheduled onto them are never dropped when the program terminates, which makes graceful shutdown harder than simply dropping a TaskPool. This is already the case without this PR, but this PR bakes that assumption in. Graceful shutdown will therefore need dedicated alternatives: the user must drop or await all remaining live tasks, and the queues must be flushed, before the TaskPool threads are shut down. This can be quite involved, so an equivalent to #[tokio::main] may be required to fully enforce it, if desired.
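The shutdown sequence described in the note could look roughly like the following, assuming a simple std-only job queue in place of the real executor: new work is rejected once shutdown begins, workers flush whatever is still queued, and only then are the threads joined. Queue, QUEUE, start_workers, and shutdown are hypothetical names. Because QUEUE is a static, its destructor never runs, so the explicit shutdown call is what guarantees queued jobs are run rather than abandoned.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct Inner {
    jobs: VecDeque<Job>,
    shutting_down: bool,
}

/// Hypothetical job queue stored in a `static`. Rust never runs destructors
/// for statics, so anything still queued at process exit is never dropped.
pub struct Queue {
    inner: Mutex<Inner>,
    ready: Condvar,
}

pub static QUEUE: Queue = Queue {
    inner: Mutex::new(Inner { jobs: VecDeque::new(), shutting_down: false }),
    ready: Condvar::new(),
};

impl Queue {
    /// Enqueues a job. Returns false once shutdown has begun.
    pub fn push(&self, job: Job) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if inner.shutting_down {
            return false;
        }
        inner.jobs.push_back(job);
        drop(inner); // release the lock before waking a worker
        self.ready.notify_one();
        true
    }

    /// Spawns `n` worker threads that borrow the `'static` queue directly.
    pub fn start_workers(&'static self, n: usize) -> Vec<JoinHandle<()>> {
        (0..n).map(move |_| thread::spawn(move || self.work())).collect()
    }

    /// Runs jobs until shutdown has been requested AND the queue is empty.
    fn work(&self) {
        loop {
            let job = {
                let mut inner = self.inner.lock().unwrap();
                loop {
                    if let Some(job) = inner.jobs.pop_front() {
                        break Some(job);
                    }
                    if inner.shutting_down {
                        break None;
                    }
                    inner = self.ready.wait(inner).unwrap();
                }
            }; // lock released here, so jobs run without holding it
            match job {
                Some(job) => job(),
                None => return,
            }
        }
    }

    /// Graceful shutdown: stop accepting work, wake every worker so it can
    /// flush the remaining jobs, then wait for all threads to exit.
    pub fn shutdown(&self, workers: Vec<JoinHandle<()>>) {
        self.inner.lock().unwrap().shutting_down = true;
        self.ready.notify_all();
        for worker in workers {
            worker.join().unwrap();
        }
    }
}
```

The shutdown flag lives inside the same Mutex as the queue so that a worker cannot miss the final wake-up between checking the flag and going to sleep.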

This PR will remain in draft until #20331 or some equivalent is merged.

Testing

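Local testing and benchmarking. Benchmarking results (columns: bevy_executor, main, and static-executor; each cell gives the time relative to the fastest of the three, then the measured time, then throughput, which was not measured and therefore shows as ? ?/sec):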

group                                                                                                     bevy_executor                            main                                     static-executor
-----                                                                                                     -------------                            ----                                     ---------------
added_archetypes/archetype_count/100                                                                      1.00     36.8±1.13µs        ? ?/sec      1.00     36.8±2.14µs        ? ?/sec      1.07     39.3±2.37µs        ? ?/sec
added_archetypes/archetype_count/1000                                                                     1.04   399.8±15.31µs        ? ?/sec      1.00   384.9±19.62µs        ? ?/sec      1.04   398.9±16.30µs        ? ?/sec
added_archetypes/archetype_count/10000                                                                    1.09     11.0±0.95ms        ? ?/sec      1.00     10.1±1.24ms        ? ?/sec      1.21     12.2±1.85ms        ? ?/sec
busy_systems/01x_entities_03_systems                                                                      1.03     18.6±0.56µs        ? ?/sec      2.22     40.3±1.18µs        ? ?/sec      1.00     18.1±0.70µs        ? ?/sec
busy_systems/01x_entities_09_systems                                                                      1.03     49.8±1.41µs        ? ?/sec      1.62     79.0±1.81µs        ? ?/sec      1.00     48.6±1.53µs        ? ?/sec
busy_systems/01x_entities_15_systems                                                                      1.07     84.0±1.46µs        ? ?/sec      1.48    116.4±2.48µs        ? ?/sec      1.00     78.5±3.04µs        ? ?/sec
busy_systems/03x_entities_03_systems                                                                      1.05     30.7±1.12µs        ? ?/sec      1.86     54.6±1.68µs        ? ?/sec      1.00     29.3±0.87µs        ? ?/sec
busy_systems/03x_entities_09_systems                                                                      1.03     83.3±2.12µs        ? ?/sec      1.55    125.3±3.44µs        ? ?/sec      1.00     81.0±2.23µs        ? ?/sec
busy_systems/03x_entities_15_systems                                                                      1.03    140.2±5.51µs        ? ?/sec      1.40    189.2±6.06µs        ? ?/sec      1.00    135.6±3.16µs        ? ?/sec
busy_systems/05x_entities_03_systems                                                                      1.14     50.8±1.00µs        ? ?/sec      1.74     77.7±6.65µs        ? ?/sec      1.00     44.6±1.26µs        ? ?/sec
busy_systems/05x_entities_09_systems                                                                      1.12    132.3±2.39µs        ? ?/sec      1.59    187.4±6.30µs        ? ?/sec      1.00    117.8±2.33µs        ? ?/sec
busy_systems/05x_entities_15_systems                                                                      1.13    217.3±5.15µs        ? ?/sec      1.45    279.7±5.10µs        ? ?/sec      1.00    193.0±5.40µs        ? ?/sec
contrived/01x_entities_03_systems                                                                         1.05      9.5±0.31µs        ? ?/sec      1.75     15.8±1.12µs        ? ?/sec      1.00      9.0±0.61µs        ? ?/sec
contrived/01x_entities_09_systems                                                                         1.08     23.1±0.65µs        ? ?/sec      1.68     36.1±5.34µs        ? ?/sec      1.00     21.5±0.91µs        ? ?/sec
contrived/01x_entities_15_systems                                                                         1.07     36.5±1.19µs        ? ?/sec      1.61     55.2±2.02µs        ? ?/sec      1.00     34.2±1.52µs        ? ?/sec
contrived/03x_entities_03_systems                                                                         1.03     21.1±1.03µs        ? ?/sec      1.76     35.9±2.03µs        ? ?/sec      1.00     20.4±0.85µs        ? ?/sec
contrived/03x_entities_09_systems                                                                         1.13     47.8±3.19µs        ? ?/sec      1.77     75.1±1.82µs        ? ?/sec      1.00     42.3±1.87µs        ? ?/sec
contrived/03x_entities_15_systems                                                                         1.13    77.7±13.55µs        ? ?/sec      1.74    119.7±9.98µs        ? ?/sec      1.00     68.8±2.30µs        ? ?/sec
contrived/05x_entities_03_systems                                                                         1.00     27.0±0.96µs        ? ?/sec      1.54     41.7±2.99µs        ? ?/sec      1.02     27.5±1.35µs        ? ?/sec
contrived/05x_entities_09_systems                                                                         1.11     67.1±4.48µs        ? ?/sec      1.74    105.7±5.74µs        ? ?/sec      1.00     60.7±2.25µs        ? ?/sec
contrived/05x_entities_15_systems                                                                         1.10   107.1±10.20µs        ? ?/sec      1.67   163.2±10.30µs        ? ?/sec      1.00     97.7±4.58µs        ? ?/sec
empty_archetypes/for_each/10                                                                              1.50  1792.3±56.15ns        ? ?/sec      1.35  1607.9±121.49ns        ? ?/sec     1.00  1193.3±22.70ns        ? ?/sec
empty_archetypes/for_each/100                                                                             1.48  1752.9±71.42ns        ? ?/sec      1.42  1683.9±117.60ns        ? ?/sec     1.00  1184.4±27.06ns        ? ?/sec
empty_archetypes/for_each/1000                                                                            1.26  1801.1±160.99ns        ? ?/sec     1.72      2.5±1.27µs        ? ?/sec      1.00  1426.1±25.37ns        ? ?/sec
empty_archetypes/for_each/10000                                                                           1.96     13.3±0.55µs        ? ?/sec      2.74     18.6±1.52µs        ? ?/sec      1.00      6.8±0.07µs        ? ?/sec
empty_archetypes/iter/10                                                                                  1.37  1655.2±47.72ns        ? ?/sec      1.35  1624.9±138.81ns        ? ?/sec     1.00  1206.7±61.47ns        ? ?/sec
empty_archetypes/iter/100                                                                                 1.43  1694.9±46.59ns        ? ?/sec      1.47  1746.4±202.97ns        ? ?/sec     1.00  1186.2±24.38ns        ? ?/sec
empty_archetypes/iter/1000                                                                                1.27  1752.9±97.17ns        ? ?/sec      1.94      2.7±0.48µs        ? ?/sec      1.00  1376.8±19.78ns        ? ?/sec
empty_archetypes/iter/10000                                                                               2.46      8.9±1.19µs        ? ?/sec      5.69     20.6±2.13µs        ? ?/sec      1.00      3.6±0.11µs        ? ?/sec
empty_archetypes/par_for_each/10                                                                          2.44      5.1±0.87µs        ? ?/sec      9.21     19.3±1.98µs        ? ?/sec      1.00      2.1±0.03µs        ? ?/sec
empty_archetypes/par_for_each/100                                                                         2.71      5.9±0.89µs        ? ?/sec      5.91     12.9±1.54µs        ? ?/sec      1.00      2.2±0.06µs        ? ?/sec
empty_archetypes/par_for_each/1000                                                                        1.92      5.3±0.39µs        ? ?/sec      7.06     19.5±1.76µs        ? ?/sec      1.00      2.8±0.06µs        ? ?/sec
empty_archetypes/par_for_each/10000                                                                       1.53     21.4±1.09µs        ? ?/sec      2.54     35.5±1.46µs        ? ?/sec      1.00     14.0±0.50µs        ? ?/sec
empty_commands/0_entities                                                                                 1.00      6.1±0.17ns        ? ?/sec      1.05      6.4±0.73ns        ? ?/sec      1.00      6.1±0.04ns        ? ?/sec
empty_systems/0_systems                                                                                   1.03     10.4±0.47ns        ? ?/sec      1.00     10.1±0.49ns        ? ?/sec      1.00     10.1±0.09ns        ? ?/sec
empty_systems/1000_systems                                                                                2.83   603.4±38.38µs        ? ?/sec      1.93   411.6±12.47µs        ? ?/sec      1.00   212.8±18.80µs        ? ?/sec
empty_systems/100_systems                                                                                 2.11     44.4±2.15µs        ? ?/sec      1.76     37.0±0.94µs        ? ?/sec      1.00     21.0±2.21µs        ? ?/sec
empty_systems/10_systems                                                                                  1.11      4.3±0.76µs        ? ?/sec      2.59     10.1±0.56µs        ? ?/sec      1.00      3.9±0.82µs        ? ?/sec
empty_systems/2_systems                                                                                   1.29  1964.4±158.36ns        ? ?/sec     3.79      5.8±0.96µs        ? ?/sec      1.00  1523.1±111.01ns        ? ?/sec
empty_systems/4_systems                                                                                   1.31      2.6±0.13µs        ? ?/sec      4.52      8.9±0.92µs        ? ?/sec      1.00  1975.4±119.20ns        ? ?/sec
for_each_par_iter/threads/1                                                                               1.00     11.1±0.17ms        ? ?/sec      1.89     20.9±1.50ms        ? ?/sec      1.00     11.1±0.15ms        ? ?/sec
for_each_par_iter/threads/16                                                                              1.51      3.1±0.10ms        ? ?/sec      1.50      3.1±0.10ms        ? ?/sec      1.00      2.1±0.11ms        ? ?/sec
for_each_par_iter/threads/2                                                                               1.01      7.6±0.14ms        ? ?/sec      1.51     11.4±0.29ms        ? ?/sec      1.00      7.6±0.10ms        ? ?/sec
for_each_par_iter/threads/32                                                                              1.09      2.2±0.04ms        ? ?/sec      1.08      2.2±0.06ms        ? ?/sec      1.00      2.0±0.06ms        ? ?/sec
for_each_par_iter/threads/4                                                                               1.02      4.8±0.16ms        ? ?/sec      1.65      7.6±0.16ms        ? ?/sec      1.00      4.6±0.11ms        ? ?/sec
for_each_par_iter/threads/8                                                                               1.56      4.6±0.74ms        ? ?/sec      1.61      4.7±0.10ms        ? ?/sec      1.00      2.9±0.11ms        ? ?/sec
heavy_compute/base                                                                                        1.13    246.2±8.26µs        ? ?/sec      1.07    233.0±4.16µs        ? ?/sec      1.00    217.9±3.17µs        ? ?/sec
many_maps_iter                                                                                            1.38     30.7±6.03ms        ? ?/sec      1.04     23.0±1.02ms        ? ?/sec      1.00     22.2±0.22ms        ? ?/sec
many_maps_par_iter/threads/1                                                                              1.02     11.6±0.37ms        ? ?/sec      1.86     21.2±1.42ms        ? ?/sec      1.00     11.4±0.22ms        ? ?/sec
many_maps_par_iter/threads/16                                                                             1.00      2.1±0.08ms        ? ?/sec      1.50      3.1±0.11ms        ? ?/sec      1.00      2.1±0.10ms        ? ?/sec
many_maps_par_iter/threads/2                                                                              1.00      7.7±0.11ms        ? ?/sec      1.48     11.4±0.30ms        ? ?/sec      1.00      7.7±0.12ms        ? ?/sec
many_maps_par_iter/threads/32                                                                             1.00      2.0±0.03ms        ? ?/sec      1.09      2.2±0.02ms        ? ?/sec      1.01      2.0±0.06ms        ? ?/sec
many_maps_par_iter/threads/4                                                                              1.00      4.7±0.08ms        ? ?/sec      1.66      7.8±0.12ms        ? ?/sec      1.02      4.8±0.13ms        ? ?/sec
many_maps_par_iter/threads/8                                                                              1.04      3.1±0.13ms        ? ?/sec      1.64      4.9±0.14ms        ? ?/sec      1.00      3.0±0.12ms        ? ?/sec
no_archetypes/system_count/0                                                                              1.00     12.9±0.12ns        ? ?/sec      1.04     13.4±0.53ns        ? ?/sec      1.05     13.6±1.17ns        ? ?/sec
no_archetypes/system_count/10                                                                             1.00    105.5±0.43ns        ? ?/sec      1.06    111.7±6.65ns        ? ?/sec      1.05    110.8±5.26ns        ? ?/sec
no_archetypes/system_count/100                                                                            1.00   921.4±20.19ns        ? ?/sec      1.03   945.7±56.13ns        ? ?/sec      1.04   954.7±60.21ns        ? ?/sec
overhead_iter                                                                                             1.01      0.2±0.01ns        ? ?/sec      1.02      0.2±0.01ns        ? ?/sec      1.00      0.2±0.00ns        ? ?/sec
overhead_par_iter/threads/1                                                                               1.47     22.9±1.95µs        ? ?/sec      1.34     21.0±0.87µs        ? ?/sec      1.00     15.7±1.73µs        ? ?/sec
overhead_par_iter/threads/16                                                                              3.11     41.4±2.99µs        ? ?/sec      2.22     29.6±0.77µs        ? ?/sec      1.00     13.3±1.03µs        ? ?/sec
overhead_par_iter/threads/2                                                                               1.89     29.2±1.36µs        ? ?/sec      1.72     26.6±1.12µs        ? ?/sec      1.00     15.5±2.65µs        ? ?/sec
overhead_par_iter/threads/32                                                                              3.26     43.8±2.40µs        ? ?/sec      2.37     31.8±0.93µs        ? ?/sec      1.00     13.4±0.94µs        ? ?/sec
overhead_par_iter/threads/4                                                                               2.64     36.2±2.77µs        ? ?/sec      2.09     28.7±1.09µs        ? ?/sec      1.00     13.7±1.13µs        ? ?/sec
overhead_par_iter/threads/8                                                                               3.09     41.5±2.50µs        ? ?/sec      2.24     30.1±0.86µs        ? ?/sec      1.00     13.4±0.79µs        ? ?/sec
par_iter_simple/hybrid                                                                                    1.40     81.6±4.33µs        ? ?/sec      1.41     82.3±6.10µs        ? ?/sec      1.00     58.2±7.17µs        ? ?/sec
par_iter_simple/with_0_fragment                                                                           1.64     51.7±4.35µs        ? ?/sec      1.44     45.6±3.59µs        ? ?/sec      1.00     31.6±3.24µs        ? ?/sec
par_iter_simple/with_1000_fragment                                                                        1.50     67.8±3.77µs        ? ?/sec      1.44     65.1±5.16µs        ? ?/sec      1.00     45.1±6.33µs        ? ?/sec
par_iter_simple/with_100_fragment                                                                         1.57     53.2±3.21µs        ? ?/sec      1.44     48.8±3.43µs        ? ?/sec      1.00     33.9±3.82µs        ? ?/sec
par_iter_simple/with_10_fragment                                                                          1.56     51.4±4.00µs        ? ?/sec      1.44     47.5±5.10µs        ? ?/sec      1.00     33.0±4.56µs        ? ?/sec
param/combinator_system/8_dyn_params_system                                                               1.15  1429.6±60.37ns        ? ?/sec      1.33  1651.4±217.47ns        ? ?/sec     1.00  1242.0±47.33ns        ? ?/sec
param/combinator_system/8_piped_systems                                                                   1.17  1382.3±62.79ns        ? ?/sec      1.39  1633.6±52.53ns        ? ?/sec      1.00  1178.0±39.70ns        ? ?/sec
param/combinator_system/8_variant_param_set_system                                                        1.15  1375.6±49.81ns        ? ?/sec      1.51  1801.9±1117.63ns        ? ?/sec    1.00  1192.8±24.91ns        ? ?/sec
run_condition/no/1000_systems                                                                             1.14     38.8±3.10µs        ? ?/sec      1.00     34.2±0.42µs        ? ?/sec      1.49    50.9±10.63µs        ? ?/sec
run_condition/no/100_systems                                                                              1.09      2.2±0.13µs        ? ?/sec      1.00      2.0±0.07µs        ? ?/sec      1.33      2.7±0.77µs        ? ?/sec
run_condition/no/10_systems                                                                               1.15   301.7±16.65ns        ? ?/sec      1.00   263.0±10.48ns        ? ?/sec      1.51  397.4±110.20ns        ? ?/sec
run_condition/yes/1000_systems                                                                            1.55   427.3±58.83µs        ? ?/sec      1.97   542.9±39.23µs        ? ?/sec      1.00   275.7±28.12µs        ? ?/sec
run_condition/yes/100_systems                                                                             1.68     37.9±1.33µs        ? ?/sec      2.05     46.2±3.51µs        ? ?/sec      1.00     22.6±2.54µs        ? ?/sec
run_condition/yes/10_systems                                                                              1.29      5.2±0.40µs        ? ?/sec      1.82      7.3±0.56µs        ? ?/sec      1.00      4.0±0.41µs        ? ?/sec
run_condition/yes_using_query/1000_systems                                                                1.28   413.4±22.99µs        ? ?/sec      1.77   570.8±77.20µs        ? ?/sec      1.00   321.9±22.87µs        ? ?/sec
run_condition/yes_using_query/100_systems                                                                 1.30     40.6±2.63µs        ? ?/sec      1.38     43.0±3.78µs        ? ?/sec      1.00     31.2±3.55µs        ? ?/sec
run_condition/yes_using_query/10_systems                                                                  1.06      6.0±1.24µs        ? ?/sec      1.33      7.4±0.52µs        ? ?/sec      1.00      5.6±0.54µs        ? ?/sec
run_condition/yes_using_resource/1000_systems                                                             1.48   423.0±29.47µs        ? ?/sec      1.38   394.9±29.42µs        ? ?/sec      1.00   285.6±19.17µs        ? ?/sec
run_condition/yes_using_resource/100_systems                                                              1.41     38.8±2.51µs        ? ?/sec      1.35     37.3±1.54µs        ? ?/sec      1.00     27.5±1.44µs        ? ?/sec
run_condition/yes_using_resource/10_systems                                                               1.00      5.4±0.51µs        ? ?/sec      1.95     10.5±0.75µs        ? ?/sec      1.01      5.4±0.35µs        ? ?/sec
run_empty_schedule/MultiThreaded                                                                          1.00      9.7±0.14ns        ? ?/sec      1.43     13.9±0.25ns        ? ?/sec      1.03     10.0±0.14ns        ? ?/sec
run_empty_schedule/Simple                                                                                 1.00     10.7±0.10ns        ? ?/sec      1.37     14.7±0.50ns        ? ?/sec      1.01     10.8±0.07ns        ? ?/sec
run_empty_schedule/SingleThreaded                                                                         1.00     12.5±0.22ns        ? ?/sec      1.34     16.8±1.34ns        ? ?/sec      1.00     12.5±0.10ns        ? ?/sec
schedule/base                                                                                             1.00     18.1±1.10µs        ? ?/sec      2.18     39.6±1.89µs        ? ?/sec      1.03     18.7±1.72µs        ? ?/sec

#12990 showed significant reductions in the overhead of the ECS multi-threaded executor. On many_foxes:

[image: many_foxes results, not reproduced here]

@james7132 james7132 added the S-Blocked label on Aug 22, 2025
@james7132 (Member, Author) commented:

Ran another set of benchmarks against main, #20331, and this PR. See the PR description for the results.

@james7132 james7132 removed the S-Needs-Benchmarking label on Aug 24, 2025
@NthTensor NthTensor self-requested a review on Aug 31, 2025, 19:34

@NthTensor (Contributor) commented:

I mixed up which PR I was applying comments to. Disregard this approval for now.


Labels

A-Tasks: Tools for parallel and async work
C-Performance: A change motivated by improving speed, memory usage or compile times
M-Migration-Guide: A breaking change to Bevy's public API that needs to be noted in a migration guide
S-Blocked: This cannot move forward until something else changes


Successfully merging this pull request may close these issues:

Switch to using static async executors
Using ComputeTaskPool for a par_for_each query only uses half of available logical cores
