feat: generate kinds in parallel across multiple processes #765
Conversation
This still needs more testing in Gecko; I'm opening it as a draft for now to sanity check CI on the taskgraph side.
Force-pushed from 55e92c3 to 02bd9de
For posterity, I did try using a ThreadPoolExecutor as an alternative solution on Windows (which ought to have brought similar perf gains under a free-threaded Python). It ran into concurrency issues that the multiprocess version didn't, so I didn't pursue it further. It may be worth revisiting once we're actually running on free-threaded Python.
Looks like this "works" on Windows this time: https://treeherder.mozilla.org/jobs?repo=try&revision=7eb3590d144a7720e1b4f4714c6429f9eb258fdf (i.e., it falls back to the old serial generation).
Force-pushed from 02bd9de to d9d50ca
Force-pushed from d9d50ca to 7e13a79
r+wc
This is a cleaned-up and slightly improved version of @ahal's original patch. Most notably, it uses `wait` to resubmit new kinds as soon as they become available (instead of waiting for all kinds in each round to complete). This means that if a slow kind gets submitted before all other (non-downstream) kinds, it won't block them. In the case of Gecko, the effect is that the `test` kind begins processing very quickly, and all other kinds finish processing before it has completed.

Locally, this took `./mach taskgraph tasks` from 1m26s to 1m9s (measured from command start to the final "Generated xxx tasks" message). On try the results were more mixed. The minimum time I observed without this patch was 140s, while the maximum was 291s (which seems to have been caused by bugbug slowness... which I'm willing to throw out). Outside of that outlier, the maximum was 146s and the mean was 143s. With this patch, the minimum time I observed was 130s, the maximum was 144s, and the mean was 138s.

I presume the difference between local and Try results is that locally I'm on a 64-core machine with an SSD, while decision tasks run on lower-powered machines on Try, so there ends up being some resource contention when we process kinds in parallel there (I/O, I suspect, because the ProcessPoolExecutor will only run one process per CPU core). Despite this disappointing result on Try, this may still be worth taking, as `./mach taskgraph` runs twice in the critical path of many try pushes (once on a developer's machine, and again in the decision task).
Raw data, over 5 runs on try:

Without this patch: 291s, 146s, 146s, 140s, 140s. In each of those runs, there were 241s, 92s, 94s, 90s, 90s between "Loading tasks for kind test" and "Generated xxxxxx tasks for kind test", which means we spent the following amounts of time doing non-test-kind work in the critical path: 50s, 54s, 52s, 50s, 50s.

With this patch: 130s, 141s, 144s, 140s, 135s. In each of those runs, there were 105s, 114s, 115s, 114s, 109s between "Loading tasks for kind test" and "Generated xxxxxx tasks for kind test", which means we spent the following amounts of time doing non-test-kind work, but almost entirely out of the critical path: 25s, 27s, 29s, 26s, 26s.
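The `wait`-based scheduling described above can be sketched roughly as follows. This is a minimal illustration, not taskgraph's actual code: `kinds`, `deps`, `load`, and `generate_kinds` are hypothetical names, and a ThreadPoolExecutor stands in for the patch's ProcessPoolExecutor so the sketch stays self-contained.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def generate_kinds(kinds, deps, load, executor_cls=ThreadPoolExecutor):
    """Run load(kind) for every kind, submitting each kind as soon as all
    of its dependencies have finished, instead of in whole "rounds"."""
    done, pending, futures, order = set(), set(kinds), {}, []

    with executor_cls() as pool:
        def submit_ready():
            # Submit every pending kind whose dependencies are all done.
            ready = [k for k in pending if all(d in done for d in deps.get(k, ()))]
            for kind in ready:
                pending.discard(kind)
                futures[pool.submit(load, kind)] = kind

        submit_ready()
        while futures:
            # Wake as soon as *any* kind finishes, so one slow kind
            # (like Gecko's `test`) never blocks unrelated kinds.
            finished, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in finished:
                kind = futures.pop(fut)
                fut.result()  # propagate any worker exception
                done.add(kind)
                order.append(kind)
            submit_ready()
    return order
```

The key difference from a round-based scheme is that `submit_ready` runs after every individual completion, not after each full batch drains.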
Force-pushed from 7e13a79 to 9e42c6a
…ocess kind generation This was supposed to be done in taskcluster#765, but clearly I didn't push it before merging.
This is a slightly updated version of #738 + #744, which were backed out a few weeks ago because they did not work correctly on Windows. In a conversation today it was suggested that we simply not support multiprocess generation on Windows, so we can at least take the win elsewhere. This patch does that, and surely no other issues will arise...
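The Windows fallback could look roughly like this (a hedged sketch with hypothetical names, not the actual patch): check the platform and generate serially on Windows, in parallel everywhere else.

```python
import sys
from concurrent.futures import ProcessPoolExecutor

def generate_all(kinds, load):
    """Generate every kind, in parallel where supported."""
    if sys.platform == "win32":
        # Multiprocess generation proved unreliable on Windows, so fall
        # back to the old serial path there.
        return [load(kind) for kind in kinds]
    # Elsewhere, fan out one worker process per CPU core (the default).
    with ProcessPoolExecutor() as pool:
        return list(pool.map(load, kinds))
```

Note that `pool.map` preserves input order, so callers see the same result shape on both paths.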