Open
Labels: CI (CI related change), build (Build process related matters, e.g. build system), tech debt (Technical Debt Is Evil)
Description
🐛 Bug
The CI build takes ~2 hours, which significantly hurts dev velocity.
Judging from https://github.com/pytorch/xla/actions/runs/14986142268/job/42100348515, the Build PyTorch/XLA step seems to be the bottleneck: it takes 1h15m and blocks a whole bunch of downstream test jobs. If we can speed this up, we may shave a large chunk off the total CI time.
Potential low-hanging fruit:
- The log suggests that there are only 32 parallel bazel actions for this job, far below our recommended dev set-up (112 actions). I suspect the worker machines have only 32 vCPUs. Can we upgrade to 128+ vCPUs? Build machines are highly leveraged, so investment there will pay for itself quickly in terms of dev velocity.
- Set up a bazel remote build farm so that the build is parallelized across machines.
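
To get a rough feel for how much the first option could buy us, here is a back-of-the-envelope sketch. It assumes that some fraction of the current 75-minute build step is perfectly parallelizable and scales linearly with the number of bazel actions, while the rest is serial. The function name `estimate_build_minutes` and the `parallel_fraction` values are hypothetical; the real fraction would have to be measured, for example from a bazel build profile.

```python
def estimate_build_minutes(current_minutes, current_jobs, new_jobs, parallel_fraction):
    """Estimate wall time after changing the number of parallel build actions.

    parallel_fraction is the share of the *current* wall time spent in work
    that scales with the job count (an assumption, not a measured value).
    The remainder is treated as serial and does not shrink.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if current_jobs <= 0 or new_jobs <= 0:
        raise ValueError("job counts must be positive")

    # Serial part is unaffected by adding more workers.
    serial = current_minutes * (1.0 - parallel_fraction)
    # Parallel part shrinks in proportion to the increase in job count.
    parallel = current_minutes * parallel_fraction * current_jobs / new_jobs
    return serial + parallel


if __name__ == "__main__":
    # 75 minutes at 32 actions is taken from the CI run linked above;
    # the parallel fractions below are purely illustrative guesses.
    for fraction in (0.5, 0.8, 0.95):
        minutes = estimate_build_minutes(75, 32, 128, fraction)
        print(f"parallel_fraction={fraction}: ~{minutes:.1f} min at 128 actions")
```

This ignores memory pressure, I/O limits, and the critical path through the dependency graph, so it is an optimistic upper bound on the gain rather than a prediction.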