make CI build fast #9177

@zhanyong-wan

🐛 Bug

The CI build takes ~2 hours, which significantly hurts dev velocity.

Judging from https://github.com/pytorch/xla/actions/runs/14986142268/job/42100348515, the Build PyTorch/XLA step seems to be the bottleneck: it takes 1h15m and blocks a whole bunch of downstream test jobs. If we can speed this up, we may shave a large chunk off the build time.

Potential low-hanging fruit:

  • The log suggests that there are only 32 parallel bazel actions for this job, far below our recommended dev set-up (112 actions). I suspect the worker machines have only 32 vCPUs. Can we upgrade to 128+ vCPUs? Build machines are highly leveraged, so investment there will pay for itself quickly in terms of dev velocity.
  • Set up a bazel remote build farm so that the build is parallelized across machines.
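
To get a rough feel for what more parallelism could buy, here is a back-of-the-envelope sketch using Amdahl's law. The 75 minutes and 32 jobs come from the job linked above; the parallel fraction of the build is not measured anywhere in this issue, so the values tried below are pure assumptions, and the function name `estimated_minutes` is made up for illustration.

```python
def estimated_minutes(measured_minutes, measured_jobs, target_jobs, parallel_fraction):
    """Estimate build time at target_jobs using Amdahl's law.

    Model: T(n) = T1 * ((1 - p) + p / n), where p is the fraction of the
    build that parallelizes perfectly. p is an assumption, not a measurement.
    """
    p = parallel_fraction
    # Back out the hypothetical single-job time T1 from the measured run.
    t1 = measured_minutes / ((1 - p) + p / measured_jobs)
    # Project that single-job time onto the target job count.
    return t1 * ((1 - p) + p / target_jobs)


if __name__ == "__main__":
    # 75 min at 32 jobs is from the linked CI run; the fractions are guesses.
    for p in (0.90, 0.95, 0.99):
        minutes = estimated_minutes(75, 32, 128, p)
        print(f"assumed parallel fraction {p:.2f}: ~{minutes:.0f} min at 128 jobs")
```

The model ignores I/O, memory pressure, and the shape of the bazel action graph, so treat the numbers as an upper bound on the benefit rather than a prediction. Profiling the actual build would tell us the real critical path.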

Labels: CI, build, tech debt