
WIP: Test CI speedups #321


Open · ev-br wants to merge 6 commits into main from test_ci_speedups

Conversation

ev-br (Member) commented May 11, 2025

testing, do not merge: this checks out array-api-tests from a branch of a fork

The matching array-api-tests PR, data-apis/array-api-tests#373, makes it possible to turn xfails into skips.
This has a large effect on run time: it turns out the vast majority of CI time was spent in hypothesis working to distill (shrink) failures for tests we already know are failing and have explicitly marked as xfails.
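
For context, a minimal conftest.py-style sketch of the difference between the two modes: with xfail the test still runs (and hypothesis still shrinks the failing example), while with skip the test body never executes. The flag name and the xfails.txt file below are hypothetical illustrations, not the actual mechanism of data-apis/array-api-tests#373.

```python
# Hedged sketch, not the actual array-api-tests implementation: one way a
# suite can turn a list of known failures into hard skips. The flag name
# (--xfails-as-skips) and the xfails.txt file are hypothetical.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--xfails-as-skips",
        action="store_true",
        help="skip known-failing tests instead of xfailing them",
    )


def pytest_collection_modifyitems(config, items):
    with open("xfails.txt") as f:  # hypothetical file: one test id per line
        known_failures = {line.strip() for line in f if line.strip()}
    as_skips = config.getoption("--xfails-as-skips")
    for item in items:
        if item.nodeid in known_failures:
            if as_skips:
                # skip: the test never runs, so hypothesis never has to
                # generate and shrink a failing example for it
                item.add_marker(pytest.mark.skip(reason="known failure"))
            else:
                # xfail: the test still runs to completion, including the
                # expensive hypothesis shrinking phase, before reporting XFAIL
                item.add_marker(pytest.mark.xfail(reason="known failure"))
```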

ev-br (Member, Author) commented May 11, 2025

Okay, this is an obvious win.
- The PyTorch job used to take ~6 min; in this PR it is 2m14s.
- The Dask job used to take ~13m30s; in this PR it is ~40s.
- NumPy wins are marginal: down to just under 2 minutes from just above 2 minutes.

The largest beneficiary is Dask: previously we had to limit --max-examples to 5, and now we can raise it to ~100. That is still in the "smoke testing" realm, but much closer to what the other backends use.

ev-br force-pushed the test_ci_speedups branch from ab304c1 to 46e9711 on May 11, 2025 15:40
ev-br (Member, Author) commented May 11, 2025

The last commit bumps the number of examples to 500 for NumPy and PyTorch (up from 200) and to 50 for Dask (up from 5). All runs finish in under 5 minutes, which I think is the sweet spot between the convenience of iterating on a PR and reasonable test coverage.
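
As a rough illustration only (the CI here sets the suite's --max-examples option per job; the sketch below is an equivalent expressed as plain Hypothesis settings profiles, with the numbers mirroring this comment):

```python
# Rough equivalent of the per-backend example budgets described above,
# expressed as Hypothesis settings profiles rather than --max-examples.
from hypothesis import settings

settings.register_profile("numpy", max_examples=500)
settings.register_profile("pytorch", max_examples=500)
settings.register_profile("dask", max_examples=50)

# A profile would then be selected per CI job, e.g.:
#   pytest --hypothesis-profile=dask array_api_tests/
```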

ev-br (Member, Author) commented May 12, 2025

Using 4 pytest workers (-n 4) allows bumping the number of examples to 1000 for numpy/pytorch and 200 for dask; run times stay in the 5-10 minute range (closer to 6-7 minutes for torch, under 5 for the others). I suppose we'll need to keep an eye on how this fluctuates, but overall it sounds like a win.

One potential concern is how parallelization affects the failure mode. It is already not always easy to untangle how hypothesis reports errors and to replicate a CI failure locally; I suppose we'll try and see whether parallelization makes it any more difficult.
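
As a hedged aside (not something this PR changes): Hypothesis can be asked to print a reproduction recipe when a test fails, which makes it easier to replay a CI failure in a plain single-process run even if CI used several xdist workers. A minimal sketch:

```python
# Hedged sketch, not part of this PR: a CI settings profile that makes
# failures easier to replay locally when running under pytest-xdist.
# With print_blob=True, Hypothesis prints an @reproduce_failure(...)
# decorator in the failing worker's output; pasting it onto the test and
# re-running single-process (plain `pytest -k <test_name>`, no -n)
# replays the exact failing example.
from hypothesis import settings

settings.register_profile(
    "ci",
    print_blob=True,   # emit the reproduce_failure recipe on failure
    deadline=None,     # avoid flaky deadline errors on loaded CI workers
)

# Activated with, e.g.:
#   pytest -n 4 --hypothesis-profile=ci ...
```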

ev-br force-pushed the test_ci_speedups branch from 626069f to 232e703 on May 12, 2025 12:51
ev-br closed this May 12, 2025
ev-br reopened this May 12, 2025
ev-br force-pushed the test_ci_speedups branch from 4ffc399 to 12f5ff2 on May 12, 2025 14:51
ev-br closed this May 13, 2025
ev-br reopened this May 13, 2025
ev-br (Member, Author) commented May 13, 2025

Okay, I think this is it for the low-hanging fruit. The slowest tests on torch and numpy spend their time inside hypothesis, so there is not much we can do about them.
For dask (and presumably jax and cupy) further improvements are possible, but they would require significantly more drastic changes, likely using some ideas from data-apis/array-api-tests#197.
