
Skip dask rolling #9909

Merged: 4 commits into pydata:main on Dec 18, 2024

Conversation

Illviljan
Contributor

@Illviljan Illviljan commented Dec 18, 2024

Skip the rolling tests using dask so the CI becomes usable again.
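For context, skipping a slow dask-backed test in pytest looks roughly like this (a minimal sketch; the test name below is hypothetical, not the actual xarray test the PR touches):

```python
import pytest


# Hypothetical test name for illustration; the PR marks the real
# dask rolling tests in xarray's test suite.
@pytest.mark.skip(reason="rolling with dask is too slow; see GH-9909")
def test_rolling_reduce_with_dask():
    raise AssertionError("should never run while skipped")
```

The `reason=` string shows up in the pytest summary, so CI logs point back at why the test is disabled.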

This test has terrible performance; feel free to fix it and reactivate it:

import numpy as np
import pandas as pd

import xarray as xr


def randn(shape, frac_nan=None, chunks=None, seed=0):
    # Random normal array, optionally dask-backed via `chunks`.
    rng = np.random.default_rng(seed)
    if chunks is None:
        x = rng.standard_normal(shape)
    else:
        import dask.array as da

        rng = da.random.default_rng(seed)
        x = rng.standard_normal(shape, chunks=chunks)

    if frac_nan is not None:
        # Note: `.flat` only exists on numpy arrays, so combining
        # frac_nan with chunks is not supported by this helper.
        inds = rng.choice(range(x.size), int(x.size * frac_nan))
        x.flat[inds] = np.nan

    return x


nx = 3000
long_nx = 30000
ny = 200
nt = 1000
window = 20

randn_xy = randn((nx, ny), frac_nan=0.1)
randn_xt = randn((nx, nt))
randn_t = randn((nt,))
randn_long = randn((long_nx,), frac_nan=0.1)


ds = xr.Dataset(
    {
        "var1": (("x", "y"), randn_xy),
        "var2": (("x", "t"), randn_xt),
        "var3": (("t",), randn_t),
    },
    coords={
        "x": np.arange(nx),
        "y": np.linspace(0, 1, ny),
        "t": pd.date_range("1970-01-01", periods=nt, freq="D"),
        "x_coords": ("x", np.linspace(1.1, 2.1, nx)),
    },
)
window_ = 20
min_periods = 5
xr.set_options(use_bottleneck=False)  # force the generic (non-bottleneck) rolling path
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 601 ms ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


ds = ds.chunk({"x": 100, "y": 50, "t": 50})
%timeit ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(np.nansum).load()
# 1min 9s ± 1.31 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
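The generic `.reduce(np.nansum)` path materializes every window and applies the reducer to each one, which is what makes the chunked case so much slower. A rough numpy-only illustration of what one rolling-sum step computes (my sketch, not xarray internals):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(8.0)
window = 3

# Each row of `windows` is one rolling window over x (a strided view,
# but the reducer still touches every element `window` times).
windows = sliding_window_view(x, window)
rolled = np.nansum(windows, axis=-1)
# rolled -> [3., 6., 9., 12., 15., 18.]
```

With dask in the mix, this per-window work happens per chunk on top of graph overhead, so the cost balloons relative to the in-memory case.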

@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Dec 18, 2024
@Illviljan Illviljan merged commit a90fff9 into pydata:main Dec 18, 2024
29 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (63 commits)
  Fix zarr upstream tests (pydata#9927)
  Update pre-commit hooks (pydata#9925)
  split out CFDatetimeCoder, deprecate use_cftime as kwarg (pydata#9901)
  dev whats-new (pydata#9923)
  Whats-new 2025.01.0 (pydata#9919)
  Silence upstream Zarr warnings (pydata#9920)
  time coding refactor (pydata#9906)
  fix warning from scipy backend guess_can_open on directory (pydata#9911)
  Enhance and move ISO-8601 parser to coding.times (pydata#9899)
  Edit serialization error message (pydata#9916)
  friendlier error messages for missing chunk managers (pydata#9676)
  Bump codecov/codecov-action from 5.1.1 to 5.1.2 in the actions group (pydata#9915)
  Rewrite interp to use `apply_ufunc` (pydata#9881)
  Skip dask rolling (pydata#9909)
  Explicitly configure ReadTheDocs build to use conf.py (pydata#9908)
  Cache pre-existing Zarr arrays in Zarr backend (pydata#9861)
  Optimize idxmin, idxmax with dask (pydata#9800)
  remove unused "type: ignore" comments in test_plot.py (fixed in matplotlib 3.10.0) (pydata#9904)
  move scalar-handling logic into `possibly_convert_objects` (pydata#9900)
  Add missing DataTree attributes to docs (pydata#9876)
  ...