Extend rolling_exp to support pd.Timedelta objects with window halflife #10237

Open: abiasiol wants to merge 7 commits into main

abiasiol

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Description

Extended rolling_exp to support pd.Timedelta objects for the window size when using window_type="halflife" along datetime dimensions, similar to pandas' ewm. This allows expressions like da.rolling_exp(time=pd.Timedelta(days=1), window_type="halflife").mean().

Implementation

  • Matches pandas implementation, allowing the operation only when:
    • window is a pd.Timedelta object
    • window_type is "halflife"
    • dimension is a datetime index
    • operation is mean
  • Takes advantage of numbagg's nanmean implementation, which allows alpha to be an array
  • Ports the _calculate_deltas function instead of relying on pandas' private implementation
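
To make the role of the time deltas concrete, here is a minimal sketch of a time-aware exponentially weighted mean for a non-empty 1-D series without nan values. It is not the PR's code: the helper names deltas_in_halflives and ewm_mean_halflife are hypothetical, and the sketch assumes that the weight of an observation halves for every halflife of elapsed time, with normalized weights as in pandas' default adjust=True.

```python
import numpy as np
import pandas as pd


def deltas_in_halflives(times, halflife):
    """Gaps between consecutive timestamps, in units of `halflife` (hypothetical helper)."""
    times = pd.DatetimeIndex(times)
    return np.diff(times.values) / pd.Timedelta(halflife).to_timedelta64()


def ewm_mean_halflife(values, times, halflife):
    """Weighted mean whose weights halve per elapsed halflife (sketch, no NaN handling)."""
    values = np.asarray(values, dtype=float)
    deltas = deltas_in_halflives(times, halflife)
    # Assumption: one smoothing factor per step, derived from the elapsed time.
    alphas = 1.0 - 0.5**deltas
    out = np.empty_like(values)
    num, den = values[0], 1.0
    out[0] = values[0]
    for i in range(1, len(values)):
        decay = 1.0 - alphas[i - 1]
        num = decay * num + values[i]  # decayed weighted sum of values
        den = decay * den + 1.0  # decayed sum of weights
        out[i] = num / den
    return out


times = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-04"])
result = ewm_mean_halflife([1.0, 2.0, 4.0], times, pd.Timedelta(days=1))
```

Because the gaps are one and two halflives, the older observations are down-weighted by 0.5 and 0.25 respectively, which is why uneven spacing needs an array of alphas rather than a single scalar.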

Behavior Note

One difference from pandas' behavior: when the data contain nan values and the timedelta is very short, this implementation returns nan, while pandas appears to carry forward the previous value. Returning nan seems more appropriate to me (users can fill the gaps later if they need to).

Example demonstrating the difference:

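import numpy as np
import pandas as pd
from xarray import DataArray

times = pd.date_range("2000-01-01", freq="1D", periods=21)
da = DataArray(
    np.random.random((21, 4)),
    dims=("time", "x"),
    coords=dict(time=times),
)
da = da.where(da > 0.2)  # replace values <= 0.2 with nan

# pandas: nan positions carry forward the previous value
da.to_pandas().ewm(halflife=pd.Timedelta(minutes=1), times=da.time.values).mean()
# this PR: nan positions stay nan
da.rolling_exp(time=pd.Timedelta(minutes=1), window_type="halflife").mean().to_pandas()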
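
The difference is easier to see on a tiny series. The sketch below uses only pandas; the final masking step is just an illustration of the nan-preserving result described above, under the assumption that masking by the input's nan positions reproduces it, and is not the PR's implementation.

```python
import numpy as np
import pandas as pd

times = pd.date_range("2000-01-01", freq="1D", periods=3)
s = pd.Series([1.0, np.nan, 3.0], index=times)

# pandas: the missing day reports the last available weighted mean
pandas_result = s.ewm(halflife=pd.Timedelta(minutes=1), times=times).mean()

# Illustration only: keep nan wherever the input was nan
nan_preserving = pandas_result.where(s.notna())
```

With a one-minute halflife and daily data, each observation effectively forgets everything before it, so the only visible difference between the two results is at the nan position.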

abiasiol and others added 4 commits April 19, 2025 16:34
Added validation and calculation functions for halflife operations. Updated docstrings and type hints accordingly. Copied _calculate_deltas verbatim from pandas/core/window/ewm.py to avoid relying on an internal pandas function.
Introduced new test cases to validate the behavior of rolling_exp when using Timedelta windows, specifically for the halflife window type.
Checks for compatibility between window type, window, index, and operation. Check results match pandas.

welcome bot commented Apr 20, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

abiasiol and others added 3 commits April 20, 2025 11:20
…compatibility with pandas < 2.2.0

pandas ewm supports non-ns resolution from version 2.2.0 onward. Here we just test that this PR's rolling_exp works with non-ns resolution.
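
As a rough illustration of why the resolution need not matter, the following numpy-only sketch (not the PR's test) computes the gaps between timestamps in units of the halflife for both nanosecond and second resolution. The variable names are made up for this example.

```python
import numpy as np

times_ns = np.array(
    ["2000-01-01T00:00", "2000-01-01T06:00", "2000-01-02T00:00"],
    dtype="datetime64[ns]",
)
times_s = times_ns.astype("datetime64[s]")  # same instants, coarser resolution
halflife = np.timedelta64(12, "h")

# Dividing timedelta64 values yields plain floats, whatever the units involved
deltas_ns = np.diff(times_ns) / halflife
deltas_s = np.diff(times_s) / halflife
```

Both arrays hold the same ratios (gaps of 6 and 18 hours against a 12-hour halflife), so any weights derived from them are identical.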
@max-sixty
Collaborator

thanks @abiasiol !

couple of quick questions:

  • why limit to halflife?
  • does it raise / handle indexes with uneven spacing?
  • why limit to mean?

@abiasiol
Author

> thanks @abiasiol !
>
> couple of quick questions:
>
> * does it raise / handle indexes with uneven spacing?

Hi @max-sixty !

It works with uneven spacing (the same way pandas does):

import numpy as np
import pandas as pd
from xarray import DataArray

# daily timestamps shifted by a random 0-11 hours, so the spacing is uneven
times = pd.date_range("2000-01-01", freq="1D", periods=21)
times_delta = pd.to_timedelta(np.random.randint(0, 12, size=len(times)), unit="h")
times = times + times_delta

da = DataArray(
    np.random.random((21, 4)),
    dims=("time", "x"),
    coords=dict(time=times, x=["a", "b", "c", "d"]),
)

np.allclose(
    da.rolling_exp(time=pd.Timedelta(hours=2), window_type="halflife").mean().values,
    da.to_pandas()
    .ewm(halflife=pd.Timedelta(hours=2), times=da.time.values)
    .mean()
    .values,
) # True

@abiasiol
Author

> thanks @abiasiol !
>
> couple of quick questions:
>
> * why limit to halflife?
> * why limit to mean?

According to the docstring of pandas' ewm, mean() is the only "supported" operation for a timedelta halflife, so I kept it simple and followed that:

> If times is provided, halflife and one of com, span or alpha may be provided.
>
> halflife: If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.

But let me take another look, and I'll get back to you.

@max-sixty
Collaborator

ah, great, it uses the numbagg feature which takes an array of alphas — happy to see that being used! I wrote it for myself but hadn't really integrated it into xarray

I don't fully understand why we're limited to halflife — all the window types are freely convertible to one another; though possibly I'm misunderstanding something. (and same thing with mean vs other ops, though am even less confident) — does pandas have a reason for this specificity?

I haven't looked in enough detail at the calcs, but assuming we're well-tested against the pandas implementation, that's sufficient
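
For reference, the conversions between window types mentioned above can be sketched as follows. The formulas are the ones given in the pandas ewm documentation for windows counted in observations; the function names are hypothetical, and how (or whether) they should carry over to a Timedelta window is exactly the open question in this thread.

```python
import math


def alpha_from_com(com):
    """Center of mass -> smoothing factor."""
    return 1.0 / (1.0 + com)


def alpha_from_span(span):
    """Span -> smoothing factor."""
    return 2.0 / (span + 1.0)


def alpha_from_halflife(halflife):
    """Halflife (in observations) -> smoothing factor."""
    return 1.0 - math.exp(-math.log(2.0) / halflife)


def halflife_from_alpha(alpha):
    """Smoothing factor -> halflife (in observations), for 0 < alpha < 1."""
    return -math.log(2.0) / math.log(1.0 - alpha)
```

For example, com=1, span=3 and halflife=1 all describe the same decay, alpha = 0.5.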
