|
| 1 | +--- |
| 2 | +title: 'Ergonomic seasonal grouping and resampling' |
| 3 | +date: '2025-06-10' |
| 4 | +authors: |
| 5 | + - name: Deepak Cherian |
| 6 | + github: dcherian |
| 7 | +summary: 'Introducing new SeasonGrouper and SeasonResampler objects' |
| 8 | +--- |
| 9 | + |
| 10 | +## TLDR |
| 11 | + |
| 12 | +Two new [Grouper](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - enable ergonomic seasonal aggregations of Xarray objects. See the [docs](https://docs.xarray.dev/en/latest/user-guide/time-series.html#handling-seasons) for more. |
| 13 | + |
| 14 | +## The Problem |
| 15 | + |
| 16 | +Xarray has supported seasonal grouping using `ds.groupby("time.season")` for a very long time. |
| 17 | +Seasonal resampling has been supported using [pandas syntax](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling) `ds.resample(time="QS-Dec")`. |
| 18 | + |
| 19 | +These approaches have significant limitations: |
| 20 | + |
| 21 | +1. Custom season definitions are not possible. This is a very common user request ([1](https://github.com/pydata/xarray/discussions/6180), [2](https://github.com/pydata/xarray/discussions/5134), [3](https://github.com/pydata/xarray/discussions/6865), [4](https://stackoverflow.com/questions/68455725/how-to-enable-season-selection-as-jjas-instead-of-jja-in-xarray), [5](https://stackoverflow.com/questions/69021082/december-january-seasonal-mean)). |
| 22 | + - The `"time.season"` 'virtual variable' (or `time.dt.season`) hardcodes the Northern Hemisphere-centric three-month season definitions, namely `["DJF", "MAM", "JJA", "SON"]`. |
| 23 | + - The pandas resampling syntax is more powerful but is still limited to three-month seasons, even though the start date can be changed (e.g. `QS-Aug` means 'quarters starting in August'). |
| 24 | + - A common annoyance with `groupby('time.season')` is that seasons come out in alphabetical (nonsensical) order — `["DJF", "JJA", "MAM", "SON"]` — a consequence of this really being a 'categorical' reduction under the hood. |
| 25 | +2. Seasons spanning the end of the year (e.g. DJF) need to be handled specially; in many cases we want to ignore any months in incompletely sampled seasons. As an example, for a time series beginning in Jan-2001 we'd prefer the DJF season beginning in Dec-2000 to be ignored. |
| 26 | +3. Overlapping seasons are a common request: `["DJFM", "MAMJ", "JJAS", "SOND"]`. |
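
The ordering annoyance in item 1 can be reproduced without Xarray. The sketch below is a pure-Python illustration only: the dictionary `season_means` and its values are made-up placeholders standing in for the result of a categorical reduction, and `calendar_order` is a hypothetical name for the order a user would actually want.

```python
# Placeholder per-season results, keyed by label and sorted alphabetically,
# which is how a categorical reduction orders its groups.
season_means = dict(
    sorted({"DJF": 1.0, "MAM": 2.0, "JJA": 3.0, "SON": 4.0}.items())
)
print(list(season_means))  # ['DJF', 'JJA', 'MAM', 'SON']

# Restoring calendar order requires an explicit reordering step.
calendar_order = ["DJF", "MAM", "JJA", "SON"]
reordered = {label: season_means[label] for label in calendar_order}
print(list(reordered))  # ['DJF', 'MAM', 'JJA', 'SON']
```

With labelled arrays the equivalent fix is an explicit reindexing step after every reduction, which is easy to forget.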
| 27 | + |
| 28 | +## The Solution |
| 29 | + |
| 30 | +Our new Grouper objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - help solve nearly all the above problems. |
| 31 | +All of the GroupBy API is supported (reductions, iteration, `map`, etc.). |
| 32 | + |
| 33 | +## Examples |
| 34 | + |
| 35 | +### Load data |
| 36 | + |
| 37 | +Load in our classic example dataset: |
| 38 | + |
| 39 | +````python |
| 40 | +>>> import xarray as xr |
| 41 | +>>> |
| 42 | +>>> ds = xr.tutorial.open_dataset("air_temperature") |
| 43 | +>>> ds |
| 44 | +<xarray.Dataset> Size: 31MB |
| 45 | +Dimensions: (lat: 25, time: 2920, lon: 53) |
| 46 | +Coordinates: |
| 47 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 48 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 49 | + * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 |
| 50 | +Data variables: |
| 51 | + air (time, lat, lon) float64 31MB ... |
| 52 | +Attributes: (5) |
| 53 | +```` |
| 54 | + |
| 55 | +### SeasonGrouper |
| 56 | + |
| 57 | +```python |
| 58 | +>>> from xarray.groupers import SeasonGrouper |
| 59 | +>>> |
| 60 | +>>> ds.groupby(time=SeasonGrouper(["DJF", "MAM", "JJA", "SON"])).count() |
| 61 | +<xarray.Dataset> Size: 43kB |
| 62 | +Dimensions: (season: 4, lat: 25, lon: 53) |
| 63 | +Coordinates: |
| 64 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 65 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 66 | + * season (season) object 32B 'DJF' 'MAM' 'JJA' 'SON' |
| 67 | +Data variables: |
| 68 | + air (season, lat, lon) int64 42kB 720 720 720 720 ... 728 728 728 728 |
| 69 | +``` |
| 70 | + |
| 71 | +Overlapping seasons are supported: |
| 72 | + |
| 73 | +```python |
| 74 | +>>> ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).count() |
| 75 | +<xarray.Dataset> Size: 43kB |
| 76 | +Dimensions: (lat: 25, lon: 53, season: 4) |
| 77 | +Coordinates: |
| 78 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 79 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 80 | + * season (season) object 32B 'DJFM' 'MAMJ' 'JJAS' 'SOND' |
| 81 | +Data variables: |
| 82 | + air (lat, lon, season) int64 42kB 968 976 976 976 ... 968 976 976 976 |
| 83 | +Attributes: (5) |
| 84 | +``` |
| 85 | + |
| 86 | +### SeasonResampler |
| 87 | + |
| 88 | +```python |
| 89 | +>>> from xarray.groupers import SeasonResampler |
| 90 | +>>> ds.groupby(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=True)).count() |
| 91 | +<xarray.Dataset> Size: 75kB |
| 92 | +Dimensions: (time: 7, lat: 25, lon: 53) |
| 93 | +Coordinates: |
| 94 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 95 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 96 | + * time (time) datetime64[ns] 56B 2013-03-01 2013-06-01 ... 2014-09-01 |
| 97 | +Data variables: |
| 98 | + air (time, lat, lon) int64 74kB 368 368 368 368 368 ... 364 364 364 364 |
| 99 | +Attributes: (5) |
| 100 | +``` |
| 101 | + |
| 102 | +Note that the first season starts on `2013-03-01`! |
| 103 | +The incomplete DJF season starting in Dec-2012 is ignored (this dataset begins in Jan 2013). |
| 104 | +To avoid this behaviour, pass `drop_incomplete=False`: |
| 105 | + |
| 106 | +```python |
| 107 | +>>> ds.groupby(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=False)).count() |
| 108 | +<xarray.Dataset> Size: 96kB |
| 109 | +Dimensions: (time: 9, lat: 25, lon: 53) |
| 110 | +Coordinates: |
| 111 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 112 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 113 | + * time (time) datetime64[ns] 72B 2012-12-01 2013-03-01 ... 2014-12-01 |
| 114 | +Data variables: |
| 115 | + air (time, lat, lon) int64 95kB 236 236 236 236 236 ... 124 124 124 124 |
| 116 | +Attributes: (5) |
| 117 | +``` |
| 118 | + |
| 119 | +This result starts in `Dec-2012`! |
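
One way to reason about which seasons `drop_incomplete=True` keeps is to check whether every month of a season is present in the data. The following pure-Python sketch illustrates that idea only; it is not Xarray's implementation, and the helper name `season_is_complete` and the `available` set of `(year, month)` pairs are assumptions made for this example.

```python
def season_is_complete(start_year, season_months, available):
    """Return True if every (year, month) of the season is in `available`.

    `season_months` lists month numbers in season order, e.g. [12, 1, 2].
    Months that come after a wrap past December belong to the next year.
    """
    year = start_year
    previous = 0
    for month in season_months:
        if month < previous:  # wrapped past December
            year += 1
        if (year, month) not in available:
            return False
        previous = month
    return True


# Two years of data: January 2013 through December 2014.
available = {(y, m) for y in (2013, 2014) for m in range(1, 13)}

print(season_is_complete(2012, [12, 1, 2], available))  # False: Dec-2012 missing
print(season_is_complete(2013, [3, 4, 5], available))   # True
print(season_is_complete(2014, [12, 1, 2], available))  # False: 2015 months missing
```

Under this reading, the two DJF seasons at the edges of the record are the ones dropped, which matches the 7 versus 9 time steps in the two outputs above.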
| 120 | + |
| 121 | +Seasons need not be of the same length: |
| 122 | + |
| 123 | +```python |
| 124 | +>>> ds.groupby(time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).count() |
| 125 | +<xarray.Dataset> Size: 85kB |
| 126 | +Dimensions: (time: 8, lat: 25, lon: 53) |
| 127 | +Coordinates: |
| 128 | + * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 |
| 129 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 130 | + * time (time) datetime64[ns] 64B 2013-01-01 2013-03-01 ... 2014-10-01 |
| 131 | +Data variables: |
| 132 | + air (time, lat, lon) int64 85kB 236 236 236 236 236 ... 368 368 368 368 |
| 133 | +Attributes: (5) |
| 134 | +``` |
| 135 | + |
| 136 | +### Multiple groupers |
| 137 | + |
| 138 | +These new Grouper objects compose well with grouping over other arrays ([see blog post](https://xarray.dev/blog/multiple-groupers/)), for example: |
| 139 | + |
| 140 | +```python |
| 141 | +>>> from xarray.groupers import BinGrouper |
| 142 | +>>> |
| 143 | +>>> ds.groupby(lat=BinGrouper(bins=2), time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).count() |
| 144 | +<xarray.Dataset> Size: 7kB |
| 145 | +Dimensions: (lon: 53, lat_bins: 2, time: 8) |
| 146 | +Coordinates: |
| 147 | + * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 |
| 148 | + * lat_bins (lat_bins) interval[float64, right] 32B (14.94, 45.0] (45.0, 75.0] |
| 149 | + * time (time) datetime64[ns] 64B 2013-01-01 2013-03-01 ... 2014-10-01 |
| 150 | +Data variables: |
| 151 | + air (lon, lat_bins, time) int64 7kB 3068 4784 6344 ... 4416 5856 4416 |
| 152 | +Attributes: (5) |
| 153 | +``` |
| 154 | + |
| 155 | +## How does this work? |
| 156 | + |
| 157 | +Xarray's GroupBy API implements the split-apply-combine pattern (Wickham, 2011) which applies to a very large number of problems: histogramming, compositing, climatological averaging, resampling to a different time frequency, etc. |
| 158 | +The first step in doing so is converting group labels of arbitrary type to integer codes — ["factorization"](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-factorize). |
| 159 | +[Grouper objects](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) provide an extension point that allows users and downstream libraries to plug in custom factorization strategies. |
| 160 | +Here we do exactly that to handle the complexities of seasonal grouping ([example](https://github.com/pydata/xarray/blob/34efef2192a65e0f26a340ae305b0d3ed9e91b19/xarray/groupers.py#L764)). |
| 161 | +Given the user's definition of seasons, we construct the appropriate array of integer codes and run the aggregation as usual. |
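
To make the factorization step concrete, here is a minimal pure-Python sketch that turns month numbers into integer season codes. It is an illustration under stated assumptions, not Xarray's implementation: the helpers `season_to_months` and `factorize_months` are hypothetical names, seasons are assumed to be runs of consecutive month initials, overlapping seasons are rejected, and `-1` is used as the "no group" code in the same way pandas' `factorize` uses it for missing values.

```python
MONTH_INITIALS = "JFMAMJJASOND"  # January..December


def season_to_months(season: str) -> list[int]:
    """Map a season string such as "DJF" to month numbers (1-12).

    Searching the doubled initials also finds seasons that wrap around
    the end of the year, e.g. "DJF" or "NDJFM".
    """
    start = (MONTH_INITIALS * 2).find(season)
    if not season or start < 0:
        raise ValueError(f"unrecognised season: {season!r}")
    return [(start + offset) % 12 + 1 for offset in range(len(season))]


def factorize_months(months, seasons):
    """Return one integer code per month; -1 marks months in no season."""
    code_of_month = {}
    for code, season in enumerate(seasons):
        for month in season_to_months(season):
            if month in code_of_month:
                raise ValueError("overlapping seasons need another strategy")
            code_of_month[month] = code
    return [code_of_month.get(month, -1) for month in months]


seasons = ["DJF", "MAM", "JJA", "SON"]
codes = factorize_months([1, 2, 3, 6, 9, 12], seasons)
print(codes)  # [0, 0, 1, 2, 3, 0]

# Any reduction can now be run per integer code, here a simple count.
counts = [codes.count(code) for code in range(len(seasons))]
print(counts)  # [3, 1, 1, 1]
```

Because the codes follow the order in which the user listed the seasons, the result comes out in that order rather than alphabetically. Very short seasons such as `"J"` or `"M"` are ambiguous in this sketch and resolve to the first matching month.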
| 162 | + |
| 163 | +## Limitations |
| 164 | + |
| 165 | +1. `SeasonGrouper` does not support the `drop_incomplete` option yet. This would be a great contribution. |
| 166 | +2. `SeasonResampler` does not support overlapping seasons. This seems much harder to solve. |
| 167 | + |
| 168 | +## Summary |
| 169 | + |
| 170 | +Two new [Grouper](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - enable ergonomic seasonal aggregations with Xarray. |
| 171 | +The Grouper API is not public yet, but (hopefully) will be soon. |
| 172 | +If you have a use-case for domain-specific Grouper objects, please [open an issue](https://github.com/pydata/xarray/issues/new/choose)! |
| 173 | + |
| 174 | +## Acknowledgments |
| 175 | + |
| 176 | +Many thanks to [Thomas Vo](http://tomvo.me/career) and [Olivier Marti](https://www.lsce.ipsl.fr/en/pisp/olivier-marti/) for contributing many tests, and testing out the pull request. |
| 177 | +Thanks also to [Martin Yeo](https://trexfeathers.github.io) for contributing a very clever [idea](https://github.com/pydata/xarray/discussions/6180#discussioncomment-9141495) on how to do grouping by overlapping seasons. |