Commit ee6b3a3

authored
SeasonGrouper/SeasonResampler blogpost (#777)
1 parent 0fe2627 commit ee6b3a3

File tree

1 file changed

+177
-0
lines changed

src/posts/season-grouping/index.md

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
title: 'Ergonomic seasonal grouping and resampling'
3+
date: '2025-06-10'
4+
authors:
5+
- name: Deepak Cherian
6+
github: dcherian
7+
summary: 'Introducing new SeasonGrouper and SeasonResampler objects'
8+
---
9+
10+
## TLDR
11+
12+
Two new [Grouper](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - enable ergonomic seasonal aggregations of Xarray objects. See the [docs](https://docs.xarray.dev/en/latest/user-guide/time-series.html#handling-seasons) for more.
13+
14+
## The Problem
15+
16+
Xarray has supported seasonal grouping using `ds.groupby("time.season")` for a very long time.
17+
Seasonal resampling has been supported using [pandas syntax](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling) `ds.resample(time="QS-Dec")`.
18+
19+
These approaches have significant limitations:
20+
21+
1. Custom season definitions are not possible. This is a very common user request ([1](https://github.com/pydata/xarray/discussions/6180), [2](https://github.com/pydata/xarray/discussions/5134), [3](https://github.com/pydata/xarray/discussions/6865), [4](https://stackoverflow.com/questions/68455725/how-to-enable-season-selection-as-jjas-instead-of-jja-in-xarray), [5](https://stackoverflow.com/questions/69021082/december-january-seasonal-mean)).
22+
- The `"time.season"` 'virtual variable' (or `time.dt.season`) hardcodes the Northern Hemisphere-centric three-month season definitions, namely `["DJF", "MAM", "JJA", "SON"]`.
23+
- The pandas resampling syntax is more powerful but is still limited to three-month seasons, even though the start date can be changed (e.g. `QS-Aug` means 'quarters starting in August').
24+
- A common annoyance with `groupby('time.season')` is that seasons come out in alphabetical (nonsensical) order — `["DJF", "JJA", "MAM", "SON"]` — a consequence of this really being a 'categorical' reduction under the hood.
25+
2. Seasons spanning the end of the year (e.g. DJF) need to be handled specially; in many cases we want to ignore any months in incompletely sampled seasons. As an example, for a time series beginning in Jan-2001 we'd prefer the DJF season beginning in Dec-2000 to be ignored.
26+
3. Overlapping seasons are a common request: `["DJFM", "MAMJ", "JJAS", "SOND"]`.
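To make the first two limitations concrete, here is a minimal sketch of the existing approaches. It uses a synthetic daily series (the names `time` and `da` are made up for this illustration) rather than a real dataset, so treat it as an illustration and not a benchmark.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering 2013-2014 (an assumption for this sketch).
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
da = xr.DataArray(np.ones(time.size), dims="time", coords={"time": time}, name="air")

# The 'virtual variable' approach: seasons come back in alphabetical order.
legacy_order = da.groupby("time.season").count()["season"].values.tolist()
print(legacy_order)

# The pandas-style approach: fixed three-month quarters anchored on December.
quarterly = da.resample(time="QS-DEC").count()
first_label = str(quarterly["time"].values[0])[:10]
first_count = int(quarterly.values[0])

# The first bin is labelled Dec-2012 even though the data begin in Jan-2013,
# so it only holds January and February.
print(first_label, first_count)
```

The first quarter is silently incomplete, which is exactly the kind of special-casing that point 2 describes.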
27+
28+
## The Solution
29+
30+
Our new Grouper objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - help solve nearly all the above problems.
31+
All of the GroupBy API is supported (reductions, iteration, `map`, etc.).
32+
33+
## Examples
34+
35+
### Load data
36+
37+
Load in our classic example dataset:
38+
39+
````python
40+
>>> import xarray as xr
41+
>>>
42+
>>> ds = xr.tutorial.open_dataset("air_temperature")
43+
>>> ds
44+
<xarray.Dataset> Size: 31MB
45+
Dimensions: (lat: 25, time: 2920, lon: 53)
46+
Coordinates:
47+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
48+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
49+
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
50+
Data variables:
51+
air (time, lat, lon) float64 31MB ...
52+
Attributes: (5)
53+
````
54+
55+
### SeasonGrouper
56+
57+
```python
58+
>>> from xarray.groupers import SeasonGrouper
59+
>>>
60+
>>> ds.groupby(time=SeasonGrouper(["DJF", "MAM", "JJA", "SON"])).count()
61+
<xarray.Dataset> Size: 43kB
62+
Dimensions: (season: 4, lat: 25, lon: 53)
63+
Coordinates:
64+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
65+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
66+
* season (season) object 32B 'DJF' 'MAM' 'JJA' 'SON'
67+
Data variables:
68+
air (season, lat, lon) int64 42kB 720 720 720 720 ... 728 728 728 728
69+
```
70+
71+
Overlapping seasons are supported:
72+
73+
```python
74+
>>> ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).count()
75+
<xarray.Dataset> Size: 43kB
76+
Dimensions: (lat: 25, lon: 53, season: 4)
77+
Coordinates:
78+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
79+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
80+
* season (season) object 32B 'DJFM' 'MAMJ' 'JJAS' 'SOND'
81+
Data variables:
82+
air (lat, lon, season) int64 42kB 968 976 976 976 ... 968 976 976 976
83+
Attributes: (5)
84+
```
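The counts above can be sanity-checked without Xarray. The sketch below assumes four samples per day (inferred from 2920 time steps over the 730 days of 2013 and 2014); `season_to_months` is a hypothetical helper written for this post, not part of Xarray.

```python
import calendar

MONTH_INITIALS = "JFMAMJJASOND"
SAMPLES_PER_DAY = 4  # assumption: 6-hourly data, 2920 steps / 730 days
YEARS = (2013, 2014)


def season_to_months(season):
    """Translate e.g. "DJFM" into month numbers [12, 1, 2, 3]."""
    # Search a doubled string so that seasons may wrap around the year end.
    start = (MONTH_INITIALS * 2).index(season)
    return [(start + offset) % 12 + 1 for offset in range(len(season))]


seasons = ["DJFM", "MAMJ", "JJAS", "SOND"]
expected = {}
for season in seasons:
    days = sum(
        calendar.monthrange(year, month)[1]
        for year in YEARS
        for month in season_to_months(season)
    )
    expected[season] = days * SAMPLES_PER_DAY

print(expected)
# The total exceeds the 2920 time steps because March, June, September
# and December each belong to two seasons.
print(sum(expected.values()))
```

The per-season values match the leading `968 976 976 976` in the output above.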
85+
86+
### SeasonResampler
87+
88+
```python
89+
>>> from xarray.groupers import SeasonResampler
90+
>>> ds.groupby(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=True)).count()
91+
<xarray.Dataset> Size: 75kB
92+
Dimensions: (time: 7, lat: 25, lon: 53)
93+
Coordinates:
94+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
95+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
96+
* time (time) datetime64[ns] 56B 2013-03-01 2013-06-01 ... 2014-09-01
97+
Data variables:
98+
air (time, lat, lon) int64 74kB 368 368 368 368 368 ... 364 364 364 364
99+
Attributes: (5)
100+
```
101+
102+
Note that the first season starts on `2013-03-01`!
103+
The incomplete DJF season starting in Dec-2012 is ignored (this dataset begins in Jan-2013).
104+
To avoid this behaviour, pass `drop_incomplete=False`:
105+
106+
```python
107+
>>> ds.groupby(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=False)).count()
108+
<xarray.Dataset> Size: 96kB
109+
Dimensions: (time: 9, lat: 25, lon: 53)
110+
Coordinates:
111+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
112+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
113+
* time (time) datetime64[ns] 72B 2012-12-01 2013-03-01 ... 2014-12-01
114+
Data variables:
115+
air (time, lat, lon) int64 95kB 236 236 236 236 236 ... 124 124 124 124
116+
Attributes: (5)
117+
```
118+
119+
This result starts with the DJF season labelled `2012-12-01`, which only contains data from Jan-2013 and Feb-2013!
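The counts at either end of the last two outputs can be checked by hand. This is a small worked calculation, assuming four samples per day (inferred from 2920 time steps over 730 days); `expected_count` is a hypothetical helper for this post.

```python
import calendar

SAMPLES_PER_DAY = 4  # assumption: 6-hourly data, 2920 steps / 730 days


def expected_count(year_months):
    """Number of samples in the given (year, month) pairs."""
    days = sum(calendar.monthrange(year, month)[1] for year, month in year_months)
    return days * SAMPLES_PER_DAY


# drop_incomplete=True: first and last seasons are complete.
mam_2013 = expected_count([(2013, 3), (2013, 4), (2013, 5)])
son_2014 = expected_count([(2014, 9), (2014, 10), (2014, 11)])

# drop_incomplete=False: the first DJF is missing December 2012 and
# the last DJF only has December 2014.
first_djf = expected_count([(2013, 1), (2013, 2)])
last_djf = expected_count([(2014, 12)])

print(mam_2013, son_2014, first_djf, last_djf)
```

These are the `368`, `364`, `236` and `124` values shown in the two outputs.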
120+
121+
Seasons need not be of the same length:
122+
123+
```python
124+
>>> ds.groupby(time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).count()
125+
<xarray.Dataset> Size: 85kB
126+
Dimensions: (time: 8, lat: 25, lon: 53)
127+
Coordinates:
128+
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
129+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
130+
* time (time) datetime64[ns] 64B 2013-01-01 2013-03-01 ... 2014-10-01
131+
Data variables:
132+
air (time, lat, lon) int64 85kB 236 236 236 236 236 ... 368 368 368 368
133+
Attributes: (5)
134+
```
135+
136+
### Multiple groupers
137+
138+
These new Grouper objects compose well with grouping over other arrays ([see blog post](https://xarray.dev/blog/multiple-groupers/)), for example
139+
140+
```python
141+
>>> from xarray.groupers import BinGrouper
142+
>>>
143+
>>> ds.groupby(lat=BinGrouper(bins=2), time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).count()
144+
<xarray.Dataset> Size: 7kB
145+
Dimensions: (lon: 53, lat_bins: 2, time: 8)
146+
Coordinates:
147+
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
148+
* lat_bins (lat_bins) interval[float64, right] 32B (14.94, 45.0] (45.0, 75.0]
149+
* time (time) datetime64[ns] 64B 2013-01-01 2013-03-01 ... 2014-10-01
150+
Data variables:
151+
air (lon, lat_bins, time) int64 7kB 3068 4784 6344 ... 4416 5856 4416
152+
Attributes: (5)
153+
```
154+
155+
## How does this work?
156+
157+
Xarray's GroupBy API implements the split-apply-combine pattern (Wickham, 2011), which applies to a very large number of problems: histogramming, compositing, climatological averaging, resampling to a different time frequency, etc.
158+
The first step in doing so is converting group labels of arbitrary type to integer codes — ["factorization"](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-factorize).
159+
[Grouper objects](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) provide an extension point that allows users and downstream libraries to plug in custom factorization strategies.
160+
Here we do exactly that to handle the complexities of seasonal grouping ([example](https://github.com/pydata/xarray/blob/34efef2192a65e0f26a340ae305b0d3ed9e91b19/xarray/groupers.py#L764)).
161+
Given the user's definition of seasons, we construct the appropriate array of integer codes and run the aggregation as usual.
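To give a feel for the idea, here is a simplified sketch of season factorization in plain Python. It is not Xarray's actual implementation: the helpers `season_to_months` and `factorize_by_season` are hypothetical, and the sketch only handles non-overlapping seasons.

```python
from datetime import date, timedelta

MONTH_INITIALS = "JFMAMJJASOND"


def season_to_months(season: str) -> list[int]:
    """Translate a season string such as "DJF" into month numbers [12, 1, 2]."""
    # Search a doubled string so that seasons may wrap around the year end.
    start = (MONTH_INITIALS * 2).index(season)
    return [(start + offset) % 12 + 1 for offset in range(len(season))]


def factorize_by_season(dates, seasons):
    """Return one integer code per date; -1 marks dates outside every season."""
    month_to_code = {}
    for code, season in enumerate(seasons):
        for month in season_to_months(season):
            month_to_code[month] = code
    return [month_to_code.get(d.month, -1) for d in dates]


seasons = ["DJF", "MAM", "JJA", "SON"]
dates = [date(2013, 1, 1) + timedelta(days=n) for n in range(365)]
codes = factorize_by_season(dates, seasons)

# With integer codes in hand, the aggregation is ordinary bookkeeping.
counts = [0] * len(seasons)
for code in codes:
    if code >= 0:
        counts[code] += 1

print(dict(zip(seasons, counts)))
```

Because each date gets exactly one code here, overlapping seasons would need a different representation in which a date can belong to more than one group.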
162+
163+
## Limitations
164+
165+
1. `SeasonGrouper` does not support the `drop_incomplete` option yet. This would be a great contribution.
166+
2. `SeasonResampler` does not support overlapping seasons. This seems much harder to solve.
167+
168+
## Summary
169+
170+
Two new [Grouper](https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md) objects - [`SeasonGrouper`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonGrouper.html#xarray.groupers.SeasonGrouper) and [`SeasonResampler`](https://docs.xarray.dev/en/latest/generated/xarray.groupers.SeasonResampler.html#xarray.groupers.SeasonResampler) - enable ergonomic seasonal aggregations with Xarray.
171+
The Grouper API is not public yet, but (hopefully) will be soon.
172+
If you have a use-case for domain-specific Grouper objects, please [open an issue](https://github.com/pydata/xarray/issues/new/choose)!
173+
174+
## Acknowledgments
175+
176+
Many thanks to [Thomas Vo](http://tomvo.me/career) and [Olivier Marti](https://www.lsce.ipsl.fr/en/pisp/olivier-marti/) for contributing tests and for testing out the pull request.
177+
Thanks also to [Martin Yeo](https://trexfeathers.github.io) for contributing a very clever [idea](https://github.com/pydata/xarray/discussions/6180#discussioncomment-9141495) on how to do grouping by overlapping seasons.
