Commit e147a51

Modernize regrid_time and allow setting a common calendar for decadal, yearly, and monthly data (#2311)
Co-authored-by: Valeriu Predoi <[email protected]>
1 parent afde692 commit e147a51

6 files changed: +485 -441 lines changed

doc/recipe/preprocessor.rst

+54 -9
@@ -1248,8 +1248,7 @@ The ``_time.py`` module contains the following preprocessor functions:
 * resample_time_: Resample data
 * resample_hours_: Convert between N-hourly frequencies by resampling
 * anomalies_: Compute (standardized) anomalies
-* regrid_time_: Aligns the time axis of each dataset to have common time
-  points and calendars.
+* regrid_time_: Aligns the time coordinate of each dataset, against a standardized time axis.
 * timeseries_filter_: Allows application of a filter to the time-series data.
 * local_solar_time_: Convert cube with UTC time to local solar time.
 

@@ -1642,13 +1641,59 @@ See also :func:`esmvalcore.preprocessor.anomalies`.
 ``regrid_time``
 ---------------
 
-This function aligns the time points of each component dataset so that the Iris
-cubes from different datasets can be subtracted. The operation makes the
-datasets time points common; it also resets the time
-bounds and auxiliary coordinates to reflect the artificially shifted time
-points. Current implementation for monthly and daily data; the ``frequency`` is
-set automatically from the variable CMOR table unless a custom ``frequency`` is
-set manually by the user in recipe.
+This function aligns the time points and bounds of an input dataset according
+to the following rules:
+
+* Decadal data: 1 January 00:00:00 for the given year.
+  Example: 1 January 2005 00:00:00 for given year 2005 (decade 2000-2010).
+* Yearly data: 1 July 00:00:00 for each year.
+  Example: 1 July 1993 00:00:00 for the year 1993.
+* Monthly data: 15th day 00:00:00 for each month.
+  Example: 15 October 1993 00:00:00 for the month October 1993.
+* Daily data: 12:00:00 for each day.
+  Example: 14 March 1996 12:00:00 for the day 14 March 1996.
+* `n`-hourly data where `n` is a divisor of 24: center of each time interval.
+  Example: 03:00:00 for interval 00:00:00-06:00:00 (6-hourly data), 16:30:00
+  for interval 15:00:00-18:00:00 (3-hourly data), or 09:30:00 for interval
+  09:00:00-10:00:00 (hourly data).
+
+The frequency of the input data is automatically determined from the CMOR table
+of the corresponding variable, but can be overwritten in the recipe if
+necessary.
+This function does not alter the data in any way.
+
+.. note::
+
+   By default, this preprocessor will not change the calendar of the input time
+   coordinate.
+   For decadal, yearly, and monthly data, it is possible to change the calendar
+   using the optional `calendar` argument.
+   Be aware that changing the calendar might introduce (small) errors to your
+   data, especially for extensive quantities (those that depend on the period
+   length).
+
+Parameters:
+    * `frequency`: Data frequency.
+      If not given, use the one from the CMOR tables of the corresponding
+      variable.
+    * `calendar`: If given, transform the calendar to the one specified
+      (examples: `standard`, `365_day`, etc.).
+      This only works for decadal, yearly and monthly data, and will raise an
+      error for other frequencies.
+      If not set, the calendar will not be changed.
+    * `units` (default: `days since 1850-01-01 00:00:00`): Reference time units
+      used if the calendar of the data is changed.
+      Ignored if `calendar` is not set.
+
+Examples:
+
+Change the input calendar to `standard` and use custom units:
+
+.. code-block:: yaml
+
+    regrid_time:
+      calendar: standard
+      units: days since 2000-01-01
 
 See also :func:`esmvalcore.preprocessor.regrid_time`.
 
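The alignment rules in the new documentation can be illustrated with a minimal sketch that uses only the standard library. It is not the ESMValCore implementation: `aligned_time_point` is a hypothetical helper, it operates on plain `datetime` objects rather than Iris time coordinates, and it assumes that `n`-hourly intervals start at multiples of `n` hours after midnight, as the documented examples suggest.

```python
from datetime import datetime, timedelta


def aligned_time_point(date: datetime, freq: str) -> datetime:
    """Return the standardized time point for ``date`` (hypothetical helper)."""
    if "dec" in freq:
        # Decadal data: 1 January 00:00:00 of the given year.
        return datetime(date.year, 1, 1)
    if "yr" in freq:
        # Yearly data: 1 July 00:00:00.
        return datetime(date.year, 7, 1)
    if "mon" in freq:
        # Monthly data: 15th day 00:00:00.
        return datetime(date.year, date.month, 15)
    if "day" in freq:
        # Daily data: 12:00:00.
        return datetime(date.year, date.month, date.day, 12)
    if "hr" in freq:
        # n-hourly data: center of the interval that contains ``date``.
        n_hours_str = freq.partition("hr")[0]
        n_hours = int(n_hours_str) if n_hours_str else 1
        if 24 % n_hours:
            raise NotImplementedError(
                f"`n` must be a divisor of 24, got '{freq}'"
            )
        # Assumption: intervals start at multiples of n hours after midnight.
        start_hour = (date.hour // n_hours) * n_hours
        start = datetime(date.year, date.month, date.day, start_hour)
        return start + timedelta(hours=n_hours / 2)
    raise NotImplementedError(f"Unsupported frequency '{freq}'")


# A 3-hourly value stamped 15:10 falls into the interval 15:00-18:00.
print(aligned_time_point(datetime(1996, 3, 14, 15, 10), "3hr"))
```

The sketch leaves the calendar untouched, which matches the documented default; converting calendars is a separate step that the documentation restricts to decadal, yearly, and monthly data.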

esmvalcore/_recipe/recipe.py

+3 -5
@@ -147,13 +147,11 @@ def _update_target_grid(dataset, datasets, settings):
             _spec_to_latlonvals(**target_grid)
 
 
-def _update_regrid_time(dataset, settings):
+def _update_regrid_time(dataset: Dataset, settings: dict) -> None:
     """Input data frequency automatically for regrid_time preprocessor."""
-    regrid_time = settings.get('regrid_time')
-    if regrid_time is None:
+    if 'regrid_time' not in settings:
         return
-    frequency = settings.get('regrid_time', {}).get('frequency')
-    if not frequency:
+    if 'frequency' not in settings['regrid_time']:
         settings['regrid_time']['frequency'] = dataset.facets['frequency']
 
 
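The effect of switching from truthiness checks to key-membership checks can be seen in a small standalone sketch. The function body mirrors the added lines of the diff; the `SimpleNamespace` object is only a stand-in for the real `Dataset` class, and the example settings dictionaries are made up for illustration.

```python
from types import SimpleNamespace


def update_regrid_time(dataset, settings: dict) -> None:
    """Fill in the frequency only when the key is absent."""
    if "regrid_time" not in settings:
        return
    if "frequency" not in settings["regrid_time"]:
        settings["regrid_time"]["frequency"] = dataset.facets["frequency"]


# Stand-in for a dataset; only the ``facets`` mapping is needed here.
dataset = SimpleNamespace(facets={"frequency": "mon"})

# No frequency given: it is taken from the dataset facets.
no_frequency = {"regrid_time": {"calendar": "standard"}}
update_regrid_time(dataset, no_frequency)

# Frequency given in the recipe: it is kept.
explicit = {"regrid_time": {"frequency": "day"}}
update_regrid_time(dataset, explicit)

# Key present but empty: the membership test leaves it unchanged.
empty = {"regrid_time": {"frequency": ""}}
update_regrid_time(dataset, empty)

# Preprocessor not used at all: settings are left alone.
unused = {"regrid": {"target_grid": "1x1"}}
update_regrid_time(dataset, unused)
```

One consequence visible in the diff itself: the removed code replaced any falsy `frequency` (for example an empty string) with the dataset's value, whereas the new membership test only fills in the value when the key is missing.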

esmvalcore/cmor/_fixes/shared.py

+38 -39
@@ -1,7 +1,7 @@
 """Shared functions for fixes."""
 import logging
 import os
-from datetime import datetime
+from datetime import datetime, timedelta
 from functools import lru_cache
 
 import dask.array as da
@@ -453,12 +453,12 @@ def get_next_month(month: int, year: int) -> tuple[int, int]:
 def get_time_bounds(time: Coord, freq: str) -> np.ndarray:
     """Get bounds for time coordinate.
 
-    For monthly data, use the first day of the current month and the first day
-    of the next month. For yearly or decadal data, use 1 January of the current
-    year and 1 January of the next year or 10 years from the current year. For
-    other frequencies (daily, 6-hourly, 3-hourly, hourly), half of the
-    frequency is subtracted/added from the current point in time to get the
-    bounds.
+    For decadal data, use 1 January 5 years before/after the current year. For
+    yearly data, use 1 January of the current year and 1 January of the next
+    year. For monthly data, use the first day of the current month and the
+    first day of the next month. For other frequencies (daily or `n`-hourly,
+    where `n` is a divisor of 24), half of the frequency is subtracted/added
+    from the current point in time to get the bounds.
 
     Parameters
     ----------
@@ -480,39 +480,38 @@ def get_time_bounds(time: Coord, freq: str) -> np.ndarray:
     """
     bounds = []
     dates = time.units.num2date(time.points)
-    for step, date in enumerate(dates):
-        month = date.month
-        year = date.year
-        if freq in ['mon', 'mo']:
-            next_month, next_year = get_next_month(month, year)
-            min_bound = date2num(datetime(year, month, 1, 0, 0),
-                                 time.units, time.dtype)
-            max_bound = date2num(datetime(next_year, next_month, 1, 0, 0),
-                                 time.units, time.dtype)
-        elif freq == 'yr':
-            min_bound = date2num(datetime(year, 1, 1, 0, 0),
-                                 time.units, time.dtype)
-            max_bound = date2num(datetime(year + 1, 1, 1, 0, 0),
-                                 time.units, time.dtype)
-        elif freq == 'dec':
-            min_bound = date2num(datetime(year, 1, 1, 0, 0),
-                                 time.units, time.dtype)
-            max_bound = date2num(datetime(year + 10, 1, 1, 0, 0),
-                                 time.units, time.dtype)
-        else:
-            delta = {
-                'day': 12.0 / 24,
-                '6hr': 3.0 / 24,
-                '3hr': 1.5 / 24,
-                '1hr': 0.5 / 24,
-            }
-            if freq not in delta:
+
+    for date in dates:
+        if 'dec' in freq:
+            min_bound = datetime(date.year - 5, 1, 1, 0, 0)
+            max_bound = datetime(date.year + 5, 1, 1, 0, 0)
+        elif 'yr' in freq:
+            min_bound = datetime(date.year, 1, 1, 0, 0)
+            max_bound = datetime(date.year + 1, 1, 1, 0, 0)
+        elif 'mon' in freq or freq == 'mo':
+            next_month, next_year = get_next_month(date.month, date.year)
+            min_bound = datetime(date.year, date.month, 1, 0, 0)
+            max_bound = datetime(next_year, next_month, 1, 0, 0)
+        elif 'day' in freq:
+            min_bound = date - timedelta(hours=12.0)
+            max_bound = date + timedelta(hours=12.0)
+        elif 'hr' in freq:
+            (n_hours_str, _, _) = freq.partition('hr')
+            if not n_hours_str:
+                n_hours = 1
+            else:
+                n_hours = int(n_hours_str)
+            if 24 % n_hours:
                 raise NotImplementedError(
-                    f"Cannot guess time bounds for frequency '{freq}'"
+                    f"For `n`-hourly data, `n` must be a divisor of 24, got "
+                    f"'{freq}'"
                 )
-            point = time.points[step]
-            min_bound = point - delta[freq]
-            max_bound = point + delta[freq]
+            min_bound = date - timedelta(hours=n_hours / 2.0)
+            max_bound = date + timedelta(hours=n_hours / 2.0)
+        else:
+            raise NotImplementedError(
+                f"Cannot guess time bounds for frequency '{freq}'"
+            )
         bounds.append([min_bound, max_bound])
 
-    return np.array(bounds)
+    return date2num(np.array(bounds), time.units, time.dtype)
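The bounds logic of the rewritten `get_time_bounds` can be tried out without Iris or cf-units using the following sketch. It follows the branch structure of the added lines, but it is a simplification under stated assumptions: `sketch_time_bounds` and `next_month` are hypothetical names, the input is a single standard-library `datetime` instead of a time coordinate, and the final `date2num` conversion to numeric time values is skipped.

```python
from datetime import datetime, timedelta


def next_month(month: int, year: int) -> tuple[int, int]:
    """Return (month, year) of the following month (hypothetical helper)."""
    if month == 12:
        return 1, year + 1
    return month + 1, year


def sketch_time_bounds(date: datetime, freq: str) -> tuple[datetime, datetime]:
    """Return (lower, upper) bounds for one time point as datetimes."""
    if "dec" in freq:
        # Decadal: 1 January, 5 years before and after the current year.
        return datetime(date.year - 5, 1, 1), datetime(date.year + 5, 1, 1)
    if "yr" in freq:
        # Yearly: 1 January of the current and of the next year.
        return datetime(date.year, 1, 1), datetime(date.year + 1, 1, 1)
    if "mon" in freq or freq == "mo":
        # Monthly: first day of the current and of the next month.
        month, year = next_month(date.month, date.year)
        return datetime(date.year, date.month, 1), datetime(year, month, 1)
    if "day" in freq:
        # Daily: half a day on either side of the time point.
        half = timedelta(hours=12)
        return date - half, date + half
    if "hr" in freq:
        # n-hourly: half of n hours on either side of the time point.
        n_hours_str = freq.partition("hr")[0]
        n_hours = int(n_hours_str) if n_hours_str else 1
        if 24 % n_hours:
            raise NotImplementedError(
                f"For `n`-hourly data, `n` must be a divisor of 24, got '{freq}'"
            )
        half = timedelta(hours=n_hours / 2)
        return date - half, date + half
    raise NotImplementedError(f"Cannot guess time bounds for frequency '{freq}'")


# Decadal point 1 January 2005 spans the decade 2000-2010.
print(sketch_time_bounds(datetime(2005, 1, 1), "dec"))
```

Compared with the removed code, the decadal bounds are now centered on the time point (year minus 5 to year plus 5) instead of running from the current year to 10 years later, and any `n`-hourly frequency with `n` dividing 24 is accepted rather than only the `6hr`, `3hr`, and `1hr` entries of the old lookup table.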
