A major use case for xarray is multi-dimensional time-series data. Accordingly, we've copied many of features that make working with time-series data in pandas such a joy to xarray. In most cases, we rely on pandas for the core functionality.
.. ipython:: python :suppress: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456)
Xarray uses the numpy dtypes datetime64[unit]
and timedelta64[unit]
(where unit is one of "s"
, "ms"
, "us"
and "ns"
) to represent datetime
data, which offer vectorized operations with numpy and smooth integration with pandas.
To convert to or create regular arrays of datetime64
data, we recommend
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
.. ipython:: python pd.to_datetime(["2000-01-01", "2000-02-02"]) pd.DatetimeIndex( ["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]" ) pd.date_range("2000-01-01", periods=365) pd.date_range("2000-01-01", periods=365, unit="s")
It is also possible to use corresponding :py:func:`xarray.date_range`:
.. ipython:: python xr.date_range("2000-01-01", periods=365) xr.date_range("2000-01-01", periods=365, unit="s")
Note
Care has to be taken to create the output with the wanted resolution.
For :py:func:`pandas.date_range` the unit
-kwarg has to be specified
and for :py:func:`pandas.to_datetime` the selection of the resolution
isn't possible at all. For that :py:class:`pd.DatetimeIndex` can be used
directly. There is more in-depth information in section
:ref:`internals.timecoding`.
Alternatively, you can supply arrays of Python datetime
objects. These get
converted automatically when used as arguments in xarray objects (with us-resolution):
.. ipython:: python import datetime xr.Dataset({"time": datetime.datetime(2000, 1, 1)})
When reading or writing netCDF files, xarray automatically decodes datetime and
timedelta arrays using CF conventions (that is, by using a units
attribute like 'days since 2000-01-01'
).
Note
When decoding/encoding datetimes for non-standard calendars or for dates
before 1582-10-15, xarray uses the cftime library by default.
It was previously packaged with the netcdf4-python
package under the
name netcdftime
but is now distributed separately. cftime
is an
:ref:`optional dependency<installing>` of xarray.
You can manual decode arrays in this form by passing a dataset to :py:func:`~xarray.decode_cf`:
.. ipython:: python attrs = {"units": "hours since 2000-01-01"} ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)}) # Default decoding to 'ns'-resolution xr.decode_cf(ds) # Decoding to 's'-resolution coder = xr.coders.CFDatetimeCoder(time_unit="s") xr.decode_cf(ds, decode_times=coder)
From xarray 2025.01.2 the resolution of the dates can be one of "s"
, "ms"
, "us"
or "ns"
. One limitation of using datetime64[ns]
is that it limits the native representation of dates to those that fall between the years 1678 and 2262, which gets increased significantly with lower resolutions. When a store contains dates outside of these bounds (or dates < 1582-10-15 with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
:py:class:`~xarray.CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
See :ref:`CFTimeIndex` for more information.
Xarray borrows powerful indexing machinery from pandas (see :ref:`indexing`).
This allows for several useful and succinct forms of indexing, particularly for
datetime64
data. For example, we support indexing with strings for single
items and with the slice
object:
.. ipython:: python time = pd.date_range("2000-01-01", freq="h", periods=365 * 24) ds = xr.Dataset({"foo": ("time", np.arange(365 * 24)), "time": time}) ds.sel(time="2000-01") ds.sel(time=slice("2000-06-01", "2000-06-10"))
You can also select a particular time by indexing with a :py:class:`datetime.time` object:
.. ipython:: python ds.sel(time=datetime.time(12))
For more details, read the pandas documentation and the section on :ref:`datetime_component_indexing` (i.e. using the .dt
accessor).
Similar to pandas accessors, the components of datetime objects contained in a
given DataArray
can be quickly computed using a special .dt
accessor.
.. ipython:: python time = pd.date_range("2000-01-01", freq="6h", periods=365 * 4) ds = xr.Dataset({"foo": ("time", np.arange(365 * 4)), "time": time}) ds.time.dt.hour ds.time.dt.dayofweek
The .dt
accessor works on both coordinate dimensions as well as
multi-dimensional data.
Xarray also supports a notion of "virtual" or "derived" coordinates for datetime components implemented by pandas, including "year", "month", "day", "hour", "minute", "second", "dayofyear", "week", "dayofweek", "weekday" and "quarter":
.. ipython:: python ds["time.month"] ds["time.dayofyear"]
For use as a derived coordinate, xarray adds 'season'
to the list of
datetime components supported by pandas:
.. ipython:: python ds["time.season"] ds["time"].dt.season
The set of valid seasons consists of 'DJF', 'MAM', 'JJA' and 'SON', labeled by the first letters of the corresponding months.
You can use these shortcuts with both Datasets and DataArray coordinates.
In addition, xarray supports rounding operations floor
, ceil
, and round
. These operations require that you supply a rounding frequency as a string argument.
.. ipython:: python ds["time"].dt.floor("D")
The .dt
accessor can also be used to generate formatted datetime strings
for arrays utilising the same formatting as the standard datetime.strftime.
.. ipython:: python ds["time"].dt.strftime("%a, %b %d %H:%M")
You can use use the .dt
accessor when subsetting your data as well. For example, we can subset for the month of January using the following:
.. ipython:: python ds.isel(time=(ds.time.dt.month == 1))
You can also search for multiple months (in this case January through March), using isin
:
.. ipython:: python ds.isel(time=ds.time.dt.month.isin([1, 2, 3]))
Datetime components couple particularly well with grouped operations (see :ref:`groupby`) for analyzing features that repeat over time. Here's how to calculate the mean by time of day:
.. ipython:: python :okwarning: ds.groupby("time.hour").mean()
For upsampling or downsampling temporal resolutions, xarray offers a
:py:meth:`~xarray.Dataset.resample` method building on the core functionality
offered by the pandas method of the same name. Resample uses essentially the
same api as resample
in pandas.
For example, we can downsample our dataset from hourly to 6-hourly:
.. ipython:: python :okwarning: ds.resample(time="6h")
This will create a specialized Resample
object which saves information
necessary for resampling. All of the reduction methods which work with
Resample
objects can also be used for resampling:
.. ipython:: python :okwarning: ds.resample(time="6h").mean()
You can also supply an arbitrary reduction function to aggregate over each resampling group:
.. ipython:: python ds.resample(time="6h").reduce(np.mean)
You can also resample on the time dimension while applying reducing along other dimensions at the same time
by specifying the dim
keyword argument
ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])
For upsampling, xarray provides six methods: asfreq
, ffill
, bfill
, pad
,
nearest
and interpolate
. interpolate
extends scipy.interpolate.interp1d
and supports all of its schemes. All of these resampling operations work on both
Dataset and DataArray objects with an arbitrary number of dimensions.
In order to limit the scope of the methods ffill
, bfill
, pad
and
nearest
the tolerance
argument can be set in coordinate units.
Data that has indices outside of the given tolerance
are set to NaN
.
.. ipython:: python ds.resample(time="1h").nearest(tolerance="1h")
It is often desirable to center the time values after a resampling operation. That can be accomplished by updating the resampled dataset time coordinate values using time offset arithmetic via the pandas.tseries.frequencies.to_offset function.
.. ipython:: python resampled_ds = ds.resample(time="6h").mean() offset = pd.tseries.frequencies.to_offset("6h") / 2 resampled_ds["time"] = resampled_ds.get_index("time") + offset resampled_ds
For more examples of using grouped operations on a time dimension, see :doc:`../examples/weather-data`.