New defaults for concat, merge, combine_* #10062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
@@ -43,7 +43,6 @@ new dimension by stacking lower dimensional arrays together:

 .. ipython:: python

     da.sel(x="a")
-    xr.concat([da.isel(x=0), da.isel(x=1)], "x")

 If the second argument to ``concat`` is a new dimension name, the arrays will
@@ -52,15 +51,18 @@ dimension:

 .. ipython:: python

-    xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
+    da0 = da.isel(x=0, drop=True)
+    da1 = da.isel(x=1, drop=True)
+
+    xr.concat([da0, da1], "new_dim")
Review comment: Dropping the overlapping "x" means that you don't get a future warning anymore and the outcome won't change with the new defaults. It seemed to me like it was maintaining the spirit of the docs.

Review comment: I'd change to ``xr.concat([da.isel(x=[0]), da.isel(x=[1])], "new_dim")``.

Review comment: That one will give a FutureWarning:

    In [3]: xr.concat([da.isel(x=[0]), da.isel(x=[1])], "new_dim")
    <ipython-input-3-8d3fee24c8e4>:1: FutureWarning: In a future version of xarray the default value for join will change from join='outer' to join='exact'. This change will result in the following ValueError: cannot be aligned with join='exact' because index/labels/sizes are not equal along these coordinates (dimensions): 'x' ('x',) The recommendation is to set join explicitly for this case.
      xr.concat([da.isel(x=[0]), da.isel(x=[1])], "new_dim")
    Out[3]:
    <xarray.DataArray (new_dim: 2, x: 2, y: 3)> Size: 96B
    array([[[ 0.,  1.,  2.],
            [nan, nan, nan]],
           [[nan, nan, nan],
            [ 3.,  4.,  5.]]])
    Coordinates:
      * x        (x) <U1 8B 'a' 'b'
      * y        (y) int64 24B 10 20 30
    Dimensions without coordinates: new_dim

We can add an explicit join value to get rid of the warning, or we can allow the docs to build with the warning (I think that is not a good idea because warnings in docs might scare people).

Review comment: Compared with the example as it is on main:

    In [3]: xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
    <ipython-input-8-5e17a4052d18>:1: FutureWarning: In a future version of xarray the default value for coords will change from coords='different' to coords='minimal'. This is likely to lead to different results when multiple datasets have matching variables with overlapping values. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True)` or set coords explicitly.
      xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
    Out[3]:
    <xarray.DataArray (new_dim: 2, y: 3)> Size: 48B
    array([[0, 1, 2],
           [3, 4, 5]])
    Coordinates:
        x        (new_dim) <U1 8B 'a' 'b'
      * y        (y) int64 24B 10 20 30
    Dimensions without coordinates: new_dim

Review comment: If we keep this as suggested in the PR I'd go with

    da0 = da.isel(x=0, drop=True)
    da1 = da.isel(x=1, drop=True)
 The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
 :py:class:`~xarray.DataArray` object as well as a string, in which case it is
 used to label the values along the new dimension:

 .. ipython:: python

-    xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim"))
+    xr.concat([da0, da1], pd.Index([-90, -100], name="new_dim"))

Review comment: Same here.
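Independent of the diff itself, the two ``concat`` variants being discussed can be sketched end to end. This is a minimal, hedged reconstruction: the toy ``da`` below is an assumption standing in for the array the docs page builds earlier.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy array shaped like the docs example: x labels 'a'/'b', y labels 10/20/30.
# (Assumed here; the real docs define `da` earlier on the page.)
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    [("x", ["a", "b"]), ("y", [10, 20, 30])],
)

# Dropping the scalar "x" coordinate before concatenating avoids the
# overlapping-coordinate case that triggers the FutureWarning.
da0 = da.isel(x=0, drop=True)
da1 = da.isel(x=1, drop=True)

stacked = xr.concat([da0, da1], "new_dim")
labeled = xr.concat([da0, da1], pd.Index([-90, -100], name="new_dim"))

print(dict(stacked.sizes))        # {'new_dim': 2, 'y': 3}
print(labeled["new_dim"].values)  # [ -90 -100]
```

Because the overlapping ``x`` coordinate is dropped up front, the result is the same under both the old and the proposed new kwarg defaults.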
 Of course, ``concat`` also works on ``Dataset`` objects:
@@ -75,6 +77,12 @@ between datasets. With the default parameters, xarray will load some coordinate
 variables into memory to compare them between datasets. This may be prohibitively
 expensive if you are manipulating your dataset lazily using :ref:`dask`.

+.. note::
+
+    In a future version of xarray the default values for many of these options
+    will change. You can opt into the new default values early using
+    ``xr.set_options(use_new_combine_kwarg_defaults=True)``.

 .. _merge:

 Merge
@@ -94,10 +102,18 @@ If you merge another dataset (or a dictionary including data array objects), by
 default the resulting dataset will be aligned on the **union** of all index
 coordinates:

+.. note::
+
+    In a future version of xarray the default value for ``join`` and ``compat``
+    will change. This change will mean that xarray will no longer attempt
+    to align the indices of the merged dataset. You can opt into the new default
+    values early using ``xr.set_options(use_new_combine_kwarg_defaults=True)``.
+    Or explicitly set ``join='outer'`` to preserve old behavior.

 .. ipython:: python

     other = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")})
-    xr.merge([ds, other])
+    xr.merge([ds, other], join="outer")
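The union alignment that the explicit ``join='outer'`` preserves can be checked with a self-contained sketch (the toy ``ds`` below is an assumption standing in for the docs' dataset):

```python
import xarray as xr

ds = xr.Dataset({"foo": ("x", [1, 2, 3])}, coords={"x": list("abc")})
other = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")})

# Spelling out join='outer' keeps the union alignment even after the
# default flips to join='exact', which would raise for these unequal
# 'x' indexes.
merged = xr.merge([ds, other], join="outer")
print(dict(merged.sizes))  # {'x': 4}
```

Note that ``foo`` is padded with ``NaN`` at the label it lacks, which is also why its dtype is promoted to float.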
 This ensures that ``merge`` is non-destructive. ``xarray.MergeError`` is raised
 if you attempt to merge two variables with the same name but different values:
@@ -114,6 +130,16 @@ if you attempt to merge two variables with the same name but different values:
     array([[ 1.4691123 ,  0.71713666, -0.5090585 ],
            [-0.13563237,  2.21211203,  0.82678535]])

+.. note::
+
+    In a future version of xarray the default value for ``compat`` will change
+    from ``compat='no_conflicts'`` to ``compat='override'``. In this scenario
+    the values in the first object override all the values in other objects.
+
+    .. ipython:: python
+
+        xr.merge([ds, ds + 1], compat="override")
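A runnable illustration of the ``compat='override'`` behavior described in the note above (toy dataset assumed):

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [10.0, 20.0])}, coords={"x": [1, 2]})

# With compat='override', the first object's values win outright; the
# old default compat='no_conflicts' would raise MergeError here because
# the values of 'a' genuinely conflict.
merged = xr.merge([ds, ds + 1], compat="override")
print(merged["a"].values)  # [10. 20.]
```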
 The same non-destructive merging between ``DataArray`` index coordinates is
 used in the :py:class:`~xarray.Dataset` constructor:
@@ -144,6 +170,11 @@ For datasets, ``ds0.combine_first(ds1)`` works similarly to
 there are conflicting values in variables to be merged, whereas
 ``.combine_first`` defaults to the calling object's values.

+.. note::
+
+    In a future version of xarray the default options for ``xr.merge`` will change
+    such that the behavior matches ``combine_first``.
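Since ``xr.merge`` is slated to move toward ``combine_first`` semantics, a small sketch of what ``combine_first`` actually does may help (toy datasets assumed):

```python
import numpy as np
import xarray as xr

ds0 = xr.Dataset({"a": ("x", [1.0, np.nan])}, coords={"x": [0, 1]})
ds1 = xr.Dataset({"a": ("x", [np.nan, 4.0, 5.0])}, coords={"x": [0, 1, 2]})

# combine_first keeps the calling object's values and only fills its
# missing entries from the argument, aligning on the union of indexes.
out = ds0.combine_first(ds1)
print(out["a"].values)  # [1. 4. 5.]
```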
 .. _update:

 Update
@@ -236,7 +267,7 @@ coordinates as long as any non-missing values agree or are disjoint:

     ds1 = xr.Dataset({"a": ("x", [10, 20, 30, np.nan])}, {"x": [1, 2, 3, 4]})
     ds2 = xr.Dataset({"a": ("x", [np.nan, 30, 40, 50])}, {"x": [2, 3, 4, 5]})
-    xr.merge([ds1, ds2], compat="no_conflicts")
+    xr.merge([ds1, ds2], join="outer", compat="no_conflicts")
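Using the same two datasets as the diff above, the merge with explicit kwargs can be verified directly:

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset({"a": ("x", [10, 20, 30, np.nan])}, {"x": [1, 2, 3, 4]})
ds2 = xr.Dataset({"a": ("x", [np.nan, 30, 40, 50])}, {"x": [2, 3, 4, 5]})

# compat='no_conflicts' accepts values that agree or are missing;
# join='outer' is written out so the result survives the default change.
merged = xr.merge([ds1, ds2], join="outer", compat="no_conflicts")
print(merged["a"].values)  # [10. 20. 30. 40. 50.]
```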
 Note that due to the underlying representation of missing values as floating
 point numbers (``NaN``), variable data type is not always preserved when merging
@@ -295,13 +326,12 @@ they are concatenated in order based on the values in their dimension
 coordinates, not on their position in the list passed to ``combine_by_coords``.

 .. ipython:: python
-    :okwarning:

     x1 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [0, 1, 2])])
     x2 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [3, 4, 5])])
     xr.combine_by_coords([x2, x1])
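The ordering-by-coordinate behavior is easy to check with deterministic data instead of the docs' random arrays (a minimal sketch, not the docs example verbatim):

```python
import numpy as np
import xarray as xr

x1 = xr.DataArray(name="foo", data=np.zeros(3), coords=[("x", [0, 1, 2])])
x2 = xr.DataArray(name="foo", data=np.ones(3), coords=[("x", [3, 4, 5])])

# Even with the inputs out of order, the result is ordered by the
# values of the 'x' coordinate, not by list position.
combined = xr.combine_by_coords([x2, x1])
print(combined["x"].values)  # [0 1 2 3 4 5]
```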
-These functions can be used by :py:func:`~xarray.open_mfdataset` to open many
+These functions are used by :py:func:`~xarray.open_mfdataset` to open many
 files as one dataset. The particular function used is specified by setting the
 argument ``'combine'`` to ``'by_coords'`` or ``'nested'``. This is useful for
 situations where your data is split across many files in multiple locations,
@@ -34,7 +34,7 @@
 )
 from xarray.backends.locks import _get_scheduler
 from xarray.coders import CFDatetimeCoder, CFTimedeltaCoder
-from xarray.core import indexing
+from xarray.core import dtypes, indexing
 from xarray.core.dataarray import DataArray
 from xarray.core.dataset import Dataset
 from xarray.core.datatree import DataTree

@@ -50,6 +50,13 @@
     _nested_combine,
     combine_by_coords,
 )
+from xarray.util.deprecation_helpers import (
+    _COMPAT_DEFAULT,
+    _COORDS_DEFAULT,
+    _DATA_VARS_DEFAULT,
+    _JOIN_DEFAULT,
+    CombineKwargDefault,
+)

 if TYPE_CHECKING:
     try:
@@ -1404,14 +1411,16 @@ def open_mfdataset(
         | Sequence[Index]
         | None
     ) = None,
-    compat: CompatOptions = "no_conflicts",
+    compat: CompatOptions | CombineKwargDefault = _COMPAT_DEFAULT,
     preprocess: Callable[[Dataset], Dataset] | None = None,
     engine: T_Engine = None,
-    data_vars: Literal["all", "minimal", "different"] | list[str] = "all",
-    coords="different",
+    data_vars: Literal["all", "minimal", "different"]
+    | list[str]
+    | CombineKwargDefault = _DATA_VARS_DEFAULT,
+    coords=_COORDS_DEFAULT,
     combine: Literal["by_coords", "nested"] = "by_coords",
     parallel: bool = False,
-    join: JoinOptions = "outer",
+    join: JoinOptions | CombineKwargDefault = _JOIN_DEFAULT,
     attrs_file: str | os.PathLike | None = None,
     combine_attrs: CombineAttrsOptions = "override",
     **kwargs,

Review comment (on the ``coords`` line): I don't know anything about the context and I'm really bad at typing (so feel free to disregard / punt to a different PR), but shouldn't

Review comment: Probably? I was trying to limit the scope of this PR as much as possible, since it's already pretty big. So I would prefer to punt this. When you add types there is always the possibility of breaking a bunch of stuff...
@@ -1656,6 +1665,7 @@ def open_mfdataset(
             ids=ids,
             join=join,
             combine_attrs=combine_attrs,
+            fill_value=dtypes.NA,
         )
     elif combine == "by_coords":
         # Redo ordering from coordinates, ignoring how they were ordered
@@ -1628,7 +1628,14 @@ def _combine(self, applied, shortcut=False):
         if shortcut:
             combined = self._concat_shortcut(applied, dim, positions)
         else:
-            combined = concat(applied, dim)
+            combined = concat(
+                applied,
+                dim,
+                data_vars="all",
+                coords="different",
+                compat="equals",
+                join="outer",
+            )

Review comment: I hard-coded these to the old defaults since there is no way for the user to set them.

Review comment: I agree with this approach. These options result in confusing groupby behaviour (#2145) but we can tackle that later.

         combined = _maybe_reorder(combined, dim, positions, N=self.group1d.size)

         if isinstance(combined, type(self._obj)):
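The effect of pinning those internal kwargs is that the groupby round trip keeps behaving as before via the public API. A minimal sketch of that round trip (the toy array is an assumption):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), coords=[("x", [1, 1, 2, 2])])

# groupby(...).map() recombines the per-group results with concat();
# hard-coding that internal call to the old defaults keeps code like
# this working unchanged while the public defaults migrate.
out = da.groupby("x").map(lambda g: g - g.mean())
print(out.values)  # [-0.5  0.5 -0.5  0.5]
```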
@@ -1789,7 +1796,14 @@ def _combine(self, applied):
         """Recombine the applied objects like the original."""
         applied_example, applied = peek_at(applied)
         dim, positions = self._infer_concat_args(applied_example)
-        combined = concat(applied, dim)
+        combined = concat(
+            applied,
+            dim,
+            data_vars="all",
+            coords="different",
+            compat="equals",
+            join="outer",
+        )
         combined = _maybe_reorder(combined, dim, positions, N=self.group1d.size)
         # assign coord when the applied function does not return that coord
         if dim not in applied_example.dims: