Description
What happened?
combine_by_coords fails to reject the following case, where the coordinates [0, 1] and [1, 2] overlap in a single value:
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})
xarray.combine_by_coords([a, b])
Instead of raising a ValueError it returns a DataArray with duplicate coordinates, despite correctly rejecting cases where the overlap is larger (e.g. [0, 1, 2] and [1, 2, 3]). See the full example below.
What did you expect to happen?
I expected combine_by_coords to explicitly reject all cases where the coordinates overlap, producing a ValueError both for the single-value overlap above ([0, 1] and [1, 2]) and for larger overlaps such as [0, 1, 2] and [1, 2, 3].
Minimal Complete Verifiable Example
import numpy as np
import xarray

# This overlap is caught, as expected
a = xarray.DataArray(dims=('x',), data=np.ones((3,)), coords={'x': [0, 1, 2]})
b = xarray.DataArray(dims=('x',), data=np.ones((3,)), coords={'x': [1, 2, 3]})
xarray.combine_by_coords([a, b])
=> ValueError: Resulting object does not have monotonic global indexes along dimension x
# This overlap is not caught
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})
xarray.combine_by_coords([a, b])
=> <xarray.DataArray (x: 4)>
array([1., 1., 1., 1.])
Coordinates:
* x (x) int64 0 1 1 2
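As a quick extra check (not part of the original example; it assumes the a and b from the second snippet above), the silently returned result really does contain a duplicated coordinate value:

combined = xarray.combine_by_coords([a, b])
print(combined.indexes['x'].duplicated())
# => [False False  True False] -- the repeated value 1 is flagged as a duplicate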
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
No response
Anything else we need to know?
As far as I can tell this happens because indexes.is_monotonic_increasing / indexes.is_monotonic_decreasing are not checking for strict monotonicity and allow consecutive values to be the same.
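To illustrate with plain pandas (a minimal sketch, not taken from the xarray sources; the combined index in the example above is a pandas Index):

import pandas as pd

idx = pd.Index([0, 1, 1, 2])  # the combined 'x' index from the example above
print(idx.is_monotonic_increasing)                    # True -- non-strict, equal neighbours allowed
print(idx.is_monotonic_increasing and idx.is_unique)  # False -- a strict check would reject this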
I assume it wasn't intentional to allow overlaps like this. If so, do you think anyone is depending on this (I'd hope not...) and would you take a PR to fix it to produce a ValueError in this case?
If this behaviour is intentional or relied upon, could we have an option to do a strict check instead?
Also, for performance reasons, I'd propose doing some extra upfront checks to catch index overlap (e.g. in _infer_concat_order_from_coords), rather than doing a potentially large concat and only detecting duplicates afterwards.
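For illustration only, a rough standalone sketch of the kind of upfront check I have in mind; the helper name and its placement are hypothetical, not existing xarray internals:

import itertools
import pandas as pd

def check_no_index_overlap(indexes, dim):
    # indexes: one pandas.Index per object being combined along `dim`
    for i, j in itertools.combinations(range(len(indexes)), 2):
        overlap = indexes[i].intersection(indexes[j])
        if len(overlap):
            raise ValueError(
                f"cannot combine objects with overlapping indexes along "
                f"dimension {dim!r}: {list(overlap)}"
            )

check_no_index_overlap([pd.Index([0, 1]), pd.Index([1, 2])], 'x')
# => ValueError: cannot combine objects with overlapping indexes along dimension 'x': [1]

A real implementation would presumably avoid the quadratic pairwise loop for large numbers of objects, but the point is that the error can be raised before any data is concatenated.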
Environment
xarray: 2022.06.0
pandas: 1.1.5
numpy: 1.23.2
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.2.1
Nio: None
zarr: 2.7.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.4
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 0.7.4
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
IPython: 3.2.3
sphinx: None