(fix): extension array indexers #9671

Open
wants to merge 187 commits into
base: main
7b5f323
implement default_precision_timestamp, refactor coding/times.py and c…
kmuehlbauer Oct 10, 2024
8784f33
align tests with new time resolution behaviour
kmuehlbauer Oct 10, 2024
b45ab23
timedelta decoding, fsspec handling
kmuehlbauer Oct 10, 2024
39086ef
fixes in coding/times.py
kmuehlbauer Oct 13, 2024
df49a40
add docs on time coding
kmuehlbauer Oct 13, 2024
adb8ca3
attempt fixing doc tests
kmuehlbauer Oct 13, 2024
266b1ed
fix issue where out-of-bounds floating point values slipped in the pr…
kmuehlbauer Oct 14, 2024
6d5f13b
convert to UTC first before stripping of tz in _unpack_time_units_and…
kmuehlbauer Oct 14, 2024
5d68bfe
reorganize pandas compatibility code, remove unneeded code, attempt t…
kmuehlbauer Oct 14, 2024
07bba69
another attempt to finally fix mypy
kmuehlbauer Oct 14, 2024
6e7f0bb
refactor out _check_date_is_after_shift
kmuehlbauer Oct 14, 2024
b4a49bb
refactor out _maybe_strip_tz_from_timestamp
kmuehlbauer Oct 14, 2024
2e1ff4f
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
d5a7da0
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
821b68d
minor fix in time-coding.rst
kmuehlbauer Oct 14, 2024
d066edf
set default resolution to "s", which actually means, use pandas lowes…
kmuehlbauer Oct 14, 2024
ed22da1
Add section for default units, fix options
kmuehlbauer Oct 14, 2024
8bf23f4
attempt to fix typing
kmuehlbauer Oct 14, 2024
c3a2b39
attempt to fix typing
kmuehlbauer Oct 14, 2024
3c44aed
fix scalar datetime/timedelta
kmuehlbauer Oct 15, 2024
48be73a
fix user docs
kmuehlbauer Oct 15, 2024
7ac9983
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
d86ad04
Fix variable tests, mostly datetime/timedelta is initialized with us…
kmuehlbauer Oct 18, 2024
b5d0795
revert changes in _possible_convert_objects, this needs to be checked…
kmuehlbauer Oct 18, 2024
60324f0
fix doc link
kmuehlbauer Oct 18, 2024
c2bc4df
(fix): allow all extension array data types in pandas adapters
ilan-gold Oct 23, 2024
84569bc
(fix): dataframes have no `array` attr
ilan-gold Oct 23, 2024
90e390d
(fix): allow chunked numpy extension arrays because of `test_pandas_a…
ilan-gold Oct 24, 2024
7c32bd0
(fix): dtypes for `PandasIndex`
ilan-gold Oct 24, 2024
795ecf6
(chore): remove test for unnecessary conversion
ilan-gold Oct 24, 2024
8eca6e9
(revert): don't let through so much in `as_compatible_data`
ilan-gold Oct 24, 2024
fb91812
(fix): account for series -> numpy conversions
ilan-gold Oct 25, 2024
a06f2b1
(fix): ensure dtype check is for numpy type
ilan-gold Oct 25, 2024
14027e8
(fix): convert pandas `IntervalArray`
ilan-gold Oct 25, 2024
a47a96f
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Oct 25, 2024
6f2861a
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
1f07500
Apply suggestions from code review
kmuehlbauer Nov 8, 2024
798b444
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
f487599
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 16, 2024
20d6c9d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2024
7391948
remove outdated description
kmuehlbauer Nov 16, 2024
308091c
use set instead list
kmuehlbauer Nov 16, 2024
5f40b4e
remove global option
kmuehlbauer Nov 16, 2024
2a65d8d
mypy thinks `unit` is Literal, because the pandas-stubs suggest so, b…
kmuehlbauer Nov 17, 2024
43f7d61
ignore mypy arg-type
kmuehlbauer Nov 17, 2024
59934b9
fix docstring of `default_precision_timestamp`
kmuehlbauer Nov 17, 2024
a01f9f3
add 'time_unit'-kwarg to decode_cf and descendent functions with "ns"…
kmuehlbauer Nov 17, 2024
8b91128
fix tests
kmuehlbauer Nov 17, 2024
0e351ca
fix more tests
kmuehlbauer Nov 17, 2024
07a8e9c
fix docstring
kmuehlbauer Nov 17, 2024
2be5739
use pd.Timestamp(np.datetime64(cftime)) to convert from cftime to numpy
kmuehlbauer Nov 17, 2024
b9d0a8e
use dt = np.datetime64(cftime.isoformat()) to convert from cftime to …
kmuehlbauer Nov 18, 2024
08afc3b
fix time-coding.rst
kmuehlbauer Nov 18, 2024
edc55e1
use us in to_datetimeindex
kmuehlbauer Nov 18, 2024
bffe919
revert back to us for datetimeindex tests
kmuehlbauer Nov 18, 2024
150b982
estimate fitting resolution for floating point values, when decoding …
kmuehlbauer Nov 18, 2024
7113ceb
add test
kmuehlbauer Nov 18, 2024
7f47f0b
refactor floating point decoding
kmuehlbauer Nov 18, 2024
512808d
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 18, 2024
63c83f4
simplify recursive function, update tests
kmuehlbauer Nov 18, 2024
0efbbeb
more refactoring, update tests
kmuehlbauer Nov 19, 2024
2910250
add fixture, apply fixture to more tests.
kmuehlbauer Nov 19, 2024
57d8d72
update time-coding.rst
kmuehlbauer Nov 19, 2024
5333240
fix typing
kmuehlbauer Nov 19, 2024
6f35c81
try to fix test, remove stale print
kmuehlbauer Nov 19, 2024
d0c17a4
another attempt to fix test
kmuehlbauer Nov 19, 2024
b2b6bb1
debug failing test
kmuehlbauer Nov 19, 2024
5dbc8a7
refactor cftime fallback in datetime decoding
kmuehlbauer Nov 21, 2024
be0d3e0
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
f95408a
fix merge collision
kmuehlbauer Nov 21, 2024
609e15c
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 21, 2024
ec7f165
use CFDatetimeCoder instance to transport unit/use_cftime
kmuehlbauer Nov 22, 2024
1f1cf1c
decode_times with CFDatetimeCoder
kmuehlbauer Nov 25, 2024
14b1a88
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2024
05627dd
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 25, 2024
e7cbf3a
fix mypy, warning/error
kmuehlbauer Nov 26, 2024
fc87e04
api, docs, docstrings
kmuehlbauer Nov 26, 2024
9ae645e
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 26, 2024
6e3ca57
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
277d1c6
docs, whats-new.rst
kmuehlbauer Nov 27, 2024
81a9d94
fix whats-new.rst
kmuehlbauer Nov 27, 2024
be8642f
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 27, 2024
f3f62e5
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 2, 2024
c07df41
Merge remote-tracking branch 'origin/main' into any-time-resolution-2
kmuehlbauer Dec 10, 2024
ae49850
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
e8f5aa8
Merge branch 'main' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
9653a01
fix tests after merge
kmuehlbauer Dec 10, 2024
a405f03
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Dec 10, 2024
503b313
(fix): `dtype` type handling
ilan-gold Dec 11, 2024
c8ab8f3
(fix): move out of type checking block
ilan-gold Dec 11, 2024
66e5b06
(fix): satisfy mypy
ilan-gold Dec 11, 2024
f9fde3a
(fix): doctest
ilan-gold Dec 11, 2024
8a3e834
(fix): `nbytes` test?
ilan-gold Dec 11, 2024
f5822fd
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 12, 2024
66c0b9f
Apply suggestions from code review
kmuehlbauer Dec 13, 2024
ba51274
provide CFDatetimeCoder from xarray.coders
kmuehlbauer Dec 13, 2024
3ba3e3f
provide CFDatetimeCoder from xarray.coders
kmuehlbauer Dec 13, 2024
1ab43eb
provide CFDatetimeCoder from xarray.coders
kmuehlbauer Dec 13, 2024
45ba9d3
fix tests as suggested by code review
kmuehlbauer Dec 13, 2024
091a90d
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 14, 2024
ab3c9ed
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2024
53fe43a
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 16, 2024
a16a890
Move scalar handling logic into `_possibly_convert_objects` as sugges…
kmuehlbauer Dec 16, 2024
4283f8a
Add note on ``proleptic_gregorian`` calendar
kmuehlbauer Dec 16, 2024
0ba848d
remove time_resolution from docstring
kmuehlbauer Dec 16, 2024
6cb8702
update time.coding.rst wrt default time unit
kmuehlbauer Dec 16, 2024
5de8d0d
fix empty array
kmuehlbauer Dec 16, 2024
fc985d9
revert some tests to align with scalar logic handling
kmuehlbauer Dec 16, 2024
799b750
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 16, 2024
a2d8e69
split out CFDatetimeCoder into coders, deprecate use_cftime as keywor…
kmuehlbauer Nov 22, 2024
d6fe956
add whats-new.rst entry
kmuehlbauer Dec 17, 2024
bd6a5d1
Apply suggestions from code review
kmuehlbauer Dec 17, 2024
6557ef9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 17, 2024
759fb72
fix warning
kmuehlbauer Dec 17, 2024
2118191
fix docstrings
kmuehlbauer Dec 17, 2024
262295a
try fix typing
kmuehlbauer Dec 17, 2024
941c4b5
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Dec 18, 2024
6e41425
Merge branch 'main' into coders
kmuehlbauer Dec 18, 2024
adebafa
Apply suggestions from code review
kmuehlbauer Dec 30, 2024
6cd81e5
Apply suggestions from code review
kmuehlbauer Dec 30, 2024
1ae9a22
Merge branch 'main' into coders
kmuehlbauer Dec 30, 2024
1cec644
Update xarray/conventions.py
kmuehlbauer Dec 30, 2024
60aed87
Merge branch 'main' into coders
kmuehlbauer Jan 1, 2025
797dc85
Merge branch 'main' into any-time-resolution-2-wip
kmuehlbauer Jan 1, 2025
225c5b3
remove duplicate function (introduced when merging main)
kmuehlbauer Jan 1, 2025
33a1563
Update deprecated directive
kmuehlbauer Jan 2, 2025
4efe8b0
merge main into any-time-resolution-2
kmuehlbauer Jan 3, 2025
21a0ec6
Merge branch 'main' into coders
kmuehlbauer Jan 3, 2025
48dea20
merge coders into any-time-resolution-2
kmuehlbauer Jan 3, 2025
1145f4b
fix typing
kmuehlbauer Jan 3, 2025
a9990cf
re-fix doctests
kmuehlbauer Jan 3, 2025
5fa630f
merge main into any-time-resolution-2
kmuehlbauer Jan 4, 2025
43c85d1
fix whats-new.rst after merging main
kmuehlbauer Jan 4, 2025
a4702d6
Apply suggestions from code review
kmuehlbauer Jan 4, 2025
9bd292a
Apply suggestions from code review
kmuehlbauer Jan 4, 2025
25b797e
rewrite recursive function using for-loop
kmuehlbauer Jan 5, 2025
3bd8cf4
remove astype-construct in _possibly_convert_objects
kmuehlbauer Jan 5, 2025
8b9c85a
Update xarray/coding/times.py
kmuehlbauer Jan 5, 2025
2555d89
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Jan 7, 2025
3b2d861
add suggestions from code review
kmuehlbauer Jan 7, 2025
66e181c
rephrase per suggestion
kmuehlbauer Jan 7, 2025
e380968
add article per suggestion
kmuehlbauer Jan 7, 2025
305938c
Apply suggestions from code review
kmuehlbauer Jan 7, 2025
b32b02c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 7, 2025
a2c46b1
fix scalar handling for timedelta based indexer
kmuehlbauer Jan 7, 2025
fa2c4b6
remove stale error message and "ignore:Converting non-default" in tes…
kmuehlbauer Jan 7, 2025
c65c9af
add per review suggestions
kmuehlbauer Jan 7, 2025
21dffc1
add/remove todo
kmuehlbauer Jan 7, 2025
8eeeb78
rename timeunit -> format
kmuehlbauer Jan 7, 2025
7ad2183
return "ns" resolution per default for timedeltas, if not specified
kmuehlbauer Jan 7, 2025
9e4cab6
Be specific on types/dtypes
kmuehlbauer Jan 7, 2025
5964a9e
add comment
kmuehlbauer Jan 7, 2025
308391d
add suggestions from code review
kmuehlbauer Jan 7, 2025
0e886d6
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Jan 7, 2025
80dc10b
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Jan 7, 2025
d494fe0
fix docs
kmuehlbauer Jan 8, 2025
ef6f722
fix test which isn't run for numpy2 atm
kmuehlbauer Jan 8, 2025
4ea5241
add notes on to_datetime section, update examples showing usage of 'a…
kmuehlbauer Jan 8, 2025
151e9cd
use np.timedelta64 for to_timedelta example, update as_unit example, …
kmuehlbauer Jan 8, 2025
8ecda4e
remove note
kmuehlbauer Jan 8, 2025
2bbf0ff
Apply suggestions from code review
kmuehlbauer Jan 8, 2025
0308672
refactor timedelta decoding to _numbers_to_timedelta and re-use it w…
kmuehlbauer Jan 9, 2025
b043020
fix conventions test, add todo
kmuehlbauer Jan 9, 2025
7182ce2
run times through pd.Timestamp to catch possible overflows
kmuehlbauer Jan 9, 2025
470235e
fix tests for cftime_to_nptime
kmuehlbauer Jan 9, 2025
e619a4c
fix cftime_to_nptime in cftimeindex
kmuehlbauer Jan 9, 2025
700e78d
introduce pd.Timestamp instance check
kmuehlbauer Jan 9, 2025
4525ea1
warn if out-of-bound datetimes are encoded with standard calendar, fa…
kmuehlbauer Jan 9, 2025
0b93dbd
fix time-coding.rst, add reference to time-series.rst.
kmuehlbauer Jan 9, 2025
b38cd7e
try to fix typing, ignore one
kmuehlbauer Jan 9, 2025
a2d1c96
try to fix docs
kmuehlbauer Jan 9, 2025
c4b2af3
revert doc-changes
kmuehlbauer Jan 9, 2025
45a0d56
Add a non-ns test for polyval, polyfit
dcherian Jan 9, 2025
3ef79cd
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Jan 10, 2025
ac719e8
more doc cosmetics
kmuehlbauer Jan 10, 2025
5292569
add whats-new.rst entry
kmuehlbauer Jan 10, 2025
ecd603b
add/fix coder docstring
kmuehlbauer Jan 10, 2025
f6716dc
add xr.date_range example as suggested per review
kmuehlbauer Jan 10, 2025
0556376
Apply suggestions from code review
kmuehlbauer Jan 13, 2025
ffc1828
Implement `time_unit` option for `decode_cf_timedelta` (#3)
spencerkclark Jan 13, 2025
eaf3c73
fix typing
kmuehlbauer Jan 13, 2025
1e6ba18
use nanmin/nanmax, catch numpy RuntimeWarnings
kmuehlbauer Jan 13, 2025
85a340b
Apply suggestions from code review
spencerkclark Jan 14, 2025
9d77885
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Jan 15, 2025
db69b63
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Jan 15, 2025
b120917
Merge branch 'any-time-resolution-2' into ig/fix_extension_indexer
ilan-gold Jan 15, 2025
f7cda22
merge
ilan-gold Jan 15, 2025
2 changes: 1 addition & 1 deletion xarray/core/dataarray.py
@@ -6875,7 +6875,7 @@ def groupby(
[[nan, nan, nan],
[ 3., 4., 5.]]])
Coordinates:
* x_bins (x_bins) object 16B (5, 15] (15, 25]
* x_bins (x_bins) interval[int64, right] 16B (5, 15] (15, 25]
* letters (letters) object 16B 'a' 'b'
Dimensions without coordinates: y

2 changes: 1 addition & 1 deletion xarray/core/dataset.py
@@ -10510,7 +10510,7 @@ def groupby(
<xarray.Dataset> Size: 128B
Dimensions: (y: 3, x_bins: 2, letters: 2)
Coordinates:
* x_bins (x_bins) object 16B (5, 15] (15, 25]
* x_bins (x_bins) interval[int64, right] 16B (5, 15] (15, 25]
* letters (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
Data variables:
7 changes: 6 additions & 1 deletion xarray/core/indexes.py
@@ -598,7 +598,10 @@ def __init__(
self.index = index
self.dim = dim

if coord_dtype is None:
if pd.api.types.is_extension_array_dtype(index.dtype):
cast(pd.api.extensions.ExtensionDtype, index.dtype)
coord_dtype = index.dtype
elif coord_dtype is None:
coord_dtype = get_valid_numpy_dtype(index)
self.coord_dtype = coord_dtype

@@ -695,6 +698,8 @@ def concat(

if not indexes:
coord_dtype = None
elif len(set(idx.coord_dtype for idx in indexes)) == 1:
coord_dtype = indexes[0].coord_dtype
else:
coord_dtype = np.result_type(*[idx.coord_dtype for idx in indexes])
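A minimal sketch of the dtype selection in the `concat` hunk above, assuming the intent is to pass identical dtypes through untouched so that pandas extension dtypes never reach `np.result_type`, which is designed for NumPy dtypes. The helper name `common_coord_dtype` is hypothetical and not part of xarray.

```python
import numpy as np
import pandas as pd


def common_coord_dtype(dtypes):
    """Hypothetical stand-in for the dtype selection in ``concat``."""
    if not dtypes:
        return None
    # Identical dtypes (NumPy or pandas extension) are passed through as-is.
    if len(set(dtypes)) == 1:
        return dtypes[0]
    # Otherwise fall back to NumPy promotion, which expects NumPy dtypes.
    return np.result_type(*dtypes)


cat = pd.CategoricalDtype(["a", "b"])
same = common_coord_dtype([cat, cat])
promoted = common_coord_dtype([np.dtype("int32"), np.dtype("float64")])
```

In this sketch, two equal categorical dtypes collapse to a single set element, so the extension dtype is returned unchanged, while mixed NumPy dtypes still go through ordinary promotion.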

54 changes: 40 additions & 14 deletions xarray/core/indexing.py
@@ -10,10 +10,11 @@
from dataclasses import dataclass, field
from datetime import timedelta
from html import escape
from typing import TYPE_CHECKING, Any, overload
from typing import TYPE_CHECKING, Any, cast, overload

import numpy as np
import pandas as pd
from numpy.typing import DTypeLike
from packaging.version import Version

from xarray.core import duck_array_ops
@@ -33,8 +34,6 @@
from xarray.namedarray.pycompat import array_type, integer_types, is_chunked_array

if TYPE_CHECKING:
from numpy.typing import DTypeLike

from xarray.core.indexes import Index
from xarray.core.types import Self
from xarray.core.variable import Variable
@@ -1707,27 +1706,44 @@ class PandasIndexingAdapter(ExplicitlyIndexedNDArrayMixin):
__slots__ = ("_dtype", "array")

array: pd.Index
_dtype: np.dtype
_dtype: np.dtype | pd.api.extensions.ExtensionDtype

def __init__(self, array: pd.Index, dtype: DTypeLike = None):
def __init__(
self,
array: pd.Index,
dtype: DTypeLike | pd.api.extensions.ExtensionDtype | None = None,
):
from xarray.core.indexes import safe_cast_to_index

self.array = safe_cast_to_index(array)

if dtype is None:
self._dtype = get_valid_numpy_dtype(array)
if pd.api.types.is_extension_array_dtype(array.dtype):
cast(pd.api.extensions.ExtensionDtype, array.dtype)
self._dtype = array.dtype
else:
self._dtype = get_valid_numpy_dtype(array)
elif pd.api.types.is_extension_array_dtype(dtype):
self._dtype = cast(pd.api.extensions.ExtensionDtype, dtype)
else:
self._dtype = np.dtype(dtype)
self._dtype = np.dtype(cast(DTypeLike, dtype))

@property
def dtype(self) -> np.dtype:
def dtype(self) -> np.dtype | pd.api.extensions.ExtensionDtype: # type: ignore[override]
return self._dtype

def __array__(
self, dtype: np.typing.DTypeLike = None, /, *, copy: bool | None = None
self,
dtype: np.typing.DTypeLike | pd.api.extensions.ExtensionDtype | None = None,
/,
*,
copy: bool | None = None,
) -> np.ndarray:
if dtype is None:
dtype = self.dtype
if pd.api.types.is_extension_array_dtype(dtype):
dtype = get_valid_numpy_dtype(self.array)
dtype = cast(np.dtype, dtype)
array = self.array
if isinstance(array, pd.PeriodIndex):
with suppress(AttributeError):
@@ -1746,7 +1762,7 @@ def get_duck_array(self) -> np.ndarray:
def shape(self) -> _Shape:
return (len(self.array),)

def _convert_scalar(self, item):
def _convert_scalar(self, item) -> np.ndarray:
if item is pd.NaT:
# work around the impossibility of casting NaT with asarray
# note: it probably would be better in general to return
@@ -1762,7 +1778,10 @@ def _convert_scalar(self, item):
# numpy fails to convert pd.Timestamp to np.datetime64[ns]
item = np.asarray(item.to_datetime64())
elif self.dtype != object:
item = np.asarray(item, dtype=self.dtype)
dtype = self.dtype
if pd.api.types.is_extension_array_dtype(dtype):
dtype = get_valid_numpy_dtype(self.array)
item = np.asarray(item, dtype=cast(np.dtype, dtype))

# as for numpy.ndarray indexing, we always want the result to be
# a NumPy array.
@@ -1877,23 +1896,30 @@ class PandasMultiIndexingAdapter(PandasIndexingAdapter):
__slots__ = ("_dtype", "adapter", "array", "level")

array: pd.MultiIndex
_dtype: np.dtype
_dtype: np.dtype | pd.api.extensions.ExtensionDtype
level: str | None

def __init__(
self,
array: pd.MultiIndex,
dtype: DTypeLike = None,
dtype: DTypeLike | pd.api.extensions.ExtensionDtype | None = None,
level: str | None = None,
):
super().__init__(array, dtype)
self.level = level

def __array__(
self, dtype: np.typing.DTypeLike = None, /, *, copy: bool | None = None
self,
dtype: DTypeLike | pd.api.extensions.ExtensionDtype | None = None,
/,
*,
copy: bool | None = None,
) -> np.ndarray:
if dtype is None:
dtype = self.dtype
if pd.api.types.is_extension_array_dtype(dtype):
dtype = get_valid_numpy_dtype(self.array)
dtype = cast(np.dtype, dtype)
if self.level is not None:
return np.asarray(
self.array.get_level_values(self.level).values, dtype=dtype
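For readers following the `PandasIndexingAdapter` changes, here is a rough sketch of the dtype resolution that the new `__init__` appears to perform. It is an illustration under assumptions, not xarray's code: `pick_adapter_dtype` is a hypothetical name, and `np.asarray(index).dtype` is only a stand-in for xarray's internal `get_valid_numpy_dtype`.

```python
import numpy as np
import pandas as pd


def pick_adapter_dtype(index, dtype=None):
    """Hypothetical sketch of the dtype resolution in the adapter's __init__."""
    if dtype is None:
        # Keep pandas extension dtypes (categorical, interval, ...) as-is.
        if pd.api.types.is_extension_array_dtype(index.dtype):
            return index.dtype
        # Stand-in for xarray's get_valid_numpy_dtype(index).
        return np.asarray(index).dtype
    if pd.api.types.is_extension_array_dtype(dtype):
        return dtype
    return np.dtype(dtype)


cat_index = pd.CategoricalIndex(list("abcd"))
int_index = pd.Index(np.array([1, 2, 3], dtype="int64"))

cat_dtype = pick_adapter_dtype(cat_index)
int_dtype = pick_adapter_dtype(int_index)
forced = pick_adapter_dtype(int_index, "float32")
```

The point of the sketch is that the extension dtype is only swapped for a NumPy dtype at conversion time (as in `__array__` and `_convert_scalar`), while the adapter itself reports the original pandas dtype.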
10 changes: 4 additions & 6 deletions xarray/core/variable.py
@@ -13,7 +13,6 @@
import numpy as np
import pandas as pd
from numpy.typing import ArrayLike
from pandas.api.types import is_extension_array_dtype

import xarray as xr # only for Dataset and DataArray
from xarray.core import common, dtypes, duck_array_ops, indexing, nputils, ops, utils
@@ -412,6 +411,10 @@ def data(self):
if is_duck_array(self._data):
return self._data
elif isinstance(self._data, indexing.ExplicitlyIndexed):
if pd.api.types.is_extension_array_dtype(self._data) and isinstance(
self._data, PandasIndexingAdapter
):
return self._data.array
return self._data.get_duck_array()
else:
return self.values
@@ -2593,11 +2596,6 @@ def chunk( # type: ignore[override]
dask.array.from_array
"""

if is_extension_array_dtype(self):
raise ValueError(
f"{self} was found to be a Pandas ExtensionArray. Please convert to numpy first."
)

if from_array_kwargs is None:
from_array_kwargs = {}

8 changes: 6 additions & 2 deletions xarray/namedarray/core.py
@@ -17,6 +17,7 @@
)

import numpy as np
import pandas as pd

# TODO: get rid of this after migrating this class to array API
from xarray.core import dtypes, formatting, formatting_html
@@ -834,7 +835,10 @@ def chunk(
if chunkmanager.is_chunked_array(data_old):
data_chunked = chunkmanager.rechunk(data_old, chunks) # type: ignore[arg-type]
else:
if not isinstance(data_old, ExplicitlyIndexed):
if pd.api.types.is_extension_array_dtype(data_old.dtype):
# One of PandasExtensionArray or PandasIndexingAdapter?
ndata = data_old.array.to_numpy()
elif not isinstance(data_old, ExplicitlyIndexed):
ndata = data_old
else:
# Unambiguously handle array storage backends (like NetCDF4 and h5py)
@@ -845,7 +849,7 @@ def chunk(
# Using OuterIndexer is a pragmatic choice: dask does not yet handle
# different indexing types in an explicit way:
# https://github.com/dask/dask/issues/2883
ndata = ImplicitToExplicitIndexingAdapter(data_old, OuterIndexer) # type: ignore[assignment]
ndata = ImplicitToExplicitIndexingAdapter(data_old, OuterIndexer)

if is_dict_like(chunks):
chunks = tuple(chunks.get(n, s) for n, s in enumerate(ndata.shape))
2 changes: 1 addition & 1 deletion xarray/plot/dataarray_plot.py
@@ -524,7 +524,7 @@ def line(
assert hueplt is not None
ax.legend(handles=primitive, labels=list(hueplt.to_numpy()), title=hue_label)

if np.issubdtype(xplt.dtype, np.datetime64):
if isinstance(xplt.dtype, np.dtype) and np.issubdtype(xplt.dtype, np.datetime64):
_set_concise_date(ax, axis="x")

_update_axes(ax, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim)
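The guard added in `line` can be illustrated with a small sketch, assuming its purpose is to keep pandas extension dtypes away from `np.issubdtype`, which is meant for NumPy dtypes. The helper name `is_numpy_datetime` is hypothetical.

```python
import numpy as np
import pandas as pd


def is_numpy_datetime(dtype):
    """Hypothetical helper: True only for NumPy datetime64 dtypes."""
    # The isinstance check short-circuits before np.issubdtype sees a
    # pandas extension dtype.
    return isinstance(dtype, np.dtype) and np.issubdtype(dtype, np.datetime64)


naive = pd.date_range("2000-01-01", periods=3)
aware = naive.tz_localize("UTC")
categories = pd.CategoricalIndex(list("abc"))

checks = {
    "naive": is_numpy_datetime(naive.dtype),
    "aware": is_numpy_datetime(aware.dtype),
    "categorical": is_numpy_datetime(categories.dtype),
}
```

Under this sketch, a timezone-aware index carries a pandas `DatetimeTZDtype` rather than a NumPy dtype, so it is not treated as `datetime64` by the check.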
21 changes: 18 additions & 3 deletions xarray/tests/test_dataset.py
@@ -4317,8 +4317,13 @@ def test_setitem_pandas(self) -> None:
ds = self.make_example_math_dataset()
ds["x"] = np.arange(3)
ds_copy = ds.copy()
ds_copy["bar"] = ds["bar"].to_pandas()

series = ds["bar"].to_pandas()
# to_pandas will actually give the result where the internal array of the series is a NumpyExtensionArray
# but ds["bar"] is a numpy array.
# TODO: should assert_equal be updated to handle?
assert (ds["bar"] == series).all()
del ds["bar"]
del ds_copy["bar"]
assert_equal(ds, ds_copy)

def test_setitem_auto_align(self) -> None:
@@ -4909,6 +4914,16 @@ def test_to_and_from_dataframe(self) -> None:
expected = pd.DataFrame([[]], index=idx)
assert expected.equals(actual), (expected, actual)

def test_from_dataframe_categorical_dtype_index(self) -> None:
cat = pd.CategoricalIndex(list("abcd"))
df = pd.DataFrame({"f": [0, 1, 2, 3]}, index=cat)
ds = df.to_xarray()
restored = ds.to_dataframe()
df.index.name = (
"index" # restored gets the name because it has the coord with the name
)
pd.testing.assert_frame_equal(df, restored)

def test_from_dataframe_categorical_index(self) -> None:
cat = pd.CategoricalDtype(
categories=["foo", "bar", "baz", "qux", "quux", "corge"]
@@ -4933,7 +4948,7 @@ def test_from_dataframe_categorical_index_string_categories(self) -> None:
)
ser = pd.Series(1, index=cat)
ds = ser.to_xarray()
assert ds.coords.dtypes["index"] == np.dtype("O")
assert ds.coords.dtypes["index"] == ser.index.dtype

@requires_sparse
def test_from_dataframe_sparse(self) -> None:
3 changes: 2 additions & 1 deletion xarray/tests/test_groupby.py
@@ -1118,7 +1118,8 @@ def test_groupby_math_nD_group() -> None:
expected = da.isel(x=slice(30)) - expanded_mean
expected["labels"] = expected.labels.broadcast_like(expected.labels2d)
expected["num"] = expected.num.broadcast_like(expected.num2d)
expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data[idxr])
# mean.num2d_bins.data is a pandas IntervalArray so needs to be put in `numpy` to allow indexing
expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data.to_numpy()[idxr])
actual = g - mean
assert_identical(expected, actual)

31 changes: 15 additions & 16 deletions xarray/tests/test_variable.py
@@ -649,7 +649,7 @@ def test_pandas_categorical_dtype(self):
data = pd.Categorical(np.arange(10, dtype="int64"))
v = self.cls("x", data)
print(v) # should not error
assert v.dtype == "int64"
assert v.dtype == data.dtype

def test_pandas_datetime64_with_tz(self):
data = pd.date_range(
@@ -660,9 +660,12 @@ def test_pandas_datetime64_with_tz(self):
)
v = self.cls("x", data)
print(v) # should not error
if "America/New_York" in str(data.dtype):
# pandas is new enough that it has datetime64 with timezone dtype
assert v.dtype == "object"
if v.dtype == np.dtype("O"):
import dask.array as da

assert isinstance(v.data, da.Array)
else:
assert v.dtype == data.dtype

def test_multiindex(self):
idx = pd.MultiIndex.from_product([list("abc"), [0, 1]])
@@ -1578,14 +1581,6 @@ def test_pandas_categorical_dtype(self):
print(v) # should not error
assert pd.api.types.is_extension_array_dtype(v.dtype)

def test_pandas_categorical_no_chunk(self):
data = pd.Categorical(np.arange(10, dtype="int64"))
v = self.cls("x", data)
with pytest.raises(
ValueError, match=r".*was found to be a Pandas ExtensionArray.*"
):
v.chunk((5,))

def test_squeeze(self):
v = Variable(["x", "y"], [[1]])
assert_identical(Variable([], 1), v.squeeze())
@@ -2400,8 +2395,8 @@ def test_pad(self, mode, xr_arg, np_arg):

def test_pandas_categorical_dtype(self):
data = pd.Categorical(np.arange(10, dtype="int64"))
with pytest.raises(ValueError, match="was found to be a Pandas ExtensionArray"):
self.cls("x", data)
v = self.cls("x", data)
assert (v.data.compute() == data.to_numpy()).all()


@requires_sparse
@@ -2996,7 +2991,7 @@ def test_datetime_conversion(values, unit) -> None:
# todo: check for redundancy (suggested per review)
dims = ["time"] if isinstance(values, np.ndarray | pd.Index | pd.Series) else []
var = Variable(dims, values)
if var.dtype.kind == "M":
if var.dtype.kind == "M" and isinstance(var.dtype, np.dtype):
assert var.dtype == np.dtype(f"datetime64[{unit}]")
else:
# The only case where a non-datetime64 dtype can occur currently is in
@@ -3038,8 +3033,12 @@ def test_pandas_two_only_datetime_conversion_warnings(
# todo: check for redundancy (suggested per review)
var = Variable(["time"], data.astype(dtype)) # type: ignore[arg-type]

if var.dtype.kind == "M":
# we internally convert series to numpy representations to avoid too much nastiness with extension arrays
# when calling data.array e.g., with NumpyExtensionArrays
if isinstance(data, pd.Series):
assert var.dtype == np.dtype("datetime64[s]")
elif var.dtype.kind == "M":
assert var.dtype == dtype
else:
# The only case where a non-datetime64 dtype can occur currently is in
# the case that the variable is backed by a timezone-aware