Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> from cyberpandas import IPArray
>>> import pandas as pd
>>>
>>> df1 = pd.DataFrame({
... 'address': IPArray(['192.168.1.1', '192.168.1.10']),
... 'date': ['2022-01-01', '2022-01-02'],
... 'a': [1, 2]
... })
>>> df1 = df1.set_index(['address', 'date'])
>>>
>>>
>>> df2 = pd.DataFrame({
... 'address': IPArray(['192.168.1.1', '192.168.1.10']),
... 'date': pd.to_datetime(['2022-01-01', '2022-01-02']),
... 'a': [1, 2]
... })
>>> df2 = df2.set_index(['address', 'date'])
>>>
>>> df1.index.dtypes
address ip
date object
dtype: object
>>>
>>> df2.index.dtypes
address ip
date datetime64[ns]
dtype: object
>>>
>>> df1.index.union(df2.index).dtypes
address object # <-- should be type "ip", not "object"
date datetime64[ns]
dtype: object
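To check whether the problem is specific to cyberpandas, a minimal variant of the example above can swap IPArray for pandas' built-in nullable Int64 extension dtype. This is an assumption: Int64 only stands in for the "ip" dtype, and the two addresses are encoded as their integer values. The script prints the level dtypes after union() instead of asserting them, because the outcome may depend on the pandas version.

```python
import pandas as pd

# Assumption: pandas' nullable "Int64" dtype stands in for cyberpandas' "ip" dtype.
# The integers encode 192.168.1.1 and 192.168.1.10.
addresses = pd.array([3232235777, 3232235786], dtype="Int64")

df1 = pd.DataFrame(
    {"address": addresses, "date": ["2022-01-01", "2022-01-02"], "a": [1, 2]}
).set_index(["address", "date"])

df2 = pd.DataFrame(
    {
        "address": addresses,
        "date": pd.to_datetime(["2022-01-01", "2022-01-02"]),
        "a": [1, 2],
    }
).set_index(["address", "date"])

# The "address" level of both inputs carries the extension dtype.
print(df1.index.dtypes)
print(df2.index.dtypes)

# Inspect whether the extension dtype survives the union (may vary by pandas version).
result = df1.index.union(df2.index)
print(result.dtypes)
```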
Issue Description
The extension dtype of a MultiIndex level can get lost when two MultiIndex objects are combined with .union(). This also becomes a problem with df.combine_first(...), which relies on index.union(...).
The problem occurs when both MultiIndexes have the same extension-array (EA) level, but the other level (in this two-level example) has a different dtype in each index. In that case, the EA level of the resulting MultiIndex loses its extension dtype and falls back to object.
Expected Behavior
The extension dtype should be preserved by index.union(...).
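Until union() preserves the dtype, one possible workaround is to cast the affected level back after the set operation. The helper restore_level_dtypes below is hypothetical (not part of pandas); it assumes every value in the listed level can be converted back to the original dtype, and it has not been tested with cyberpandas' "ip" dtype. The demo uses Int64 as a stand-in.

```python
import pandas as pd


def restore_level_dtypes(index: pd.MultiIndex, dtypes: dict) -> pd.MultiIndex:
    """Return a copy of `index` with the named levels cast to the given dtypes.

    Hypothetical workaround helper, not a pandas API. Levels whose names are not
    keys of `dtypes` are kept unchanged. Assumes each listed level's values can be
    converted to the target dtype.
    """
    arrays = []
    for position, name in enumerate(index.names):
        # Use the level position so unnamed or duplicate-named levels still work.
        values = index.get_level_values(position)
        if name in dtypes:
            values = values.astype(dtypes[name])
        arrays.append(values)
    return pd.MultiIndex.from_arrays(arrays, names=index.names)


# Simulate a result whose "address" level no longer has the extension dtype.
broken = pd.MultiIndex.from_arrays(
    [
        [3232235777, 3232235786],
        pd.to_datetime(["2022-01-01", "2022-01-02"]),
    ],
    names=["address", "date"],
)
fixed = restore_level_dtypes(broken, {"address": "Int64"})
print(fixed.dtypes)
```

Applied to the report, this would mean running combined = df1.combine_first(df2) and then combined.index = restore_level_dtypes(combined.index, {"address": df1.index.dtypes["address"]}), provided the extension type supports converting the fallback values back.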
Installed Versions
INSTALLED VERSIONS
commit : e8093ba
python : 3.8.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.3.2
pip : 22.1.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None