Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

# Two 3x4 frames whose last column holds the merge key 0, 1, 2
a = np.random.rand(3, 4)
a[:, -1] = range(3)
b = np.random.rand(3, 4)
b[:, -1] = range(3)
# The 1-tuple ("x",) is padded with NaN on the second level
dfa = pd.DataFrame(a, columns=pd.MultiIndex.from_tuples([("n", "a"), ("n", "b"), ("n", "c"), ("x",)]))
dfb = pd.DataFrame(b, columns=pd.MultiIndex.from_tuples([("m", "a"), ("m", "b"), ("m", "c"), ("x",)]))
pd.merge(dfa, dfb, how="outer", on="x")
yields: "ValueError: The column label 'x' is not unique"
Problem description
It would be cumbersome and degrade readability to write on=("x", np.nan) instead of on="x", especially since a real-world example would look more like on=("x", np.nan, np.nan, np.nan, np.nan, np.nan).
I think an easy solution would be to add, at https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L1384, something like:
if values.ndim == 2 and values.shape[0] == 1: return values[0]
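To make the suggested collapse concrete, here is a minimal NumPy-only sketch. The helper name `collapse_single_column` is hypothetical and is not pandas code. It also assumes the values arrive in (rows, columns) layout, so a lone column shows up as `shape[1] == 1`; the snippet above checks `shape[0]`, and which axis applies at the linked line has not been verified here.

```python
import numpy as np


def collapse_single_column(values):
    """Hypothetical helper: flatten a 2-D array that has exactly one column."""
    # Assumption: layout is (rows, columns), so one column means shape[1] == 1.
    if values.ndim == 2 and values.shape[1] == 1:
        return values[:, 0]
    # Anything else (already 1-D, or genuinely several columns) is left alone.
    return values


single = collapse_single_column(np.arange(3.0).reshape(3, 1))
several = collapse_single_column(np.zeros((3, 2)))
print(single.shape, several.shape)
```

With a check like this, a key that matches exactly one column would yield 1-D values, while a key that matches several columns would still be 2-D and could keep raising the "not unique" error.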
But actually I have been wondering for a long time why MultiIndex automatically fills in np.nan for unknown values, but, when selecting, treats '' (empty string) as "skippable" levels.
(My wording is probably not very clear, but Expected Output should clarify my point)
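The two behaviours described above can be seen in a small sketch. The variable names are illustrative only, and the result may vary across pandas versions: a short tuple gets a missing value on the second level, whereas a column padded with '' can be selected by its first level alone and comes back as a Series.

```python
import numpy as np
import pandas as pd

# A 1-tuple is padded: the second level of ("x",) is a missing value.
nan_padded = pd.MultiIndex.from_tuples([("n", "a"), ("x",)])
second_level = nan_padded.get_level_values(1)
print(pd.isna(second_level[1]))

# With '' as padding, selecting by the first level alone skips the empty level.
empty_padded = pd.MultiIndex.from_tuples([("n", "a"), ("x", "")])
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=empty_padded)
selected = df["x"]
print(type(selected).__name__, selected.name)
```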
Expected Output
the same as with
dfa = pd.DataFrame(a, columns=pd.MultiIndex.from_tuples([("n", "a"), ("n", "b"), ("n", "c"), ("x", "")]))
dfb = pd.DataFrame(b, columns=pd.MultiIndex.from_tuples([("m", "a"), ("m", "b"), ("m", "c"), ("x", "")]))
pd.merge(dfa, dfb, how="outer", on="x")
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-119-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.0.dev0+38.g6552718
pytest: 2.8.7
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None