concat and append ignore level names and order in multi-level index DataFrames #28311

@jxrossel

Code Sample

>>> import pandas as pd
>>> tmp = pd.DataFrame( { 't': [0,0], 'h': [1,2], 'k': [3,4] } )
>>> tmp
   t  h  k
0  0  1  3
1  0  2  4
>>> tmp2 = tmp.set_index( ['h','k'] )
>>> tmp2
     t
h k
1 3  0
2 4  0
>>> tmp3 = tmp.set_index( ['k','h'] ) # swap levels
>>> tmp3
     t
k h
3 1  0
4 2  0
>>> tmp2.append( tmp3, sort=False ) # k-level values are appended to h-level values and vice-versa
     t
h k
1 3  0
2 4  0
3 1  0
4 2  0
>>> pd.concat( [ tmp2, tmp3 ], sort=False )
     t
h k
1 3  0
2 4  0
3 1  0
4 2  0
>>> tmp4 = pd.DataFrame( { 't': [ 0,0], 'h': [1,2], 'p': [3,4] } ).set_index( ['p','h'] ) # change level name
>>> tmp4
     t
p h
3 1  0
4 2  0
>>> tmp2.append( tmp4, sort=False ) # level name ignored altogether
     t
h k
1 3  0
2 4  0
3 1  0
4 2  0

Problem description

MultiIndex level names are ignored during concatenation (as described in #10187). In addition, even when both frames share the same level names, the level order is ignored: levels are matched by position, not by name.
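
For the level-order case, one possible workaround is to reorder the second frame's levels by name before concatenating. The sketch below assumes both frames carry the same set of level names. `align_index_levels` is a hypothetical helper, not part of pandas; it relies only on `DataFrame.reorder_levels`.

```python
import pandas as pd


def align_index_levels(frame, reference):
    """Hypothetical helper: reorder frame's index levels to match reference.

    Raises ValueError when the two frames do not share the same level names,
    since reordering alone cannot reconcile differently named levels.
    """
    ref_names = list(reference.index.names)
    if set(frame.index.names) != set(ref_names):
        raise ValueError(
            f"index level names differ: {list(frame.index.names)} vs {ref_names}"
        )
    return frame.reorder_levels(ref_names)


tmp = pd.DataFrame({"t": [0, 0], "h": [1, 2], "k": [3, 4]})
tmp2 = tmp.set_index(["h", "k"])
tmp3 = tmp.set_index(["k", "h"])  # same names, swapped order

# Reorder tmp3 to (h, k) first, so values land under the matching level name.
combined = pd.concat([tmp2, align_index_levels(tmp3, tmp2)], sort=False)
print(combined)
```

With this step the `h` values of `tmp3` stay under `h` in the result. A frame whose levels are named differently (such as `p` instead of `k`) is rejected rather than silently merged by position.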

I am not saying this is a bug, but since frame columns are aligned by name at concatenation, it is a bit of a surprise that index levels are not. Should a warning be added to the docs of both concat and append?
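
Because columns are aligned by name, another way to get name-based behaviour is to move the index levels into columns, concatenate, and rebuild the index. This is a minimal sketch, not a recommendation from the pandas maintainers; `concat_by_level_name` is a hypothetical name introduced here for illustration.

```python
import pandas as pd


def concat_by_level_name(frames, index_names):
    """Hypothetical helper: concatenate frames, matching index levels by name.

    Each frame's index is moved into ordinary columns so that pd.concat's
    column alignment applies, then the requested levels are restored.
    """
    flat = [frame.reset_index() for frame in frames]
    combined = pd.concat(flat, ignore_index=True, sort=False)
    return combined.set_index(list(index_names))


tmp = pd.DataFrame({"t": [0, 0], "h": [1, 2], "k": [3, 4]})
tmp2 = tmp.set_index(["h", "k"])
tmp3 = tmp.set_index(["k", "h"])  # same names, swapped order

result = concat_by_level_name([tmp2, tmp3], index_names=["h", "k"])
print(result)
```

If a frame lacks one of the requested level names, that level is filled with missing values for its rows, which at least makes the mismatch visible instead of mixing values from different levels.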

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.3
dateutil : 2.7.2
pip : 9.0.3
setuptools : 39.0.1
Cython : 0.28.1
pytest : 3.5.0
hypothesis : None
sphinx : 1.7.2
blosc : 1.5.1
feather : 0.4.0
xlsxwriter : 1.0.2
lxml.etree : 4.2.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 6.2.1
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : 0.1.4
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.0.2
numexpr : 2.6.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.9.0
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.2.5
tables : 3.4.2
xarray : 0.10.2
xlrd : 1.1.0
xlwt : None
xlsxwriter : 1.0.2
