Skip to content

pivot_table with MultiIndex and margin=True fails with ValueError #29834

@kurtforrester

Description

@kurtforrester

Code Sample, a copy-pastable example if possible

import string
import random
from datetime import datetime, timezone, date
import pandas as pd
import numpy as np

size_ = 100000
data_ = dict()

for n in range(2):
  cat_ = ["".join(cc) for cc in np.random.choice(
      [c for c in string.ascii_lowercase], size=(4,6))]
  data_.update({("categ", f"cdata.{n:02}"): 
                    pd.Categorical(
                        np.random.choice(cat_, size=size_,
                                         p=[.1, .5, .25, .15]),
                                   categories=cat_, 
                                   ordered=random.choice([True, False]))})

for n in range(2):
  data_.update({("float", f"fdata.{n:02}"): np.random.rand(size_)})

data_.update({
    ("period", "created"): np.random.choice(
        [date(2019,2,1), date(2019,5,1), date(2019,8,1), date(2019,11,1)],
        size=size_)})

_df = pd.DataFrame(data=data_)

# When margin=False the output is as expected
_df.pivot_table(
    values=[("float", "fdata.00"), ("float", "fdata.01")],
    index=[("categ", "cdata.00"), ("categ", "cdata.01")],
    columns=[("period", "created")],
    aggfunc=np.sum,
#    margins=True,
)

# When margins=True a ValueError is raised
_df.pivot_table(
    values=[("float", "fdata.00"), ("float", "fdata.01")],
    index=[("categ", "cdata.00"), ("categ", "cdata.01")],
    columns=[("period", "created")],
    aggfunc=np.sum,
    margins=True,
)

# This seems to be related to MultiIndex and Categorical but I am unsure of which.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-169-e6e29487da7e> in <module>()
      4     columns=[("period", "created")],
      5     aggfunc=np.sum,
----> 6     margins=True,
      7 )

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
   6087             dropna=dropna,
   6088             margins_name=margins_name,
-> 6089             observed=observed,
   6090         )
   6091 

/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed)
    160             observed=dropna,
    161             margins_name=margins_name,
--> 162             fill_value=fill_value,
    163         )
    164 

/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/pivot.py in _add_margins(table, data, values, rows, cols, aggfunc, observed, margins_name, fill_value)
    206     if hasattr(table, "columns"):
    207         for level in table.columns.names[1:]:
--> 208             if margins_name in table.columns.get_level_values(level):
    209                 raise ValueError(msg)
    210 

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/multi.py in get_level_values(self, level)
   1596         Index(['d', 'e', 'f'], dtype='object', name='level_2')
   1597         """
-> 1598         level = self._get_level_number(level)
   1599         values = self._get_level_values(level)
   1600         return values

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/multi.py in _get_level_number(self, level)
   1290         if (count > 1) and not is_integer(level):
   1291             raise ValueError(
-> 1292                 "The name %s occurs multiple times, use a " "level number" % level
   1293             )
   1294         try:

ValueError: The name None occurs multiple times, use a level number

Problem description

A ValueError is unexpected. There seems to be an underlying issue with a number of pivot-like functions (pivot, pivot_table, crosstable) when processing MultiIndex and/or Categorical features. I am unsure as to the source of the issue and if this is indeed expected behaviour. The error suggests using a level number but that recommendation is non-specific as to where this would be applied.

Expected Output

I would expect an "All" column to be calculated based on the aggfunc and an error to not have been raised.

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
/usr/local/lib/python3.6/dist-packages/psycopg2/init.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.137+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2018.9
dateutil : 2.6.1
pip : 19.3.1
setuptools : 41.6.0
Cython : 0.29.14
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 5.5.0
pandas_datareader: 0.7.4
bs4 : 4.6.3
bottleneck : 1.3.0
fastparquet : None
gcsfs : None
lxml.etree : 4.2.6
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
s3fs : 0.4.0
scipy : 1.3.2
sqlalchemy : 1.3.11
tables : 3.4.4
xarray : 0.11.3
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugCategoricalCategorical Data TypeReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions