Skip to content

pivot_table args index and observed don't play well #28372

@luranhe

Description

@luranhe

Code Sample, a copy-pastable example if possible

cat_df = pd.DataFrame()
cat_df['letters'] = ['a', 'b', 'c']
cat_df['letters'] = cat_df['letters'].astype('category')
cat_df['nums'] = [0, 1, 2]
cat_df['nums'] = cat_df['nums'].astype('category')
cat_df['count'] = 1
# when I set an index, I don't get all the columns!
cat_df.iloc[:2].pivot_table(aggfunc=sum, columns=['nums'], values='count', index='letters')

Problem description

I would expect the default behavior of pivot_table to yield precisely 1 column per category of a categorical grouper when pivoting over a single categorical column. This was not the case when I passed in an index -- I only ended up with the observed categories for columns, even when passing observed=False explicitly!

I am working on an application where I want to pivot with an index and group by multiple columns (e.g. column A with categories a1, a2, ..., column B with categories b1, b2, ..., etc.), and I was surprised to discover that if any combination of categories (e.g. a1, b1, ...) didn't appear in the same row, I would not get a corresponding column for it.

As an aside, I would guess that the correct behavior would be to have all the nans in the output from above by 0, but that is seems important.

Expected Output

nums    0  1  2
letters
a        1.0  0.0  0.0
b        0.0  1.0  0.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.133-88.112.amzn1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

Metadata

Metadata

Assignees

Labels

BugCategoricalCategorical Data TypeReshapingConcat, Merge/Join, Stack/Unstack, Explode

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions