-
-
Couldn't load subscription status.
- Fork 19.2k
Description
Code Sample, a copy-pastable example if possible
cat_df = pd.DataFrame()
cat_df['letters'] = ['a', 'b', 'c']
cat_df['letters'] = cat_df['letters'].astype('category')
cat_df['nums'] = [0, 1, 2]
cat_df['nums'] = cat_df['nums'].astype('category')
cat_df['count'] = 1
# when I set an index, I don't get all the columns!
cat_df.iloc[:2].pivot_table(aggfunc=sum, columns=['nums'], values='count', index='letters')Problem description
I would expect the default behavior of pivot_table to yield precisely 1 column per category of a categorical grouper when pivoting over a single categorical column. This was not the case when I passed in an index -- I only ended up with the observed categories for columns, even when passing observed=False explicitly!
I am working on an application where I want to pivot with an index and group by multiple columns (e.g. column A with categories a1, a2, ..., column B with categories b1, b2, ..., etc.), and I was surprised to discover that if any combination of categories (e.g. a1, b1, ...) didn't appear in the same row, I would not get a corresponding column for it.
As an aside, I would guess that the correct behavior would be to have all the nans in the output from above by 0, but that is seems important.
Expected Output
nums 0 1 2
letters
a 1.0 0.0 0.0
b 0.0 1.0 0.0
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.133-88.112.amzn1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None