BUG?: creating Categorical from pandas Index/Series with "object" dtype infers string #61778

Open
@jorisvandenbossche

When creating a pandas Series/Index/DataFrame, I think we generally differentiate between passing a pandas object with object dtype and a numpy array with object dtype:

>>> import numpy as np
>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> pd.Index(pd.Series(["foo", "bar", "baz"], dtype="object"))
Index(['foo', 'bar', 'baz'], dtype='object')
>>> pd.Index(np.array(["foo", "bar", "baz"], dtype="object"))
Index(['foo', 'bar', 'baz'], dtype='str')

So for pandas objects we preserve the dtype, whereas a numpy array of object dtype is essentially treated as a sequence of Python objects for which we infer the dtype (@jbrockmendel, is that also your understanding?).

But for Categorical, the dtype of a pandas object does not seem to be preserved:

>>> pd.options.future.infer_string = True
>>> pd.Categorical(pd.Series(["foo", "bar", "baz"], dtype="object"))
['foo', 'bar', 'baz']
Categories (3, str): [bar, baz, foo]   # <--- categories inferred as str

So do we want to preserve the dtype for the categories here as well?

Labels

Categorical (Categorical Data Type), Dtype Conversions (Unexpected or buggy dtype conversions)
