BUG: DataFrame.rank does not return EA types when original type was an EADtype #52829

@tinadu0806

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa
s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object

Issue Description

When a DataFrame is backed by pyarrow data, calling df.rank(method="min") returns a result that is no longer Arrow-backed: the column comes back as float64. Series.rank() does not have this problem; its result is still an Arrow-backed Series (uint64[pyarrow] in the example above).

Incorrect:

>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object

Correct:

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]

Expected Behavior

DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame's columns are pyarrow-backed, matching what Series.rank already does.

Installed Versions

>>> pd.__version__
'2.0.0'
