API: should dtype=str return array of dtype StringDtype for pandas 2.0? #49398

@topper-123

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

IMO it would be an API improvement for pandas if creating DataFrames/Series/arrays using dtype=str (and dtype="str") returned a DataFrame/Series/array of dtype StringDtype instead of dtype object. The reason is that, IMO, in 99.9% of cases where users instantiate using dtype=str, they would have preferred to use dtype="string" and thereby get the guarantee that the array actually contains only strings (and NAs).
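To illustrate the difference, here is a minimal sketch. It assumes a pandas version where dtype=str still resolves to object, as described in this issue; the variable names are only for illustration.

```python
import pandas as pd

# dtype=str: at the time of this issue the result is object dtype,
# so nothing stops a non-string from being assigned later.
s_str = pd.Series(["a", "b"], dtype=str)
print(s_str.dtype)

# dtype="string": the result is StringDtype, which validates assignments.
s_string = pd.Series(["a", "b"], dtype="string")
print(s_string.dtype)

try:
    s_string[0] = 1  # not a string, so StringDtype is expected to reject it
    rejected = False
except (TypeError, ValueError):  # the exception type has varied between versions
    rejected = True
print("non-string rejected by StringDtype:", rejected)
```

The exact exception raised on the invalid assignment depends on the pandas version and string storage backend, which is why both types are caught here.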

This would be similar to how instantiating with dtype=int currently gives dtype np.int64, and dtype=float gives np.float64.
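For comparison, a small sketch of the existing int/float resolution. Note that the integer width is an assumption about the platform: it is int64 on most 64-bit systems, but older NumPy builds on Windows may resolve the builtin int to int32.

```python
import numpy as np
import pandas as pd

# The Python builtins are resolved to concrete NumPy dtypes.
ints = pd.Series([1, 2, 3], dtype=int)
floats = pd.Series([1, 2, 3], dtype=float)

print(ints.dtype)    # a concrete NumPy integer dtype, int64 on most 64-bit platforms
print(floats.dtype)  # float64
```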

The above proposal would be backwards incompatible, and it is too late to introduce deprecations in pandas 1.x now. However, could it become a breaking change as part of the jump to version 2.0 of pandas, similar to the backwards-incompatible changes already listed in #44823?

Feature Description

Basically, it would just change the dtype resolution function to return a StringDtype instead of the current object dtype, so it should be reasonably simple to implement.
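A rough sketch of what that resolution step could look like. This is not the actual pandas internals: `resolve_dtype_proposed` is a hypothetical wrapper around the public `pandas.api.types.pandas_dtype` helper, written only to show the idea.

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype


def resolve_dtype_proposed(dtype):
    """Hypothetical resolver: map str / "str" to StringDtype, defer otherwise."""
    if dtype is str or (isinstance(dtype, str) and dtype == "str"):
        return pd.StringDtype()
    # Everything else keeps going through the existing public resolver.
    return pandas_dtype(dtype)


print(resolve_dtype_proposed(str))
print(resolve_dtype_proposed("str"))
print(resolve_dtype_proposed(float))
```

Only the two spellings named in the proposal are special-cased here; every other input resolves exactly as it does today.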

Alternative Solutions

The alternative would be to keep the current behavior in pandas 2.0.

Additional Context

No response

Metadata

Labels: API Design, Closing Candidate, Strings