Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
IMO it would be an API improvement for pandas if creating dataframes/series/arrays using dtype=str (and dtype="str") returned a dataframe/series/array of dtype StringDtype instead of dtype object. The reason is that, IMO, in 99.9% of cases where users instantiate using dtype=str, they would have preferred to use dtype="string" and therefore have the guarantee that the array actually contains only strings (and NAs).
This would be similar to how instantiating with dtype=int currently gives dtype np.int64 and dtype=float gives np.float64.
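To make the comparison concrete, here is a minimal sketch of the four cases. The variable names are only for illustration, and the result for dtype=str depends on the installed pandas version (object at the time of writing), so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

# Python builtin types are resolved to concrete NumPy dtypes.
s_int = pd.Series([1, 2], dtype=int)          # typically int64
s_float = pd.Series([1.0, 2.0], dtype=float)  # float64

# dtype=str: the case this issue is about (object at the time of writing).
s_str = pd.Series(["a", "b"], dtype=str)

# dtype="string": explicitly requests the StringDtype extension type.
s_string = pd.Series(["a", "b"], dtype="string")

for name, s in [("int", s_int), ("float", s_float),
                ("str", s_str), ('"string"', s_string)]:
    print(f"dtype={name}: {s.dtype!r}")
```

Under the proposal, the third line of output would show the same dtype as the fourth.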
The above proposal would be backwards incompatible, and it is too late to introduce deprecations in pandas 1.x now. However, could it become a breaking change as part of the jump to pandas 2.0, similar to the backwards-incompatible changes already listed in #44823?
Feature Description
Basically, it would just change the dtype resolution function to return a StringDtype instead of the current behavior, so it should be reasonably simple to implement.
Alternative Solutions
The alternative would be to keep the current behavior in pandas 2.0.
Additional Context
No response