String dtype: disallow specifying the 'str' dtype with storage in [..] in string alias #60661
This gives a definitively wrong result, so I think that puts it solidly in the bugfix camp, not a breaking change.
If we really cared, we could convert that during the pickle read, although I don't think that would be a blocker. Generally, using pickle to move from one environment to another is discouraged.
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the … If these instructions are inaccurate, feel free to suggest an improvement.
…] in string alias (pandas-dev#60661) (cherry picked from commit 7415aca)
Manual backport -> #60715
The intention was for the new default "str" dtype to not include the storage in the string alias, and so to also not allow constructing it that way (this is discussed in the PDEP). It is also implemented that way, as you can see when calling the extension dtype API directly: constructing the dtype from such an alias is rejected.
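A minimal sketch of that direct call, assuming pandas is installed. `try_string_dtype_alias` is a hypothetical helper introduced here for illustration only; it is not part of pandas or of this PR. It calls `StringDtype.construct_from_string` directly, bypassing the registry, and returns the exception instead of raising so the outcome can be inspected.

```python
import pandas as pd


def try_string_dtype_alias(alias):
    """Hypothetical helper: build a StringDtype from a string alias.

    Returns the dtype on success, or the TypeError that pandas raised.
    """
    try:
        return pd.StringDtype.construct_from_string(alias)
    except TypeError as exc:
        return exc


# The storage-qualified "str[...]" alias is not accepted by StringDtype itself.
result = try_string_dtype_alias("str[pyarrow]")
print(type(result).__name__, result)
```

The plain `"string[pyarrow]"` alias is a different spelling and is handled by `StringDtype` when pyarrow is available; only the `"str[...]"` form is in question here.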
However, when specifying this as a dtype argument in e.g. constructors (going through pandas_dtype(...), which goes through the extension dtype registry), this "accidentally" kind of works, but gives an unexpected result: an ArrowDtype rather than the "str" dtype. I think it is confusing that it does work in the case of the pyarrow storage, but then gives a different dtype than what you would typically expect.
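The registry path can be probed with a sketch like the following, which assumes pandas is installed. `resolve_alias` is a hypothetical helper, not a pandas function. The outcome depends on the environment: on versions without this change and with pyarrow installed, the alias is expected to resolve to an `ArrowDtype`; with this change applied, the call is expected to fail. The broad `except` is deliberate, since the exact exception type is an assumption that may vary with the pandas version and with whether pyarrow is present.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


def resolve_alias(alias):
    """Hypothetical helper: resolve a dtype alias the way constructors do.

    Returns the resolved dtype, or the exception raised while resolving.
    """
    try:
        return pandas_dtype(alias)
    except Exception as exc:  # exact type depends on version / pyarrow
        return exc


resolved = resolve_alias("str[pyarrow]")
if isinstance(resolved, pd.ArrowDtype):
    print("resolved to ArrowDtype:", resolved)
else:
    print("alias rejected:", type(resolved).__name__, resolved)
```

In neither case does the alias resolve to a `StringDtype`, which is the mismatch described above.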
So I would rather just disallow this case (which is what this PR does), although this is a small breaking change for people currently using dtype="str[pyarrow]" to get the ArrowDtype.
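For anyone who relied on that alias, one way to request the ArrowDtype explicitly is sketched below. This is an illustration under assumptions, not a migration path stated in the PR: it assumes pyarrow is available (the guard skips the example otherwise) and constructs the dtype object directly instead of going through a string alias.

```python
import pandas as pd

try:
    import pyarrow as pa
except ImportError:
    pa = None  # pyarrow is an optional dependency of pandas

if pa is not None:
    # Build the ArrowDtype explicitly from a pyarrow type.
    arrow_dtype = pd.ArrowDtype(pa.string())
    ser = pd.Series(["a", "b"], dtype=arrow_dtype)
    print(type(ser.dtype).__name__, ser.dtype)
```

Passing the dtype object makes the intent unambiguous, because it does not depend on how a string alias is resolved by the registry.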