Raise NotImplementedError for groupby.agg if duplicate columns would be created #17956
Conversation
I'm approving, deferring to you on whether to add a warning in the non-pandas_compatible case as well.
Do we need to add similar logic for duplicate columns in a dataframe itself? i.e. to prevent something like
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> df.rename({'b': 'a'}, axis=1)
or any other way that duplicate names could manifest? That could be done in another PR of course.
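If such a check were added, it could look roughly like the sketch below. The helper name and call site are assumptions for illustration only, not cudf's actual implementation (plain pandas is used here for brevity):

>>> from collections import Counter
>>> import pandas as pd
>>>
>>> def check_duplicate_columns(columns, pandas_compatible=False):
...     # Hypothetical helper: refuse duplicate column labels in
...     # pandas-compatible mode instead of silently accepting them.
...     duplicates = [name for name, count in Counter(columns).items() if count > 1]
...     if duplicates and pandas_compatible:
...         raise NotImplementedError(
...             f"Duplicate column labels are not supported: {duplicates}"
...         )
...
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
>>> renamed = df.rename({'b': 'a'}, axis=1)  # columns are now ['a', 'a']
>>> check_duplicate_columns(renamed.columns, pandas_compatible=True)  # raises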
for values in aggs.values()
):
# In non pandas_compatible mode, we would just drop the duplicate aggregation.
# Should we issue a UserWarning?
Are you asking if we should start issuing a warning in non-pandas_compatible mode, i.e. in an else clause? I would support that.
Yup, exactly. Thanks, I added a UserWarning in non-pandas_compatible mode.
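To summarize the pattern settled on here, a rough sketch is below; the function name and exact messages are assumptions, not the merged code. The idea is to raise in pandas-compatible mode and warn otherwise:

>>> import warnings
>>>
>>> def check_duplicate_aggs(aggs, pandas_compatible):
...     # aggs maps a column label to its aggregation(s), e.g. {"b": ["sum", "sum"]}.
...     has_duplicates = any(
...         len(values) != len(set(values))
...         for values in aggs.values()
...         if isinstance(values, (list, tuple))
...     )
...     if not has_duplicates:
...         return
...     if pandas_compatible:
...         # Duplicate aggregations would create duplicate result columns,
...         # so refuse instead of silently dropping one of them.
...         raise NotImplementedError(
...             "Duplicate aggregations per column are not supported."
...         )
...     warnings.warn(
...         "Duplicate aggregations were requested and will be dropped.",
...         UserWarning,
...     )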
Yeah, we have some checks like this in the codebase already. The harder-to-catch cases are when we're preprocessing column-related operations with mappings.
/merge
Description
xref #17649
For cudf.pandas, we will dispatch to pandas instead of silently dropping the duplicate column (see the sketch below).
Checklist
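For context, a minimal sketch of the user-facing behavior this PR targets; the data and exact error message are assumptions, and the option name is the documented "mode.pandas_compatible":

>>> import cudf
>>>
>>> cudf.set_option("mode.pandas_compatible", True)
>>> df = cudf.DataFrame({"a": [1, 1, 2], "b": [4, 5, 6]})
>>>
>>> # Requesting the same aggregation twice for one column would otherwise
>>> # produce duplicate result columns, which cudf does not support.
>>> try:
...     df.groupby("a").agg({"b": ["sum", "sum"]})
... except NotImplementedError as exc:
...     print(exc)
...
>>> # With pandas compatibility off, the duplicate aggregation is dropped
>>> # and a UserWarning is emitted instead.
>>> cudf.set_option("mode.pandas_compatible", False)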