[DEPR] deprecate pivot_wider #1263

Merged
merged 6 commits into from
Jun 1, 2023

Conversation

samukweku
Collaborator

@samukweku samukweku commented May 16, 2023

PR Description

Please describe the changes proposed in the pull request:

  • deprecate pivot_wider in favor of pandas pivot
  • minor edits to pivot_longer

This PR relates to #1045 .

PR Checklist

Please ensure that you have done the following:

  1. PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@ericmjl
Member

ericmjl commented May 16, 2023

@samukweku samukweku changed the title deprecate pivot_wider [DEPR] deprecate pivot_wider May 16, 2023
@codecov

codecov bot commented May 16, 2023

Codecov Report

Merging #1263 (4491b1f) into dev (4da9a90) will increase coverage by 10.36%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##              dev    #1263       +/-   ##
===========================================
+ Coverage   87.15%   97.51%   +10.36%     
===========================================
  Files          78       78               
  Lines        3705     3702        -3     
===========================================
+ Hits         3229     3610      +381     
+ Misses        476       92      -384     

Member

@ericmjl ericmjl left a comment

LGTM!

@ericmjl ericmjl merged commit 51f6f4c into dev Jun 1, 2023
samukweku added a commit that referenced this pull request Jun 1, 2023
* simplify logic

* minor updates

* changelog

* fix userwarning on then function

---------

Co-authored-by: Eric Ma <[email protected]>
ericmjl added a commit that referenced this pull request Jul 8, 2023
* skeleton

* partial implementation

* minor edits

* add comments

* fix notebook

* Update conditional_join.ipynb

* add changelog

* updates

* update changelog

* add more tests

* shortcut for monotonic increasing groups

* use 0/1 for counter check for monotonicity

* fix test failure

* [DEPR] deprecate `pivot_wider` (#1263)

* simplify logic

* minor updates

* changelog

* fix userwarning on then function

---------

Co-authored-by: Eric Ma <[email protected]>

* fix range count

* remove irrelevant imports

* fix test fail

* Update CHANGELOG.md

* Update CHANGELOG.md

* fix test failure

* remove caching

* add force parameter

* update docstrings

* fix failing test

* remove strict eq join check

---------

Co-authored-by: samuel.oranyeli <[email protected]>
Co-authored-by: Eric Ma <[email protected]>
@samukweku samukweku deleted the samukweku/pivot_updates branch November 18, 2023 02:12
@fkgruber

fkgruber commented Feb 4, 2025

Too bad you are deprecating pivot_wider; it seems like the more useful function. What is the most efficient way to use pivot to get the same output as pivot_wider? By default you get a MultiIndex, which is very inconvenient.

@samukweku
Collaborator Author

Hi @fkgruber kindly share an example you have and I can show how to deal with it. There is also the collapse_levels function to handle multiindex. We would like to not duplicate existing pandas functions. Open to suggestions/feedback. Thanks
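
To make the MultiIndex concern concrete, here is a minimal sketch on made-up data (the frame and its column names are assumptions, not from this thread). It uses only plain pandas to flatten the column levels by hand; the `collapse_levels` function mentioned above is pyjanitor's tool for the same job and is not called here.

```python
import pandas as pd

# Hypothetical toy data, not taken from the thread.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "key": ["a", "b", "a", "b"],
        "x": [10, 20, 30, 40],
        "y": [1.0, 2.0, 3.0, 4.0],
    }
)

# Pivoting more than one value column yields MultiIndex columns:
# ('x', 'a'), ('x', 'b'), ('y', 'a'), ('y', 'b').
wide = df.pivot(index="id", columns="key", values=["x", "y"])

# Flatten each (value, key) tuple into a single string label,
# then turn the 'id' index back into an ordinary column.
flat = wide.copy()
flat.columns = ["_".join(map(str, col)) for col in flat.columns]
flat = flat.reset_index()

print(flat)
```

The separator and the order of the joined parts are arbitrary choices in this sketch.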

@samukweku
Collaborator Author

@ericmjl @fkgruber I'll write up a series of examples on the alternatives for the deprecated functions on here. Thoughts ?

@fkgruber

fkgruber commented Feb 5, 2025

Here is an example in R. Would love to see the equivalent in pyjanitor using non-deprecated functions

library(tidyverse)
mtcars %>% 
  rownames_to_column() %>%
  group_by(cyl) %>%
  summarise(mean=mean(mpg))%>%
  pivot_wider(names_from = cyl, values_from = mean)%>%
  mutate(Average= mean(c(`6`, `4`, `8`)),
         Diff_4= `4`-Average)

@samukweku
Collaborator Author

samukweku commented Feb 5, 2025

@fkgruber this is a similar approach in pandas

import pandas as pd

# mtcars as CSV, read from a public gist
url = 'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv'
mtcars = pd.read_csv(url)
(mtcars
# mean mpg per cylinder count, one row per cyl
.groupby('cyl', as_index=False)
.agg(mean=('mpg', 'mean'))
# constant key so that pivot returns a single row
.assign(index=1)
.pivot(index='index', columns='cyl', values='mean')
# drop the 'cyl' and 'index' axis labels left behind by pivot
.rename_axis(columns=None, index=None)
.assign(Average=lambda df: df.mean(axis='columns'),
        Diff_4=lambda df: df[4] - df.Average)
.round(2)
)

@fkgruber

fkgruber commented Feb 6, 2025

Thanks,
so with pure pandas, the pivot_wider provided in R becomes 3 calls:
.assign(index=1)
.pivot(index='index', columns='cyl',values='mean')
.rename_axis(columns=None,index=None)
(By the way, I have seen examples where .rename_axis(columns=None,index=None) does not work, so it is not a general solution.)

With janitor and pivot_wider we could do:

(mtcars
 .groupby('cyl', as_index=False)
 .agg(mean=('mpg','mean'))
 .assign(index=1)
 .pivot_wider(index="index",names_from='cyl', values_from='mean')
 .assign(Average=lambda df: df.mean(axis='columns'),
         Diff_4= lambda df: df[4] - df.Average)
)

And if pivot_wider took care of the index as dplyr does, it would be almost the same number of lines as in R.

pivot_wider seems helpful because it reduces extra coding.
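
One way to avoid retyping the pivot, rename_axis, and index-reset steps is to wrap them in a small function and call it through `DataFrame.pipe`. The sketch below is only an illustration: `pivot_wide` is a hypothetical helper, not part of pandas or pyjanitor, and the data is made up. It assumes a single `values_from` column; passing several would produce MultiIndex columns.

```python
import pandas as pd


def pivot_wide(df, index, names_from, values_from):
    """Hypothetical helper (not part of pandas or pyjanitor)."""
    return (
        df.pivot(index=index, columns=names_from, values=values_from)
        .rename_axis(columns=None)  # drop the names_from label on the columns
        .reset_index()              # restore the index as a regular column
    )


# Hypothetical toy data, not taken from the thread.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "key": ["a", "b", "a", "b"],
        "val": [10, 20, 30, 40],
    }
)

out = df.pipe(pivot_wide, index="id", names_from="key", values_from="val")
print(out)
```

The argument names mirror the `pivot_wider` call shown above so the chain reads similarly; they are otherwise arbitrary.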

@samukweku
Collaborator Author

Kindly share an example where rename_axis does not work. We could raise an issue with the pandas team. The assign(index=1) is not required in all cases; tidyverse does not assign the same level of importance that pandas does to index.

Pandas pivot already does a great job and there is no need to repeat it; if there are edge cases, they can be handled (happy to be proven wrong). Pandas does not have pivot_wider_spec or something close to it; pyjanitor has that.

If you have any other scenario where pivot is not sufficient happy to work through it with you
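
As an illustration of the point that `assign(index=1)` is not required in all cases, the single-row result can also be reached by transposing the grouped means. This is a sketch under stated assumptions: the small frame below is made-up data standing in for mtcars, and the transpose route is one possible alternative, not necessarily what the maintainers had in mind.

```python
import pandas as pd

# Hypothetical toy data standing in for mtcars (values are made up).
cars = pd.DataFrame(
    {
        "cyl": [4, 4, 6, 6, 8, 8],
        "mpg": [30.0, 26.0, 21.0, 19.0, 15.0, 13.0],
    }
)

wide = (
    cars.groupby("cyl")["mpg"]
    .mean()                      # Series indexed by cyl
    .to_frame("mean")            # single column named 'mean'
    .T                           # one row, one column per cyl value
    .rename_axis(columns=None)   # drop the 'cyl' label on the columns
    .reset_index(drop=True)      # replace the 'mean' row label with 0
    .assign(
        Average=lambda d: d.mean(axis="columns"),
        Diff_4=lambda d: d[4] - d["Average"],
    )
)

print(wide.round(2))
```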

3 participants