-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EHN] Add jointly
option for min_max_scale
#1112
Merged
ericmjl
merged 19 commits into
pyjanitor-devs:dev
from
Zeroto521:min-max-scale-entire-data-option
Jun 14, 2022
Merged
Changes from 11 commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
dbc66f2
[EHN] Add `entire_data` for `min_max_scale`
Zeroto521 572673a
Update the description of function
Zeroto521 ffe85bb
highlight the keywords
Zeroto521 5ea9511
Update examples
Zeroto521 bfebf21
Rename function
Zeroto521 6c8d242
Update test suitcases
Zeroto521 ec1d956
Ignore darglint error
Zeroto521 45bff67
Update test results
Zeroto521 efd2439
correct variable name
Zeroto521 7097b25
Miss data
Zeroto521 3052b9a
Update example result
Zeroto521 06a8405
`entire_data` -> `jointly`
Zeroto521 3e3ee53
Update description
Zeroto521 b9d47f6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] db2a4f3
Add changelog section
Zeroto521 2e18941
Update CHANGELOG.md
Zeroto521 174e7c7
Merge branch 'min-max-scale-entire-data-option' of https://github.com…
Zeroto521 0247ec9
lint codes
Zeroto521 eff218b
lint codes
Zeroto521 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think
entire_data
is a better name.Need to be added to the changelog file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, though I also need a bit of time and space to think of a better name. I wonder if others on the dev team have ideas? @pyjanitor-devs/core-devs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure of a good name to use. maybe
keep
like in Pandas?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean 'keep' its value could be 'column' or 'all'?
But this variable is better as a boolean type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, horrible parameter name. maybe
scale_all
orscale_all_columns
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, @Zeroto521, on second thought, I think we need a bit better definition of the semantics.
min_max_scale
currently operates with the assumption of operating on one column, so the use ofcolumn_name
makes sense here. Below is my attempt at reasoning through the multiple ways we could usemin_max_scale
.entire_data
is a special case of scaling multiple columns jointly.Since we're working on this function, it's a good chance to change the API to be flexible yet also sensible. What if the API was, instead the following?
If
jointly
is True, then thecolumn_names
provided are jointly scaled; otherwise, they are not.I wanted to point out a new behaviour that we might be able to support across the rest of the API -- by making
column_names
accept a Callable that has the signature:we can enable
min_max_scale
on all columns by doing:This is pretty concise without resorting to needing to maintain string mappings for special-case behaviour.
I'm glad we didn't rush to merge this PR, giving us the time and space to think clearly about the semantics of the API.
@samukweku and @Zeroto521 what do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I still can't get the point why
column_names
could accept a callback type argument.The usage of
column_names
is to get dataframe's columns likedf[column_names]
.So if
column_names
has some column names which is not indf.columns
.There will raise an error. Whatever
column_names
isIterable[Hashable]
or callback type could returnIterable[Hashable]
.To scale all columns, I thought we could use the default argument as
None
forcolumn_names
without no more inputting.Are there more examples to show the importance of the callback type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments, @Zeroto521!
Regarding why we might want to allow
column_names
to be a Callable, I had the idea that it helps support being explicit over implicit, which is in the Zen of Python. Settingcolumn_names=None
makes selecting all columns implicit, whereas settingcolumn_names=lambda df: df.columns
makes selecting all columns explicit. In addition, it allows the selection of arbitrary subsets of column names programmatically, without needing to hard-code those names.On further thought, I can see how
column_names=None
actually follows the pattern established in other places in the library, so I think, for now, we can:column_names=None
to imply selection of all columns, andcolumn_names: Callable
in the issue tracker, deferring the implementation till later.What do you all think about
jointly=True
as the keyword for triggering whether to independently scale each column or to jointly scale all columns specified incolumn_names
? @Zeroto521 if you're in agreement with the keyword argument, then I think, let's get that specified in this PR, then we can close out the PR!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree with using
jointly
.About
column_names
whether could receive a callable type or not.I can understand now.
column_names=lambda df: df.columns
is an implicit style and also a trick.df.columns
lambda df: df.columns[:3]
['a', 'b', 'c']
lambda df: [i for i in df.columns if instance(i, str)]
['a', 'b', 'c']
As you said, we can put it aside at present.
Once the parameter
column_names
ofmin_max_scale
could receive, other functions also need to do the same thing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More discussions move to #1115