[C++][Compute] Add percentile rank function #45190
cc @icexelloss
This looks good to me. One thing I might add is that there are a few different flavors of percentile rank, e.g. (1) scale to 1/(n+1), 2/(n+1), …, n/(n+1) (0-1 exclusive). So it might be useful to expose an option that lets the user choose which one.
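A minimal sketch of flavor (1), assuming the input is already a vector of 1-based ranks (for example, the output of a rank function with some tiebreaker). `ScaleRanksExclusive` is a hypothetical name used only for illustration, not an Arrow function:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): maps 1-based ranks r in [1, n]
// to r / (n + 1), the "0-1 exclusive" flavor of percentile rank.
std::vector<double> ScaleRanksExclusive(const std::vector<int64_t>& ranks) {
  const double denom = static_cast<double>(ranks.size()) + 1.0;
  std::vector<double> out;
  out.reserve(ranks.size());
  for (int64_t r : ranks) {
    out.push_back(static_cast<double>(r) / denom);
  }
  return out;
}
```

Ranks 1..n land strictly inside (0, 1), so neither 0 nor 1 is ever produced. How ties are treated depends entirely on the tiebreaker that produced the input ranks.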
Do you have pointers to the existence of the "inclusive" flavor?
Also, the formula given in https://en.wikipedia.org/wiki/Percentile_rank takes ties into account and matches neither of your two suggestions. I'd rather stick to the Wikipedia definition if there's no strong reason to do otherwise.
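Written out, the Wikipedia definition (as we read that page) is the following, where c_<(x) counts the values strictly below x, f(x) counts the values equal to x, and N is the number of values. The second form uses the cumulative frequency CF(x) = c_<(x) + f(x):

```latex
\mathrm{PR}(x) = \frac{c_{<}(x) + 0.5\, f(x)}{N} \times 100
             = \frac{\mathrm{CF}(x) - 0.5\, f(x)}{N} \times 100
```

For example, with the values {1, 2, 2, 3} and x = 2: c_< = 1, f = 2, N = 4, so PR = (1 + 0.5 × 2) / 4 × 100 = 50. Counting only half of the tied values is what makes the definition tie-aware.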
tl;dr: I agree with the Wikipedia definition. Full thoughts: in Python, I actually didn't find any library that implements the Wikipedia definition, which is quite surprising. The two reference implementations I found are pandas and SciPy; they agree with each other but not with the Wikipedia one. For the example input from Wikipedia:
The Wikipedia result is
both
However, the
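To illustrate where the disagreement comes from, here is a naive O(n²) sketch that computes, for every element, both the Wikipedia formula and the "average 1-based rank divided by n" convention. To our understanding, the latter is what pandas `Series.rank(pct=True)` computes with its default `method='average'`, and what SciPy's `scipy.stats.percentileofscore` computes (scaled to 0-100) with its default `kind='rank'`. The struct and function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container holding both conventions side by side.
struct PercentileRanks {
  std::vector<double> wikipedia;     // (count_less + 0.5 * count_equal) / n
  std::vector<double> average_rank;  // tie-averaged 1-based rank / n
};

// Naive O(n^2) comparison, written for clarity rather than speed.
PercentileRanks CompareConventions(const std::vector<double>& values) {
  const double n = static_cast<double>(values.size());
  PercentileRanks out;
  for (double x : values) {
    std::size_t less = 0;
    std::size_t equal = 0;
    for (double y : values) {
      if (y < x) {
        ++less;
      } else if (y == x) {
        ++equal;
      }
    }
    // Tied values share the mean of the 1-based ranks less+1 .. less+equal.
    const double avg_rank =
        static_cast<double>(less) + (static_cast<double>(equal) + 1.0) / 2.0;
    out.wikipedia.push_back(
        (static_cast<double>(less) + 0.5 * static_cast<double>(equal)) / n);
    out.average_rank.push_back(avg_rank / n);
  }
  return out;
}
```

For {1, 2, 2, 3} this gives 0.125, 0.5, 0.5, 0.875 under the Wikipedia formula and 0.25, 0.625, 0.625, 1.0 under the average-rank convention. The gap is always exactly 0.5/n, so the average-rank convention reaches 1.0 at an untied maximum, while the Wikipedia values stay strictly inside (0, 1).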
Describe the enhancement requested
Arrow C++ already offers a `rank` function: https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions

It would be useful to add a percentile rank function according to this definition: https://en.wikipedia.org/wiki/Percentile_rank
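One possible whole-array sketch of the requested function, written as standalone C++ rather than as an Arrow kernel. It sorts indices once, then walks each run of tied values, so the cost is O(n log n). In a run starting at sorted position `begin`, exactly `begin` values are strictly smaller, which plugs straight into the Wikipedia formula. Equivalently, assuming the `rank` function's Min and Max tiebreakers yield c_< + 1 and c_< + F, the result is (rank_min + rank_max - 1) / (2N), which hints that the new function could reuse the existing ranking machinery. `PercentileRank` is a hypothetical name, and NaN/null handling is deliberately left out:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch (not Arrow's API): percentile rank of every element as
// (count_less + 0.5 * count_equal) / n, following the Wikipedia definition.
// NaN and null handling are intentionally omitted.
std::vector<double> PercentileRank(const std::vector<double>& values) {
  const std::size_t n = values.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return values[a] < values[b];
  });

  std::vector<double> out(n);
  std::size_t begin = 0;
  while (begin < n) {
    // [begin, end) is a run of equal values in sorted order.
    std::size_t end = begin + 1;
    while (end < n && values[order[end]] == values[order[begin]]) {
      ++end;
    }
    const double less = static_cast<double>(begin);
    const double equal = static_cast<double>(end - begin);
    const double pr = (less + 0.5 * equal) / static_cast<double>(n);
    for (std::size_t k = begin; k < end; ++k) {
      out[order[k]] = pr;  // every tied element gets the same value
    }
    begin = end;
  }
  return out;
}
```

The output is a fraction in (0, 1) in the original element order; multiplying by 100 gives the percentage form used on the Wikipedia page.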
Proposed API:
Component(s)
C++