opt out of bottleneck for nanmean #47716
Conversation
I added 2 unit tests to validate that numerical precision is bounded by log(n). If anyone has suggestions on how to improve these tests, let me know.
Confirming that this should be disabled for integer types too? i.e. the precision loss is similar for integers and therefore should also be disabled?
clarify that there might be a performance decrease experienced from disabling `mean` for bottleneck Co-authored-by: Matthew Roeschke <[email protected]>
@mroeschke I will verify this. Addition is exact for integers, so depending on the bottleneck implementation it should be almost exact.
Glad to see this update. Should it be targeting the pandas 1.5.0 or 1.4.4 release?
1.5. Point releases are reserved for regressions in behavior from recent releases.
Integer mean should also be opted out of. For integer arrays it seems that bottleneck accumulates the sum in a float64, which means that the problem persists there as well. I'm adding int types to the unit test as well.

Example:

```python
>>> a = np.full((1_000_000_000, 1), 1_000_000_000_000, dtype=np.int64)
>>> bottleneck.nanmean(a)
1000000022905.4894
>>> np.nanmean(a)
1000000000000.0
```
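The failure mode described above can be reproduced without bottleneck or an 8 GB array. This is an illustrative sketch (not the pandas or bottleneck code): once a float64 running sum exceeds 2**53, small increments are rounded away, while an int64 accumulator stays exact.

```python
import numpy as np

n = 1_000
# Each element is odd and large enough that the running sum quickly
# passes 2**53, after which the "+1" parts start getting rounded away.
a = np.full(n, 10**15 + 1, dtype=np.int64)

# Exact: accumulate in int64, divide once at the end.
exact_mean = a.sum(dtype=np.int64) / n

# Lossy: a float64 running sum, similar to a C-level accumulator.
acc = 0.0
for x in a:
    acc += float(x)
naive_mean = acc / n
```

Here `exact_mean` is exactly 1000000000000001.0, while `naive_mean` drifts below it, which mirrors the bottleneck result in the example above.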
Co-authored-by: JMBurley <[email protected]>