opt out of bottleneck for nanmean #47716

sebasv · 2022-07-14T07:23:58Z

closes BUG: np.mean(pd.Series) != np.mean(pd.Series.values) #42878
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
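For readers on a pandas version that predates this change, bottleneck can be switched off by hand through the `compute.use_bottleneck` option. The sketch below is illustrative only: the array contents and variable names are assumptions, not taken from the linked issue. It compares `Series.mean()` with `np.mean` on the underlying values while bottleneck is disabled.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the linked issue).
values = np.full(1_000_000, 0.1, dtype=np.float32)
series = pd.Series(values)

# Temporarily disable bottleneck for pandas reductions; the previous
# setting is restored when the block exits.
with pd.option_context("compute.use_bottleneck", False):
    series_mean = float(series.mean())

array_mean = float(np.mean(values))

print("Series.mean():", series_mean)
print("np.mean(values):", array_mean)
```

With bottleneck disabled, both calls go through NumPy's summation, so the two results should agree closely.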

pep8speaks · 2022-07-14T07:24:01Z

Hello @sebasv! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-17 13:11:07 UTC

sebasv · 2022-07-14T13:52:55Z

I added two unit tests to validate that the numerical error is bounded by log(n). If anyone has suggestions on how to improve these tests, let me know.
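To illustrate the kind of bound mentioned above, here is a minimal sketch (not the tests from this PR) contrasting a left-to-right running sum, whose rounding error can grow roughly linearly in n, with NumPy's pairwise summation, whose error grows roughly with log(n). The array size, the value 0.1, and the use of float32 to make the effect visible are all assumptions chosen for illustration.

```python
import numpy as np

n = 1_000_000
values = np.full(n, 0.1, dtype=np.float32)

# Every element is identical, so the true mean is the stored float32 value.
exact_mean = float(values[0])

# np.cumsum accumulates strictly left to right, so its last entry is a
# naive running sum carried out in float32.
naive_mean = float(np.cumsum(values)[-1]) / n

# np.sum on a contiguous 1-D float array uses pairwise summation.
pairwise_mean = float(np.sum(values)) / n

naive_err = abs(naive_mean - exact_mean)
pairwise_err = abs(pairwise_mean - exact_mean)

print("naive error:   ", naive_err)
print("pairwise error:", pairwise_err)
```

A test along these lines would assert that the pairwise error stays within a small multiple of machine epsilon times log2(n), whereas the naive error does not.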

pandas/tests/reductions/test_reductions.py

pandas/tests/test_nanops.py

doc/source/whatsnew/v1.5.0.rst

mroeschke

Can you confirm that this should be disabled for integer types too? I.e., is the precision loss similar for integers, so that bottleneck should be disabled for them as well?

clarify that there might be a performance decrease experienced from disabling `mean` for bottleneck Co-authored-by: Matthew Roeschke <[email protected]>

sebasv · 2022-07-14T19:23:34Z

@mroeschke I will verify this. Integer addition is exact, so depending on the bottleneck implementation the result should be almost exact.

JMBurley · 2022-07-14T19:43:38Z

Glad to see this update. Should it be targeting pandas 1.5.0 or 1.4.4 release?

mroeschke · 2022-07-14T19:46:18Z

> Glad to see this update. Should it be targeting pandas 1.5.0 or 1.4.4 release?

1.5. Point releases are reserved for regression in behavior from recent releases.

sebasv · 2022-07-15T08:57:07Z

Integer mean should also be opted out of. For integer arrays, it seems that bottleneck accumulates the sum in a float64, which means that the problem persists there as well. I'm adding int types to the unit tests too.

Example

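>>> import bottleneck
>>> import numpy as np
>>> # Note: this array holds 1e9 int64 values, i.e. about 8 GB of memory.
>>> a = np.full((1_000_000_000, 1), 1_000_000_000_000, dtype=np.int64)
>>> bottleneck.nanmean(a)
1000000022905.4894
>>> np.nanmean(a)
1000000000000.0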
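The effect can also be sketched at a much smaller scale without bottleneck. The snippet below is an illustration under assumptions, not bottleneck's actual code: a plain Python loop stands in for a single float64 running total, and the odd value `1_000_000_000_001` and the size `n` are chosen so that rounding shows up after about a million additions. Once the running total exceeds 2**53, float64 can no longer represent every integer, so each addition may drop the low-order bits.

```python
import numpy as np

n = 1_000_000
value = 1_000_000_000_001  # odd on purpose, so the lost low bit is visible

# Exact reference using Python's arbitrary-precision integers.
exact_sum = n * value
exact_mean = exact_sum / n

# Naive accumulation: a single float64 running total, updated left to right.
naive_sum = 0.0
for _ in range(n):
    naive_sum += float(value)
naive_mean = naive_sum / n

# NumPy's own mean on the equivalent int64 array, for comparison.
a = np.full(n, value, dtype=np.int64)
numpy_mean = float(np.mean(a))

print("exact mean:        ", exact_mean)
print("naive float64 mean:", naive_mean)
print("np.mean:           ", numpy_mean)
```

The naive float64 total drifts away from the exact integer sum even though every input is an exactly representable integer, which is the same failure mode reported above for the integer mean.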

pandas/core/nanops.py

Co-authored-by: JMBurley <[email protected]>

mroeschke · 2022-07-18T19:14:28Z

Thanks @sebasv and @JMBurley for review

opt out of bottleneck for nanmean

93438ec

sebasv added 2 commits July 14, 2022 09:25

remove trailing whitespace

27957c8

make error bound explicit

a4ddd8f

mroeschke reviewed Jul 14, 2022

pandas/tests/reductions/test_reductions.py (outdated, resolved)

mroeschke added the Dependencies (Required and optional dependencies) and Reduction Operations (sum, mean, min, max, etc.) labels Jul 14, 2022

unittest only _bn_ok_dtype

3ddfbd9

mroeschke reviewed Jul 14, 2022

pandas/tests/test_nanops.py (resolved)

link issue to test function

58b1ae0

mroeschke reviewed Jul 14, 2022

doc/source/whatsnew/v1.5.0.rst (outdated, resolved)

mroeschke reviewed Jul 14, 2022


Update doc/source/whatsnew/v1.5.0.rst

fd217c6

clarify that there might be a performance decrease experienced from disabling `mean` for bottleneck Co-authored-by: Matthew Roeschke <[email protected]>

extend unit tests with (u)int dtypes

5b2c71e

mroeschke added this to the 1.5 milestone Jul 15, 2022

mroeschke reviewed Jul 15, 2022

pandas/core/nanops.py (resolved)

JMBurley reviewed Jul 15, 2022

pandas/core/nanops.py (resolved)

Update pandas/core/nanops.py

91ec8d1

Co-authored-by: JMBurley <[email protected]>

mroeschke approved these changes Jul 18, 2022


mroeschke merged commit cf4758f into pandas-dev:main Jul 18, 2022

sebasv deleted the opt-out-of-bottleneck-for-mean branch July 18, 2022 19:51
