Rolling() gives values different from pd.rolling() #5877

Open
@chiaral

I am not sure this is a bug, but it clearly doesn't give the results a user would expect.

The rolling sum of zeros gives me values that are not zero:

import numpy as np
import xarray as xr

var = np.array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.31      , 0.91999996, 8.3       ,
       1.42      , 0.03      , 1.22      , 0.09999999, 0.14      ,
       0.13      , 0.        , 0.12      , 0.03      , 2.53      ,
       0.        , 0.19999999, 0.19999999, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ],
               dtype='float32')

timet = np.array([  43200000000000,  129600000000000,  216000000000000,  302400000000000,
        388800000000000,  475200000000000,  561600000000000,  648000000000000,
        734400000000000,  820800000000000,  907200000000000,  993600000000000,
       1080000000000000, 1166400000000000, 1252800000000000, 1339200000000000,
       1425600000000000, 1512000000000000, 1598400000000000, 1684800000000000,
       1771200000000000, 1857600000000000, 1944000000000000, 2030400000000000,
       2116800000000000, 2203200000000000, 2289600000000000, 2376000000000000,
       2462400000000000, 2548800000000000, 2635200000000000, 2721600000000000,
       2808000000000000, 2894400000000000, 2980800000000000],
      dtype='timedelta64[ns]')

ds_ex = xr.Dataset(
    data_vars=dict(pr=(["time"], var)),
    coords=dict(time=("time", timet)),
)

ds_ex.rolling(time=3).sum().pr.values

It gives me this result:

array([ nan, nan, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.1000000e-01,
1.2300000e+00, 9.5300007e+00, 1.0640000e+01, 9.7500000e+00,
2.6700001e+00, 1.3500001e+00, 1.4600002e+00, 3.7000012e-01,
2.7000013e-01, 2.5000012e-01, 1.5000013e-01, 2.6800001e+00,
2.5600002e+00, 2.7300003e+00, 4.0000033e-01, 4.0000033e-01,
2.0000035e-01, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07,
3.5762787e-07, 3.5762787e-07, 3.5762787e-07, 3.5762787e-07,
3.5762787e-07, 3.5762787e-07, 3.5762787e-07
], dtype=float32)

Note the non-zero values at the end of the array. The residual changes depending on whether I use float64 or float32 for my data, so this seems to be a precision-related issue (although the first values are correctly set to zero). In fact, other sums are also not exactly what they should be.

The small difference at the 8th/9th decimal position can be expected due to precision, but the fact that the zeros become non-zero is problematic imho, especially if it is not documented. Zero often has a very specific meaning in geoscience data (e.g. zero rainfall is characterized differently from non-zero rainfall).
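To illustrate how zeros can pick up a residue like this, here is a minimal NumPy-only sketch. It assumes the rolling sum is computed with a running accumulator (add the value entering the window, subtract the one leaving it) in the input precision. That is a guess at the mechanism, not a confirmed description of what xarray does internally. The helper names `running_rolling_sum` and `direct_rolling_sum` are hypothetical and exist only for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_rolling_sum(values, window):
    """Rolling sum with a running accumulator (hypothetical helper).

    Each step adds the value entering the window and subtracts the one
    leaving it, so any rounding error is carried forward.
    """
    out = np.full(values.shape, np.nan, dtype=values.dtype)
    acc = values.dtype.type(0)  # accumulate in the input precision
    for i in range(values.size):
        acc = acc + values[i]
        if i >= window:
            acc = acc - values[i - window]
        if i >= window - 1:
            out[i] = acc
    return out


def direct_rolling_sum(values, window):
    """Rolling sum that re-adds every window from scratch (hypothetical helper)."""
    out = np.full(values.shape, np.nan, dtype=values.dtype)
    out[window - 1:] = sliding_window_view(values, window).sum(axis=-1)
    return out


sample = np.array(
    [0.0, 0.0, 0.0, 0.31, 0.92, 8.3, 1.42, 0.03, 0.0, 0.0, 0.0, 0.0],
    dtype="float32",
)
running = running_rolling_sum(sample, 3)
direct = direct_rolling_sum(sample, 3)

# Windows that contain only zeros sum to exactly zero when re-added directly.
print(direct[-2:])
# The accumulator can keep a small residue here, depending on the rounding path.
print(running[-2:])
```

The direct version cannot turn an all-zero window into a non-zero value, because it never reuses an earlier, already rounded partial sum. The accumulator version trades that guarantee for doing constant work per element.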

In pandas, by contrast, this works:

df_ex = ds_ex.to_dataframe()
df_ex.rolling(window=3).sum().values.T

This gives me:

array([[ nan, nan, 0. , 0. , 0. ,
0. , 0. , 0.31 , 1.22999996, 9.53000015,
10.6400001 , 9.75000015, 2.66999999, 1.35000001, 1.46000002,
0.36999998, 0.27 , 0.24999999, 0.15 , 2.67999997,
2.55999997, 2.72999996, 0.39999998, 0.39999998, 0.19999999,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ]])

What you expected to happen:

The sum of zeros should be zero.
If this cannot be achieved or expected because of precision issues, it should be documented.
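One possible way to get exact zeros in the meantime is to build the windows explicitly with `rolling(...).construct(...)` and sum over the new dimension, so that every window is re-added from scratch. This is only a sketch on a shortened, made-up sample rather than the full series above. The dimension name `"window"` is an arbitrary label, and the approach assumes the residue comes from the accelerated rolling path rather than from the data itself.

```python
import numpy as np
import xarray as xr

# Shortened, illustrative sample (not the full series from this report).
pr = xr.DataArray(
    np.array([0.0, 0.0, 0.31, 8.3, 0.03, 0.0, 0.0, 0.0], dtype="float32"),
    dims="time",
    name="pr",
)

# construct() exposes each window along a new dimension; incomplete
# windows at the start are padded with NaN.
windows = pr.rolling(time=3).construct("window")

# skipna=False keeps NaN for the incomplete leading windows, matching
# the default behaviour of rolling(...).sum().
rolled = windows.sum("window", skipna=False)

print(rolled.values)
```

This materializes an extra dimension of length 3, so it uses more memory than `rolling(time=3).sum()` on large arrays.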

Anything else we need to know?:

I discovered this behavior in my old environments, but I created a new ad hoc environment with the latest versions, and it does the same thing.

Environment:

INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
