Skip to content

Feature Request: merge_asof with interpolation #22410

@JoaoAparicio

Description

@JoaoAparicio

merge_asof with interpolation would be a new feature. When you merge df1 and df2, the merged columns of df2 would not be the exact "asof" values as in df2, but the interpolated values (for example in a linearly time-weighted fashion).

Define

np.random.seed(0)
start = pd.Timestamp("2018-07-23 09:00:00")
df1 = pd.DataFrame(np.random.normal(size=5), index=pd.date_range(start, periods=5, freq='S'), columns=['something'])

df2 = pd.DataFrame(np.random.normal(size=5), index=pd.date_range(start, periods=5, freq='S')+datetime.timedelta(seconds=0.5), columns=['something_else'])
df2 = df2.iloc[:2].append(df2.iloc[3:])

df1:

\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
something
2018-07-23 09:00:001.764052
2018-07-23 09:00:010.400157
2018-07-23 09:00:020.978738
2018-07-23 09:00:032.240893
2018-07-23 09:00:041.867558

df2:

\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
something_else
2018-07-23 09:00:00.500-0.977278
2018-07-23 09:00:01.5000.950088
2018-07-23 09:00:03.500-0.103219
2018-07-23 09:00:04.5000.410599

This is how merge_asof works:

pd.merge_asof(df1, df2, left_index=True, right_index=True)

Returns:

\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
somethingsomething_else
2018-07-23 09:00:001.764052NaN
2018-07-23 09:00:010.400157-0.977278
2018-07-23 09:00:020.9787380.950088
2018-07-23 09:00:032.2408930.950088
2018-07-23 09:00:041.867558-0.103219

And so

pd.merge_asof(df1, df2, left_index=True, right_index=True, method="linear")

would return

\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
somethingsomething_else
2018-07-23 09:00:001.764052NaN
2018-07-23 09:00:010.400157-0.013595
2018-07-23 09:00:020.9787380.598986
2018-07-23 09:00:032.2408930.247884
2018-07-23 09:00:041.8675580.153690

method="last" could reproduce default behaviour.

This could also be reproduced by doing these operations manually:

pd.merge(df1, df2, left_index=True, right_index=True, how="outer").interpolate("linear").merge(df1, left_index=True, right_index=True, on="something", how="right")

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions