Skip to content

merge_ordered() should accept left_index and right_index args #27760

@austin3dickey

Description

@austin3dickey

Code Sample, a copy-pastable example if possible

INDEX_COLS = ['group', 'timestamp']

left = pandas.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'timestamp': [0, 1, 2, 5],
    'obs1': [10, 11, 12, 15]
}).set_index(INDEX_COLS)

right = pandas.DataFrame({
    'group': ['a', 'a', 'b', 'b'], 
    'timestamp': [0, 1, 3, 4],
    'obs2': [20, 21, 23, 24]
}).set_index(INDEX_COLS)

pandas.merge_ordered(left, right)

# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "python3.6/site-packages/pandas/core/reshape/merge.py", line 226, in merge_ordered
#    result = _merger(left, right)
#  File "python3.6/site-packages/pandas/core/reshape/merge.py", line 212, in _merger
#    how=how)
#  File "python3.6/site-packages/pandas/core/reshape/merge.py", line 1252, in __init__
#    sort=True  # factorize sorts
#  File "python3.6/site-packages/pandas/core/reshape/merge.py", line 524, in __init__
#    self._validate_specification()
#  File "python3.6/site-packages/pandas/core/reshape/merge.py", line 1033, in _validate_specification
#    lidx=self.left_index, ridx=self.right_index))
# pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

Problem description

The error message above implies that I should be able to merge these two timeseries DataFrames using left_index=True and right_index=True. But these are not arguments to the merge_ordered() function like they are to other merging functions.

I could work around this by doing the following:

pandas.merge_ordered(left, right, left_on=INDEX_COLS, right_on=INDEX_COLS).set_index(INDEX_COLS)

but that seems unnecessarily verbose, and possibly less performant than if I could do

pandas.merge_ordered(left, right, left_index=True, right_index=True)

Proposal

It looks like merge_ordered() uses _OrderedMerge under the hood which can take the left_index and right_index arguments (and the code example above works as expected if those arguments are passed through). Would you consider adding these arguments? I can create a PR if there's appetite.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions