-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
Description
Code Sample, a copy-pastable example if possible
INDEX_COLS = ['group', 'timestamp']
left = pandas.DataFrame({
'group': ['a', 'a', 'b', 'b'],
'timestamp': [0, 1, 2, 5],
'obs1': [10, 11, 12, 15]
}).set_index(INDEX_COLS)
right = pandas.DataFrame({
'group': ['a', 'a', 'b', 'b'],
'timestamp': [0, 1, 3, 4],
'obs2': [20, 21, 23, 24]
}).set_index(INDEX_COLS)
pandas.merge_ordered(left, right)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "python3.6/site-packages/pandas/core/reshape/merge.py", line 226, in merge_ordered
# result = _merger(left, right)
# File "python3.6/site-packages/pandas/core/reshape/merge.py", line 212, in _merger
# how=how)
# File "python3.6/site-packages/pandas/core/reshape/merge.py", line 1252, in __init__
# sort=True # factorize sorts
# File "python3.6/site-packages/pandas/core/reshape/merge.py", line 524, in __init__
# self._validate_specification()
# File "python3.6/site-packages/pandas/core/reshape/merge.py", line 1033, in _validate_specification
# lidx=self.left_index, ridx=self.right_index))
# pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=FalseProblem description
The error message above implies that I should be able to merge these two timeseries DataFrames using left_index=True and right_index=True. But these are not arguments to the merge_ordered() function like they are to other merging functions.
I could work around this by doing the following:
pandas.merge_ordered(left, right, left_on=INDEX_COLS, right_on=INDEX_COLS).set_index(INDEX_COLS)but that seems unnecessarily verbose, and possibly less performant than if I could do
pandas.merge_ordered(left, right, left_index=True, right_index=True)Proposal
It looks like merge_ordered() uses _OrderedMerge under the hood which can take the left_index and right_index arguments (and the code example above works as expected if those arguments are passed through). Would you consider adding these arguments? I can create a PR if there's appetite.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None