
BUG: concat along the index (axis=0) of two dataframes with duplicate column name fails #35240

@ghost

Description

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


Question about pandas

Hi,
I have a persistent problem concatenating multiple DataFrames with the following shapes:

  1. (48, 5674)
  2. (48, 9022)
  3. (48, 7340)
  4. (47, 6539)
  5. (47, 10369)
  6. (47, 17242)
  7. (47, 19248)
  8. (47, 14282)

If I try to concatenate these, or even just a subset of them, with

pd.concat(df_list)

I get the following error:

Traceback (most recent call last):
  File "E:/OneDrive/Informatik Studium/KIT Master/SS20/AGD Praktikum/phase-2/1_code/MyTest.py", line 46, in <module>
    df_result = __parallelize_dataframe(func=apply_functions, df_data=df_train.copy(), config_tupels=config_tupels)
  File "E:/OneDrive/Informatik Studium/KIT Master/SS20/AGD Praktikum/phase-2/1_code/MyTest.py", line 22, in __parallelize_dataframe
    df_pool_result = pd.concat(pool_result[0:2])
  File "E:\venv\lib\site-packages\pandas\core\reshape\concat.py", line 284, in concat
    return op.get_result()
  File "E:\venv\lib\site-packages\pandas\core\reshape\concat.py", line 497, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
  File "E:\venv\lib\site-packages\pandas\core\internals\managers.py", line 2016, in concatenate_block_managers
    elif is_uniform_join_units(join_units):
  File "E:\venv\lib\site-packages\pandas\core\internals\concat.py", line 388, in is_uniform_join_units
    all(not ju.is_na or ju.block.is_extension for ju in join_units)
  File "E:\venv\lib\site-packages\pandas\core\internals\concat.py", line 388, in <genexpr>
    all(not ju.is_na or ju.block.is_extension for ju in join_units)
AttributeError: 'NoneType' object has no attribute 'is_extension'

While researching this, I found that the blocks in the join_units are sometimes None, but I don't understand why.
None of the entries in my DataFrames are None/NaN.
Unfortunately, I can't post the data here because it is too large.
It may help to know that I split the rows of my original DataFrame for multiprocessing and afterwards concatenate the results again, as shown above.
Thanks a lot!
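
Going by the issue title, duplicate column names in one of the frames appear to be involved. Below is a minimal sketch for checking each frame before calling pd.concat. The helper name `duplicated_columns` and the two tiny frames are hypothetical stand-ins, not taken from the original code or data:

```python
import pandas as pd


def duplicated_columns(df):
    """Return the column labels that occur more than once in df."""
    # Index.duplicated() marks every repeat of a label after its first occurrence.
    return df.columns[df.columns.duplicated()].unique().tolist()


# Hypothetical small frames standing in for the real (much wider) data.
df_a = pd.DataFrame([[1, 2, 3]], columns=["x", "y", "x"])  # "x" appears twice
df_b = pd.DataFrame([[4, 5]], columns=["x", "y"])

# Report shape and repeated labels for every frame that will be concatenated.
for i, df in enumerate([df_a, df_b]):
    print(i, df.shape, duplicated_columns(df))
```

If a frame reports repeated labels, one possible workaround (assuming the duplicates are not needed) is to keep only the first occurrence of each label with `df.loc[:, ~df.columns.duplicated()]` before concatenating.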

Labels: Bug; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
