
DeepHash: Different dataframes get the same hash #394

@amakelov

Description

Describe the bug
A hash collision seems to occur whenever two DataFrames have the same column names, regardless of their rows.

To Reproduce

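import pandas as pd
from deepdiff import DeepHash

x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})
# Look up the hash of each top-level DataFrame
a = DeepHash(x)[x]
b = DeepHash(y)[y]
# On DeepDiff 6.3.0 this passes: the two different frames collide
assert a == b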
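A plausible explanation (an assumption, not confirmed in this thread) is that DeepHash has no DataFrame-specific handling and falls back to treating the frame as a generic iterable. Iterating over a DataFrame yields only its column labels, so the two frames above would look identical to any hasher that walks them that way:

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})

# Iterating a DataFrame yields its column labels, not its rows or cells
print(list(x))  # ['a']
print(list(y))  # ['a']

# The frames really do differ in length and content
print(len(x), len(y))  # 3 4
print(x.equals(y))     # False
```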

Expected behavior
Different DataFrames should get different hashes; collisions should be much harder to find than this (unless this behavior is by design).
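As a stopgap, one option is to fingerprint a frame's contents yourself instead of relying on DeepHash. The sketch below is a hypothetical helper (`frame_fingerprint` is not part of DeepDiff): it combines the column labels and dtypes with `pandas.util.hash_pandas_object`, which returns one 64-bit hash per row covering the index and every cell, and feeds everything into SHA-256. Column labels are added explicitly because the per-row hashes cover values, not column names.

```python
import hashlib

import pandas as pd


def frame_fingerprint(df: pd.DataFrame) -> str:
    """Hypothetical helper (not part of DeepDiff): SHA-256 over a
    DataFrame's column labels, dtypes, index and cell values."""
    h = hashlib.sha256()
    # Structure: column labels and dtypes
    h.update(repr(list(df.columns)).encode())
    h.update(repr([str(t) for t in df.dtypes]).encode())
    # Contents: one uint64 per row, covering the index (index=True)
    # and every cell in that row
    row_hashes = pd.util.hash_pandas_object(df, index=True).to_numpy()
    h.update(row_hashes.tobytes())
    return h.hexdigest()


x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})
print(frame_fingerprint(x) == frame_fingerprint(y))  # False
```

Row order matters in this sketch, so two frames with the same rows in a different order get different fingerprints; sort first if that is not the desired behavior.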

OS, DeepDiff version and Python version:

  • OS: Ubuntu 22.04.2 LTS
  • Python Version: 3.10.8
  • DeepDiff Version: 6.3.0

Activity

seperman self-assigned this on May 1, 2023

seperman (Owner) commented on May 1, 2023:

Hi @amakelov
While we have supported NumPy for many years, Pandas DataFrames have never been covered. If you have time to add Pandas support to DeepDiff, that would be great; otherwise I will look into it when I have a chance.
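To illustrate what adding Pandas support might involve, here is a hypothetical toy hasher. It is not DeepDiff's code; `toy_hash` and its `dataframe_support` flag are invented for this sketch. The point it shows is ordering: a DataFrame-specific branch has to be checked before any generic-iterable fallback. With that branch disabled, the toy reproduces the reported collision.

```python
import hashlib
from collections.abc import Iterable

import pandas as pd


def _sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def toy_hash(obj, dataframe_support: bool = True) -> str:
    """Hypothetical recursive hasher for illustration; NOT DeepDiff's code.
    Toy limitations: dicts and sets get no special treatment."""
    if dataframe_support and isinstance(obj, pd.DataFrame):
        # Dedicated branch, checked BEFORE the generic Iterable fallback,
        # because iterating a DataFrame yields only its column labels.
        parts = [
            toy_hash(list(obj.columns), dataframe_support),
            toy_hash([str(t) for t in obj.dtypes], dataframe_support),
            toy_hash(obj.index.tolist(), dataframe_support),
            toy_hash(obj.to_numpy().tolist(), dataframe_support),
        ]
        return _sha("DataFrame:" + ",".join(parts))
    if isinstance(obj, (str, bytes)):
        # Strings are iterable too; treat them as atoms to avoid endless recursion
        return _sha(f"{type(obj).__name__}:{obj!r}")
    if isinstance(obj, Iterable):
        # Generic fallback: hash the elements in iteration order
        return _sha("iter:" + ",".join(toy_hash(item, dataframe_support) for item in obj))
    # Scalars: a type-tagged repr keeps 1 and '1' distinct
    return _sha(f"{type(obj).__name__}:{obj!r}")


x = pd.DataFrame({'a': [1, 2, 3]})
y = pd.DataFrame({'a': [1, 2, 3, 4]})
print(toy_hash(x, dataframe_support=False) == toy_hash(y, dataframe_support=False))  # True
print(toy_hash(x) == toy_hash(y))  # False
```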


Metadata

  • Repository: seperman/deepdiff
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Participants: @amakelov, @seperman