Description
Is your feature request related to a problem? Please describe.
I am using DeepHash to hash an object with multiple nested objects so that I can compare against the hash at a later date. The nested objects are stored in sets, and they define their own hashing function, which Python uses to ensure that no duplicate items are added. When I run DeepHash on the same objects in a set, I get two different values, and the result seems to flip-flop between them. Any ideas spring to mind? I know that Python's internal hashing is not stable across processes and that the recommendation is to use hashlib, but I have seen that deepdiff uses hashlib, so I am a bit at a loss.

The main requirement is deterministic hashing, as these hashes serve as lookup references at a later date. Below is an example of what I am trying to do. It leaves out the nesting, but it reproduces the same behaviour.
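Before the example itself, here is a minimal sketch of the stability point. It uses only the standard library, and the helper names `builtin_hash_in_fresh_process` and `sha256_hex` are hypothetical. It runs the built-in `hash()` on the same string in fresh interpreters with different `PYTHONHASHSEED` values and compares the result with a hashlib SHA-256 digest. The built-in value depends on the per-process seed, while the digest depends only on the input bytes.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_fresh_process(text: str, seed: str) -> int:
    """Run hash(text) in a new interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def sha256_hex(text: str) -> str:
    """Deterministic digest: depends only on the encoded bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(builtin_hash_in_fresh_process("Tony", "1"))
    print(builtin_hash_in_fresh_process("Tony", "2"))  # almost certainly differs
    print(sha256_hex("Tony"))  # same value in every process
```

When `PYTHONHASHSEED` is unset, each new process picks a random seed, which is why the built-in value of a string changes from run to run.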
from typing import Optional


class Name:
    def __init__(self,
                 fn: Optional[str] = None,
                 sn: Optional[str] = None,
                 ssn: Optional[str] = None):
        self.fn = fn
        self.sn = sn
        self.ssn = ssn

    def __hash__(self):
        """
        Enable hashing of the class for set usage.

        Returns:
            hash (int): The hash value of the object (the built-in hash of ssn).
        """
        return hash(self.ssn)

    def __eq__(self, other):
        """
        Determine equivalence of objects.

        Args:
            other (Name): The other object to compare; should be a Name
                object.

        Returns:
            equivalence (bool): True if both objects hash to the same value,
                otherwise False.
        """
        return hash(self) == hash(other)
from deepdiff.deephash import DeepHash
obj1 = Name(sn='Tony',
ssn='Tony')
obj2 = Name(fn='Tiny',
sn='Tony',
ssn='Tiny Tony')
obj3 = Name(sn='Tony',
ssn='Tony')
# I use sets to drop unwanted duplicates via the class's hash method.
# (This predates my knowledge of DeepHash.) The result should be a set containing
# obj1 and obj2, in an order that may differ depending on the hashes generated.
# Python's hash differs from process to process but stays consistent within
# a single process. I need it to stay consistent across processes.
objects = {obj1, obj2, obj3}
# Running DeepHash on 'objects' gives different results from process to process.
# I get one of the following two hashes:
# f261026e3e51aac71fb74323b324b0313d19246031510ae8d79749bf87247050
# 93a6ceff87e2b04675ec657a78e65843d1a9477bdd37128fe83f1b4b4fbcab26
print(DeepHash(objects)[objects])
# From what I can gather, in some runs the hash is calculated on the whole
# object plus the string, i.e. str:obj:xxxxxxxxx:yyyyyyyyy:zzzzzzzzz..., and in
# others it is simply what I would expect, str:Tony. I am not sure how this fits
# with how DeepHash works, or whether this is the intended behaviour.
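One detail in the example may explain the two values. `__hash__` returns `hash(self.ssn)` and `__eq__` only compares hashes, so a `Name` instance compares equal to any object with the same hash. This includes the plain string stored in its own `ssn`, which means `Name(sn='Tony', ssn='Tony') == 'Tony'` is `True`. Suppose DeepHash memoises results in a dict keyed by the objects it has already visited. That is an assumption about its internals and is not verified here. In that case the `Name` object could resolve to the cached entry for the string `'Tony'` whenever that string is visited first, and the iteration order of a set depends on the per-process string hash seed. The sketch below repeats the relevant parts of the class and uses a plain dict as a stand-in for such a cache. `SafeName` is a hypothetical variant whose `__eq__` only accepts other `SafeName` objects.

```python
from typing import Optional


class Name:
    """Same hashing/equality logic as the class in the report."""

    def __init__(self, fn: Optional[str] = None, sn: Optional[str] = None,
                 ssn: Optional[str] = None):
        self.fn = fn
        self.sn = sn
        self.ssn = ssn

    def __hash__(self):
        return hash(self.ssn)

    def __eq__(self, other):
        # Compares hashes only, so any object with the same hash "matches".
        return hash(self) == hash(other)


class SafeName(Name):
    """Hypothetical fix: only compare equal to other SafeName objects."""

    def __eq__(self, other):
        if not isinstance(other, SafeName):
            return NotImplemented
        return self.ssn == other.ssn

    # Defining __eq__ sets __hash__ to None unless it is assigned explicitly.
    __hash__ = Name.__hash__


tony = Name(sn="Tony", ssn="Tony")
print(tony == "Tony")  # True, because hash(tony) is hash("Tony")

# A dict that already holds the string "Tony" as a key (standing in for a
# hash cache) returns that entry when looked up with the Name object.
cache = {"Tony": "hash of str:Tony"}
print(cache.get(tony))  # hash of str:Tony

safe = SafeName(sn="Tony", ssn="Tony")
print(safe == "Tony")  # False
print(cache.get(safe))  # None

# Deduplication still works: the two ssn='Tony' objects collapse into one.
objects = {SafeName(sn="Tony", ssn="Tony"),
           SafeName(fn="Tiny", sn="Tony", ssn="Tiny Tony"),
           SafeName(sn="Tony", ssn="Tony")}
print(len(objects))  # 2
```

Returning `NotImplemented` for foreign types lets Python fall back to its default identity comparison, so a string can no longer stand in for a `SafeName` in a set or dict lookup.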