Dealing with objects in sets. #305

Open
@matthewvanhoutte

Is your feature request related to a problem? Please describe.
I am using DeepHash to hash an object containing multiple nested objects so that I can compare against the hash at a later date. The nested objects are stored in sets, and each defines its own __hash__ method, which Python uses to ensure no duplicate items are added. When I run DeepHash on the same objects in a set, I get two different values, and the result flip-flops between them from run to run. Any ideas? I know that Python's built-in hash() is not stable across processes and that hashlib is recommended instead, but deepdiff already uses hashlib, so I am at a loss. The main requirement is deterministic hashing, because these hashes serve as lookup references at a later date. The example below does not use nesting, but it reproduces the same behaviour.
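
For reference, one way to get a digest of a set that stays stable across processes, without relying on Python's salted hash(), is to sort a canonical string key and feed it to hashlib. This is a minimal sketch and not deepdiff's implementation; the helper name stable_set_digest and the length-prefixed UTF-8 encoding are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def stable_set_digest(keys: Iterable[str]) -> str:
    """Return a SHA-256 hex digest that ignores the order and repetition of keys."""
    hasher = hashlib.sha256()
    # Deduplicate and sort so that set iteration order (which depends on the
    # per-process string hash seed) cannot influence the result.
    for key in sorted(set(keys)):
        data = key.encode("utf-8")
        # Length-prefix each key so ("ab", "c") and ("a", "bc") differ.
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return hasher.hexdigest()


first = stable_set_digest({"Tony", "Tiny Tony"})
second = stable_set_digest(["Tiny Tony", "Tony", "Tony"])
print(first == second)  # True
```

In the example that follows, the key passed in could be each object's ssn attribute, since that is what the class already treats as its identity.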

class Name:

    def __init__(self,
                 fn: str = None,
                 sn: str = None,
                 ssn: str = None):

        self.fn = fn
        self.sn = sn
        self.ssn = ssn

    def __hash__(self):
        """
        Enable hashing of class for set usage.

        Returns:
            hash (int): The hash value of the object.
        """
        return hash(self.ssn)

    def __eq__(self, other):
        """
        Determine equivalence of objects.

        Args:
            other (Name): The other object to compare; should be a Name
                instance.

        Returns:
            equivalence (bool): True if the hash values of the two objects
                match, otherwise False.
        """
        return hash(self) == hash(other)


from deepdiff.deephash import DeepHash

obj1 = Name(sn='Tony',
            ssn='Tony')
obj2 = Name(fn='Tiny',
            sn='Tony',
            ssn='Tiny Tony')
obj3 = Name(sn='Tony',
            ssn='Tony')

# I use a set to drop unwanted duplicates via the class's __hash__ method.
# This predates my knowledge of DeepHash. The result should be a set containing
# obj1 and obj2, whose iteration order may differ depending on the hash
# generated. Python's hash() differs from process to process but remains
# consistent within one process. I need this to be consistent across processes.

objects = {obj1, obj2, obj3}

# When running DeepHash on 'objects' it differs from process to process.
# I get the following two hashes:
# f261026e3e51aac71fb74323b324b0313d19246031510ae8d79749bf87247050
# 93a6ceff87e2b04675ec657a78e65843d1a9477bdd37128fe83f1b4b4fbcab26
print(DeepHash(objects)[objects])

# From what I gathered, in some cases the hash is calculated from the whole
# object plus the string, i.e. str:obj:xxxxxxxxx:yyyyyyyyy:zzzzzzzzz..., and in
# other cases simply from what I would expect, str:Tony. I am not sure how
# DeepHash works internally or whether this is the desired behaviour.
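
One possible explanation, offered as a guess rather than a confirmed diagnosis: because __eq__ above compares hash values only, a Name whose ssn is 'Tony' compares equal to the plain string 'Tony'. Any dictionary that uses both as keys then treats them as the same key, and which entry survives depends on insertion order. Whether DeepHash's internal bookkeeping is affected in exactly this way is an assumption; the sketch below only demonstrates the underlying Python behaviour. StrictName is a hypothetical variant, not part of the original code.

```python
class Name:
    """Trimmed copy of the class above: equality is decided by hash alone."""

    def __init__(self, ssn=None):
        self.ssn = ssn

    def __hash__(self):
        return hash(self.ssn)

    def __eq__(self, other):
        return hash(self) == hash(other)


class StrictName(Name):
    """Hypothetical variant that only equals other StrictName instances."""

    def __eq__(self, other):
        if not isinstance(other, StrictName):
            return NotImplemented
        return self.ssn == other.ssn

    # Defining __eq__ resets __hash__ to None, so restore it explicitly.
    __hash__ = Name.__hash__


loose = Name(ssn="Tony")
strict = StrictName(ssn="Tony")

print(loose == "Tony")   # True: the object collides with its own attribute
print(strict == "Tony")  # False

# A dict keyed by both the object and the string keeps only one entry.
cache = {loose: "object entry"}
cache["Tony"] = "string entry"
print(len(cache))  # 1

strict_cache = {strict: "object entry"}
strict_cache["Tony"] = "string entry"
print(len(strict_cache))  # 2
```

If this is what is happening, restricting __eq__ to instances of the same class would be worth trying before looking deeper into DeepHash itself.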
