Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make_hash stalls for long numbers #7

Open
za3k opened this issue Oct 27, 2022 · 2 comments
Open

make_hash stalls for long numbers #7

za3k opened this issue Oct 27, 2022 · 2 comments

Comments

@za3k
Copy link

za3k commented Oct 27, 2022

perfection.make_hash([3934533475148980154, -3342959649445288775, 654684863317306829, 29321895429837616, 2356289430496076560]) stalls indefinitely on my computer.

Note, these numbers are the output of Python's built-in hash, which I think is a reasonable use case.

@eddieantonio
Copy link
Owner

tl;dr: your example internally creates a very large array!

The reason this stalls is due to the naïve and literal interpretation of this algorithm. I implemented the algorithm described by Thomas Gettys here, and the first step is to generate a $t \times t$ square, where $t \times t \ge \max(A')$, $A$ are the integers you want to hash, and $A'$ is every element in $A$ subtracted by $min(A)$. In effect, this means that $max(A')$ is the range of $A$, that is $\max(A) - \min(A)$.

In this particular example,
$\min(A)$ = -3342959649445288775
which means
$\max(A')$ = 7277493124594268929
therefore the smallest value of $t$ is 2697682918, because this is the smallest $t$ such that $t \times t$ is greater than 7277493124594268929.

Because it's making a square of $t$, this means the algorithm then tries to make room for at least 7277493124594268929 integers. Your computer cannot do this. For reference, this number is between $2^{62}$ and $2^{63}$. There are a few other housekeeping data structures that are made, so 7277493124594268929 times the size of a Python integer is the minimum amount of bytes needed for this algorithm to terminate.

Long story short, this perfect hashing algorithm is not suitable for numbers with a very large range, such as in this example.

Maybe I will look into other perfect hashing algorithms that are more appropriate, optimize the current algorithm so it doesn't allocate, or just document that it's not appropriate for large ranges.

@za3k
Copy link
Author

za3k commented Nov 5, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants