-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make_hash stalls for long numbers #7
Comments
tl;dr: your example internally creates a very large array! The reason this stalls is due to the naïve and literal interpretation of this algorithm. I implemented the algorithm described by Thomas Gettys here, and the first step is to generate a In this particular example, Because it's making a square of Long story short, this perfect hashing algorithm is not suitable for numbers with a very large range, such as in this example. Maybe I will look into other perfect hashing algorithms that are more appropriate, optimize the current algorithm so it doesn't allocate, or just document that it's not appropriate for large ranges. |
Throwing an exception if you try to do this might be a "quick workaround"
…On 2022-11-05 9:30 am, Eddie Antonio Santos wrote:
tl;dr: your example internally creates a _very_ large array!
The reason this stalls is due to the naïve and literal interpretation of
this algorithm. I implemented the algorithm described by Thomas Gettys here
[1], and the first step is to generate a $t \times t$ square, where $t
\times t \ge \max(A')$, $A$ are the integers you want to hash, and $A'$ is
every element in $A$ subtracted by $min(A)$. In effect, this means that
$max(A')$ is the _range_ of $A$, that is $\max(A) - \min(A)$.
In this particular example,
$\min(A)$ = -3342959649445288775
which means
$\max(A')$ = 7277493124594268929
therefore the smallest value of $t$ is 2697682918, because this is the
smallest $t$ such that $t \times t$ is greater than 7277493124594268929.
Because it's making a _square_ of $t$, this means the algorithm then tries
to make room for _at least_ 7277493124594268929 integers. Your computer
cannot do this. For reference, this number is between $2^{62}$ and $2^{63}$.
There are a few other housekeeping data structures that are made, so
7277493124594268929 times the size of a Python integer is the _minimum_
amount of bytes needed for this algorithm to terminate.
Long story short, THIS PERFECT HASHING ALGORITHM IS NOT SUITABLE FOR NUMBERS
WITH A VERY LARGE RANGE, such as in this example.
Maybe I will look into other perfect hashing algorithms that are more
appropriate, optimize the current algorithm so it doesn't allocate, or just
document that it's not appropriate for large ranges.
--
Reply to this email directly, view it on GitHub [2], or unsubscribe [3].
You are receiving this because you authored the thread.Message ID:
***@***.***>
Links:
------
[1]
https://www.drdobbs.com/architecture-and-design/generating-perfect-hash-functions/184404506
[2]
#7 (comment)
[3]
https://github.com/notifications/unsubscribe-auth/AAFRLUUXNZXBXENEZMET7GLWG2DTTANCNFSM6AAAAAARPUJ45Y
|
perfection.make_hash([3934533475148980154, -3342959649445288775, 654684863317306829, 29321895429837616, 2356289430496076560]) stalls indefinitely on my computer.
Note, these numbers are the output of Python's built-in
hash
, which I think is a reasonable use case.The text was updated successfully, but these errors were encountered: