-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring back parallelism #56
base: master
Are you sure you want to change the base?
Bring back parallelism #56
Conversation
This reverts commit a71499a.
This reverts commit 06da41a.
This reverts commit 3b6dcc1.
This is just a proof-of-concept. It seems to not immediately fail on Miri (like my first attempts), but I have yet to let it run to completion.
Also review that an UnsafeCell isn't necessary. The Glossary of the Unsafe Code Guidelines Reference explicitly states: > Finding live shared references propagates recursively through > references, but not through raw pointers. [...] If data immediately > pointed to by a *const T or &*const T is mutated, that's *not* interior > mutability. — https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#interior-mutability The Reference also only mentions transitivity through references and boxes: > Moreover, the bytes pointed to by a shared reference, including > transitively through other references (both shared and mutable) and > Boxes, are immutable; transitivity includes those references stored in > fields of compound types. — https://doc.rust-lang.org/stable/reference/behavior-considered-undefined.html?highlight=undefin#r-undefined.immutable
These have higher lane counts (as with parallelism enabled that may surface more bugs) but very little memory, since running in Miri is quite slow.
This can yield small performance improvements in some cases (e.g. a 2% improvement computing Argon2id in parallel with p=4, m=64MiB and t=1).
Thanks for your work on this! A comment and a question:
|
That's ok, I think it can be easily rewritten with But, just to clarify, do you directly want to support Rust 1.56? Because you mentioned targeting an edition, however there's no hard requirement on keeping the minimum supported Rust version that low even if you choose to stick with Rust 2021 here. And users generally wouldn't care about the edition you pick, since crates with different editions can coexist in a build. (In fact, the code was compiled and tested as is, with the 2021 Edition, using Rust 1.83.0, which has
A MIR-level interpreter to detect various types of undefined behavior in Rust, in particular (even though still experimental) aliasing violations, pointer provenance issues, and data races. https://github.com/rust-lang/miri Ralf Jung, one of its main authors, is also (among others) doing a lot of the work on defining the rules for unsafe Rust, in particular the aliasing rules (affecting unsafe, since safe Rust is bound by the much stricter borrow checker). Stacked Borrows, one of his papers on that, is (AFAIK) still the most complete aliasing model proposed for Rust. |
This is an attempt to bring back parallelism (which was removed in 3b6dcc1 due to #41).
I've started by reverting the three commits related to the removal of parallelism, then replaced
Memory::as_lanes_mut
with a (wrapped) raw pointer (type) and changedfill_segment
to directly obtain disjoint shared or mutable block references as needed.Essentially, this relies on the fact that Argon2 does its passes over memory in slices, and the algorithm is specifically designed so that all segments of a slice can be computed in parallel (without data races or atomics). Additionally, synchronization between threads (in the memory ordering sense) is performed after each slice is processed, when the thread
scope
drops.As pointed out in #41, it was also necessary to move the
unsafe
boundary up, since the soundness of these operations depends onfill_segment
and its ancestors having received correct arguments. I've also added the corresponding safety docs and comments.Finally, I've added a few more tests, which successfully run in Miri. The new tests use little memory, but more lanes, and take a few minutes (each) to run in Miri. I also ran an existing 4 MiB test, which took 20 or so minutes.
P.S. Feel free to squash the individual commits when/if the time comes to merge this PR. I've intentionally left the intermediate steps I took in case the review sparks discussions on some of the choices I made.