polyval: implement Karatsuba multiplication for arm64 #181

Merged: 2 commits merged into RustCrypto:master on Jun 23, 2023

Conversation

ericlagergren
Contributor

Improves performance by ~200 MB/s on a 2020 M1.

Signed-off-by: Eric Lagergren <[email protected]>
@ericlagergren
Contributor Author

The code is taken from https://github.com/ericlagergren/polyval-rs/tree/dev, which also has "wide" implementations (8 blocks at a time) with significantly better performance (~0.17 cycles per byte instead of ~2).
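
For readers unfamiliar with the "wide" approach, here is a minimal sketch of the idea, not the code from polyval-rs. It assumes a toy prime field (arithmetic modulo 2^61 - 1) in place of POLYVAL's GF(2^128), and the names `hash_serial` and `hash_wide` are hypothetical. The point is only the regrouping of Horner's rule: with the key powers precomputed, the per-block multiplications inside a group of N blocks no longer depend on one another.

```rust
// Toy modulus standing in for the real field (assumption for illustration).
const P: u128 = (1 << 61) - 1;

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P) as u64
}

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P) as u64
}

/// Serial Horner evaluation: one dependent multiply per block.
fn hash_serial(h: u64, blocks: &[u64]) -> u64 {
    let mut y = 0u64;
    for &x in blocks {
        y = mul(add(y, x), h);
    }
    y
}

/// "Wide" evaluation over groups of N blocks (N must be at least 1).
/// For each group it computes
///   (y + x0)*h^N + x1*h^(N-1) + ... + x_{N-1}*h
/// which equals N serial Horner steps, but the N products are independent.
fn hash_wide<const N: usize>(h: u64, blocks: &[u64]) -> u64 {
    // powers[i] = h^(N - i), so powers[0] = h^N and powers[N - 1] = h.
    let mut powers = [0u64; N];
    let mut p = h;
    for i in (0..N).rev() {
        powers[i] = p;
        p = mul(p, h);
    }

    let mut y = 0u64;
    let mut chunks = blocks.chunks_exact(N);
    for chunk in &mut chunks {
        let mut acc = mul(add(y, chunk[0]), powers[0]);
        for i in 1..N {
            acc = add(acc, mul(chunk[i], powers[i]));
        }
        y = acc;
    }
    // Leftover blocks (fewer than N) fall back to the serial step.
    for &x in chunks.remainder() {
        y = mul(add(y, x), h);
    }
    y
}
```

Calling `hash_wide::<8>(h, &blocks)` should agree with `hash_serial(h, &blocks)` for any input length. A real backend would replace `mul` and `add` with carryless multiplication, XOR, and the POLYVAL reduction.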

@ericlagergren
Contributor Author

I also have an x86 version I can submit as well if you'd like.

@tarcieri
Member

Parallel and x86 versions would be appreciated, although perhaps as separate PRs to ease reviewability

Member

@tarcieri tarcieri left a comment

Tested locally on an M2 Max, where I observed the reported speedups.

Percentage-wise it's about a 17% speedup.

@tarcieri tarcieri merged commit 973fe29 into RustCrypto:master Jun 23, 2023
@ericlagergren
Contributor Author

> Parallel and x86 versions would be appreciated, although perhaps as separate PRs to ease reviewability

Actually, your x86 implementation only uses 3 clmul instructions, so I don't think the serial version can be improved much.
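
As background on the three-multiply count, the following is a portable sketch of Karatsuba for a 128x128-bit carryless product. It is not the code from this PR or from the x86 backend: `clmul64` is a hypothetical bit-by-bit stand-in for a hardware instruction such as PMULL or PCLMULQDQ, and the field reduction that POLYVAL applies afterwards is omitted.

```rust
/// Carryless (GF(2)[x]) product of two 64-bit polynomials.
/// Software stand-in for a hardware carryless multiply (assumption).
fn clmul64(a: u64, b: u64) -> u128 {
    let mut acc: u128 = 0;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
    }
    acc
}

/// Schoolbook 128x128 -> 256-bit carryless product as (hi, lo).
/// Uses four 64x64 multiplies.
fn clmul128_schoolbook(a: u128, b: u128) -> (u128, u128) {
    let (a0, a1) = (a as u64, (a >> 64) as u64);
    let (b0, b1) = (b as u64, (b >> 64) as u64);
    let lo = clmul64(a0, b0);
    let hi = clmul64(a1, b1);
    let mid = clmul64(a0, b1) ^ clmul64(a1, b0);
    (hi ^ (mid >> 64), lo ^ (mid << 64))
}

/// Karatsuba 128x128 -> 256-bit carryless product as (hi, lo).
/// Uses three 64x64 multiplies: the middle term is recovered from
/// (a0 ^ a1) * (b0 ^ b1) by XORing out the low and high products.
fn clmul128_karatsuba(a: u128, b: u128) -> (u128, u128) {
    let (a0, a1) = (a as u64, (a >> 64) as u64);
    let (b0, b1) = (b as u64, (b >> 64) as u64);
    let lo = clmul64(a0, b0);
    let hi = clmul64(a1, b1);
    let mid = clmul64(a0 ^ a1, b0 ^ b1) ^ lo ^ hi;
    (hi ^ (mid >> 64), lo ^ (mid << 64))
}
```

Because addition in GF(2)[x] is XOR, the usual Karatsuba subtractions become XORs and no carries need handling, so both functions should return identical results for every input.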

I'll look at adding parallel implementations. Offhand, do you know if the current API supports it? The input probably needs to be in one contiguous buffer. (Maybe not?) But that's the common case, at least for stuff like non-interleaved AES-GCM-SIV or HCTR2.

@tarcieri
Member

tarcieri commented Jun 24, 2023

Take a look at poly1305 for an example of a parallel multi-block backend (AVX2): https://github.com/RustCrypto/universal-hashes/blob/0054b30/poly1305/src/backend/avx2.rs#L188-L198

baloo added a commit to baloo/universal-hashes that referenced this pull request Mar 3, 2024
Added
- add `new_with_init_block` (RustCrypto#195)

Changed
- implement Karatsuba multiplication for arm64 (RustCrypto#181)
@baloo baloo mentioned this pull request Mar 3, 2024
tarcieri pushed a commit that referenced this pull request Mar 3, 2024
Added
- add `new_with_init_block` (#195)

Changed
- implement Karatsuba multiplication for arm64 (#181)