-
Notifications
You must be signed in to change notification settings - Fork 137
Add an AVX-512-optimized single-buffer MD5 implementation #2573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi all, I tried several versions of this but this is the most straightforward. If there is any interest in optimizing a single-buffer MD5 implementation, I thought this was the best place to start. If we'd like to see what else we can squeeze out in terms of performance with the trade-off of a slightly less clean API or more complex implementation, I'm happy to keep churning on it. |
cmp \$0, $num | ||
jg .L_main_loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's very minor but these two lines could be replaced with jnz .L_main_loop
only
push %rbp | ||
mov %rsp,%rbp | ||
sub $XMM_STORAGE, %rsp | ||
and \$0xffffffffffffffc0,%rsp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this aligns the stack frame to 64 bytes and probably only 16 byte alignment is needed here
5bce68f
to
08c3292
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
#define md5_block_data_order md5_block_asm_data_order | ||
#else | ||
static void md5_block_data_order(uint32_t *state, const uint8_t *data, | ||
size_t num); | ||
#endif | ||
|
||
void MD5_Transform(MD5_CTX *c, const uint8_t data[MD5_CBLOCK]) { | ||
md5_block_data_order(c->h, data, 1); | ||
void (*block_func)(uint32_t *state, const uint8_t *data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]
void (*block_func)(uint32_t *state, const uint8_t *data, | |
void (*block_func = NULL)(uint32_t *state, const uint8_t *data, |
block_func = md5_block_data_order; | ||
|
||
#if defined(MD5_ASM_AVX512) | ||
if (!CRYPTO_is_AVX512_capable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]
if (!CRYPTO_is_AVX512_capable()) {
^
} | ||
|
||
int MD5_Update(MD5_CTX *c, const void *data, size_t len) { | ||
crypto_md32_update(&md5_block_data_order, c->h, c->data, MD5_CBLOCK, &c->num, | ||
void (*block_func)(uint32_t *state, const uint8_t *data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]
void (*block_func)(uint32_t *state, const uint8_t *data, | |
void (*block_func = NULL)(uint32_t *state, const uint8_t *data, |
block_func = md5_block_data_order; | ||
|
||
#if defined(MD5_ASM_AVX512) | ||
if (!CRYPTO_is_AVX512_capable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]
if (!CRYPTO_is_AVX512_capable()) {
^
&c->Nh, &c->Nl, data, len); | ||
return 1; | ||
} | ||
|
||
int MD5_Final(uint8_t out[MD5_DIGEST_LENGTH], MD5_CTX *c) { | ||
crypto_md32_final(&md5_block_data_order, c->h, c->data, MD5_CBLOCK, &c->num, | ||
void (*block_func)(uint32_t *state, const uint8_t *data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]
void (*block_func)(uint32_t *state, const uint8_t *data, | |
void (*block_func = NULL)(uint32_t *state, const uint8_t *data, |
block_func = md5_block_data_order; | ||
|
||
#if defined(MD5_ASM_AVX512) | ||
if (!CRYPTO_is_AVX512_capable()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]
if (!CRYPTO_is_AVX512_capable()) {
^
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2573 +/- ##
==========================================
- Coverage 78.72% 78.71% -0.01%
==========================================
Files 645 645
Lines 110642 110655 +13
Branches 15647 15652 +5
==========================================
+ Hits 87101 87103 +2
- Misses 22841 22850 +9
- Partials 700 702 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Hi Dan, thanks for your contribution! We are discussing internally, trying to figure out if we have any customer demand for faster single buffer MD5. We'll get back to you. |
This PR adds a block order implementation optimized with AVX-512's
vpternlog
instruction for the mixing functions.The table below shows the results of
bssl speed
for a Intel(R) Xeon(R) 6787P (Granite Rapids) CPU using the command:main
)md5
)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.