Skip to content

Add an AVX-512-optimized single-buffer MD5 implementation #2573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pittma
Copy link
Contributor

@pittma pittma commented Jul 23, 2025

This PR adds a block order implementation optimized with AVX-512's vpternlog instruction for the mixing functions.

The table below shows the results of bssl speed for a Intel(R) Xeon(R) 6787P (Granite Rapids) CPU using the command:

taskset -c 0 ./tool/bssl speed -filter MD5
Bytes MB/s (main) MB/s (md5) % diff
16 103.8 106.5 2.6%
256 566.6 611.5 7.9%
1350 770.3 856.6 11.2%
8192 822.6 925.7 12.5%
16384 830.8 933.9 12.4%

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@pittma pittma requested a review from a team as a code owner July 23, 2025 22:55
@pittma
Copy link
Contributor Author

pittma commented Jul 23, 2025

Hi all, I tried several versions of this but this is the most straightforward. If there is any interest in optimizing a single-buffer MD5 implementation, I thought this was the best place to start. If we'd like to see what else we can squeeze out in terms of performance with the trade-off of a slightly less clean API or more complex implementation, I'm happy to keep churning on it.

Comment on lines 245 to 246
cmp \$0, $num
jg .L_main_loop

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's very minor but these two lines could be replaced with jnz .L_main_loop only

push %rbp
mov %rsp,%rbp
sub $XMM_STORAGE, %rsp
and \$0xffffffffffffffc0,%rsp

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this aligns the stack frame to 64 bytes and probably only 16 byte alignment is needed here

@pittma pittma force-pushed the md5 branch 2 times, most recently from 5bce68f to 08c3292 Compare July 28, 2025 17:20
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

#define md5_block_data_order md5_block_asm_data_order
#else
static void md5_block_data_order(uint32_t *state, const uint8_t *data,
size_t num);
#endif

void MD5_Transform(MD5_CTX *c, const uint8_t data[MD5_CBLOCK]) {
md5_block_data_order(c->h, data, 1);
void (*block_func)(uint32_t *state, const uint8_t *data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]

Suggested change
void (*block_func)(uint32_t *state, const uint8_t *data,
void (*block_func = NULL)(uint32_t *state, const uint8_t *data,

block_func = md5_block_data_order;

#if defined(MD5_ASM_AVX512)
if (!CRYPTO_is_AVX512_capable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]

  if (!CRYPTO_is_AVX512_capable()) {
       ^

}

int MD5_Update(MD5_CTX *c, const void *data, size_t len) {
crypto_md32_update(&md5_block_data_order, c->h, c->data, MD5_CBLOCK, &c->num,
void (*block_func)(uint32_t *state, const uint8_t *data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]

Suggested change
void (*block_func)(uint32_t *state, const uint8_t *data,
void (*block_func = NULL)(uint32_t *state, const uint8_t *data,

block_func = md5_block_data_order;

#if defined(MD5_ASM_AVX512)
if (!CRYPTO_is_AVX512_capable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]

  if (!CRYPTO_is_AVX512_capable()) {
       ^

&c->Nh, &c->Nl, data, len);
return 1;
}

int MD5_Final(uint8_t out[MD5_DIGEST_LENGTH], MD5_CTX *c) {
crypto_md32_final(&md5_block_data_order, c->h, c->data, MD5_CBLOCK, &c->num,
void (*block_func)(uint32_t *state, const uint8_t *data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'block_func' is not initialized [cppcoreguidelines-init-variables]

Suggested change
void (*block_func)(uint32_t *state, const uint8_t *data,
void (*block_func = NULL)(uint32_t *state, const uint8_t *data,

block_func = md5_block_data_order;

#if defined(MD5_ASM_AVX512)
if (!CRYPTO_is_AVX512_capable()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: call to undeclared function 'CRYPTO_is_AVX512_capable'; ISO C99 and later do not support implicit function declarations [clang-diagnostic-implicit-function-declaration]

  if (!CRYPTO_is_AVX512_capable()) {
       ^

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 47.05882% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.71%. Comparing base (fb9104d) to head (127a280).

Files with missing lines Patch % Lines
crypto/fipsmodule/md5/md5_test.cc 0.00% 4 Missing and 1 partial ⚠️
crypto/fipsmodule/md5/md5.c 66.66% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2573      +/-   ##
==========================================
- Coverage   78.72%   78.71%   -0.01%     
==========================================
  Files         645      645              
  Lines      110642   110655      +13     
  Branches    15647    15652       +5     
==========================================
+ Hits        87101    87103       +2     
- Misses      22841    22850       +9     
- Partials      700      702       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dkostic
Copy link
Contributor

dkostic commented Jul 30, 2025

Hi Dan, thanks for your contribution! We are discussing internally, trying to figure out if we have any customer demand for faster single buffer MD5. We'll get back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants