Skip to content

Conversation

@RenjiSann
Copy link
Collaborator

This MR brings a few changes to simplify and optimize some parts of the checksum computation

It depends on the #9511 MR

@RenjiSann RenjiSann self-assigned this Dec 1, 2025
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

GNU testsuite comparison:

Congrats! The gnu test tests/cksum/cksum-c is no longer failing!

@RenjiSann RenjiSann changed the title cksum: small improvements cksum: small improvements, l10n Dec 1, 2025
@codspeed-hq
Copy link

codspeed-hq bot commented Dec 1, 2025

CodSpeed Performance Report

Merging #9532 will degrade performances by 16.41%

Comparing RenjiSann:cksum-improvements (935f0a9) with main (22a8732)

Summary

⚡ 1 improvement
❌ 13 regressions
✅ 112 untouched
⏩ 6 skipped1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
wc_bytes_synthetic[500] 163.9 µs 182.4 µs -10.16%
b64_decode_synthetic 149.1 µs 168.9 µs -11.74%
b64_decode_ignore_garbage_synthetic 149.2 µs 166.8 µs -10.55%
b64_encode_synthetic 146.3 µs 164.4 µs -11.04%
cksum_blake3 190.7 µs 207.7 µs -8.18%
sort_ascii_c_locale 22.7 ms 21.7 ms +4.51%
sort_reverse_mixed 38.2 ms 39.2 ms -2.57%
ls_recursive_wide_tree[(10000, 1000)] 51.3 ms 52.5 ms -2.33%
cp_large_file[16] 341.3 µs 359 µs -4.94%
mv_single_file 129.7 ms 136 ms -4.63%
mv_force_overwrite 125.3 ms 138.4 ms -9.51%
factor_multiple_u64s[2] 178.3 ms 213.3 ms -16.41%
split_number_chunks 275.4 µs 287.7 µs -4.27%
rm_single_file 106.6 ms 119.6 ms -10.91%

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/cksum/cksum-c is no longer failing!

@RenjiSann RenjiSann force-pushed the cksum-improvements branch 2 times, most recently from 4e9151b to 086d610 Compare December 1, 2025 10:40
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

GNU testsuite comparison:

Congrats! The gnu test tests/cksum/cksum-c is no longer failing!

@RenjiSann RenjiSann force-pushed the cksum-improvements branch 2 times, most recently from c96ed79 to dc8063d Compare December 1, 2025 18:32
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@RenjiSann RenjiSann marked this pull request as ready for review December 3, 2025 10:05
@RenjiSann
Copy link
Collaborator Author

RenjiSann commented Dec 3, 2025

It looks like the localized messages I added add a significant overhead to many other tools :/ Not sure how to improve this.

sum.parse::<u16>().unwrap(),
match (options.algo_kind, sum) {
(SizedAlgoKind::Sysv, DigestOutput::U16(sum)) => print!(
"{prefix}{sum} {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the printing of prefix here, and in the other arms of this match, correct? I'm asking because it is already printed on line 155.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it's wrong, I forgot to remove the print above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@github-actions
Copy link

github-actions bot commented Dec 4, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

use hex::encode;
use std::io::{self, Write};

use data_encoding::BASE64;
Copy link
Contributor

@cakebaker cakebaker Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to add data_encoding to the sum feature in src/uucore/Cargo.toml. Currently there is an "unresolved import data_encoding" error when I try to run cargo test --features=sum --no-default-features

Comment on lines +420 to +421
impl_digest_shake!(Shake128, 256);
impl_digest_shake!(Shake256, 512);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the values 256 and 512 correct? With the impl_digest_common macro, the values correspond to the bits in the algorithm name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I saw, Shake algorithms are not available in GNU, so I have no point of comparison. I just used the values that kept the test passing while still removing the code that previously handled the specific case of Shake.

// Implements the Digest trait for sha2 / sha3 algorithms with variable output
macro_rules! impl_digest_shake {
($algo_type: ty) => {
($algo_type: ty, $output_bits: literal) => {
Copy link
Contributor

@cakebaker cakebaker Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason that the definition here is different than the one used for impl_digest_common (literal vs expr)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

literal is more restrictive than expr, and it's enough since I'm writing the number directly. I can switch to expr in case we want to change this later

Copy link
Contributor

@cakebaker cakebaker Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use literal in both cases. In the other macro the number is also written directly.

.all(|&b| b.is_ascii_alphanumeric() || b == b'+' || b == b'/')
{
return None;
while index < checksum.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a for loop so you don't have to increase index yourself.

Suggested change
while index < checksum.len() {
for index in 0..checksum.len() {

// ASCII alphanumeric
[b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9', ..] => {
index += 1;
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continue is not necessary.

Suggested change
continue;

[b'+' | b'/', ..] => {
is_base64 = true;
index += 1;
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Suggested change
continue;

Comment on lines 760 to 762
// division by 2 converts the length of the Blake2b checksum from
// hexadecimal characters to bytes, as each byte is represented by
// two hexadecimal characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment no longer matches the code.

Suggested change
// division by 2 converts the length of the Blake2b checksum from
// hexadecimal characters to bytes, as each byte is represented by
// two hexadecimal characters.

(AlgoKind::Blake2b, Some(expected_checksum.len()))
}
algo @ (AlgoKind::Sha2 | AlgoKind::Sha3) => {
// multiplication by 4 to get the number of bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// multiplication by 4 to get the number of bits
// multiplication by 8 to get the number of bits

Copy link
Contributor

@cakebaker cakebaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :)

@github-actions
Copy link

github-actions bot commented Dec 4, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/overlay-headers (passes in this run but fails in the 'main' branch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants