Conversation

@bjorn3
Collaborator

@bjorn3 bjorn3 commented Jan 26, 2026

This roughly halves the perf hit compared with the C original.

This also introduces an unsafe-nochecks feature to disable a couple of very hot bounds checks and to get rid of a mask operation that rustc inserts for a shift in get_middle_bits. The latter may become possible to remove without unsafe code in the future, once Rust gets pattern types.
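A minimal sketch of the shift-mask issue mentioned above (the function body and signature here are illustrative stand-ins, not the crate's actual code): rustc masks a variable shift amount (`start & 63` on a `u64`) because a shift by 64 or more would be an overflow, and promising the compiler the amount is in range lets it drop that mask.

```rust
/// Extract `nb` bits of `value` starting at bit `start`.
/// Hypothetical stand-in for the real `get_middle_bits`.
///
/// # Safety
/// The caller must guarantee `start + nb < 64`.
unsafe fn get_middle_bits(value: u64, start: u32, nb: u32) -> u64 {
    // Without this, rustc keeps an `& 63` mask on the shift amounts.
    unsafe { core::hint::assert_unchecked(start + nb < 64) };
    (value >> start) & ((1u64 << nb) - 1)
}

fn main() {
    // Bits 2..5 of 0b1011_0110 are 0b101.
    let v = unsafe { get_middle_bits(0b1011_0110, 2, 3) };
    assert_eq!(v, 0b101);
}
```

`core::hint::assert_unchecked` (stable since Rust 1.81) is the idiomatic way to express such an invariant; shifting by an out-of-range amount after a false promise would be undefined behavior, which is why the PR discussion below is about gating this behind a feature flag.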

@bjorn3 bjorn3 requested a review from folkertdev January 26, 2026 14:45
@bjorn3 bjorn3 force-pushed the perf_improvements branch from 4b3e37e to 0e6afa7 Compare January 26, 2026 14:45
Comment on lines +155 to +158
if nbBits >= MASK.len() as u32 {
unsafe { std::hint::unreachable_unchecked() };
}
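A self-contained sketch of the pattern in the snippet above, assuming `MASK` is a lookup table of low-bit masks indexed by bit count (the table contents, `look_bits`, and the 32-entry size are illustrative stand-ins, not the crate's actual definitions): marking the out-of-range case unreachable lets LLVM drop the bounds-check branch on the subsequent indexing.

```rust
// Hypothetical mask table: MASK[n] has the low n bits set.
const MASK: [u32; 32] = {
    let mut m = [0u32; 32];
    let mut i = 0;
    while i < 32 {
        m[i] = (1u32 << i) - 1;
        i += 1;
    }
    m
};

fn look_bits(bit_container: u64, nb_bits: u32) -> u32 {
    if nb_bits as usize >= MASK.len() {
        // SAFETY: callers must guarantee nb_bits < 32 (in the real code
        // the bound comes from validation elsewhere); this branch only
        // exists to let the optimizer elide the bounds check below.
        unsafe { std::hint::unreachable_unchecked() };
    }
    (bit_container as u32) & MASK[nb_bits as usize]
}

fn main() {
    assert_eq!(look_bits(0b1111_0101, 4), 0b0101);
}
```

The safety argument is non-local here, which is exactly what the review comments below push back on: the invariant is established far from the `unsafe` block.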
Member

So, we'd need to justify this, right? Probably we just inline this function altogether, and then it's obvious?

Collaborator Author

We'd then have to recursively inline look_bits and read_bits too. And even then, some of the callers of read_bits take the bit count from a field whose range is bounds-checked in an entirely separate part of the code.

Member

Right, I looked at the wrong thing. So, if we can't justify it, it should be behind the new feature flag.

Collaborator Author

Forgot to do that. Fixed.

Member

forgot to push?

Collaborator Author

No, forgot --force-with-lease.

@bjorn3 bjorn3 force-pushed the perf_improvements branch from 0e6afa7 to 2623836 Compare January 26, 2026 14:59
Inclusive ranges don't optimize well. Iterating in the forward direction
seems to be ever so slightly cheaper.
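A hedged illustration of the commit message's point (these functions are examples, not the actual diff): `a..=b` carries an extra "exhausted" flag in its iterator state so that `b == usize::MAX` still terminates, and that flag can block loop optimizations that a plain half-open counter gets for free.

```rust
fn sum_inclusive(data: &[u32], last: usize) -> u32 {
    // Inclusive range: the iterator tracks an extra done-flag internally.
    (0..=last).map(|i| data[i]).sum()
}

fn sum_half_open(data: &[u32], last: usize) -> u32 {
    // Equivalent half-open forward range: compiles to a plain counter
    // and typically optimizes (and vectorizes) better.
    (0..last + 1).map(|i| data[i]).sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_inclusive(&data, 2), sum_half_open(&data, 2));
}
```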
@bjorn3 bjorn3 force-pushed the perf_improvements branch from 2623836 to e092ff1 Compare January 26, 2026 15:00
@codecov

codecov bot commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
test-aarch64-apple-darwin 32.93% <100.00%> (-0.74%) ⬇️
test-aarch64-unknown-linux-gnu 31.64% <100.00%> (-0.02%) ⬇️
test-i686-unknown-linux-gnu 31.70% <100.00%> (-0.02%) ⬇️
test-x86_64-unknown-linux-gnu 33.42% <100.00%> (-0.91%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
lib/common/bitstream.rs 98.03% <ø> (-0.01%) ⬇️
lib/decompress/zstd_decompress_block.rs 76.25% <100.00%> (+0.01%) ⬆️

... and 12 files with indirect coverage changes


@bjorn3 bjorn3 force-pushed the perf_improvements branch from e092ff1 to 007e418 Compare January 26, 2026 15:21
@folkertdev folkertdev merged commit e96d3b1 into main Jan 26, 2026
38 of 39 checks passed
@bjorn3 bjorn3 deleted the perf_improvements branch January 26, 2026 15:46