-
Notifications
You must be signed in to change notification settings - Fork 307
HPCC-33474 Introduce a new faster compression for inplace indexes #19575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33474 Jirabot Action Result: |
I have tested the regression system, that some of my sample synthetic data indexes generate the same data and also run the regression suite with -fdefaultIndexCompression=inplace. |
NOTE: This is currently targetting 9.10.x, but my plan is to merge against 9.2.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One ignorable question about a value range.
system/jlib/jlz4.cpp
Outdated
auto processOption = [this](const char * option, const char * value) | ||
{ | ||
if (strieq(option, "hclevel")) | ||
hcLevel = atoi(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a bounds check on this to keep it between 1 and LZ4HC_CLEVEL_MAX? Or at least add a comment noting the valid range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lot to go over in serious detail.
It looks good from an initial review.
There is also the copyexp program in tools that we could extend to use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't spot anything significant, just some typos in comments
system/jlib/jlz4.cpp
Outdated
// and because the input buffer cannot be reallocated with the streaming api | ||
size32_t inlen = 0; // total length of input data | ||
size32_t lastCompress = 0; // the offset in the input data that has not been compressed yet. | ||
size32_t outlen = 0; // How mucch compressed data has been generated, or final size once closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: mucch
system/jlib/jlz4.cpp
Outdated
if (tryCompress()) | ||
return true; | ||
|
||
//No benefit in trying to recompress if there is no more data to compress, and we alredy have a single block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: alredy
system/jlib/jlz4.cpp
Outdated
return 0; | ||
} | ||
|
||
//Save all the input text - it LZ4 needs all the data, and we also recompress all data so far sometimes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar: The word "it" seems spurious here
Signed-off-by: Gavin Halliday <[email protected]>
Jirabot Action Result: |
Type of change:
Checklist:
Smoketest:
Testing: