HPCC-33474 Introduce a new faster compression for inplace indexes #19575

Merged
ghalliday merged 1 commit into hpcc-systems:candidate-9.6.x on Mar 6, 2025

Member

@ghalliday ghalliday commented Feb 28, 2025

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@Copilot Copilot AI left a comment

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33474

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@ghalliday ghalliday requested a review from dcamper February 28, 2025 16:55
@ghalliday
Member Author

I have run the regression suite, checked that some of my sample synthetic data indexes generate the same data, and also run the regression suite with -fdefaultIndexCompression=inplace.
See the Jira for more details.

@ghalliday
Member Author

NOTE: This is currently targeting 9.10.x, but my plan is to merge against 9.2.x.

Contributor

@dcamper dcamper left a comment

Looks good. One ignorable question about a value range.

auto processOption = [this](const char * option, const char * value)
{
    if (strieq(option, "hclevel"))
        hcLevel = atoi(value);
Contributor

Do we need a bounds check on this to keep it between 1 and LZ4HC_CLEVEL_MAX? Or at least add a comment noting the valid range?

Member Author

I will add that.
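
For reference, a minimal sketch of the kind of bounds check discussed in this thread. It is not the code that was merged. The helper name `clampHcLevel` and the constants `minHcLevel` and `maxHcLevel` are hypothetical; the upper limit of 12 is assumed to mirror `LZ4HC_CLEVEL_MAX` from `lz4hc.h`, and the lower limit of 1 is taken from the review question above. The sketch uses `strtol` rather than `atoi` so that non-numeric input yields a well-defined 0 before clamping.

```cpp
#include <cstdlib>

// Assumption: 12 mirrors LZ4HC_CLEVEL_MAX in lz4hc.h; in real code, include
// lz4hc.h and use the macro directly instead of repeating the value.
constexpr long maxHcLevel = 12;
// Assumption: lower bound of 1, as suggested in the review comment.
constexpr long minHcLevel = 1;

// Hypothetical helper: parse an "hclevel" option value and clamp it to
// the range [minHcLevel, maxHcLevel].
inline int clampHcLevel(const char * value)
{
    // strtol returns 0 when no digits can be parsed, which is then clamped up.
    long level = std::strtol(value, nullptr, 10);
    if (level < minHcLevel)
        return static_cast<int>(minHcLevel);
    if (level > maxHcLevel)
        return static_cast<int>(maxHcLevel);
    return static_cast<int>(level);
}
```

With a helper like this, the assignment in the lambda could become `hcLevel = clampHcLevel(value);`. Whether to clamp silently or reject out-of-range values with an error is a design choice this thread does not settle.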

Contributor

@mckellyln mckellyln left a comment

It's a lot to go over in serious detail.
It looks good from an initial review.
There is also the copyexp program in tools that we could extend to use this.

Member

@richardkchapman richardkchapman left a comment

I didn't spot anything significant, just some typos in comments

// and because the input buffer cannot be reallocated with the streaming api
size32_t inlen = 0; // total length of input data
size32_t lastCompress = 0; // the offset in the input data that has not been compressed yet.
size32_t outlen = 0; // How mucch compressed data has been generated, or final size once closed
Member

Typo: mucch

if (tryCompress())
    return true;

//No benefit in trying to recompress if there is no more data to compress, and we alredy have a single block
Member

typo: alredy

return 0;
}

//Save all the input text - it LZ4 needs all the data, and we also recompress all data so far sometimes
Member

Grammar: The word "it" seems spurious here

@ghalliday ghalliday changed the base branch from candidate-9.10.x to candidate-9.6.x March 6, 2025 17:20
@ghalliday ghalliday merged commit 0c89b78 into hpcc-systems:candidate-9.6.x Mar 6, 2025
20 checks passed
github-actions bot commented Mar 6, 2025

Jirabot Action Result:
Added fix version: 9.6.90
Added fix version: 9.8.66
Added fix version: 9.10.12
Workflow Transition: 'Resolve issue'
