-
Notifications
You must be signed in to change notification settings - Fork 1.1k
chore: implement background self-identification of huffman encoding #6106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dad6eef to
4b0a33a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a background task that automatically identifies opportunities for Huffman encoding compression on datasets with significant key memory usage. The task runs once on shard 0, analyzes key patterns to build a Huffman encoding table, and logs the results along with updating a Prometheus metric.
Key changes:
- Adds
HuffmanCheckTaskclass that incrementally scans keys and builds character frequency histograms - Triggers analysis when key memory usage exceeds 50MB (release) or 5MB (debug) and represents more than 1/8 of object memory
- Exposes
dragonfly_huffman_tables_builtmetric to track table generation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/server/engine_shard.cc | Implements HuffmanCheckTask class and integrates it into the heartbeat mechanism to run once on shard 0 |
| src/server/engine_shard.h | Adds huffman_tables_built stat counter, huffman_check_task_id_ field, and fixes enum underlying type |
| src/server/server_family.cc | Exposes huffman_tables_built metric to Prometheus |
| src/server/debugcmd.cc | Converts Huffman-related constants to constexpr for better type safety |
| src/server/db_slice.h | Corrects comment typo ("everride" -> "override") |
| tests/dragonfly/server_family_test.py | Adds test for Huffman table building and moves testcontainers imports to function scope |
| tests/dragonfly/connection_test.py | Adds unused port variable extraction |
Comments suppressed due to low confidence (1)
tests/dragonfly/connection_test.py:644
- Variable port is not used.
port = async_client.connection_pool.connection_kwargs["port"]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The PR adds a background task that runs only once on db 0 and computes huffman table but only on datasets that fit the requirements - key memory usage is significant enough to do such analysis. At the end of the process a metric increased to signal that the huffman table was computed and printed to logs. Signed-off-by: Roman Gershman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot open a new pull request to apply changes based on the comments in this thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]> Signed-off-by: Roman Gershman <[email protected]>
| class HuffmanCheckTask { | ||
| public: | ||
| HuffmanCheckTask() { | ||
| hist_.fill(0); | ||
| } | ||
|
|
||
| int32_t Run(DbSlice* db_slice); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My dream once was to turn every on idle task into a background fiber... 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the same though? I clearly remember that OnIdle tasks run only when we do not have any CPU work to do and can be starved indefinitely. I thought that background fibers a bit more agressive. Also OnIdle tasks provide a way to throttle themselves and background fibers do not do that,, imho
| // incrementally aggregate frequency histogram. | ||
| auto& prime = db_table->prime; | ||
|
|
||
| constexpr uint32_t kMaxTraverses = 512; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that you run it without flags or anything and it might freeze when iterating 512 of big strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A valid concern, even though I do not think keys are long for real workloads. I will add a protection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, didn't think about it being just keys, I assume that in the future we want to test values as well
src/server/engine_shard.cc
Outdated
| HuffmanCheckTask* task = new HuffmanCheckTask(); | ||
| huffman_check_task_id_ = ProactorBase::me()->AddOnIdleTask([task] { | ||
| if (!shard_ || !namespaces) { | ||
| delete task; | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a little sloppy overall, like it was written for a POC. If it becomes more complicated, it would be preferrably to refactor it a little. For example, you don't need the pointer, it can be just a member of the lambda,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! :)
what do you mean by that? how lambda can have a member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just being honest 😅
[task = HummanCheckTasks{}] mutable {} , desotrying the lambda will destroy the task
The PR adds a background task that runs only once on db 0 and computes huffman table
but only on datasets that fit the requirements - key memory usage is significant
enough to do such analysis. At the end of the process a metric increased to signal
that the huffman table was computed and printed to logs.