Skip to content

Conversation

@romange
Copy link
Collaborator

@romange romange commented Nov 24, 2025

The PR adds a background task that runs only once on db 0 and computes huffman table
but only on datasets that fit the requirements - key memory usage is significant
enough to do such analysis. At the end of the process a metric increased to signal
that the huffman table was computed and printed to logs.

@romange romange force-pushed the Pr3 branch 2 times, most recently from dad6eef to 4b0a33a Compare November 24, 2025 17:58
@romange romange requested a review from Copilot November 24, 2025 17:58
Copilot finished reviewing on behalf of romange November 24, 2025 18:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a background task that automatically identifies opportunities for Huffman encoding compression on datasets with significant key memory usage. The task runs once on shard 0, analyzes key patterns to build a Huffman encoding table, and logs the results along with updating a Prometheus metric.

Key changes:

  • Adds HuffmanCheckTask class that incrementally scans keys and builds character frequency histograms
  • Triggers analysis when key memory usage exceeds 50MB (release) or 5MB (debug) and represents more than 1/8 of object memory
  • Exposes dragonfly_huffman_tables_built metric to track table generation

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/server/engine_shard.cc Implements HuffmanCheckTask class and integrates it into the heartbeat mechanism to run once on shard 0
src/server/engine_shard.h Adds huffman_tables_built stat counter, huffman_check_task_id_ field, and fixes enum underlying type
src/server/server_family.cc Exposes huffman_tables_built metric to Prometheus
src/server/debugcmd.cc Converts Huffman-related constants to constexpr for better type safety
src/server/db_slice.h Corrects comment typo ("everride" -> "override")
tests/dragonfly/server_family_test.py Adds test for Huffman table building and moves testcontainers imports to function scope
tests/dragonfly/connection_test.py Adds unused port variable extraction
Comments suppressed due to low confidence (1)

tests/dragonfly/connection_test.py:644

  • Variable port is not used.
    port = async_client.connection_pool.connection_kwargs["port"]

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The PR adds a background task that runs only once on db 0 and computes huffman table
but only on datasets that fit the requirements - key memory usage is significant
enough to do such analysis. At the end of the process a metric increased to signal
that the huffman table was computed and printed to logs.

Signed-off-by: Roman Gershman <[email protected]>
Copilot finished reviewing on behalf of romange November 24, 2025 18:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@romange
Copy link
Collaborator Author

romange commented Nov 24, 2025

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Contributor

Copilot AI commented Nov 24, 2025

@romange I've opened a new pull request, #6108, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Roman Gershman <[email protected]>
dranikpg
dranikpg previously approved these changes Nov 26, 2025
Comment on lines +121 to +127
class HuffmanCheckTask {
public:
HuffmanCheckTask() {
hist_.fill(0);
}

int32_t Run(DbSlice* db_slice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My dream once was to turn every on idle task into a background fiber... 😞

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the same though? I clearly remember that OnIdle tasks run only when we do not have any CPU work to do and can be starved indefinitely. I thought that background fibers a bit more agressive. Also OnIdle tasks provide a way to throttle themselves and background fibers do not do that,, imho

// incrementally aggregate frequency histogram.
auto& prime = db_table->prime;

constexpr uint32_t kMaxTraverses = 512;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that you run it without flags or anything and it might freeze when iterating 512 of big strings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A valid concern, even though I do not think keys are long for real workloads. I will add a protection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, didn't think about it being just keys, I assume that in the future we want to test values as well

Comment on lines 724 to 729
HuffmanCheckTask* task = new HuffmanCheckTask();
huffman_check_task_id_ = ProactorBase::me()->AddOnIdleTask([task] {
if (!shard_ || !namespaces) {
delete task;
return -1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little sloppy overall, like it was written for a POC. If it becomes more complicated, it would be preferrably to refactor it a little. For example, you don't need the pointer, it can be just a member of the lambda,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! :)
what do you mean by that? how lambda can have a member?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just being honest 😅
[task = HummanCheckTasks{}] mutable {} , desotrying the lambda will destroy the task

@romange romange merged commit 497f253 into main Nov 27, 2025
10 checks passed
@romange romange deleted the Pr3 branch November 27, 2025 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants