Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for RDB analysis reports #1743

Open
wants to merge 5 commits into
base: unstable
Choose a base branch
from

Conversation

artikell
Copy link
Contributor

@artikell artikell commented Feb 16, 2025

The PR add analysis capabilities to the valkey-check-rdb tool. Now the tool can statistically analyze various types of keys, expired keys, and key elements.

  • Added the --profiler parameter to the valkey-check-rdb tool's rdb analysis functionality, which enables opening reports.
  • Supported the --format parameter with three options, table/csv/info, to provide more flexible output formats.
  • Add function & lua script number info

The content format is as follows:

... ...
[offset 8999] \o/ RDB looks OK! \o/
[info] 60 keys read
[info] 0 expires
[info] 0 already expired
db.0.type.name                  string  list    set     zset    hash    module  stream
db.0.keys.total                 10      0       0       0       0       0       0    
db.0.exipre_keys.total          0       0       0       0       0       0       0    
db.0.already_expired.total      0       0       0       0       0       0       0    
db.0.keys.size                  200     0       0       0       0       0       0    
db.0.keys.value_size            2000    0       0       0       0       0       0    
db.0.elements.total             10      0       0       0       0       0       0    
db.0.elements.size              2000    0       0       0       0       0       0    
db.0.elements.num.max           1       0       0       0       0       0       0    
db.0.elements.num.avg           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p99           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p90           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p50           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.max          200     0       0       0       0       0       0    
db.0.elements.size.avg          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p99          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p90          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p50          200.00  0.00    0.00    0.00    0.00    0.00    0.00

Copy link

codecov bot commented Feb 16, 2025

Codecov Report

Attention: Patch coverage is 78.11321% with 58 lines in your changes missing coverage. Please review.

Project coverage is 70.96%. Comparing base (c5ae37f) to head (fac1d67).
Report is 57 commits behind head on unstable.

Files with missing lines Patch % Lines
src/valkey-check-rdb.c 78.11% 58 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1743      +/-   ##
============================================
- Coverage     71.17%   70.96%   -0.22%     
============================================
  Files           123      123              
  Lines         65536    65927     +391     
============================================
+ Hits          46645    46783     +138     
- Misses        18891    19144     +253     
Files with missing lines Coverage Δ
src/valkey-check-rdb.c 73.97% <78.11%> (+9.38%) ⬆️

... and 45 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: wei.kukey <[email protected]>
Signed-off-by: artikell <[email protected]>
@artikell artikell force-pushed the feat/profile_feature branch from 523708d to 04c476c Compare February 16, 2025 12:30
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Not sure about the argument name --profiler. Profiler is more often used for things like CPU time.

How about --stats or --verbose?

An RDB file can also contain Functions IIRC. Shall we print the count too?

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

@artikell
Copy link
Contributor Author

--stats seems more suitable for this scenario, let me add the data of the function.

@artikell artikell requested a review from zuiderkwast March 24, 2025 15:17
@enjoy-binbin
Copy link
Member

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

You have a wrong memory, i dont want to print the lua scripts, i want to print the number of scripts. See #1448

@zuiderkwast
Copy link
Contributor

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

You have a wrong memory, i dont want to print the lua scripts, i want to print the number of scripts. See #1448

Sorry, I mean number of scripts. This PR prints the number of many things, such as the of functions, number of keys of each type, etc. To print the number of lua scripts if any would make sense here? Maybe only print it if it's non-zero, because it's only for quite old RDB files. WDYT?

@enjoy-binbin
Copy link
Member

This PR prints the number of many things, such as the of functions, number of keys of each type, etc. To print the number of lua scripts if any would make sense here? Maybe only print it if it's non-zero, because it's only for quite old RDB files. WDYT?

SGTM, we can print it (that is what i was looking for at that time). I can re-open the old PR if needed if you want to keep the history. Or we can do it in this PR.

@zuiderkwast
Copy link
Contributor

Or we can do it in this PR.

I guess in this PR is better in that case. This PR prints the info in another way. Or maybe reopen it after this PR is merged.

@artikell do you want to add the number of lua scripts from old RDB files as in #1448? They are just aux-fields with the name "lua". I suggest print them only if the number is non-zero since it's old stuff that will always be zero except in very old RDB files.

@zuiderkwast
Copy link
Contributor

@artikell What is the "form" formats? Can you mention it in the PR top comment?

What's the use case of the "form" format? If it is to consume by scripts, I guess CSV or JSON formats is more common. We already use those formats in valkey-cli and valkey-benchmark.

@artikell
Copy link
Contributor Author

artikell commented Mar 25, 2025

@artikell What is the "form" formats? Can you mention it in the PR top comment?

What's the use case of the "form" format? If it is to consume by scripts, I guess CSV or JSON formats is more common. We already use those formats in valkey-cli and valkey-benchmark.

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading. Just like the format in the commit, the result is that it is aligned. In my opinion, CSV can also be supported.

  • I have considered using JSON, but implementing the YAML format is a bit simpler (at least, all it takes is adding some spaces).

Regarding the number of Lua scripts, I will refer to Binbin's MR to make the addition in this MR.

@zuiderkwast
Copy link
Contributor

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading.

Yes maybe "table" is a better name. The example in the PR comment is the "info" format? It is also readable for users. Not a big difference to "form"? I'm thinking maybe we don't need both....

For CSV, JSON, YAML, I don't have a strong opinion but it seems nice to align the output format from our different tools. CSV seems to be supported by most of them so add it here too?

@artikell
Copy link
Contributor Author

artikell commented Mar 25, 2025

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading.

Yes maybe "table" is a better name. The example in the PR comment is the "info" format? It is also readable for users. Not a big difference to "form"? I'm thinking maybe we don't need both....

For CSV, JSON, YAML, I don't have a strong opinion but it seems nice to align the output format from our different tools. CSV seems to be supported by most of them so add it here too?

@zuiderkwast I've understood your thoughts. Now I've changed it to three formats:

  • Table: It is the default structure in the PR comment, which is conducive to human reading.
  • CSV: A formatted output that is convenient for other scripts to parse.
  • Info: Referring to the original format, it is used for the TCL script to make correctness judgments.

This is CSV output:
image

This is Info output:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants