Adds CLI tool to print BucketList archival stats #4154
Thanks, this looks like a really helpful command! Left a few questions
src/bucket/BucketManagerImpl.cpp
Outdated
std::map<LedgerKey, LedgerEntry>
BucketManagerImpl::loadCompleteLedgerState(HistoryArchiveState const& has)
static std::vector<std::pair<Hash, std::string>>
getBucketHashes(HistoryArchiveState const& has)
This functionality seems redundant: can we repurpose HistoryArchiveState::allBuckets to avoid duplication?
We actually can't use that here, since HistoryArchiveState::allBuckets includes future bucket hashes and returns the bucket hashes sorted lexicographically by hash instead of in BucketList order. This was helpful though: I realized we were iterating through the BucketList in reverse order instead of in order. I've removed the getBucketHashes function, since loadCompleteLedgerState and dumpStateArchivalStatistics iterate the BucketList in different directions.
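To illustrate why iteration order matters for this kind of tally, here is a minimal sketch. It is not the code from this PR. It assumes the buckets are visited newest-first, so the first version of a key that is seen is the newest one. `SketchEntry`, `SketchBucket`, `ByteTotals`, and `tallyByVersion` are hypothetical names, and a plain string stands in for a LedgerKey.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one bucket entry: a key plus its serialized size.
struct SketchEntry
{
    std::string key;    // stands in for a LedgerKey
    uint64_t sizeBytes; // serialized size of this version
};

// Hypothetical stand-in for a bucket: a flat list of entries.
using SketchBucket = std::vector<SketchEntry>;

struct ByteTotals
{
    uint64_t newestBytes = 0;    // bytes held by the newest version of each key
    uint64_t nonNewestBytes = 0; // bytes held by older versions of those keys
};

// Assumption: `buckets` is ordered newest-first. The first time a key is seen
// it counts as the newest version; every later occurrence is an older version.
// Passing the buckets in the opposite order would swap which version is
// treated as newest, which is why lexicographic hash order is not usable here.
ByteTotals
tallyByVersion(std::vector<SketchBucket> const& buckets)
{
    ByteTotals totals;
    std::unordered_set<std::string> seen;
    for (auto const& bucket : buckets)
    {
        for (auto const& entry : bucket)
        {
            if (seen.insert(entry.key).second)
            {
                totals.newestBytes += entry.sizeBytes;
            }
            else
            {
                totals.nonNewestBytes += entry.sizeBytes;
            }
        }
    }
    return totals;
}
```

If the same buckets were sorted by hash instead, the split between the two totals would depend on hash values rather than on entry age.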
docs/software/commands.md
Outdated
@@ -84,6 +84,7 @@ Command options can only by placed after command.
See more examples in [ledger_query_examples.md](ledger_query_examples.md).

* **dump-xdr <FILE-NAME>**: Dumps the given XDR file and then exits.
* **dump-archival-stats**: Logs state archival statistics about the BucketList and then exits.
nit: the "and then exits" part is a bit redundant, since exiting is the default behavior for offline commands
src/bucket/BucketManager.h
Outdated
// Logs state archival statistics, such as the number of expired entries
// currently in the BucketList, number of bytes of evicted entries, etc.
virtual void
dumpStateArchivalStatistics(HistoryArchiveState const& has) = 0;
This function doesn't seem to belong in the BucketManager interface, since it shouldn't rely on state (so it should probably be a free util function instead)
(I think StateArchivalMetric should be moved elsewhere too)
src/main/ApplicationUtils.cpp
Outdated
@@ -535,14 +535,6 @@ mergeBucketList(Config cfg, std::string const& outputDir)
Application::pointer app = Application::create(clock, cfg, false);
app->getLedgerManager().loadLastKnownLedger(nullptr);
auto& lm = app->getLedgerManager();
{
just checking, did this get removed because it doesn't work correctly with BucketListDB? (which would be fixed in #4166)
Yeah, I've reverted this after rebasing.
src/bucket/BucketManagerImpl.cpp
Outdated
CLOG_INFO(Bucket, "BucketList total bytes: {}", blSize);
CLOG_INFO(Bucket,
"Live Temporary Entries: Non-shadows {} bytes ({}%), Shadowed {} "
Regarding terminology, do you think we could move away from "shadows" and maybe call these entries newest vs. non-newest (or something like that)? Older protocols use the term "shadows" to refer to something else, and while the current protocol doesn't have shadows anymore, we still need to support replay of older buckets in the codebase, so it would be nice to avoid any confusion here.
src/bucket/BucketManagerImpl.cpp
Outdated
// *BytesNoShadow == bytes consumed only by newest version of BucketEntry
// *BytesShadows == bytes consumed only by shadows, does not count newest
// version of BucketEntry
// live -> liveUntilLedger > ledgerSeq
>=?
src/bucket/BucketManagerImpl.cpp
Outdated
if (iter == map.end())
{
StateArchivalMetric metric;
metric.le =
I think this can get pretty expensive memory-wise if we're dealing with a production BucketList with a large number of temp entries (on the order of GBs). Perhaps we should only store the size and expiration ledger of LEs?
I think we should also keep track of whether the newest version of a key is DEADENTRY or LIVEENTRY, but you're right, storing the whole LedgerEntry is unnecessary.
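A rough sketch of what such a trimmed-down record could look like, under stated assumptions: it keeps only the size, the expiration ledger, and a live/dead flag for the newest version of each key. `CompactArchivalMetric`, `MetricMap`, and `recordNewestVersion` are hypothetical names, not the PR's actual types. A string stands in for LedgerKey, and the expiration ledger is passed in directly (how it is looked up is out of scope here).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical compact record: a few fixed-size fields per key instead of a
// full LedgerEntry.
struct CompactArchivalMetric
{
    uint64_t sizeBytes = 0;       // serialized size of the newest version
    uint32_t liveUntilLedger = 0; // expiration ledger of the newest version
    bool newestIsDead = false;    // true if the newest version is a DEADENTRY
};

// Hypothetical map keyed by a string standing in for LedgerKey.
using MetricMap = std::map<std::string, CompactArchivalMetric>;

// Records a version of `key` only if no version has been recorded yet.
// Assumption: callers feed versions newest-first, so the first one wins.
// Returns true if the version was recorded, false if it was ignored.
bool
recordNewestVersion(MetricMap& map, std::string const& key, uint64_t sizeBytes,
                    uint32_t liveUntilLedger, bool isDead)
{
    auto iter = map.find(key);
    if (iter != map.end())
    {
        return false; // an earlier (newer) version is already stored
    }
    CompactArchivalMetric metric;
    metric.sizeBytes = sizeBytes;
    metric.liveUntilLedger = liveUntilLedger;
    metric.newestIsDead = isDead;
    map.emplace(key, metric);
    return true;
}
```

With this layout the per-key memory cost no longer depends on the size of the entry's contents.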
5e90e09 to 02c2edc
LGTM, but it looks like commit signing is broken in this PR
02c2edc to 70d0f7c
Fixed
r+ 70d0f7c
Description
Resolves #4138
Adds a CLI tool that logs some state-archival-related metrics from the BucketList. Specifically, the tool logs the number of bytes currently consumed by temporary entries. It divides the temporary entries into three categories: live, expired but not evicted, and evicted.
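The three categories could be sketched as below. This is an illustration under assumptions, not the tool's actual logic. It treats a key whose newest version is a DEADENTRY as evicted (which would also count explicit deletions), and it uses `liveUntilLedger >= ledgerSeq` for "live", following the `>=` suggestion raised in review. `TempEntryState` and `classifyTempEntry` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical categories matching the three groups in the description.
enum class TempEntryState
{
    Live,
    ExpiredNotEvicted,
    Evicted
};

// Assumptions (not confirmed by the PR):
//  - a DEADENTRY as the newest version means the entry was evicted;
//  - an entry is live while liveUntilLedger >= ledgerSeq.
TempEntryState
classifyTempEntry(bool newestIsDead, uint32_t liveUntilLedger,
                  uint32_t ledgerSeq)
{
    if (newestIsDead)
    {
        return TempEntryState::Evicted;
    }
    if (liveUntilLedger >= ledgerSeq)
    {
        return TempEntryState::Live;
    }
    return TempEntryState::ExpiredNotEvicted;
}
```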
Checklist
clang-format v8.0.0 (via make format or the Visual Studio extension)