-
Notifications
You must be signed in to change notification settings - Fork 989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Continue to capture SCP messages for previous ledger in database #4121
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together! I had a question regarding another scenario we'd want to handle
src/herder/HerderImpl.h
Outdated
@@ -288,6 +289,10 @@ class HerderImpl : public Herder | |||
Application& mApp; | |||
LedgerManager& mLedgerManager; | |||
|
|||
// Set of nodes whose SCP messages from the previous ledger have already | |||
// been stored in the scphistory database table. | |||
UnorderedSet<NodeID> mPrevExternalizedEnvs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right: current implementation only records new statements from nodes we haven't heard from at all in the last consensus round. The scenario we also want to handle is when we receive newer SCP messages from nodes in our quorum set. This can occur if, for example, we have CONFIRM messages from our quorum slice (which is enough to externalize that ledger), but then receive EXTERNALIZE messages from the same nodes after some time. We'd like to store those newer EXTERNALIZE messages as well, to make things easier for downstream systems (e.g. bridges). So I think we'd either want to store the diff in SCP messages, or rewrite the whole externalizing state, depending on the performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was an oversight on my part, I forgot about that scenario. I'll fix this and do another round of performance testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed up a fix for this in 07defe8. Weirdly enough the performance of rewriting the entire externalizing state is faster than tracking diffs. I think this is because rewriting can be done with a single DELETE
followed by a single bulk INSERT
(I changed saveSCPHistory
to perform a single multi-row INSERT
into scphistory
, which provided a small performance improvement), while tracking diffs requires multiple MODIFY
s and INSERT
s.
Regardless, the performance impact of all of this isn't too bad. Compared to master
on pubnet, the new process of saving SCP messages (doing two saveSCPHistory
calls) takes ~3.7ms longer on Postgres and ~0.9ms longer on SQLite on my machine. In relative terms, that's 31% longer for Postgres and 50% longer for SQLite.
just a drive-by message wrt diff vs rewrite all of SCP state:
|
Good point. I opened #4139 so we don't forget about this.
I don't think the number of competing TxSets will have much of an impact, as this only records the last externalizing message from any given validator. The only impact I could see is if we switched to the diff based approach and competing TxSets resulted in more frequent diffs. However, larger txsets might have an impact. I assume these txsets end up in the ballots in the |
Actually, it looks like the SCP envelopes contain only the hash of the txset. In that case, I don't think txset size has any impact on the performance of saving these envelopes to the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a few nit pics and questions.
src/herder/HerderImpl.cpp
Outdated
// NOTE: Consolidating the two saveSCPHistory calls into one transaction | ||
// provides modest performance increases for sqlite and insignificant | ||
// performance increases for postgres. | ||
soci::transaction txscope(mApp.getDatabase().getSession()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Camel case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c950e1c
|
||
if (!envs.empty()) | ||
{ | ||
// Perform multi-row insert into scphistory | ||
auto prepEnv = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change this query to handle collisions or to check for the existence of an entry before inserting it? I think we currently wind up with two copies of every message, since we insert both the currentLedger and currentLedger - 1 messages into the table. INSERT INTO
will create a 2nd copy if we insert the same thing twice. I think this is different than the diff between BALLOT and EXTERNALIZE conversation above since this would store identical copies of the same message, but I'm not super well versed on SCP so I could be missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handles the collision case by deleting all entries from currentLedger
before performing the INSERT
. This also resolves the CONFIRM
vs EXTERNALIZE
issue.
src/herder/test/HerderTests.cpp
Outdated
REQUIRE(getNumSCPHistoryEntries(C, 2) == 0); | ||
|
||
// Get messages from A and B | ||
HerderImpl* herderA = dynamic_cast<HerderImpl*>(&A->getHerder()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cast as HerderImpl&
so we don't have to deal with raw pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c950e1c
src/herder/test/HerderTests.cpp
Outdated
|
||
// A and B should now have 3 entries in their scphistory table for ledger 2. | ||
REQUIRE(getNumSCPHistoryEntries(A, 2) == 3); | ||
REQUIRE(getNumSCPHistoryEntries(B, 2) == 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check that C does not have extra copies of messages? I might have missed something but I think this may be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to explicitly check this case. I added a test for this in d39400b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo, but LGTM, thanks!
src/herder/test/HerderTests.cpp
Outdated
}, | ||
2 * Herder::EXP_LEDGER_TIMESPAN_SECONDS, false); | ||
|
||
// Return the number of entires in a node's scphistory table for the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "entires"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 72b8757
I just rebased this on top of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change! Left a few suggestions. Let's plan to ship this change in 20.2.0
src/herder/HerderImpl.cpp
Outdated
// NOTE: Consolidating the two saveSCPHistory calls into one transaction | ||
// provides modest performance increases for sqlite and insignificant | ||
// performance increases for postgres. | ||
soci::transaction txScope(mApp.getDatabase().getSession()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there isn't much benefit to consolidating two calls into one transaction, I'd recommend to remove it for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a8139ed
src/herder/test/HerderTests.cpp
Outdated
}, | ||
2 * Herder::EXP_LEDGER_TIMESPAN_SECONDS, false); | ||
|
||
// A and B should now have 3 entries in their scphistory table for ledger 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check could be made stronger to ensure that we're updating the database with the latest state. Specifically, we want to make sure that if nodes externalized with CONFIRM statements from its quorum set, on the next ledger we actually persist new EXTERNALIZE statements received for that same ledger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a8139ed
This PR enables updating the `scphistory` table during catchup from history. It allows users to specify which archives to use via the `SCP_HISTORY_ARCHIVES` config option. If a user specifies multiple archives, stellar-core will merge the messages from the archives. This is a draft PR because I'm looking for feedback on the approach, but still have some work to do before it is in a mergeable state. Most of the remaining work is to: * Document the new functionality * Write additional tests for: * Merging * Failed downloads * No `SCP_HISTORY_ARCHIVES` * Multiple `SCP_HISTORY_ARCHIVES` * Changes to `ReplayDebugMetaWork` * Integrate changes from stellar#4121 * Address other `TODO`s in the changes
This PR enables updating the `scphistory` table during catchup from history. It allows users to specify which archives to use via the `SCP_HISTORY_ARCHIVES` config option. If a user specifies multiple archives, stellar-core will merge the messages from the archives. This is a draft PR because I'm looking for feedback on the approach, but still have some work to do before it is in a mergeable state. Most of the remaining work is to: * Document the new functionality * Write additional tests for: * Merging * Failed downloads * No `SCP_HISTORY_ARCHIVES` * Multiple `SCP_HISTORY_ARCHIVES` * Changes to `ReplayDebugMetaWork` * Integrate changes from stellar#4121 * Address other `TODO`s in the changes
Continue to capture SCP messages for previous ledger in database Reviewed-by: marta-lokhova
Closes stellar#3769 Upon closing a ledger, nodes record SCP messages received during the ledger period in the `scphistory` database table. However, any messages about that ledger heard *after* close were dropped. This change captures more of those messages by modifying the `processExternalized` function (which is responsible for recording SCP messages on ledger close) to also record any new or more recent messages heard about the previous ledger.
The test failed because it was overly sensitive to the rng seed. I fixed it by increasing some timeouts in the test and also better specifying the state of |
r+ 26bba35 |
Description
Resolves #3769
Upon closing a ledger, nodes record SCP messages received during the ledger period in the
scphistory
database table. However, any messages about that ledger heard after close were ignored. This change captures more of those messages by modifying theprocessExternalized
function (which is responsible for recording SCP messages on ledger close) to also record any new messages heard about the previous ledger.Performance impact
As this change adds additional database interactions to the main thread, it does have a performance impact. I've mitigated this by wrapping the new and old
saveSCPHistory
calls in a single transaction. I measured the performance impacts using Tracy and have summarized the results for SQLite and Postgres below.On SQLite, the use of a single transaction for both
saveSCPHistory
calls effectively negates the performance impact of the additional call. I suspect this is because it saves an expensive write to disk, which dominates the call.On Postgres the performance improvement from using a single transaction is minimal, and does not entirely make up for the additional saveSCPHistory call. On average, saving the SCP history now takes 10% longer than before (averaging 0.82 miliseconds more on my machine). For reference, 0.82 milliseconds makes up 0.5% of the average
processExternalized
call when using the Postgres backend.A 0.5% impact on one of two backends seemed like a good place to stop tweaking and measuring performance impacts, but if that's too large an impact I have a couple more ideas I can try:
scphistory
may be faster than the multiple single row insertssaveSCPHistory
currently performs.Checklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)