Continue to capture SCP messages for previous ledger in database #4121

Merged
merged 1 commit into stellar:master on Feb 13, 2024

Conversation

bboston7
Contributor

@bboston7 bboston7 commented Jan 5, 2024

Description

Resolves #3769

Upon closing a ledger, nodes record SCP messages received during the ledger period in the scphistory database table. However, any messages about that ledger heard after close were ignored. This change captures more of those messages by modifying the processExternalized function (which is responsible for recording SCP messages on ledger close) to also record any new messages heard about the previous ledger.

Performance impact

As this change adds additional database interactions to the main thread, it does have a performance impact. I've mitigated this by wrapping the new and old saveSCPHistory calls in a single transaction. I measured the performance impacts using Tracy and have summarized the results for SQLite and Postgres below.

On SQLite, the use of a single transaction for both saveSCPHistory calls effectively negates the performance impact of the additional call. I suspect this is because it saves an expensive write to disk, which dominates the call.
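The effect of batching both calls into one transaction can be sketched with a minimal, self-contained example. This is an illustrative Python/sqlite3 sketch, not the actual C++/soci code from the PR, and the table layout is a simplified stand-in for the real scphistory schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scphistory (nodeid TEXT, ledgerseq INTEGER, envelope TEXT)")

def save_scp_history(ledgerseq, envs):
    # Simplified stand-in for saveSCPHistory: one INSERT per envelope.
    for node, env in envs:
        conn.execute("INSERT INTO scphistory VALUES (?, ?, ?)", (node, ledgerseq, env))

# One transaction wraps both calls, so there is a single commit (and a single
# expensive disk sync) at the end instead of one per saveSCPHistory call.
with conn:
    save_scp_history(2, [("A", "EXTERNALIZE"), ("B", "EXTERNALIZE")])  # current ledger
    save_scp_history(1, [("C", "EXTERNALIZE")])                        # previous ledger

print(conn.execute("SELECT COUNT(*) FROM scphistory").fetchone()[0])  # 3
```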

On Postgres the performance improvement from using a single transaction is minimal and does not entirely offset the additional saveSCPHistory call. On average, saving the SCP history now takes 10% longer than before (averaging 0.82 milliseconds more on my machine). For reference, that 0.82 milliseconds makes up 0.5% of the average processExternalized call when using the Postgres backend.

A 0.5% impact on one of two backends seemed like a good place to stop tweaking and measuring performance impacts, but if that's too large an impact I have a couple more ideas I can try:

  1. Using a single multi-row insert into scphistory may be faster than the multiple single row inserts saveSCPHistory currently performs.
  2. At first glance, it seems like saving SCP history could be moved off the main thread and onto a worker thread.
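Idea 1 above can be sketched as follows. Again this is an illustrative Python/sqlite3 example with simplified column names, not the project's C++/soci code; it shows one INSERT with a VALUES tuple per row instead of one statement per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scphistory (nodeid TEXT, ledgerseq INTEGER, envelope TEXT)")

rows = [("A", 2, "env-a"), ("B", 2, "env-b"), ("C", 2, "env-c")]

# Build a single multi-row INSERT: "INSERT ... VALUES (?,?,?), (?,?,?), (?,?,?)"
placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
flat = [value for row in rows for value in row]
conn.execute(f"INSERT INTO scphistory VALUES {placeholders}", flat)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM scphistory").fetchone()[0])  # 3
```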

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@bboston7 bboston7 requested a review from marta-lokhova January 5, 2024 01:18
Contributor

@marta-lokhova marta-lokhova left a comment


Thanks for putting this together! I had a question regarding another scenario we'd want to handle

@@ -288,6 +289,10 @@ class HerderImpl : public Herder
Application& mApp;
LedgerManager& mLedgerManager;

// Set of nodes whose SCP messages from the previous ledger have already
// been stored in the scphistory database table.
UnorderedSet<NodeID> mPrevExternalizedEnvs;
Contributor


This isn't quite right: current implementation only records new statements from nodes we haven't heard from at all in the last consensus round. The scenario we also want to handle is when we receive newer SCP messages from nodes in our quorum set. This can occur if, for example, we have CONFIRM messages from our quorum slice (which is enough to externalize that ledger), but then receive EXTERNALIZE messages from the same nodes after some time. We'd like to store those newer EXTERNALIZE messages as well, to make things easier for downstream systems (e.g. bridges). So I think we'd either want to store the diff in SCP messages, or rewrite the whole externalizing state, depending on the performance impact.

Contributor Author


Ah, this was an oversight on my part, I forgot about that scenario. I'll fix this and do another round of performance testing.

Contributor Author


I pushed up a fix for this in 07defe8. Interestingly, rewriting the entire externalizing state turned out to be faster than tracking diffs. I think this is because rewriting can be done with a single DELETE followed by a single bulk INSERT (I changed saveSCPHistory to perform a single multi-row INSERT into scphistory, which provided a small performance improvement), while tracking diffs requires multiple UPDATEs and INSERTs.
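The rewrite approach can be illustrated with a small sketch. This is Python/sqlite3 with a simplified stand-in schema, purely to show the DELETE-plus-bulk-INSERT shape, not the actual C++/soci implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scphistory (nodeid TEXT, ledgerseq INTEGER, envelope TEXT)")

# State recorded when ledger 1 closed: CONFIRM messages were enough to externalize.
conn.executemany("INSERT INTO scphistory VALUES (?, ?, ?)",
                 [("A", 1, "CONFIRM"), ("B", 1, "CONFIRM")])

# Newer state heard afterwards: EXTERNALIZE upgrades plus a newly heard node.
latest = [("A", 1, "EXTERNALIZE"), ("B", 1, "EXTERNALIZE"), ("C", 1, "EXTERNALIZE")]

with conn:
    # Rewrite: one DELETE, then one bulk INSERT, instead of per-node diffing.
    conn.execute("DELETE FROM scphistory WHERE ledgerseq = ?", (1,))
    conn.executemany("INSERT INTO scphistory VALUES (?, ?, ?)", latest)

rows = conn.execute("SELECT nodeid, envelope FROM scphistory ORDER BY nodeid").fetchall()
print(rows)
```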

Regardless, the performance impact of all of this isn't too bad. Compared to master on pubnet, the new process of saving SCP messages (doing two saveSCPHistory calls) takes ~3.7ms longer on Postgres and ~0.9ms longer on SQLite on my machine. In relative terms, that's 31% longer for Postgres and 50% longer for SQLite.

@MonsieurNicolas
Contributor

just a drive-by message wrt diff vs rewrite all of SCP state:

  • right now tier1 contains I think 23 validators, but we expect this to grow by 2-3x. That being said: if this becomes a problem, we could track (as a separate issue) changing how/where we record SCP messages.
  • on the txset front (not sure if it matters here)
    • with more validators we're more likely to encounter more (competing) TxSets
    • with TPS increase, txsets themselves are going to be bigger

@bboston7
Contributor Author

  • right now tier1 contains I think 23 validators, but we expect this to grow by 2-3x. That being said: if this becomes a problem, we could track (as a separate issue) changing how/where we record SCP messages.

Good point. I opened #4139 so we don't forget about this.

  • on the txset front (not sure if it matters here)
    • with more validators we're more likely to encounter more (competing) TxSets
    • with TPS increase, txsets themselves are going to be bigger

I don't think the number of competing TxSets will have much of an impact, as this only records the last externalizing message from any given validator. The only impact I could see is if we switched to the diff based approach and competing TxSets resulted in more frequent diffs.

However, larger txsets might have an impact. I assume these txsets end up in the ballots in the Value field, thus making the ballots themselves larger. Is that right?

@bboston7
Contributor Author

bboston7 commented Jan 12, 2024

However, larger txsets might have an impact. I assume these txsets end up in the ballots in the Value field, thus making the ballots themselves larger. Is that right?

Actually, it looks like the SCP envelopes contain only the hash of the txset. In that case, I don't think txset size has any impact on the performance of saving these envelopes to the database.
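This matches what happens at the storage level: because the envelope carries only a fixed-size hash of the txset, the stored value does not grow with the txset. A toy illustration (Python, with the SHA-256 digest standing in for the txset hash):

```python
import hashlib

small_txset = b"tx1"
large_txset = b"tx" * 100_000

# The envelope stores only the 32-byte hash of the txset, so the stored
# size is constant no matter how large the txset itself gets.
h_small = hashlib.sha256(small_txset).digest()
h_large = hashlib.sha256(large_txset).digest()

print(len(h_small), len(h_large))  # 32 32
```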

Contributor

@SirTyson SirTyson left a comment


Looks good! Just a few nitpicks and questions.

// NOTE: Consolidating the two saveSCPHistory calls into one transaction
// provides modest performance increases for sqlite and insignificant
// performance increases for postgres.
soci::transaction txscope(mApp.getDatabase().getSession());
Contributor


Nit: Camel case

Contributor Author


Done in c950e1c


if (!envs.empty())
{
// Perform multi-row insert into scphistory
auto prepEnv =
Contributor

@SirTyson SirTyson Jan 22, 2024


Do we need to change this query to handle collisions, or to check for the existence of an entry before inserting it? I think we currently wind up with two copies of every message, since we insert both the currentLedger and currentLedger - 1 messages into the table, and INSERT INTO will create a second copy if we insert the same thing twice. I think this is different from the CONFIRM vs EXTERNALIZE conversation above, since this would store identical copies of the same message, but I'm not super well versed in SCP so I could be missing something here.

Contributor Author


This handles the collision case by deleting all entries from currentLedger before performing the INSERT. This also resolves the CONFIRM vs EXTERNALIZE issue.
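That behavior can be demonstrated with a minimal sketch: because the save starts with a DELETE for the ledger, re-saving the same ledger replaces rows rather than duplicating them, and a newer EXTERNALIZE supersedes an earlier CONFIRM. This is an illustrative Python/sqlite3 stand-in, not the actual C++ code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scphistory (nodeid TEXT, ledgerseq INTEGER, envelope TEXT)")

def save_ledger(envs, seq):
    # DELETE-then-INSERT: re-saving a ledger replaces rows instead of duplicating.
    with conn:
        conn.execute("DELETE FROM scphistory WHERE ledgerseq = ?", (seq,))
        conn.executemany("INSERT INTO scphistory VALUES (?, ?, ?)",
                         [(node, seq, env) for node, env in envs])

save_ledger([("A", "CONFIRM")], 2)      # saved when ledger 2 closes
save_ledger([("A", "EXTERNALIZE")], 2)  # saved again on the next ledger close

rows = conn.execute("SELECT nodeid, envelope FROM scphistory WHERE ledgerseq = 2").fetchall()
print(rows)  # one row per node, with the newest statement
```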

REQUIRE(getNumSCPHistoryEntries(C, 2) == 0);

// Get messages from A and B
HerderImpl* herderA = dynamic_cast<HerderImpl*>(&A->getHerder());
Contributor


Cast as HerderImpl& so we don't have to deal with raw pointers.

Contributor Author


Done in c950e1c


// A and B should now have 3 entries in their scphistory table for ledger 2.
REQUIRE(getNumSCPHistoryEntries(A, 2) == 3);
REQUIRE(getNumSCPHistoryEntries(B, 2) == 3);
Contributor


Should we check that C does not have extra copies of messages? I might have missed something but I think this may be possible.

Contributor Author


Good idea to explicitly check this case. I added a test for this in d39400b.

@bboston7 bboston7 requested a review from SirTyson January 23, 2024 00:39
Contributor

@SirTyson SirTyson left a comment


Minor typo, but LGTM, thanks!

},
2 * Herder::EXP_LEDGER_TIMESPAN_SECONDS, false);

// Return the number of entires in a node's scphistory table for the given
Contributor


Nit: "entires"

Contributor Author


Fixed in 72b8757

@bboston7
Contributor Author

I just rebased this on top of master and squashed down to a single commit

Contributor

@marta-lokhova marta-lokhova left a comment


Thanks for the change! Left a few suggestions. Let's plan to ship this change in 20.2.0

// NOTE: Consolidating the two saveSCPHistory calls into one transaction
// provides modest performance increases for sqlite and insignificant
// performance increases for postgres.
soci::transaction txScope(mApp.getDatabase().getSession());
Contributor


if there isn't much benefit to consolidating two calls into one transaction, I'd recommend to remove it for simplicity.

Contributor Author


Done in a8139ed

},
2 * Herder::EXP_LEDGER_TIMESPAN_SECONDS, false);

// A and B should now have 3 entries in their scphistory table for ledger 2.
Contributor


I think this check could be made stronger to ensure that we're updating the database with the latest state. Specifically, we want to make sure that if a node externalized with CONFIRM statements from its quorum set, on the next ledger it actually persists the newer EXTERNALIZE statements received for that same ledger.

Contributor Author


Done in a8139ed

bboston7 added a commit to bboston7/stellar-core that referenced this pull request Jan 23, 2024
This PR enables updating the `scphistory` table during catchup from
history. It allows users to specify which archives to use via the
`SCP_HISTORY_ARCHIVES` config option. If a user specifies multiple
archives, stellar-core will merge the messages from the archives.

This is a draft PR because I'm looking for feedback on the approach, but
still have some work to do before it is in a mergeable state.
Most of the remaining work is to:

* Document the new functionality
* Write additional tests for:
  * Merging
  * Failed downloads
  * No `SCP_HISTORY_ARCHIVES`
  * Multiple `SCP_HISTORY_ARCHIVES`
  * Changes to `ReplayDebugMetaWork`
* Integrate changes from stellar#4121
* Address other `TODO`s in the changes
latobarita added a commit that referenced this pull request Feb 12, 2024
Continue to capture SCP messages for previous ledger in database

Reviewed-by: marta-lokhova
Closes stellar#3769

Upon closing a ledger, nodes record SCP messages received during the
ledger period in the `scphistory` database table. However, any messages
about that ledger heard *after* close were dropped. This change captures
more of those messages by modifying the `processExternalized` function
(which is responsible for recording SCP messages on ledger close) to
also record any new or more recent messages heard about the previous
ledger.
@bboston7
Contributor Author

The test failed because it was overly sensitive to the RNG seed. I fixed it by increasing some timeouts in the test and by specifying the state of A and B more precisely, so the test is insensitive to leader election.

@marta-lokhova
Contributor

r+ 26bba35

@latobarita latobarita merged commit d2e6be1 into stellar:master Feb 13, 2024
15 checks passed
@bboston7 bboston7 deleted the scp-msgs branch September 16, 2024 22:44