
Serve recent historical state #375

Open
leighmcculloch opened this issue Feb 26, 2025 · 19 comments

Comments

@leighmcculloch (Member) commented Feb 26, 2025

Today the RPC stores, and serves via its JSON-RPC endpoint, 7 days of recent transactions (getTransactions, getTransaction), 7 days of recent events (getEvents), and 7 days of recent ledger headers (getLedgers), but only 1 recent ledger of ledger state (getLedgerEntries).

A request to and a response from the getLedgerEntries method today look like:

Request:

{
  "jsonrpc": "2.0",
  "id": 8675309,
  "method": "getLedgerEntries",
  "params": {
    "keys": [
      "AAAABgAAAAHMA/50/Q+w3Ni8UXWm/trxFBfAfl6De5kFttaMT0/ACwAAABAAAAABAAAAAgAAAA8AAAAHQ291bnRlcgAAAAASAAAAAAAAAAAg4dbAxsGAGICfBG3iT2cKGYQ6hK4sJWzZ6or1C5v6GAAAAAE="
    ]
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 8675309,
  "result": {
    "entries": [
      {
        "key": "AAAAB+qfy4GuVKKfazvyk4R9P9fpo2n9HICsr+xqvVcTF+DC",
        "xdr": "AAAABgAAAAAAAAABzAP+dP0PsNzYvFF1pv7a8RQXwH5eg3uZBbbWjE9PwAsAAAAQAAAAAQAAAAIAAAAPAAAAB0NvdW50ZXIAAAAAEgAAAAAAAAAAIOHWwMbBgBiAnwRt4k9nChmEOoSuLCVs2eqK9Qub+hgAAAABAAAAAwAAAAw=",
        "lastModifiedLedgerSeq": 2552504
      }
    ],
    "latestLedger": 2552990
  }
}

This issue requests that getLedgerEntries accept an optional field, ledger, in the request that causes the RPC to return the state of the ledger entry as it was at a historical point within some window of data that the RPC stores in addition to the data it stores today:

Request:

 {
   "jsonrpc": "2.0",
   "id": 8675309,
   "method": "getLedgerEntries",
   "params": {
+    "ledger": 2552500,
     "keys": [
       "AAAABgAAAAHMA/50/Q+w3Ni8UXWm/trxFBfAfl6De5kFttaMT0/ACwAAABAAAAABAAAAAgAAAA8AAAAHQ291bnRlcgAAAAASAAAAAAAAAAAg4dbAxsGAGICfBG3iT2cKGYQ6hK4sJWzZ6or1C5v6GAAAAAE="
     ]
   }
 }

Response:

 {
   "jsonrpc": "2.0",
   "id": 8675309,
   "result": {
+    "ledger": 2552500,
     "entries": [
       {
         "key": "AAAAB+qfy4GuVKKfazvyk4R9P9fpo2n9HICsr+xqvVcTF+DC",
         "xdr": "AAAABgAAAAAAAAABzAP+dP0PsNzYvFF1pv7a8RQXwH5eg3uZBbbWjE9PwAsAAAAQAAAAAQAAAAIAAAAPAAAAB0NvdW50ZXIAAAAAEgAAAAAAAAAAIOHWwMbBgBiAnwRt4k9nChmEOoSuLCVs2eqK9Qub+hgAAAABAAAAAwAAAAw=",
         "lastModifiedLedgerSeq": 2532198
       }
     ],
     "latestLedger": 2552990
   }
 }
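A minimal client-side sketch of the proposed parameter (the `ledger` field is the hypothetical addition from this issue, not part of the current API; when it is omitted, the request is identical to today's):

```python
import json

def build_get_ledger_entries_request(keys, ledger=None, request_id=8675309):
    """Build a JSON-RPC 2.0 getLedgerEntries request.

    `ledger` is the field *proposed* in this issue; current RPC releases
    do not accept it. When omitted, the payload matches today's API.
    """
    params = {"keys": list(keys)}
    if ledger is not None:
        params["ledger"] = ledger
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "getLedgerEntries",
        "params": params,
    }

# Placeholder key; a real key is a base64-encoded LedgerKey XDR as above.
req = build_get_ledger_entries_request(["<base64-ledger-key>"], ledger=2552500)
print(json.dumps(req, indent=2))
```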

This feature request may be better served by some other service that also implements the RPC getLedgerEntries API but serves data from a different data source. However, the feature is more useful if it is served alongside or with RPC because of the use case of developers using it during testing.

This feature would be used to support fork testing in the Soroban Rust SDK and Quickstart, being discussed here:

@tamirms (Contributor) commented Feb 27, 2025

From protocol 23 onwards rpc will not be storing any ledger entries in its sqlite DB. Instead, rpc will essentially proxy getLedgerEntries requests to a new endpoint in core which fetches ledger entries from bucketlist db (see stellar/stellar-core#4623). This endpoint in core actually does allow you to query for the state of a ledger entry synchronized to an older ledger sequence. However, I'm not sure how core will perform if we configure the window of ledger versions to be as large as 7 days.

@SirTyson do you have any insight into the performance implications of configuring QUERY_SNAPSHOT_LEDGERS to be as high as 120960 (approximately equal to 7 days of ledgers)?

@tamirms (Contributor) commented Feb 27, 2025

I reread your issue @leighmcculloch and it seems like you are not necessarily asking for the window of historical ledger entries to be the same as the window we use for transactions, events, and ledgers. How large of a window should suffice for the use cases you had in mind?

@2opremio (Contributor) commented Feb 27, 2025

The new core HTTP interface, which RPC will be using from protocol 23, already allows specifying the ledger number to query but it's not externally exposed through the RPC API. See https://github.com/stellar/go/blob/c453f8b35c758d92f30d18f6083d57b4eba08040/clients/stellarcore/client.go#L248

The number of ledgers kept for this purpose can be specified through --stellar-captive-core-http-query-snapshot-ledgers (which is forwarded to Captive Core's QUERY_SNAPSHOT_LEDGERS config option). We currently default it to 4, but @SirTyson mentioned it's quite cheap to increase that number (I doubt it will be cheap enough to increase it to one week of ledgers without impact, though).
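Concretely, the two knobs named above look like this (the value 720, roughly one hour of ledgers, is illustrative, not a recommendation):

```
# stellar-rpc startup flag (forwarded to captive core):
--stellar-captive-core-http-query-snapshot-ledgers 720

# equivalent option in the captive core config file:
QUERY_SNAPSHOT_LEDGERS = 720
```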

@2opremio (Contributor) commented Feb 27, 2025

Note, however, that as it is today there is no way to distinguish a missing entry from its ledger not being present in history.

@leighmcculloch (Member, Author)

> there is no way to distinguish a missing entry from its ledger not present in history.

@2opremio Is this a limitation of the core getledgerentry (stellar/stellar-core#4623) or a limitation of the RPC getLedgerEntries that wraps it? cc @SirTyson

@SirTyson commented Feb 28, 2025

I haven't tested higher numbers yet, but I expect the increase in memory consumption to be somewhat significant. core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB. If you extend the window to 7 days, we need to maintain 7 days' worth of these indexes in memory. We only store one index per "bucket" of state, and one "bucket" may be valid for all 7 days of snapshots, so memory usage doesn't scale linearly. That being said, I'd still expect a pretty significant increase for 7 days of history, but I haven't tested it yet. Other than increased memory consumption, core can handle arbitrarily large windows just fine. When you do the endpoint perf and benchmarking work, I'd recommend also trying out the 7 day window and measuring memory consumption.

@SirTyson commented Feb 28, 2025

> Note however, that, as it is today, there is no way to distinguish a missing entry from its ledger not present in history.

captive-core kinda supports this. As in, if you query a specific ledger, core will tell you if the ledger is not present in history. If the ledger is present, it then tells you if the entry exists or not. The problem is, captive-core does not support queries like "does this ledger entry exist anywhere in the ledgerSeq range [a, b]". This could be implemented in RPC via binary search of the ledger range and submitting a request for each selected ledger, but may be too slow.
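The binary-search idea above can be sketched as follows. Hedged assumptions: `entry_exists_at` is a hypothetical callback standing in for one core getledgerentry probe, and the search is only correct if existence is monotone over the window (the entry is created once within [a, b] and never deleted), which is exactly why it cannot answer the general "exists anywhere in [a, b]" question:

```python
def first_ledger_with_entry(entry_exists_at, a, b):
    """Find the smallest ledger seq in [a, b] at which the entry exists,
    assuming existence is monotone (absent..., then present...).

    entry_exists_at(seq) -> bool is a hypothetical callback; each call
    would cost one core query, so this needs O(log(b - a)) queries
    instead of the b - a + 1 a linear scan would take.
    Returns None if the entry is absent throughout [a, b].
    """
    if not entry_exists_at(b):
        return None  # under monotonicity, absent at b means absent everywhere
    lo, hi = a, b
    while lo < hi:
        mid = (lo + hi) // 2
        if entry_exists_at(mid):
            hi = mid  # entry exists at mid; earliest occurrence is at or before mid
        else:
            lo = mid + 1  # entry absent at mid; earliest occurrence is after mid
    return lo

# Example with a fake history: entry created at ledger 2,540,000.
created_at = 2_540_000
probe = lambda seq: seq >= created_at
print(first_ledger_with_entry(probe, 2_532_000, 2_552_000))  # -> 2540000
```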

@tamirms (Contributor) commented Feb 28, 2025

RPC is already maintaining the latest ledger sequence in its ingestion workflow. We can assume that captive core is always going to be ahead of, or equal to, RPC's view of the latest ledger. So, as long as the ledger parameter is less than or equal to RPC's view of the latest ledger, we can assume that if a ledger entry is not present in the core response it must not exist.
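That rule can be stated as a tiny decision function (a sketch only; the function and argument names are hypothetical):

```python
def interpret_core_response(requested_ledger, rpc_latest_ledger, entry_in_response):
    """Apply the invariant that captive core's latest ledger is always >=
    RPC's ingested latest ledger: if the requested ledger is within RPC's
    view and core returned no entry, the entry genuinely did not exist at
    that ledger, rather than the ledger being outside available history."""
    if requested_ledger > rpc_latest_ledger:
        return "unknown: ledger beyond RPC's ingested range"
    return "entry exists" if entry_in_response else "entry did not exist at that ledger"
```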

@Shaptic (Contributor) commented Feb 28, 2025

From the feature discussion (stellar/quickstart#625 (comment)), it sounds like the only thing that is important is a stable state source (easier) rather than an arbitrarily historical state source (harder).

> If we are to make the SDKs fork testing a better experience, it needs a fast data source of a stable ledger.

This changes the scope of work for this pretty significantly, since Core can guarantee consistency for a /getledgerentry call with low additional memory usage if the look-back window is small. I don't see any benefit to evaluating longer or arbitrary windows when there's no meaningful use case for that scenario.

@leighmcculloch (Member, Author) commented Feb 28, 2025

> core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB.

@SirTyson Does this need to be in memory or can this be on disk and accessed as needed?

@leighmcculloch (Member, Author)

The ledger needs to be stable/available for someone debugging, working on a test, or running a test suite. 7 days would probably be the top end, and it probably doesn't need to be that long, but 24 hours may be too short for a smooth developer experience, or for an experience anywhere comparable to Ethereum, which has free nodes serving all of history on endpoints like this. Whatever is supportable today, pushing the limit, is worth exposing via the RPC's API as is, and then later in the year we can explore changing how indexes are stored, or completely different solutions to take this further.

@SirTyson commented Feb 28, 2025

>> core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB.
>
> @SirTyson Does this need to be in memory or can this be on disk and accessed as needed?

Currently in-memory only. We also don't have plans at the moment to read these indices off disk, but could probably implement it if there was a strong use case. I'd rather not if we can avoid it, so if we really do need the 7 day window, I think it makes sense to do the memory profiling and go from there.

@leighmcculloch (Member, Author)

> does this ledger entry exist anywhere in the ledgerSeq range [a, b]

We don't need this.

> I think it makes sense to do the memory profiling and go from there.

+1

@Shaptic (Contributor) commented Feb 28, 2025

This is for quickstart, right? RPC is being run locally, then, and if testing is the use case then we can add a way to pause ingesting, instead. It's still not clear to me that we need to adopt the most complicated possible solution of maintaining N days of ledger state.

@leighmcculloch (Member, Author)

@Shaptic The design on this is early, so docs are thin, I'll share more concrete designs after I've spiked some of the experience.

But, this is for the Soroban Rust SDK and Quickstart. Both will connect to hosted RPCs.

There's a sketch of the sequence of interactions for the SDK here:

The design of how Quickstart will use this endpoint and serve a functioning node, local RPC, etc is still a WIP, but it will probably rely on the same datasource as the SDK.

@leighmcculloch (Member, Author)

> add a way to pause ingesting

That doesn't create the ideal fork testing experience, or the one seen in other ecosystems. Tests, both coded and live, should be runnable against arbitrary ledgers (within a supported range), and, in the case of live testing, will even require statefulness. And performing fork testing should not require running infrastructure.

@leighmcculloch changed the title from "Feature Request: Serve recent historical state" to "Serve recent historical state" on Mar 1, 2025
@leighmcculloch (Member, Author)

Running stellar-core locally with CATCHUP_RECENT and QUERY_SNAPSHOT_LEDGERS set to the ledger counts below:

| Ledgers | Memory | Buckets | DB file |
| --- | --- | --- | --- |
| 7 days (120,960) | 77 GB | 112 GB | 29 GB |
| 1 day (17,280) | 12 GB | 31 GB | 6 GB |
| 12 hours (8,640) | 6 GB | 22 GB | 3 GB |
| 8 hours (5,760) | 4 GB | 21 GB | 2 GB |
| 1 hour (720) | 0.7 GB | 18 GB | 0.8 GB |

@SirTyson Do these numbers seem ballpark what you'd expect?
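For reference, the ledger counts in the table above follow from Stellar's roughly 5-second average ledger close time (an approximation; actual close times vary):

```python
LEDGER_CLOSE_SECONDS = 5  # approximate average close time

def ledgers_in(hours):
    """Approximate number of ledgers closed in the given number of hours."""
    return int(hours * 3600 / LEDGER_CLOSE_SECONDS)

print(ledgers_in(1))       # 720
print(ledgers_in(24))      # 17280
print(ledgers_in(7 * 24))  # 120960
```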

@leighmcculloch (Member, Author) commented Mar 4, 2025

8-12 hours worth of ledgers appears to fit inside the existing max hardware requirements published in the docs at the link below, although with little to no buffer for RPC to be running:

@SirTyson commented Mar 4, 2025

> @SirTyson Do these numbers seem ballpark what you'd expect?

Thanks for running the test! I think this seems about right on average, but I'd expect some spikes in memory usage. For example, in your test, I doubt we ever updated the largest bucket of entries at the bottom of the BucketList. This changes very rarely, only like once or twice a year. When it does change, though, your snapshot memory will spike, since you now have to store 2 copies of the largest index. In your tests, I imagine we were only ever storing one copy of this index and only needed copies of the smaller state buckets.

Generally speaking, this feature wasn't intended for longer periods of history (as the 7 days RAM requirement shows). I'd be a little cautious about going up against any sort of memory limits just because of the variable, somewhat spiky memory consumption pattern that the BucketList lends itself to.
