
Serve recent historical state #375

Open
leighmcculloch opened this issue Feb 26, 2025 · 19 comments

Comments

@leighmcculloch (Member) commented Feb 26, 2025

Today the RPC stores, and serves via its JSON-RPC endpoint, 7 days of recent transactions (getTransactions, getTransaction), 7 days of recent events (getEvents), and 7 days of recent ledger headers (getLedgers), but only 1 recent ledger of ledger state (getLedgerEntries).

A request to and a response from the getLedgerEntries method today look like:

Request:

{
  "jsonrpc": "2.0",
  "id": 8675309,
  "method": "getLedgerEntries",
  "params": {
    "keys": [
      "AAAABgAAAAHMA/50/Q+w3Ni8UXWm/trxFBfAfl6De5kFttaMT0/ACwAAABAAAAABAAAAAgAAAA8AAAAHQ291bnRlcgAAAAASAAAAAAAAAAAg4dbAxsGAGICfBG3iT2cKGYQ6hK4sJWzZ6or1C5v6GAAAAAE="
    ]
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 8675309,
  "result": {
    "entries": [
      {
        "key": "AAAAB+qfy4GuVKKfazvyk4R9P9fpo2n9HICsr+xqvVcTF+DC",
        "xdr": "AAAABgAAAAAAAAABzAP+dP0PsNzYvFF1pv7a8RQXwH5eg3uZBbbWjE9PwAsAAAAQAAAAAQAAAAIAAAAPAAAAB0NvdW50ZXIAAAAAEgAAAAAAAAAAIOHWwMbBgBiAnwRt4k9nChmEOoSuLCVs2eqK9Qub+hgAAAABAAAAAwAAAAw=",
        "lastModifiedLedgerSeq": 2552504
      }
    ],
    "latestLedger": 2552990
  }
}

This issue requests that getLedgerEntries accept an optional field, ledger, in the request that causes the RPC to return the state of the ledger entry as it was at a historical point within some window of data that the RPC stores in addition to the data it stores today:

Request:

 {
   "jsonrpc": "2.0",
   "id": 8675309,
   "method": "getLedgerEntries",
   "params": {
+    "ledger": 2552500,
     "keys": [
       "AAAABgAAAAHMA/50/Q+w3Ni8UXWm/trxFBfAfl6De5kFttaMT0/ACwAAABAAAAABAAAAAgAAAA8AAAAHQ291bnRlcgAAAAASAAAAAAAAAAAg4dbAxsGAGICfBG3iT2cKGYQ6hK4sJWzZ6or1C5v6GAAAAAE="
     ]
   }
 }

Response:

 {
   "jsonrpc": "2.0",
   "id": 8675309,
   "result": {
+    "ledger": 2552500,
     "entries": [
       {
         "key": "AAAAB+qfy4GuVKKfazvyk4R9P9fpo2n9HICsr+xqvVcTF+DC",
         "xdr": "AAAABgAAAAAAAAABzAP+dP0PsNzYvFF1pv7a8RQXwH5eg3uZBbbWjE9PwAsAAAAQAAAAAQAAAAIAAAAPAAAAB0NvdW50ZXIAAAAAEgAAAAAAAAAAIOHWwMbBgBiAnwRt4k9nChmEOoSuLCVs2eqK9Qub+hgAAAABAAAAAwAAAAw=",
         "lastModifiedLedgerSeq": 2532198
       }
     ],
     "latestLedger": 2552990
   }
 }
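A minimal client-side sketch of the proposed parameter (the `ledger` field is the hypothetical addition from this issue, not part of the current API; when it is omitted, the request is identical to today's):

```python
import json

def build_get_ledger_entries_request(keys, ledger=None, request_id=8675309):
    """Build a JSON-RPC 2.0 getLedgerEntries request.

    `ledger` is the field *proposed* in this issue; current RPC releases
    do not accept it. When omitted, the payload matches today's API.
    """
    params = {"keys": list(keys)}
    if ledger is not None:
        params["ledger"] = ledger
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "getLedgerEntries",
        "params": params,
    }

# Placeholder key; a real key is a base64-encoded LedgerKey XDR as above.
req = build_get_ledger_entries_request(["<base64-ledger-key>"], ledger=2552500)
print(json.dumps(req, indent=2))
```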

This feature request may be better served by some other service that also implements the RPC getLedgerEntries API but serves data from a different data source. However, the feature is more useful if it is served alongside or with RPC because of the use case of developers using it during testing.

This feature would be used to support fork testing in the Soroban Rust SDK and Quickstart, being discussed here:

@tamirms (Contributor) commented Feb 27, 2025

From protocol 23 onwards rpc will not be storing any ledger entries in its sqlite DB. Instead, rpc will essentially proxy getLedgerEntries requests to a new endpoint in core which fetches ledger entries from bucketlist db (see stellar/stellar-core#4623). This endpoint in core actually does allow you to query for the state of a ledger entry synchronized to an older ledger sequence. However, I'm not sure how core will perform if we configure the window of ledger versions to be as large as 7 days.

@SirTyson do you have any insight into the performance implications of configuring QUERY_SNAPSHOT_LEDGERS to be as high as 120960 (approximately equal to 7 days of ledgers)?

@tamirms (Contributor) commented Feb 27, 2025

I reread your issue @leighmcculloch and it seems like you are not necessarily asking for the window of historical ledger entries to be the same as the window we use for transactions, events, and ledgers. How large of a window should suffice for the use cases you had in mind?

@2opremio (Contributor) commented Feb 27, 2025

The new core HTTP interface, which RPC will be using from protocol 23, already allows specifying the ledger number to query but it's not externally exposed through the RPC API. See https://github.com/stellar/go/blob/c453f8b35c758d92f30d18f6083d57b4eba08040/clients/stellarcore/client.go#L248

The number of ledgers kept for this purpose can be specified through --stellar-captive-core-http-query-snapshot-ledgers (which is forwarded to Captive Core's QUERY_SNAPSHOT_LEDGERS config option). We currently default it to 4, but @SirTyson mentioned it's quite cheap to increase that number (I doubt it will be cheap enough to increase it to one week of ledgers without impact, though).
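Concretely, the two knobs named above look like this (the value 720, roughly one hour of ledgers, is illustrative, not a recommendation):

```
# stellar-rpc startup flag (forwarded to captive core):
--stellar-captive-core-http-query-snapshot-ledgers 720

# equivalent option in the captive core config file:
QUERY_SNAPSHOT_LEDGERS = 720
```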

@2opremio (Contributor) commented Feb 27, 2025

Note, however, that as it is today there is no way to distinguish a missing entry from its ledger not being present in history.

@leighmcculloch (Member, Author)

> there is no way to distinguish a missing entry from its ledger not present in history.

@2opremio Is this a limitation of the core getledgerentry (stellar/stellar-core#4623) or a limitation of the RPC getLedgerEntries that wraps it? cc @SirTyson

@SirTyson commented Feb 28, 2025

I haven't tested higher numbers yet, but I expect the increase in memory consumption to be somewhat significant. core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB. If you extend the window to 7 days, we need to maintain 7 days' worth of these indexes in memory. We only store one index per "bucket" of state, and one "bucket" may be valid for all 7 days of snapshots, so memory usage doesn't scale linearly. That being said, I'd still expect a pretty significant increase for 7 days of history, but I haven't tested it yet. Other than increased memory consumption, core can handle arbitrarily large windows just fine. When you do the endpoint perf and benchmarking work, I'd recommend also trying out the 7 day window and measuring memory consumption.

@SirTyson commented Feb 28, 2025

> Note however, that, as it is today, there is no way to distinguish a missing entry from its ledger not present in history.

captive-core kinda supports this. As in, if you query a specific ledger, core will tell you if the ledger is not present in history. If the ledger is present, it then tells you if the entry exists or not. The problem is, captive-core does not support queries like "does this ledger entry exist anywhere in the ledgerSeq range [a, b]". This could be implemented in RPC via binary search of the ledger range and submitting a request for each selected ledger, but may be too slow.
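The binary-search idea above can be sketched as follows. Hedged assumptions: `entry_exists_at` is a hypothetical callback standing in for one core getledgerentry probe, and the search is only correct if existence is monotone over the window (the entry is created once within [a, b] and never deleted), which is exactly why it cannot answer the general "exists anywhere in [a, b]" question:

```python
def first_ledger_with_entry(entry_exists_at, a, b):
    """Find the smallest ledger seq in [a, b] at which the entry exists,
    assuming existence is monotone (absent..., then present...).

    entry_exists_at(seq) -> bool is a hypothetical callback; each call
    would cost one core query, so this needs O(log(b - a)) queries
    instead of the b - a + 1 a linear scan would take.
    Returns None if the entry is absent throughout [a, b].
    """
    if not entry_exists_at(b):
        return None  # under monotonicity, absent at b means absent everywhere
    lo, hi = a, b
    while lo < hi:
        mid = (lo + hi) // 2
        if entry_exists_at(mid):
            hi = mid  # entry exists at mid; earliest occurrence is at or before mid
        else:
            lo = mid + 1  # entry absent at mid; earliest occurrence is after mid
    return lo

# Example with a fake history: entry created at ledger 2,540,000.
created_at = 2_540_000
probe = lambda seq: seq >= created_at
print(first_ledger_with_entry(probe, 2_532_000, 2_552_000))  # -> 2540000
```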

@tamirms (Contributor) commented Feb 28, 2025

RPC is already maintaining the latest ledger sequence in its ingestion workflow. We can assume that captive core is always going to be ahead of, or equal to, RPC's view of the latest ledger. So, as long as the ledger parameter is less than or equal to RPC's view of the latest ledger, we can assume that if a ledger entry is not present in the core response it must not exist.
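That rule can be stated as a tiny decision function (a sketch only; the function and argument names are hypothetical):

```python
def interpret_core_response(requested_ledger, rpc_latest_ledger, entry_in_response):
    """Apply the invariant that captive core's latest ledger is always >=
    RPC's ingested latest ledger: if the requested ledger is within RPC's
    view and core returned no entry, the entry genuinely did not exist at
    that ledger, rather than the ledger being outside available history."""
    if requested_ledger > rpc_latest_ledger:
        return "unknown: ledger beyond RPC's ingested range"
    return "entry exists" if entry_in_response else "entry did not exist at that ledger"
```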

@Shaptic (Contributor) commented Feb 28, 2025

From the feature discussion (stellar/quickstart#625 (comment)), it sounds like the only thing that is important is a stable state source (easier) rather than an arbitrarily historical state source (harder).

> If we are to make the SDKs fork testing a better experience, it needs a fast data source of a stable ledger.

This changes the scope of work for this pretty significantly, since Core can guarantee consistency for a /getledgerentry call with low additional memory usage if the look-back window is small. I don't see any benefit to evaluating longer or arbitrary windows when there's no meaningful use case for that scenario.

@leighmcculloch (Member, Author) commented Feb 28, 2025

> core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB.

@SirTyson Does this need to be in memory or can this be on disk and accessed as needed?

@leighmcculloch (Member, Author)

The ledger needs to be stable/available for someone debugging, working on a test, or running a test suite. 7 days would probably be the top end, and it probably doesn't need to be that long, but 24 hours may be too short for a smooth developer experience, or for an experience anywhere comparable to Ethereum, which has free nodes serving all of history on endpoints like this. Whatever is supportable today, pushing the limit, is worth exposing via the RPC's API as is, and then later in the year we can explore changing how indexes are stored, or completely different solutions to take this further.

@SirTyson commented Feb 28, 2025

>> core maintains in-memory indexes for the live BucketList, which is currently like 1-1.5 GB.
>
> @SirTyson Does this need to be in memory or can this be on disk and accessed as needed?

Currently in-memory only. We also don't have plans at the moment to read these indices off disk, but could probably implement it if there was a strong use case. I'd rather not if we can avoid it, so if we really do need the 7 day window, I think it makes sense to do the memory profiling and go from there.

@leighmcculloch (Member, Author)

> does this ledger entry exist anywhere in the ledgerSeq range [a, b]

We don't need this.

> I think it makes sense to do the memory profiling and go from there.

+1

@Shaptic (Contributor) commented Feb 28, 2025

This is for quickstart, right? RPC is being run locally, then, and if testing is the use case then we can add a way to pause ingesting, instead. It's still not clear to me that we need to adopt the most complicated possible solution of maintaining N days of ledger state.

@leighmcculloch (Member, Author)

@Shaptic The design on this is early, so docs are thin, I'll share more concrete designs after I've spiked some of the experience.

But, this is for the Soroban Rust SDK and Quickstart. Both will connect to hosted RPCs.

There's a sketch of the sequence of interactions for the SDK here:

The design of how Quickstart will use this endpoint and serve a functioning node, local RPC, etc is still a WIP, but it will probably rely on the same datasource as the SDK.

@leighmcculloch (Member, Author)

> add a way to pause ingesting

That doesn't create the ideal fork testing experience, or the one seen in other ecosystems. Tests, both coded and live, should be runnable against arbitrary ledgers (within a supported range), and, in the case of live testing, will even require statefulness. And performing fork testing should not require running infrastructure.

@leighmcculloch changed the title from "Feature Request: Serve recent historical state" to "Serve recent historical state" on Mar 1, 2025
@leighmcculloch (Member, Author)

Running stellar-core locally with CATCHUP_RECENT and QUERY_SNAPSHOT_LEDGERS set to the ledger counts below:

| Ledgers | Memory | Buckets | DB file |
| --- | --- | --- | --- |
| 7 days (120,960) | 77 GB | 112 GB | 29 GB |
| 1 day (17,280) | 12 GB | 31 GB | 6 GB |
| 12 hours (8,640) | 6 GB | 22 GB | 3 GB |
| 8 hours (5,760) | 4 GB | 21 GB | 2 GB |
| 1 hour (720) | 0.7 GB | 18 GB | 0.8 GB |

@SirTyson Do these numbers seem ballpark what you'd expect?
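For reference, the ledger counts in the table above follow from Stellar's roughly 5-second average ledger close time (an approximation; actual close times vary):

```python
LEDGER_CLOSE_SECONDS = 5  # approximate average close time

def ledgers_in(hours):
    """Approximate number of ledgers closed in the given number of hours."""
    return int(hours * 3600 / LEDGER_CLOSE_SECONDS)

print(ledgers_in(1))       # 720
print(ledgers_in(24))      # 17280
print(ledgers_in(7 * 24))  # 120960
```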

@leighmcculloch (Member, Author) commented Mar 4, 2025

8-12 hours worth of ledgers appears to fit inside the existing max hardware requirements published in the docs at the link below, although with little to no buffer for RPC to be running:

@SirTyson commented Mar 4, 2025

> @SirTyson Do these numbers seem ballpark what you'd expect?

Thanks for running the test! I think this seems about right on average, but I'd expect some spikes in memory usage. For example, in your test, I doubt we ever updated the largest bucket of entries at the bottom of the BucketList. This changes very rarely, only like once or twice a year. When it does change, though, your snapshot memory will spike, since you now have to store 2 copies of the largest index. In your tests, I imagine we were only ever storing one copy of this index and only needed copies of the smaller state buckets.

Generally speaking, this feature wasn't intended for longer periods of history (as the 7 days RAM requirement shows). I'd be a little cautious about going up against any sort of memory limits just because of the variable, somewhat spiky memory consumption pattern that the BucketList lends itself to.
