Serve recent historical state #375
Comments
From protocol 23 onwards RPC will not be storing any ledger entries in its SQLite DB. Instead, RPC will essentially proxy ledger-entry queries to the new core HTTP interface.

@SirTyson do you have any insight into the performance implications of configuring a larger window?
I reread your issue @leighmcculloch and it seems you are not necessarily asking for the window of historical ledger entries to be the same as the window we use for transactions, events, and ledgers. How large a window would suffice for the use cases you had in mind?
The new core HTTP interface, which RPC will be using from protocol 23, already allows specifying the ledger number to query, but this is not externally exposed through the RPC API. See https://github.com/stellar/go/blob/c453f8b35c758d92f30d18f6083d57b4eba08040/clients/stellarcore/client.go#L248. The number of ledgers kept for this purpose can be specified through a configuration option.
Note, however, that as it is today there is no way to distinguish a missing entry from its ledger not being present in history.
@2opremio Is this a limitation of the core
I haven't tested higher numbers yet, but I expect the increase in memory consumption to be somewhat significant. Core maintains in-memory indexes for the live BucketList, which is currently around 1-1.5 GB. If you extend the window to 7 days, we need to maintain 7 days' worth of these indexes in memory. We only store one index per "bucket" of state, and one "bucket" may be valid for all 7 days of snapshots, so memory usage doesn't scale linearly. That being said, I'd still expect a pretty significant increase for 7 days of history, but I haven't tested it yet. Other than increased memory consumption, core can handle arbitrarily large windows just fine. When you do the endpoint perf and benchmarking work, I'd recommend also trying out the 7-day window and measuring memory consumption.
Captive-core kinda supports this. As in, if you query a specific ledger, core will tell you if the ledger is not present in history. If the ledger is present, it then tells you whether the entry exists or not. The problem is that captive-core does not support queries like "does this ledger entry exist anywhere in the ledgerSeq range [a, b]?". This could be implemented in RPC via a binary search of the ledger range, submitting a request for each selected ledger, but that may be too slow.
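A minimal sketch of that binary-search idea, assuming a hypothetical `exists_at(ledger)` predicate (each call standing in for one captive-core query) and assuming the entry, once created, stays present through the top of the range — without that monotonicity the search is not sound:

```python
def first_ledger_with_entry(exists_at, lo, hi):
    """Return the first ledger in [lo, hi] at which the entry exists,
    or None if it exists nowhere in the range.

    exists_at(ledger) stands in for one captive-core query
    ("does the entry exist as of this ledger?"). Assumes the entry,
    once created, remains present through `hi` (monotonicity)."""
    if not exists_at(hi):
        return None  # absent at the top of the range => absent everywhere
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_at(mid):
            hi = mid        # exists here; first occurrence is mid or earlier
        else:
            lo = mid + 1    # absent here; it first appears after mid
    return lo
```

Each predicate call would cost one round trip to core, so a range of n ledgers needs O(log n) requests — which is where the "may be too slow" concern comes from.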
RPC is already maintaining the latest ledger sequence in its ingestion workflow. We can assume that captive core is always going to be ahead of or equal to RPC's view of the latest ledger. So, as long as the ledger parameter is less than or equal to RPC's view of the latest ledger, we can assume that if a ledger entry is not present in the core response it must not exist.
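That rule can be sketched as a small decision helper; the function name and return labels here are illustrative, not part of the RPC codebase:

```python
def classify_lookup(requested_ledger, rpc_latest_ledger, core_entry):
    """Interpret a captive-core response for a ledger-entry query.

    Hypothetical helper. Relies on the invariant that core's latest
    ledger is always >= RPC's view of the latest ledger."""
    if requested_ledger > rpc_latest_ledger:
        # RPC hasn't ingested this ledger yet; core's answer proves nothing.
        return "unknown"
    if core_entry is None:
        # Core is at least as fresh as RPC, so absence is authoritative.
        return "not-found"
    return "found"
```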
From the feature discussion (stellar/quickstart#625 (comment)), it sounds like the only thing that is important is a stable state source (easier) rather than an arbitrarily historical state source (harder).
This changes the scope of work for this pretty significantly, since Core can guarantee consistency for a
@SirTyson Does this need to be in memory, or can this be on disk and accessed as needed?
The ledger needs to be stable and available for someone debugging, working on a test, or running a test suite. Seven days would probably be the top end, and it probably doesn't need to be that long, but 24 hours may be too short for a smooth developer experience, or for an experience anywhere comparable to Ethereum, which has free nodes serving all of history on endpoints like this. Whatever is supportable today, pushing the limit, is worth exposing via the RPC's API as-is, and then later in the year we can explore changing how indexes are stored, or completely different solutions to take this further.
Currently in-memory only. We also don't have plans at the moment to read these indices off disk, but could probably implement it if there was a strong use case. I'd rather not if we can avoid it, so if we really do need the 7-day window, I think it makes sense to do the memory profiling and go from there.
We don't need this.
+1
This is for quickstart, right? RPC is being run locally, then, and if testing is the use case we can add a way to pause ingesting instead. It's still not clear to me that we need to adopt the most complicated possible solution of maintaining N days of ledger state.
@Shaptic The design on this is early, so docs are thin; I'll share more concrete designs after I've spiked some of the experience. But this is for the Soroban Rust SDK and Quickstart. Both will connect to hosted RPCs. There's a sketch of the sequence of interactions for the SDK here:

The design of how Quickstart will use this endpoint and serve a functioning node, local RPC, etc. is still a WIP, but it will probably rely on the same data source as the SDK.
That doesn't create the ideal fork-testing experience, or the one seen in other ecosystems. Tests, both coded and live, should be runnable against arbitrary ledgers (within a supported range), and, in the case of live testing, will even require statefulness. And performing fork testing should not require running infrastructure.
Running
@SirTyson Do these numbers seem ballpark what you'd expect?
8-12 hours' worth of ledgers appears to fit inside the existing max hardware requirements published in the docs at the link below, although with little to no buffer for RPC to be running:
Thanks for running the test! I think this seems about right on average, but I'd expect some spikes in memory usage. For example, in your test, I doubt we ever updated the largest bucket of entries at the bottom of the BucketList. This changes very rarely, only once or twice a year. When it does change, though, your snapshot memory will spike, since you now have to store 2 copies of the largest index. In your tests, I imagine we were only ever storing one copy of this index and only needed copies of the smaller state buckets. Generally speaking, this feature wasn't intended for longer periods of history (as the 7-day RAM requirement shows). I'd be a little cautious about going up against any sort of memory limits, just because of the variable, somewhat spiky memory consumption pattern that the BucketList lends itself to.
Today the RPC stores and serves via its JSON-RPC endpoint 7 days of recent transactions (`getTransactions`, `getTransaction`), 7 days of recent events (`getEvents`), and 7 days of recent ledger headers (`getLedgers`), but only 1 recent ledger of ledger state (`getLedgerEntries`).

A request and a response to the `getLedgerEntries` method today look like:

Request:

Response:
This issue requests that the `getLedgerEntries` request accept a field, `ledger`, that causes the RPC to return the state of the ledger entry as it was at a historical point within some window of data that RPC stores in addition to the data it stores today:

Request:
```diff
{
  "jsonrpc": "2.0",
  "id": 8675309,
  "method": "getLedgerEntries",
  "params": {
+   "ledger": 2552500,
    "keys": [
      "AAAABgAAAAHMA/50/Q+w3Ni8UXWm/trxFBfAfl6De5kFttaMT0/ACwAAABAAAAABAAAAAgAAAA8AAAAHQ291bnRlcgAAAAASAAAAAAAAAAAg4dbAxsGAGICfBG3iT2cKGYQ6hK4sJWzZ6or1C5v6GAAAAAE="
    ]
  }
}
```
Response:
```diff
{
  "jsonrpc": "2.0",
  "id": 8675309,
  "result": {
+   "ledger": 2552500,
    "entries": [
      {
        "key": "AAAAB+qfy4GuVKKfazvyk4R9P9fpo2n9HICsr+xqvVcTF+DC",
        "xdr": "AAAABgAAAAAAAAABzAP+dP0PsNzYvFF1pv7a8RQXwH5eg3uZBbbWjE9PwAsAAAAQAAAAAQAAAAIAAAAPAAAAB0NvdW50ZXIAAAAAEgAAAAAAAAAAIOHWwMbBgBiAnwRt4k9nChmEOoSuLCVs2eqK9Qub+hgAAAABAAAAAwAAAAw=",
        "lastModifiedLedgerSeq": 2532198
      }
    ],
    "latestLedger": 2552990
  }
}
```
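For illustration, a minimal Python sketch that assembles the proposed request payload; the `ledger` parameter is the addition proposed in this issue, not a parameter the RPC serves today, and the helper name is hypothetical:

```python
import json

def build_get_ledger_entries_request(keys, ledger=None, request_id=8675309):
    """Assemble a getLedgerEntries JSON-RPC 2.0 payload.

    `ledger` is the field proposed in this issue (omitted when None,
    matching today's behaviour of returning only the latest state)."""
    params = {"keys": keys}
    if ledger is not None:
        params["ledger"] = ledger
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "getLedgerEntries",
        "params": params,
    })
```

The resulting string would be POSTed to an RPC endpoint as usual; omitting `ledger` yields a request identical in shape to today's API.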
This feature request may be better served by some other service that also implements the RPC `getLedgerEntries` API but serves data from a different data source. However, the feature is more useful if it is served alongside or within RPC, because of the use case of developers using it during testing.

This feature would be used to support fork testing in the Soroban Rust SDK and Quickstart, being discussed here: