
Add state archival getledgerentry endpoint #4623

Draft · wants to merge 8 commits into master
Conversation

@SirTyson (Contributor) commented Jan 15, 2025

Description

Resolves the p23 portions of #4397 by adding the getledgerentry HTTP endpoint, which returns LedgerEntries with the relevant TTL and state information.

This is also the cutoff point for early RPC integration of p23. Compared to the last core release (22.1), RPC will need to integrate the following changes. Note that all of these changes are protocol gated and are only in effect when running protocol 23.

RPC Integration Changes

Persistent entry eviction

Persistent CONTRACT_DATA entries and all CONTRACT_CODE entries are now subject to eviction, meaning an expired entry can be deleted from the live BucketList during eviction scans. evictedTemporaryLedgerKeys still contains evicted temporary CONTRACT_DATA keys and their TTLs as before, but now additionally holds evicted CONTRACT_CODE and persistent CONTRACT_DATA keys along with their TTLs. evictedPersistentLedgerEntries is still never populated. With the final p23 build we will rename evictedTemporaryLedgerKeys, but that rename is not included in this build; the XDR changes will come later.

Changes to Restoration Meta

Restoration meta depends on the "state" of the entry being restored. The two possible states are:

  • archived: Entry has an expired TTL, but still lives on the live BucketList. Entry has not yet been evicted, so the key has not been populated in the evictedTemporaryLedgerKeys vector yet.
  • evicted: Entry has been fully evicted from the live BucketList and is now stored in the Hot Archive BucketList. Key has been emitted in the evictedTemporaryLedgerKeys vector.

Note: I realize this terminology is a bit confusing because we now have two "archived" states. That said, the evicted state is only relevant to RPC and core and is completely abstracted away from developers and people invoking contracts. Given the feedback on the original expiration terminology, I'd like to avoid changing it if at all possible and stick with the archived vs. evicted terminology.

RestoreFootprintOp will produce meta as follows.

  • archived keys

    In protocol < 23, meta was as follows:

    CONTRACT_CODE/CONTRACT_DATA entry:
    no meta

    TTL entry:
    LEDGER_ENTRY_STATE(oldValue), LEDGER_ENTRY_UPDATED(newValue)

    In protocol >= 23, meta is as follows:

    CONTRACT_CODE/CONTRACT_DATA entry:
    LEDGER_ENTRY_RESTORE(value)

    TTL entry:
    LEDGER_ENTRY_STATE(oldValue), LEDGER_ENTRY_RESTORE(newValue)

  • evicted keys

    In protocol < 23, there were no evicted key restorations.

    In protocol >= 23, meta is as follows:

    CONTRACT_CODE/CONTRACT_DATA entry:
    LEDGER_ENTRY_RESTORE(value)

    TTL entry:
    LEDGER_ENTRY_RESTORE(value)
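The rules above can be encoded in a small helper, which may be handy when asserting on meta in an RPC integration test. This is a sketch against the behavior described in this PR, not core code; the function name and return shape are hypothetical:

```python
def expected_restore_meta(protocol: int, state: str):
    """Return (data_entry_meta, ttl_entry_meta) expected from RestoreFootprintOp.

    state is "archived" (expired but still on the live BucketList) or
    "evicted" (moved to the Hot Archive BucketList), per this PR's terminology.
    """
    if protocol < 23:
        if state == "evicted":
            # Pre-p23 there were no evicted-key restorations at all.
            raise ValueError("no evicted-key restorations before protocol 23")
        # Data/code entry emitted no meta; TTL entry was STATE + UPDATED.
        return ([], ["LEDGER_ENTRY_STATE", "LEDGER_ENTRY_UPDATED"])
    if state == "archived":
        return (["LEDGER_ENTRY_RESTORE"],
                ["LEDGER_ENTRY_STATE", "LEDGER_ENTRY_RESTORE"])
    # Evicted keys: both the data/code entry and the TTL entry use RESTORE.
    return (["LEDGER_ENTRY_RESTORE"], ["LEDGER_ENTRY_RESTORE"])
```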

getledgerentry captive-core endpoint for Ledger State

With protocol 23, ledger DBs are getting more complicated, as entries are stored both in the live BucketList DB and in the Hot Archive DB. To properly simulate TXs, it is necessary to know which DB an entry is in, as well as its "state" with respect to its TTL value. This logic is complicated to replicate outside of core, especially since much of the state information is intrinsic to the structure of the BucketList and may be expensive to replicate in SQL. Instead of maintaining a DB of ledger state, it is recommended that RPC use the new getledgerentry captive-core endpoint for all ledger state access. Core now ships a multithreaded HTTP server implementation that appears sufficiently fast over localhost for all state accesses; see this.

By default, this "query server" is disabled in core. It can be enabled and configured with the following captive-core flags:

# HTTP_QUERY_PORT (integer) default 0
# What port stellar-core listens for query commands on,
# such as getledgerentryraw.
# If set to 0, disable HTTP query interface entirely.
# Must not be the same as HTTP_PORT if not 0.
HTTP_QUERY_PORT=0

# QUERY_THREAD_POOL_SIZE (integer) default 4
# Number of threads available for processing query commands.
# If HTTP_QUERY_PORT == 0, this option is ignored.
QUERY_THREAD_POOL_SIZE=4

# QUERY_SNAPSHOT_LEDGERS (integer) default 0
# Number of historical ledger snapshots to maintain for
# query commands. Note: Setting this to large values may
# significantly impact performance. Additionally, these
# snapshots are a "best effort" only and not persisted on
# restart. On restart, only the current ledger will be
# available, with snapshots available as ledgers close.
QUERY_SNAPSHOT_LEDGERS=0

Once enabled, the following endpoint will be available:

getledgerentry

Used to query both live and archived LedgerEntries. While getledgerentryraw does simple key-value lookup
on the live ledger, getledgerentry will query a given key in both the live BucketList and Hot Archive BucketList.
It will also report whether an entry is archived, evicted, or live, and return the entry's current TTL value.

A POST request with the following body:

ledgerSeq=NUM&key=Base64&key=Base64...
  • ledgerSeq: An optional parameter specifying the ledger snapshot to base the query on.
    If the specified ledger is not available, a 404 error is returned. If this parameter
    is not set, the current ledger is used.
  • key: A series of Base64-encoded XDR strings specifying the LedgerKeys to query. TTL keys
    must not be queried directly; doing so returns a 400 error.

A JSON payload is returned as follows:

{
"entries": [
     {"e": "Base64-LedgerEntry", "state": "live", /*optional*/ "ttl": uint32},
     {"e": "Base64-LedgerKey", "state": "new"},
     {"e": "Base64-LedgerEntry", "state": "archived"}
],
"ledger": ledgerSeq
}
  • entries: A list of entries for each queried LedgerKey. Every key queried is guaranteed to
    have a corresponding entry returned.
  • e: Either the LedgerEntry or LedgerKey for a given key encoded as a Base64 string. If a key
    is live or archived, e contains the corresponding LedgerEntry. If a key does not exist
    (including expired temporary entries) e contains the corresponding LedgerKey.
  • state: One of the following values:
    • live: Entry is live.
    • new: Entry does not exist. Either the entry has never existed or is an expired temp entry.
    • archived: Entry is archived, counts towards disk resources.
  • ttl: An optional value, only returned for live Soroban entries. Contains
    a uint32 value for the entry's liveUntilLedgerSeq.
  • ledgerSeq: The ledger number on which the query was performed.

Classic entries will always return a state of live or new.
If a classic entry does not exist, it will have a state of new.

Similarly, temporary Soroban entries will always return a state of live or
new. If a temporary entry does not exist or has expired, it
will have a state of new.

This endpoint will always give correct information for archived entries. Even
if an entry has been archived and evicted to the Hot Archive, this endpoint will
still return the archived entry's full LedgerEntry as well as the proper state.
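As a sketch of how an RPC client might use this endpoint, the helpers below build the form-encoded request body and classify the JSON response. This is illustrative only; the helper names are hypothetical, and the key strings in the test are placeholders:

```python
import json
from urllib.parse import quote


def build_query_body(keys_b64, ledger_seq=None):
    """Build the getledgerentry POST body: ledgerSeq=NUM&key=Base64&key=Base64..."""
    parts = []
    if ledger_seq is not None:
        parts.append(f"ledgerSeq={ledger_seq}")
    # Base64 uses '=' padding and '+'/'/' characters, so percent-encode each key.
    parts.extend(f"key={quote(k)}" for k in keys_b64)
    return "&".join(parts)


def classify_entries(response_text):
    """Return (state, ttl) per entry; 'ttl' is only present for live Soroban entries."""
    payload = json.loads(response_text)
    return [(e["state"], e.get("ttl")) for e in payload["entries"]]
```

POSTing the body produced by `build_query_body` to `http://127.0.0.1:<HTTP_QUERY_PORT>/getledgerentry` on the captive-core host would then return the JSON payload shown above.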

RPC Testing

To test the new changes, RPC will want tests that restore entries in both the archived and the evicted state. What I've been doing in my tests is populating state with a bunch of persistent entries, letting them expire, but setting my eviction scan parameters such that only 1 or 2 entries are evicted at a time. You can track an entry's starting TTL and whether or not you have seen eviction meta for it to determine its state within the test. These flags are helpful.

OVERRIDE_EVICTION_PARAMS_FOR_TESTING=true

# Scan 1 million bytes per ledger. This lets you evict aggressively, but if it slows down the test too
# much it can be reduced
TESTING_EVICTION_SCAN_SIZE=1000000

# Entries are eligible for eviction earlier than normal
TESTING_STARTING_EVICTION_SCAN_LEVEL=2

# A maximum of 2 entries will be evicted per ledger. This helps the test maintain a good
# mix of entries in both the evicted and archived state
TESTING_MAX_ENTRIES_TO_ARCHIVE=2

# Entries are eligible for eviction sooner
TESTING_MINIMUM_PERSISTENT_ENTRY_LIFETIME=16
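The state-tracking idea described above can be sketched as a tiny classifier. The function name is hypothetical, and it assumes an entry is live through its liveUntilLedgerSeq (expired once the current ledger exceeds it):

```python
def entry_state(current_ledger: int, live_until_ledger: int,
                seen_eviction_meta: bool) -> str:
    """Classify a persistent entry's archival state inside a test harness.

    live_until_ledger: the entry's liveUntilLedgerSeq as of its last bump.
    seen_eviction_meta: True once the key has appeared in eviction meta.
    """
    if seen_eviction_meta:
        # Key was emitted in eviction meta: it now lives in the Hot Archive.
        return "evicted"
    if current_ledger > live_until_ledger:
        # Expired TTL but not yet scanned out of the live BucketList.
        return "archived"
    return "live"
```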

Make sure you also have the QUERY_SERVER flags set so you can use the core endpoint. I think this has been thorough enough, but if I missed anything, CAP 62 and CAP 66 have the full specs.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson force-pushed the getledgerentry-endpoint branch from 45ab670 to e45fdf4 Compare February 18, 2025 20:42
@leighmcculloch (Member)

This branch crashes intermittently. Config and captured logs below.

Config
NETWORK_PASSPHRASE="Test SDF Network ; September 2015"
CATCHUP_RECENT=0
DEPRECATED_SQL_LEDGER_STATE=false

HTTP_QUERY_PORT=11726
QUERY_SNAPSHOT_LEDGERS=10

COMMANDS=[
  "ll?level=ERROR",
  "ll?level=INFO&partition=Ledger",
]

EXPERIMENTAL_BACKGROUND_OVERLAY_PROCESSING=true
TARGET_PEER_CONNECTIONS=3
MAX_ADDITIONAL_PEER_CONNECTIONS=2

UNSAFE_QUORUM=true
FAILURE_SAFETY=0

[[VALIDATORS]]
NAME="SDF 1"
PUBLIC_KEY="GCGB2S2KGYARPVIA37HYZXVRM2YZUEXA6S33ZU5BUDC6THSB62LZSTYH"
ADDRESS="core-live-a.stellar.org:11625"
HISTORY="curl -sf http://history.stellar.org/prd/core-live/core_live_001/{0} -o {1}"
HOME_DOMAIN="testnet.stellar.org"
QUALITY="MEDIUM"

[[VALIDATORS]]
NAME="sdf_testnet_2"
PUBLIC_KEY="GCUCJTIYXSOXKBSNFGNFWW5MUQ54HKRPGJUTQFJ5RQXZXNOLNXYDHRAP"
ADDRESS="core-testnet2.stellar.org"
HISTORY="curl -sf http://history.stellar.org/prd/core-testnet/core_testnet_002/{0} -o {1}"
HOME_DOMAIN="testnet.stellar.org"
QUALITY="MEDIUM"

[[VALIDATORS]]
NAME="sdf_testnet_3"
PUBLIC_KEY="GC2V2EFSXN6SQTWVYA5EPJPBWWIMSD2XQNKUOHGEKB535AQE2I6IXV2Z"
ADDRESS="core-testnet3.stellar.org"
HISTORY="curl -sf http://history.stellar.org/prd/core-testnet/core_testnet_003/{0} -o {1}"
HOME_DOMAIN="testnet.stellar.org"
QUALITY="MEDIUM"
Logs
2025-02-28T15:01:54.074 GDXXV [Ledger INFO] Got consensus: [seq=1372108, prev=4d7edc, txs=7, ops=12, sv: [ SIGNED@sdf_testnet_2 txH: 4a4f60, ct: 1740718912, upgrades: [ ] ]]
2025-02-28T15:01:54.074 GDXXV [Ledger INFO] Close of ledger 1372108 buffered. mSyncingLedgers has 13 ledgers
mBucketHAS->getBucketListHash() == mVerifiedLedgerRangeStart.header.bucketListHash at catchup/CatchupWork.cpp:264
Cannot provide readable stack trace. Use ParseDump.py to translate stack.
0   stellar-core                        0x0000000100c0c38c _ZN7stellar21printCurrentBacktraceEv + 96
1   stellar-core                        0x0000000100c14930 _ZN7stellar26printAssertFailureAndAbortEPKcS1_i + 64
2   stellar-core                        0x000000010095edc4 _ZN7stellar11CatchupWork17assertBucketStateEv + 764
3   stellar-core                        0x0000000100960084 _ZN7stellar11CatchupWork14runCatchupStepEv + 1516
4   stellar-core                        0x0000000100960ac8 _ZN7stellar11CatchupWork6doWorkEv + 44
5   stellar-core                        0x0000000100c32ef4 _ZN7stellar4Work5onRunEv + 332
6   stellar-core                        0x0000000100c30a38 _ZN7stellar9BasicWork9crankWorkEv + 324
7   stellar-core                        0x0000000100c32e0c _ZN7stellar4Work5onRunEv + 100
8   stellar-core                        0x0000000100c30a38 _ZN7stellar9BasicWork9crankWorkEv + 324
9   stellar-core                        0x0000000100c33f60 _ZNSt3__110__function6__funcIZN7stellar13WorkScheduler11scheduleOneENS_8weak_ptrIS3_EEE3$_0NS_9allocatorIS6_EEFvvEEclEv + 60
10  stellar-core                        0x0000000100abd808 _ZNSt3__110__function6__funcIZN7stellar15ApplicationImpl16postOnMainThreadEONS_8functionIFvvEEEONS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEENS2_9Scheduler10ActionTypeEE3$_0NSB_ISH_EES5_EclEv + 152
11  stellar-core                        0x0000000100c20f40 _ZN7stellar9Scheduler11ActionQueue7runNextERNS_12VirtualClockENSt3__16chrono8durationIxNS4_5ratioILl1ELl1000000000EEEEE + 176
12  stellar-core                        0x0000000100c20d68 _ZN7stellar9Scheduler6runOneEv + 400
13  stellar-core                        0x0000000100c249e0 _ZN7stellarL9crankStepERNS_12VirtualClockENSt3__18functionIFmvEEEm + 132
14  stellar-core                        0x0000000100c24638 _ZN7stellar12VirtualClock5crankEb + 300
15  stellar-core                        0x0000000100abdf34 _ZN7stellar6runAppENSt3__110shared_ptrINS_11ApplicationEEE + 188
16  stellar-core                        0x0000000100b0b408 _ZNSt3__110__function6__funcIZN7stellar3runERKNS2_15CommandLineArgsEE3$_0NS_9allocatorIS6_EEFivEEclEv + 996
17  stellar-core                        0x0000000100ae11dc _ZN7stellar12_GLOBAL__N_111runWithHelpERKNS_15CommandLineArgsENSt3__16vectorINS0_20ParserWithValidationENS4_9allocatorIS6_EEEENS4_8functionIFivEEE + 908
18  stellar-core                        0x0000000100aef998 _ZN7stellar3runERKNS_15CommandLineArgsE + 1340
19  stellar-core                        0x0000000100af5960 _ZN7stellar17handleCommandLineEiPKPc + 9304
20  stellar-core                        0x0000000100b52fb8 main + 336
21  dyld                                0x00000001938e4274 start + 2840
[1]    2196 abort      stellar-core run --conf testnet.cfg

@leighmcculloch (Member) commented Feb 28, 2025

Pretty exciting 👏🏻 . I am interested in the interface. Comments about the request body and response from above:


Request Body:

ledgerSeq=NUM&key=Base64&key=Base64...

Response:

{
"entries": [
     {"e": "Base64-LedgerEntry", "state": "live", /*optional*/ "ttl": uint32},
     {"e": "Base64-LedgerKey", "state": "new"},
     {"e": "Base64-LedgerEntry", "state": "archived"}
],
"ledger": ledgerSeq
}

1️⃣ – order
Are entries in entries returned in the same order as the keys in the request? I'm aware that the /getledgerentriesraw endpoint doesn't, so just checking if that's the case here, as it would be ideal from the perspective of general batch API design.

2️⃣ – ledgerSeq
The request ledgerSeq and the response value ledger have different names but contain the same information. It's helpful for developers if the API has some symmetry and uses the same vocabulary in both the request and response, so that readers have fewer concepts to hold in their head.

3️⃣ – e
The response returns a list of entries with a field named e that looks like it would always contain an entry. Any reason we wouldn't name that field entry? If the goal is optimisation, why e, but then state and ttl instead of s and t?

4️⃣ – new
In the state=new case, if the records are returned in the same order, there shouldn't be a need to return the key, since the caller passed it as an input.

The state=new state could be given a more intuitive name. On first read I thought new meant it was "somehow new". Suggest using not-found or does-not-exist or none? Or is there semantics that make new more accurate than not-found?

5️⃣ –
If the ledgerSeq is not available, it says the error is a 404. Does the endpoint return 404 in any other case? Asking because the case when a ledgerSeq isn't available should be uniquely identifiable from any other error, but HTTP 404 errors on endpoints have a habit of capturing more than a single error case. So if 404 can occur for other reasons, we should introduce a stable and parseable error code / error name in the response. (Related to: stellar/stellar-rpc#375 (comment).)


If implemented the above comments would result in:

Request Body:

ledger=NUM&key=Base64&key=Base64...

Response:

{
"entries": [
     {"state": "live", "entry": "Base64-LedgerEntry", /*optional*/ "ttl": uint32},
     {"state": "not-found"},
     {"state": "archived", "entry": "Base64-LedgerEntry"}
],
"ledger": ledger
}

@leighmcculloch (Member)

Thinking about this further, this API is quite similar to the JSON-RPC API that RPC is exposing.

I think we should consider, either in this release or next, changing the shape of this API, and changing the shape of the RPC JSON-RPC API to align (which we can do in a backwards compatible way), so that core's Query endpoints are actually the same API as – and a subset and superset of – the RPC JSON-RPC API.

Why:

The APIs are almost the same already and communicate the same data.

In the Ethereum ecosystem the RPC API is a standardised thing that multiple implementations and services implement. It allows tooling to move between using different tools and services seamlessly and is generally a very cool experience as a developer.

Not all of the RPC API needs implementing.

Core is, after all, the lowest-level RPC.

cc @tomerweller This is something we discussed recently.

cc @mollykarcher

@tomerweller (Contributor)

Totally agree with @leighmcculloch. The prior art in the Ethereum ecosystem demonstrates that having a uniform API across the stack (but with different implementations and retention policies) makes it easy for new services to emerge and for developers to choose the right layer of the stack for them. This could also make stellar-core more independently useful.

@SirTyson (Contributor, Author) commented Mar 6, 2025

Thanks for the feedback. I don't design many HTTP web APIs, so I have no objections for these changes. RPC has integrated the current interface, so I'll leave it as is for now so I can get the early integration test working. @Shaptic any thoughts on these API changes?

@leighmcculloch (Member)

Are entries in entries returned in the same order as the keys in the request?

@SirTyson ☝🏻

@Shaptic commented Mar 6, 2025

the contents of e would always be an entry [...]
Are entries returned in the same order as the keys

Just fyi, we don't assume ordering in RPC (the existing SQL-based impl. also has no order guarantees), so we need the key returned on new for matching. If they're returned in the same order that's fine, but it doesn't matter to us.

semantics that make new more accurate

The implication is that this key represents a pending new ledger key in the state. I guess not-found is semantically the same, but since the nomenclature is usually around a user creating a new key that's what we ("we" as in Garand as the state archival author and me as the reader lol) are accustomed to.

any thoughts on these API changes?

It's mildly annoying to re-coordinate the API work that's already been done for an API that only RPC uses (I disagree with the implication that people should use Core directly), but otherwise the changes seem fine. (As long as there's no implication to make the response match the actual JSON-RPC spec, which I don't think there is, but I'm just double-checking.)

@Shaptic commented Mar 6, 2025

Core is after-all the lowest level RPC.

Just to expand on my disagreement with this:

  • It can't do simulation.
  • None of the endpoints besides this one are designed to scale.
  • Its endpoints are not designed to be public-facing.

If the end-goal is to resolve those issues, that's awesome and it can eventually be a first-class "RPC", but there's a long way to go so I certainly don't think it's at a point where we should encourage people to use it directly today. (This isn't a knock against Core btw: it was intentionally designed this way and that's been perfectly fine so far 💪)

@SirTyson (Contributor, Author) commented Mar 6, 2025

Are entries in entries returned in the same order as the keys in the request?

No, currently there is no defined ordering.

@SirTyson (Contributor, Author) commented Mar 6, 2025

The state=new state could be given a more intuitive name. On first read I thought new meant it was "somehow new". Suggest using not-found or does-not-exist or none? Or is there semantics that make new more accurate than not-found?

In p23, new is the same as not-found, but this won't be the case in the future when we introduce archival proofs. In particular, most of the time, creating a new entry will not require an archival proof, but sometimes, a proof will be required. To help facilitate this, the "new" type will be extended to something like new-no-proof and new-proof.
