Skip to content

engine/blobs: move partial return support to mandatory engine_getBlobsV3 #674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

raulk
Copy link
Member

@raulk raulk commented Jul 8, 2025

Context

The current specification is too lax: it does not explicitly require EL clients to support partial blob returns. Consequently, most EL clients (if not all) are omitting this support in Osaka. This gap will require additional coordination work in the future when the CL actually wants to use this feature. To prevent this, we propose biting the bullet now so that no future effort needs to be invested.

In addition, the current approach does not allow the CL (or another party) to determine which return mode the EL supports. This is suboptimal, given that the Engine API already supports capability exchange to signal feature support.

Proposed changes

After posting concerns on Discord and discussing separately with @MariusVanDerWijden, we propose:

  • Making partial blob return support mandatory via the ⁠engine_getBlobsV3 method.

  • Restoring the all-or-nothing return behavior for ⁠engine_getBlobsV2.

This approach seems to hit the optimal tradeoff:

  • Mandatorily adds functional support for partial blob returns.
  • Avoids the serde penalty while the CL consumes V2.
  • Enables the CL to adopt V3 when optimizations are implemented, without further ACD coordination.
  • Leverages built-in capability exchange mechanisms.

Implementation

This should be straightforward to implement. For reference, it's already done in Geth: ethereum/go-ethereum#32170.

Misc

Also fixed a bug in the OpenRPC spec that did not account for literal null returns from engine_getBlobsV2.

- Introduce mandatory `engine_getBlobsV3` method to support partial
  responses when fetching blobs from the EL blob pool.
- Add associated BlobAndProofV3 alias for BlobAndProofV2.
- Restore `engine_getBlobsV2` to all-or-nothing.

This mandates partial blob return support in Osaka, but avoids
extra serde in V2 while unutilized by the CL. The CL can switch to
V3 when p2p optimizations ship, without having to coordinate with
EL devs.
@raulk raulk changed the title engine/blobs: move partial return support to mandatory engine_getBlobsV3 engine/blobs: move partial return support to mandatory engine_getBlobsV3 Jul 8, 2025
2. Given an array of blob versioned hashes, if client software has every one of the requested blobs, it **MUST** return an array of _`BlobAndProofV3`_ objects whose order exactly matches the input array. For instance, if the request is `[A_versioned_hash, B_versioned_hash, C_versioned_hash]` and client software has `A`, `B` and `C` available, the response **MUST** be `[A, B, C]`.
3. If one or more of the requested blobs are unavailable, _the client **MUST** return an array of the same length and order, inserting `null` only at the positions of the missing blobs._ For instance, if the request is `[A_versioned_hash, B_versioned_hash, C_versioned_hash]` and client software has data for blobs `A` and `C`, but doesn't have data for `B`, _the response **MUST** be `[A, null, C]`. If all blobs are missing, the client software must return an array of matching length, filled with `null` at all positions._
4. Client software **MUST** support request sizes of at least 128 blob versioned hashes. The client **MUST** return `-38004: Too large request` error if the number of requested blobs is too large.
5. Client software **MUST** return `null` if syncing or otherwise unable to generally serve blob pool data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows an EL client to still return null as a response. From the CL's perspective V3 is identical to main's V2.

Copy link
Member Author

@raulk raulk Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V3 guarantees that partial responses will be served (unless no responses can be served at all, and this is not due to an internal error, which is what the null literal case covers here). Main's V2 does not make any such behavioural guarantee.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V3 changes the EL behavior relative to V2, no argument there.

From the CL's perspective V3 is identical to main's V2.

This is my point. The CL has to handle receiving null or a list of partial responses, just as it does with V2. This is not a point against the PR, just a note.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a deliberate choice. V1 would return a null filled array, but it’s clearer semantics to return a single null literal that translates to an Option or pointer type in Rust and Go, given this outcome affects the whole request anyway. IMO it’s even cleaner to return an error, but I assumed there was a reason V1 didn’t from the get-go.

Copy link
Contributor

@mkalinin mkalinin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looks good to me 👍
Let’s get some approvals from EL and CL devs and then merge!

@kasey
Copy link

kasey commented Jul 14, 2025

In addition, the current approach does not allow the CL (or another party) to determine which return mode the EL supports.

I can think of 2 advantages to the changes this implies to v2:

  1. Save resources if the CL isn't able to do anything meaningful with partial responses.
  • As stated in discord, EL clients can already decide to save this overhead by just returning a nil value. So this seems like we aren't gaining anything.
  1. Mitigate risk of a bug in a dormant and insufficiently tested CL code path that gets triggered when an EL starts sending partial responses after an upgrade.
  • The extra version mitigates this risk and the need for extra testing at the fusaka fork.

TL;DR the choice is between extra testing at fusaka or a small bit of tech debt. If we are confident the partial response code in CL clients is safe, we wind up with simpler specs and code. If we want to avoid risk, the extra engine api version just adds a bit of boilerplate and complicates the getblobs code path.

@raulk
Copy link
Member Author

raulk commented Jul 14, 2025

@kasey thanks for chiming in. The risk argument is compelling -- I hadn't considered that advantage. Yes, in the case something goes wrong with partial responses (V3), CL clients can roll back to V2 (and the associated unoptimized CL code) without requiring any further action in the EL (e.g. changing a config flag). That's a benefit of stricter and more deterministic/explicit behaviour definitions for each method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants