Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deployment of Headers with proof spec changes #363

Open
kdeme opened this issue Jan 29, 2025 · 6 comments
Open

Deployment of Headers with proof spec changes #363

kdeme opened this issue Jan 29, 2025 · 6 comments

Comments

@kdeme
Copy link
Collaborator

kdeme commented Jan 29, 2025

Deployment of Headers with proof spec changes

The (breaking) spec changes

None removal in BlockHeaderWithProof

Merged PR: #341

This PR added a new header type, the epehemeral headers. In this PR the None has been removed for the BlockHeaderProof, as all headers of the original content type MUST now be provided with a proof.

The Result of the None removal is that the Union selector shifts with one and headers with proof encoding becomes different for all headers. It also invalidates any header of this type that comes without a proof. This is a breaking change and thus should be deployed coordinated.

SSZ Union removal in BlockHeaderWithProof

Open PR: #362

This PR completely removes the Union and adds a SSZ serialization layer for the proof.
This is a breaking change and thus should be deployed coordinated.
If this proposal is accepted, then it would be best to do this change on the mainnet now so that we don't need to deploy two breaking changes in a row for this.

Addition of ephemeral headers

Merged PR: #341

In the same PR of the None removal, the new header type was added.
Implementing and deploying this can be done independent among clients, it does not need to be coordinated. So I do not consider this a breaking change. One thing to be wary of however is that clients don't ban/blacklist peers because of "invalid" content key in case the client did not add this new content key type yet.

Testing

Same testing with Hive can be applied as was done for ping extensions (#360), however it requires updated test vectors considering the actual encoded content that is being tested changes.

Deployment

The issue for deployment of this change is twofold:

  • Coordinated deployment of client implementations with the changed content type encoding.
  • Possible database migration plans rolled out at the same time of this deployment.

I add the second bullet there because the content type encoding change might cause for clients also to have invalid encoded data in their databases. Nodes might need to either delete/reset or migrate their databases. Coordinating a plan for this would be necessary if we don't want the current network to show a lot of failures on content searches for a while.

So a possible roll-out plan could be:

  • Decide on spec (open PR item)
  • Clients implementation
  • Update hive testing & run it on the implementations
  • Clients work on migration plan (this can be different per client) + test this
  • Decide date for coordinated deployment

Actions for clients

It would be great if client teams could:

  • Read this issue and provide feedback regarding the roughly stated plan. Raise any concerns regarding it that might be missing.
  • Have a look at SSZ Union Removal PR and give feedback.
  • Indicate their client specific plan and/or issues regarding migration of this encoded data (A plan can be as simple as: "we delete all databases").
  • Have a rough estimate by when this could be implemented.
@kdeme
Copy link
Collaborator Author

kdeme commented Jan 29, 2025

Preliminary info for Nimbus/fluffy:

  • We have the version without SSZ Union implemented (Remove SSZ Union usage in BlockHeaderWithProof type status-im/nimbus-eth1#3019) and tested with updated test vectors
  • Looking into adding the code to migrate database:
    • Because the fluffy implementation does not store the content key (for history network content), this means iterating over all content in the database and trying to decode as the current Union with the None, if that fails, ignore, if that passes adjust the data if there is a proof and delete the data if there is no proof.
    • We will need to test this on the bigger databases to see how long this takes.
    • Not sure yet if we add this inside the client or outside the client code.
    • The other option is to delete the databases but because Fluffy takes up a rather big chunk of nodes in the current network, this would not be ideal. Especially because that content database is shared with the state network also.
  • Can probably add and test this within 2 weeks.

@acolytec3
Copy link
Contributor

I have an initial implementation in flight for Ultralight here

We'll likely just delete all databases and start from scratch.

@njgheorghita
Copy link
Contributor

For trin, we've got a tracking issue here that outlines our rollout strategy. Our plan is "loosely" to write a script that converts all existing headers (that have a proof) to the new type in our prod dbs, and delete any headers that don't have a proof. I'm not good at estimates, but I'd estimate that we hope be ready to deploy by the end of next week / before sync on Monday the 17th.

@acolytec3
Copy link
Contributor

I think our code is ready to go. And, as noted, we'll just blow away DBs and start fresh.

@ScottyPoi has implemented era1 file support in our era package recently so we could leverage something like that repopulate our DBs.

@morph-dev
Copy link
Contributor

Also to add that trin is in favor of #362 .

@fearlessfe
Copy link

For shisui, we add a issue to track the todos, we can finish these changes in 2 weeks.

We will write a tool for repopulating our DBs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants