This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Forks on Testnet Causing Roots to Stall #30669

@bw-solana

Description


Problem

We are occasionally seeing long lived (~2 minutes) forks on testnet that cause root creation to stall.

One such example started on 3/7 at 23:06:56 UTC.

The last common ancestor is 184353487. It looks like 184353488 was late getting to some nodes, likely because it required a couple dozen repaired shreds on average.

This caused 184353492 to be built off 184353487, starting what ended up being the minority fork. We'll call this Fork B, and the fork containing 184353488 Fork A.
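To make the fork structure concrete, here is a minimal Rust sketch that finds the last common ancestor of two slots by walking parent links. The parent map used with it is an assumption for illustration: it treats 184353488, 184353489, and 184353490 as a linear chain on Fork A and 184353492 as a direct child of 184353487 on Fork B. The function names are hypothetical and not taken from the validator code.

```rust
use std::collections::{HashMap, HashSet};

/// Walk parent links from `slot` back toward the root, returning the chain
/// ordered from `slot` itself down to the oldest known ancestor.
fn ancestors(parents: &HashMap<u64, u64>, slot: u64) -> Vec<u64> {
    let mut chain = vec![slot];
    let mut current = slot;
    // Stop once we reach a slot whose parent is not in the (assumed) map.
    while let Some(&parent) = parents.get(&current) {
        chain.push(parent);
        current = parent;
    }
    chain
}

/// Return the most recent slot shared by the ancestry of `a` and `b`, if any.
fn last_common_ancestor(parents: &HashMap<u64, u64>, a: u64, b: u64) -> Option<u64> {
    let a_chain: HashSet<u64> = ancestors(parents, a).into_iter().collect();
    // `ancestors(b)` runs from `b` back toward the root, so the first slot
    // that also appears in `a`'s ancestry is the deepest shared slot.
    ancestors(parents, b).into_iter().find(|s| a_chain.contains(s))
}
```

With that assumed parent map, the last common ancestor of 184353490 (Fork A) and 184353492 (Fork B) comes out as 184353487, matching the description above.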

We see 1673 nodes vote for 184353488 around 23:06:56 UTC.
We see 581 nodes vote for 184353492 around the same time.

However, it seems the votes for 184353488 were all sent to a leader building off Fork B, which caused the turbine (id=1) vote transactions to be rejected with blockhash_not_found, presumably because the blockhash those votes referenced exists only on Fork A.
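The rejection can be pictured with a toy model, assuming each vote transaction carries a recent blockhash that the leader looks up in its own fork's view of recent blockhashes. The `ForkView` type, `check_vote_tx` function, and `hash_*` labels below are hypothetical simplifications, not the validator's real types; they only show why a blockhash produced on Fork A is unknown to a leader building on Fork B.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of the recent blockhashes a leader knows.
/// This set depends on which fork the leader is building on.
struct ForkView {
    recent_blockhashes: HashSet<String>,
}

#[derive(Debug, PartialEq)]
enum VoteTxError {
    BlockhashNotFound,
}

/// Accept a vote transaction only if its blockhash is known on this fork;
/// otherwise reject it, mirroring the blockhash_not_found outcome.
fn check_vote_tx(fork: &ForkView, vote_blockhash: &str) -> Result<(), VoteTxError> {
    if fork.recent_blockhashes.contains(vote_blockhash) {
        Ok(())
    } else {
        Err(VoteTxError::BlockhashNotFound)
    }
}
```

A leader whose view contains only Fork B blockhashes (say `hash_487` and `hash_492`) accepts a vote referencing `hash_492` but rejects one referencing `hash_488`, even though the vote itself is otherwise valid.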

Because those votes never landed, Fork B looked like the majority fork early on. Nodes that voted on 184353492 (e.g., 7mtKMUgM24GPTiR2krRimUiQgXRRmMPmmPkQBzMZak8a) kept voting. Nodes that voted on 184353488 (e.g., 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on) stopped voting and were presumably waiting to switch over to the other fork (switching requires observing votes from 38% of stake on the other fork).
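A simplified sketch of the switching rule as described above, assuming a validator may switch only when strictly more than 38% of total stake is observed voting on other forks. The constant name echoes the 38% figure; `can_switch` is a hypothetical helper, and the real switch check (which builds a switch proof from observed lockouts) is more involved.

```rust
/// Fraction of total stake that must be observed on other forks before a
/// validator may switch its vote (the 38% figure above).
const SWITCH_FORK_THRESHOLD: f64 = 0.38;

/// Simplified switch check. `other_fork_stake` is stake observed voting on
/// forks other than the one this validator last voted on.
/// Assumption: the comparison is strictly greater than the threshold.
fn can_switch(other_fork_stake: u64, total_stake: u64) -> bool {
    total_stake > 0 && (other_fork_stake as f64 / total_stake as f64) > SWITCH_FORK_THRESHOLD
}
```

As a rough proxy that ignores stake weighting, treating each of the 2,254 nodes counted above (1673 + 581) as equal stake puts Fork B at about 26%, below the threshold: `can_switch(581, 2254)` is false, while `can_switch(1673, 2254)` is true.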

It seems Fork B never reached the switching threshold. Eventually (around 23:08:29 UTC), the validators that originally voted on Fork A refreshed their votes for slots 184353488, 184353489, and 184353490. This tipped the scales, allowing Fork B voters to switch, consensus to be achieved, and new roots to be confirmed.
The refresh occurred at this point because the blockhash used by the original votes had expired, i.e., MAX_PROCESSING_AGE slots had been created on Fork A.
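The expiry that triggered the refresh can be sketched as an age check. This assumes a blockhash remains usable while its age, counted in blocks on the fork, is at most `MAX_PROCESSING_AGE` (150 in the Solana SDK), and models the refresh as "resend a vote that has not landed once its blockhash has expired". Both helpers are hypothetical simplifications of the behavior described above, not the validator's actual refresh logic.

```rust
/// Maximum blockhash age (in blocks) at which a transaction is still processed.
const MAX_PROCESSING_AGE: u64 = 150;

/// Hypothetical helper: indices are positions in a fork's sequence of
/// blockhashes (one per block on that fork). A hash is assumed valid while
/// its age is at most MAX_PROCESSING_AGE.
fn vote_blockhash_expired(vote_blockhash_index: u64, current_index: u64) -> bool {
    current_index.saturating_sub(vote_blockhash_index) > MAX_PROCESSING_AGE
}

/// Simplified refresh rule: resend a vote that has not landed once its
/// blockhash can no longer be processed.
fn should_refresh_vote(vote_landed: bool, vote_blockhash_index: u64, current_index: u64) -> bool {
    !vote_landed && vote_blockhash_expired(vote_blockhash_index, current_index)
}
```

Under this model, a vote that never landed is only resent once more than 150 blocks have been produced on the fork after its blockhash, which is why the stranded Fork A votes sat idle for so long.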

The big question: why didn't any leader on Fork A ingest the original votes via gossip? Why did it take refreshing these votes?

Proposed Solution

Debug and fix this so that the cluster can come to consensus sooner.
