Forks on Testnet Causing Roots to Stall #30669

#### Problem
We are occasionally seeing long-lived (~2 minutes) forks on testnet that cause root creation to stall.

One such example started on 3/7 at `23:06:56 UTC`:
![image](https://user-images.githubusercontent.com/44715351/224348024-83bca505-29e0-4219-a14c-56e911edb672.png)

The last common ancestor is `184353487`. It looks like `184353488` was late getting to some nodes, likely because it required a couple dozen repaired shreds on average.
![image](https://user-images.githubusercontent.com/44715351/224349023-a0145128-206e-4dcb-8814-30b8bf091c05.png)

This caused `184353492` to be built off `184353487`, starting what ended up being the minority fork. We'll call this Fork B, and the fork containing `184353488` Fork A.
![image](https://user-images.githubusercontent.com/44715351/224348666-7aa354e5-a4da-439a-9230-c32902913b8c.png)

We see 1673 nodes vote for `184353488` around `23:06:56 UTC`.
![image](https://user-images.githubusercontent.com/44715351/224350183-a7430132-5032-46a2-9621-9ca25af23f43.png)
We see 581 nodes vote for `184353492` around the same time.
![image](https://user-images.githubusercontent.com/44715351/224350623-65297e47-515b-486a-9d73-5f52fabce4a3.png)

But it seems that the votes for `184353488` were all sent to a leader building off Fork B, which caused the turbine (`id=1`) vote transactions to be rejected with `blockhash_not_found`.
![image](https://user-images.githubusercontent.com/44715351/224354294-571a240e-1f06-47f1-b127-fadb60ae1b06.png)
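To make the rejection concrete, here is a minimal sketch of the check a leader would apply. It assumes that a vote transaction carries the blockhash of the slot being voted on and that a leader only accepts blockhashes from ancestors of its own tip; neither is stated in this issue. The `blockhash` helper and the function names are hypothetical, and the hashes are placeholders.

```python
from typing import Dict, Optional, Set

# Toy parent map taken from the slots named in this issue.
# None marks the start of the toy chain, not a real root.
PARENTS: Dict[int, Optional[int]] = {
    184353487: None,       # last common ancestor
    184353488: 184353487,  # Fork A
    184353492: 184353487,  # Fork B
}


def blockhash(slot: int) -> str:
    """Placeholder hash; real blockhashes are not derived from the slot number."""
    return f"hash-{slot}"


def recent_blockhashes(tip: int, parents: Dict[int, Optional[int]]) -> Set[str]:
    """Collect the blockhashes of the tip and all of its ancestors."""
    hashes: Set[str] = set()
    slot: Optional[int] = tip
    while slot is not None:
        hashes.add(blockhash(slot))
        slot = parents[slot]
    return hashes


def check_vote_tx(vote_slot: int, leader_tip: int,
                  parents: Dict[int, Optional[int]]) -> str:
    """Assumed rule: the vote's blockhash must be known on the leader's fork."""
    if blockhash(vote_slot) in recent_blockhashes(leader_tip, parents):
        return "accepted"
    return "blockhash_not_found"


# A vote for 184353488 sent to a leader whose tip is 184353492 (Fork B).
result_on_fork_b = check_vote_tx(184353488, 184353492, PARENTS)
# The same vote sent to a leader building on Fork A.
result_on_fork_a = check_vote_tx(184353488, 184353488, PARENTS)
```

Under these assumptions the same vote is rejected on Fork B and accepted on Fork A, which matches the `blockhash_not_found` errors in the screenshot above.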

This caused Fork B to look like the majority fork early on. Nodes that voted on `184353492` (e.g. `7mtKMUgM24GPTiR2krRimUiQgXRRmMPmmPkQBzMZak8a`) kept voting. Nodes that voted on `184353488` (e.g. `5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on`) stopped voting, presumably waiting to switch over to the other fork (switching requires observing 38% of votes on that fork).
![image](https://user-images.githubusercontent.com/44715351/224352489-67d67b25-f7e5-4570-b263-54108cfb8ba4.png)
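The deadlock can be illustrated with the vote counts quoted earlier. This is a rough sketch only: it uses node counts as a stand-in for whatever weight the real threshold is measured in, it treats the two queried groups as the whole voting population, and the strict `>` comparison is an assumption. `can_switch` is a hypothetical helper, not validator code.

```python
SWITCH_THRESHOLD = 0.38  # "38% votes observed", as stated in this issue


def can_switch(observed_on_other_fork: int, total: int,
               threshold: float = SWITCH_THRESHOLD) -> bool:
    """Return True if enough votes were observed on the other fork to switch."""
    if total <= 0:
        raise ValueError("total must be positive")
    return observed_on_other_fork / total > threshold


FORK_A_VOTERS = 1673  # nodes that voted for 184353488
FORK_B_VOTERS = 581   # nodes that voted for 184353492
TOTAL = FORK_A_VOTERS + FORK_B_VOTERS

# A Fork A voter observes only the Fork B votes: roughly 26%, below 38%.
fork_a_voter_can_switch = can_switch(FORK_B_VOTERS, TOTAL)

# A Fork B voter observes no Fork A votes while they are being rejected.
fork_b_voter_can_switch_early = can_switch(0, TOTAL)

# Once the Fork A votes become visible, Fork B voters clear the threshold.
fork_b_voter_can_switch_late = can_switch(FORK_A_VOTERS, TOTAL)
```

With these proxy numbers neither side can switch until the Fork A votes become visible to Fork B voters, which is consistent with the stall described here.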

It seems Fork B never reached the switching threshold. Eventually (around `23:08:29 UTC`), the validators that originally voted on Fork A refreshed their votes for slots `184353488`, `184353489`, and `184353490`. This tipped the scales: Fork B voters were able to switch, consensus was reached, and new roots were confirmed.
![image](https://user-images.githubusercontent.com/44715351/224355488-cff69bfe-125b-4cd0-820b-2a478dbb43b0.png)
The refresh occurred at this point because the blockhash for the original votes had expired, i.e. `MAX_PROCESSING_AGE` slots had been created on Fork A.
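As a rough sketch of the timing, the snippet below computes how long the fork persisted before the refresh, using the two timestamps in this issue, and models the expiry condition. The value `150` for `MAX_PROCESSING_AGE` and the `>=` boundary are assumptions, not taken from this issue, and `needs_refresh` is a hypothetical helper.

```python
from datetime import datetime

MAX_PROCESSING_AGE = 150  # assumed value; check the validator source


def needs_refresh(blocks_since_vote: int,
                  max_age: int = MAX_PROCESSING_AGE) -> bool:
    """Assumed rule: a vote's blockhash expires once max_age newer blocks exist on its fork."""
    return blocks_since_vote >= max_age


# Timestamps quoted in this issue (same day, UTC).
fork_start = datetime.strptime("23:06:56", "%H:%M:%S")
refresh_time = datetime.strptime("23:08:29", "%H:%M:%S")

# Seconds between the fork starting and the refreshed votes appearing.
stall_seconds = int((refresh_time - fork_start).total_seconds())
```

The gap between the two timestamps is 93 seconds, so for that long nothing other than blockhash expiry caused the Fork A votes to be resubmitted.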

The big question is why no leader on Fork A ingested the original votes via gossip. Why did it take refreshing these votes?

#### Proposed Solution
Debug and fix this so that the cluster reaches consensus sooner.
