Skip to content

Attestation verification access of fork_choice lock is racey #7839

@michaelsproul

Description

@michaelsproul

We have an error that seems to be newly occurring in v7.1.0 but has been around for a while.

Aug 02 11:46:20.133 ERROR Unable to validate attestation beacon_block_root: 0xfa3cd459ceecf9a1068e7dbf72aac795ea6a2e4337b7ca33058329a4fe163ab3, slot: Slot(23028367), attestation_type: "unaggregated", peer_id: 16Uiu2HAm6ZGnRpyWUByouFjEv2aXZDiujQPZaogXUSY17Fq89tX4, error: MissingBeaconBlock(0xfa3cd459ceecf9a1068e7dbf72aac795ea6a2e4337b7ca33058329a4fe163ab3)

There are 14 occurrences across our infra in the last 2 weeks.

For this Gnosis block:

  • att slot: 23028367
  • block slot: 23028324
  • gap: 43 slots (almost 3 epochs on Gnosis)

Hypothesised ordering of events that triggers the error:

  1. Attestation verification checks that head block exists (verify_head_block_is_known). It does.
  2. We finalize a new block, and prune fork choice, deleting the stale block from fork choice.
  3. Attestation verification tries to load the block from fork choice, and errors because it was pruned in (2).

This only happens rarely because the time between (1) and (3) should be short, only a few ms. However, if the fork choice pruning routine runs in between it will take the fork choice lock and complete pruning before (3) can happen. Commonly the ordering of events would be 1, 3, 2. Or, this case would not be relevant because there aren't many attestations to blocks that are about to be pruned (these are low quality attestations).

Correct outcome:

  • Reject the attestation because it conflicts with the new finalization.
  • LH should NOT log an error, as this is completely expected.

How attestation verification works currently:

  1. We read fork choice once (for verify_head_block).
  2. We read it again (for verify_attestation_is_finalized_checkpoint_or_descendant).
  3. We read it again (for map_attestation_committees).
  4. We read it again (for with_committee_cache).

Proposed fix:

  • Should be different errors for:
    • MissingBeaconBlockFromDisk (database corruption)
    • MissingBeaconBlockFromForkChoice (probably fine)
  • Should only log an error for MissingBeaconBlockFromDisk.

Alternative solution:

  • Only read fork choice once when verifying the block. And keep the fork choice block in memory
    for future steps. <---
  • Downside: we would accept the attestation, and then it would error when we tried to apply it to
    fork choice.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingconsensusAn issue/PR that touches consensus code, such as state_processing or block verification.fork-choice

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions