Add validated epoch catch-up sync path (follow-up to #5750) #5950

@Scottcjn


Context

PR #5750 (merged) closed a finality-forgery vulnerability: generic STATE sync was importing finalized epoch data from any peer's snapshot, with only a sender-signature check. A valid peer signature authenticates who sent the snapshot, not that the epoch was actually committed by quorum. #5750 removed that path — epochs now enter epoch_crdt only via the validated EPOCH_COMMIT handler, which requires the receiver to have locally observed accept votes from a quorum.

The gap this creates

A node that was offline during epoch voting + commit has no entries in its local _epoch_votes for that epoch. When it comes back online and a peer sends EPOCH_COMMIT, the new _handle_epoch_commit validation rejects it with unverified_voters — the recovering node never saw the votes, so it can't confirm the quorum.

Result: offline nodes can no longer catch up epoch finality. This is an acceptable short-term tradeoff (closing a forgery hole beats easy catch-up), but it needs a proper fix.
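The rejection described above can be illustrated with a minimal sketch. This is not the actual `_handle_epoch_commit` code: `check_commit`, its parameters, and the `no_quorum` / `accepted` return values are hypothetical names chosen for illustration. Only the `unverified_voters` rejection reason comes from this issue.

```python
from typing import Dict, Iterable, Set


def check_commit(
    local_votes: Dict[int, Set[str]],   # epoch -> voter ids this node saw accept
    epoch: int,
    claimed_voters: Iterable[str],      # voters listed in the incoming commit
    quorum: int,
) -> str:
    """Hypothetical stand-in for the receiver-side validation."""
    seen = local_votes.get(epoch, set())
    claimed = set(claimed_voters)
    # Any claimed voter the receiver never observed locally cannot be confirmed.
    if claimed - seen:
        return "unverified_voters"
    if len(claimed) < quorum:
        return "no_quorum"
    return "accepted"


# A node that was online during voting has the votes recorded locally.
online_votes = {7: {"n1", "n2", "n3"}}
# A node that was offline has no entry for epoch 7 at all.
recovering_votes: Dict[int, Set[str]] = {}

print(check_commit(online_votes, 7, ["n1", "n2", "n3"], quorum=3))
print(check_commit(recovering_votes, 7, ["n1", "n2", "n3"], quorum=3))
```

Under these assumptions the same commit is accepted by the online node and rejected by the recovering node, purely because of what each one observed locally.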

What's needed

A validated catch-up sync path — something that lets a recovering node re-establish finality for missed epochs without trusting an unverified snapshot. Options to evaluate:

  1. Vote-record sync — let a recovering node request the raw signed accept-votes for an epoch from peers, verify each signature itself, then apply the same quorum check _handle_epoch_commit uses. Finality is then locally derived, not imported.
  2. Commit-certificate — have the committing node assemble a quorum certificate (the set of signed accept-votes) and attach it to EPOCH_COMMIT. A recovering node verifies the certificate's signatures + quorum independently.
  3. Checkpoint sync — periodic signed checkpoints with their own multi-sig quorum, for nodes that are very far behind.

Option 2 (commit-certificate) is likely the cleanest — it makes EPOCH_COMMIT self-verifying instead of depending on the receiver's local vote history.
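A rough sketch of what self-verifying could look like for Option 2, under stated assumptions. `SignedVote`, `vote_message`, `verify_certificate`, and the message layout are all hypothetical, not the project's existing types. Signature verification is passed in as a callable, and the demo uses HMAC from the standard library purely as a stand-in; a real implementation would use the node's actual public-key signature scheme.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Iterable, Set

# (voter_id, message, signature) -> bool; supplied by the caller.
Verifier = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class SignedVote:
    voter_id: str
    epoch: int
    digest: str        # hash of the epoch data being accepted (assumed field)
    signature: bytes


def vote_message(epoch: int, digest: str) -> bytes:
    """Bytes a voter signs; this layout is an assumption for the sketch."""
    return f"accept|{epoch}|{digest}".encode()


def verify_certificate(
    epoch: int,
    digest: str,
    votes: Iterable[SignedVote],
    known_voters: Set[str],
    quorum: int,
    verify: Verifier,
) -> bool:
    """Return True only if a quorum of distinct known voters signed this epoch."""
    valid: Set[str] = set()
    for vote in votes:
        if vote.epoch != epoch or vote.digest != digest:
            continue  # vote is for a different epoch or different data
        if vote.voter_id not in known_voters:
            continue  # unknown signer does not count toward quorum
        if verify(vote.voter_id, vote_message(epoch, digest), vote.signature):
            valid.add(vote.voter_id)  # set membership de-duplicates repeat votes
    return len(valid) >= quorum


# Demo-only stand-in for real signatures (HMAC is symmetric, NOT a signature).
DEMO_KEYS = {"n1": b"k1", "n2": b"k2", "n3": b"k3"}


def demo_sign(voter_id: str, message: bytes) -> bytes:
    return hmac.new(DEMO_KEYS[voter_id], message, hashlib.sha256).digest()


def demo_verify(voter_id: str, message: bytes, signature: bytes) -> bool:
    if voter_id not in DEMO_KEYS:
        return False
    return hmac.compare_digest(demo_sign(voter_id, message), signature)


msg = vote_message(5, "abc")
cert = [SignedVote(v, 5, "abc", demo_sign(v, msg)) for v in ("n1", "n2")]
print(verify_certificate(5, "abc", cert, set(DEMO_KEYS), 2, demo_verify))
```

The same check would serve Option 1 as well: the only difference is whether the signed votes arrive attached to EPOCH_COMMIT or are requested separately from peers. In both cases finality is derived from votes the receiver verified itself, never from a peer's finalized flag.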

Acceptance

  • A node offline during epoch N's voting can rejoin and establish finality for epoch N from peer data, and no code path accepts a finalized flag that the node has not verified itself.
  • Regression test: the test_signed_state_sync_cannot_inject_epoch_finality invariant from #5750 ("fix: ignore epoch finality from state sync") still holds.

Filed as a follow-up to keep #5750's security fix shippable now while the catch-up path is designed properly.
