eRPC: add consensus failsafe (maxParticipants=3, agreementThreshold=2) for state-read methods #358

@bussyjd

Problem

The obol-stack eRPC config (internal/embed/infrastructure/values/erpc.yaml.gotmpl) routes each RPC request to a single upstream chosen by eRPC's selectionPolicy + score, with hedge for latency fallback. For four EVM networks (mainnet, hoodi, base, base-sepolia) we currently rely on one upstream's answer per request.

For our read paths that underpin payment/registration correctness (agent-registration document fetches, ERC-8004 registry reads, USDC balance checks, eth_call of payment requirements), a single malicious or desynced upstream can return a wrong answer that no layer above detects. Consensus validation across multiple upstreams catches this, at the price of fanning each matched read out to up to three upstreams instead of one.

Proposed config

Add a consensus entry to the failsafe list for high-trust read methods on each EVM network:

failsafe:
  - matchMethod: "eth_call|eth_getLogs|eth_getTransactionReceipt|eth_getTransactionByHash|eth_getBlockByNumber|eth_getBlockByHash|eth_chainId"
    consensus:
      maxParticipants: 3        # fan out to 3 upstreams in parallel
      agreementThreshold: 2     # 2 of 3 must match → return majority answer
      punishMisbehavior:
        disputeThreshold: 3
        disputeWindow: 10s
        sitOutPenalty: 5m
  - matchMethod: "*"            # everything else (writes, latency-sensitive reads) keeps the non-consensus path
    timeout:
      duration: 30s
    retry:
      maxAttempts: 2
      delay: 100ms
    hedge:
      delay: 500ms
      maxCount: 1

Apply to all four EVM network blocks (lines ~80-145 of the gotmpl). Keep the existing selectionPolicy intact for eth_sendRawTransaction routing; consensus only activates for read methods.
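
As a rough illustration of how the punishMisbehavior fields in the config above are expected to interact, the Python sketch below models a sliding-window dispute counter. It reflects one reading of the field names (disputeThreshold disputes within disputeWindow trigger a sitOutPenalty) and is not eRPC's actual implementation; the DisputeTracker class and its methods are hypothetical.

```python
from collections import deque


class DisputeTracker:
    """Hypothetical per-upstream tracker; times are plain seconds for simplicity."""

    def __init__(self, dispute_threshold=3, dispute_window=10.0, sit_out_penalty=300.0):
        self.dispute_threshold = dispute_threshold  # disputeThreshold: 3
        self.dispute_window = dispute_window        # disputeWindow: 10s
        self.sit_out_penalty = sit_out_penalty      # sitOutPenalty: 5m
        self._disputes = deque()                    # timestamps of recent disputes
        self._benched_until = 0.0

    def record_dispute(self, now: float) -> None:
        """Record that this upstream disagreed with the consensus answer at `now`."""
        self._disputes.append(now)
        # Drop disputes that have fallen out of the sliding window.
        while self._disputes and now - self._disputes[0] > self.dispute_window:
            self._disputes.popleft()
        if len(self._disputes) >= self.dispute_threshold:
            self._benched_until = now + self.sit_out_penalty
            self._disputes.clear()

    def is_sitting_out(self, now: float) -> bool:
        """True while the upstream is excluded from consensus rounds."""
        return now < self._benched_until


tracker = DisputeTracker()
for t in (0.0, 4.0, 8.0):  # three disputes inside a 10 s window
    tracker.record_dispute(t)
print(tracker.is_sitting_out(9.0))    # True
print(tracker.is_sitting_out(308.0))  # False: the 5 min penalty has expired
```

Under this reading, disputes spaced more than the window apart (for example at 0 s, 6 s and 12 s) never accumulate to the threshold, so only upstreams that disagree in bursts are benched.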

Why 3/2, not 2/2

  • 2/2 means every paid request fails as soon as one upstream is flaky → negates the resilience we already have.
  • 3/2 tolerates one upstream failure/disagreement per request, returns the majority answer, and the punishMisbehavior block auto-quarantines consistently-misbehaving upstreams for 5 min.
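
To make the 3/2 versus 2/2 trade-off concrete, here is a minimal Python sketch of threshold voting over upstream responses. It is illustrative only and does not mirror eRPC's internals; the resolve function and the sample hex values are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve(responses: Sequence[Optional[str]], agreement_threshold: int) -> Optional[str]:
    """Return the most common successful response if it reaches the threshold.

    None stands for an upstream that failed or timed out. The function returns
    None when no answer collects enough votes.
    """
    votes = Counter(r for r in responses if r is not None)
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    return answer if count >= agreement_threshold else None


good, bad = "0x01", "0xff"

# 3 participants, threshold 2: one wrong or failed upstream is tolerated.
print(resolve([good, good, bad], 2))   # 0x01
print(resolve([good, None, good], 2))  # 0x01

# 2 participants, threshold 2: a single failure or disagreement sinks the request.
print(resolve([good, None], 2))        # None
print(resolve([good, bad], 2))         # None
```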

Upstream prerequisite

Each affected chain must have ≥ 3 upstreams configured in the upstreams: array for consensus to have anyone to vote with. Current state:

  • chainId: 1 (mainnet) — verify count; add more public RPCs via obol network add if needed.
  • chainId: 560048 (hoodi) — likely only 1 today.
  • chainId: 8453 (base) — verify.
  • chainId: 84532 (base-sepolia) — 1 (base-sepolia-publicnode) + whatever is added by obol network add.

When count < 3, eRPC degrades gracefully — it queries however many exist — but the resilience goal isn't met. So this issue should include bumping the default ChainList seed count or guaranteeing a minimum.
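
One way to guarantee the minimum is a pre-flight check over the configured upstreams. The Python sketch below assumes the upstream list has already been parsed into dictionaries with a chainId key; that shape, the chains_below_minimum helper, and every upstream id other than base-sepolia-publicnode are hypothetical and only for illustration.

```python
from collections import Counter

MIN_UPSTREAMS = 3  # matches maxParticipants in the proposed consensus block

# Chain IDs named in this issue.
CHAINS = {1: "mainnet", 560048: "hoodi", 8453: "base", 84532: "base-sepolia"}


def chains_below_minimum(upstreams, minimum=MIN_UPSTREAMS):
    """Return {chain_id: count} for every tracked chain with fewer than `minimum` upstreams."""
    counts = Counter(u["chainId"] for u in upstreams)
    return {cid: counts[cid] for cid in CHAINS if counts[cid] < minimum}


# Made-up inventory, not the real obol-stack state.
upstreams = [
    {"id": "base-sepolia-publicnode", "chainId": 84532},
    {"id": "mainnet-a", "chainId": 1},
    {"id": "mainnet-b", "chainId": 1},
    {"id": "mainnet-c", "chainId": 1},
    {"id": "hoodi-a", "chainId": 560048},
]

for cid, n in chains_below_minimum(upstreams).items():
    print(f"{CHAINS[cid]} ({cid}): {n} upstream(s), need {MIN_UPSTREAMS}")
```

A check like this could run in the template unit test or at install time, so a chain that silently drops below three upstreams is flagged before consensus is relied on.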

Explicit non-goals

  • Do not apply consensus to eth_sendRawTransaction — routing stays single-upstream (already handled by selectionPolicy).
  • Do not apply to eth_blockNumber/eth_syncing/latency-critical head checks; leave those to the matchMethod: "*" fallthrough.
  • Do not use agreementThreshold: 3 (of 3) — one slow upstream fails every call.

Optional: nonce handling

For eth_getTransactionCount, lagging replicas routinely disagree. Instead of strict consensus, use:

- matchMethod: "eth_getTransactionCount"
  consensus:
    maxParticipants: 3
    agreementThreshold: 1
    preferHighestValueFor:
      eth_getTransactionCount:
        - result

This returns the highest observed nonce, preventing stale-nonce transaction failures.
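
The intended effect can be sketched in a few lines of Python. This is an illustration of "take the highest value" over hex-encoded results, not eRPC's code; highest_nonce is a hypothetical helper and the sample values are made up.

```python
from typing import Optional, Sequence


def highest_nonce(results: Sequence[Optional[str]]) -> Optional[str]:
    """Pick the largest hex-encoded nonce among upstream results, skipping failures (None)."""
    valid = [r for r in results if r is not None]
    if not valid:
        return None
    # Compare numerically: as strings, "0x9" would sort above "0x10".
    return max(valid, key=lambda r: int(r, 16))


# Two lagging replicas report nonce 9; one up-to-date upstream reports 16.
print(highest_nonce(["0x9", "0x10", "0x9"]))  # 0x10
```

Note that under strict 2-of-3 majority the same inputs would resolve to the stale "0x9", which is exactly the failure mode this override avoids.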

Validation plan

  • Unit: internal/network/erpc_test.go — template render with the new failsafe block.
  • Integration: seed 3 base-sepolia RPCs (publicnode + alchemy public + drpc public). Probe eth_call against the registry contract; flip one upstream to return wrong data (mock); confirm request still returns majority answer.
  • Observability: eRPC emits metrics on consensus participation — expose them in Grafana.
