
Regression v2.0.0: concurrent archive RPC causes severe tail latency (P95 10–20s) #23678

@alex-kulam


Describe the bug

Hello, Team!
My Ethereum Mainnet archive nodes running v2.0.0/v2.1.0 show periodic CPU spikes and latency spikes of up to 50-60 s under sustained concurrent load from several RPC methods.

Meanwhile, nodes running v1.11.3 on identical hardware (50 cores / 128 GiB memory) easily handle much more load, roughly 5x more.

I suspected this regression might be caused by #22631, but raising db.max-readers to 512 had no visible effect.

Would appreciate any feedback/recommendations on this!

Steps to reproduce

  1. Run an archive node on v2.1.0 with the following arguments:
    - reth
    - node
    - --datadir=/opt/reth
    - --chain=mainnet
    - --http
    - --http.addr=0.0.0.0
    - --http.api=net,eth,web3,txpool,debug,trace
    - --http.port=8545
    - --http.corsdomain=*
    - --ws
    - --ws.addr=0.0.0.0
    - --ws.api=net,eth,web3,txpool,debug,trace
    - --ws.port=8546
    - --ws.origins=*
    - --rpc.gascap=550000000
    - --rpc.max-request-size=15
    - --rpc.max-response-size=160
    - --rpc.max-subscriptions-per-connection=2048
    - --rpc.max-connections=10000
    - --rpc.max-blocks-per-filter=100000
    - --rpc.max-trace-filter-blocks=1000
    - --rpc.max-logs-per-response=20000
    - --rpc.max-tracing-requests=1000
    - --rpc-cache.max-blocks=0
    - --rpc-cache.max-concurrent-db-requests=512
    - --authrpc.jwtsecret=/etc/jwt/jwt.hex
    - --authrpc.addr=0.0.0.0
    - --authrpc.port=8551
    - --metrics=0.0.0.0:9001
    - --txpool.pricebump=10
    - --max-outbound-peers=100
    - --max-inbound-peers=100
    - --rpc.eth-proof-window=10000
    - --db.max-size=8TB
  2. Start concurrent RPC load at the same time:
  • debug_traceTransaction – 40-60 RPS
  • eth_getStorageAt – 10 RPS
  • eth_getBalance – 30 RPS
  • trace_block – 5 RPS
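To make the reproduction easier to rerun, here is a minimal open-loop load generator sketch (Python standard library only). It fires each method at the fixed rates listed above (using the upper end, 60 RPS, for debug_traceTransaction) and reports a nearest-rank P95 latency per method. Assumptions: the node's HTTP RPC is reachable at whatever URL you pass in (e.g. http://localhost:8545), and the request parameters in PARAMS, including the zero hash/address, the `latest` block tag and the `callTracer` choice, are hypothetical placeholders. Replace them with real Mainnet transaction hashes, addresses and historical block numbers, otherwise the node answers with fast errors and the latencies will not reflect archive reads.

```python
import json
import math
import threading
import time
import urllib.request

# Target request rates (RPS) from the reproduction steps;
# debug_traceTransaction uses the upper end of the reported 40-60 RPS range.
RATES = {
    "debug_traceTransaction": 60,
    "eth_getStorageAt": 10,
    "eth_getBalance": 30,
    "trace_block": 5,
}

# Hypothetical placeholder params: substitute real tx hashes, addresses and blocks.
PARAMS = {
    "debug_traceTransaction": ["0x" + "00" * 32, {"tracer": "callTracer"}],
    "eth_getStorageAt": ["0x" + "00" * 20, "0x0", "latest"],
    "eth_getBalance": ["0x" + "00" * 20, "latest"],
    "trace_block": ["latest"],
}


def build_schedule(rates, duration_s):
    """Return (send_time_s, method) pairs, evenly spaced per method, sorted by time."""
    schedule = []
    for method, rps in rates.items():
        n = int(rps * duration_s)
        schedule.extend((i / rps, method) for i in range(n))
    return sorted(schedule)


def percentile(values, p):
    """Nearest-rank percentile (integer p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def call(url, method, params):
    """Send one JSON-RPC request over HTTP; return its wall-clock latency in seconds."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": method, "params": params})
    req = urllib.request.Request(
        url, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=120) as resp:
        resp.read()
    return time.monotonic() - start


def run_load(url, rates=RATES, duration_s=60):
    """Send requests on schedule, one thread per request, and return P95 per method."""
    latencies = {m: [] for m in rates}
    lock = threading.Lock()

    def worker(method):
        try:
            elapsed = call(url, method, PARAMS[method])
        except Exception:
            # HTTP errors and timeouts are dropped in this sketch (this biases P95 down);
            # JSON-RPC error responses arrive as HTTP 200 and are still timed.
            return
        with lock:
            latencies[method].append(elapsed)

    threads = []
    t0 = time.monotonic()
    for send_at, method in build_schedule(rates, duration_s):
        delay = t0 + send_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        th = threading.Thread(target=worker, args=(method,))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()
    return {m: percentile(v, 95) for m, v in latencies.items() if v}
```

For example, run_load("http://localhost:8545", duration_s=300) returns a dict mapping each method to its P95 latency in seconds. The open-loop design (a new thread per request, sent on schedule regardless of earlier replies) keeps the offered load constant while the node stalls, which is what exposes tail latency; a closed-loop client would slow down together with the node and hide it. With very long stalls this can create thousands of concurrent threads, so keep the duration modest.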

Even at this fairly modest load (85-105 RPS in total), I observe CPU usage of 4000-5000% (roughly 40-50 cores) with heavy throttling, and latency like this:
[Image: latency graph]
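To put numbers on the CPU usage and throttling, a sketch that samples the container's cgroup v2 cpu.stat file twice and reports the average number of cores used and the fraction of CFS scheduler periods that were throttled. Assumptions: the host uses cgroup v2 and the container's own cgroup is visible at /sys/fs/cgroup/cpu.stat (typical for Docker with a private cgroup namespace); the path and the 10 s interval are illustrative choices. As in docker stats, 100% corresponds to one core, so 4000-5000% means 40-50 cores busy.

```python
import time

CPU_STAT = "/sys/fs/cgroup/cpu.stat"  # assumed cgroup v2 path inside the container


def parse_cpu_stat(text):
    """Parse cgroup v2 cpu.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def summarize(before, after, wall_seconds):
    """Average CPU use and throttling between two cpu.stat snapshots."""
    cores = (after["usage_usec"] - before["usage_usec"]) / (wall_seconds * 1_000_000)
    # nr_periods / nr_throttled are only reported when the cpu controller is
    # enabled for this cgroup; default to 0 if they are missing.
    periods = after.get("nr_periods", 0) - before.get("nr_periods", 0)
    throttled = after.get("nr_throttled", 0) - before.get("nr_throttled", 0)
    return {
        "cores": cores,
        "cpu_percent": cores * 100,  # 100% == one core, as in docker stats
        "throttled_fraction": throttled / periods if periods else 0.0,
    }


def sample(path=CPU_STAT, interval_s=10.0):
    """Take two snapshots roughly interval_s apart and summarize them."""
    def read():
        with open(path) as f:
            return parse_cpu_stat(f.read())

    before, t0 = read(), time.monotonic()
    time.sleep(interval_s)
    after, t1 = read(), time.monotonic()
    return summarize(before, after, t1 - t0)
```

Running sample() inside the reth container while the load is applied gives comparable figures for v1.11.3 and v2.x on the same hardware; a throttled_fraction close to 1 means the container is hitting its CPU limit in nearly every scheduler period.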

Node logs


Platform(s)

Linux (x86)

Container Type

Docker

What version/commit are you on?

Reth Version: 2.1.0
Commit SHA: d58c6e3

What database version are you on?

Current database version: 2
Local database version: 2

Which chain / network are you on?

Mainnet

What type of node are you running?

Archive (default)

What prune config do you use, if any?

no prune

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

  • I agree to follow the Code of Conduct
