This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Execution Payloads Benchmarks (expb) is a Python CLI tool for benchmarking Ethereum execution clients using real-world historical payloads. It orchestrates Docker containers to run execution clients with different configurations and uses Grafana K6 for load testing, measuring performance metrics like payload processing time and gas throughput.
```bash
# Install dependencies using uv
uv sync

# Activate virtual environment
source .venv/bin/activate

# Install the tool in development mode
uv pip install -e .
```

```bash
# Run all tests
pytest

# Run specific test file
pytest tests/api/test_something.py

# Run with verbose output
pytest -v
```

The application uses Typer for the CLI interface. The main entry point is in src/expb/__init__.py, which aggregates sub-commands from:
- `generate_payloads` - Extracts historical payloads from an Ethereum RPC endpoint
- `execute_scenario` - Runs a single benchmark scenario
- `execute_scenarios` - Runs multiple benchmark scenarios (optionally in a loop)
- `compress_payloads` - Compresses multiple smaller payloads into larger blocks
- `send_payloads` - Directly sends payloads to an execution engine endpoint
The tool supports multiple Ethereum execution clients, each with their own configuration in src/expb/clients/:
- Nethermind
- Besu
- Geth
- Reth
- Erigon
- Ethrex
- NimbusEL
Each client implementation extends ClientConfig and defines:
- Default Docker image
- Client-specific command-line flags
- Network-specific configurations
- Prometheus metrics endpoint path
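A minimal sketch of that shape, assuming illustrative attribute and method names (the real `ClientConfig` base class in src/expb/clients/ may differ):

```python
from abc import ABC, abstractmethod


class ClientConfig(ABC):
    """Base class for execution-client configurations (illustrative)."""

    #: Default Docker image for this client
    default_image: str
    #: Path Alloy scrapes for Prometheus metrics
    metrics_path: str

    @abstractmethod
    def command_flags(self, network: str) -> list:
        """Client-specific command-line flags for the given network."""


class GethConfig(ClientConfig):
    default_image = "ethereum/client-go:latest"
    metrics_path = "/debug/metrics/prometheus"

    def command_flags(self, network: str) -> list:
        flags = ["--http", "--authrpc.addr=0.0.0.0", "--metrics"]
        if network != "mainnet":
            flags.append(f"--{network}")  # e.g. --sepolia, --holesky
        return flags
```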
Three snapshot backends are available in src/expb/payloads/executor/services/snapshots/:
- overlay - Uses Docker overlay filesystem (fastest, Linux-only, default)
- zfs - Uses ZFS snapshots (requires ZFS filesystem)
- copy - Simple directory copy (slowest, most compatible)
The snapshot system allows each benchmark run to start from a clean, consistent blockchain state.
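The simplest backend, `copy`, amounts to duplicating the pristine data directory before each run. A hedged sketch of that idea (function name and layout are assumptions, not the project's API):

```python
import shutil
from pathlib import Path


def prepare_copy_snapshot(snapshot_dir: Path, work_dir: Path) -> Path:
    """Illustrative 'copy' backend: duplicate the pristine snapshot into a
    fresh run directory so the client starts from a clean state."""
    run_dir = work_dir / "data"
    if run_dir.exists():
        shutil.rmtree(run_dir)  # discard state left over from a previous run
    shutil.copytree(snapshot_dir, run_dir)
    return run_dir
```

The overlay and ZFS backends achieve the same clean-state guarantee without the full copy, which is why they are faster.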
The executor (src/expb/payloads/executor/executor.py) orchestrates the entire benchmark lifecycle:
- Setup Phase: Creates snapshots, prepares JWT secrets, generates K6 scripts
- Execution Phase: Starts execution client container, runs K6 load tests, collects metrics
- Cleanup Phase: Stops containers, captures logs, processes per-payload metrics
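The key property of this lifecycle is that cleanup must run even when the load test fails mid-way. A minimal sketch of that control flow (the callables are hypothetical stand-ins for the executor's real methods):

```python
def run_benchmark(setup, execute, cleanup):
    """Illustrative lifecycle: setup, then execute, with cleanup guaranteed."""
    setup()                # snapshots, JWT secret, K6 script generation
    try:
        return execute()   # start client container, run K6, collect metrics
    finally:
        cleanup()          # stop containers, capture logs, process metrics
```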
The executor uses Jinja2 templates in src/expb/payloads/executor/services/templates/ to generate:
- K6 JavaScript test scripts (k6-script.js.j2)
- Grafana Alloy configuration for metrics collection (config.alloy.j2)
Configuration is managed through Pydantic models in src/expb/configs/:
- scenarios.py - Scenario definitions (client, payloads, resources, etc.)
- exports.py - Metrics export configuration (Prometheus, Pyroscope)
- networks.py - Network-specific settings (genesis hash, chain ID)
- snapshots.py - Snapshot backend configuration
Configuration files are YAML-based. See example-expb.yaml for a fully documented example.
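To illustrate the shape of these models, here is a rough sketch using plain dataclasses (the project uses Pydantic; field names beyond those documented in this file are assumptions):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resources:
    """Default resource limits, mirroring the settings described in this doc."""
    cpu: float = 4.0
    mem: str = "32g"
    download_speed: Optional[str] = None
    upload_speed: Optional[str] = None


@dataclass
class Scenario:
    """A named benchmark scenario (subset of the documented fields)."""
    client: str
    payloads: str
    network: str = "mainnet"
    snapshot_backend: str = "overlay"
    resources: Resources = field(default_factory=Resources)
```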
The tool extensively uses the Docker Python SDK to:
- Manage execution client containers with resource limits (CPU, memory, bandwidth)
- Create isolated networks for each benchmark
- Mount volumes (data directories, JWT secrets, extra volumes)
- Execute additional commands inside running containers
- Stream and capture container logs
Network bandwidth limiting is implemented in src/expb/payloads/utils/networking.py using Linux tc (traffic control).
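The exact qdisc setup expb installs is in that module; as a rough illustration, an egress rate limit via tc-netem can be expressed as a single command (run with root privileges in the container's network namespace):

```python
import shlex


def tc_rate_limit_cmd(interface: str, rate: str) -> str:
    """Illustrative tc-netem invocation for egress rate limiting; the exact
    qdisc configuration expb uses may differ."""
    # 'replace' installs the qdisc, or atomically swaps an existing one
    return shlex.join(
        ["tc", "qdisc", "replace", "dev", interface, "root", "netem", "rate", rate]
    )
```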
To prevent concurrent benchmark runs that would conflict over shared resources (Docker, snapshots, ports), expb implements a file-based locking mechanism (src/expb/utils/lock.py):
- Enabled by default on `execute_scenario` and `execute_scenarios` commands
- Uses the `filelock` library for cross-platform file locking
- Default lock file location: `/tmp/expb.lock` (Unix) or `%LOCALAPPDATA%\Temp\expb.lock` (Windows)
- Lock acquisition fails immediately (timeout=0) if another instance is running
- Can be disabled with the `--no-use-lock` flag
- Lock file path can be customized with the `--lock-file` option
If a benchmark is killed abruptly, the lock file is automatically released. Manual cleanup is rarely needed.
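The fail-fast, auto-releasing behavior described above can be sketched with the stdlib (expb itself uses the `filelock` library; this Unix-only equivalent is illustrative):

```python
import fcntl


def try_acquire_lock(path: str = "/tmp/expb.lock"):
    """Return an open lock file on success, or None if another instance holds
    the lock. An OS-level flock is released automatically when the holding
    process dies, which is why manual cleanup is rarely needed."""
    f = open(path, "w")
    try:
        # LOCK_NB makes this non-blocking, i.e. a zero timeout
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None
```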
Two types of metrics are collected:
- **K6 Load Test Metrics**: HTTP request durations, success rates, throughput
  - Exported to Prometheus Remote Write or JSON files
  - Configured via the `export.prometheus_remote_write` section
- **Execution Client Metrics**: Block processing times, gas throughput, resource usage
  - Scraped via Grafana Alloy from client Prometheus endpoints
  - Sent to the configured Prometheus Remote Write endpoint
Per-payload metrics can be enabled with --per-payload-metrics flag, generating individual metrics for each payload (warning: can be high cardinality).
Extract historical Ethereum payloads from an RPC endpoint:
```bash
expb generate-payloads \
  --rpc-url https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY \
  --start-block 19000000 \
  --end-block 19001000 \
  --output-dir ./payloads \
  --threads 10 \
  --workers 30
```

Execute a benchmark scenario defined in a config file:
```bash
# Basic execution
expb execute-scenario --scenario-name example --config-file expb.yaml

# With console output and per-payload metrics
expb execute-scenario \
  --scenario-name example \
  --config-file expb.yaml \
  --print-logs \
  --per-payload-metrics \
  --per-payload-metrics-logs

# Disable execution lock (allow concurrent runs)
expb execute-scenario \
  --scenario-name example \
  --config-file expb.yaml \
  --no-use-lock

# Use custom lock file location
expb execute-scenario \
  --scenario-name example \
  --config-file expb.yaml \
  --lock-file /path/to/custom.lock
```

Execute all scenarios in a config file:
```bash
# Run once
expb execute-scenarios --config-file expb.yaml

# Run in continuous loop
expb execute-scenarios --config-file expb.yaml --loop

# Filter scenarios by regex pattern
expb execute-scenarios --config-file expb.yaml --filter "^nethermind.*"
```

Combine multiple small payloads into larger blocks for stress testing:
```bash
expb compress-payloads \
  --nethermind-snapshot-dir ./snapshots/nethermind \
  --nethermind-docker-image nethermindeth/nethermind:latest \
  --input-payloads-file ./payloads/payloads.jsonl \
  --output-payloads-dir ./compressed-payloads \
  --compression-factor 2 \
  --target-gas-limit 4000000000
```

Send payloads directly to a running execution client (bypassing benchmarking):
```bash
expb send-payloads \
  --engine-url http://localhost:8551 \
  --payloads-file ./payloads/payloads.jsonl \
  --fcus-file ./payloads/fcus.jsonl \
  --jwt-secret-file ./jwt.hex
```

After running a scenario, outputs are stored in `<outputs-directory>/expb-executor-<scenario-name>-<timestamp>/`:
- `k6-script.js` - Generated K6 test script
- `k6-config.json` - K6 configuration
- `k6-summary.json` - Test results summary
- `k6.log` - K6 process logs
- `k6-results.jsonl` - Detailed metrics (if file export enabled)
- `config.alloy` - Grafana Alloy configuration
- `<client_type>.log` - Execution client logs
- `volumes/` - Additional Docker volumes
- `commands/` - Output from extra commands (cmd-0.log, cmd-1.log, etc.)
The YAML configuration file (example-expb.yaml) defines:
- `pull_images` - Whether to pull Docker images before execution
- `images` - Docker images for K6 and Alloy
- `paths` - Working and output directories
- `export` - Prometheus Remote Write and Pyroscope configuration
- `resources` - Default CPU, memory, and bandwidth limits
- `scenarios` - Dictionary of named scenario configurations
Each scenario accepts:
- `client` - Execution client (nethermind, besu, geth, reth, erigon, ethrex, nimbusel)
- `snapshot_source` - Path to snapshot directory or ZFS snapshot name
- `payloads` - Path to payloads JSONL file
- `fcus` - Path to forkchoice updates JSONL file
- `network` - Ethereum network (mainnet, sepolia, holesky)
- `snapshot_backend` - overlay (default), zfs, or copy
- `amount` - Number of payloads to execute
- `warmup` - Number of warmup payloads (no metrics collected)
- `delay` - Delay between payloads in seconds
- `duration` - Max scenario duration
- `warmup_duration` - Max warmup phase duration
- `startup_wait` - Wait time for client startup (seconds)
- `image` - Override default client Docker image
- `extra_flags` - Additional client command-line flags
- `extra_env` - Additional environment variables
- `extra_volumes` - Additional Docker volume mounts
- `extra_commands` - Commands to run inside the container during execution
- `repeat` - Number of times to repeat the scenario
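Putting the scenario keys above together, a hypothetical scenario entry might look like this (nesting and values are assumptions; example-expb.yaml is the authoritative reference):

```yaml
scenarios:
  nethermind-mainnet:
    client: nethermind
    network: mainnet
    snapshot_backend: overlay
    snapshot_source: /snapshots/nethermind
    payloads: ./payloads/payloads.jsonl
    fcus: ./payloads/fcus.jsonl
    amount: 1000
    warmup: 50
    repeat: 3
```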
The tool uses structlog for structured logging. Log level can be controlled via --log-level flag (DEBUG, INFO, WARNING, ERROR).
Key loggers are initialized in src/expb/logging/__init__.py.
When --per-payload-metrics-logs is enabled, the executor:
- Captures K6 console logs containing `EXPB_PER_PAYLOAD_METRIC` markers
- Parses these logs using the regex pattern `PER_PAYLOAD_METRIC_LOG_PATTERN`
- Extracts payload index, gas used, and processing time
- Renders a formatted table after execution completes
This feature is useful for identifying performance regressions on specific payloads.
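As an illustration of the marker-parsing step, here is a sketch with a made-up log format and pattern (the real `PER_PAYLOAD_METRIC_LOG_PATTERN` and field layout live in the expb source and may differ):

```python
import re

# Hypothetical K6 console line carrying the per-payload marker
SAMPLE_LINE = (
    'time="2024-01-01T00:00:00Z" '
    'msg="EXPB_PER_PAYLOAD_METRIC index=42 gas_used=29980123 duration_ms=87.3"'
)

# Hypothetical pattern; named groups mirror the fields the doc says are extracted
PATTERN = re.compile(
    r"EXPB_PER_PAYLOAD_METRIC\s+index=(?P<index>\d+)\s+"
    r"gas_used=(?P<gas_used>\d+)\s+duration_ms=(?P<duration_ms>[\d.]+)"
)


def parse_per_payload_metric(line: str):
    """Return the payload index, gas used, and processing time, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "gas_used": int(m["gas_used"]),
        "duration_ms": float(m["duration_ms"]),
    }
```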
Each execution requires a JWT secret for Engine API authentication:
- Generated automatically as 32 random bytes (hex-encoded)
- Stored in `<work_dir>/jwt-secret/jwtsecret.hex`
- Mounted into both execution client and K6 containers
- Used for authenticating Engine API requests
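Generating such a secret is a one-liner with the stdlib; a sketch following the layout above (the helper name is illustrative, not expb's API):

```python
import secrets
from pathlib import Path


def write_jwt_secret(work_dir: Path) -> Path:
    """Write a fresh 32-byte, hex-encoded JWT secret for the Engine API."""
    secret_file = work_dir / "jwt-secret" / "jwtsecret.hex"
    secret_file.parent.mkdir(parents=True, exist_ok=True)
    secret_file.write_text(secrets.token_hex(32))  # 32 bytes -> 64 hex chars
    return secret_file
```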
Each scenario creates an isolated Docker bridge network:
- Execution client container joins the network
- K6 and Alloy containers join the same network
- Enables service discovery by container name
- Network is removed during cleanup
Container resources are limited via Docker API:
- `cpu` - CPU count/shares
- `mem` - Memory limit (e.g., "32g")
- `download_speed` / `upload_speed` - Bandwidth limits using tc-netem
Bandwidth limiting requires host capabilities and may require privileged mode on some systems.
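As a rough illustration of how the `cpu` and `mem` settings map onto the Docker SDK, `containers.run(...)` accepts `nano_cpus` (CPU quota in units of 1e-9 CPUs) and `mem_limit` (strings like `"32g"` are accepted); expb's actual wiring may differ:

```python
def docker_resource_kwargs(cpu: float, mem: str) -> dict:
    """Illustrative mapping of resource settings to docker-py run kwargs."""
    return {
        "nano_cpus": int(cpu * 1e9),  # e.g. 4 CPUs -> 4_000_000_000
        "mem_limit": mem,             # docker-py parses "32g", "512m", ...
    }
```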