--full --prune.sender-recovery.distance causes deterministic static-file inconsistency panic ~25 min after snapshot import on Reth v2.0.0 — fix already exists in closed-unmerged PR #21636 #23463

@All-The-New

Description

TL;DR

Reth v2.0.0 with --full --storage.v2 --prune.sender-recovery.distance 2000000 deterministically panics on check_consistency for the TransactionSenders segment after a fresh snapshot import. PR #21918 (merged 2026-02-06) fixed this for --prune.sender-recovery.full but did not cover the Distance variant. PR #21636 was authored specifically for this case by @gakonst, closed as stale 2026-02-25 without merging, and implements the exact fix. I extracted its 29-line logic change, adapted it to v2.0.0's current ensure_invariants function, rebuilt reth locally, and confirmed the node starts cleanly and runs the full sync pipeline (headers → bodies → SenderRecovery → execution → merkle → ... → finish) without any panic. The fix works. Please either reopen #21636 or cherry-pick its logic into main.

What happens

Reth v2.0.0 with --full --storage.v2 --prune.sender-recovery.distance 2000000 panics deterministically on the TransactionSenders consistency check approximately 25 minutes after startup. The panic occurs every time, including after a full wipe-and-re-download from a fresh snapshot. The node cannot stay running.

The panic site is crates/node/builder/src/launch/common.rs:533:13:

thread 'main' panicked at crates/node/builder/src/launch/common.rs:533:13:
assertion `left != right` failed: A static file inconsistency was found that would trigger an unwind to block 0
  left: 0
 right: 0

The same assertion fires at crates/cli/commands/src/common.rs:214:13 for any reth db or reth stage subcommand that touches the database — there is no escape hatch short of reth db stats --skip-consistency-checks.

Reproducer

  1. Perform a fresh snapshot import on a clean datadir:
    reth download --chain mainnet --full --storage.v2 -y
  2. Start reth with:
    reth node \
      --datadir /path/to/datadir \
      --storage.v2 \
      --full \
      --prune.block-interval 100 \
      --prune.sender-recovery.distance 2000000 \
      --prune.transaction-lookup.distance 2000000 \
      --prune.receipts.distance 2000000 \
      --prune.account-history.distance 2000000 \
      --prune.storage-history.distance 2000000 \
      --prune.bodies.distance 2000000
  3. Let a connected consensus client (Lighthouse, in this case) push forkchoice updates.
  4. After approximately 25 minutes, reth emits a Failed to load static file jar warning for the transaction-senders segment covering the current tip, then panics.
  5. On the next startup (or on any reth db/reth stage invocation), the consistency check finds the 0-byte placeholder files Reth wrote during the previous startup, computes unwind_target=0, and panics again. The node is now in a permanent crash loop that survives across full wipes.

This has been reproduced twice on identical hardware from two independent snapshot downloads — once on 2026-04-09 after upgrading to v2.0.0, and again on 2026-04-10 after a full wipe-and-redownload recovery attempt.

Consistency check output at panic time

In read-only mode (e.g., reth db stats), the check logs rather than panics. The output is:

INFO check_consistency{read_only=true}: Verifying storage consistency.
INFO check_consistency{read_only=true}: Checking consistency for segment{segment=TransactionSenders}:
INFO   ensure_invariants{
         highest_static_file_entry=None
         highest_static_file_block=None
         table="TransactionSenders"
       }: Setting unwind target. checkpoint_block_number=24850381 unwind_target=0
WARN Inconsistent storage. Restart node to heal. unwind_target=Unwind(0)

In write mode (reth node, reth stage unwind, reth db stage-checkpoints set), the same code path escalates to the assertion panic shown above.

On-disk state at crash

Static files directory listing (ls -la /path/to/datadir/static_files/static_file_transaction-senders*):

-rw-r--r--  1 _unknown  _unknown   0  static_file_transaction-senders_0_49999
-rw-r--r--  1 _unknown  _unknown  55  static_file_transaction-senders_0_49999.conf
-rw-r--r--  1 _unknown  _unknown   9  static_file_transaction-senders_0_49999.off
-rw-r--r--  1 _unknown  _unknown   0  static_file_transaction-senders_24850000_24899999
-rw-r--r--  1 _unknown  _unknown  55  static_file_transaction-senders_24850000_24899999.conf
-rw-r--r--  1 _unknown  _unknown   9  static_file_transaction-senders_24850000_24899999.off

Both data files are 0 bytes. The .conf and .off sidecars have the standard initialization sizes but no real content.

Notably, the segment covering all blocks between 50000 and 24799999 — where corresponding Transactions segment files do exist — is entirely absent. There are no tx-senders files for the full synced range.

The segment at the tip advances between crashes. Crash 1 produced 24800000_24849999; after wipe-and-redownload, crash 2 produced 24850000_24899999. This is deterministic with the snapshot tip, not random corruption.

MDBX state

From reth db --datadir /path/to/datadir stats --skip-consistency-checks:

  • Static file catalog (MDBX): 5 segment types — Headers, Transactions, Receipts, AccountChangeSets, StorageChangeSets. TransactionSenders is not in the catalog at all. The MDBX metadata does not believe any tx-senders static files exist.
  • TransactionSenders MDBX table: 0 entries, 0 bytes — fully distance-pruned from MDBX, as configured.
  • SenderRecovery stage checkpoint: StageCheckpoint { block_number: 24850381, stage_checkpoint: None }
  • PruneSenderRecovery checkpoint: mode=Distance(2000000), block=24850381
  • RocksDB TransactionHashNumbers: 509M entries, ~23 GiB — transaction hashes are NOT pruned (only senders)

The inconsistency in summary: MDBX says SenderRecovery completed to block 24,850,381, the catalog says no tx-senders static files exist, and the filesystem says two 0-byte files exist. These three facts are irreconcilable to check_consistency.

The trigger sequence

Based on log analysis:

  1. Reth starts from the snapshot at block 24,850,381. Lighthouse begins pushing forkchoice updates. Reth does not advance any blocks but does run background pipeline work (e.g., TransactionLookup rebuilds ~1.4M entries during the 25-minute window).
  2. ~25 minutes in, Reth attempts to read the transaction-senders static file for the segment covering the current tip:
    WARN Failed to load static file jar ... Os { code: 2, kind: NotFound, message: "No such file or directory" },
         path: ".../static_files/static_file_transaction-senders_24800000_24849999.conf"
    
  3. The engine-tree error chain propagates up:
    ERROR engine::persistence: Persistence service failed...
    ERROR engine::tree: Fatal error in consensus engine...
    INFO  reth::cli: Fatal error in consensus engine...shutting down
    
  4. During shutdown, Reth's "three-way healing" creates the 0-byte placeholder files at the expected segment boundaries (0_49999 and the tip segment).
  5. On next startup, check_consistency walks the static files directory, finds the 0-byte placeholders, sees highest_static_file_block=None because the files have no rows, compares to checkpoint_block_number=24850381, computes unwind_target=0, and panics.
  6. All subsequent startups reproduce step 5 because Reth also recreates the 0_49999 placeholder as a startup artifact.
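The comparison in step 5 can be modeled with a few lines. This is a hypothetical reduction reconstructed from the log output above, not reth's actual code: `checkpoint_block` stands in for the SenderRecovery stage checkpoint, and `highest_static_file_block` is `None` because the 0-byte placeholder files contain no rows.

```rust
// Hypothetical model of the failing invariant check (step 5 above).
// Reconstructed from the logged values; not the actual reth implementation.
fn unwind_target(checkpoint_block: u64, highest_static_file_block: Option<u64>) -> Option<u64> {
    // No rows in the static files => treat the highest available block as 0.
    let highest = highest_static_file_block.unwrap_or(0);
    // Checkpoint ahead of static files => assume data loss, unwind to `highest`.
    (checkpoint_block > highest).then_some(highest)
}

fn main() {
    // Matches the log: checkpoint_block_number=24850381, unwind_target=0
    assert_eq!(unwind_target(24_850_381, None), Some(0));
    // A consistent state would produce no unwind target.
    assert_eq!(unwind_target(24_850_381, Some(24_850_381)), None);
}
```

With the 0-byte placeholders present, this function can never return `None`, which is why the crash loop is permanent.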

Root cause analysis

The chain of causation points to a gap in segments_to_check() / should_check_segment() in the static file manager (approximately crates/storage/provider/src/providers/static_file/manager.rs):

  1. check_consistency iterates segments_to_check().
  2. should_check_segment() skips TransactionSenders only when is_segment_fully_pruned(SenderRecovery) returns true.
  3. is_segment_fully_pruned appears to return true only when the prune mode is Full — i.e., when --prune.sender-recovery.full is set.
  4. With --prune.sender-recovery.distance 2000000, the stored prune mode is Distance(2000000), which does not satisfy the Full check.
  5. So check_consistency runs for TransactionSenders, finds no static files, and panics — even though this config never wrote any tx-senders to static files in the first place.

The deeper issue is that the snapshot import procedure sets SenderRecovery stage checkpoint to the snapshot tip but does not write any TransactionSenders static files. This is the correct behavior for a --full + distance config (distance pruning keeps senders in MDBX for the recent window, not in static files). But check_consistency does not know this: it sees a non-zero stage checkpoint and a missing static file catalog entry, and concludes the data was lost.
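The suspected gap reduces to a two-function sketch. The names below mirror the reth functions mentioned above, but the bodies are my reconstruction of the behavior, not the actual implementation:

```rust
// Illustrative reduction of the suspected skip-path gap.
// Names mirror the reth functions discussed above; bodies are reconstructed.
#[derive(Debug)]
enum PruneMode {
    Full,
    Distance(u64),
}

fn is_segment_fully_pruned(mode: &PruneMode) -> bool {
    // Only `Full` qualifies; `Distance(_)` falls through...
    matches!(mode, PruneMode::Full)
}

fn should_check_segment(mode: &PruneMode) -> bool {
    // ...so the TransactionSenders segment is still consistency-checked,
    // even though a --full + distance config never writes senders to
    // static files in the first place.
    !is_segment_fully_pruned(mode)
}

fn main() {
    assert!(!should_check_segment(&PruneMode::Full)); // #21918's fix covers this
    assert!(should_check_segment(&PruneMode::Distance(2_000_000))); // this bug
}
```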

What I tried that does not work

| Approach | Result |
| --- | --- |
| Quarantine the 0-byte placeholder files | reth node recreates them on startup and hits the same panic |
| reth stage unwind --datadir /path to-block <N> | reth stage runs check_consistency at startup and panics before doing any unwind |
| reth db --datadir /path stage-checkpoints set --stage sender-recovery 0 | Same: the write-mode consistency check runs before accepting any arguments, panics |
| reth node --prune.sender-recovery.full on the existing datadir | Saves the new prune config to reth.toml but does NOT update the PruneSenderRecovery entry in MDBX, so check_consistency still reads Distance(2000000) and panics |
| Full wipe + fresh snapshot (reth download --full --storage.v2 -y) | Produces a clean state that runs for ~25 minutes and then crashes with the same panic at the next 50k-block segment boundary. Reproduced twice. |

The only command that can inspect the broken datadir without panicking is reth db --datadir /path stats --skip-consistency-checks. There is no equivalent escape hatch for reth node, reth stage, or reth db stage-checkpoints set.

Related issues

The difference between what PR #21918 fixed and what this issue describes is the prune mode: Full vs Distance. The fix in #21918 added a skip path for is_segment_fully_pruned, but distance-pruned senders are not "fully pruned" by that definition even though they are also never written to static files.

The already-existing fix: PR #21636

PR #21636 "fix(storage): respect prune checkpoint in static file consistency check" was authored by @gakonst (branch joshie/fix-static-file-prune-unwind) specifically for this bug. The PR body describes it word-for-word:

Root cause: When --full prunes segments like TransactionSenders, the static files are deleted but the stage checkpoint remains high. On next startup, the consistency check in ensure_invariants sees checkpoint_block_number > highest_static_file_block (e.g., 22M > 0) and assumes data corruption, triggering an unwanted unwind.

The PR's approach: add + PruneCheckpointReader to the ensure_invariants trait bound, and change the naive if checkpoint_block_number > highest_static_file_block comparison to use effective_available_block = max(highest_static_file_block, prune_checkpoint_block). For distance-pruned segments, the prune checkpoint's block_number is the tip, so effective_available_block equals the stage checkpoint and no unwind is triggered. The PR includes a green unit test (test_consistency_respects_prune_checkpoint) that seeds the exact failure mode.
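Plugged into this issue's numbers, the PR's comparison change looks like the following. This is a minimal standalone model of the `max` logic, not the PR's actual code:

```rust
// Minimal model of PR #21636's comparison change, using this issue's numbers.
// Not the PR's actual code; it illustrates why no unwind fires.
fn effective_available_block(
    highest_static_file_block: u64,
    prune_checkpoint_block: Option<u64>,
) -> u64 {
    highest_static_file_block.max(prune_checkpoint_block.unwrap_or(0))
}

fn main() {
    let checkpoint_block = 24_850_381u64;
    // Distance pruning: prune checkpoint sits at the tip, static files empty.
    let effective = effective_available_block(0, Some(checkpoint_block));
    // checkpoint_block > effective is now false => no unwind is triggered.
    assert!(checkpoint_block <= effective);
}
```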

PR #21636 was closed 2026-02-25 by @emmajam with the comment "Hey! We're doing some spring cleaning on our PR backlog 🧹 Closing old PRs to keep things tidy. If this is still relevant, please feel free to re-open" — it was never merged. It is still relevant: v2.0.0 shipped 2026-04-08 with the bug still present.

I confirmed PR #21636 fixes this bug

I extracted PR #21636's logic change (the full diff does not apply cleanly to v2.0.0 because v2.0.0 already landed PR #21918's prerequisite imports), hand-adapted it to v2.0.0's current ensure_invariants in crates/storage/provider/src/providers/static_file/manager.rs, rebuilt reth from source with default release features, and bootstrapped my previously-broken node. Results:

  • Startup check_consistency passes cleanly. No unwind_target=Unwind(0) warning, no panic.
  • Full pipeline cycle completes. Stages 1–14 advanced 24850381 → 24851130 (the snapshot tip + new blocks). Specifically the SenderRecovery stage (stage 3/14) ran in 17 seconds and wrote 275,698 sender rows to static_file_transaction-senders static files (498 writer opens, 497 commits, verified via reth_static_files_jar_provider_calls_total{segment="transaction-senders",operation="append"}). This is the exact code path that previously panicked the node at runtime.
  • Second pipeline cycle immediately started. Headers stage advanced to block 24858908 (7778 more blocks). eth_blockNumber advanced from 0x17b2fcd (24850381) to 0x17b32ba (24851130) and continues climbing.
  • Zero panics in reth.daemon.err.log after 30+ minutes of continuous operation across two full pipeline cycles.

The adapted diff I applied (the minimum change for v2.0.0) is:

     fn ensure_invariants_for<Provider>(
         ...
         where
-            Provider: DBProvider + BlockReader + StageCheckpointReader,
+            Provider: DBProvider + BlockReader + StageCheckpointReader + PruneCheckpointReader,
             N: NodePrimitives<Receipt: Value, BlockHeader: Value, SignedTx: Value>,

     fn ensure_invariants<Provider, T: Table<Key = u64>>(
         ...
         where
-            Provider: DBProvider + BlockReader + StageCheckpointReader,
+            Provider: DBProvider + BlockReader + StageCheckpointReader + PruneCheckpointReader,

         // inside ensure_invariants, replacing the naive comparison:
+        let prune_segment = match segment {
+            StaticFileSegment::TransactionSenders => Some(PruneSegment::SenderRecovery),
+            StaticFileSegment::Receipts => Some(PruneSegment::Receipts),
+            _ => None,
+        };
+        let effective_available_block = if let Some(ps) = prune_segment {
+            let prune_checkpoint_block =
+                provider.get_prune_checkpoint(ps)?.and_then(|c| c.block_number).unwrap_or(0);
+            std::cmp::max(highest_static_file_block, prune_checkpoint_block)
+        } else {
+            highest_static_file_block
+        };
-        if checkpoint_block_number > highest_static_file_block {
+        if checkpoint_block_number > effective_available_block {
             info!(
                 target: "reth::providers::static_file",
                 checkpoint_block_number,
-                unwind_target = highest_static_file_block,
+                unwind_target = effective_available_block,
                 ?segment,
                 "Setting unwind target."
             );
-            return Ok(Some(highest_static_file_block));
+            return Ok(Some(effective_available_block));
         }

29 insertions, 6 deletions, one file. Identical in behavior to the ensure_invariants portion of PR #21636 — I did NOT port the ensure_invariants_from_db portion of #21636 since v2.0.0 renamed that function to ensure_changeset_invariants_by_block and my bug doesn't exercise it. For a full upstream fix, that portion should also be ported.

Proposed fixes (in order of preference)

Option 1 (preferred): Reopen PR #21636 or cherry-pick its logic into main. The work is already done and I've verified it works on v2.0.0. PR #21636's approach (use max(highest_static_file_block, prune_checkpoint_block) as the effective "data available from" threshold) is cleaner than a skip-path extension because it also handles the Receipts variant of the same bug and any future prune-aware segment.

Option 2: Extend PR #21918's skip path to cover distance-pruned senders. Add a Distance branch to is_segment_fully_pruned (or introduce is_segment_in_non_static_file_storage). Simpler than #21636 but narrower — only fixes TransactionSenders, not Receipts.

Option 3: Migrate prune config from reth.toml to MDBX PruneCheckpoints before running check_consistency. Lets an operator recover by changing --prune.sender-recovery.distance to --prune.sender-recovery.full in their config and restarting, without hand-patching the binary. Useful as an escape hatch independent of the main fix.

Option 4: Add --skip-consistency-checks to reth db stage-checkpoints set and reth stage unwind. Gives operators a manual escape hatch to repair MDBX state without rebuilding reth. Currently --skip-consistency-checks exists only on reth db stats and is read-only. Even a one-time reth db stage-checkpoints set --skip-consistency-checks --stage sender-recovery --block-number 0 would have let me recover without the ~12 hours of diagnostic + rebuild work this session took.

Option 5 (minimum): Improve diagnostics. The current error — assertion left != right failed with both sides equal to 0 — gives operators nothing to act on. A log line that says "TransactionSenders static files are missing but the SenderRecovery stage checkpoint is at block N; if you are using distance pruning, this is likely a known bug — see issue #XXXXX" would meaningfully reduce diagnostic burden.

Environment

  • OS: macOS 25 (Darwin 25.3.0), Apple Silicon
  • Reth version: v2.0.0, commit eb4c15e5, built 2026-04-07
  • Chain: mainnet
  • Storage format: Storage v2 (--storage.v2)
  • Consensus client: Lighthouse v8.1.3 (healthy, not involved in the crash)
  • Datadir size: ~390 GB (210 GB MDBX + 23 GB RocksDB + remainder headers/transactions/receipts static files)
  • Available disk: 3.2 TB free on a 3.6 TB volume

Full reth node invocation (from startup script):

reth node \
  --datadir /Volumes/ETHDATA/reth \
  --storage.v2 \
  --http \
  --http.addr 127.0.0.1 \
  --http.api eth,net,web3 \
  --ws.addr 127.0.0.1 \
  --authrpc.addr 127.0.0.1 \
  --authrpc.port 8551 \
  --authrpc.jwtsecret /Volumes/ETHDATA/reth/jwt.hex \
  --metrics 127.0.0.1:9001 \
  --full \
  --prune.block-interval 100 \
  --prune.sender-recovery.distance 2000000 \
  --prune.transaction-lookup.distance 2000000 \
  --prune.receipts.distance 2000000 \
  --prune.account-history.distance 2000000 \
  --prune.storage-history.distance 2000000 \
  --prune.bodies.distance 2000000

Confirmed not a config typo or one-off: two independent snapshot downloads on the same hardware reproduced the same panic at the same assertion. The trigger is the first prune cycle or segment rotation after startup, not random corruption.
