
feat(derivation): SPEC-005 — validator cleanup + Path A/B verification mode + derivation-driven tag management#951

Open
curryxbo wants to merge 28 commits into feat/multi_batch from feat/derivation-batch-verify

Conversation

@curryxbo
Contributor

@curryxbo curryxbo commented May 12, 2026

Scope

Implements the three targets of morph-l2/morph-specs SPEC-005 (tracking issue):

  • Target A — Validator cleanup (commit c82a1456): Delete node/validator/ plus all call sites / flags / docker-compose entries / monitoring panels.
  • Target B — Path A/B verification mode (commit 9f01c530): Add --derivation.verify-mode={pathA,pathB} (default pathA), bound at startup and not switchable at runtime. Path B does not pull blobs from the beacon node — it rebuilds versioned hashes locally from L2 blocks using the sequencer-consistent encoding and compares them against the values recorded on L1.
  • Target C — Derivation-driven tag management (commit a73639de): Delete the entire node/blocktag/ package; the safe / finalized heads are advanced solely by the derivation main loop + the in-process finalizer subcomponent; the finalized ≤ safe ≤ unsafe invariant is explicitly asserted; the SetBlockTags engine RPC signature is unchanged.
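The finalized ≤ safe ≤ unsafe invariant in Target C reduces to a simple ordering check over the three head block numbers. A minimal sketch, with an illustrative helper name and error wording that are not taken from the PR's code:

```go
package main

import "fmt"

// checkTagInvariant is a hypothetical sketch of the finalized <= safe <= unsafe
// assertion described in Target C; names and messages are illustrative only.
func checkTagInvariant(finalized, safe, unsafe uint64) error {
	if finalized > safe {
		return fmt.Errorf("tag invariant violated: finalized %d > safe %d", finalized, safe)
	}
	if safe > unsafe {
		return fmt.Errorf("tag invariant violated: safe %d > unsafe %d", safe, unsafe)
	}
	return nil
}

func main() {
	// A valid ordering passes; a finalized head ahead of safe does not.
	fmt.Println(checkTagInvariant(10, 20, 30) == nil)
	fmt.Println(checkTagInvariant(25, 20, 30) == nil)
}
```

In the PR's design the advancer would skip (and count, via the invariant-violation metric) any update that fails this check rather than forwarding it to the engine.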

Design doc: tech-design.md

Base dependency

Base = feat/multi_batch (PR #935). Reuses common/batch.{ParsingTxs, BuildBlockContext, BatchData} + common/blob.{CompressBatchBytes, MakeBlobTxSidecar}. This PR will need a rebase once #935 lands on main.

Breaking change

The external semantics of eth_getBlockByNumber("safe") change from "L1 N confirmations" → "locally derivation-verified". The driver of finalized moves from blocktag to the derivation-internal finalizer. Downstream consumers (wallets / indexers / bridge relayers) must be notified via the changelog; the morph repository itself has zero business call sites (only EL protocol-layer references), so external consumers will be informed separately. See tech-design §6 / §7.

Test coverage

  • static_scan_test.go (3): TestNoValidatorReferences / TestNoBlocktagReferences / TestPathBUsesCommonBlobPackage
  • tag_advance_test.go (7): safe / dedup / invariant ×2 / monotonic / l1-reorg-reset / block-number-error
  • ⚠️ Still to add (see spec §5.1): TestVerifierDispatch_* (3) / TestPathA_* (2) / TestPathB_* (4) / TestTagAdvance_FinalizedFromL1 / TestTagAdvance_PathBVsPathAParity

Devnet verification

Pending (7 scenarios listed in spec §5.2, including an L1 reorg triggering a one-shot safe rewind while finalized stays put).

Rollback

Hard revert via git revert + redeploy (no feature-flag dual-run window; rationale in tech-design §7).

Code review

  • CodeRabbit auto-skips because the base ≠ default branch. Either comment @coderabbitai review to force a run, or wait until feat: multi blob #935 merges and this PR is rebased onto main.
  • One round of manual code review has been run; report at tech-design.code-review.md (S2-A "join finalizer in Stop" and S2-B "guard zero BlockNumber in finalizer.tick" have been applied).

Refs

Summary by CodeRabbit

  • New Features

    • Added support for alternative L2 execution client (Reth) alongside Geth
    • Introduced configurable SPEC-005 derivation verification mode (Path A/B)
    • Added finalizer polling interval configuration
  • Bug Fixes & Updates

    • Refactored execution layer service naming from "morph-geth" to "morph-el" across devnet and test environments
    • Removed validator challenge functionality
    • Removed block tag service in favor of integrated tag advancer component
  • Configuration

    • New CLI flags: derivation.verify-mode and derivation.finalizer-interval
    • Makefile now supports EXECUTION_CLIENT selection with Reth build parameters

panos-xyz and others added 14 commits May 7, 2026 15:34
Replace the internal codename "bitget" with the neutral term "polyrepo"
in build context references, variable names, and container paths.
Cover the rest of the file with one-line docstrings to satisfy
CodeRabbit's docstring coverage threshold.
…g Dockerfile.l2-geth

Move the morph-el-0 build: section from docker-compose-4nodes.yml into a
dedicated docker-compose-geth-build.yml, included only when
EXECUTION_CLIENT=geth. The reth overlay now sees no build: on morph-el-0,
eliminating the risk of docker compose up building geth code and tagging
it as the reth image when the reth image is absent.
Keep the base devnet compose file self-contained for geth while using the reth overlay to explicitly reset inherited geth build definitions.

Constraint: Do not include the devnet execution-client test file in this commit
Rejected: Keep a separate geth build compose file | changes direct base compose usage
Confidence: high
Scope-risk: narrow
[codex] support reth execution client in devnet
- Delete node/validator package (config.go, validator.go, validator_test.go)
- Drop validator wiring from node/cmd/node/main.go and derivation.NewDerivationClient signature
- Drop validator field from Derivation struct
- Drop ChallengeEnable/ChallengeState invocation in derivation rollback path
- Remove validator.challengeEnable / validator.privateKey CLI flags
- Remove MORPH_NODE_VALIDATOR_PRIVATE_KEY env from docker compose files

Refs: morph-l2/morph-specs SPEC-005 §4.1
Equivalent in intent to PR #948 commit 3e49457, but applied directly to
main without the Phase A reorg/halted/rollback context that 3e49457
brought along.
Same as the merge previously applied to PR #948's branch — needed because
PR #935 (multi_batch / Batch Header v2) hasn't landed in main yet, but
SPEC-005 narrow scope's Path B implementation depends on common/batch +
common/blob packages introduced by it.

Resolutions:
- ops/docker/docker-compose-4nodes.yml: take origin/feat/multi_batch's
  TX_SUBMITTER_PRIORITY_ROLLUP=true and drop TX_SUBMITTER_SEAL_BATCH;
  that env block is owned by tx-submitter, which feat/derivation-batch-verify
  should not be modifying.
- go-ethereum submodule: keep at 045be0fd (v2.2.2). Multi_batch's 53aee9b
  internal merge accidentally regressed the pointer to 1d460577; that
  regression is unrelated to multi_batch's design intent and should not
  propagate to feat/derivation-batch-verify.
Pull the inline state-root / withdrawal-root mismatch check out of
derivationBlock into a standalone verifyBatchRoots(batchInfo, lastHeader)
function in a new node/derivation/verify.go.

Both roots are read from L1 calldata at parse time, so verifyBatchRoots
is independent of blob data — this is the SPEC-005 §3.4 invariant that
later allows Path B (local-rebuild verification) to reuse this same
check without modification.

No behavior change: the main loop still logs + returns on mismatch, sets
stateException on the metric, and continues otherwise. Only the location
and error-message format change.

Refs: morph-l2/morph-specs SPEC-005 §4.2 / §6 ("both modes share the same verifyBatchRoots")
@curryxbo curryxbo requested a review from a team as a code owner May 12, 2026 10:08
@curryxbo curryxbo requested review from tomatoishealthy and removed request for a team May 12, 2026 10:08
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13d2517a-4147-42b8-a138-b32611e3aa31

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR removes validator and blocktag services, introduces SPEC-005 Path B batch verification with a finalizer subcomponent and tag advancer, abstracts the L2 execution layer naming from geth to a generic "el" service, and refactors devnet/Makefile to support pluggable execution clients (geth/reth).

Changes

Derivation Path B Verification and Safe/Finalized Tag Management

Layer / File(s) Summary
Batch metadata parsing and blob hash tracking
node/derivation/batch_info.go
BatchInfo adds blobHashes field and new ParseBatchMetadataOnly method to extract L1-calldata metadata without decoding transactions or sidecar blobs.
Path B verification core logic
node/derivation/verify_path_b.go, node/derivation/verify_path_b_test.go
fetchBatchInfoPathB unpacks L1 commitBatch calldata, verifyBatchContentPathB reconstructs local batch payload and compares blob versioned hashes against L1 declaration, and fetchLocalLastHeader reads L2 headers for root verification. Tests validate V1/V2 round-trips and error paths.
Root and header verification
node/derivation/verify.go
verifyBatchRoots compares L2 state root and L2-to-L1 message withdrawal root against batch metadata, returning mismatch errors with both values.
Safe and finalized head advancement
node/derivation/tag_advance.go, node/derivation/tag_advance_test.go
tagAdvancer centralizes SetBlockTags RPC calls to engine, deduplicates redundant updates, enforces finalized <= safe <= unsafe invariant, handles L1 reorg resets, and exposes SafeMaxBatchIndex for finalizer read access. Tests verify monotonicity, deduplication, and invariant enforcement.
Finalization polling loop
node/derivation/finalizer.go
finalizer periodically reads L1 finalized header, queries rollup contract for committed batch index, resolves candidate batch's last L2 block via BatchDataStore, and advances finalized tag via tagAdvancer.advanceFinalized.
Derivation client integration
node/derivation/derivation.go
NewDerivationClient removes validator dependency, adds verifyMode selection and finalizer/tagAdvancer wiring. derivationBlock branches on verifyMode: pathB fetches+verifies batch metadata/content/headers, pathA uses prior tx-hash derivation. Both paths then call tagAdvancer.advanceSafe. Start() launches finalizer, Stop() waits for it.
Verification mode configuration
node/derivation/config.go, node/derivation/config_test.go
Config adds VerifyMode and FinalizerInterval fields with CLI parsing via DerivationVerifyMode/DerivationFinalizerInterval flags. Helper validateAndDefaultVerifyMode normalizes and validates mode strings. Tests confirm defaults and error handling.
Metrics for Path B and tag advancement
node/derivation/metrics.go
Metrics adds counters/gauges for PathBTriggered, PathBFailed, SafeAdvanceTotal, FinalizedAdvanceTotal, block numbers, L1 reorg resets, and tag invariant violations. PrometheusMetrics registers all with corresponding Prometheus names.

Service Architecture Removal

Layer / File(s) Summary
Validator service and config deletion
node/validator/validator.go, node/validator/validator_test.go, node/validator/config.go
Complete removal of Validator type, ChallengeState challenge-submission logic, and Config struct. Eliminates the challenge-enabled flow and associated L1 client/contract bindings.
Blocktag service and config deletion
node/blocktag/service.go, node/blocktag/config.go
Complete removal of BlockTagService polling loop, safe/finalized tag computation, and Config struct with CLI wiring. Functionality replaced by integrated tagAdvancer+finalizer in derivation.
Node startup refactor
node/cmd/node/main.go
Removes validator-specific sync/executor paths and block-tag-service initialization. Node now always creates shared store/syncer, dials L1, creates executor from shared syncer, and starts either mock sequencer or Tendermint consensus node followed by derivation client with integrated tag management.
CLI flag consolidation
node/flags/flags.go
Removes ValidatorEnable, ChallengeEnable, ValidatorPrivateKey, and BlockTagSafeConfirmations flags. Adds DerivationVerifyMode and DerivationFinalizerInterval flags.
Removal validation tests
node/derivation/static_scan_test.go
Three static-scan tests assert that no remaining code references banned validator or blocktag symbols, env vars, or import paths, and that Path B code uses morph-l2/common/blob helpers.

Execution Client Abstraction and Devnet Reth Support

Layer / File(s) Summary
Makefile execution-client selection
Makefile
Introduces EXECUTION_CLIENT variable (default geth), exports Reth build/runtime config (MORPH_RETH_*), computes DEVNET_COMPOSE_FILES and DEVNET_EXECUTION_DEPS conditionally, and updates devnet-* targets to use computed variables. New reth-image and reth targets for Reth image pull/build.
Devnet Python client selection
ops/devnet-morph/devnet/__init__.py
Adds --execution-client CLI option and compose_file_args helper to conditionally include docker-compose-reth.yml. Updated docstrings for helper functions; devnet bring-up now uses selected client compose files.
Docker Compose ReTH service template
ops/docker/docker-compose-reth.yml
New file defines YAML anchors for ReTH command and shared service template, creating five services (morph-el-0..3, sentry-el-0) that inherit parameterized image/entrypoint from MORPH_RETH_* variables.

Execution Service Rebranding (geth → el)

Layer / File(s) Summary
Mainnet/testnet compose updates
node/ops-morph/docker-compose.yml, node/ops-morph/testnet/docker-compose.yml, ops/docker/docker-compose-4nodes.yml, ops/docker/static-nodes.json
Service names changed from morph-geth-{0..3} to morph-el-{0..3} and sentry-geth-0 to sentry-el-0. Environment variables (MORPH_NODE_L2_ETH_RPC, MORPH_NODE_L2_ENGINE_RPC, RPC URLs) updated to reference the new service names. Volume names updated (sequencer_geth_data → sequencer_el_data, sentry_geth_data → sentry_el_data). Validator services removed.
Validator Docker Compose deletion
node/ops-morph/docker-compose-validator.yml
File removed; eliminates validator node service and data volume.
Sequencer test infrastructure updates
ops/docker-sequencer-test/docker-compose.override.yml, ops/docker-sequencer-test/run-test.sh, ops/docker-sequencer-test/scripts/tx-generator.sh
Service names and image refs updated from morph-geth-* to morph-el-*. Docker build step now produces morph-el-test:latest instead of morph-geth-test.

Supporting Changes

Layer / File(s) Summary
Batch processing helper exports
common/batch/batch_cache.go
parsingTxs and buildBlockContext renamed to exported ParsingTxs and BuildBlockContext with expanded docstrings documenting on-chain encoding and Path B usage. Callers updated accordingly.
Dockerfile path and context updates
ops/docker-sequencer-test/Dockerfile.l2-geth-test, ops/docker-sequencer-test/Dockerfile.l2-node-test
Build context updated from bitget/ to polyrepo root. Dockerfile paths changed from /bitget/... to /polyrepo/... for module COPY and final artifact staging.
Oracle L2 endpoint update
oracle/docker-compose.yml
STAKING_ORACLE_L2_ETH_RPC repointed from morph-geth-0 to morph-el-0.

Sequence Diagram(s)

sequenceDiagram
    participant L1
    participant Node as Derivation<br/>Client
    participant TagAdv as TagAdvancer
    participant Final as Finalizer
    participant L2 as L2 Engine

    autonumber
    
    rect rgba(100, 150, 255, 0.5)
    Note over Node,Final: Batch Verification & Safe Advancement (Per Batch)
    Node->>Node: Check verifyMode
    
    alt pathB Mode
        Node->>L1: fetchBatchInfoPathB<br/>(L1 commitBatch tx)
        L1-->>Node: BatchInfo with blobHashes
        Node->>Node: verifyBatchContentPathB<br/>(reconstruct & compare)
        Node->>Node: fetchLocalLastHeader
    else pathA Mode
        Node->>L1: fetchRollupDataByTxHash
        L1-->>Node: RollupData
        Node->>Node: derive (legacy path)
    end
    
    Node->>TagAdv: advanceSafe(header, batchIndex)
    TagAdv->>L2: SetBlockTags (safe)
    L2-->>TagAdv: OK
    end
    
    rect rgba(150, 200, 100, 0.5)
    Note over Final,L2: Finalization Poll (Every 30s)
    Final->>L1: Read L1 finalized header
    L1-->>Final: blockNumber
    Final->>L1: Query LastCommittedBatchIndex @ L1 finalized
    L1-->>Final: batchIndex
    Final->>Node: BatchDataStore lookup<br/>candidate batch → lastL2Block
    Node-->>Final: L2 block number
    Final->>Node: Fetch L2 header @ lastL2Block
    Node-->>Final: Header
    Final->>TagAdv: advanceFinalized(header, batchIndex)
    TagAdv->>TagAdv: Check finalized ≤ safe
    TagAdv->>L2: SetBlockTags (finalized)
    L2-->>TagAdv: OK
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • morph-l2/morph#857: This PR directly removes the block tag service and configuration introduced by #857, undoing prior block tag support.
  • morph-l2/morph#947: Identical Makefile devnet reth/geth execution-client selection logic, computed compose-file variables, and new reth-* targets.
  • morph-l2/morph#925: Related Makefile changes to devnet-clean-build target's docker compose file selection and volume cleanup behavior.

Suggested reviewers

  • panos-xyz
  • Web3Jumb0

🐰 Derivation paths diverge with blobs so bright,
Safe heads advance by day and night,
Validators fade, new finalists take flight,
While geth becomes "el" in docker's sight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and comprehensively summarizes the three main objectives of the PR: validator cleanup, Path A/B verification mode selection, and derivation-driven tag management. It accurately reflects the primary changes across the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Introduces Path B verification mode (SPEC-005 §4) as a startup-time
mutually-exclusive alternative to Path A. Selected via
`--derivation.verify-mode` (default "pathA", preserves current
behaviour). The two modes do not interact: under no circumstance does
the node fall back from one to the other; switching requires changing
the config and restarting.
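Fail-fast validation of a startup-only mode flag of this kind typically normalizes the input and rejects anything unrecognized at boot. A hypothetical sketch of that shape (the function name and error message are assumptions, not the PR's validateAndDefaultVerifyMode):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVerifyMode sketches startup-time validation of a verify-mode flag:
// empty input defaults to pathA, matching is case-insensitive, and unknown
// values fail fast. Illustrative only; not the repository's implementation.
func normalizeVerifyMode(raw string) (string, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "", "patha":
		return "pathA", nil
	case "pathb":
		return "pathB", nil
	default:
		return "", fmt.Errorf("unknown derivation.verify-mode %q (want pathA or pathB)", raw)
	}
}

func main() {
	m, _ := normalizeVerifyMode("") // default preserves current behaviour
	fmt.Println(m)
	_, err := normalizeVerifyMode("pathC") // rejected before the node starts
	fmt.Println(err != nil)
}
```

Because the mode is bound once at startup, an invalid value surfaces immediately at boot rather than mid-derivation.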

Path B mode:
  - Skips beacon-side blob fetch entirely.
  - Reads L1 commitBatch tx calldata + tx.BlobHashes() only.
  - Reads local L2 blocks for the batch range, replays the sequencer's
    encoding (parsingTxs + buildBlockContext + BatchData) to rebuild
    the blob bytes.
  - Compresses + builds a BlobTxSidecar; compares its versioned hashes
    against the values from L1 calldata.
  - Reuses the shared verifyBatchRoots for state/withdrawal root
    verification (independent of blob).
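The hash comparison at the heart of Path B follows the EIP-4844 rule: a blob's versioned hash is a 0x01 version byte followed by the remaining 31 bytes of sha256(KZG commitment). A self-contained sketch of that comparison step, with dummy commitment bytes standing in for the real rebuilt-sidecar output and illustrative function names:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// versionedHash computes the EIP-4844 blob versioned hash from a KZG
// commitment: 0x01 prefix followed by sha256(commitment)[1:].
func versionedHash(commitment []byte) [32]byte {
	h := sha256.Sum256(commitment)
	h[0] = 0x01 // VERSIONED_HASH_VERSION_KZG
	return h
}

// compareHashes mirrors the Path B check: every locally rebuilt hash must
// equal the hash recorded on L1, index by index. Names are illustrative.
func compareHashes(local, fromL1 [][32]byte) error {
	if len(local) != len(fromL1) {
		return fmt.Errorf("blob count mismatch: local %d vs L1 %d", len(local), len(fromL1))
	}
	for i := range local {
		if !bytes.Equal(local[i][:], fromL1[i][:]) {
			return fmt.Errorf("versioned hash mismatch at index %d", i)
		}
	}
	return nil
}

func main() {
	c := []byte("dummy-kzg-commitment") // real input would be a 48-byte commitment
	h := versionedHash(c)
	fmt.Println(h[0] == 0x01)
	fmt.Println(compareHashes([][32]byte{h}, [][32]byte{h}) == nil)
}
```

Any mismatch here means the locally replayed sequencer encoding diverges from what was committed on L1, without ever fetching the blob from the beacon node.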

Touches:
  - common/batch: export ParsingTxs / BuildBlockContext (rename from
    package-private). No semantic change for tx-submitter.
  - node/derivation/batch_info.go: add BatchInfo.blobHashes field;
    add ParseBatchMetadataOnly (calldata-only parse, no blob).
  - node/derivation/config.go + flags: add VerifyMode + flag with
    fail-fast validation on unknown values.
  - node/derivation/derivation.go: dispatch in main loop based on
    verifyMode; populate blobHashes in Path A's fetch helper too so
    BatchInfo is consistent across modes.
  - node/derivation/verify_path_b.go: fetchBatchInfoPathB +
    verifyBatchContentPathB + fetchLocalLastHeader.
  - node/derivation/metrics.go: path_b_triggered_total /
    path_b_failed_total counters.

Verified clean: `go build ./node/... ./common/...`, `go vet`,
test compile.

Refs: morph-l2/morph-specs SPEC-005 §4 / §5 / §6
Resolves the WIP item flagged in PR #951.
@curryxbo curryxbo changed the title [WIP] feat(derivation): SPEC-005 narrow scope — validator removal + Path A/B mode select + Path B impl pending feat(derivation): SPEC-005 narrow scope — validator removal + Path A/B startup-time verification mode select May 12, 2026
@curryxbo curryxbo changed the base branch from main to feat/multi_batch May 12, 2026 10:24
…e cleanup

Completes the remaining SPEC-005 work after Path B (commit 9f01c53):

Target A residual -- delete validator node role:
- Remove ValidatorEnable flag (--validator / MORPH_NODE_VALIDATOR)
- Collapse cmd/node/main.go isValidator branch into a single linear
  assembly path; all non-mock nodes now run consensus + derivation
  simultaneously, sharing a single store + syncer instance
- Remove validator-el / validator_node containers from
  docker-compose-4nodes.yml and the docker-compose-reth.yml override
- Delete node/ops-morph/docker-compose-validator.yml

Target C -- delete blocktag, derivation drives safe/finalized heads:
- Delete node/blocktag/ package and BlockTagSafeConfirmations flag
- Add tag_advance.go: tagAdvancer with advanceSafe / advanceFinalized,
  finalized<=safe<=unsafe invariant assertion, L1-reorg reset hook,
  and dedup of redundant SetBlockTags calls
- Add finalizer.go: in-process subcomponent that polls L1 finalized
  -> Rollup.LastCommittedBatchIndex(@Finalized) and forwards to
  tagAdvancer.advanceFinalized; lifecycle joined to derivation.Stop
- Wire derivation main loop: advanceSafe after verifyBatchRoots passes;
  finalizer goroutine started in Start() and joined in Stop()
- Add 6 prometheus metrics (safe/finalized advance + l2 number gauges +
  l1 reorg reset + invariant violation counters)
- Add --derivation.finalizer-interval flag (default 30s)

Tests:
- 7 tag_advance behavioural tests (safe/finalized advance, dedup,
  monotonicity, invariant skip, L1 reorg reset, BlockNumber failure)
- 3 static-scan tests guarding against validator/blocktag residue and
  common/blob package drift in Path B

Note on SPEC-005 section 4.3 [must]: tagAdvancer.reset() infrastructure
is in place but is not yet invoked from production code; the trigger
(derivation main loop's L1 reorg detection) is deferred per
tech-design section 8. The blocktag-side independent polling path is
fully removed as required.

Refs: morph-l2/morph-specs SPEC-005 sections 4.1, 4.3

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@curryxbo curryxbo changed the title feat(derivation): SPEC-005 narrow scope — validator removal + Path A/B startup-time verification mode select feat(derivation): SPEC-005 — validator cleanup + Path A/B verification mode + derivation-driven tag management May 13, 2026
corey and others added 2 commits May 13, 2026 19:29
Add 9 unit-test cases covering SPEC-005 §5.1 entries that don't require
the full L1 / beacon / authclient mock stack:

- config_test.go (3): TestVerifyMode_DefaultIsPathA / _AcceptsPathB /
  _RejectsUnknown -- exercise validateAndDefaultVerifyMode and assert
  DefaultConfig().VerifyMode pins to pathA.
- verify_path_b_test.go (6): round-trip OK V1 + V2 (using empty L2
  blocks against the same encoder pipeline as the L1 oracle), versioned
  hash mismatch, local block missing, local block read error, three
  invalid-input branches (firstBlock=0, last<first, empty blobHashes).

Two small refactors enable the tests without constructing a Derivation:

- verify_path_b.go: extract pathBBlockReader interface and free
  function verifyPathBContent; the Derivation method becomes a 1-line
  delegate so behaviour and error messages are unchanged.
- config.go: extract validateAndDefaultVerifyMode from the inline
  switch in SetCliContext so the validation runs without a cli.Context.

Tests deferred to devnet (§5.2): TestPathA_*, TestTagAdvance_FinalizedFromL1,
TestTagAdvance_PathBVsPathAParity -- these require L1 client / beacon
client / Rollup contract / RetryableClient fakes that add ~300-400
lines of mock scaffolding for limited unit-test value above devnet
integration coverage.

Refs: morph-l2/morph-specs SPEC-005 §5.1

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Brings in 6 commits from base feat/multi_batch since the previous merge
(09a2c72) so PR #951 stops reporting CONFLICTING:

- 4be9cda update challenge handler dep
- c6f7cc2 add v0/v1 blob check
- 7ec11cf Merge branch 'feat/multi_batch' of github.com:morph-l2/morph
- fa9f4e8 fix submitter replay batch with config max blob count
- 1fe0d74 add rustc version desc
- fbd1bee chore: align go-ethereum submodule with origin/main

Single content conflict in common/batch/batch_cache.go: base added a new
sealEffectiveBlobCount function in the same hunk where SPEC-005 exported
BuildBlockContext (PR #935-derived rename). Resolved by keeping both:
sealEffectiveBlobCount stays where base placed it; the exported
BuildBlockContext (with SPEC-005 docstring) follows. base's lowercase
"buildBlockContext" comment was dropped (now stale -- we renamed it).

All other touched files auto-merged: contracts/Rollup.sol,
BlobVersionedHashLegacy.t.sol (added by base), prover/Cargo.{toml,lock},
prover/{,bin/challenge/}rust-toolchain (deleted by base),
tx-submitter/mock/rollup.go, common/batch/batch_restart_test.go.

go build ./common/batch/ ./common/blob/ ./node/... ./tx-submitter/...
./oracle/... -- clean.
go test ./node/derivation/... -- 22 tests PASS (incl. 9 added in
b9ad14e). common/batch tests hang on a pre-existing txSenderCacher
goroutine leak unrelated to this merge (reproducible on origin/feat/multi_batch
in isolation); not introduced here.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@curryxbo
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

CI golangci-lint flagged two categories on the SPEC-005 derivation files:

- gofmt: metrics.go, static_scan_test.go, tag_advance_test.go,
  verify_path_b.go -- alignment / spacing drift. Auto-fixed with
  gofmt -w; no semantic changes.
- misspell (US English): four "behaviour" -> "behavior" sites in
  config.go (added in b9ad14e), finalizer.go (a73639d),
  tag_advance.go (a73639d), and verify_path_b.go (b9ad14e).

go test ./node/derivation/... still passes 22 cases (incl. the 9 added
in b9ad14e). No symbols renamed; comments only.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
node/derivation/metrics.go (1)

20-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Run gofmt on this file to unblock CI.

The current pipeline is failing due to formatting in this file; apply gofmt and re-run checks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/derivation/metrics.go` around lines 20 - 42, The file's formatting is
out of sync with gofmt; run gofmt on the Metrics struct file (e.g., the Metrics
type and its fields like L1SyncHeight, PathBTriggered, SafeAdvanceTotal, etc.)
to normalize spacing and alignment, then re-add the formatted file to your
commit; you can run `gofmt -w node/derivation/metrics.go` (or `gofmt -w .` at
repo root) and re-run CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@node/cmd/node/main.go`:
- Around lines 122-129: The Syncer has unsynchronized concurrent reads/writes to
the uint64 field latestSynced (accessed by fetchL1Messages(), the background
goroutine started in Start(), and readers via LatestSynced()), causing a data
race. Fix by protecting all accesses to latestSynced: either add a sync.Mutex
(e.g., s.mu) and wrap every read/write in fetchL1Messages() and LatestSynced()
with s.mu.Lock()/Unlock(), or keep latestSynced as a plain uint64 and use
sync/atomic functions (atomic.LoadUint64/atomic.StoreUint64) in
fetchL1Messages(), Start(), and LatestSynced() so all reads and writes are
race-free. Ensure any logging that reads latestSynced uses the
protected/atomic read.
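The atomic option suggested in this comment can be sketched with Go 1.19's atomic.Uint64, which makes every access race-free without a mutex. The Syncer type below is an illustrative reduction, not the repository's struct:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Syncer sketches the suggested fix: latestSynced becomes an atomic.Uint64 so
// the fetch goroutine and LatestSynced() readers never race.
type Syncer struct {
	latestSynced atomic.Uint64
}

// fetchL1Messages stands in for the background writer started in Start().
func (s *Syncer) fetchL1Messages(height uint64) {
	s.latestSynced.Store(height)
}

// LatestSynced is the reader side; Load pairs with Store, so no lock needed.
func (s *Syncer) LatestSynced() uint64 {
	return s.latestSynced.Load()
}

func main() {
	s := &Syncer{}
	var wg sync.WaitGroup
	for i := 1; i <= 100; i++ {
		wg.Add(1)
		go func(h uint64) {
			defer wg.Done()
			s.fetchL1Messages(h) // concurrent writers
			_ = s.LatestSynced() // concurrent readers, race-free
		}(uint64(i))
	}
	wg.Wait()
	v := s.LatestSynced()
	fmt.Println(v >= 1 && v <= 100)
}
```

Running this under `go test -race` (or `go run -race`) reports no races, whereas the plain-field version would.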

---

Outside diff comments:
In `@node/derivation/metrics.go`:
- Around line 20-42: The file's formatting is out of sync with gofmt; run gofmt
on the Metrics struct file (e.g., the Metrics type and its fields like
L1SyncHeight, PathBTriggered, SafeAdvanceTotal, etc.) to normalize spacing and
alignment, then re-add the formatted file to your commit; you can run `gofmt -w
node/derivation/metrics.go` (or `gofmt -w .` at repo root) and re-run CI.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4791e23f-6353-4395-8d21-d6788540fd53

📥 Commits

Reviewing files that changed from the base of the PR and between 4be9cda and ffad1fc.

📒 Files selected for processing (35)
  • Makefile
  • common/batch/batch_cache.go
  • node/blocktag/config.go
  • node/blocktag/service.go
  • node/cmd/node/main.go
  • node/derivation/batch_info.go
  • node/derivation/config.go
  • node/derivation/config_test.go
  • node/derivation/derivation.go
  • node/derivation/finalizer.go
  • node/derivation/metrics.go
  • node/derivation/static_scan_test.go
  • node/derivation/tag_advance.go
  • node/derivation/tag_advance_test.go
  • node/derivation/verify.go
  • node/derivation/verify_path_b.go
  • node/derivation/verify_path_b_test.go
  • node/flags/flags.go
  • node/ops-morph/docker-compose-validator.yml
  • node/ops-morph/docker-compose.yml
  • node/ops-morph/testnet/docker-compose.yml
  • node/ops-morph/testnet/static-nodes.json
  • node/validator/config.go
  • node/validator/validator.go
  • node/validator/validator_test.go
  • ops/devnet-morph/devnet/__init__.py
  • ops/docker-sequencer-test/Dockerfile.l2-geth-test
  • ops/docker-sequencer-test/Dockerfile.l2-node-test
  • ops/docker-sequencer-test/docker-compose.override.yml
  • ops/docker-sequencer-test/run-test.sh
  • ops/docker-sequencer-test/scripts/tx-generator.sh
  • ops/docker/docker-compose-4nodes.yml
  • ops/docker/docker-compose-reth.yml
  • ops/docker/static-nodes.json
  • oracle/docker-compose.yml
💤 Files with no reviewable changes (6)
  • node/validator/config.go
  • node/validator/validator_test.go
  • node/validator/validator.go
  • node/blocktag/config.go
  • node/blocktag/service.go
  • node/ops-morph/docker-compose-validator.yml

Comment thread node/cmd/node/main.go
Comment on lines +122 to +129
// Reuse the shared syncer instance -- DevSequencer mode is the only path
// that pulls a syncer out of NewExecutor, so we hand back the same one
// rather than letting NewExecutor open a second store + syncer.
newSyncerFunc := func() (*sync.Syncer, error) { return syncer, nil }
executor, err = node.NewExecutor(newSyncerFunc, nodeConfig, pubKey)
if err != nil {
return err
}

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if sync.Syncer has internal locking or concurrency documentation

# Check for mutex/lock fields in the Syncer struct
rg -nP --type=go -A 20 'type Syncer struct' node/sync/

# Check for concurrent access documentation or warnings
rg -nP --type=go -C 5 'concurrent|thread.?safe|goroutine' node/sync/syncer.go

Repository: morph-l2/morph

Length of output: 1102


🏁 Script executed:

# Check how syncer is used in executor initialization
rg -nP --type=go -B 5 -A 10 'NewExecutor' node/cmd/node/main.go

# Check executor's use of syncer
fd -e go -p 'executor' | head -20

# Check derivation client initialization and syncer usage
rg -nP --type=go -B 3 -A 10 'derivation|Derivation' node/cmd/node/main.go | head -50

Repository: morph-l2/morph

Length of output: 2505


🏁 Script executed:

# Find where derivation is initialized with syncer
rg -nP --type=go 'derivation\.' node/cmd/node/main.go | head -30

# Check derivation initialization signature
fd -g 'derivation.go' | xargs rg -nP 'func New.*Derivation' -A 5

# Check if derivation stores and uses syncer
fd -e go -p 'derivation' -t f

# Check how executor and derivation are used (concurrent or sequential)
rg -nP --type=go 'go ' node/cmd/node/main.go | grep -E '(executor|derivation|dvNode)'

Repository: morph-l2/morph

Length of output: 1357


🏁 Script executed:

# Check if dvNode and executor run concurrently
rg -nP --type=go 'go dvNode|go executor|dvNode\.Start|executor\.' node/cmd/node/main.go

# Check syncer methods to see if they have internal synchronization
rg -nP --type=go 'func.*Syncer.*\(' node/sync/syncer.go | head -20

# Check Syncer's actual method implementations for locks/synchronization
rg -nP --type=go 'sync\.Mutex|sync\.RWMutex|\bLock\(\)|\bUnlock\(' node/sync/

# Look at how derivation uses syncer
rg -nP --type=go 'syncer\.' node/derivation/derivation.go | head -20

Repository: morph-l2/morph

Length of output: 678


🏁 Script executed:

# Check Syncer.Start() implementation to see if it uses goroutines
rg -nP --type=go -A 20 'func.*Start.*\{' node/sync/syncer.go | grep -A 20 'func (s \*Syncer) Start'

# Check fetchL1Messages for goroutines
rg -nP --type=go -A 15 'func.*fetchL1Messages' node/sync/syncer.go

# Check how latestSynced is modified
rg -nP --type=go 'latestSynced\s*=' node/sync/syncer.go

# Check executor's use of syncer
rg -nP --type=go -B 3 -A 10 'newSyncerFunc|syncer\.' node/core/ | head -40

Repository: morph-l2/morph

Length of output: 1322


🏁 Script executed:

# Check if executor actually uses syncer after initialization
rg -nP --type=go -B 5 -A 15 'type Executor' node/core/executor.go | head -50

# Search for syncer usage in executor
rg -nP --type=go 'syncer\.' node/core/executor.go | head -20

# Check what NewExecutor does with the syncer function
rg -nP --type=go -A 30 'func NewExecutor' node/core/executor.go | head -50

Repository: morph-l2/morph

Length of output: 1704


🏁 Script executed:

# Verify when syncer is started
rg -nP --type=go 'syncer.*Start|syncer.*New' node/cmd/node/main.go

# Check the full flow: when is syncer created vs when is dvNode started
sed -n '77,151p' node/cmd/node/main.go

# Check if derivation.Start() is async (goroutine)
rg -nP --type=go -B 2 -A 10 'func.*Derivation.*Start' node/derivation/derivation.go | head -30

Repository: morph-l2/morph

Length of output: 3244


🏁 Script executed:

# Check if executor calls newSyncerFunc or uses syncer
rg -nP --type=go 'newSyncerFunc|e\.syncer' node/core/executor.go

# Check the full Start method of Derivation to confirm syncer.Start() timing
rg -nP --type=go -A 30 'func.*Derivation.*Start' node/derivation/derivation.go

Repository: morph-l2/morph

Length of output: 875


🏁 Script executed:

# Check for atomic operations on latestSynced
rg -nP --type=go 'atomic\.' node/sync/syncer.go

# Check if there are any channels or other sync mechanisms
rg -nP --type=go 'chan|sync\.' node/sync/syncer.go | grep -v 'package\|import'

# Verify latestSynced access patterns
rg -nP --type=go 'latestSynced' node/sync/syncer.go

# Check derivation's use of syncer - does it call LatestSynced or ReadL1MessagesInRange?
rg -nP --type=go 's\.latestSynced|LatestSynced|ReadL1MessagesInRange' node/derivation/derivation.go

Repository: morph-l2/morph

Length of output: 733


🏁 Script executed:

# Check if there are any other synchronization mechanisms (channels, etc)
cat node/sync/syncer.go | head -100

# Check how ReadL1MessagesInRange accesses latestSynced
rg -nP --type=go -B 5 -A 10 'ReadL1MessagesInRange' node/sync/syncer.go

Repository: morph-l2/morph

Length of output: 3258


Syncer requires synchronization for concurrent access to the latestSynced field.

The sync.Syncer struct shares the latestSynced field (uint64) between the background polling goroutine spawned by Start() and external readers (e.g., LatestSynced()). The fetchL1Messages() method reads and writes this field without any mutex or atomic protection, creating a data race when the syncer is concurrently accessed by the derivation client.

The latestSynced field is read at line 131 (loop initialization) and lines 136-137 (logging), and written at line 167, all within fetchL1Messages() which runs in a background goroutine. This unsynchronized concurrent access can cause memory corruption or unpredictable behavior.

Add sync.Mutex to protect latestSynced or use sync/atomic for the uint64 operations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@node/cmd/node/main.go` around lines 122 - 129, The Syncer has unsynchronized
concurrent reads/writes to the uint64 field latestSynced (accessed by
fetchL1Messages(), the background goroutine started in Start(), and readers via
LatestSynced()), causing a data race; fix by protecting all accesses to
latestSynced: either add a sync.Mutex (e.g., s.mu) and wrap every read/write in
fetchL1Messages() and LatestSynced() with s.mu.Lock()/Unlock(), or keep
latestSynced a plain uint64 and use sync/atomic functions
(atomic.LoadUint64/atomic.StoreUint64) in fetchL1Messages(), Start() and
LatestSynced() so all reads/writes are atomic and race-free. Ensure you update
any logging that reads latestSynced to use the protected/atomic read.

corey and others added 9 commits May 13, 2026 20:20
The shared *Syncer is wired through node/cmd/node/main.go to two
independent start sites:

  Path 1: Derivation.Start spawns a goroutine that calls d.syncer.Start
          (node/derivation/derivation.go:166, since the very first commit).
  Path 2: Executor.updateSequencerSet calls e.syncer.Start (or `go
          e.syncer.Start`) on the !isSequencer -> isSequencer transition
          (node/core/sequencers.go:86,88). On a real Tendermint sequencer
          node, e.isSequencer starts at false (zero value), so the first
          updateSequencerSet call always fires this branch -- producing
          a second Start on the syncer Derivation already started.

The previous Start was not idempotent: it (a) re-ran the blocking
initial fetchL1Messages, (b) spawned a duplicate poller racing on
s.latestSynced and s.metrics, and (c) double-closed s.stop on shutdown
(panic: close of closed channel). On verifier-only nodes only Path 1
fires, masking the issue; the mock-sequencer devnet path avoids
Tendermint and also masks it; real Tendermint sequencer nodes hit the
race.

This bug predates SPEC-005 (Path 1 was added in 7f2f887, the initial
commit, 2024-01-12), but the SPEC-005 PR consolidates how derivation
drives shared services and is the natural place to retire the
duplicate-load path. Wrapping Start in s.startOnce.Do() keeps both call
sites correct without changing ownership semantics: the second caller
returns immediately, the first poller continues, shutdown closes
s.stop exactly once.

go build ./node/sync/ ./node/derivation/ ./node/core/ -- clean.
go test ./node/derivation/... -- 22 tests PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
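The startOnce pattern above can be sketched as follows. This is a minimal illustrative stand-in, not the real node/sync.Syncer: the field and method names (pollLoop, stopOnce) are assumptions; only the startOnce.Do wrapping is what the commit describes. A matching sync.Once on Stop guards the "close of closed channel" panic from the same double-call shape.

```go
package main

import (
	"fmt"
	"sync"
)

// Syncer is a toy stand-in for node/sync.Syncer.
type Syncer struct {
	startOnce sync.Once
	stopOnce  sync.Once
	stop      chan struct{}
}

func NewSyncer() *Syncer { return &Syncer{stop: make(chan struct{})} }

// Start is safe to call from both start sites (Derivation.Start and
// Executor.updateSequencerSet): only the first caller launches the
// poll loop; later calls return immediately.
func (s *Syncer) Start() {
	s.startOnce.Do(func() {
		go s.pollLoop()
	})
}

func (s *Syncer) pollLoop() {
	<-s.stop // placeholder for the real initial fetch + poll loop
}

// Stop closes the stop channel exactly once, so a mirrored double
// shutdown cannot panic with "close of closed channel".
func (s *Syncer) Stop() {
	s.stopOnce.Do(func() { close(s.stop) })
}

func main() {
	s := NewSyncer()
	s.Start()
	s.Start() // second start site: no-op
	s.Stop()
	s.Stop() // double shutdown: no panic
	fmt.Println("ok")
}
```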
QA hit a "Path B silently stops" symptom on hoodi against an old snapshot:
local L2 latest 4,470,254 lagged the L1-committed batch's required
range 5,279,569-5,279,890 by ~800k blocks, but the only visible log
was `path B fetched batch metadata`; no error surfaced for tens of
minutes.

Root cause: retryableError() classified every error except a literal
"discontinuous block number" string as retryable, so RetryableClient
kept exponentially backing off ethereum.NotFound for the full 30-minute
MaxElapsedTime budget. SPEC-005 Path B is the first caller to read L2
blocks the local node may not yet have sealed; older callers
(AssembleL2Block / NewSafeL2Block / sequencer paths) only ever read
known-existing blocks, masking the issue.

Treat ethereum.NotFound as permanent so it escapes the backoff loop on
the first attempt:

- retryableError() short-circuits on errors.Is(err, ethereum.NotFound)
  (handles fmt.Errorf wrapping too).
- HeaderByNumber / BlockByNumber log Info on retryable failures (still
  transient chatter) and Error on the non-retryable escape path so the
  signal is visible even when the caller layer's logging is filtered.

Net effect for the QA scenario: BlockByNumber returns NotFound to
verify_path_b on the first attempt; verify_path_b returns
"path B: read local block N failed: not found"; derivation.go logs
"path B content verification failed" Error and the next pollInterval
re-evaluates. The operator immediately sees the local-height gap
instead of staring at a silent log.

Adds node/types/retryable_client_test.go covering NotFound (direct +
wrapped), DiscontinuousBlockError, and generic transient errors.

go build ./node/types/ ./node/derivation/ -- clean.
go test ./node/types/... ./node/derivation/... -count=1 -- PASS
(3 new in types, 22 in derivation).

Refs: morph-l2/run-morph-node#92 testing report (2026-05-14).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
When SPEC-005 Path B verification fails — particularly with a versioned
hash mismatch — operators previously had to grep across logs and hand-
fetch L1 calldata to reconstruct the batch shape that produced the
mismatch. Add a single structured Error log at every Path B fail point
so the relevant fields are present without re-running anything:

  kind: invalid_block_range / empty_blob_hashes / local_block_missing
        / local_block_read_error / parsing_txs_error / compress_error
        / sidecar_build_error / blob_count_mismatch / versioned_hash_mismatch
  always: batchIndex, version, firstBlock, lastBlock,
          parentTotalL1Popped, expectedBlobs
  per-site: blockNumber (block-level errors); encoding, payloadLen,
            compressedLen, rebuiltBlobs, rebuiltHashes,
            expectedHashes, mismatchIndex (encoding / hash errors)

The pathBFail helper centralises the log + metric increment + error
wrap so call sites stay one-liners, and the existing
"path B fetched batch metadata" entry log is enriched with batchIndex,
version, parentTotalL1Popped, expectedBlobs so an operator can spot
abnormal entry conditions without waiting for a failure.

New metric:
  derivation_path_b_failed_by_kind_total{kind="..."}
incremented alongside the unlabelled path_b_failed_total via
IncPathBFailedKind so dashboards can split failures by category.

Cost: zero on the success path; the diagnostic computation (slice
lengths, hex CSV of <= 6 hashes) only runs at fail points.

go test ./node/derivation/... -- 22 cases PASS.

Refs: morph-l2/run-morph-node#92 (operator request: don't make us
hand-roll a one-shot script every time Path B fails).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ant, not Version byte

QA hit a versioned hash mismatch on hoodi (commit txHash 0x763f5f76...,
batchIndex 17367) with rebuilt=0x015a6d... vs expected=0x018577... (both
valid EIP-4844 hashes, just different bytes). Tracing the sequencer
side end to end shows Path B's V1/V2 dispatch is keyed on the wrong
field:

Sequencer's actual encoding decision (chain of code):

  1. tx-submitter passes isBatchUpgraded=nil to NewBatchCache
     (tx-submitter/services/rollup.go:127-138).
  2. NewBatchCache defaults nil to `func(uint64) bool { return true }`
     (common/batch/batch_cache.go:102-104), so isBatchUpgraded is
     effectively always true for any live sequencer.
  3. handleBatchSealing always enters the V2 branch first and uses
     TxsPayloadV2 whenever the compressed result fits in sealBlobCap
     (batch_cache.go:787-829). The V1 fallback only triggers when V2
     overflows AND isBatchV2Upgraded(ts) is still false, which is rare
     for normal-sized batches.
  4. createBatchHeader stamps the version byte from isBatchV2Upgraded
     alone (batch_cache.go:918-934): before that governance flag flips,
     version=1 even when the payload is V2-encoded.
  5. The new commitBatch ABI (rollup.go:1128-1136) does not carry
     BlockContexts in calldata, so the blob payload MUST be V2-encoded
     for the chain history to be reconstructable.

Path A already keys off `batch.BlockContexts != nil` from calldata
(batch_info.go::ParseBatch), which is the correct discriminator. Path B
keyed off `batchInfo.version >= 2`, treating every version=1 batch as
V1-encoded — exactly the failure surfaced by QA on hoodi during the
V1->V2 transition window.

Fix:

- BatchInfo gains hasCalldataBlockContexts, set in
  ParseBatchMetadataOnly to len(batch.BlockContexts) > 0. (Field doc
  spells out why version byte is wrong here.)
- verifyPathBContent dispatches on hasCalldataBlockContexts:
    true  -> TxsPayload   (legacy ABI: blob = txs only)
    false -> TxsPayloadV2 (new ABI:    blob = blockContexts || txs)
  The previous `version >= 2` branch is gone.
- pathBFail structured log adds hasCalldataBlockContexts so future
  diagnoses see the dispatch input directly.

Tests:

- Renamed RoundTripOK_V1/V2 to RoundTripOK_LegacyABI/NewABI and
  switched the oracle's parameter from `version` to `useV2Encoding`.
- Added TestPathB_VersionByte1_NewABI_UsesV2Encoding as a direct
  regression for the QA case (version=1 + new ABI -> blob=V2). This
  test fails on the prior dispatch and passes on the fix.

go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS (was 22; +1 regression).

Refs: morph-l2/run-morph-node#92 (hoodi hash mismatch report,
2026-05-15 09:38).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ap, not Rollup.BatchDataStore

QA observed finalized stuck on hoodi while safe kept advancing. Cast
queries against the rollup contract showed:

  lastCommittedBatchIndex@L1Finalized = 17797
  batchDataStore(17797).blockNumber  = 5418036
  batchDataStore(17796) = 0,0,0,0
  batchDataStore(17389) = 0,0,0,0

The contract intentionally clears BatchDataStore storage for older
batches as part of its on-chain GC; only the latest committed batch's
record stays populated. The finalizer's
`Rollup.BatchDataStore(candidate)` lookup therefore returns zero for
any candidate that isn't the very newest, the existing zero-guard
skips advancement, and finalized never moves while node logs
`finalizer: batch has zero lastL2Block; skipping`.

The discriminator-source fix: tagAdvancer is the right place to hold
the (batchIndex -> header) mapping because advanceSafe is already
called once per verified batch with the header in hand. Move the
lookup off-chain entirely:

- tagAdvancer gains verifiedBatches map[uint64]*eth.Header, populated
  inside advanceSafe alongside safeMaxBatchIndex.
- New LookupVerifiedBatchHeader(batchIndex) replaces
  finalizer.lookupBatchLastL2Block + the contract call.
- advanceFinalized evicts map entries <= the new finalized index,
  keeping the map bounded by the steady-state safe-vs-finalized lag.
- reset (L1 reorg) clears the map: pre-reorg entries aren't
  authoritative against the new L1 view; derivation refills naturally
  as it walks the rewound cursor.
- finalizer.tick: 4 RPC calls -> 2 (drop BatchDataStore +
  HeaderByNumber); the L2 client / lookupBatchLastL2Block helper /
  zero-BlockNumber defensive guard are gone since none are reachable
  anymore. newFinalizer no longer takes l2Client.

Restart behavior is unchanged: map starts empty; first finalizer ticks
log
  finalizer: verified batch header not found in local map; will retry
until derivation has re-verified up to a candidate that intersects the
new map. Same outcome, clearer signal -- and it doesn't depend on
contract state retention.

Tests:
- TestTagAdvance_VerifiedBatchLookup: roundtrip after advanceSafe.
- TestTagAdvance_VerifiedBatchEvictedOnFinalize: entries <= finalized
  are dropped, entries > finalized retained.
- TestTagAdvance_VerifiedBatchClearedOnReset: L1 reorg wipes the map.

Spec impact: tech-design.md §4.7.4's finalizer description still names
"Rollup.BatchDataStore" as the lookup source. That sentence needs an
update in morph-specs to "tagAdvancer's local verifiedBatches map";
not blocking the implementation PR.

go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 26 cases PASS (was 23; +3 lookup/eviction/reset).

Refs: morph-l2/run-morph-node#92 (hoodi: finalized stuck while safe
advances; finalizer: batch has zero lastL2Block; skipping batchIndex=17394).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…k@finalizedBlock), not BatchDataStore reverse-lookup

QA's finalized-stuck observation on hoodi traced to two issues:

  1. Rollup.batchDataStore is on-chain GC'd (Rollup.sol:665 deletes
     batchDataStore[_batchIndex - 1] inside finalizeBatch), so the
     previous design's `min(committedAtFin, safeMaxBatchIndex)` candidate
     would frequently land on an older batchIndex whose storage slot
     had been cleared, returning zero and triggering the defensive skip.
  2. The previous fix (verifiedBatches map in tagAdvancer) compensated
     by holding a local batchIndex -> header map. That worked but was
     more machinery than needed: the contract retains
     batchDataStore[committedAtFin] at any given L1 block, because
     committedAtFin >= lastFinalizedBatchIndex@thatBlock, which is
     above the GC threshold at that block's state.

The cleaner fix is to pin both rollup queries to the L1 finalized block
and operate on L2 block NUMBERS, not batchIndex round-trips:

  L1FinalizedLastBlock = batchDataStore[committedAtFin]@finalizedBlock.blockNumber
  finalized.blockNumber = min(localSafe.number, L1FinalizedLastBlock)

For the common case (L1FinalizedLastBlock >= localSafe.number, default
Confirmations=finalized derivation steady-state) we anchor finalized to
local safe directly: hash + number are already in tagAdvancer memory.
For the other case (operator set Confirmations < finalized so derivation
ran ahead of L1 finalized) we anchor to L1FinalizedLastBlock and pull
the L2 hash via l2Client.HeaderByNumber (the block is local because
L1FinalizedLastBlock < localSafe.number and we verified up to localSafe).

Plus a defensive canonicality check: before advancing finalized, re-read
HeaderByNumber(safeNumber) against the L2 client and require the hash
to still equal tagAdvancer.safeL2Hash. On mismatch (L2 client state
divergence; or, future, an L1 reorg whose detection hasn't yet
re-synced the tag advancer) we skip the advance and reset tagAdvancer
to force re-verification rather than finalizing a stale safe.

Reverts the verifiedBatches map / LookupVerifiedBatchHeader / per-finalize
eviction added in c20983d -- not needed once we stop reverse-looking
batchIndex against the contract.

Changes:

- tagAdvancer.advanceFinalized signature: (ctx, batchIndex, *eth.Header)
  -> (ctx, batchIndex, hash, number). The "anchor to local safe" branch
  has hash + number directly without fabricating a synthetic header.
- tagAdvancer.Safe() new getter returns (safeL2Hash, safeL2Number) under
  mutex for atomic read by finalizer.
- finalizer.tick rewritten: 1 L1 RPC + 2 L1 contract calls (both pinned
  to L1 finalized) + 1 L2 RPC for the canonicality check, plus a second
  L2 RPC only for the rare safeNum > L1FinalizedLastBlock branch.
- finalizer struct keeps l2Client (needed for canonicality check + the
  rare-branch header fetch); newFinalizer signature unchanged from
  pre-c20983d4 era (l2Client back in).
- BatchDataStore zero-blockNumber defensive guard remains as a sanity
  fallback even though it should never fire under the pinned-query
  design (committed at finalized always > GC threshold at that block).
- Drops the 3 verifiedBatches lookup/eviction/reset tests; replaces
  with a single TestTagAdvance_SafeGetter covering the new snapshot.

Spec impact: tech-design.md §4.7.4's lookup phrasing changes from "look
up batch lastL2Block via Rollup.BatchDataStore" to "compare local safe
number against L1FinalizedLastBlock derived from the latest committed
batch at L1 finalized; anchor finalized to whichever is smaller". I'll
update morph-specs in a follow-up doc PR (c20983d's commit message
already promised this update; the new phrasing replaces it).

go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS (was 26 with
verifiedBatches tests; -3 dropped, +0 net since we replaced all 3
with TestTagAdvance_SafeGetter and the old finalizer/lookup tests
covered the same code paths).

Refs: morph-l2/run-morph-node#92 (hoodi: finalized stuck while safe
advances; node.log "finalizer: batch has zero lastL2Block; skipping
batchIndex=17394"; cast batchDataStore(17389) = 0,0,0,0,
batchDataStore(17797).blockNumber = 5418036).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
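The anchoring rule above reduces to a small decision function; a minimal sketch, assuming the caller has already read localSafe from tagAdvancer and L1FinalizedLastBlock from batchDataStore[committedAtFin] pinned at the L1 finalized block (the function name and return shape are illustrative):

```go
package main

import "fmt"

// chooseFinalizedAnchor implements
//   finalized.blockNumber = min(localSafe.number, L1FinalizedLastBlock)
// and reports whether the common fast path applies: anchoring to local
// safe, whose hash + number are already in tagAdvancer memory.
func chooseFinalizedAnchor(localSafeNum, l1FinalizedLastBlock uint64) (anchor uint64, useLocalSafe bool) {
	if l1FinalizedLastBlock >= localSafeNum {
		// Steady state with Confirmations=finalized: L1 finalized is at
		// or past everything verified locally.
		return localSafeNum, true
	}
	// Derivation ran ahead of L1 finalized (Confirmations < finalized):
	// anchor to the L1 side; the caller pulls the L2 hash via
	// HeaderByNumber, which is local because it is below localSafe.
	return l1FinalizedLastBlock, false
}

func main() {
	fmt.Println(chooseFinalizedAnchor(5418000, 5418036)) // common case: anchor to local safe
	fmt.Println(chooseFinalizedAnchor(5418100, 5418036)) // ran-ahead case: anchor to L1
}
```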
QA's hoodi observation makes the safe canonicality story incomplete
without L1 reorg detection: when --derivation.confirmations is
configured below finalized (a future operational option once the
default ships), an L1 reorg can invalidate already-saved L1 commit
batches and leave localSafe pointing at orphaned-from-L1 state. The
finalizer's defensive HeaderByNumber check (bbc25f7) catches the L2
side; reorg detection catches the L1 side.

This brings detectReorg / handleL1Reorg / recordL1Blocks into the
SPEC-005 implementation PR (originally drafted in the now-closed
[DEFER] PR #950). Departures from #950's draft:

  - Always on. #950 gated detection on `confirmations != finalized`;
    here the scan runs unconditionally. With confirmations=finalized
    the fast path is one HeaderByNumber call per poll that
    deterministically returns "no reorg" -- a uniform code path is
    simpler than a gate that becomes dead code the moment the
    confirmations default is relaxed in a follow-up.
  - L2 chain rollback is intentionally NOT included. verifyBlockContext
    + halted state machine + rollbackLocalChain are SPEC-005 §3
    non-goals; if a reorg replaces a committed batch with structurally
    different blocks, verifyBatchRoots will catch the divergence on
    re-derivation and log + halt at that batch (operator intervention),
    rather than auto-rolling-back L2 head.
  - On reorg detection, also reset tagAdvancer (clears safeL2Hash,
    rolls safeMaxBatchIndex back by one) so the finalizer waits for
    derivation to re-verify before advancing again.

Components:

- node/db/keys.go: new derivationL1BlockPrefix + DerivationL1BlockKey.
- node/db/store.go: DerivationL1Block{Number, Hash} struct + 4
  methods (Write/Read/ReadRange/DeleteFrom). RLP-encoded.
- node/derivation/database.go: Reader/Writer interface gains
  ReadDerivationL1BlockRange / WriteDerivationL1Block /
  DeleteDerivationL1BlocksFrom.
- node/derivation/reorg.go: detectReorg (fast path + slow scan),
  handleL1Reorg (cursor rewind + DB cleanup + tagAdvancer.reset),
  recordL1Blocks (post-poll persist).
- node/derivation/config.go: ReorgCheckDepth uint64, default 64.
- node/flags/flags.go: --derivation.reorg-check-depth flag, env
  DERIVATION_REORG_CHECK_DEPTH.
- node/derivation/derivation.go: derivationBlock now begins with
  detectReorg → handleL1Reorg path; ends with recordL1Blocks before
  cursor advance. Recording failure aborts the cursor advance to
  avoid permanent gaps in the L1 hash record.

Cost in the common case (confirmations=finalized): 1 extra L1
HeaderByNumber per poll (fast-path check against newest saved hash)
plus one Write per L1 block scanned. Sub-millisecond and bounded by
fetchBlockRange (default 100).

Spec impact: SPEC-005 §3 non-goals previously listed "L1 reorg
detection" under deferred work. With this commit, the detection
itself moves into scope; the L2 rollback executor / halted state
machine remain non-goals. tech-design.md §4.7.6 phrasing in
morph-specs needs an update; will be done in a follow-up doc PR.

go build ./node/db/ ./node/derivation/ -- clean.
go test ./node/derivation/... ./node/db/... -count=1 -- PASS
(23 derivation cases + db tests; reorg.go inherits coverage from
#950's field-tested implementation, plus the existing
TestNoBlocktagReferences static guard that asserts no stale
references slip in).

Refs: morph-l2/morph-specs SPEC-005 §4.7.6;
#950 (closed; scope folded back in here).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
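The fast path / slow scan split in detectReorg can be sketched as below. This is an illustrative shape, not the real node/derivation/reorg.go: savedBlock stands in for the persisted DerivationL1Block{Number, Hash} records, and fetchHash stands in for an L1 HeaderByNumber call.

```go
package main

import "fmt"

// savedBlock mirrors the per-block (number, hash) record persisted by
// recordL1Blocks after each poll.
type savedBlock struct {
	Number uint64
	Hash   string
}

// detectReorg compares saved hashes newest-first against the live L1
// view. Fast path: the newest saved hash still matches, so one
// HeaderByNumber per poll deterministically answers "no reorg". Slow
// scan: walk down to the newest still-canonical block and report it as
// the rewind target for handleL1Reorg.
func detectReorg(saved []savedBlock, fetchHash func(uint64) string) (rewindTo uint64, reorged bool) {
	for i := len(saved) - 1; i >= 0; i-- {
		if fetchHash(saved[i].Number) == saved[i].Hash {
			if i == len(saved)-1 {
				return 0, false // fast path: no reorg
			}
			return saved[i].Number, true
		}
	}
	return 0, true // everything within the check depth was orphaned
}

func main() {
	saved := []savedBlock{{100, "a"}, {101, "b"}, {102, "c"}}
	live := map[uint64]string{100: "a", 101: "b", 102: "c'"} // 102 reorged
	fmt.Println(detectReorg(saved, func(n uint64) string { return live[n] }))
}
```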
Now that L1 reorg detection is wired up (36b7980), the derivation
default can move off the previous finalized-only stance. L2 safe head
visibility ~halves the previous lag:

  L1 finalized  ~13 min lag (2 epochs)
  L1 safe       ~ 6 min lag (1 epoch, justified)

The trade-off: L1 safe blocks are theoretically reorg-able if a Casper
FFG slashing condition fires (extremely rare in practice; would imply a
1/3+ stake double-vote). detectReorg / handleL1Reorg / recordL1Blocks
introduced in the prior commit catch any such reorg, rewind the
derivation cursor, and reset the tag advancer so the finalizer's
canonicality check (bbc25f7) doesn't advance over an orphaned safe.

This brings morph in line with how OP-stack / Arbitrum Nitro / Scroll
treat L1 reads -- read fresh-ish with reorg awareness, rather than
waiting two epochs at the cost of latency. SPEC-005's tech-design
§4.7.6 already anticipated the dependency between reorg detection and
relaxing confirmations; with both pieces now in this PR the dependency
is satisfied.

Operators who want the previous finalized-only behaviour can still set
either of the existing override flags:
  --derivation.confirmations=-3
or env DERIVATION_CONFIRMATIONS=-3 (rpc.FinalizedBlockNumber). The
sync (l1 message ingestion) default is intentionally NOT changed in
this commit; it remains finalized.

go build ./node/derivation/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS.

Spec impact: tech-design.md §0 / §1 / §6 / §7 currently imply
finalized as the read level; the description should be updated to
"safe by default with reorg detection; finalized via override".
Follow-up doc PR.

Refs: morph-l2/morph-specs SPEC-005 §4.7.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…set for canonicality recovery

Two related cleanups:

1. Drop the dedicated finalizer goroutine. The finalizer was originally
   split off "so L1-finalized polling does not block the derivation main
   loop's batch verification cadence", but the main loop already does
   slower L1 RPCs (eth_getLogs / eth_getTransactionByHash) every poll, so
   the separate goroutine bought nothing while introducing cross-goroutine
   writes (cursor / tagAdvancer) that the canonicality recovery path then
   had to dance around.

   Now the finalizer step is a method on Derivation (finalizerTick) called
   inline at the end of each derivationBlock iteration. Single goroutine =
   no race between handleL1Reorg and finalizerTick on the L1 cursor.

2. Extract rewindAndReset(rewindToL1Height) from handleL1Reorg's body and
   reuse it from finalizerTick's canonicality-failure path. Both
   recoveries (L1 reorg vs local L2 safe-block hash divergence) collapse
   to op-stack-style "reset to a known good parent and re-derive forward":
   rewind cursor, clear saved L1 hashes, reset tagAdvancer; the next poll
   re-fetches L1 commit batch logs, re-runs Path A or Path B verification,
   and re-populates safe via advanceSafe. Persistent breakage surfaces as
   verifyBatchRoots failure on re-derivation, transient state-client
   weirdness self-heals.

   handleL1Reorg becomes a thin wrapper around rewindAndReset for call-site
   readability.

Also removes the now-redundant --derivation.finalizer-interval flag (and
DefaultFinalizerInterval / FinalizerInterval config field). The finalizer
runs at the main loop's pollInterval cadence; if rate-limiting becomes a
concern later we can reintroduce a "run finalizer every N polls" knob,
but that's premature optimisation for now.

Spec impact: tech-design.md §4.7.4 (finalizer goroutine) and §4.7.8
(finalizer-interval flag) need a follow-up doc PR; will land in
morph-specs alongside this commit.

Net diff:
  derivation.go      +/-  Start no longer spawns finalizer goroutine; Stop
                          no longer joins finalizer.stopped; struct loses
                          finalizer + finalizerInterval fields.
  finalizer.go       -finalizer struct + run() + stopped channel + ticker;
                     +(d *Derivation) finalizerTick() method that calls
                     rewindAndReset on canonicality fail.
  reorg.go           +rewindAndReset(rewindToL1Height); handleL1Reorg now
                     a 4-line wrapper.
  config.go / flags  -DefaultFinalizerInterval / FinalizerInterval field /
                     DerivationFinalizerInterval flag + Flags entry.

go build ./node/derivation/ ./node/flags/ -- clean.
go test ./node/derivation/... -count=1 -- 23 cases PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
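The shared-recovery shape of the refactor can be sketched as below. The Derivation fields here are toy stand-ins (the real struct holds an L1 cursor, the saved DerivationL1Block records, and tagAdvancer state); the point is that both recovery paths collapse onto one rewindAndReset helper, all running on the single main-loop goroutine so there is no cross-goroutine race to guard.

```go
package main

import "fmt"

// Derivation is a toy stand-in for node/derivation.Derivation.
type Derivation struct {
	l1Cursor    uint64
	savedHashes map[uint64]string // stand-in for persisted L1 block hashes
	safeHash    string            // stand-in for tagAdvancer safe state
}

// rewindAndReset rewinds the L1 cursor, drops saved L1 hashes above the
// rewind height, and resets tag-advancer state; the next poll re-fetches
// commit logs and re-derives forward from a known-good parent.
func (d *Derivation) rewindAndReset(rewindToL1Height uint64) {
	d.l1Cursor = rewindToL1Height
	for n := range d.savedHashes {
		if n > rewindToL1Height {
			delete(d.savedHashes, n)
		}
	}
	d.safeHash = "" // force re-verification before finalized advances again
}

// handleL1Reorg stays a thin wrapper for call-site readability; the
// finalizer's canonicality-failure path calls rewindAndReset directly.
func (d *Derivation) handleL1Reorg(rewindTo uint64) { d.rewindAndReset(rewindTo) }

func main() {
	d := &Derivation{
		l1Cursor:    102,
		savedHashes: map[uint64]string{100: "a", 101: "b", 102: "c"},
		safeHash:    "s",
	}
	d.handleL1Reorg(101)
	fmt.Println(d.l1Cursor, len(d.savedHashes), d.safeHash == "")
}
```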