Collator stability fixes #214

Merged
bvscd merged 5 commits into release/node/v0.9.0 from collator on Jun 14, 2026
Conversation

bvscd (Collaborator) commented Jun 12, 2026

No description provided.

Copilot AI review requested due to automatic review settings June 12, 2026 09:56
@bvscd bvscd changed the base branch from master to release/node/v0.9.0 June 12, 2026 09:56

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses multiple collator/validator stability and parity issues across the VM, block primitives, Simplex consensus (restart recovery + invariant hardening + repair validation), and ADNL transport (RLDP/QUIC resiliency + metrics).

Changes:

  • Fix VM CDEPTH to report represented depth for Merkle-proof pruned-branch cells and add coverage.
  • Harden validator/collator and Simplex restart/recovery paths (skip/final cert replay, skipscan non-panicking fallbacks, pruned dispatch-queue handling) and tighten requestCandidate repair validation/merging.
  • Improve ADNL RLDP/QUIC robustness (worker pool, reconnect regression tests, size caps, counters exported via metrics) and update versions/docs/changelog.
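The file summary for `src/adnl/src/quic/stat.rs` describes event counters with per-dump windows alongside cumulative counters. A minimal sketch of that idea follows, using only standard-library atomics. `EventCounter` and its methods are hypothetical names for illustration and are not taken from the PR; the actual change exports cumulative values through the `metrics` crate, which this sketch does not attempt to model.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter (not the type from the PR): tracks events both
/// per dump window and as a monotonic total.
#[derive(Default)]
pub struct EventCounter {
    /// Events recorded since the last dump; reset by `take_window`.
    window: AtomicU64,
    /// Events recorded since creation; never reset.
    cumulative: AtomicU64,
}

impl EventCounter {
    /// Record `n` events in both the current window and the total.
    pub fn record(&self, n: u64) {
        self.window.fetch_add(n, Ordering::Relaxed);
        self.cumulative.fetch_add(n, Ordering::Relaxed);
    }

    /// Return the current window's count and start a new window.
    pub fn take_window(&self) -> u64 {
        self.window.swap(0, Ordering::Relaxed)
    }

    /// Return the cumulative count (the value a Prometheus-style
    /// counter would expose).
    pub fn total(&self) -> u64 {
        self.cumulative.load(Ordering::Relaxed)
    }
}
```

Keeping the two values separate means a periodic log dump can reset its window without making the exported counter go backwards.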

Reviewed changes

Copilot reviewed 38 out of 61 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/vm/tests/test_cell_serialization.rs | Adds regression test for CDEPTH on pruned-branch cells in Merkle proofs. |
| src/vm/src/executor/serialization.rs | Changes CDEPTH to always use stored/represented depth (fixes pruned-branch behavior). |
| src/node/src/validator/validator_group.rs | Alters candidate-observation handling and resolver-cache updates in validator group. |
| src/node/src/validator/validate_query.rs | Treats pruned dispatch-queue access as a “have unprocessed queue” condition instead of hard error. |
| src/node/src/validator/tests/test_validator_group.rs | Updates test to feed a real deserializable block body consistent with observed block id. |
| src/node/src/validator/tests/test_collator.rs | Adds bundle-driven regression tests for pruned dispatch queue / removed cells in proof. |
| src/node/src/validator/accept_block.rs | Gates top-shard-descr promotion on signature finality for Simplex. |
| src/node/src/tests/test_signature.rs | Adds tests for top-shard-descr promotion rules (ordinary vs final/notarized Simplex). |
| src/node/src/tests/static/0.e000000000000000_67704975_6cea4ad1_collator_test_bundle/index.json | Adds a new static collator test bundle index. |
| src/node/src/tests/static/0.8000000000000000_76770988_db9fc78e_collator_test_bundle/index.json | Adds a new static collator test bundle index. |
| src/node/simplex/src/tests/test_simplex_state.rs | Adds extensive regression coverage for restart parity, blocker recovery, skipscan fallback semantics. |
| src/node/simplex/src/tests/test_restart.rs | Extends restart recovery tests for final/skip cert replay ordering and listener hooks. |
| src/node/simplex/src/tests/test_receiver.rs | Improves receiver tests to use valid leader signatures and quorum notar bytes for repair paths. |
| src/node/simplex/src/tests/test_candidate_resolver.rs | Adds tests for merge semantics and requestCandidate response validation (identity + notar signatures). |
| src/node/simplex/src/startup_recovery.rs | Adds startup replay mode + restores persisted SkipCert/FinalCert before restart skip generation. |
| src/node/simplex/src/simplex_state.rs | Moves diagnostics types, adds startup replay gating, makes skipscan non-panicking with bounded fallback. |
| src/node/simplex/src/receiver.rs | Validates repair responses before merge/cache; preserves C++-like partial merge semantics safely. |
| src/node/simplex/README.md | Updates version/semantics notes and documents new stability/async DB/restart/observer behaviors. |
| src/node/simplex/CHANGELOG.md | Adds 0.7.1 release notes capturing the parity/stability work. |
| src/node/simplex/Cargo.toml | Bumps simplex crate version to 0.7.1. |
| src/node/consensus-common/src/tests/test_async_key_value_storage.rs | Updates tests to assert the new typed “already taken” sentinel behavior. |
| src/node/consensus-common/src/lib.rs | Introduces StorageResultAlreadyTaken typed sentinel and documents downcast-based detection. |
| src/node/consensus-common/src/async_key_value_storage.rs | Emits StorageResultAlreadyTaken instead of stringly-typed errors on taken results. |
| src/node/Cargo.toml | Bumps node crate version to 0.9.0. |
| src/Cargo.lock | Updates lockfile for version bumps and added deps (e.g., metrics). |
| src/block/src/storage_stat.rs | Ensures removed cells are marked visited in UsageTree so proofs include removals. |
| src/block/src/dictionary/mod.rs | Hardens dictionary label reading/iteration against pruned cells; propagates pruned access as PrunedCellAccess. |
| src/block/src/dictionary/hashmapaug.rs | Adds/notes diff-scanning helper doc comment in macro-generated API. |
| src/adnl/tests/test_quic.rs | Adds reconnect regression test and updates imports/utilization. |
| src/adnl/src/rldp/send.rs | Adds SendWorkerPool and refactors send paths/state to reduce coupling and improve pacing. |
| src/adnl/src/rldp/recv.rs | Adjusts RLDP total-size cap to account for TL overhead. |
| src/adnl/src/rldp/mod.rs | Refactors outbound job scheduling/execution and aligns closed-transfer behavior; tracks extended caps separately. |
| src/adnl/src/quic/stat.rs | Introduces event counters with per-dump windows + cumulative Prometheus counters via metrics. |
| src/adnl/src/quic/mod.rs | Uses new counters, pre-registers metrics, and adjusts sender-task connect-failure/flush/retry behavior. |
| src/adnl/src/adnl/node.rs | Adds periodic yields in heavy/expired/non-channeled paths to reduce worker-thread monopolization. |
| src/adnl/Cargo.toml | Adds metrics dependency for QUIC transport metrics. |
| src/adnl/benches/bench_rldp.rs | Tweaks debug loss simulation and timeout to use probabilistic drop. |
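The `consensus-common` rows describe replacing a string-matched error with a typed `StorageResultAlreadyTaken` sentinel that callers detect by downcasting. The sketch below illustrates that general pattern with the standard library only. The name `StorageResultAlreadyTaken` comes from the summary, but its definition here, the `ResultSlot` type, and the `is_already_taken` helper are assumptions for illustration; the real crate may use a different error container.

```rust
use std::error::Error;
use std::fmt;

/// Typed sentinel: a zero-sized error whose identity, not its message,
/// tells callers that the result was already consumed.
/// (Assumed shape; the PR's actual definition is not shown.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StorageResultAlreadyTaken;

impl fmt::Display for StorageResultAlreadyTaken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("storage result already taken")
    }
}

impl Error for StorageResultAlreadyTaken {}

type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical one-shot slot standing in for an async storage result.
pub struct ResultSlot {
    value: Option<Vec<u8>>,
}

impl ResultSlot {
    pub fn new(value: Vec<u8>) -> Self {
        Self { value: Some(value) }
    }

    /// The first call yields the value; later calls yield the sentinel.
    pub fn take(&mut self) -> Result<Vec<u8>, BoxError> {
        self.value
            .take()
            .ok_or_else(|| StorageResultAlreadyTaken.into())
    }
}

/// Detect the sentinel by type instead of comparing error strings.
pub fn is_already_taken(err: &(dyn Error + Send + Sync + 'static)) -> bool {
    err.downcast_ref::<StorageResultAlreadyTaken>().is_some()
}
```

Detecting the condition by type keeps callers working if the error message is reworded, which a string comparison would not survive.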

Comment thread src/node/src/validator/validator_group.rs Outdated
Comment thread src/node/src/validator/validator_group.rs
Comment thread src/node/simplex/src/receiver.rs
mnogoborec and others added 2 commits June 12, 2026 20:03
…ion recovery

Address Copilot review on #214:
- receiver: reject empty / zero-weight validator set in
  validate_repair_notar_signature_set before the threshold_66 quorum gate,
  since threshold_66(0) == 0 would otherwise accept a signature-less notar
  response.
- validator_group: on candidate-body deserialize failure, record the
  observation flags-only (block = None) instead of dropping it, so flag
  updates can OR-merge and a later valid body can overwrite the entry
  instead of stranding resolver waiters.
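
The validator_group behavior described in the second item can be sketched as a small cache keyed by block id. Everything here (`Observation`, `ObservationCache`, the `u64` key, the `u32` flags, the byte-vector body) is a hypothetical simplification, not the PR's types. The sketch only shows the two merge rules named in the commit message: flags OR-merge, and a later valid body fills or overwrites the entry.

```rust
use std::collections::HashMap;

/// Hypothetical observation record: flags plus an optional decoded body.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observation {
    pub flags: u32,
    pub block: Option<Vec<u8>>,
}

/// Hypothetical cache of observations keyed by a simplified block id.
#[derive(Default)]
pub struct ObservationCache {
    entries: HashMap<u64, Observation>,
}

impl ObservationCache {
    /// Record an observation. `block` is `None` when the body failed to
    /// deserialize, so the entry is kept flags-only rather than dropped.
    pub fn observe(&mut self, id: u64, flags: u32, block: Option<Vec<u8>>) {
        let entry = self
            .entries
            .entry(id)
            .or_insert(Observation { flags: 0, block: None });
        // Flags accumulate across observations.
        entry.flags |= flags;
        // A valid body replaces whatever was stored; a missing body
        // never erases an earlier valid one.
        if block.is_some() {
            entry.block = block;
        }
    }

    pub fn get(&self, id: u64) -> Option<&Observation> {
        self.entries.get(&id)
    }
}
```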

Add regression tests for both paths.
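
The receiver fix above depends on checking for an empty validator set before the quorum comparison. A minimal sketch of why the order matters follows. The rounding in `threshold_66`, the `>=` comparison, and the names `check_notar_quorum` and `RepairError` are assumptions for illustration; the commit message confirms only that `threshold_66(0) == 0`.

```rust
/// Assumed definition: smallest weight that is at least two thirds of
/// the total. The real `threshold_66` may round differently, but the
/// commit message states that it returns 0 for a total of 0.
pub fn threshold_66(total_weight: u64) -> u64 {
    ((total_weight as u128 * 2 + 2) / 3) as u64
}

/// Hypothetical error type for a rejected repair response.
#[derive(Debug, PartialEq, Eq)]
pub enum RepairError {
    EmptyValidatorSet,
    BelowQuorum { signed: u64, required: u64 },
}

/// Hypothetical quorum gate for a notarization signature set.
pub fn check_notar_quorum(total_weight: u64, signed_weight: u64) -> Result<(), RepairError> {
    // Without this guard, total_weight == 0 gives required == 0, and a
    // response carrying no signatures would satisfy `0 >= 0`.
    if total_weight == 0 {
        return Err(RepairError::EmptyValidatorSet);
    }
    let required = threshold_66(total_weight);
    if signed_weight < required {
        return Err(RepairError::BelowQuorum {
            signed: signed_weight,
            required,
        });
    }
    Ok(())
}
```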
@bvscd bvscd merged commit 59580fa into release/node/v0.9.0 Jun 14, 2026
6 checks passed
@bvscd bvscd deleted the collator branch June 14, 2026 06:52
3 participants