Add post-dump plausibility gate returning status:degraded (#334) #433

Open
yangsec888 wants to merge 1 commit into DeusData:main from yangsec888:fix/dump-verify-plausibility-gate-334

@yangsec888

Contributor

Summary

  • Adds a self-referential plausibility gate that runs after index_repository completes: when the persisted SQLite node count falls far below the in-memory graph buffer count captured at dump time, the response returns status:"degraded" with expected_nodes / expected_edges instead of silently reporting status:"indexed".
  • Closes the observability gap left after #387 (fix(store): checkpoint WAL on close and startup to prevent orphan accumulation), which added the WAL checkpoint on open/close, for cases where the store is partial or the integrity-check auto-clean removes the DB.

Fixes #334 (design b as discussed in the issue thread).

Motivation

Rapid kill/restart cycles could leave status:"indexed" with a small fraction of the true node count. Maintainer agreed on design (b): compare persisted rows to extracted/committed rows (self-referential, no cross-repo assumptions).

Changes

  • src/foundation/dump_verify.c — pure cbm_dump_verify_is_degraded() + CBM_DUMP_VERIFY_MIN_RATIO env (default 0.5, 0 disables).
  • src/pipeline/pipeline.c — capture committed_nodes / committed_edges at dump; accessor cbm_pipeline_get_committed_counts.
  • src/mcp/mcp.c — gate after successful pipeline; checkpoint+recount once; new response fields.
  • tests/test_dump_verify.c — pure-function case matrix.
  • tests/test_dump_verify_io.c — store I/O tests (real cbm_store_count_nodes, shortfall simulation, fork/crash WAL recovery).
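
To make the ratio check concrete, here is a minimal sketch of what a pure gate predicate and its environment parsing could look like. It is not the code from this PR: the `sketch_` names, the argument order, the fallback to the default on malformed or out-of-range values, and the strict `<` comparison at the boundary are all assumptions made for illustration. Only the default of 0.5, the "0 disables" rule, and the -1 sentinel come from the description.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Default ratio stated in the PR description. */
#define SKETCH_DEFAULT_MIN_RATIO 0.5

/* Hypothetical helper: parse a ratio string such as the value of
 * CBM_DUMP_VERIFY_MIN_RATIO. Assumption: a missing, malformed, or
 * out-of-range value falls back to the default. */
static double sketch_parse_min_ratio(const char *raw) {
    if (raw == NULL || *raw == '\0') {
        return SKETCH_DEFAULT_MIN_RATIO;
    }
    char *end = NULL;
    double value = strtod(raw, &end);
    if (end == raw || *end != '\0') {
        return SKETCH_DEFAULT_MIN_RATIO; /* not a clean number */
    }
    if (!(value >= 0.0 && value <= 1.0)) {
        return SKETCH_DEFAULT_MIN_RATIO; /* also rejects NaN */
    }
    return value;
}

/* Hypothetical pure predicate: true when the persisted node count is
 * implausibly small relative to the count committed at dump time.
 * Nodes only; edge counts are deliberately not compared. */
static bool sketch_is_degraded(int64_t persisted_nodes,
                               int64_t committed_nodes,
                               double min_ratio) {
    if (min_ratio <= 0.0) {
        return false; /* gate disabled (ratio 0 is the escape hatch) */
    }
    if (committed_nodes <= 0) {
        return false; /* -1 sentinel (dump did not run) or nothing to compare */
    }
    if (persisted_nodes < 0) {
        persisted_nodes = 0; /* treat a failed count as an empty store */
    }
    return (double)persisted_nodes < (double)committed_nodes * min_ratio;
}
```

A caller would typically read the ratio once, for example `sketch_parse_min_ratio(getenv("CBM_DUMP_VERIFY_MIN_RATIO"))`, and pass it to the predicate together with the two counts. Keeping the predicate free of I/O is what makes a pure-function case matrix practical.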

Response shape

{
  "project": "...",
  "status": "degraded",
  "nodes": 469,
  "edges": 1200,
  "expected_nodes": 5915,
  "expected_edges": 9531,
  "hint": "Persisted far fewer nodes than indexed — likely durability loss..."
}

isError remains false so partial graphs stay queryable. Downstream parsers that only require nodes/edges continue to work; new status is opt-in.
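
As an illustration of the opt-in behaviour, a client that has already extracted the `status`, `nodes`, and `expected_nodes` fields from the response could branch on them roughly as follows. This is a sketch under assumptions: the enum, the function names, and the choice to map unrecognised status strings to an "unknown" state are hypothetical and not part of this PR.

```c
#include <string.h>

/* Hypothetical client-side view of the response status. */
typedef enum {
    SKETCH_INDEX_OK,
    SKETCH_INDEX_DEGRADED,
    SKETCH_INDEX_UNKNOWN
} sketch_index_state;

/* Map the "status" string to a state. Unrecognised values are kept
 * separate so a client can ignore them rather than fail. */
static sketch_index_state sketch_classify_status(const char *status) {
    if (status == NULL) {
        return SKETCH_INDEX_UNKNOWN;
    }
    if (strcmp(status, "indexed") == 0) {
        return SKETCH_INDEX_OK;
    }
    if (strcmp(status, "degraded") == 0) {
        return SKETCH_INDEX_DEGRADED;
    }
    return SKETCH_INDEX_UNKNOWN;
}

/* Fraction of expected nodes that were persisted, or -1.0 when the
 * inputs do not allow a meaningful ratio. */
static double sketch_persisted_fraction(long long nodes,
                                        long long expected_nodes) {
    if (expected_nodes <= 0 || nodes < 0) {
        return -1.0;
    }
    return (double)nodes / (double)expected_nodes;
}
```

With the example response above, a client using this sketch would see the degraded state and a persisted fraction of roughly 8% (469 of 5915), and could decide whether to re-index or keep querying the partial graph.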

Design notes

  • Nodes-only gate — edges legitimately shrink at dump when endpoints fail to resolve; comparing edge counts would false-trigger.
  • resolve_store NULL (integrity auto-clean) → degraded with nodes:0, not silent indexed.
  • CBM_DUMP_VERIFY_MIN_RATIO=0 disables the gate (escape hatch).
  • committed_nodes = -1 sentinel when dump did not run (explicit init; calloc zero would be ambiguous).
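
The sentinel point can be sketched as follows. The struct layout and the `sketch_` function names are hypothetical stand-ins, not the pipeline's real definitions. The sketch only shows why an explicit -1 after `calloc` lets an accessor distinguish "dump never ran" from "dump ran and committed zero rows".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_COUNT_UNSET (-1)

/* Hypothetical stand-in for the pipeline state that holds the counts. */
typedef struct {
    int64_t committed_nodes;
    int64_t committed_edges;
} sketch_pipeline;

/* calloc zero-fills the struct, so the sentinel must be set explicitly;
 * otherwise 0 would mean both "not dumped" and "dumped nothing". */
static sketch_pipeline *sketch_pipeline_new(void) {
    sketch_pipeline *p = calloc(1, sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    p->committed_nodes = SKETCH_COUNT_UNSET;
    p->committed_edges = SKETCH_COUNT_UNSET;
    return p;
}

/* Called at dump time with the in-memory counts. */
static void sketch_pipeline_record_dump(sketch_pipeline *p,
                                        int64_t nodes, int64_t edges) {
    p->committed_nodes = nodes;
    p->committed_edges = edges;
}

/* Returns false when the dump did not run, so the caller can skip the gate. */
static bool sketch_pipeline_get_committed_counts(const sketch_pipeline *p,
                                                 int64_t *nodes,
                                                 int64_t *edges) {
    if (p == NULL || p->committed_nodes < 0) {
        return false;
    }
    *nodes = p->committed_nodes;
    *edges = p->committed_edges;
    return true;
}
```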

Test plan

  • make -f Makefile.cbm test green (5581 passed)
  • suite_dump_verify pure-function matrix
  • suite_dump_verify_io store I/O + fork/crash WAL recovery (POSIX)
  • Manual: index repo, kill sibling mid-WAL, re-index → status:"degraded"

Related

Compare persisted SQLite node counts to in-memory dump counts after
index_repository completes so partial WAL/durability loss surfaces as
status:"degraded" instead of silent indexed.
Development

Successfully merging this pull request may close these issues.

Silent index corruption after rapid kill/restart cycles
