feat: confidence-gated cost cascade for AI name recovery (metaharness thesis)#5

Open
ruvnet wants to merge 1 commit into main from feat/metaharness-cost-cascade

@ruvnet ruvnet commented Jun 27, 2026

Summary

Wires the metaharness cost-cascade thesis onto ruDevolution's existing
per-inference confidence score: run a cheap model tier first, escalate
to a frontier tier only when confidence < threshold. Most names get
recovered cheaply ($0 corpus tier); you pay the frontier price only for the
hard, low-confidence ones. Pure cost-Pareto — no accuracy lost, because the
cheap answer is kept whenever it already clears the bar.

Default behavior is unchanged — the cascade is strictly opt-in. The
standard decompile() pipeline is untouched and all 59 pre-existing tests stay
green.


Part 1 — Honest architecture review (overlap with metaharness)

ruDevolution already implements much of the metaharness thesis natively:

| Capability | ruDevolution (native) | metaharness analogue | Verdict |
|---|---|---|---|
| Self-learning | inferrer::learn_from_ground_truth extracts LearnedPatterns from ground-truth comparisons; TrainingCorpus (210+ patterns) feeds inference; real_world tests run "with learning". | darwin evolve loop | Already has it. Don't bolt on a second learner. |
| Witness chains | witness.rs: SHA3-256 content hashes + binary Merkle root, verify_witness_chain self-check, serializable WitnessChainData, self-verified inside decompile(). | ADR-011 Ed25519 witness | Already has it. (Hash-chain provenance vs signed authorship; see interop note.) |
| MinCut module detection | partitioner.rs: exact MinCut via ruvector-mincut::GraphPartitioner for <5K nodes, Louvain (rayon-parallel) for ≥5K. | ruvector-mincut / ADR-190 | Already has it: literally the same ruvector-mincut crate. |
| Confidence scoring | InferredName.confidence (0..1) per inference, 5-strategy ladder (corpus → string patterns → property correlation → multi-literal → structural), Confidence::{High,Medium,Low} thresholds. | confidence gate | Already has it, and this is the hook. |
| AI inference path | neural.rs (feature neural): NeuralInferrer with 3 backends (pure-Rust transformer .bin, ONNX .onnx, GGUF/RVF stub). infer_names_neural already does a crude "neural first, fall back to corpus" at a hardcoded 0.8 cutoff. | model router | Partial. Single model, fixed cutoff, no cost awareness, no tier ordering, no outcome recording. |

Where metaharness genuinely adds value: the AI inference path. ruDevolution
has a confidence gate but spends it on a fallback decision (neural → corpus),
not a cost-routing decision (cheap → frontier, pay only on escalation). That
is exactly the cascade thesis, and it slots onto the existing confidence field
with zero redundant machinery. Self-learning, witnesses, and MinCut are not
touched — they already exist.

Part 2 — What was implemented (the genuine fit)

New src/cascade.rs:

  • NameInferrer trait — one method (infer) + label()/cost(). A tier is
    anything from the $0 corpus inferrer to a neural model to a remote frontier API.
  • CorpusTier — wraps the existing inferrer::infer_declaration_name +
    TrainingCorpus as the cheap ($0) tier. No new inference logic.
  • CascadeInferrer — cheapest-first tiers + escalation threshold (default
    0.9, mirroring Confidence::High). Runs tiers in order, stops as soon as a
    tier clears the threshold (never invokes more expensive tiers), and keeps the
    highest-confidence answer if none clears it.
  • CascadeOutcome / CascadeStats — per-inference record (winning tier,
    confidence, tiers tried, escalated?, did escalation change the answer?, cost)
    and aggregates (cheap-win rate, cost spent vs frontier-only baseline, savings).
  • suggest_threshold / self_tune — read recorded outcomes and adjust the
    threshold for the next run: lower it when the frontier keeps confirming
    the cheap tier (stop paying for confirmations), raise it when the frontier
    keeps overturning it. This is the self-tuning loop that closes with
    ruDevolution's existing "gets smarter every run" learning.
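The self-tuning rule in the last bullet can be sketched as follows. This is a minimal illustration, not the code in src/cascade.rs: the record type, the step size, the overturn-rate cutoffs, and the clamp bounds are all hypothetical values chosen for the example, and the real suggest_threshold signature may differ.

```rust
/// Hypothetical, trimmed-down view of one recorded outcome: only the two
/// fields the tuning rule needs.
#[derive(Debug, Clone, Copy)]
pub struct EscalationRecord {
    /// True if more than one tier was invoked for this name.
    pub escalated: bool,
    /// True if the escalated tier returned a different name than the cheap tier.
    pub answer_changed: bool,
}

// All constants below are assumptions made for this sketch.
const STEP: f64 = 0.02; // how far to move the threshold per run
const MIN_THRESHOLD: f64 = 0.5; // never go below this
const MAX_THRESHOLD: f64 = 0.99; // never go above this
const LOW_OVERTURN: f64 = 0.2; // frontier mostly confirms the cheap tier
const HIGH_OVERTURN: f64 = 0.5; // frontier mostly overturns the cheap tier

/// Suggest the threshold for the next run from this run's outcomes.
/// Lowers it when escalations mostly confirm the cheap answer, raises it
/// when they mostly overturn it, and leaves it alone in between.
pub fn suggest_threshold(current: f64, records: &[EscalationRecord]) -> f64 {
    let escalated = records.iter().filter(|r| r.escalated).count();
    if escalated == 0 {
        // No escalations means no evidence either way.
        return current;
    }
    let overturned = records
        .iter()
        .filter(|r| r.escalated && r.answer_changed)
        .count();
    let overturn_rate = overturned as f64 / escalated as f64;

    let next = if overturn_rate <= LOW_OVERTURN {
        current - STEP // stop paying for confirmations
    } else if overturn_rate >= HIGH_OVERTURN {
        current + STEP // the cheap tier is overconfident
    } else {
        current
    };
    next.clamp(MIN_THRESHOLD, MAX_THRESHOLD)
}
```

Under these assumed constants, a run where every escalation confirmed the cheap answer moves a 0.9 threshold down one step, and a run where every escalation overturned it moves it up one step.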

src/lib.rs: pub mod cascade + opt-in infer_names_cascade(modules, tiers, threshold).
The default decompile() path is not changed.

examples/cost_cascade.rs: deterministic, model-free, $0 demo. Output:

a    -> tier=corpus         conf=0.95 escalated=false cost=0
b    -> tier=corpus         conf=0.95 escalated=false cost=0
c    -> tier=frontier(mock) conf=0.93 escalated=true cost=100
d    -> tier=frontier(mock) conf=0.93 escalated=true cost=100
cheap wins: 2 (50%)  escalations: 2  cost saved: 200 (50% vs frontier-only)
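The summary line above is plain arithmetic over the four per-name records. A minimal sketch of that aggregation follows. The Record and Stats types here are hypothetical stand-ins for CascadeOutcome and CascadeStats, and counting a "cheap win" as any non-escalated inference is an assumption made for this example.

```rust
/// Hypothetical per-inference record with only the fields the summary needs.
#[derive(Debug, Clone, Copy)]
pub struct Record {
    pub escalated: bool,
    pub cost: u64,
}

/// Hypothetical aggregate, mirroring the fields printed in the demo summary.
#[derive(Debug, PartialEq)]
pub struct Stats {
    pub total: usize,
    pub cheap_wins: usize,
    pub escalations: usize,
    pub cost_spent: u64,
    pub frontier_only_cost: u64,
    pub cost_saved: u64,
}

impl Stats {
    /// Share of inferences answered without escalating, in percent.
    pub fn cheap_win_pct(&self) -> f64 {
        if self.total == 0 {
            return 0.0;
        }
        100.0 * self.cheap_wins as f64 / self.total as f64
    }

    /// Cost saved relative to sending every name to the frontier tier, in percent.
    pub fn savings_pct(&self) -> f64 {
        if self.frontier_only_cost == 0 {
            return 0.0;
        }
        100.0 * self.cost_saved as f64 / self.frontier_only_cost as f64
    }
}

/// Aggregate records against a baseline where every name pays `frontier_cost`.
pub fn summarize(records: &[Record], frontier_cost: u64) -> Stats {
    let total = records.len();
    let escalations = records.iter().filter(|r| r.escalated).count();
    let cost_spent: u64 = records.iter().map(|r| r.cost).sum();
    let frontier_only_cost = frontier_cost * total as u64;
    Stats {
        total,
        cheap_wins: total - escalations,
        escalations,
        cost_spent,
        frontier_only_cost,
        cost_saved: frontier_only_cost.saturating_sub(cost_spent),
    }
}
```

With the demo's four records (two at cost 0, two escalated at cost 100) and a frontier cost of 100, the frontier-only baseline is 400, the cascade spends 200, and the saving is 200, which is the 50% reported above.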

How it maps to the existing confidence score

The cascade reads nothing new — it routes purely on InferredName.confidence,
the score the crate has always produced. The cheap tier is the existing
5-strategy inferrer verbatim; escalation fires exactly when that score is below
the bar.
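A minimal sketch of that routing rule, for readers who want to see the gate in code. The trait and struct shapes below are assumptions made for illustration (the real NameInferrer takes an InferenceContext, which is reduced to a &str here), FixedTier is a hypothetical deterministic tier, and summing the cost of every invoked tier is one plausible accounting choice rather than a statement about src/cascade.rs.

```rust
/// A recovered name plus the confidence (0.0..=1.0) the tier assigns to it.
#[derive(Debug, Clone, PartialEq)]
pub struct InferredName {
    pub name: String,
    pub confidence: f64,
}

/// One tier of the cascade. Method signatures are assumed for this sketch.
pub trait NameInferrer {
    fn label(&self) -> &str;
    fn cost(&self) -> u64;
    fn infer(&self, minified: &str) -> Option<InferredName>;
}

/// Hypothetical tier that always answers with a fixed confidence.
pub struct FixedTier {
    pub label: &'static str,
    pub cost: u64,
    pub confidence: f64,
}

impl NameInferrer for FixedTier {
    fn label(&self) -> &str {
        self.label
    }
    fn cost(&self) -> u64 {
        self.cost
    }
    fn infer(&self, minified: &str) -> Option<InferredName> {
        Some(InferredName {
            name: format!("{}_{}", self.label, minified),
            confidence: self.confidence,
        })
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Outcome {
    pub answer: Option<InferredName>,
    pub winning_tier: Option<String>,
    pub tiers_tried: usize,
    pub escalated: bool,
    pub cost: u64,
}

/// Run tiers cheapest-first. Stop at the first answer whose confidence is
/// at or above `threshold`; otherwise keep the highest-confidence answer seen.
pub fn run_cascade(tiers: &[Box<dyn NameInferrer>], threshold: f64, minified: &str) -> Outcome {
    let mut best: Option<(InferredName, String)> = None;
    let mut tiers_tried = 0;
    let mut cost = 0;
    for tier in tiers {
        tiers_tried += 1;
        cost += tier.cost(); // assumption: every invoked tier is paid for
        if let Some(candidate) = tier.infer(minified) {
            let cleared = candidate.confidence >= threshold;
            let improves = best
                .as_ref()
                .map_or(true, |(b, _)| candidate.confidence > b.confidence);
            if improves {
                best = Some((candidate, tier.label().to_string()));
            }
            if cleared {
                break; // more expensive tiers are never invoked
            }
        }
    }
    let (answer, winning_tier) = match best {
        Some((name, label)) => (Some(name), Some(label)),
        None => (None, None),
    };
    Outcome { answer, winning_tier, tiers_tried, escalated: tiers_tried > 1, cost }
}
```

In this sketch a single-tier slice can never set escalated, which is the property the default-unchanged tests below rely on.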

Default-unchanged confirmation

  • decompile() / decompile_default() still call inferrer::infer_names — no change.
  • A single-tier cascade can never escalate, so its output equals that of the
    wrapped tier (tests single_tier_never_escalates_default_unchanged and
    single_corpus_matches_existing_inferrer).
  • All 59 pre-existing tests pass unmodified.

Part 3 — Interop notes (design-level, no heavy cross-language dep)

  • ruDevolution as darwin's decompile engine: the cascade's CascadeStats
    (cheap-win rate, cost saved, escalation overturn rate) are exactly the fitness
    signals a darwin/metaharness self-improvement loop optimizes. Expose them as a
    run summary and darwin can evolve the threshold + tier mix.
  • @metaharness/router as an optional sidecar: it fits behind NameInferrer
    as a frontier tier — implement infer() to POST the InferenceContext to the
    router over HTTP and map the response into InferredName{confidence}. No
    cross-language build dependency; the router is just one more Box<dyn NameInferrer>.
    Kept as a documented bridge, not wired in (no real model calls in this PR).
  • Witness-chain alignment: ruDevolution uses SHA3-256 Merkle content hashing;
    ADR-011 uses Ed25519 signatures. They compose rather than conflict — the
    Merkle chain_root is the natural payload to Ed25519-sign for authorship,
    giving "these bytes derive from that bundle" (have) + "signed by this agent"
    (add). Noted as a clean future bridge, not implemented here.
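The sidecar bridge in the second bullet could look roughly like the sketch below. Nothing here reflects the actual @metaharness/router API: the Transport trait, the tab-separated "name, confidence" reply format, and the RouterTier and CannedTransport types are all hypothetical, and the NameInferrer signature is simplified to take a &str instead of an InferenceContext. A real adapter would implement Transport with an HTTP client and whatever wire format the router defines.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct InferredName {
    pub name: String,
    pub confidence: f64,
}

/// Simplified tier trait, assumed for this sketch.
pub trait NameInferrer {
    fn label(&self) -> &str;
    fn cost(&self) -> u64;
    fn infer(&self, context: &str) -> Option<InferredName>;
}

/// Hypothetical seam for the network call. Production code would POST
/// `body` to the sidecar over HTTP; keeping it behind a trait means the
/// tier itself stays free of any HTTP or cross-language dependency.
pub trait Transport {
    fn post(&self, body: &str) -> Result<String, String>;
}

/// Frontier tier that delegates inference to an external router.
pub struct RouterTier<T: Transport> {
    pub transport: T,
    pub cost_per_call: u64,
}

impl<T: Transport> NameInferrer for RouterTier<T> {
    fn label(&self) -> &str {
        "router"
    }
    fn cost(&self) -> u64 {
        self.cost_per_call
    }
    fn infer(&self, context: &str) -> Option<InferredName> {
        // Any transport failure or malformed reply is treated as "no answer",
        // so the cascade falls back to the best cheaper answer.
        let reply = self.transport.post(context).ok()?;
        // Assumed reply format for this sketch: "<name>\t<confidence>".
        let (name, conf) = reply.trim().split_once('\t')?;
        let confidence: f64 = conf.parse().ok()?;
        if name.is_empty() || !(0.0..=1.0).contains(&confidence) {
            return None;
        }
        Some(InferredName { name: name.to_string(), confidence })
    }
}

/// Deterministic stand-in transport that returns a fixed reply, so the
/// adapter can be exercised without a network or a real model.
pub struct CannedTransport(pub Result<String, String>);

impl Transport for CannedTransport {
    fn post(&self, _body: &str) -> Result<String, String> {
        self.0.clone()
    }
}
```

Because RouterTier implements the same trait as every other tier, it can be boxed as a Box<dyn NameInferrer> and placed last in the tier list without the cascade knowing it is remote.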

Test results

cargo test (built against the real ruvector-mincut dep chain):

lib unit:     54 passed   (41 prior + 13 new cascade tests)
ground_truth:  5 passed
integration:   8 passed
real_world:    4 passed
doc-tests:     1 passed
------------------------------
TOTAL:        72 passed, 0 failed   (was 59; +13 new, 0 pre-existing changed)

cargo clippy --all-targets: zero new warnings from cascade.rs/lib.rs
(the repo's pre-existing clippy warnings in other files are untouched).

Notes / honest status

  • Nothing stubbed in the cascade itself — routing, threshold gating,
    best-answer keeping, outcome recording, stats, and self-tuning are all real and
    tested. The example's "frontier" tier is an intentional deterministic mock so
    the demo is $0 and reproducible; production plugs a real model into the same trait.
  • Build note: the public repo inherits version.workspace/serde.workspace
    and a ../ruvector-mincut path dep, so it builds as a member of the ruvector
    workspace (where ruvector-mincut lives). This PR was built/tested against that
    real dependency; the committed Cargo.toml keeps the workspace inheritance
    intact and only adds the [[example]] entry.

🤖 Generated with claude-flow

https://claude.ai/code/session_019rVRYrRDKyxYK18kuVrDSf

Wire the metaharness cost-cascade thesis onto rudevolution's existing
per-inference confidence score: run a cheap model tier first and escalate
to a frontier tier ONLY when confidence < threshold. Most names are
recovered cheaply; pay frontier cost only for the hard, low-confidence ones.

- src/cascade.rs: NameInferrer trait, CorpusTier ($0 built-in tier),
  CascadeInferrer (cheapest-first routing, keeps best answer), CascadeOutcome
  + CascadeStats (per-inference + aggregate cost accounting), and
  suggest_threshold/self_tune that learn from recorded outcomes to self-tune
  the escalation threshold over runs (closes the self-learning loop).
- src/lib.rs: pub mod cascade + opt-in infer_names_cascade(); default
  decompile() path is untouched.
- examples/cost_cascade.rs: deterministic, model-free ($0) demo.
- 13 new unit tests; all 59 pre-existing tests stay green (72 total).
- Default behavior unchanged: a single-tier cascade == existing inferrer.

No real model calls. No provider hardcoded — bring your own frontier tier
via the NameInferrer trait (e.g. neural::NeuralInferrer or an external
sidecar such as @metaharness/router).

Co-Authored-By: claude-flow <[email protected]>
Claude-Session: https://claude.ai/code/session_019rVRYrRDKyxYK18kuVrDSf