feat: confidence-gated cost cascade for AI name recovery (metaharness thesis)#5
Open
ruvnet wants to merge 1 commit into
Wire the metaharness cost-cascade thesis onto rudevolution's existing per-inference confidence score: run a cheap model tier first and escalate to a frontier tier ONLY when confidence < threshold. Most names are recovered cheaply; pay frontier cost only for the hard, low-confidence ones.

- src/cascade.rs: NameInferrer trait, CorpusTier ($0 built-in tier), CascadeInferrer (cheapest-first routing, keeps best answer), CascadeOutcome + CascadeStats (per-inference + aggregate cost accounting), and suggest_threshold/self_tune that learn from recorded outcomes to self-tune the escalation threshold over runs (closes the self-learning loop).
- src/lib.rs: pub mod cascade + opt-in infer_names_cascade(); default decompile() path is untouched.
- examples/cost_cascade.rs: deterministic, model-free ($0) demo.
- 13 new unit tests; all 59 pre-existing tests stay green (72 total).
- Default behavior unchanged: a single-tier cascade == existing inferrer.

No real model calls. No provider hardcoded: bring your own frontier tier via the NameInferrer trait (e.g. neural::NeuralInferrer or an external sidecar such as @metaharness/router).

Co-Authored-By: claude-flow <[email protected]>
Claude-Session: https://claude.ai/code/session_019rVRYrRDKyxYK18kuVrDSf
Summary
Wires the metaharness cost-cascade thesis onto ruDevolution's existing per-inference confidence score: run a cheap model tier first, escalate to a frontier tier only when `confidence < threshold`. Most names get recovered cheaply ($0 corpus tier); you pay the frontier price only for the hard, low-confidence ones. Pure cost-Pareto: no accuracy lost, because the cheap answer is kept whenever it already clears the bar.
Default behavior is unchanged: the cascade is strictly opt-in. The standard `decompile()` pipeline is untouched and all 59 pre-existing tests stay green.
Part 1 — Honest architecture review (overlap with metaharness)
ruDevolution already implements much of the metaharness thesis natively:

- `inferrer::learn_from_ground_truth` extracts `LearnedPattern`s from ground-truth comparisons; `TrainingCorpus` (210+ patterns) feeds inference; real_world tests run "with learning".
- `witness.rs`: SHA3-256 content hashes + binary Merkle root, `verify_witness_chain` self-check, serializable `WitnessChainData`, self-verified inside `decompile()`.
- `partitioner.rs`: exact MinCut via `ruvector-mincut::GraphPartitioner` for <5K nodes, Louvain (rayon-parallel) for ≥5K (`ruvector-mincut` crate).
- `InferredName.confidence` (0..1) per inference, 5-strategy ladder (corpus → string patterns → property correlation → multi-literal → structural), `Confidence::{High,Medium,Low}` thresholds.
- `neural.rs` (feature `neural`): `NeuralInferrer` with 3 backends (pure-Rust transformer `.bin`, ONNX `.onnx`, GGUF/RVF stub). `infer_names_neural` already does a crude "neural first, fall back to corpus" at a hardcoded 0.8 cutoff.

Where metaharness genuinely adds value: the AI inference path. ruDevolution has a confidence gate but spends it on a fallback decision (neural → corpus), not a cost-routing decision (cheap → frontier, pay only on escalation). That is exactly the cascade thesis, and it slots onto the existing `confidence` field with zero redundant machinery. Self-learning, witnesses, and MinCut are not touched; they already exist.
Part 2 — What was implemented (the genuine fit)
New `src/cascade.rs`:

- `NameInferrer` trait: one method (`infer`) + `label()`/`cost()`. A tier is anything from the $0 corpus inferrer to a neural model to a remote frontier API.
- `CorpusTier`: wraps the existing `inferrer::infer_declaration_name` + `TrainingCorpus` as the cheap ($0) tier. No new inference logic.
- `CascadeInferrer`: cheapest-first tiers + escalation threshold (default 0.9, mirroring `Confidence::High`). Runs tiers in order, stops as soon as a tier clears the threshold (never invokes more expensive tiers), keeps the highest-confidence answer if none clear it.
- `CascadeOutcome`/`CascadeStats`: per-inference record (winning tier, confidence, tiers tried, escalated?, did escalation change the answer?, cost) and aggregates (cheap-win rate, cost spent vs frontier-only baseline, savings).
- `suggest_threshold`/`self_tune`: read recorded outcomes and adjust the threshold for the next run: lower it when the frontier keeps confirming the cheap tier (stop paying for confirmations), raise it when the frontier keeps overturning it. This is the self-tuning loop that closes with ruDevolution's existing "gets smarter every run" learning.

`src/lib.rs`: `pub mod cascade` + opt-in `infer_names_cascade(modules, tiers, threshold)`. The default `decompile()` path is not changed.

`examples/cost_cascade.rs`: deterministic, model-free, $0 demo.

How it maps to the existing confidence score
The cascade reads nothing new: it routes purely on `InferredName.confidence`, the score the crate has always produced. The cheap tier is the existing 5-strategy inferrer verbatim; escalation fires exactly when that score is below the bar.
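The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the code in `src/cascade.rs`: the type and method names mirror the PR description, but every field, the `&str` context argument, and the `FixedTier` / `two_tier_demo` helpers are assumptions made for the example.

```rust
// Simplified stand-ins for the PR's types; all shapes here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct InferredName {
    pub name: String,
    pub confidence: f64, // expected range 0.0..=1.0
}

pub trait NameInferrer {
    fn infer(&self, context: &str) -> Option<InferredName>;
    fn label(&self) -> &str;
    fn cost(&self) -> f64;
}

#[derive(Debug, Clone, PartialEq)]
pub struct CascadeOutcome {
    pub answer: Option<InferredName>,
    pub winning_tier: Option<String>,
    pub tiers_tried: usize,
    pub escalated: bool,
    pub cost: f64,
}

pub struct CascadeInferrer {
    tiers: Vec<Box<dyn NameInferrer>>, // ordered cheapest first
    threshold: f64,
}

impl CascadeInferrer {
    pub fn new(tiers: Vec<Box<dyn NameInferrer>>, threshold: f64) -> Self {
        Self { tiers, threshold }
    }

    pub fn infer(&self, context: &str) -> CascadeOutcome {
        let mut best: Option<(InferredName, String)> = None;
        let mut cost = 0.0;
        let mut tiers_tried = 0;
        for tier in &self.tiers {
            tiers_tried += 1;
            cost += tier.cost();
            if let Some(candidate) = tier.infer(context) {
                // Keep the highest-confidence answer seen so far.
                let improves = match &best {
                    Some((current, _)) => candidate.confidence > current.confidence,
                    None => true,
                };
                if improves {
                    best = Some((candidate, tier.label().to_string()));
                }
            }
            // Stop before touching any more expensive tier once the bar is cleared.
            if matches!(&best, Some((b, _)) if b.confidence >= self.threshold) {
                break;
            }
        }
        let (answer, winning_tier) = match best {
            Some((name, label)) => (Some(name), Some(label)),
            None => (None, None),
        };
        CascadeOutcome { answer, winning_tier, tiers_tried, escalated: tiers_tried > 1, cost }
    }
}

/// Hypothetical deterministic tier that always answers with a fixed confidence.
pub struct FixedTier {
    pub label: &'static str,
    pub cost: f64,
    pub confidence: f64,
}

impl NameInferrer for FixedTier {
    fn infer(&self, _context: &str) -> Option<InferredName> {
        Some(InferredName { name: format!("{}_guess", self.label), confidence: self.confidence })
    }
    fn label(&self) -> &str { self.label }
    fn cost(&self) -> f64 { self.cost }
}

/// Two-tier cascade ($0 "corpus" tier, then a mock "frontier" tier) at threshold 0.9.
pub fn two_tier_demo(cheap_confidence: f64) -> CascadeOutcome {
    let tiers: Vec<Box<dyn NameInferrer>> = vec![
        Box::new(FixedTier { label: "corpus", cost: 0.0, confidence: cheap_confidence }),
        Box::new(FixedTier { label: "frontier", cost: 1.0, confidence: 0.99 }),
    ];
    CascadeInferrer::new(tiers, 0.9).infer("var a = require('fs')")
}
```

With these stand-ins, `two_tier_demo(0.95)` stops at the corpus tier and spends nothing, while `two_tier_demo(0.40)` escalates and the frontier answer wins. A cascade built with a single tier can never escalate, which is the property the default-unchanged tests rely on.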
Default-unchanged confirmation
`decompile()`/`decompile_default()` still call `inferrer::infer_names`; no change. A single-tier cascade matches the existing inferrer (tests `single_tier_never_escalates_default_unchanged`, `single_corpus_matches_existing_inferrer`).

Part 3 — Interop notes (design-level, no heavy cross-language dep)
`CascadeStats` (cheap-win rate, cost saved, escalation overturn rate) are exactly the fitness signals a darwin/metaharness self-improvement loop optimizes. Expose them as a run summary and darwin can evolve the threshold + tier mix.
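To make those signals concrete, here is a rough sketch of how the three rates could be derived from recorded outcomes and fed back into the threshold. It is not the PR's `CascadeStats` or `suggest_threshold`: the `Outcome` and `Stats` types, the 0.05 step, the 0.2 / 0.5 overturn cut-offs, the minimum sample size, and the clamp range are all assumptions chosen for illustration.

```rust
/// Hypothetical per-inference record; field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct Outcome {
    pub escalated: bool,      // a tier beyond the cheapest one was invoked
    pub answer_changed: bool, // escalation overturned the cheap answer
    pub cost: f64,            // total cost paid for this inference
}

#[derive(Debug, Default, PartialEq)]
pub struct Stats {
    pub total: usize,
    pub cheap_wins: usize,
    pub escalations: usize,
    pub overturned: usize,
    pub cost_spent: f64,
}

impl Stats {
    pub fn from_outcomes(outcomes: &[Outcome]) -> Self {
        let mut stats = Stats::default();
        for outcome in outcomes {
            stats.total += 1;
            stats.cost_spent += outcome.cost;
            if outcome.escalated {
                stats.escalations += 1;
                if outcome.answer_changed {
                    stats.overturned += 1;
                }
            } else {
                stats.cheap_wins += 1;
            }
        }
        stats
    }

    /// Share of inferences settled by the cheapest tier.
    pub fn cheap_win_rate(&self) -> f64 {
        if self.total == 0 { 0.0 } else { self.cheap_wins as f64 / self.total as f64 }
    }

    /// Share of escalations where the frontier changed the answer.
    pub fn overturn_rate(&self) -> f64 {
        if self.escalations == 0 { 0.0 } else { self.overturned as f64 / self.escalations as f64 }
    }

    /// Cost avoided relative to sending every inference to the frontier tier.
    pub fn savings_vs_frontier_only(&self, frontier_cost: f64) -> f64 {
        self.total as f64 * frontier_cost - self.cost_spent
    }
}

/// Lower the threshold when escalations mostly confirm the cheap answer,
/// raise it when they mostly overturn it. All constants are assumptions.
pub fn suggest_threshold(current: f64, stats: &Stats) -> f64 {
    const STEP: f64 = 0.05;
    const MIN_ESCALATIONS: usize = 3; // too few samples: leave the threshold alone
    if stats.escalations < MIN_ESCALATIONS {
        return current;
    }
    let rate = stats.overturn_rate();
    let next = if rate < 0.2 {
        current - STEP
    } else if rate > 0.5 {
        current + STEP
    } else {
        current
    };
    next.clamp(0.5, 0.99)
}
```

For example, ten inferences with six cheap wins and four confirming escalations at a frontier cost of 1.0 give a cheap-win rate of 0.6 and savings of 6.0 against the frontier-only baseline, and the sketch would suggest moving a 0.9 threshold down to about 0.85.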
`@metaharness/router` as an optional sidecar: it fits behind `NameInferrer` as a frontier tier. Implement `infer()` to POST the `InferenceContext` to the router over HTTP and map the response into `InferredName{confidence}`. No cross-language build dependency; the router is just one more `Box<dyn NameInferrer>`. Kept as a documented bridge, not wired in (no real model calls in this PR).
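One possible shape for such a bridge is sketched below. It deliberately makes no HTTP call: the transport is an injected closure, so a real client could be substituted later. `RouterTier`, `RouterReply`, the `snippet` field on `InferenceContext`, and the trait signature are hypothetical stand-ins, and nothing here reflects the actual `@metaharness/router` wire format.

```rust
// Simplified stand-ins for the crate's types; shapes are assumptions.
pub struct InferenceContext {
    pub snippet: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct InferredName {
    pub name: String,
    pub confidence: f64,
}

pub trait NameInferrer {
    fn infer(&self, context: &InferenceContext) -> Option<InferredName>;
    fn label(&self) -> &str;
    fn cost(&self) -> f64;
}

/// Hypothetical decoded reply from the sidecar.
pub struct RouterReply {
    pub name: String,
    pub confidence: f64,
}

/// Frontier tier that delegates to an injected transport. In production the
/// closure would POST the request body over HTTP; here it is left abstract.
pub struct RouterTier<T> {
    transport: T,
    cost_per_call: f64,
}

impl<T> RouterTier<T>
where
    T: Fn(&str) -> Result<RouterReply, String>,
{
    pub fn new(transport: T, cost_per_call: f64) -> Self {
        Self { transport, cost_per_call }
    }
}

impl<T> NameInferrer for RouterTier<T>
where
    T: Fn(&str) -> Result<RouterReply, String>,
{
    fn infer(&self, context: &InferenceContext) -> Option<InferredName> {
        // A transport failure yields no answer, so the cascade keeps
        // whatever the cheaper tiers already produced.
        let reply = (self.transport)(&context.snippet).ok()?;
        if !reply.confidence.is_finite() {
            return None;
        }
        // Clamp so a misbehaving sidecar cannot report confidence outside 0..1.
        Some(InferredName { name: reply.name, confidence: reply.confidence.clamp(0.0, 1.0) })
    }

    fn label(&self) -> &str {
        "router-sidecar"
    }

    fn cost(&self) -> f64 {
        self.cost_per_call
    }
}
```

Injecting the transport keeps the tier testable with a stub closure and keeps any HTTP client out of the crate's dependency tree, which matches the "no cross-language build dependency" goal stated above.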
ADR-011 uses Ed25519 signatures. They compose rather than conflict: the Merkle `chain_root` is the natural payload to Ed25519-sign for authorship, giving "these bytes derive from that bundle" (have) + "signed by this agent" (add). Noted as a clean future bridge, not implemented here.
Test results
`cargo test` (built against the real `ruvector-mincut` dep chain): 72 tests in total (59 pre-existing plus 13 new), all green.

`cargo clippy --all-targets`: zero new warnings from `cascade.rs`/`lib.rs` (the repo's pre-existing clippy warnings in other files are untouched).
Notes / honest status
Best-answer keeping, outcome recording, stats, and self-tuning are all real and tested. The example's "frontier" tier is an intentional deterministic mock so the demo is $0 and reproducible; production plugs a real model into the same trait.
The crate's `Cargo.toml` uses `version.workspace`/`serde.workspace` and a `../ruvector-mincut` path dep, so it builds as a member of the `ruvector` workspace (where `ruvector-mincut` lives). This PR was built/tested against that real dependency; the committed `Cargo.toml` keeps the workspace inheritance intact and only adds the `[[example]]` entry.

🤖 Generated with claude-flow
https://claude.ai/code/session_019rVRYrRDKyxYK18kuVrDSf