perf!: delta-aware BnbMetric via SelectionView/compute_view by evanlinjin · Pull Request #53 · bitcoindevkit/coin-select

evanlinjin · 2026-07-02T17:03:54Z

Summary

Makes the branch-and-bound metric evaluator delta-aware, eliminating the dominant cost in run_bnb. A flamegraph showed ~65% of run_bnb_lowest_fee time was spent in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) — both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain.

Now BnB maintains a per-branch SelectionCache (running aggregates: value_sum, weight_sum, input_count, segwit_count, candidate_count). Inclusion expansions call cache.add(c) — O(1). Metrics receive a &SelectionView<'_> (a handle over &CoinSelector + Cow<SelectionCache>) whose read methods (selected_value, input_weight, excess, is_target_met, drain, …) are all O(1).

Outside the hot path, cs.compute_view() builds a fresh cache once on demand, collapsing the ~15 duplicated O(|selected|) read methods that previously lived on CoinSelector into a single entry point.

Benchmarks

Criterion median; delta-aware vs the bitset baseline, vs the pre-bitset master:

bench	master	bitset	delta-aware	total speedup
clone/4096	5.68 µs	58 ns	52 ns	109×
run_bnb_lowest_fee/20	470 µs	417 µs	158 µs	3.0×
run_bnb_lowest_fee/50	26.0 ms	10.3 ms	5.4 ms	4.8×
run_bnb_lowest_fee/100	73.2 ms	40.2 ms	14.2 ms	5.2×
run_bnb_lowest_fee/200	1.27 s	1.10 s	196 ms	6.5×

The gain is largest at large n, where selected_value/input_weight recomputation dominated. (The n=200 row measures the round-capped exploration path — see the bench commit for the explicit solution-finding vs cap-exhaustion group split.)

Commits

Each commit independently passes the full CI gate (fmt, clippy, doc -D warnings, --release tests, --no-default-features build), so the branch is bisectable.

perf!: delta-aware BnbMetric via SelectionView/compute_view — the core change. Introduces SelectionCache/SelectionView, moves the read methods off CoinSelector behind compute_view(), and makes the BnbMetric trait take &SelectionView. Adapted to the current trait API (target passed as a parameter; metric owns its change decision via drain(); Changeless<M>). The cache proptest validates against an independent O(n) oracle (the original input_weight walk), not just a from-scratch cache rebuild.
bench: extend pool sizes to wallet (1k) and exchange (10M) scale — also splits the BnB bench into run_bnb_lowest_fee (sizes that find solutions) and run_bnb_lowest_fee_exhaust_cap (sizes where best-first search exhausts the 100k-round cap), with startup assertions pinning which path each group measures.
perf: skip unselected scan in BnbIter via per-branch cursor — each Branch carries a cursor into candidate_order, so insert_new_branches jumps straight to the next undecided candidate. The exclusion branch's duplicate-dedup run explicitly skips already-decided candidates (matching the old unselected() semantics) — covered by a regression test with a pre-selected duplicate.

API changes (breaking)

New public SelectionView<'_>; SelectionCache is internal.
BnbMetric::{score,bound,drain} now take &SelectionView<'_> instead of &CoinSelector<'_>.
The read methods (selected_value, input_weight, weight, excess + variants, is_target_met(+_with_drain), fee, implied_fee, implied_feerate, missing, drain_value, drain, effective_value, waste) move from CoinSelector to SelectionView; reach them via cs.compute_view(). is_selection_possible is renamed SelectionView::is_target_reachable.

Review

Beyond CI, this branch went through a 60-agent adversarial review (10 dimensions incl. differential execution of branch vs master vs brute-force exhaustive search over seeded random scenarios; every finding independently verified by 3 reviewers). All confirmed findings are fixed in the commits above. Two pre-existing issues (present on master, not regressions) surfaced and are not addressed here:

LowestFee::bound's assert!(ideal_fee >= 0.0) can panic on valid inputs due to f32 rounding.
BnB's exclusion dedup keys on (value, weight) only, treating candidates differing in is_segwit/input_count as equivalent, which can yield slightly suboptimal results vs exhaustive search.

🤖 Generated with Claude Code

Squash of the two-commit perf series (cd1017a "delta-aware BnbMetric via SelectionView/SelectionCache" + 536a03a "replace duplicated CoinSelector read methods with compute_view()"), rebased onto the new BnbMetric API (target passed as a parameter; metric decides its own change output). Squashed because the first commit's intermediate state (src/selection_cache.rs, SelectionView-by-value) is fully superseded by the second (src/selection_view.rs, Cow-backed cache, &SelectionView, compute_view()); replaying both would mean resolving the same BnbMetric merge against master twice. The flamegraph on fix/better-memory showed ~65% of run_bnb_lowest_fee time was in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) -- both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain. This makes the metric evaluator delta-aware. BnB maintains a SelectionCache (running aggregates over the selection: value_sum, weight_sum, input_count, segwit_count, candidate_count) per Branch; inclusion expansions call cache.add(c), which is O(1). The metric trait now takes a `&SelectionView<'_>` -- a handle over (&CoinSelector, Cow<SelectionCache>) -- with O(1) versions of every read method (selected_value, input_weight, excess, is_target_met, drain, ...). CoinSelector's ~15 duplicated O(|selected|) read methods collapse into a single `cs.compute_view()` entry point (fresh cache built once on demand); the BnB hot path borrows the per-branch cache instead (zero clone). Adapted to the master-side API changes: - score/bound/drain take `target: Target` as a parameter (metrics no longer store target); BnbIter threads its `target` through to the metric. - LowestFee owns the change decision (dust_relay_feerate + drain_weights, no change_policy) and implements `drain`; its delta-aware not-target-met bound loop is preserved. - Changeless<M> wraps an inner metric, using &SelectionView + target. All lib, integration and doc tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The original benches capped at 4k candidates for `clone` and 200 for `run_bnb_lowest_fee`. Real callers span a much wider range -- a typical wallet has ~1k UTXOs, a large exchange ~10M. For the O(n)-ish operations (`new`, `clone`, `compute_view`) extend the parameter list to 64 / 1k / 16k / 256k / 1M / 10M. These all scale roughly linearly: clone/64 51 ns clone/1024 52 ns clone/16384 260 ns clone/262144 2.0 us clone/1048576 6.3 us clone/10000000 98 us At 10M UTXOs the `Candidate` slice itself is ~320MB and the selector's `candidate_order` Vec is ~80MB -- commented as a heads-up for memory- constrained hosts. Add new groups: - `new`: cost of `CoinSelector::new(candidates)` -- allocations grow with pool size. - `compute_view`: cost of building a SelectionView. Scales with |selected| rather than |pool|; benched against a fixed sparse selection of ~100 candidates regardless of pool size, matching how wallets actually use selection. The BnB bench splits into two explicitly-named groups, because at n >= 200 best-first exploration does not complete any target-meeting selection within the round cap (run_bnb returns NoBnbSolution after exactly 100k rounds): - `run_bnb_lowest_fee` (n = 20/50/100): end-to-end solution finding. - `run_bnb_lowest_fee_exhaust_cap` (n = 200/500/1000): exactly MAX_ROUNDS rounds of frontier expansion (bound() + branch cloning) -- the hot path the delta-aware cache optimizes. Per-round cost grows roughly linearly: run_bnb_lowest_fee_exhaust_cap/200 193 ms run_bnb_lowest_fee_exhaust_cap/500 211 ms run_bnb_lowest_fee_exhaust_cap/1000 377 ms Each group asserts at startup that run_bnb's solution-found outcome matches what the group claims to measure, so a size silently flipping between the two paths (metric change, bound tightening) fails loudly instead of corrupting cross-version comparisons. 10M-scale BnB is intentionally not benchmarked: it's impractical at any finite round budget, and real callers pre-filter / pre-group at that scale. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Co-Authored-By: Claude Fable 5 <[email protected]>

The flamegraph after the delta-aware refactor showed ~32% of run_bnb time spent walking candidate_order and checking Bitset::contains(selected) || Bitset::contains(banned) per element, inlined into insert_new_branches's `cs.unselected().next()`. As BnB descends and more candidates get selected/banned, each .next() call scans further before finding the next viable candidate. But BnB never re-considers a position: each branch's exploration only moves forward in candidate_order. Inclusion advances by 1; exclusion advances past every consecutive same-(value, weight) candidate. So we can store a per-Branch cursor and avoid the scan entirely. Add `Branch::cursor: usize` (the position the branch will expand on next). The init branch starts at 0; insert_new_branches advances past any pre-selected/pre-banned positions on demand, then expands at the located cursor and hands children their new cursors directly. One subtlety: the exclusion branch's same-(value, weight) dedup run now walks raw candidate_order positions, where the old `unselected()` scan skipped already-decided candidates implicitly. The run must do the same explicitly -- skip pre-selected/pre-banned positions (advancing the cursor past them) rather than banning them or letting them end the run. Otherwise a caller-pre-selected candidate that duplicates an excluded one ends up simultaneously selected and banned in the selector that run_bnb hands back, and a pre-decided candidate inside a duplicate run fragments the equivalence class into redundant branches. Covered by `bnb_exclusion_dedup_skips_decided_candidates`. Bench (run_bnb_lowest_fee, n = pool size): n=20 166 us -> 147 us (11%) n=50 5.3 ms -> 4.5 ms (16%) n=100 12.6 ms -> 11.5 ms (9%) n=500 200 ms -> 171 ms (14%) n=1000 365 ms -> 248 ms (32%) (n >= 200 rows are the exhaust-cap group: fixed 100k rounds of frontier expansion, so they measure pure per-round cost.) Largest win at large n where the unselected scan was burning the most time. Flamegraph confirms the unselected-scan hot spot is gone; new top is LowestFee::bound itself (the metric's float math + lookahead). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Co-Authored-By: Claude Fable 5 <[email protected]>

evanlinjin force-pushed the perf/delta-aware-metric branch 3 times, most recently from 89b4255 to ec6a713 Compare July 2, 2026 17:21

evanlinjin and others added 3 commits July 3, 2026 10:19

evanlinjin force-pushed the perf/delta-aware-metric branch from ec6a713 to 07e2a36 Compare July 3, 2026 10:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf!: delta-aware BnbMetric via SelectionView/compute_view#53

perf!: delta-aware BnbMetric via SelectionView/compute_view#53
evanlinjin wants to merge 3 commits into
bitcoindevkit:masterfrom
evanlinjin:perf/delta-aware-metric

evanlinjin commented Jul 2, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

evanlinjin commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Benchmarks

Commits

API changes (breaking)

Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

evanlinjin commented Jul 2, 2026 •

edited

Loading