perf!: delta-aware BnbMetric via SelectionView/compute_view #53

Draft
evanlinjin wants to merge 3 commits into bitcoindevkit:master from evanlinjin:perf/delta-aware-metric
evanlinjin commented Jul 2, 2026

Member

Summary

Makes the branch-and-bound metric evaluator delta-aware, eliminating the dominant cost in run_bnb. A flamegraph showed ~65% of run_bnb_lowest_fee time was spent in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) — both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain.

Now BnB maintains a per-branch SelectionCache (running aggregates: value_sum, weight_sum, input_count, segwit_count, candidate_count). Inclusion expansions call cache.add(c), which is O(1). Metrics receive a &SelectionView<'_> (a handle over &CoinSelector + Cow<SelectionCache>) whose read methods (selected_value, input_weight, excess, is_target_met, drain, …) are all O(1).

Outside the hot path, cs.compute_view() builds a fresh cache once on demand, collapsing the ~15 duplicated O(|selected|) read methods that previously lived on CoinSelector into a single entry point.
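
To make the shape of the change concrete, here is a minimal sketch of a running-aggregate cache and a Cow-backed view. The type and field names follow the description above, but every body is an assumption, not the code in this PR: `Candidate` is a hypothetical stand-in, the view omits the `&CoinSelector` half of the real handle, and `input_weight` is simplified to a plain sum.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a coin-selection candidate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub value: u64,
    pub weight: u64,
    pub input_count: usize,
    pub is_segwit: bool,
}

/// Running aggregates over the selected candidates.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct SelectionCache {
    pub value_sum: u64,
    pub weight_sum: u64,
    pub input_count: usize,
    pub segwit_count: usize,
    pub candidate_count: usize,
}

impl SelectionCache {
    /// O(1): fold one newly selected candidate into the aggregates.
    pub fn add(&mut self, c: &Candidate) {
        self.value_sum += c.value;
        self.weight_sum += c.weight;
        self.input_count += c.input_count;
        self.segwit_count += usize::from(c.is_segwit);
        self.candidate_count += 1;
    }

    /// O(|selected|): build a fresh cache, as an on-demand view would.
    pub fn from_selected<'a>(selected: impl IntoIterator<Item = &'a Candidate>) -> Self {
        let mut cache = Self::default();
        for c in selected {
            cache.add(c);
        }
        cache
    }
}

/// Read-only view: borrows the per-branch cache on the hot path,
/// owns a freshly built one when computed on demand.
pub struct SelectionView<'a> {
    cache: Cow<'a, SelectionCache>,
}

impl<'a> SelectionView<'a> {
    pub fn borrowed(cache: &'a SelectionCache) -> Self {
        Self { cache: Cow::Borrowed(cache) }
    }

    pub fn owned(cache: SelectionCache) -> Self {
        Self { cache: Cow::Owned(cache) }
    }

    /// O(1) read of the cached value sum.
    pub fn selected_value(&self) -> u64 {
        self.cache.value_sum
    }

    /// O(1) read of the cached weight sum (simplified: no per-transaction overhead).
    pub fn input_weight(&self) -> u64 {
        self.cache.weight_sum
    }
}
```

In this sketch the borrowed constructor corresponds to the BnB hot path (no clone of the cache), and the owned constructor corresponds to what a `compute_view()`-style entry point would return.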

Benchmarks

Criterion medians for three states: the pre-bitset master, the bitset baseline, and this branch (delta-aware). Total speedup compares delta-aware against master:

| bench | master | bitset | delta-aware | total speedup |
|---|---|---|---|---|
| clone/4096 | 5.68 µs | 58 ns | 52 ns | 109× |
| run_bnb_lowest_fee/20 | 470 µs | 417 µs | 158 µs | 3.0× |
| run_bnb_lowest_fee/50 | 26.0 ms | 10.3 ms | 5.4 ms | 4.8× |
| run_bnb_lowest_fee/100 | 73.2 ms | 40.2 ms | 14.2 ms | 5.2× |
| run_bnb_lowest_fee/200 | 1.27 s | 1.10 s | 196 ms | 6.5× |

The gain is largest at large n, where selected_value/input_weight recomputation dominated. (The n=200 row measures the round-capped exploration path — see the bench commit for the explicit solution-finding vs cap-exhaustion group split.)

Commits

Each commit independently passes the full CI gate (fmt, clippy, doc -D warnings, --release tests, --no-default-features build), so the branch is bisectable.

  1. perf!: delta-aware BnbMetric via SelectionView/compute_view — the core change. Introduces SelectionCache/SelectionView, moves the read methods off CoinSelector behind compute_view(), and makes the BnbMetric trait take &SelectionView. Adapted to the current trait API (target passed as a parameter; metric owns its change decision via drain(); Changeless<M>). The cache proptest validates against an independent O(n) oracle (the original input_weight walk), not just a from-scratch cache rebuild.
  2. bench: extend pool sizes to wallet (1k) and exchange (10M) scale — also splits the BnB bench into run_bnb_lowest_fee (sizes that find solutions) and run_bnb_lowest_fee_exhaust_cap (sizes where best-first search exhausts the 100k-round cap), with startup assertions pinning which path each group measures.
  3. perf: skip unselected scan in BnbIter via per-branch cursor — each Branch carries a cursor into candidate_order, so insert_new_branches jumps straight to the next undecided candidate. The exclusion branch's duplicate-dedup run explicitly skips already-decided candidates (matching the old unselected() semantics) — covered by a regression test with a pre-selected duplicate.
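
The oracle check mentioned for commit 1 could look roughly like the sketch below. It is not the PR's proptest: `Running`, `oracle`, and the small LCG are hypothetical names chosen so the example needs no external crates. The point it illustrates is that the oracle walks the selection on its own and never calls the incremental `add`, so a bug in `add` cannot hide in both sides of the comparison.

```rust
/// Hypothetical candidate: (value, weight).
type Candidate = (u64, u64);

/// Incrementally maintained sums, updated in O(1) per inclusion.
#[derive(Default)]
struct Running {
    value_sum: u64,
    weight_sum: u64,
}

impl Running {
    fn add(&mut self, c: Candidate) {
        self.value_sum += c.0;
        self.weight_sum += c.1;
    }
}

/// Independent O(n) oracle: walks the selection and never calls `Running::add`.
fn oracle(pool: &[Candidate], selected: &[usize]) -> (u64, u64) {
    selected
        .iter()
        .fold((0, 0), |(v, w), &i| (v + pool[i].0, w + pool[i].1))
}

/// Tiny deterministic LCG so the sketch has no dependencies.
fn lcg(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Runs `cases` seeded scenarios; returns true if the running sums
/// matched the oracle after every single inclusion.
fn cache_matches_oracle(seed: u64, cases: usize) -> bool {
    let mut rng = seed;
    for _ in 0..cases {
        let n = 1 + (lcg(&mut rng) % 16) as usize;
        let pool: Vec<Candidate> = (0..n)
            .map(|_| (lcg(&mut rng) % 100_000, 1 + lcg(&mut rng) % 1_000))
            .collect();
        let mut running = Running::default();
        let mut selected = Vec::new();
        for i in 0..n {
            if lcg(&mut rng) % 2 == 0 {
                running.add(pool[i]);
                selected.push(i);
                if (running.value_sum, running.weight_sum) != oracle(&pool, &selected) {
                    return false;
                }
            }
        }
    }
    true
}
```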

API changes (breaking)

  • New public SelectionView<'_>; SelectionCache is internal.
  • BnbMetric::{score,bound,drain} now take &SelectionView<'_> instead of &CoinSelector<'_>.
  • The read methods (selected_value, input_weight, weight, excess + variants, is_target_met(+_with_drain), fee, implied_fee, implied_feerate, missing, drain_value, drain, effective_value, waste) move from CoinSelector to SelectionView; reach them via cs.compute_view(). is_selection_possible is renamed SelectionView::is_target_reachable.

Review

Beyond CI, this branch went through a 60-agent adversarial review (10 dimensions incl. differential execution of branch vs master vs brute-force exhaustive search over seeded random scenarios; every finding independently verified by 3 reviewers). All confirmed findings are fixed in the commits above. Two pre-existing issues (present on master, not regressions) surfaced and are not addressed here:

  • LowestFee::bound's assert!(ideal_fee >= 0.0) can panic on valid inputs due to f32 rounding.
  • BnB's exclusion dedup keys on (value, weight) only, treating candidates differing in is_segwit/input_count as equivalent, which can yield slightly suboptimal results vs exhaustive search.
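
As an illustration of the second issue, the sketch below compares a (value, weight) key with a wider key on a pool where two candidates differ only in is_segwit. The `Candidate` struct, `narrow_key`, `wide_key`, and `run_len` are hypothetical names for this example; the wider key is only one possible remedy and is not something this PR proposes.

```rust
/// Hypothetical stand-in for a coin-selection candidate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Candidate {
    value: u64,
    weight: u64,
    input_count: usize,
    is_segwit: bool,
}

/// The key described above: only value and weight.
fn narrow_key(c: &Candidate) -> (u64, u64) {
    (c.value, c.weight)
}

/// A wider key (an assumption, not the PR's code) that separates every field.
fn wide_key(c: &Candidate) -> (u64, u64, usize, bool) {
    (c.value, c.weight, c.input_count, c.is_segwit)
}

/// Number of consecutive candidates, starting at `start`, that share the
/// key of `pool[start]`. Panics if `start` is out of bounds.
fn run_len<K: PartialEq>(
    pool: &[Candidate],
    start: usize,
    key: impl Fn(&Candidate) -> K,
) -> usize {
    let first = key(&pool[start]);
    pool[start..]
        .iter()
        .take_while(|c| key(*c) == first)
        .count()
}
```

With the narrow key, excluding the first candidate of such a pool would also exclude its neighbour that differs only in is_segwit; with the wide key the two stay in separate runs.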

🤖 Generated with Claude Code

evanlinjin force-pushed the perf/delta-aware-metric branch 3 times, most recently from 89b4255 to ec6a713 on July 2, 2026 17:21
evanlinjin and others added 3 commits July 3, 2026 10:19
Squash of the two-commit perf series (cd1017a "delta-aware BnbMetric via
SelectionView/SelectionCache" + 536a03a "replace duplicated CoinSelector
read methods with compute_view()"), rebased onto the new BnbMetric API
(target passed as a parameter; metric decides its own change output).

Squashed because the first commit's intermediate state (src/selection_cache.rs,
SelectionView-by-value) is fully superseded by the second (src/selection_view.rs,
Cow-backed cache, &SelectionView, compute_view()); replaying both would mean
resolving the same BnbMetric merge against master twice.

The flamegraph on fix/better-memory showed ~65% of run_bnb_lowest_fee time
was in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) -- both
O(|selected|) walks recomputed many times per branch via
excess/is_target_met/drain.

This makes the metric evaluator delta-aware. BnB maintains a SelectionCache
(running aggregates over the selection: value_sum, weight_sum, input_count,
segwit_count, candidate_count) per Branch; inclusion expansions call
cache.add(c), which is O(1). The metric trait now takes a
`&SelectionView<'_>` -- a handle over (&CoinSelector, Cow<SelectionCache>)
-- with O(1) versions of every read method (selected_value, input_weight,
excess, is_target_met, drain, ...).

CoinSelector's ~15 duplicated O(|selected|) read methods collapse into a
single `cs.compute_view()` entry point (fresh cache built once on demand);
the BnB hot path borrows the per-branch cache instead (zero clone).

Adapted to the master-side API changes:
- score/bound/drain take `target: Target` as a parameter (metrics no longer
  store target); BnbIter threads its `target` through to the metric.
- LowestFee owns the change decision (dust_relay_feerate + drain_weights,
  no change_policy) and implements `drain`; its delta-aware not-target-met
  bound loop is preserved.
- Changeless<M> wraps an inner metric, using &SelectionView + target.

All lib, integration and doc tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The original benches capped at 4k candidates for `clone` and 200 for
`run_bnb_lowest_fee`. Real callers span a much wider range -- a typical
wallet has ~1k UTXOs, a large exchange ~10M.

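For the O(n)-ish operations (`new`, `clone`, `compute_view`) extend the
parameter list to 64 / 1k / 16k / 256k / 1M / 10M. These all scale
roughly linearly at large sizes (`clone` shown; it is flat up to ~1k
candidates, presumably because fixed overhead dominates there):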

  clone/64        51 ns
  clone/1024      52 ns
  clone/16384    260 ns
  clone/262144   2.0 us
  clone/1048576  6.3 us
  clone/10000000  98 us

At 10M UTXOs the `Candidate` slice itself is ~320MB and the selector's
`candidate_order` Vec is ~80MB -- commented as a heads-up for memory-
constrained hosts.
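
The memory figures above follow from element size times element count.
A small sketch of that arithmetic, assuming a 32-byte candidate and
8-byte order entries (as on a 64-bit target); the `Candidate` layout
below is hypothetical and the real struct may differ:

```rust
use std::mem::size_of;

/// Hypothetical candidate layout: 25 bytes of fields, which 8-byte
/// alignment rounds up to 32 bytes.
#[allow(dead_code)]
struct Candidate {
    value: u64,
    weight: u64,
    input_count: u64,
    is_segwit: bool,
}

/// Bytes needed for `n` elements of `T`, ignoring allocator overhead.
fn slice_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}
```

Under these assumptions, `slice_bytes::<Candidate>(10_000_000)` comes to
320 MB and `slice_bytes::<u64>(10_000_000)` to 80 MB, matching the
estimates in the text.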

Add new groups:

- `new`: cost of `CoinSelector::new(candidates)` -- allocations grow with
  pool size.
- `compute_view`: cost of building a SelectionView. Scales with
  |selected| rather than |pool|; benched against a fixed sparse
  selection of ~100 candidates regardless of pool size, matching how
  wallets actually use selection.

The BnB bench splits into two explicitly-named groups, because at
n >= 200 best-first exploration does not complete any target-meeting
selection within the round cap (run_bnb returns NoBnbSolution after
exactly 100k rounds):

- `run_bnb_lowest_fee` (n = 20/50/100): end-to-end solution finding.
- `run_bnb_lowest_fee_exhaust_cap` (n = 200/500/1000): exactly
  MAX_ROUNDS rounds of frontier expansion (bound() + branch cloning) --
  the hot path the delta-aware cache optimizes. Per-round cost grows
  with pool size, though less than linearly in these runs:

    run_bnb_lowest_fee_exhaust_cap/200   193 ms
    run_bnb_lowest_fee_exhaust_cap/500   211 ms
    run_bnb_lowest_fee_exhaust_cap/1000  377 ms

Each group asserts at startup that run_bnb's solution-found outcome
matches what the group claims to measure, so a size silently flipping
between the two paths (metric change, bound tightening) fails loudly
instead of corrupting cross-version comparisons.
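
A startup check of this kind might be sketched as follows. The names
`Expect` and `check_path` are hypothetical, and `NoBnbSolution` here is
a local stand-in with assumed fields rather than the library's type; a
bench setup would call `check_path(...)` and unwrap the result before
registering the group.

```rust
/// Which path a bench group claims to measure.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Expect {
    FindsSolution,
    ExhaustsCap,
}

/// Local stand-in for the "no solution" error (fields are assumptions).
#[derive(Debug, PartialEq, Eq)]
struct NoBnbSolution {
    max_rounds: usize,
    rounds: usize,
}

/// Returns Ok if the observed outcome matches what the group claims to
/// measure, otherwise an error message naming the group and size.
fn check_path(
    group: &str,
    n: usize,
    outcome: &Result<(), NoBnbSolution>,
    expect: Expect,
) -> Result<(), String> {
    let actual = match outcome {
        Ok(()) => Expect::FindsSolution,
        // Only count it as cap exhaustion if every round was used.
        Err(e) if e.rounds == e.max_rounds => Expect::ExhaustsCap,
        Err(e) => {
            return Err(format!(
                "{group}/{n}: search ended after {} of {} rounds without a solution",
                e.rounds, e.max_rounds
            ))
        }
    };
    if actual == expect {
        Ok(())
    } else {
        Err(format!("{group}/{n}: expected {expect:?}, got {actual:?}"))
    }
}
```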

10M-scale BnB is intentionally not benchmarked: it's impractical at any
finite round budget, and real callers pre-filter / pre-group at that
scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Fable 5 <[email protected]>

The flamegraph after the delta-aware refactor showed ~32% of run_bnb
time spent walking candidate_order and checking
Bitset::contains(selected) || Bitset::contains(banned) per element,
inlined into insert_new_branches's `cs.unselected().next()`. As BnB
descends and more candidates get selected/banned, each .next() call
scans further before finding the next viable candidate.

But BnB never re-considers a position: each branch's exploration only
moves forward in candidate_order. Inclusion advances by 1; exclusion
advances past every consecutive same-(value, weight) candidate. So we
can store a per-Branch cursor and avoid the scan entirely.

Add `Branch::cursor: usize` (the position the branch will expand on
next). The init branch starts at 0; insert_new_branches advances past
any pre-selected/pre-banned positions on demand, then expands at the
located cursor and hands children their new cursors directly.

One subtlety: the exclusion branch's same-(value, weight) dedup run now
walks raw candidate_order positions, where the old `unselected()` scan
skipped already-decided candidates implicitly. The run must do the same
explicitly -- skip pre-selected/pre-banned positions (advancing the
cursor past them) rather than banning them or letting them end the run.
Otherwise a caller-pre-selected candidate that duplicates an excluded
one ends up simultaneously selected and banned in the selector that
run_bnb hands back, and a pre-decided candidate inside a duplicate run
fragments the equivalence class into redundant branches. Covered by
`bnb_exclusion_dedup_skips_decided_candidates`.
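
The exclusion-run rule above can be sketched in isolation. This is a
simplified model, not the BnbIter code: `State`, `next_undecided`, and
`exclude_run` are hypothetical names, and plain `Vec<bool>` flags stand
in for the bitsets.

```rust
/// (value, weight) key used to detect duplicate candidates.
type Key = (u64, u64);

/// Simplified selector state, indexed by position in candidate order.
struct State {
    keys: Vec<Key>,
    selected: Vec<bool>,
    banned: Vec<bool>,
}

impl State {
    fn is_decided(&self, pos: usize) -> bool {
        self.selected[pos] || self.banned[pos]
    }

    /// Advance `cursor` past decided positions; returns `keys.len()`
    /// when no undecided candidate remains.
    fn next_undecided(&self, mut cursor: usize) -> usize {
        while cursor < self.keys.len() && self.is_decided(cursor) {
            cursor += 1;
        }
        cursor
    }

    /// Exclusion step: ban the undecided candidate at `cursor` and every
    /// later undecided candidate with the same key, stopping at the first
    /// undecided candidate whose key differs. Decided positions are
    /// skipped: they are neither banned nor allowed to end the run.
    /// Returns the cursor for the child branch.
    fn exclude_run(&mut self, cursor: usize) -> usize {
        let key = self.keys[cursor];
        let mut pos = cursor;
        while pos < self.keys.len() {
            if self.is_decided(pos) {
                pos += 1;
                continue;
            }
            if self.keys[pos] != key {
                break;
            }
            self.banned[pos] = true;
            pos += 1;
        }
        pos
    }
}
```

In this model an inclusion child would simply take `cursor + 1`, while
the exclusion child takes the value returned by `exclude_run`, so
neither needs to rescan from the start of the order.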

Bench (run_bnb_lowest_fee, n = pool size):

  n=20     166 us -> 147 us   (11%)
  n=50     5.3 ms -> 4.5 ms   (16%)
  n=100   12.6 ms -> 11.5 ms  ( 9%)
  n=500    200 ms -> 171 ms   (14%)
  n=1000   365 ms -> 248 ms   (32%)

(n >= 200 rows are the exhaust-cap group: fixed 100k rounds of frontier
expansion, so they measure pure per-round cost.)

Largest win at large n where the unselected scan was burning the most
time. Flamegraph confirms the unselected-scan hot spot is gone; new top
is LowestFee::bound itself (the metric's float math + lookahead).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Fable 5 <[email protected]>
evanlinjin force-pushed the perf/delta-aware-metric branch from ec6a713 to 07e2a36 on July 3, 2026 10:24