perf!: delta-aware BnbMetric via SelectionView/compute_view#53
Draft
evanlinjin wants to merge 3 commits into
Draft
perf!: delta-aware BnbMetric via SelectionView/compute_view#53evanlinjin wants to merge 3 commits into
evanlinjin wants to merge 3 commits into
Conversation
89b4255 to
ec6a713
Compare
Squash of the two-commit perf series (cd1017a "delta-aware BnbMetric via SelectionView/SelectionCache" + 536a03a "replace duplicated CoinSelector read methods with compute_view()"), rebased onto the new BnbMetric API (target passed as a parameter; metric decides its own change output). Squashed because the first commit's intermediate state (src/selection_cache.rs, SelectionView-by-value) is fully superseded by the second (src/selection_view.rs, Cow-backed cache, &SelectionView, compute_view()); replaying both would mean resolving the same BnbMetric merge against master twice. The flamegraph on fix/better-memory showed ~65% of run_bnb_lowest_fee time was in cs.selected_value() (47.5%) and cs.input_weight() (17.3%) -- both O(|selected|) walks recomputed many times per branch via excess/is_target_met/drain. This makes the metric evaluator delta-aware. BnB maintains a SelectionCache (running aggregates over the selection: value_sum, weight_sum, input_count, segwit_count, candidate_count) per Branch; inclusion expansions call cache.add(c), which is O(1). The metric trait now takes a `&SelectionView<'_>` -- a handle over (&CoinSelector, Cow<SelectionCache>) -- with O(1) versions of every read method (selected_value, input_weight, excess, is_target_met, drain, ...). CoinSelector's ~15 duplicated O(|selected|) read methods collapse into a single `cs.compute_view()` entry point (fresh cache built once on demand); the BnB hot path borrows the per-branch cache instead (zero clone). Adapted to the master-side API changes: - score/bound/drain take `target: Target` as a parameter (metrics no longer store target); BnbIter threads its `target` through to the metric. - LowestFee owns the change decision (dust_relay_feerate + drain_weights, no change_policy) and implements `drain`; its delta-aware not-target-met bound loop is preserved. - Changeless<M> wraps an inner metric, using &SelectionView + target. All lib, integration and doc tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
The original benches capped at 4k candidates for `clone` and 200 for
`run_bnb_lowest_fee`. Real callers span a much wider range -- a typical
wallet has ~1k UTXOs, a large exchange ~10M.
For the O(n)-ish operations (`new`, `clone`, `compute_view`) extend the
parameter list to 64 / 1k / 16k / 256k / 1M / 10M. These all scale
roughly linearly:
clone/64 51 ns
clone/1024 52 ns
clone/16384 260 ns
clone/262144 2.0 us
clone/1048576 6.3 us
clone/10000000 98 us
At 10M UTXOs the `Candidate` slice itself is ~320MB and the selector's
`candidate_order` Vec is ~80MB -- commented as a heads-up for memory-
constrained hosts.
Add new groups:
- `new`: cost of `CoinSelector::new(candidates)` -- allocations grow with
pool size.
- `compute_view`: cost of building a SelectionView. Scales with
|selected| rather than |pool|; benched against a fixed sparse
selection of ~100 candidates regardless of pool size, matching how
wallets actually use selection.
The BnB bench splits into two explicitly-named groups, because at
n >= 200 best-first exploration does not complete any target-meeting
selection within the round cap (run_bnb returns NoBnbSolution after
exactly 100k rounds):
- `run_bnb_lowest_fee` (n = 20/50/100): end-to-end solution finding.
- `run_bnb_lowest_fee_exhaust_cap` (n = 200/500/1000): exactly
MAX_ROUNDS rounds of frontier expansion (bound() + branch cloning) --
the hot path the delta-aware cache optimizes. Per-round cost grows
roughly linearly:
run_bnb_lowest_fee_exhaust_cap/200 193 ms
run_bnb_lowest_fee_exhaust_cap/500 211 ms
run_bnb_lowest_fee_exhaust_cap/1000 377 ms
Each group asserts at startup that run_bnb's solution-found outcome
matches what the group claims to measure, so a size silently flipping
between the two paths (metric change, bound tightening) fails loudly
instead of corrupting cross-version comparisons.
10M-scale BnB is intentionally not benchmarked: it's impractical at any
finite round budget, and real callers pre-filter / pre-group at that
scale.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Fable 5 <[email protected]>
The flamegraph after the delta-aware refactor showed ~32% of run_bnb time spent walking candidate_order and checking Bitset::contains(selected) || Bitset::contains(banned) per element, inlined into insert_new_branches's `cs.unselected().next()`. As BnB descends and more candidates get selected/banned, each .next() call scans further before finding the next viable candidate. But BnB never re-considers a position: each branch's exploration only moves forward in candidate_order. Inclusion advances by 1; exclusion advances past every consecutive same-(value, weight) candidate. So we can store a per-Branch cursor and avoid the scan entirely. Add `Branch::cursor: usize` (the position the branch will expand on next). The init branch starts at 0; insert_new_branches advances past any pre-selected/pre-banned positions on demand, then expands at the located cursor and hands children their new cursors directly. One subtlety: the exclusion branch's same-(value, weight) dedup run now walks raw candidate_order positions, where the old `unselected()` scan skipped already-decided candidates implicitly. The run must do the same explicitly -- skip pre-selected/pre-banned positions (advancing the cursor past them) rather than banning them or letting them end the run. Otherwise a caller-pre-selected candidate that duplicates an excluded one ends up simultaneously selected and banned in the selector that run_bnb hands back, and a pre-decided candidate inside a duplicate run fragments the equivalence class into redundant branches. Covered by `bnb_exclusion_dedup_skips_decided_candidates`. Bench (run_bnb_lowest_fee, n = pool size): n=20 166 us -> 147 us (11%) n=50 5.3 ms -> 4.5 ms (16%) n=100 12.6 ms -> 11.5 ms (9%) n=500 200 ms -> 171 ms (14%) n=1000 365 ms -> 248 ms (32%) (n >= 200 rows are the exhaust-cap group: fixed 100k rounds of frontier expansion, so they measure pure per-round cost.) Largest win at large n where the unselected scan was burning the most time. Flamegraph confirms the unselected-scan hot spot is gone; new top is LowestFee::bound itself (the metric's float math + lookahead). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Co-Authored-By: Claude Fable 5 <[email protected]>
ec6a713 to
07e2a36
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes the branch-and-bound metric evaluator delta-aware, eliminating the dominant cost in
run_bnb. A flamegraph showed ~65% ofrun_bnb_lowest_feetime was spent incs.selected_value()(47.5%) andcs.input_weight()(17.3%) — bothO(|selected|)walks recomputed many times per branch viaexcess/is_target_met/drain.Now BnB maintains a per-branch
SelectionCache(running aggregates:value_sum,weight_sum,input_count,segwit_count,candidate_count). Inclusion expansions callcache.add(c)—O(1). Metrics receive a&SelectionView<'_>(a handle over&CoinSelector+Cow<SelectionCache>) whose read methods (selected_value,input_weight,excess,is_target_met,drain, …) are allO(1).Outside the hot path,
cs.compute_view()builds a fresh cache once on demand, collapsing the ~15 duplicatedO(|selected|)read methods that previously lived onCoinSelectorinto a single entry point.Benchmarks
Criterion median; delta-aware vs the bitset baseline, vs the pre-bitset master:
The gain is largest at large
n, whereselected_value/input_weightrecomputation dominated. (The n=200 row measures the round-capped exploration path — see the bench commit for the explicit solution-finding vs cap-exhaustion group split.)Commits
Each commit independently passes the full CI gate (fmt, clippy, doc
-D warnings,--releasetests,--no-default-featuresbuild), so the branch is bisectable.perf!: delta-aware BnbMetric via SelectionView/compute_view— the core change. IntroducesSelectionCache/SelectionView, moves the read methods offCoinSelectorbehindcompute_view(), and makes theBnbMetrictrait take&SelectionView. Adapted to the current trait API (targetpassed as a parameter; metric owns its change decision viadrain();Changeless<M>). The cache proptest validates against an independent O(n) oracle (the originalinput_weightwalk), not just a from-scratch cache rebuild.bench: extend pool sizes to wallet (1k) and exchange (10M) scale— also splits the BnB bench intorun_bnb_lowest_fee(sizes that find solutions) andrun_bnb_lowest_fee_exhaust_cap(sizes where best-first search exhausts the 100k-round cap), with startup assertions pinning which path each group measures.perf: skip unselected scan in BnbIter via per-branch cursor— eachBranchcarries a cursor intocandidate_order, soinsert_new_branchesjumps straight to the next undecided candidate. The exclusion branch's duplicate-dedup run explicitly skips already-decided candidates (matching the oldunselected()semantics) — covered by a regression test with a pre-selected duplicate.API changes (breaking)
SelectionView<'_>;SelectionCacheis internal.BnbMetric::{score,bound,drain}now take&SelectionView<'_>instead of&CoinSelector<'_>.selected_value,input_weight,weight,excess+ variants,is_target_met(+_with_drain),fee,implied_fee,implied_feerate,missing,drain_value,drain,effective_value,waste) move fromCoinSelectortoSelectionView; reach them viacs.compute_view().is_selection_possibleis renamedSelectionView::is_target_reachable.Review
Beyond CI, this branch went through a 60-agent adversarial review (10 dimensions incl. differential execution of branch vs master vs brute-force exhaustive search over seeded random scenarios; every finding independently verified by 3 reviewers). All confirmed findings are fixed in the commits above. Two pre-existing issues (present on master, not regressions) surfaced and are not addressed here:
LowestFee::bound'sassert!(ideal_fee >= 0.0)can panic on valid inputs due to f32 rounding.(value, weight)only, treating candidates differing inis_segwit/input_countas equivalent, which can yield slightly suboptimal results vs exhaustive search.🤖 Generated with Claude Code