Fix O(n^2) scaling by ncborcherding · Pull Request #11 · BorchLab/bHIVE

ncborcherding · 2026-06-17T16:47:55Z

Fix O(n^2) scaling in clonal selection inner loop

clonal_selection_iteration_cpp precomputed the full n x m affinity matrix once per iteration, then recomputed an entire n-length affinity column on every accepted mutation. Because the number of accepted mutations scales with the number of cells, that column recompute, nested inside the per-point loop, made each iteration O(n^2), which was the dominant cost at CyTOF and scRNA scale.

Recompute each point's 1 x m affinity row against the live antibody set instead, and drop the column patching. Total work per iteration stays at O(n*m*d), the cost of the original precompute alone, so it is now linear in n. The result is numerically identical to the previous algorithm: a point always sees every mutation made by earlier points in the same pass, which is exactly what the column patch guaranteed.
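The row-recompute idea can be illustrated with a minimal standalone sketch. This is not the code in clonal_selection_iteration_cpp: the names affinity and selection_pass, the negative squared Euclidean affinity, and the "move the best antibody toward the point" mutation rule are all assumptions made for illustration. The only part taken from the description above is the structure, where each point recomputes its own 1 x m row against the live antibody set and no n-length column is ever patched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major storage: one inner vector per point or per antibody.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical affinity: negative squared Euclidean distance (higher is better).
double affinity(const std::vector<double>& x, const std::vector<double>& ab) {
    assert(x.size() == ab.size());
    double s = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const double diff = x[k] - ab[k];
        s -= diff * diff;
    }
    return s;
}

// One pass over all n points; returns the number of accepted mutations.
// Each point costs O(m*d), so the pass is O(n*m*d) and linear in n.
std::size_t selection_pass(const Matrix& X, Matrix& antibodies, double rate) {
    assert(!antibodies.empty());
    std::size_t accepted = 0;
    std::vector<double> row(antibodies.size());

    for (const auto& x : X) {
        // Recompute this point's 1 x m affinity row against the live antibodies,
        // so mutations accepted for earlier points are already visible here.
        std::size_t best = 0;
        for (std::size_t j = 0; j < antibodies.size(); ++j) {
            row[j] = affinity(x, antibodies[j]);
            if (row[j] > row[best]) best = j;
        }

        // Hypothetical mutation: move the best antibody a fraction toward x.
        std::vector<double> candidate = antibodies[best];
        for (std::size_t k = 0; k < candidate.size(); ++k) {
            candidate[k] += rate * (x[k] - candidate[k]);
        }

        // Accept only if affinity improves; no column of other points is touched.
        if (affinity(x, candidate) > row[best]) {
            antibodies[best] = candidate;
            ++accepted;
        }
    }
    return accepted;
}
```

With two identical points at the origin, one antibody at (4, 4), and rate 0.5, the second point sees the antibody already moved to (2, 2) by the first and moves it on to (1, 1), which is the "live antibody set" behavior the description relies on.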

Results are unchanged to 4 decimals (clustering silhouette and classification accuracy match the prior build at n = 500/1000/2000) and the full test suite passes. Timing at n=2000 drops from ~101s to ~1s, and scaling is now linear: n=16000 clustering runs in ~8s versus an extrapolated ~1.8h before.
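As a rough consistency check on these timings (an assumption about how the extrapolation was made, not something stated in this PR), pure quadratic scaling from the old n = 2000 timing and pure linear scaling from the new one give:

$$
t_{\text{old}}(16000) \approx 101\,\text{s} \times \left(\tfrac{16000}{2000}\right)^{2} = 101\,\text{s} \times 64 = 6464\,\text{s} \approx 1.8\,\text{h}
$$

$$
t_{\text{new}}(16000) \approx 1\,\text{s} \times \tfrac{16000}{2000} = 8\,\text{s}
$$

Both agree with the figures quoted above.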


ncborcherding merged commit fae4ae7 into main Jun 17, 2026
3 checks passed

ncborcherding deleted the inner-loop-selection branch June 17, 2026 16:51
