Fix O(n^2) scaling #11
Merged
Fix O(n^2) scaling in clonal selection inner loop

`clonal_selection_iteration_cpp` precomputed the full n x m affinity matrix once per iteration, then recomputed an entire n-length affinity column on every accepted mutation. Because the number of accepted mutations scales with the number of cells, that column recompute nested inside the per-point loop made each iteration O(n^2), the dominant cost at CyTOF and scRNA scale.

Recompute each point's 1 x m affinity row against the live antibody set instead, and drop the column patching. This costs the same O(n*m*d) per iteration as the original matrix precompute but stays linear in n, and it is numerically identical to the previous algorithm: a point always sees every mutation made by earlier points in the same pass, which is exactly what the column patch guaranteed.

Results are unchanged to 4 decimals (clustering silhouette and classification accuracy match the prior build at n = 500/1000/2000) and the full test suite passes. Timing at n=2000 drops from ~101s to ~1s, and scaling is now linear: n=16000 clustering runs in ~8s versus an extrapolated ~1.8h before (~101s x 8^2 for an 8x larger n under quadratic scaling).
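The equivalence argument above can be illustrated with a small sketch. This is not the code from this PR: the function names (`pass_column_patch`, `pass_row_recompute`), the affinity (negative squared Euclidean distance), and the deterministic "mutation" rule are all assumptions chosen only to make the two bookkeeping schemes comparable in a few lines. It counts affinity evaluations so the extra `accepted * n` term of the column patch is visible.

```python
import random


def affinity(x, a):
    # Assumed affinity for illustration: negative squared Euclidean distance.
    return -sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def mutate_toward(a, x, step=0.5):
    # Toy deterministic "mutation" (an assumption): move antibody a toward point x.
    return [ai + step * (xi - ai) for ai, xi in zip(a, x)]


def pass_column_patch(points, antibodies):
    """Old scheme: precompute the n x m matrix, patch a full column per acceptance."""
    live = [list(a) for a in antibodies]
    n, m = len(points), len(live)
    aff = [[affinity(x, a) for a in live] for x in points]
    evals, accepted = n * m, 0
    for i, x in enumerate(points):
        row = aff[i]
        j = max(range(m), key=row.__getitem__)  # best-matching antibody
        cand = mutate_toward(live[j], x)
        cand_aff = affinity(x, cand)
        evals += 1
        if cand_aff > row[j]:
            live[j] = cand
            accepted += 1
            # Column patch: n affinity evaluations for every accepted mutation.
            for k in range(n):
                aff[k][j] = affinity(points[k], cand)
            evals += n
    return live, evals, accepted


def pass_row_recompute(points, antibodies):
    """New scheme: recompute each point's 1 x m row against the live antibodies."""
    live = [list(a) for a in antibodies]
    m = len(live)
    evals, accepted = 0, 0
    for x in points:
        row = [affinity(x, a) for a in live]  # sees all earlier mutations
        evals += m
        j = max(range(m), key=row.__getitem__)
        cand = mutate_toward(live[j], x)
        cand_aff = affinity(x, cand)
        evals += 1
        if cand_aff > row[j]:
            live[j] = cand
            accepted += 1
    return live, evals, accepted


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [[rng.random() for _ in range(3)] for _ in range(200)]
    abs0 = [[rng.random() for _ in range(3)] for _ in range(5)]
    old_abs, old_evals, _ = pass_column_patch(pts, abs0)
    new_abs, new_evals, _ = pass_row_recompute(pts, abs0)
    print("identical antibodies:", old_abs == new_abs)
    print("affinity evaluations (old, new):", old_evals, new_evals)
```

In this toy setting both passes read the same affinity values at the moment each point is processed, so they make the same decisions; only the evaluation count differs, by `accepted * n`.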