Add metric guard #12
Merged
AINet matured antibodies by maximizing affinityFunc but assigned clusters with distFunc. When the two used different geometries, the model optimized in one space and assigned in another. The worst case paired cosine affinity, which scores only angle and leaves antibody magnitude free to random-walk, with a Euclidean distance. Antibodies drifted far off the data manifold (L2 norm ~88 against a data norm of ~5), every point collapsed onto one antibody, and clustering returned a single cluster with ARI 0. Two changes address this:

- Metric guard in initialize(). A cosine vs non-cosine mismatch between affinityFunc and distFunc now warns and aligns the distance to the affinity's natural metric, so clustering no longer collapses silently.
- Consolidation M-step for clustering (consolidate=TRUE, consolidationSteps=10). After affinity maturation, the assignment is seeded by affinity, which stays robust to off-manifold antibodies, and then refined with Euclidean Lloyd iterations so the prototypes become true data centroids. The repertoire keeps the matured antibodies and their metadata, so memory archiving is unchanged. Only result$antibodies reports the consolidated prototypes.
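The two changes can be sketched in a few lines. This is a minimal Python illustration, not the package's R implementation: align_metric, consolidate, the string metric names (including the hypothetical "gaussian" affinity), and the toy data are all assumptions. The sketch also assumes "euclidean" as the Euclidean-family fallback and that an empty cluster keeps its previous prototype during refinement.

```python
import math
import warnings

def cosine_affinity(a, b):
    """Cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def align_metric(affinity_func, dist_func):
    """Hypothetical metric guard: align distFunc with affinityFunc's geometry.

    Assumption: metrics are named by strings, and "euclidean" stands in
    for the Euclidean-family fallback.
    """
    if (affinity_func == "cosine") == (dist_func == "cosine"):
        return dist_func  # both cosine or both non-cosine: nothing to fix
    aligned = "cosine" if affinity_func == "cosine" else "euclidean"
    warnings.warn(
        f"affinityFunc={affinity_func!r} and distFunc={dist_func!r} use "
        f"different geometries; using distFunc={aligned!r}"
    )
    return aligned

def consolidate(points, antibodies, affinity, steps=10):
    """Hypothetical consolidation: seed by affinity, then Euclidean Lloyd steps.

    Returns new prototypes and labels; `antibodies` itself is not modified.
    """
    k = len(antibodies)
    # Seed: each point goes to its highest-affinity antibody. With cosine
    # affinity this ignores how far an antibody has drifted in magnitude.
    labels = [max(range(k), key=lambda j: affinity(p, antibodies[j])) for p in points]
    prototypes = [tuple(ab) for ab in antibodies]
    for _ in range(steps):
        # M-step: move each prototype to the centroid of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old prototype
                prototypes[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Reassign every point to its nearest prototype (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, prototypes[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return prototypes, labels

# Hypothetical toy data: two groups of norm ~5, antibodies with drifted norms.
points = [(5.0, 0.5), (4.5, 1.0), (0.5, 5.0), (1.0, 4.5)]
antibodies = [(90.0, 5.0), (2.0, 20.0)]

dist_func = align_metric("cosine", "euclidean")  # warns, returns "cosine"
prototypes, labels = consolidate(points, antibodies, cosine_affinity, steps=10)
print(dist_func)   # cosine
print(labels)      # [0, 0, 1, 1]
print(prototypes)  # [(4.75, 0.75), (0.75, 4.75)]
```

The input antibodies list is left untouched, mirroring how the repertoire keeps the matured antibodies while only the reported prototypes move onto the data centroids.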
Summary
Fixes two coupled failure modes in AINet clustering: a silent collapse to a single cluster when affinity and distance use different geometries, and prototypes that drift far off the data manifold and stop corresponding to the data.
Problem
AINet matures antibodies to maximize affinityFunc but assigns clusters with distFunc. With cosine affinity (which scores only angle and never constrains magnitude) paired with a Euclidean distance, antibodies random-walk to ~18x the data scale. On WDBC this collapsed every point onto one antibody (k=1, ARI 0, antibody L2 norm ~88 vs. data ~5). The grid still ranked these configs best by silhouette, so selection rewarded the degenerate runs, and the interpretability figure showed prototypes nowhere near the data.
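The mismatch can be reproduced on toy data. The Python sketch below is illustrative only (AINet itself is R); the points, the antibody vectors, and the helpers cosine_affinity and assign are hypothetical, chosen so that each antibody points along one group's direction but has an arbitrary norm, as cosine maturation permits.

```python
import math

def cosine_affinity(a, b):
    """Cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def assign(points, antibodies, score, higher_is_better):
    """Label each point with the index of its best-scoring antibody."""
    pick = max if higher_is_better else min
    labels = []
    for p in points:
        scores = [score(p, ab) for ab in antibodies]
        labels.append(scores.index(pick(scores)))
    return labels

# Hypothetical toy data: two well-separated groups with norm ~5.
points = [(5.0, 0.5), (4.5, 1.0), (0.5, 5.0), (1.0, 4.5)]

# Antibodies aligned with each group's direction but with unconstrained
# magnitudes (norms ~90 and ~20).
antibodies = [(90.0, 5.0), (2.0, 20.0)]

# Scaling an antibody by 18 leaves its cosine affinity unchanged.
p, ab = points[0], antibodies[0]
assert math.isclose(cosine_affinity(p, ab), cosine_affinity(p, tuple(18 * x for x in ab)))

by_affinity = assign(points, antibodies, cosine_affinity, higher_is_better=True)
by_distance = assign(points, antibodies, math.dist, higher_is_better=False)
print(by_affinity)  # [0, 0, 1, 1]
print(by_distance)  # [1, 1, 1, 1]
```

Under cosine affinity the two groups separate, but under Euclidean distance the smaller-norm antibody is nearest to every point, so all points land in one cluster.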
Changes
- Metric guard (initialize): a cosine vs non-cosine mismatch between affinityFunc and distFunc warns and aligns the distance to the affinity's natural metric (cosine to cosine, otherwise Euclidean-family). This prevents the collapse.
- Consolidation M-step (consolidate, consolidationSteps; clustering only, default on): seeds an assignment by affinity, then runs Euclidean Lloyd refinement so prototypes become true data centroids. The repertoire keeps the matured antibodies and metadata, so memory and isotype state are untouched. Only result$antibodies reports the consolidated prototypes.
Validation
- The testthat suite passes.
Notes
- Results change under the new defaults (consolidate=TRUE, and the guard may override distFunc), so downstream benchmarks must be regenerated.
- Set consolidate=FALSE to recover the raw affinity-assignment behavior.