Add metric guard #12

Merged: ncborcherding merged 2 commits into main from metric-guard on Jun 17, 2026
ncborcherding commented Jun 17, 2026

Summary

Fixes two coupled failure modes in AINet clustering: a silent collapse to a single cluster when affinity and distance use different geometries, and prototypes that drift so far off the data manifold that they no longer correspond to the data.

Problem

AINet matures antibodies to maximize affinityFunc but assigns clusters with distFunc. With cosine affinity (which scores angle only and never constrains magnitude) paired with a Euclidean distance, antibodies random-walk to ~18x the data scale. On WDBC this collapsed every point onto one antibody (k=1, ARI 0, antibody L2 norm ~88 vs data ~5). The grid still ranked these configs best by silhouette, so selection rewarded the degenerate runs and the interpretability figure showed prototypes nowhere near the data.
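
The geometry mismatch can be reproduced on a single point. The snippet below is a minimal illustration, not code from the package: `cosine_sim` is a hypothetical helper, and the 18x factor is borrowed from the drift described above.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

x        <- c(3, 4)    # a data point with L2 norm 5
antibody <- 18 * x     # same direction, 18x the magnitude

cosine_sim(x, antibody)        # 1: the affinity sees a perfect match
sqrt(sum((x - antibody)^2))    # 85: the Euclidean distance sees a far-away point
```

Because scaling an antibody leaves its cosine affinity unchanged, maturation has no reason to keep the magnitude near the data, while a Euclidean readout is dominated by exactly that magnitude.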

Changes

  • Metric guard (initialize): a cosine vs non-cosine mismatch between affinityFunc and distFunc warns and aligns the distance to the affinity's natural metric (cosine to cosine, otherwise Euclidean-family). Prevents the collapse.
  • Consolidation M-step (consolidate, consolidationSteps, clustering only, default on): seeds an assignment by affinity, then runs Euclidean Lloyd refinement so prototypes become true data centroids. The repertoire keeps the matured antibodies and metadata, so memory and isotype state are untouched. Only result$antibodies reports the consolidated prototypes.
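
As a rough sketch of the guard logic described in the first bullet, and not the code merged in this PR: the helper name `align_metric`, the assumption that metrics are passed as character names, and the choice of `"euclidean"` as the representative of the Euclidean family are all illustrative.

```r
# Hypothetical helper, not the package's code. Assumes metrics are given by name.
align_metric <- function(affinityFunc, distFunc) {
  affinity_is_cosine <- identical(affinityFunc, "cosine")
  dist_is_cosine     <- identical(distFunc, "cosine")

  # Geometries agree (both cosine or both non-cosine): keep the user's choice.
  if (affinity_is_cosine == dist_is_cosine) {
    return(distFunc)
  }

  # Mismatch: follow the affinity's natural metric and tell the user.
  aligned <- if (affinity_is_cosine) "cosine" else "euclidean"
  warning(sprintf(
    "affinityFunc '%s' and distFunc '%s' use different geometries; using distFunc '%s'.",
    affinityFunc, distFunc, aligned
  ), call. = FALSE)
  aligned
}

align_metric("gaussian", "manhattan")   # "manhattan", no warning
align_metric("cosine", "euclidean")     # "cosine", with a warning
```

Warning rather than erroring keeps existing scripts running, which matches the "warns and aligns" behavior described above.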

Validation

  • WDBC prototypes now sit on the data (antibody norm ~5 vs data ~5, nearest-data distance ~2 vs data NN ~2.5).
  • Gaussian affinity remains the stronger clustering choice (ARI 0.33 vs 0.14 for cosine under the same setup); the guard makes cosine safe rather than best.
  • Full testthat suite passes.
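
Checks like the norm and nearest-data comparison in the first bullet could be computed along these lines. This is a sketch under assumptions: `prototype_diagnostics` is a hypothetical helper, and averaging over rows is a guess at how the reported figures were aggregated.

```r
# Hypothetical diagnostic. X: n x d data matrix, P: k x d prototype matrix.
prototype_diagnostics <- function(X, P) {
  row_norm <- function(M) sqrt(rowSums(M^2))

  # Distance from each prototype to its nearest data point.
  nearest <- apply(P, 1, function(p) min(sqrt(rowSums(sweep(X, 2, p)^2))))

  c(data_norm         = mean(row_norm(X)),
    prototype_norm    = mean(row_norm(P)),
    nearest_data_dist = mean(nearest))
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
prototype_diagnostics(X, X[1:3, , drop = FALSE])        # prototypes on the data
prototype_diagnostics(X, 18 * X[1:3, , drop = FALSE])   # prototypes drifted outward
```

A prototype norm far above the data norm, or a nearest-data distance far above the typical data nearest-neighbor distance, would flag the off-manifold drift this PR targets.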

Notes

  • Default behavior changes (consolidate=TRUE, guard may override distFunc),
    so downstream benchmarks must be regenerated.
  • Set consolidate=FALSE to recover the raw affinity-assignment behavior.

AINet matured antibodies by maximizing affinityFunc but read clusters out
with distFunc. When the two used different geometries the model optimized
one space and assigned in another. The worst case paired cosine affinity,
which scores only angle and leaves antibody magnitude a free random walk,
with a Euclidean distance. Antibodies drifted far off the data manifold
(L2 norm ~88 against a data norm of ~5), every point collapsed onto one
antibody, and clustering returned a single cluster with ARI 0.

Two changes address this:
- Metric guard in initialize(). A cosine vs non-cosine mismatch between
  affinityFunc and distFunc now warns and aligns the distance to the
  affinity's natural metric, so clustering no longer collapses silently.
- Consolidation M-step for clustering (consolidate=TRUE,
  consolidationSteps=10). After affinity maturation, seed the assignment by
  affinity, which stays robust to off-manifold antibodies, then run Euclidean
  Lloyd refinement so prototypes become true data centroids. The repertoire
  keeps the matured antibodies and their metadata, so memory archiving is
  unchanged. Only result$antibodies reports the consolidated prototypes.
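
The consolidation step can be sketched as an affinity-seeded Lloyd loop. This is an illustration under assumptions, not the merged implementation: `consolidate_prototypes`, its arguments, and the choice to leave empty prototypes at their matured position are all hypothetical.

```r
# Hypothetical sketch. X: n x d data (n > 1), A: k x d matured antibodies,
# affinity(x, a): returns a score where higher means a better match.
consolidate_prototypes <- function(X, A, affinity, steps = 10) {
  # Seed: assign each point to the antibody with the highest affinity.
  scores <- sapply(seq_len(nrow(A)), function(j)
    apply(X, 1, function(x) affinity(x, A[j, ])))
  assign <- max.col(scores, ties.method = "first")

  P <- A
  for (s in seq_len(steps)) {
    # M-step: move each non-empty prototype to the mean of its points.
    for (j in seq_len(nrow(P))) {
      members <- assign == j
      if (any(members)) P[j, ] <- colMeans(X[members, , drop = FALSE])
    }
    # E-step: reassign by squared Euclidean distance to the prototypes.
    d2 <- sapply(seq_len(nrow(P)), function(j) rowSums(sweep(X, 2, P[j, ])^2))
    new_assign <- max.col(-d2, ties.method = "first")
    if (identical(new_assign, assign)) break
    assign <- new_assign
  }
  list(antibodies = P, cluster = assign)
}

set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 5), ncol = 2),
           matrix(rnorm(40, mean = -5), ncol = 2))
cosine <- function(x, a) sum(x * a) / sqrt(sum(x^2) * sum(a^2))
A <- rbind(c(90, 90), c(-90, -90))   # right directions, far too large in magnitude
res <- consolidate_prototypes(X, A, cosine)
res$antibodies                       # rows land near (5, 5) and (-5, -5)
```

Seeding by affinity is what makes this work with drifted antibodies: the angular assignment is still meaningful at 18x scale, and the first M-step pulls each prototype back onto its points.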
ncborcherding merged commit 536bc34 into main Jun 17, 2026
5 checks passed
ncborcherding deleted the metric-guard branch June 17, 2026 17:53