Skip to content

fix: audit-remediation — address all 15 code review findings#25

Merged
OldCrow merged 3 commits into
mainfrom
fix/audit-remediation
Jun 15, 2026
Merged

fix: audit-remediation — address all 15 code review findings#25
OldCrow merged 3 commits into
mainfrom
fix/audit-remediation

Conversation

@OldCrow

@OldCrow OldCrow commented Jun 15, 2026

Copy link
Copy Markdown
Owner

Summary

Implements all 15 findings from the June 2026 independent code review of the v3.8.0 → v4.0.0 diff.


Findings addressed

Critical

  • F1 — Data race in updateCache(): Added mutable std::mutex cache_mutex_ with double-checked locking to FullCovarianceGaussianDistribution and DiagonalGaussianDistribution. Explicit copy/move operations defined (std::mutex is non-copyable/non-movable).
  • F2 — Null dereference in BasicHmm<ObservationVectorView>::getDistribution(): Both overloads now throw std::runtime_error with a descriptive message instead of unconditionally dereferencing null emission slots.

High

  • F3 — Positional JSON parsing: from_json and from_json_mv now validate each key name before consuming its value. Reordered JSON is rejected with a clear error.
  • F4 — VonMises CDF at −π: Changed x <= −π to x < −π in wrap_angle so getCumulativeProbability(−π) returns ~0.0 rather than ~1.0.
  • F5 — Regularisation accumulation in setCovariance: cov_ now stores the unregularised matrix; reg_*I is applied only to a scratch copy for Cholesky factorisation. getCovariance() round-trips are idempotent.
  • F6 — VonMises weighted fit() sumW inflation: sumW is now accumulated inside the finite-data filter loop, not over all weights.
  • F7 — NaN pseudo_count passes guard: Changed if (c < 0.0) to if (!(c >= 0.0)) in both the constructor and setPseudoCount.

Medium

  • F8 — k-means M < K produces duplicate centroids: seed_kmeanspp now throws std::invalid_argument when pts.size() < K.
  • F9 — Ragged data in fit(): Both fit() overloads in FullCovarianceGaussianDistribution and DiagonalGaussianDistribution now check per-observation dimensionality.
  • F10 — convergenceWindow <= 1 premature convergence: BasicViterbiTrainer constructors and setConfig throw when convergenceWindow < 2.

Low

  • F11 — StudentTDistribution::getCumulativeProbability(NaN): Returns 0.0, consistent with getProbability(NaN).
  • F12 — dimensions not cross-validated in from_json_mv: The manifest dimensions field is now cross-checked against each distribution's getDimension().
  • F13 — Unbounded dim in distribution from_json: FullCovarianceGaussianDistribution::from_json and DiagonalGaussianDistribution::from_json now bound dim to [1, 1024].
  • F14 — MAP discrete smoothing unnormalised: apply_discrete_smoothing re-normalises after the smoothing pass.
  • F15 — Dead code in lloyd_update: Empty clusters now retain their previous centroid instead of collapsing to the zero vector.

Tests

  • Updated 2 existing tests that asserted old broken behaviour (StudentT CDF NaN, FullCovariance default log_det).
  • Added 24 new regression tests across 7 test files covering all previously-untested fix paths.

All 47 tests pass (0 failures).


Conversation: https://app.warp.dev/conversation/411a43b6-94f9-42bf-bc37-3987216f6798

Co-Authored-By: Oz [email protected]

OldCrow and others added 3 commits June 15, 2026 01:51
Findings 1–15 from the June 2026 independent code review.

Thread safety (F1)
- FullCovarianceGaussianDistribution, DiagonalGaussianDistribution:
  add mutable std::mutex cache_mutex_ with double-checked locking in
  updateCache() to prevent concurrent writers on chol_L_/log_det_/
  log_var_/inv_var_. Explicit copy/move operations defined since
  std::mutex is non-copyable.

Null safety (F2)
- BasicHmm<Obs>::getDistribution() now throws std::runtime_error when
  the emission slot is null, replacing a silent null dereference.

JSON correctness (F3, F12, F13)
- from_json / from_json_mv: validate each key name before consuming
  its value; reordered JSON now throws instead of silently scrambling
  parameters.
- from_json_mv: cross-validate per-distribution dim against the
  manifest dimensions field.
- FullCovarianceGaussianDistribution::from_json,
  DiagonalGaussianDistribution::from_json: bound dim to [1, 1024].

Statistical correctness (F4, F5, F6)
- VonMisesDistribution::wrap_angle: change x <= -pi to x < -pi.
- FullCovarianceGaussianDistribution: store unregularised cov_;
  apply reg*I only to a scratch copy for factorisation.
- VonMisesDistribution::fit(weighted): accumulate sumW inside the
  finite-data filter loop.

Input validation (F7, F8, F9, F10)
- MapBaumWelchTrainer: use !(c >= 0.0) guard to reject NaN.
- kmeans_init / seed_kmeanspp: throw when pts.size() < K.
- MV distribution fit(): throw on ragged observations.
- BasicViterbiTrainer: throw when convergenceWindow < 2.

API consistency (F11, F14, F15)
- StudentTDistribution::getCumulativeProbability(NaN): return 0.0.
- MapBaumWelchTrainer::apply_discrete_smoothing: re-normalise.
- kmeans_init::lloyd_update: retain previous centroid for empty clusters.

Tests: update two tests asserting old broken behaviour; add 24 new
regression tests covering all previously-untested fix paths.

Co-Authored-By: Oz <[email protected]>
EXPECT_THROW/EXPECT_NO_THROW discard the return value of
getDistribution(), which is [[nodiscard]]. Cast to (void) inside
each macro call to suppress -Wunused-result on CI.

Co-Authored-By: Oz <[email protected]>
test_hmm_core.cpp: cast getDistribution() return to (void) inside
EXPECT_THROW/EXPECT_NO_THROW to suppress -Wunused-result on the
[[nodiscard]] attribute (GCC and Clang, both Linux runners).

test_von_mises_distribution.cpp: add braces around EXPECT_NEAR inside
bare if (p > 0.0) to suppress -Wdangling-else. EXPECT_NEAR expands
to an if/else chain, making the else ambiguous without braces.

Co-Authored-By: Oz <[email protected]>
@OldCrow OldCrow merged commit 800f02e into main Jun 15, 2026
6 checks passed
@OldCrow OldCrow deleted the fix/audit-remediation branch June 15, 2026 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant