feat: v3.8.0 - sample(), StudentT CDF fix, fitting validation tests #23
Merged
…cy tests

Fixes #16: StudentTDistribution::getCumulativeProbability() used a bogus rational approximation for moderate nu that produced errors up to 0.06.

Changes:
- Promote BetaDistribution::incompleteBeta() to DistributionBase::incompleteBeta() (protected static), alongside gammap. Continued-fraction algorithm with symmetry swap; kTol=1e-12 convergence.
- Remove private BetaDistribution::incompleteBeta(); getCumulativeProbability() now calls the base-class static directly.
- Rewrite StudentTDistribution::getCumulativeProbability() with the exact formula for all nu > 0:
    P(T <= t; nu) = 1 - 0.5 * I_{nu/(nu+t^2)}(nu/2, 1/2)   for t > 0
  No special cases for moderate vs large nu; the incomplete beta is exact.

New accuracy tests (verify the math primitives directly):
- StudentTDistributionTest::CDFAccuracy: Cauchy exact (arctan), nu=2 closed-form, nu=5/10 approximate, symmetry, boundaries
- GammaDistributionTest::CDFAccuracy: exercises gammap (and gcf/gser) via Poisson-series exact CDFs: k=1 (Exp), k=2, k=3, scale parameter
- ChiSquaredDistributionTest::CDFAccuracy: second gammap code path; df=2/4 exact, df=1 via std::erf (half-integer case), boundaries
- BetaDistributionTest::CDFAccuracy: exercises incompleteBeta via polynomial exact CDFs: Beta(1,1), (2,1), (1,2), (2,2), (3,2), symmetry, boundaries

Note: errorf_inv in DistributionBase has no call sites and is dead code.

All 41 tests pass.

Co-Authored-By: Oz <[email protected]>
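For readers who want to see the shape of the fix, here is a minimal standalone sketch of the Student-t CDF computed through the regularized incomplete beta function. It is not the code in this PR: the free-function names (`betaContinuedFraction`, `regularizedIncompleteBeta`, `studentTCdf`) and the iteration cap are assumptions for illustration, and the continued fraction is the textbook modified-Lentz evaluation rather than whatever `DistributionBase::incompleteBeta()` does internally. Only the formula, the symmetry swap, and the 1e-12 tolerance come from the commit message.

```cpp
#include <cassert>
#include <cmath>

// Continued fraction for the incomplete beta function, evaluated with the
// modified Lentz method. Hypothetical helper, not the library's function.
double betaContinuedFraction(double a, double b, double x) {
    constexpr int kMaxIter = 300;     // assumed iteration cap
    constexpr double kTol = 1e-12;    // tolerance quoted in the commit message
    constexpr double kTiny = 1e-300;  // guards against division by zero

    const double qab = a + b, qap = a + 1.0, qam = a - 1.0;
    double c = 1.0;
    double d = 1.0 - qab * x / qap;
    if (std::fabs(d) < kTiny) d = kTiny;
    d = 1.0 / d;
    double h = d;
    for (int m = 1; m <= kMaxIter; ++m) {
        const double m2 = 2.0 * m;
        // Even-indexed term of the continued fraction.
        double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
        d = 1.0 + aa * d;
        if (std::fabs(d) < kTiny) d = kTiny;
        c = 1.0 + aa / c;
        if (std::fabs(c) < kTiny) c = kTiny;
        d = 1.0 / d;
        h *= d * c;
        // Odd-indexed term.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
        d = 1.0 + aa * d;
        if (std::fabs(d) < kTiny) d = kTiny;
        c = 1.0 + aa / c;
        if (std::fabs(c) < kTiny) c = kTiny;
        d = 1.0 / d;
        const double delta = d * c;
        h *= delta;
        if (std::fabs(delta - 1.0) < kTol) break;
    }
    return h;
}

// Regularized incomplete beta I_x(a, b) with the symmetry swap
// I_x(a, b) = 1 - I_{1-x}(b, a) for fast convergence.
double regularizedIncompleteBeta(double a, double b, double x) {
    assert(a > 0.0 && b > 0.0);
    if (x <= 0.0) return 0.0;
    if (x >= 1.0) return 1.0;
    const double logFront = std::lgamma(a + b) - std::lgamma(a) - std::lgamma(b) +
                            a * std::log(x) + b * std::log1p(-x);
    const double front = std::exp(logFront);
    if (x < (a + 1.0) / (a + b + 2.0)) {
        return front * betaContinuedFraction(a, b, x) / a;
    }
    return 1.0 - front * betaContinuedFraction(b, a, 1.0 - x) / b;
}

// Standard Student-t CDF (location 0, scale 1) for any nu > 0.
double studentTCdf(double t, double nu) {
    assert(nu > 0.0);
    const double x = nu / (nu + t * t);
    const double tail = 0.5 * regularizedIncompleteBeta(0.5 * nu, 0.5, x);
    return t > 0.0 ? 1.0 - tail : tail;  // lower tail for t <= 0 by symmetry
}
```

The Cauchy case (nu = 1) and the nu = 2 closed form give exact reference values, which is presumably why the new CDFAccuracy tests lean on them.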
Closes #22.

Interface change:
- EmissionDistribution: add pure virtual
    [[nodiscard]] virtual double sample(std::mt19937_64& rng) const = 0
  includes <random> in the header; all existing callers compile unchanged since they do not call sample()

Implementations (16 distributions):
- GaussianDistribution: std::normal_distribution
- DiscreteDistribution: std::discrete_distribution (returns integer as double)
- ExponentialDistribution: std::exponential_distribution
- GammaDistribution: std::gamma_distribution
- PoissonDistribution: std::poisson_distribution (integer -> double)
- BinomialDistribution: std::binomial_distribution (integer -> double)
- NegativeBinomialDistribution: Gamma-Poisson mixture (supports real-valued r)
- ChiSquaredDistribution: std::chi_squared_distribution
- StudentTDistribution: std::student_t_distribution + location/scale shift
- LogNormalDistribution: std::lognormal_distribution
- WeibullDistribution: std::weibull_distribution
- BetaDistribution: Gamma-ratio method (X/(X+Y), X,Y~Gamma)
- UniformDistribution: std::uniform_real_distribution
- ParetoDistribution: inverse-CDF: xm * U^(-1/k)
- RayleighDistribution: inverse-CDF: sigma * sqrt(-2 * log(U))
- VonMisesDistribution: Best (1979) rejection sampler

Tests (3 new, added to CommonDistributionTest fixture):
- SampleProducesFiniteValues: 100 draws per distribution, all must be finite
- SampleDiscreteReturnsIntegers: discrete distributions return non-negative integers
- SampleMeanConverges: 2000 samples, Gaussian/Exp/Poisson/Uniform/Beta/VonMises means within 4-sigma of analytic values

All 41 tests pass.

Co-Authored-By: Oz <[email protected]>
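The hand-rolled samplers listed above (the ones with no direct `<random>` adapter) can be sketched as free functions. This is an illustration under stated assumptions, not the PR's code: the function names are hypothetical, and the negative binomial parameterization (failures before `r` successes, success probability `p`) is assumed because the commit message does not say which one the library uses. The formulas for Pareto, Rayleigh, and Beta are the ones quoted in the commit.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Pareto(xm, k) by inverse CDF: xm * U^(-1/k).
double samplePareto(double xm, double k, std::mt19937_64& rng) {
    assert(xm > 0.0 && k > 0.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double u = 1.0 - unif(rng);  // maps [0, 1) to (0, 1]
    return xm * std::pow(u, -1.0 / k);
}

// Rayleigh(sigma) by inverse CDF: sigma * sqrt(-2 * log(U)).
double sampleRayleigh(double sigma, std::mt19937_64& rng) {
    assert(sigma > 0.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double u = 1.0 - unif(rng);  // maps [0, 1) to (0, 1], so log(u) is finite
    return sigma * std::sqrt(-2.0 * std::log(u));
}

// Beta(alpha, beta) by the Gamma-ratio method: X / (X + Y) with
// X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1).
double sampleBeta(double alpha, double beta, std::mt19937_64& rng) {
    assert(alpha > 0.0 && beta > 0.0);
    std::gamma_distribution<double> gx(alpha, 1.0);
    std::gamma_distribution<double> gy(beta, 1.0);
    double x = 0.0, y = 0.0;
    do {  // retry in the rare case both draws underflow to zero
        x = gx(rng);
        y = gy(rng);
    } while (x + y <= 0.0);
    return x / (x + y);
}

// Negative binomial as a Gamma-Poisson mixture, which allows real-valued r.
// Assumed parameterization: mean r * (1 - p) / p.
double sampleNegativeBinomial(double r, double p, std::mt19937_64& rng) {
    assert(r > 0.0 && p > 0.0 && p < 1.0);
    std::gamma_distribution<double> mixing(r, (1.0 - p) / p);
    const double lambda = mixing(rng);
    if (lambda <= 0.0) return 0.0;  // std::poisson_distribution needs mean > 0
    std::poisson_distribution<long long> pois(lambda);
    return static_cast<double>(pois(rng));  // integer count returned as double
}
```

Taking the generator by reference, as the new `sample(std::mt19937_64& rng)` interface does, lets the caller fix a seed once and get a reproducible stream across distributions.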
Closes #21.

New file: tests/distributions/test_distribution_fitting.cpp

Tests 5 distributions (Gaussian, Exponential, Poisson, Gamma, Discrete) with 17 test cases across 5 suites.

Approach:
- Unweighted tests use data whose exact sample statistics are known from simple arithmetic so tolerances are tight (1e-9 for closed-form MLE, ±0.15 for Newton-Raphson Gamma).
- Weighted-uniform tests verify that fit(data, uniform_weights) == fit(data).
- Weighted-concentrated tests verify that non-uniform weights genuinely steer fitted parameters.
- Edge-case tests verify single-point data and near-zero weights do not crash.

Tolerance rationale documented in comments per distribution.

42/42 tests pass.

Co-Authored-By: Oz <[email protected]>
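The three test ideas above (exact statistics, uniform weights, concentrated weights) can be illustrated with a small standalone Gaussian fit. This is a sketch, not the library's `fit()`: `GaussianParams`, `fitGaussian`, and `fitGaussianWeighted` are hypothetical names, and the variance is assumed to be the maximum-likelihood estimate (divide by n or by the weight sum), which the commit message implies but does not spell out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct GaussianParams {
    double mean;
    double stddev;
};

// Unweighted closed-form MLE: sample mean and population standard deviation.
GaussianParams fitGaussian(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    const double mean = sum / static_cast<double>(data.size());
    double sq = 0.0;
    for (double x : data) sq += (x - mean) * (x - mean);
    return {mean, std::sqrt(sq / static_cast<double>(data.size()))};
}

// Weighted MLE: each point contributes in proportion to its weight.
GaussianParams fitGaussianWeighted(const std::vector<double>& data,
                                   const std::vector<double>& weights) {
    assert(!data.empty() && data.size() == weights.size());
    double wsum = 0.0, wmean = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        wsum += weights[i];
        wmean += weights[i] * data[i];
    }
    assert(wsum > 0.0);
    wmean /= wsum;
    double wsq = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const double d = data[i] - wmean;
        wsq += weights[i] * d * d;
    }
    return {wmean, std::sqrt(wsq / wsum)};
}
```

With the data set {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5 and the population standard deviation is 2 by simple arithmetic, so a 1e-9 tolerance is reasonable; any constant weight vector should reproduce those values, and a weight vector concentrated on the last point should pull the mean toward 9.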
- CMakeLists.txt: VERSION 3.7.0 -> 3.8.0
- CHANGELOG.md: add [3.8.0] entry documenting #16, #21, #22
- README.md: update version/test badges; add v4.0.0 roadmap notice

42/42 tests pass.

Co-Authored-By: Oz <[email protected]>
All 40 files reformatted by clang-format 19.1.7 (--style=file). No logic changes. Pre-commit hooks now pass on all platforms.

Co-Authored-By: Oz <[email protected]>
v3.8.0 — Final v3.x release
Closes out the three open issues on the v3.x line before v4.0.0 work begins.
Changes
Fixes #16: StudentTDistribution::getCumulativeProbability() wrong for 1 < ν < 30
- Exact formula for all ν > 0: P(T ≤ t; ν) = 1 − ½·I_{ν/(ν+t²)}(ν/2, ½) for t > 0
- Promote BetaDistribution::incompleteBeta to DistributionBase::incompleteBeta (protected static), alongside gammap
- New CDF accuracy tests for gammap (Gamma/ChiSquared CDFs) and incompleteBeta (polynomial Beta CDFs)

Closes #22: sample(std::mt19937_64& rng) pure virtual on EmissionDistribution, implemented for all 16 distributions
- <random> adapters for most distributions
- NegativeBinomial: Gamma-Poisson mixture (supports real-valued r)
- Beta: Gamma-ratio method
- Pareto, Rayleigh: inverse-CDF
- VonMises: Best (1979) wrapped-Cauchy rejection sampler (exact, ~1.06–1.32 iterations expected)

Closes #21: Distribution fitting validation tests
- tests/distributions/test_distribution_fitting.cpp (17 test cases)

v3.8.0 release
Test results
42/42 tests pass (was 41/41 in v3.7.0).
v4.0.0 development is on feature/v4-multivariate-emissions and raises the minimum platform floor to macOS 13+, GCC 12+, Apple Clang 14+, MSVC 2022 17.x for full C++20 and multivariate emission distributions.