PARALLEL TRACK 3: Comparative benchmark suite for PhD (12 formats × 4 benchmarks) #65

@gHashTag

📊 PARALLEL TRACK 3: Comparative benchmark suite (PhD artifact)

Owner: any code agent · Repo: zig-golden-float + trios-trainer-igla · Time: ~60 min · NOT BLOCKED on bisect

This delivers reusable PhD-grade comparison data for the dissertation. It is independent of the training collapse and uses only pure roundtrip and φ-distance metrics.

Benchmarks to add in benches/comparative.rs:

  1. Roundtrip MSE on sacred constants:

    • Inputs: [1.0, φ, φ², 1/φ, 1/φ², √5, e, π, 0.0, 1e-10, 1e10]
    • For each format: f64 → fmtN → f64, log MSE
    • Output: CSV with columns format, exp_bits, mant_bits, phi_bias, mse, max_abs_err, sacred_const
  2. φ-distance benchmark:

    • For each format: phi_distance = |exp_bits/mant_bits - 1/φ| per spec
    • Cross-check that the actual ratio matches the PHI_BIAS_SSOT table
    • Output: leaderboard sorted by φ-distance ASC
  3. ML-typical value distribution:

    • 10000 random Xavier-initialized weights N(0, 1/sqrt(h)) for h ∈ {384, 512, 1024}
    • Roundtrip through each format → measure cumulative distortion
    • Plot histogram of error per format
  4. Gradient quantization stress test:

    • Synthetic gradients: g ~ N(0, sigma=1e-4) (typical Adam moments scale)
    • Measure update precision after f64 → format → f64
    • Output: CSV with format, sigma, mean_rel_err, max_rel_err
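
Benchmarks 1 and 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Format`, `via_f32`, `via_bf16_trunc`, `roundtrip_error` and `phi_distance` are hypothetical names, and the two formats are stand-ins (IEEE f32 and a truncating bfloat16 emulation), because the GF format API in zig-golden-float is not shown in this issue. The `phi_bias` column is omitted since the PHI_BIAS_SSOT table is not reproduced here, and the sketch aggregates over all sacred constants instead of emitting one row per `sacred_const`.

```rust
/// Hypothetical descriptor of a format under test; fields mirror the CSV columns.
struct Format {
    name: &'static str,
    exp_bits: u32,
    mant_bits: u32,
    /// f64 -> format -> f64 roundtrip.
    roundtrip: fn(f64) -> f64,
}

/// Stand-in roundtrip through IEEE f32.
fn via_f32(x: f64) -> f64 {
    x as f32 as f64
}

/// Stand-in bfloat16 emulation: keep the top 16 bits of the f32 encoding.
/// Assumption: truncation (round toward zero); real converters usually round to nearest.
fn via_bf16_trunc(x: f64) -> f64 {
    let bits = (x as f32).to_bits() & 0xFFFF_0000;
    f32::from_bits(bits) as f64
}

fn phi() -> f64 {
    (1.0 + 5f64.sqrt()) / 2.0
}

/// The input list from benchmark 1.
fn sacred_constants() -> Vec<f64> {
    let p = phi();
    vec![
        1.0, p, p * p, 1.0 / p, 1.0 / (p * p), 5f64.sqrt(),
        std::f64::consts::E, std::f64::consts::PI, 0.0, 1e-10, 1e10,
    ]
}

/// Returns (mse, max_abs_err) of the roundtrip over `inputs`.
fn roundtrip_error(fmt: &Format, inputs: &[f64]) -> (f64, f64) {
    let mut sum_sq = 0.0f64;
    let mut max_abs = 0.0f64;
    for &x in inputs {
        let err = (fmt.roundtrip)(x) - x;
        sum_sq += err * err;
        max_abs = max_abs.max(err.abs());
    }
    (sum_sq / inputs.len() as f64, max_abs)
}

/// phi_distance = |exp_bits/mant_bits - 1/phi|, as stated in benchmark 2.
fn phi_distance(fmt: &Format) -> f64 {
    (fmt.exp_bits as f64 / fmt.mant_bits as f64 - 1.0 / phi()).abs()
}

fn main() {
    let mut formats = vec![
        Format { name: "f32", exp_bits: 8, mant_bits: 23, roundtrip: via_f32 },
        Format { name: "bf16_trunc", exp_bits: 8, mant_bits: 7, roundtrip: via_bf16_trunc },
    ];
    // Leaderboard order: phi_distance ascending.
    formats.sort_by(|a, b| phi_distance(a).total_cmp(&phi_distance(b)));

    let inputs = sacred_constants();
    println!("format,exp_bits,mant_bits,mse,max_abs_err,phi_distance");
    for f in &formats {
        let (mse, max_abs) = roundtrip_error(f, &inputs);
        println!(
            "{},{},{},{:e},{:e},{:.6}",
            f.name, f.exp_bits, f.mant_bits, mse, max_abs, phi_distance(f)
        );
    }
}
```

Because 1e10 sits in the same input list as 1e-10, an unnormalized MSE is dominated by the largest input; a per-constant row (the `sacred_const` column) avoids that masking.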

Output artifacts:

  • benches/results/comparative_<date>.csv (publish-ready data)
  • docs/PHD_BENCHMARK_REPORT.md with markdown tables + observations
  • docs/figures/*.png (matplotlib via Python script tools/plot_bench.py)

Acceptance:

  • 9 GF + 3 IEEE formats covered (12 cells × 4 benchmarks)
  • All R5-honest, no fake numbers: the CSV is publishable raw data
  • Coq-style invariant φ² + φ⁻² = 3 preserved through quantization (sanity check)
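
One way to read the invariant check is as a residual test: quantize φ² and φ⁻² separately, add them, and compare against 3 within a format-dependent tolerance. The sketch below assumes that reading. `invariant_residual`, `via_f32` and `identity` are hypothetical names, f32 stands in for a real GF format, and the tolerance of four f32 epsilons is an assumed bound, not a value from the spec.

```rust
fn phi() -> f64 {
    (1.0 + 5f64.sqrt()) / 2.0
}

/// No quantization; gives the f64 baseline residual.
fn identity(x: f64) -> f64 {
    x
}

/// Stand-in roundtrip through IEEE f32 (a GF format would plug in here).
fn via_f32(x: f64) -> f64 {
    x as f32 as f64
}

/// |q(phi^2) + q(phi^-2) - 3|, where q is the roundtrip under test.
fn invariant_residual(roundtrip: fn(f64) -> f64) -> f64 {
    let p2 = phi() * phi();
    let quantized_sum = roundtrip(p2) + roundtrip(1.0 / p2);
    (quantized_sum - 3.0).abs()
}

fn main() {
    let baseline = invariant_residual(identity);
    let f32_res = invariant_residual(via_f32);
    // Assumed tolerance: a few units of f32 machine epsilon.
    let tol = 4.0 * f32::EPSILON as f64;

    println!("baseline_residual={:e}", baseline);
    println!("f32_residual={:e} tol={:e}", f32_res, tol);
    assert!(f32_res <= tol, "invariant drifted beyond tolerance");
}
```

For narrower formats the tolerance would need to scale with that format's own precision, so a fixed threshold shared across all 12 formats is unlikely to be meaningful.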

Triplet R7 (per benchmark row):
MSE=<v> format=<f> sigma=<s> sha=<7c> jsonl_row=phd-bench gate_status=BENCH-EVIDENCE
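
A row in this shape could be emitted with a small helper like the one below. The helper name `triplet_row` is hypothetical, `<7c>` is assumed to mean the first 7 characters of the commit SHA, and the scientific-notation number formatting is a choice made for this sketch, not something the template specifies.

```rust
/// Formats one benchmark row following the Triplet R7 template.
/// Assumption: `sha` is a commit hash, shortened to its first 7 characters.
fn triplet_row(mse: f64, format: &str, sigma: f64, sha: &str) -> String {
    let short_sha: String = sha.chars().take(7).collect();
    format!(
        "MSE={:e} format={} sigma={:e} sha={} jsonl_row=phd-bench gate_status=BENCH-EVIDENCE",
        mse, format, sigma, short_sha
    )
}

fn main() {
    // Placeholder values for illustration only, not measured results.
    let row = triplet_row(0.0, "f32", 1e-4, "0123456789abcdef");
    println!("{}", row);
}
```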

This Track is the PhD's main deliverable. Sergeant prioritizes it independent of trainer status.
