PARALLEL TRACK 3: Comparative benchmark suite (PhD artifact)
Owner: any code agent · Repo: zig-golden-float + trios-trainer-igla · Time: ~60 min · NOT BLOCKED on bisect
This delivers reusable PhD-grade comparison data for the dissertation. Independent of training collapse: uses pure roundtrip + φ-distance metrics.
Benchmarks to add in benches/comparative.rs:
Roundtrip MSE on sacred constants:
- Inputs: [1.0, φ, φ², 1/φ, 1/φ², √5, e, π, 0.0, 1e-10, 1e10]
- For each format: f64 → fmtN → f64, log MSE
- Output: CSV with columns format, exp_bits, mant_bits, phi_bias, mse, max_abs_err, sacred_const
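A minimal sketch of the roundtrip measurement, under stated assumptions: the real `fmtN` codecs are not shown in this brief, so an IEEE binary32 cast (`via_f32`) stands in for them, and the names `RoundtripStats`, `sacred_constants`, and `roundtrip_stats` are hypothetical. Any GF codec exposed as a `Fn(f64) -> f64` roundtrip could be passed in the same way.

```rust
/// Aggregate roundtrip error for one format over a set of inputs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RoundtripStats {
    pub mse: f64,
    pub max_abs_err: f64,
}

/// The eleven inputs from the brief, computed in f64.
pub fn sacred_constants() -> [f64; 11] {
    let phi = (1.0 + 5.0_f64.sqrt()) / 2.0;
    [
        1.0,
        phi,
        phi * phi,
        1.0 / phi,
        1.0 / (phi * phi),
        5.0_f64.sqrt(),
        std::f64::consts::E,
        std::f64::consts::PI,
        0.0,
        1e-10,
        1e10,
    ]
}

/// Push every input through `roundtrip` (f64 -> fmtN -> f64) and
/// accumulate the mean squared error and the largest absolute error.
pub fn roundtrip_stats(inputs: &[f64], roundtrip: impl Fn(f64) -> f64) -> RoundtripStats {
    if inputs.is_empty() {
        return RoundtripStats { mse: 0.0, max_abs_err: 0.0 };
    }
    let mut sum_sq = 0.0;
    let mut max_abs_err = 0.0_f64;
    for &x in inputs {
        let err = (roundtrip(x) - x).abs();
        sum_sq += err * err;
        max_abs_err = max_abs_err.max(err);
    }
    RoundtripStats { mse: sum_sq / inputs.len() as f64, max_abs_err }
}

/// Stand-in codec (assumption): IEEE binary32 via a plain cast.
pub fn via_f32(x: f64) -> f64 {
    x as f32 as f64
}
```

Calling `roundtrip_stats(&sacred_constants(), via_f32)` gives one aggregate row. If the `sacred_const` column means one row per constant, the same function can be called with a one-element slice per input.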
φ-distance benchmark:
- For each format: phi_distance = |exp_bits/mant_bits - 1/φ| per spec
- Cross-check that the actual ratio matches the PHI_BIAS_SSOT table
- Output: leaderboard sorted by φ-distance ASC
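The φ-distance formula and the ascending leaderboard can be sketched as below. `FormatSpec`, `phi_distance`, `leaderboard`, and `ieee_formats` are illustrative names, not taken from either repo. The three IEEE rows assume `mant_bits` counts only the stored fraction bits, and that the "3 IEEE formats" are binary16, binary32, and binary64. The GF rows and the PHI_BIAS_SSOT cross-check are left out because their values are not given here.

```rust
/// Bit layout of one format (hypothetical struct for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct FormatSpec {
    pub name: &'static str,
    pub exp_bits: u32,
    pub mant_bits: u32,
}

/// phi_distance = |exp_bits / mant_bits - 1/phi|
pub fn phi_distance(f: &FormatSpec) -> f64 {
    let inv_phi = 2.0 / (1.0 + 5.0_f64.sqrt());
    (f.exp_bits as f64 / f.mant_bits as f64 - inv_phi).abs()
}

/// Pair each format with its distance and sort ascending.
pub fn leaderboard(formats: Vec<FormatSpec>) -> Vec<(FormatSpec, f64)> {
    let mut rows: Vec<(FormatSpec, f64)> = formats
        .into_iter()
        .map(|f| {
            let d = phi_distance(&f);
            (f, d)
        })
        .collect();
    // total_cmp gives a total order on f64, so NaN or inf cannot panic the sort.
    rows.sort_by(|a, b| a.1.total_cmp(&b.1));
    rows
}

/// IEEE 754 binary interchange formats: (exponent bits, stored fraction bits).
pub fn ieee_formats() -> Vec<FormatSpec> {
    vec![
        FormatSpec { name: "binary64", exp_bits: 11, mant_bits: 52 },
        FormatSpec { name: "binary16", exp_bits: 5, mant_bits: 10 },
        FormatSpec { name: "binary32", exp_bits: 8, mant_bits: 23 },
    ]
}
```

With these three rows, binary16 (ratio 0.5) ranks first and binary64 (ratio 11/52) last.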
ML-typical value distribution:
- 10000 random Xavier-initialized weights, N(0, 1/sqrt(h)) for h ∈ {384, 512, 1024}
- Roundtrip through each format → measure cumulative distortion
- Plot histogram of error per format
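One way to generate the weights reproducibly without extra crates is sketched below. Several choices here are assumptions: `1/sqrt(h)` is read as the standard deviation, the generator is a SplitMix64 stream fed through Box-Muller (a real bench may prefer the `rand` ecosystem), and "cumulative distortion" is taken to mean the sum of absolute roundtrip errors. All names are hypothetical.

```rust
/// Small deterministic PRNG (SplitMix64), so CSV rows are reproducible per seed.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1], safe to pass to ln().
    fn next_unit(&mut self) -> f64 {
        1.0 - (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    pub fn next_gaussian(&mut self) -> f64 {
        let u1 = self.next_unit();
        let u2 = self.next_unit();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// `n` weights drawn from N(0, std_dev) with std_dev = 1/sqrt(h) (assumed reading).
pub fn xavier_weights(h: usize, n: usize, seed: u64) -> Vec<f64> {
    let std_dev = 1.0 / (h as f64).sqrt();
    let mut rng = SplitMix64(seed);
    (0..n).map(|_| std_dev * rng.next_gaussian()).collect()
}

/// Per-weight absolute roundtrip errors, i.e. the data behind the histogram.
pub fn roundtrip_errors(weights: &[f64], roundtrip: impl Fn(f64) -> f64) -> Vec<f64> {
    weights.iter().map(|&w| (roundtrip(w) - w).abs()).collect()
}

/// Cumulative distortion, assumed here to be the sum of absolute errors.
pub fn cumulative_distortion(weights: &[f64], roundtrip: impl Fn(f64) -> f64) -> f64 {
    roundtrip_errors(weights, roundtrip).iter().sum()
}
```

For example, `cumulative_distortion(&xavier_weights(384, 10_000, 42), |x| x as f32 as f64)` yields one cell for binary32 at h = 384. The error vector can be dumped to CSV for tools/plot_bench.py.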
Gradient quantization stress test:
- Synthetic gradients: g ~ N(0, sigma=1e-4) (typical Adam moments scale)
- Measure update precision after f64 → format → f64
- Output: CSV with format, sigma, mean_rel_err, max_rel_err
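The relative-error columns could be computed roughly as follows. This is a sketch: `rel_err_stats` and `via_bf16` are hypothetical names, exact-zero gradients are skipped because relative error is undefined there, and sampling `g ~ N(0, sigma)` is left to the caller. bfloat16 is used only as a familiar lossy example. It is emulated by rounding a binary32 value to its top 16 bits (round to nearest, ties to even), which double-rounds through f32 and does not special-case NaN.

```rust
/// Returns (mean_rel_err, max_rel_err) over the nonzero gradients.
pub fn rel_err_stats(grads: &[f64], roundtrip: impl Fn(f64) -> f64) -> (f64, f64) {
    let mut sum = 0.0;
    let mut max = 0.0_f64;
    let mut count = 0usize;
    for &g in grads {
        if g == 0.0 {
            continue; // relative error is undefined at zero
        }
        let rel = ((roundtrip(g) - g) / g).abs();
        sum += rel;
        max = max.max(rel);
        count += 1;
    }
    if count == 0 {
        (0.0, 0.0)
    } else {
        (sum / count as f64, max)
    }
}

/// Emulated bfloat16 roundtrip (assumption: stand-in for a real codec).
pub fn via_bf16(x: f64) -> f64 {
    let bits = (x as f32).to_bits();
    // Round to nearest even on the 16 discarded low bits.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    let rounded = bits.wrapping_add(rounding_bias) & 0xFFFF_0000;
    f32::from_bits(rounded) as f64
}
```

For normal-range inputs the bfloat16 relative error stays below 2^-8, which makes a convenient sanity bound on `max_rel_err` before the GF formats are plugged in.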
Output artifacts:
benches/results/comparative_<date>.csv (publish-ready data)
docs/PHD_BENCHMARK_REPORT.md with markdown tables + observations
docs/figures/*.png (matplotlib via Python script tools/plot_bench.py)
Acceptance:
- 9 GF + 3 IEEE formats covered (12 cells × 4 benchmarks)
- All R5-honest, no fake numbers: CSV is publishable raw data
- Coq-style invariant φ² + φ⁻² = 3 preserved through quantization (sanity check)
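The invariant follows from φ² = φ + 1 and φ⁻² = 2 - φ, which sum to 3. After quantization it can only hold up to the format's rounding error, so one way to run the sanity check is to report the residual and compare it with a per-format tolerance. The function name and the choice to sum in f64 are assumptions of this sketch.

```rust
/// |q(phi^2) + q(phi^-2) - 3|, where q is the f64 -> fmtN -> f64 roundtrip.
/// The sum itself is evaluated in f64 so only the quantization error shows up.
pub fn phi_invariant_residual(roundtrip: impl Fn(f64) -> f64) -> f64 {
    let phi = (1.0 + 5.0_f64.sqrt()) / 2.0;
    let phi_sq = roundtrip(phi * phi);
    let inv_phi_sq = roundtrip(1.0 / (phi * phi));
    (phi_sq + inv_phi_sq - 3.0).abs()
}
```

With the identity roundtrip the residual is at the level of f64 rounding. With a binary32 cast it stays below 1e-6. The tolerance to accept for each GF format is a decision for the spec, not something this sketch fixes.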
Triplet R7 (per benchmark row):
MSE=<v> format=<f> sigma=<s> sha=<7c> jsonl_row=phd-bench gate_status=BENCH-EVIDENCE
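A possible formatter for this row is sketched below. The brief does not define the field encodings, so the following are assumptions: `sha=<7c>` is read as the first 7 characters of a commit hash, numbers are written in Rust's shortest exponent notation, and the two trailing fields are emitted as the fixed literals shown in the template. `format_triplet` is a hypothetical helper.

```rust
/// Render one R7 triplet line for a benchmark row.
pub fn format_triplet(mse: f64, format: &str, sigma: f64, sha: &str) -> String {
    // Keep at most the first 7 characters of the hash (assumed meaning of <7c>).
    let short_sha: String = sha.chars().take(7).collect();
    format!(
        "MSE={:e} format={} sigma={:e} sha={} jsonl_row=phd-bench gate_status=BENCH-EVIDENCE",
        mse, format, sigma, short_sha
    )
}
```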
This Track is the PhD's main deliverable. Sergeant prioritizes it independent of trainer status.