PARALLEL TRACK 3: Comparative benchmark suite for PhD (12 formats × 4 benchmarks) #65

# 📊 PARALLEL TRACK 3 — Comparative benchmark suite (PhD artifact)

**Owner:** any code agent · **Repo:** [zig-golden-float](https://github.com/gHashTag/zig-golden-float) + [trios-trainer-igla](https://github.com/gHashTag/trios-trainer-igla) · **Time:** ~60 min · **NOT BLOCKED** on bisect

This track delivers reusable, PhD-grade comparison data for the dissertation. It is independent of the training-collapse investigation because it uses only roundtrip and φ-distance metrics.

**Benchmarks to add in `benches/comparative.rs`:**

1. **Roundtrip MSE on sacred constants:**
   - Inputs: `[1.0, φ, φ², 1/φ, 1/φ², √5, e, π, 0.0, 1e-10, 1e10]`
   - For each format: `f64 → fmtN → f64`, log MSE
   - Output: CSV with columns `format, exp_bits, mant_bits, phi_bias, mse, max_abs_err, sacred_const`

2. **φ-distance benchmark:**
   - For each format: `phi_distance = |exp_bits/mant_bits - 1/φ|` per spec
   - Cross-check that the actual ratio matches the `PHI_BIAS_SSOT` table
   - Output: leaderboard sorted by φ-distance ASC

3. **ML-typical value distribution:**
   - 10000 random Xavier-initialized weights `N(0, σ=1/sqrt(h))` for each `h ∈ {384, 512, 1024}`
   - Roundtrip through each format → measure cumulative distortion
   - Plot histogram of error per format

4. **Gradient quantization stress test:**
   - Synthetic gradients: `g ~ N(0, sigma=1e-4)` (a typical scale for Adam moment estimates)
   - Measure update precision after `f64 → format → f64`
   - Output: CSV with `format, sigma, mean_rel_err, max_rel_err`
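For benchmark 1 (roundtrip on sacred constants), here is a minimal sketch of the `f64 → fmtN → f64` roundtrip that emits one CSV row per (format, constant). It assumes an IEEE-style minifloat: bias `2^(exp_bits-1)-1`, subnormals, top exponent code reserved for inf/NaN, and round-to-nearest-even. The φ-biased GF encodings are NOT modelled, so `phi_bias` is printed as `NA`. The three rows (fp16, bf16, fp32) are an assumption, since the issue does not name its IEEE formats. `quantize`, `round_half_even` and `sacred_constants` are hypothetical helper names.

```rust
/// Round half to even (IEEE default); f64::round alone breaks ties away from zero.
fn round_half_even(v: f64) -> f64 {
    let r = v.round();
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 {
        r - v.signum()
    } else {
        r
    }
}

/// ASSUMED IEEE-style minifloat: bias 2^(exp_bits-1)-1, subnormals, top exponent code
/// reserved for inf/NaN, round-to-nearest-even. Returns the nearest representable value.
fn quantize(x: f64, exp_bits: u32, mant_bits: u32) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let bias = (1i32 << (exp_bits - 1)) - 1;
    let (emin, emax) = (1 - bias, bias); // normal exponent range
    let a = x.abs();
    // Unbiased f64 exponent of |x|, clamped to emin so tiny inputs use the subnormal spacing.
    let e = (((a.to_bits() >> 52) & 0x7ff) as i32 - 1023).max(emin);
    let ulp = 2f64.powi(e - mant_bits as i32); // gap between neighbours at exponent e
    let q = round_half_even(a / ulp) * ulp;
    let max_finite = (2.0 - 2f64.powi(-(mant_bits as i32))) * 2f64.powi(emax);
    // Anything that rounds past the largest finite value overflows to infinity.
    let q = if q > max_finite { f64::INFINITY } else { q };
    x.signum() * q
}

/// The input set listed in benchmark 1.
fn sacred_constants() -> Vec<(&'static str, f64)> {
    let phi = (1.0 + 5f64.sqrt()) / 2.0;
    vec![
        ("1", 1.0), ("phi", phi), ("phi^2", phi * phi), ("1/phi", 1.0 / phi),
        ("1/phi^2", 1.0 / (phi * phi)), ("sqrt5", 5f64.sqrt()),
        ("e", std::f64::consts::E), ("pi", std::f64::consts::PI),
        ("0", 0.0), ("1e-10", 1e-10), ("1e10", 1e10),
    ]
}

fn main() {
    // HYPOTHETICAL format table: IEEE-style layouts only; the 9 GF rows are not modelled.
    let formats: [(&str, u32, u32); 3] = [("fp16", 5, 10), ("bf16", 8, 7), ("fp32", 8, 23)];
    println!("format,exp_bits,mant_bits,phi_bias,mse,max_abs_err,sacred_const");
    for &(name, e, m) in formats.iter() {
        for (label, x) in sacred_constants() {
            let err = quantize(x, e, m) - x;
            // One input per row, so MSE reduces to the squared error. phi_bias is
            // printed as NA because the GF bias definition is not modelled here.
            println!("{},{},{},NA,{:e},{:e},{}", name, e, m, err * err, err.abs(), label);
        }
    }
}
```

Rust's `f64 as f32` cast also rounds to nearest with ties to even, so comparing `quantize(x, 8, 23)` against `x as f32 as f64` is a cheap self-check of the quantizer before trusting any GF rows built on the same pattern.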
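For benchmark 2, the φ-distance formula `|exp_bits/mant_bits - 1/φ|` and the ascending leaderboard are simple enough to sketch directly. The function names and the three IEEE-style rows are hypothetical. The GF rows and the `PHI_BIAS_SSOT` cross-check are omitted because their values are not given in this issue; the printed `ratio` column is what that cross-check would compare against.

```rust
/// Golden ratio φ = (1 + √5) / 2.
fn phi() -> f64 {
    (1.0 + 5f64.sqrt()) / 2.0
}

/// φ-distance as written in the issue: |exp_bits / mant_bits - 1/φ|.
fn phi_distance(exp_bits: u32, mant_bits: u32) -> f64 {
    (exp_bits as f64 / mant_bits as f64 - 1.0 / phi()).abs()
}

/// Returns (name, exp/mant ratio, φ-distance) rows sorted by φ-distance, ascending.
fn leaderboard(formats: &[(&'static str, u32, u32)]) -> Vec<(&'static str, f64, f64)> {
    let mut rows: Vec<(&'static str, f64, f64)> = formats
        .iter()
        .map(|&(name, e, m)| (name, e as f64 / m as f64, phi_distance(e, m)))
        .collect();
    // total_cmp is a total order on f64, so the sort cannot panic on NaN.
    rows.sort_by(|a, b| a.2.total_cmp(&b.2));
    rows
}

fn main() {
    // HYPOTHETICAL rows: IEEE-style layouts only; GF rows would come from the repo.
    let formats: [(&'static str, u32, u32); 3] = [("fp16", 5, 10), ("bf16", 8, 7), ("fp32", 8, 23)];
    println!("rank,format,ratio,phi_distance");
    for (rank, (name, ratio, dist)) in leaderboard(&formats).into_iter().enumerate() {
        println!("{},{},{:.6},{:.6}", rank + 1, name, ratio, dist);
    }
}
```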
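For benchmark 3 (Xavier-initialized weights), a sketch could draw `w ~ N(0, σ=1/sqrt(h))`, roundtrip each weight, and accumulate the squared error plus a per-decade histogram of `|err|` for plotting. Several pieces are assumptions rather than project code: the tiny xorshift64 generator with Box-Muller sampling stands in for whatever RNG the bench really uses (for example the `rand` crate), `quantize` is the same hypothetical IEEE-style minifloat model (the GF encodings are not modelled), and `xavier_distortion` is a hypothetical name.

```rust
/// Tiny xorshift64 PRNG (Marsaglia 13/7/17) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Uniform in (0, 1]: 53 random bits, offset by half a step so ln() never sees 0.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }
    /// Standard normal N(0, 1) via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Round half to even (IEEE default); f64::round alone breaks ties away from zero.
fn round_half_even(v: f64) -> f64 {
    let r = v.round();
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 { r - v.signum() } else { r }
}

/// ASSUMED IEEE-style minifloat (bias 2^(e-1)-1, subnormals, round-to-nearest-even).
fn quantize(x: f64, exp_bits: u32, mant_bits: u32) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let bias = (1i32 << (exp_bits - 1)) - 1;
    let (emin, emax) = (1 - bias, bias);
    let a = x.abs();
    let e = (((a.to_bits() >> 52) & 0x7ff) as i32 - 1023).max(emin);
    let ulp = 2f64.powi(e - mant_bits as i32);
    let q = round_half_even(a / ulp) * ulp;
    let max_finite = (2.0 - 2f64.powi(-(mant_bits as i32))) * 2f64.powi(emax);
    x.signum() * if q > max_finite { f64::INFINITY } else { q }
}

/// Returns (sum of squared errors, MSE, histogram). Histogram bin i in 1..=9 counts
/// |err| in [1e(i-12), 1e(i-11)); bin 0 is |err| < 1e-11 (incl. exact zeros), bin 10 is >= 1e-2.
fn xavier_distortion(h: usize, n: usize, exp_bits: u32, mant_bits: u32, seed: u64) -> (f64, f64, [usize; 11]) {
    let mut rng = XorShift64(seed);
    let sigma = 1.0 / (h as f64).sqrt();
    let (mut sse, mut hist) = (0.0f64, [0usize; 11]);
    for _ in 0..n {
        let w = sigma * rng.next_gaussian();
        let err = quantize(w, exp_bits, mant_bits) - w;
        sse += err * err;
        let decade = err.abs().log10().floor() as i32; // saturating cast: -inf -> i32::MIN
        hist[decade.saturating_add(12).clamp(0, 10) as usize] += 1;
    }
    (sse, sse / n as f64, hist)
}

fn main() {
    // HYPOTHETICAL format rows (IEEE-style layouts only).
    let formats: [(&str, u32, u32); 3] = [("fp16", 5, 10), ("bf16", 8, 7), ("fp32", 8, 23)];
    println!("h,format,sse,mse,log10_err_histogram");
    for h in [384usize, 512, 1024] {
        for &(name, e, m) in formats.iter() {
            let (sse, mse, hist) = xavier_distortion(h, 10_000, e, m, 0x9E37_79B9_7F4A_7C15);
            let bins: Vec<String> = hist.iter().map(|c| c.to_string()).collect();
            println!("{},{},{:e},{:e},{}", h, name, sse, mse, bins.join(";"));
        }
    }
}
```

Reusing one seed per `h` means every format sees exactly the same weights, so differences between rows come from the format alone.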
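For benchmark 4 (gradient stress test), the `mean_rel_err` and `max_rel_err` columns could be computed as below. This is a sketch under the same assumptions: a hypothetical xorshift64 + Box-Muller sampler, the hypothetical IEEE-style `quantize` model (GF encodings not modelled), and a hypothetical `grad_rel_err` helper. Samples that are exactly zero are skipped because relative error is undefined there.

```rust
/// Tiny xorshift64 PRNG (Marsaglia 13/7/17) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    /// Uniform in (0, 1], never 0, so ln() below stays finite.
    fn next_unit(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }
    /// Standard normal N(0, 1) via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_unit(), self.next_unit());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Round half to even (IEEE default); f64::round alone breaks ties away from zero.
fn round_half_even(v: f64) -> f64 {
    let r = v.round();
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 { r - v.signum() } else { r }
}

/// ASSUMED IEEE-style minifloat (bias 2^(e-1)-1, subnormals, round-to-nearest-even).
fn quantize(x: f64, exp_bits: u32, mant_bits: u32) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let bias = (1i32 << (exp_bits - 1)) - 1;
    let (emin, emax) = (1 - bias, bias);
    let a = x.abs();
    let e = (((a.to_bits() >> 52) & 0x7ff) as i32 - 1023).max(emin);
    let ulp = 2f64.powi(e - mant_bits as i32);
    let q = round_half_even(a / ulp) * ulp;
    let max_finite = (2.0 - 2f64.powi(-(mant_bits as i32))) * 2f64.powi(emax);
    x.signum() * if q > max_finite { f64::INFINITY } else { q }
}

/// Mean and max of |q(g) - g| / |g| over n gradients g ~ N(0, sigma).
fn grad_rel_err(sigma: f64, n: usize, exp_bits: u32, mant_bits: u32, seed: u64) -> (f64, f64) {
    let mut rng = XorShift64(seed);
    let (mut sum, mut max, mut count) = (0.0f64, 0.0f64, 0usize);
    for _ in 0..n {
        let g = sigma * rng.next_gaussian();
        if g == 0.0 {
            continue; // relative error is undefined at exactly zero
        }
        let rel = ((quantize(g, exp_bits, mant_bits) - g) / g).abs();
        sum += rel;
        max = max.max(rel);
        count += 1;
    }
    (sum / count as f64, max)
}

fn main() {
    // HYPOTHETICAL format rows (IEEE-style layouts only).
    let formats: [(&str, u32, u32); 3] = [("fp16", 5, 10), ("bf16", 8, 7), ("fp32", 8, 23)];
    let sigma = 1e-4;
    println!("format,sigma,mean_rel_err,max_rel_err");
    for &(name, e, m) in formats.iter() {
        let (mean, max) = grad_rel_err(sigma, 10_000, e, m, 0x2545_F491_4F6C_DD1D);
        println!("{},{:e},{:e},{:e}", name, sigma, mean, max);
    }
}
```

In fp16 the smallest normal value is 2^-14 (about 6.1e-5), so at σ = 1e-4 a large share of gradients falls on the subnormal grid, where relative error grows sharply. That regime is one effect this test would surface.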

**Output artifacts:**
- `benches/results/comparative_<date>.csv` (publish-ready data)
- `docs/PHD_BENCHMARK_REPORT.md` with markdown tables + observations
- `docs/figures/*.png` (matplotlib via Python script `tools/plot_bench.py`)

**Acceptance:**
- All 12 formats (9 GF + 3 IEEE) covered by all 4 benchmarks (48 cells)
- All numbers R5-honest, with no fabricated values: the CSV is publishable raw data
- Coq-style invariant `φ² + φ⁻² = 3` still holds after quantization, within each format's rounding tolerance (sanity check)
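The invariant sanity check can be sketched by quantizing φ² and φ⁻² separately and comparing their f64 sum with 3. The tolerance follows from rounding: each term is off by at most half its own ulp, so one ulp of 3, `2^(1-mant_bits)`, bounds the sum. Summing in f64 rather than in the target format is a design choice of this sketch. `quantize` is again the hypothetical IEEE-style model (GF encodings not modelled), and `invariant_error` and `invariant_holds` are hypothetical names.

```rust
/// Round half to even (IEEE default); f64::round alone breaks ties away from zero.
fn round_half_even(v: f64) -> f64 {
    let r = v.round();
    if (v - v.trunc()).abs() == 0.5 && r % 2.0 != 0.0 { r - v.signum() } else { r }
}

/// ASSUMED IEEE-style minifloat (bias 2^(e-1)-1, subnormals, round-to-nearest-even).
fn quantize(x: f64, exp_bits: u32, mant_bits: u32) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let bias = (1i32 << (exp_bits - 1)) - 1;
    let (emin, emax) = (1 - bias, bias);
    let a = x.abs();
    let e = (((a.to_bits() >> 52) & 0x7ff) as i32 - 1023).max(emin);
    let ulp = 2f64.powi(e - mant_bits as i32);
    let q = round_half_even(a / ulp) * ulp;
    let max_finite = (2.0 - 2f64.powi(-(mant_bits as i32))) * 2f64.powi(emax);
    x.signum() * if q > max_finite { f64::INFINITY } else { q }
}

/// |q(φ²) + q(φ⁻²) - 3|: each term quantized separately, the sum taken in f64.
fn invariant_error(exp_bits: u32, mant_bits: u32) -> f64 {
    let phi = (1.0 + 5f64.sqrt()) / 2.0;
    let sq = phi * phi;
    let lhs = quantize(sq, exp_bits, mant_bits) + quantize(1.0 / sq, exp_bits, mant_bits);
    (lhs - 3.0).abs()
}

/// φ² in [2, 4) is off by at most 2^-m and φ⁻² in [0.25, 0.5) by at most 2^-(m+3),
/// so one ulp of 3, i.e. 2^(1-m), bounds the error of the sum.
fn invariant_holds(exp_bits: u32, mant_bits: u32) -> bool {
    invariant_error(exp_bits, mant_bits) <= 2f64.powi(1 - mant_bits as i32)
}

fn main() {
    for &(name, e, m) in [("fp16", 5u32, 10u32), ("bf16", 8, 7), ("fp32", 8, 23)].iter() {
        println!("{}: |error| = {:e}, holds = {}", name, invariant_error(e, m), invariant_holds(e, m));
    }
}
```

For fp16 under this model, φ² rounds to 2.6171875 and φ⁻² to 0.382080078125, so the sum is 2.999267578125: the invariant survives quantization only up to rounding, never bit-exactly.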

Triplet R7 (per benchmark row):
`MSE=<v> format=<f> sigma=<s> sha=<7c> jsonl_row=phd-bench gate_status=BENCH-EVIDENCE`
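One R7 triplet line per benchmark row could come from a small formatter like the sketch below. `r7_line` is a hypothetical helper. The `{:e}` number formatting is a choice, not something the issue specifies, and `sha` is assumed to be a full git commit hash that is truncated to 7 characters to match `<7c>`.

```rust
/// Builds `MSE=<v> format=<f> sigma=<s> sha=<7c> jsonl_row=phd-bench gate_status=BENCH-EVIDENCE`.
/// ASSUMPTION: `sha` is a full commit hash; only its first 7 characters are kept.
fn r7_line(mse: f64, fmt_name: &str, sigma: f64, sha: &str) -> String {
    let short: String = sha.chars().take(7).collect();
    format!(
        "MSE={:e} format={} sigma={:e} sha={} jsonl_row=phd-bench gate_status=BENCH-EVIDENCE",
        mse, fmt_name, sigma, short
    )
}

fn main() {
    // Illustrative inputs only; real values come from the benchmark run and `git rev-parse HEAD`.
    println!("{}", r7_line(2.5e-7, "fp16", 1e-4, "0123456789abcdef"));
}
```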

This track is the PhD's main deliverable; Sergeant prioritizes it regardless of trainer status.

