This repository accompanies the paper:
Kendiukhov, I. (2026). What topological and geometric structure do biological foundation models learn? Evidence from 141 hypotheses. PLOS ONE (under revision). Manuscript PDF: `paper/Manuscript.pdf`.
It contains the full autonomous executor–brainstormer hypothesis-screening campaign that produced and tested 141 geometric / topological hypotheses about scGPT and Geneformer V2-316M gene representations across 52 productive iterations, plus the six revision experiments added in response to PLOS ONE peer review.

| Path | What it contains |
|---|---|
| `paper/` | LaTeX source (`main_revised.tex`), figures (`Fig1.png`–`Fig8.png` and `.tiff` masters), compiled PDFs (`Manuscript.pdf`, `Revised_Manuscript_with_Track_Changes.pdf`, `Response_to_Reviewers.pdf`) and the cover letter. |
| `autoloop/` | The autonomous executor–brainstormer driver scripts (`run_codex_topology_autoloop.py`, `run_claude_topology_autoloop.py`). |
| `prompts/` | Versioned brainstormer + executor prompt templates that defined the loop's behaviour. |
| `planning/` | Initial design notes for the autoloop. |
| `iterations/` | One subdirectory per iteration (`iter_0001/`–`iter_0081/`). Each contains the executor's experiment script (`run_iter*.py`), structured iteration report (`executor_iteration_report.md`), parsed hypothesis-screen JSON (`executor_hypothesis_screen.json`), brainstormer reasoning (`brainstormer_last_message.md`, `brainstormer_hypothesis_roadmap.md`), and all CSV/JSON artifacts produced by the iteration. The 141 hypotheses analysed in the manuscript span `iter_0001`–`iter_0054`. |
| `revision_experiments/` | Six new analyses added during the PLOS ONE revision: kidney external-tissue replication (`r1_*`), family-wise BH/Bonferroni multiple-comparison correction (`r2_*`), H123 component ablation (`r3_*`), 60-combination CV+hyperparameter sweep (`r4_*`), coexpression-residual analysis for headline findings (`r5_*`), and autoloop scalability profile (`r6_*`). Includes `citation_evaluation.md`. |
| `reports/autoloop_master_log.md` | Running iteration-by-iteration master log (the "lab notebook" of the campaign). |
| `docs/` | Reproducibility instructions, data-availability statement, changelog, and full revision plan. |

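As a rough illustration of the family-wise BH/Bonferroni correction listed under `revision_experiments/`, the sketch below computes both adjustments in plain Python. It is not the repository's `aggregate_pvalues.py`; the function names and the example p-values are assumptions made purely for illustration and are not results from the study.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the family size, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (hypothetical, not taken from the campaign).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
bh = benjamini_hochberg(example_p)
bonf = bonferroni(example_p)

for p, q, b in zip(example_p, bh, bonf):
    print(f"raw={p:.3f}  BH={q:.5f}  Bonferroni={b:.3f}")
```

BH-adjusted values are never larger than the Bonferroni-adjusted ones, which is why a finding can survive FDR control while failing the stricter family-wise bound.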
| ID | Finding | Effect | Status under strict max-null audit |
|---|---|---|---|
| H24 | Cross-model CCA alignment (scGPT ↔ Geneformer V2) | | Robust |
| H123 | Signed motif–community hardening | $\Delta$AUROC | Robust (composite of geometric + annotation features; see `revision_experiments/r3_h123_ablation/`) |
| H91 | Stability-selected geometric descriptors | $\Delta$AUROC | Robust |
| H70 | Triangle-defect spectrum | $\Delta$AUROC | Robust; 93% coexpression-independent |
| H01/H03 | Persistent homology (H1 loops) | 11–12/12 layers | Robust under feature-shuffle; vanishes under degree-preserving rewiring |
| H141 | Strict max-null audit | 3/9 splits positive | Signal concentrates in immune tissue |
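The H01/H03 row refers to a degree-preserving rewiring null. A minimal sketch of one common way to build such a null, the double-edge swap on a simple undirected graph, is given below. It is not the repository's implementation; the function names, the swap count and the toy graph are assumptions for illustration only.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomise a simple undirected graph with double-edge swaps.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b),
    so every node keeps its degree. Swaps that would create a self-loop
    or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(edge_list[i])
        edge_set.discard(edge_list[j])
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list


# Toy graph (hypothetical): an 8-node ring with two chords.
toy_edges = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
rewired = degree_preserving_rewire(toy_edges, n_swaps=20, seed=1)
print(degrees(rewired) == degrees(toy_edges))
```

A signal that disappears under this null, as reported for the H1 loops, is one that the degree sequence alone can account for.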

See `docs/reproducibility.md`. In short:

- Install Python ≥ 3.11 and the dependencies in `requirements.txt` (or use the conda env `subproject40-topology` referenced in iteration scripts).
- Obtain the precomputed scGPT and Geneformer V2-316M residual-stream gene embeddings (Tabula Sapiens immune / lung / external-lung / kidney). Sources and instructions are in `docs/data_availability.md`. The current scripts reference absolute paths under `/Volumes/Crucial X6/...`; reproduction requires editing the input-path constants at the top of each script (or symlinking).
- Re-run any single iteration with `python iterations/iter_NNNN/run_iterNNNN_screen.py`. Dependencies between iterations (when one builds on another) are documented in each iteration's `executor_iteration_report.md`.
- Re-run the revision experiments with `python revision_experiments/rN_*/run_*.py`.

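Because the scripts hard-code absolute input paths, one way to avoid editing each constant by hand is to resolve the data root from an environment variable with a fallback. The sketch below shows the idea only; the variable name `TOPOLOGY_DATA_ROOT`, the helper names and the example directories are hypothetical and do not appear in the repository.

```python
import os
from pathlib import Path


def resolve_data_root(default, env_var="TOPOLOGY_DATA_ROOT"):
    """Return the embeddings root, preferring an environment override.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    override = os.environ.get(env_var)
    return Path(override).expanduser() if override else Path(default)


def embedding_path(root, *parts):
    """Join a data root with relative path components."""
    return Path(root).joinpath(*parts)


# Hypothetical usage: the default stands in for a script's path constant.
os.environ["TOPOLOGY_DATA_ROOT"] = "/tmp/embeddings"
root = resolve_data_root(default="/path/set/in/script")
print(embedding_path(root, "immune", "embeddings.npy"))
```

Symlinking the expected directory achieves the same result without touching any script.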
A lightweight smoke test that re-creates the multiple-comparison correction (no embeddings needed) is:

    python revision_experiments/r2_multiple_comparisons/aggregate_pvalues.py

An LLM-driven brainstormer reads the cumulative state of all prior
iterations and proposes 2–4 new geometric/topological hypotheses about the
gene-embedding space. An LLM-driven executor receives the hypothesis spec,
writes a self-contained Python experiment that operates on cached
foundation-model embeddings, runs it, and produces a structured
`executor_hypothesis_screen.json` with effect sizes, null-calibrated

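The control flow of the brainstormer and executor described above can be sketched as a simple loop. This is a schematic only, not the code in `autoloop/`: the LLM calls are replaced by placeholder functions, and every class, function and field name here is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    hypothesis_id: str
    description: str


@dataclass
class CampaignState:
    # Cumulative record of every screen produced so far.
    screens: list = field(default_factory=list)


def brainstorm(state, n_new=2):
    """Placeholder for the LLM brainstormer: proposes new hypotheses
    from the cumulative state (the text above says 2-4 per iteration)."""
    start = len(state.screens) + 1
    return [
        Hypothesis(f"H{start + k:02d}", "placeholder hypothesis")
        for k in range(n_new)
    ]


def execute(hypothesis):
    """Placeholder for the LLM executor: a real run would write and run
    an experiment script, then emit a structured screen record."""
    return {
        "hypothesis_id": hypothesis.hypothesis_id,
        "effect_size": None,  # hypothetical field, filled by a real run
        "status": "not_run",
    }


def run_campaign(n_iterations):
    """Alternate brainstorming and execution for a fixed number of iterations."""
    state = CampaignState()
    for _ in range(n_iterations):
        for hypothesis in brainstorm(state):
            state.screens.append(execute(hypothesis))
    return state


state = run_campaign(n_iterations=3)
print([s["hypothesis_id"] for s in state.screens])
```

Keeping the cumulative state as the only input to the brainstormer is what lets each iteration build on earlier screens.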
This repository is the read-only artifact of a published study; major changes
will not be accepted, but bug reports and reproducibility issues are welcome
via GitHub issues. See CONTRIBUTING.md.

If you use this code or build on its findings, please cite the paper:

    @article{kendiukhov2026topology,
      author  = {Kendiukhov, Ihor},
      title   = {What topological and geometric structure do biological foundation models learn? {E}vidence from 141 hypotheses},
      journal = {PLOS ONE},
      year    = {2026},
      note    = {Under revision; preprint and code: \url{https://github.com/Biodyn-AI/hypotheses}}
    }

A machine-readable `CITATION.cff` is also provided.

MIT — see LICENSE.
The autonomous loop was driven by OpenAI Codex 5.3 (executor and brainstormer
agents). The manuscript was prepared with the assistance of Claude
(Anthropic). The author is grateful to the PLOS ONE editor and reviewer for
the thoughtful revision feedback that shaped the additional analyses in
revision_experiments/.