Harden walkthrough CI tests: snapshots + fault-tolerant harness #79

Open: torwager wants to merge 8 commits into master from fix/ci-walkthroughs

torwager (Contributor) commented Jun 19, 2026

Problem

The nightly tests-walkthroughs job has never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc(script) with no hardening, so the first orthviews / surface / interactive-prompt / missing-data error on the headless runner failed the entire test. The walkthroughs were doubling as unit tests without the headless-CI hardening the per-push canlab_test_help_examples suite already has.

Approach

Decouple the tests from the live tutorials and harden them:

  • Verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials when both repos are checked out on CI). Refresh by overwriting from example_help_files/.
  • helpers/canlab_run_walkthrough_snapshot.m — runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
  • helpers/canlab_classify_environment_error.m — buckets caught errors into graphics / input / data / cascade / genuine. Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report (section #, offending line, error id).
  • Rewrote the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness.
  • walkthroughs/README.md documents the design, refresh process, and cadence rationale.
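
The per-cell isolation described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in helpers/canlab_run_walkthrough_snapshot.m: the function name run_snapshot_sketch, the h__ state struct, and the report fields are all hypothetical, and the sketch assumes no snapshot cell calls clear.

```matlab
function report = run_snapshot_sketch(script_path)
% Hypothetical sketch: run a script %%-cell by cell, isolating failures.
% All harness state lives in one struct (h__) so that it is unlikely to
% collide with variables the snapshot creates in this shared workspace.
% Assumption: no cell calls "clear", which would also wipe h__.

h__.lines = regexp(fileread(script_path), '\r?\n', 'split');

% A cell starts at a line whose first non-blank characters are '%%'.
h__.is_start = ~cellfun(@isempty, regexp(h__.lines, '^\s*%%(\s|$)', 'once'));
h__.cell_id  = cumsum(h__.is_start);   % code before the first '%%' is section 0
h__.ids      = unique(h__.cell_id);
h__.report   = struct('section', {}, 'ok', {}, 'identifier', {}, 'message', {});

% Keep figures off-screen while the cells run, then restore the setting.
h__.old_vis = get(0, 'DefaultFigureVisible');
set(0, 'DefaultFigureVisible', 'off');

for h__k = 1:numel(h__.ids)
    h__.code  = strjoin(h__.lines(h__.cell_id == h__.ids(h__k)), newline);
    h__.entry = struct('section', h__.ids(h__k), 'ok', true, ...
                       'identifier', '', 'message', '');
    try
        evalc(h__.code);               % evaluated in this function's workspace
    catch h__err
        h__.entry.ok         = false;  % record the failure and keep going
        h__.entry.identifier = h__err.identifier;
        h__.entry.message    = h__err.message;
    end
    h__.report(end+1) = h__.entry;
end

set(0, 'DefaultFigureVisible', h__.old_vis);
report = h__.report;
end
```

Because every cell is evaluated in the same function workspace, a compute cell that follows a failed graphics cell still sees the variables created earlier. Each failed entry in the report could then be passed to a classifier to decide between skipping and failing.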

Cadence decision: keep as a separate nightly tier

Measured full-suite wall-time ~6.8 min on a fast workstation → est. ~15–20 min on the GitHub Linux runner. Folding that (plus graphics/data-dependent flakiness) into the fast per-push gate would slow every PR and make the required check unreliable. These stay nightly; the per-push tests suite is unaffected (the is_walkthrough filter excludes them by default).

Result

Full suite: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced are fixed in canlab/CANlab_help_examples#3 and the snapshots here refreshed accordingly:

  • walkthrough 3: write() without 'overwrite'
  • walkthrough 4b: .dat_descrip -> .metadata_table

🤖 Generated with Claude Code

torwager and others added 7 commits June 17, 2026 19:23

New scikit-learn-style estimator class that bundles GLM design
specification, fitted result maps (statistic_image), and design
diagnostics in one container. Composition over fmri_glm_design_matrix
(wrapped in .design) and fmri_data.regress (the compute engine).

- classdef with stored properties (design/level/is_timeseries,
  betas/t/contrast maps, vif/leverage/collinearity diagnostics,
  provenance) and true Dependent accessors (X, TR, onsets, durations,
  regressor_names, num_*, is_fitted) that read through to .design.
- Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/
  condition number/rank/redundant-column report), add_contrasts,
  threshold, montage, table, plot_design, summary, check_properties,
  private select_map.
- Documented stubs with planned field mappings: fit (wraps regress),
  build_design (wraps fmri_glm_design_matrix.build), import_SPM
  (SPM12/SPM25 -> .design).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

fit() is now a working orchestration layer: validates the design,
attaches obj.X to the fmri_data, assembles regress() arguments
(robust/AR/threshold/contrasts/names), runs regress, and unpacks the
outputs into the object's statistic_image maps (betas, t,
contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records
fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/
collinearity set. AR models are gated on is_timeseries.

Also resolve multi-image result maps for display: select_map now parses
an optional image index (bare scalar or 'wh_image'), montage selects it
when given, and table auto-selects a single image (required because
statistic_image.table handles one image at a time).

Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28,
betas [35676x2], diagnostics populated, threshold and atlas-labeled
table() both working.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

build_design() delegates to fmri_glm_design_matrix.build, which convolves
onsets/durations with the basis set and assembles design.xX.X. After the
call the Dependent obj.X and obj.regressor_names read through to the built
design, so event-mode fit() works the same as direct mode.

Also harden the Dependent X / regressor_names accessors: the wrapped
design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so
guard with isscalar() before indexing .X / .name to avoid an
"isempty: not enough input arguments" error pre-build.
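
To illustrate why the guard is needed: struct('name',{}) creates a 0x0 struct array, and dot-indexing a field of an empty struct array expands to an empty comma-separated list, so a call such as isempty(xX.X) receives no arguments at all. The sketch below reproduces that and shows one possible isscalar() guard. The helper name get_X_sketch and the stand-in design struct are hypothetical, not the accessor code in glm_map.

```matlab
% Stand-in for the wrapped design before build_design has run (hypothetical).
design.xX = struct('name', {}, 'X', {});   % 0x0 struct array with two fields

try
    isempty(design.xX.X);   % design.xX.X expands to nothing, so isempty gets no input
catch err
    fprintf('Unguarded access failed: %s\n', err.message);
end

X = get_X_sketch(design);   % guarded access returns [] instead of erroring
fprintf('Guarded access returned a %dx%d matrix\n', size(X, 1), size(X, 2));

function X = get_X_sketch(design)
% Return the design matrix only when xX is a single, populated struct.
if isscalar(design.xX) && isfield(design.xX, 'X')
    X = design.xX.X;
else
    X = [];                 % pre-build: nothing to read through yet
end
end
```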

Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19]
from onsets, regressor names read through, event-mode fit() completes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

import_SPM() populates a glm_map from an SPM structure, an SPM.mat path,
or a directory containing one. Because fmri_glm_design_matrix mirrors
SPM's schema, the import is a guarded copy of substructs into the wrapped
design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/
covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true.
Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas,
labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25
auxiliary-field differences.

Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor
names read through correctly, and fit() against a matching timeseries
yields dfe = nscan - nregressors.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Add glm_map to the object-architecture class list in CLAUDE.md and to the
image_vector :See also: block.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

12 functiontests covering construction and Dependent property
read-through, design diagnostics (incl. rank-deficiency detection),
direct-mode (2nd-level) fit over fmri_data.regress with contrasts,
threshold delegation, event-mode build_design via the wrapped
fmri_glm_design_matrix, import_SPM + event-mode fit on a synthetic SPM
struct, and the main input-validation error paths (no design, AR without
timeseries, contrast size mismatch).

Auto-discovered by canlab_run_all_tests. All 12 pass (~5s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The nightly tests-walkthroughs job had never passed (42/42 red). It ran the
CANlab_help_examples tutorials end-to-end via evalc() with no hardening, so
the first orthviews/surface/prompt/missing-data error on the headless runner
failed the whole test.

Decouple the tests from the live tutorials and harden them:

- Add verbatim snapshots of the 10 walkthroughs under walkthroughs/private/
  (genpath-excluded, so they never shadow the real tutorials on CI). Refresh
  by overwriting from example_help_files/.
- Add helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by
  cell, headless, each cell in its own try/catch in a shared workspace, so a
  graphics-only section that fails on a headless runner does not abort the
  compute sections.
- Add helpers/canlab_classify_environment_error.m: buckets caught errors into
  graphics / input / data / cascade / genuine (centralizes the heuristics
  previously inlined in canlab_test_help_examples). Environment buckets are
  skipped (Incomplete); only genuine errors fail, with an informative report
  naming the section, offending line, and error id.
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot
  through the harness instead of evalc'ing the external script.
- Add walkthroughs/README.md documenting the design, refresh process, and why
  these stay a separate nightly tier (~5 min local / ~15-20 min CI) rather
  than folding into the fast per-push suite.

Full suite now: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs
this surfaced (write-without-overwrite in walkthrough 3; dat_descrip ->
metadata_table in 4b) are fixed in CANlab_help_examples and the snapshots
refreshed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

First CI run of the hardened nightly surfaced 3 failures, all the same
environment gap: read_nifti_volume falls back to niftiinfo (Image Processing
Toolbox), which the runner did not provision (only Statistics + Signal were
installed), so plot(obj) failed with Undefined function 'niftiinfo'.

- Provision Image_Processing_Toolbox in tests-walkthroughs.yml so plot() and
  NIfTI I/O actually run on CI.
- Safety net: canlab_classify_environment_error now buckets an
  UndefinedFunction error for a known optional-toolbox function (niftiinfo,
  niftiread, niftiwrite, cfg_getfile) as 'capability' -> skipped, not failed,
  so a missing toolbox can never spuriously redden the nightly. Genuine
  missing-function bugs still classify as 'genuine'.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
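
The 'capability' safety net from the commit above might look something like the sketch below. It is an assumption-laden illustration rather than the contents of canlab_classify_environment_error.m: the function name classify_error_sketch is hypothetical, only the four function names come from the commit message, and the other buckets (graphics / input / data / cascade) are omitted.

```matlab
function bucket = classify_error_sketch(err)
% Hypothetical sketch: decide whether an undefined-function error points at
% a missing optional toolbox ('capability') or at a real bug ('genuine').

% Optional-toolbox functions named in the commit message.
optional_fns = {'niftiinfo', 'niftiread', 'niftiwrite', 'cfg_getfile'};

bucket = 'genuine';   % default: anything unrecognized should fail the test

if strcmp(err.identifier, 'MATLAB:UndefinedFunction')
    % The first single-quoted word in the message is the missing name,
    % e.g. "Undefined function 'niftiinfo' for input arguments ...".
    tok = regexp(err.message, '''(\w+)''', 'tokens', 'once');
    if ~isempty(tok) && ismember(tok{1}, optional_fns)
        bucket = 'capability';   % environment gap: skip rather than fail
    end
end
end
```

Defaulting to 'genuine' keeps the check conservative: a missing function that is not on the optional list still fails the test, in line with the commit's statement that genuine missing-function bugs remain failures.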