Harden walkthrough CI tests: snapshots + fault-tolerant harness by torwager · Pull Request #79 · canlab/CanlabCore

torwager · 2026-06-19T22:07:41Z

Problem

The nightly tests-walkthroughs job has never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc(script) with no hardening, so the first orthviews / surface / interactive-prompt / missing-data error on the headless runner failed the entire test. The walkthroughs were doubling as unit tests without the headless-CI hardening the per-push canlab_test_help_examples suite already has.

Approach

Decouple the tests from the live tutorials and harden them:

- Verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials when both repos are checked out on CI). Refresh by overwriting from example_help_files/.
- helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine. Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report (section #, offending line, error id).
- Rewrote the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness.
- walkthroughs/README.md documents the design, refresh process, and cadence rationale.
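
The cell-by-cell idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in helpers/canlab_run_walkthrough_snapshot.m: the function name run_cells_sketch, the H__ state struct, and the report fields are hypothetical, and the sketch does not attempt the headless setup or the error classification described above.

```matlab
function report = run_cells_sketch(script_text)
% Run each %%-delimited cell of SCRIPT_TEXT in its own try/catch.
% Cells are evaluated in this function's workspace, so variables created
% by one cell stay visible to later cells. All harness state lives in one
% struct (H__, a hypothetical name) to limit clashes with script variables.
H__.lines   = regexp(script_text, '\r?\n', 'split');
H__.is_head = ~cellfun(@isempty, regexp(H__.lines, '^\s*%%', 'once'));
H__.cell_id = cumsum(H__.is_head) + 1;   % text before the first %% is cell 1
H__.report  = struct('section', {}, 'ok', {}, 'identifier', {}, 'message', {});
H__.k = 0;
while H__.k < max(H__.cell_id)
    H__.k = H__.k + 1;
    H__.code = strjoin(H__.lines(H__.cell_id == H__.k), newline);
    if isempty(strtrim(H__.code))
        continue                          % nothing to run in this cell
    end
    H__.entry = struct('section', H__.k, 'ok', true, ...
                       'identifier', '', 'message', '');
    try
        evalc(H__.code);                  % evalc also captures printed output
    catch H__err
        % Record the failure and keep going with the next cell.
        H__.entry.ok = false;
        H__.entry.identifier = H__err.identifier;
        H__.entry.message = H__err.message;
    end
    H__.report(end + 1) = H__.entry;
end
report = H__.report;
end
```

Called on a three-cell script whose middle cell throws, the report marks only that cell as failed, and the third cell can still use variables defined in the first.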

Cadence decision: keep as a separate nightly tier

Measured full-suite wall-time ~6.8 min on a fast workstation → est. ~15–20 min on the GitHub Linux runner. Folding that (plus graphics/data-dependent flakiness) into the fast per-push gate would slow every PR and make the required check unreliable. These stay nightly; the per-push tests suite is unaffected (the is_walkthrough filter excludes them by default).

Result

Full suite: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced are fixed in canlab/CANlab_help_examples#3 and the snapshots here refreshed accordingly:

walkthrough 3: write() without 'overwrite'
walkthrough 4b: .dat_descrip → .metadata_table

🤖 Generated with Claude Code

New scikit-learn-style estimator class that bundles GLM design specification, fitted result maps (statistic_image), and design diagnostics in one container. Composition over fmri_glm_design_matrix (wrapped in .design) and fmri_data.regress (the compute engine).

- classdef with stored properties (design/level/is_timeseries, betas/t/contrast maps, vif/leverage/collinearity diagnostics, provenance) and true Dependent accessors (X, TR, onsets, durations, regressor_names, num_*, is_fitted) that read through to .design.
- Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/condition number/rank/redundant-column report), add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map.
- Documented stubs with planned field mappings: fit (wraps regress), build_design (wraps fmri_glm_design_matrix.build), import_SPM (SPM12/SPM25 -> .design).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
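
For readers unfamiliar with the diagnostics named in this commit, the textbook definitions of VIF, leverage, condition number, and rank can be computed in base MATLAB as sketched below. This is an illustration on a made-up design matrix and assumes nothing about how glm_map's diagnostics() is implemented; cVIF and the redundant-column report are not covered.

```matlab
rng(0);                                   % reproducible toy design
n = 50;
X = [randn(n, 3), ones(n, 1)];            % 3 regressors + intercept
X(:, 3) = X(:, 1) + 0.1 * randn(n, 1);    % column 3 nearly collinear with column 1

% VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the
% other non-constant columns. Equivalent to the diagonal of the inverse
% correlation matrix of those columns.
R   = corrcoef(X(:, 1:3));
vif = diag(inv(R))';

% Leverage: diagonal of the hat (projection) matrix; sums to rank(X).
H        = X * pinv(X);
leverage = diag(H);

condition_number = cond(X);               % large values indicate collinearity
design_rank      = rank(X);               % < size(X, 2) means rank deficient
```

In this toy design the first and third columns get large VIFs while the second stays near 1, which is the pattern a collinearity report is meant to expose.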

fit() is now a working orchestration layer: validates the design, attaches obj.X to the fmri_data, assembles regress() arguments (robust/AR/threshold/contrasts/names), runs regress, and unpacks the outputs into the object's statistic_image maps (betas, t, contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/collinearity set. AR models are gated on is_timeseries.

Also resolve multi-image result maps for display: select_map now parses an optional image index (bare scalar or 'wh_image'), montage selects it when given, and table auto-selects a single image (required because statistic_image.table handles one image at a time).

Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28, betas [35676x2], diagnostics populated, threshold and atlas-labeled table() both working.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

build_design() delegates to fmri_glm_design_matrix.build, which convolves onsets/durations with the basis set and assembles design.xX.X. After the call the Dependent obj.X and obj.regressor_names read through to the built design, so event-mode fit() works the same as direct mode.

Also harden the Dependent X / regressor_names accessors: the wrapped design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so guard with isscalar() before indexing .X / .name to avoid an "isempty: not enough input arguments" error pre-build.

Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19] from onsets, regressor names read through, event-mode fit() completes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
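
The isscalar() guard described in this commit can be illustrated in isolation. The sketch below uses plain structs in place of the real design object (an assumption for illustration): on a 0x0 struct array, xX.X expands to zero arguments, so isempty(xX.X) would be called with no input and error, and the guard short-circuits before that happens.

```matlab
% Two stand-ins for design.xX: one unbuilt (0x0 struct array), one built.
cases = {struct('X', {}, 'name', {}), ...
         struct('X', ones(4, 2), 'name', {{'a', 'b'}})};

sizes = zeros(numel(cases), 2);
for i = 1:numel(cases)
    xX = cases{i};
    % Guard first: only index .X when xX is a single struct.
    if isscalar(xX) && ~isempty(xX.X)
        X = xX.X;
    else
        X = [];                           % nothing built yet
    end
    sizes(i, :) = size(X);
end
```

The unbuilt case yields an empty X instead of an error, and the built case returns the stored matrix unchanged.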

import_SPM() populates a glm_map from an SPM structure, an SPM.mat path, or a directory containing one. Because fmri_glm_design_matrix mirrors SPM's schema, the import is a guarded copy of substructs into the wrapped design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true. Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas, labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25 auxiliary-field differences.

Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor names read through correctly, and fit() against a matching timeseries yields dfe = nscan - nregressors.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
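
A "guarded copy of substructs" might look something like the sketch below. It runs on a synthetic SPM-like struct and copies into a plain struct named design, which is a hypothetical stand-in for the wrapped fmri_glm_design_matrix; only the field mapping listed in this commit is assumed.

```matlab
% Synthetic SPM-like struct that deliberately lacks xBF and Sess.
SPM = struct();
SPM.xY.RT   = 2;
SPM.nscan   = 200;
SPM.xX.X    = randn(200, 3);
SPM.xX.name = {'cond_a', 'cond_b', 'constant'};

% Hypothetical destination; the real import targets the wrapped design object.
design = struct('TR', [], 'nscan', [], 'xBF', [], 'Sess', [], 'xX', []);

% xY.RT -> TR, copied only if both levels exist.
if isfield(SPM, 'xY') && isfield(SPM.xY, 'RT')
    design.TR = SPM.xY.RT;
end

% Whole-substruct copies: auxiliary fields inside each substruct come along
% unchanged, so version-specific extras do not need to be enumerated.
for f = {'nscan', 'xBF', 'Sess', 'xX'}
    if isfield(SPM, f{1})
        design.(f{1}) = SPM.(f{1});
    end
end
```

Missing substructs simply stay empty in the destination rather than raising an error.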

Add glm_map to the object-architecture class list in CLAUDE.md and to the image_vector :See also: block.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

12 functiontests covering construction and Dependent property read-through, design diagnostics (incl. rank-deficiency detection), direct-mode (2nd-level) fit over fmri_data.regress with contrasts, threshold delegation, event-mode build_design via the wrapped fmri_glm_design_matrix, import_SPM + event-mode fit on a synthetic SPM struct, and the main input-validation error paths (no design, AR without timeseries, contrast size mismatch). Auto-discovered by canlab_run_all_tests. All 12 pass (~5s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

The nightly tests-walkthroughs job had never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc() with no hardening, so the first orthviews/surface/prompt/missing-data error on the headless runner failed the whole test.

Decouple the tests from the live tutorials and harden them:

- Add verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials on CI). Refresh by overwriting from example_help_files/.
- Add helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- Add helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine (centralizes the heuristics previously inlined in canlab_test_help_examples). Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report naming the section, offending line, and error id.
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness instead of evalc'ing the external script.
- Add walkthroughs/README.md documenting the design, refresh process, and why these stay a separate nightly tier (~5 min local / ~15-20 min CI) rather than folding into the fast per-push suite.

Full suite now: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced (write-without-overwrite in walkthrough 3; dat_descrip -> metadata_table in 4b) are fixed in CANlab_help_examples and the snapshots refreshed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

First CI run of the hardened nightly surfaced 3 failures, all the same environment gap: read_nifti_volume falls back to niftiinfo (Image Processing Toolbox), which the runner did not provision (only Statistics + Signal were installed), so plot(obj) failed with Undefined function 'niftiinfo'.

- Provision Image_Processing_Toolbox in tests-walkthroughs.yml so plot() and NIfTI I/O actually run on CI.
- Safety net: canlab_classify_environment_error now buckets an UndefinedFunction error for a known optional-toolbox function (niftiinfo, niftiread, niftiwrite, cfg_getfile) as 'capability' -> skipped, not failed, so a missing toolbox can never spuriously redden the nightly. Genuine missing-function bugs still classify as 'genuine'.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
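
The 'capability' safety net described in this commit could be approximated as below. This is a hedged sketch, not the contents of canlab_classify_environment_error.m: the function name classify_error_sketch and the message-parsing heuristic are assumptions, and the other buckets (graphics / input / data / cascade) are omitted. The list of optional-toolbox functions is the one named in the commit message.

```matlab
function bucket = classify_error_sketch(err)
% ERR is an MException. Returns 'capability' when the error is an undefined
% function from a known optional toolbox, otherwise 'genuine'.
optional_fns = {'niftiinfo', 'niftiread', 'niftiwrite', 'cfg_getfile'};
bucket = 'genuine';
if strcmp(err.identifier, 'MATLAB:UndefinedFunction')
    % Assumed heuristic: the first single-quoted word in the message is
    % the name of the missing function.
    tok = regexp(err.message, '''(\w+)''', 'tokens', 'once');
    if ~isempty(tok) && ismember(tok{1}, optional_fns)
        bucket = 'capability';            % skip, do not fail
    end
end
end
```

An undefined niftiinfo would be skipped under this rule, while an undefined function that is not on the list still counts as a genuine failure.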

torwager and others added 7 commits June 17, 2026 19:23

Document glm_map in CLAUDE.md and image_vector see-also

1025764


torwager mentioned this pull request Jun 19, 2026

Fix two walkthroughs for current CanlabCore API canlab/CANlab_help_examples#3

Open

torwager wants to merge 8 commits into master from fix/ci-walkthroughs
