Harden walkthrough CI tests: snapshots + fault-tolerant harness #79
Open
torwager wants to merge 8 commits into
New scikit-learn-style estimator class that bundles GLM design specification, fitted result maps (statistic_image), and design diagnostics in one container. Composition over fmri_glm_design_matrix (wrapped in .design) and fmri_data.regress (the compute engine).

- classdef with stored properties (design/level/is_timeseries, betas/t/contrast maps, vif/leverage/collinearity diagnostics, provenance) and true Dependent accessors (X, TR, onsets, durations, regressor_names, num_*, is_fitted) that read through to .design.
- Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/condition number/rank/redundant-column report), add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map.
- Documented stubs with planned field mappings: fit (wraps regress), build_design (wraps fmri_glm_design_matrix.build), import_SPM (SPM12/SPM25 -> .design).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
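For readers unfamiliar with the VIF diagnostic listed above, here is a minimal sketch of the textbook definition, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing column j on the remaining columns plus an intercept. It is not the glm_map implementation, whose internals are not shown on this page. The data and all variable names are illustrative assumptions.

```matlab
% Illustrative design: x3 is nearly collinear with x1 (assumed toy data).
rng(0);
n  = 200;
x1 = randn(n, 1);
x2 = randn(n, 1);
x3 = x1 + 0.1 * randn(n, 1);
X  = [x1 x2 x3];
p  = size(X, 2);

% Textbook VIF: regress each column on the others (plus an intercept).
vif = zeros(1, p);
for j = 1:p
    y  = X(:, j);
    Z  = [ones(n, 1), X(:, setdiff(1:p, j))];
    r  = y - Z * (Z \ y);                          % least-squares residuals
    R2 = 1 - sum(r .^ 2) / sum((y - mean(y)) .^ 2);
    vif(j) = 1 / (1 - R2);
end

% Equivalent closed form: diagonal of the inverse correlation matrix.
vif_check = diag(inv(corrcoef(X)))';
```

The two collinear columns (1 and 3) get large VIFs while the independent column stays near 1, which is the pattern a redundant-column report would flag.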
fit() is now a working orchestration layer: validates the design,
attaches obj.X to the fmri_data, assembles regress() arguments
(robust/AR/threshold/contrasts/names), runs regress, and unpacks the
outputs into the object's statistic_image maps (betas, t,
contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records
fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/
collinearity set. AR models are gated on is_timeseries.
Also resolve multi-image result maps for display: select_map now parses
an optional image index (bare scalar or 'wh_image'), montage selects it
when given, and table auto-selects a single image (required because
statistic_image.table handles one image at a time).
Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28,
betas [35676x2], diagnostics populated, threshold and atlas-labeled
table() both working.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
build_design() delegates to fmri_glm_design_matrix.build, which convolves
onsets/durations with the basis set and assembles design.xX.X. After the
call the Dependent obj.X and obj.regressor_names read through to the built
design, so event-mode fit() works the same as direct mode.
Also harden the Dependent X / regressor_names accessors: the wrapped
design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so
guard with isscalar() before indexing .X / .name to avoid an
"isempty: not enough input arguments" error pre-build.
Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19]
from onsets, regressor names read through, event-mode fit() completes.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
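The isscalar() guard described in this commit can be reproduced in isolation. This is a minimal sketch, assuming only what the message states: xX is seeded as a 0x0 struct array. The struct literals and variable names are illustrative, not the class's actual accessor code.

```matlab
% A struct built with 'name', {} is a 0x0 struct array, not a scalar struct.
xX = struct('name', {}, 'X', {});

% Dot-indexing a 0x0 struct array expands to an empty comma-separated list,
% so isempty(xX.X) would be called with zero arguments and error.
% Guard with isscalar() before touching the field.
if isscalar(xX)
    X_before = xX.X;
else
    X_before = [];              % pre-build: nothing to read through to
end

% After a (simulated) build, xX is a scalar struct and read-through works.
xX_built = struct('name', {{'reg1', 'reg2'}}, 'X', rand(200, 2));
if isscalar(xX_built)
    X_after = xX_built.X;
else
    X_after = [];
end
```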
import_SPM() populates a glm_map from an SPM structure, an SPM.mat path, or a directory containing one. Because fmri_glm_design_matrix mirrors SPM's schema, the import is a guarded copy of substructs into the wrapped design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true.

Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas, labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25 auxiliary-field differences.

Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor names read through correctly, and fit() against a matching timeseries yields dfe = nscan - nregressors.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Add glm_map to the object-architecture class list in CLAUDE.md and to the image_vector :See also: block.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
12 functiontests covering:

- construction and Dependent property read-through
- design diagnostics (incl. rank-deficiency detection)
- direct-mode (2nd-level) fit over fmri_data.regress with contrasts
- threshold delegation
- event-mode build_design via the wrapped fmri_glm_design_matrix
- import_SPM + event-mode fit on a synthetic SPM struct
- the main input-validation error paths (no design, AR without timeseries, contrast size mismatch)

Auto-discovered by canlab_run_all_tests. All 12 pass (~5s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
The nightly tests-walkthroughs job had never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc() with no hardening, so the first orthviews/surface/prompt/missing-data error on the headless runner failed the whole test.

Decouple the tests from the live tutorials and harden them:

- Add verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials on CI). Refresh by overwriting from example_help_files/.
- Add helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- Add helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine (centralizes the heuristics previously inlined in canlab_test_help_examples). Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report naming the section, offending line, and error id.
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness instead of evalc'ing the external script.
- Add walkthroughs/README.md documenting the design, refresh process, and why these stay a separate nightly tier (~5 min local / ~15-20 min CI) rather than folding into the fast per-push suite.

Full suite now: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced (write-without-overwrite in walkthrough 3; dat_descrip -> metadata_table in 4b) are fixed in CANlab_help_examples and the snapshots refreshed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
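The cell-by-cell strategy in this commit can be sketched in a few lines. This is a rough illustration under stated assumptions, not the contents of canlab_run_walkthrough_snapshot.m: the snapshot text, the 'demo:graphics' error id, and all variable names are hypothetical, and the sketch assumes the snapshot begins with a %% header line.

```matlab
% Hypothetical snapshot text standing in for a walkthrough file.
snapshot = strjoin({ ...
    '%% Section 1: compute', ...
    'a = 2 + 3;', ...
    '%% Section 2: simulated graphics failure', ...
    'error(''demo:graphics'', ''no display available'');', ...
    '%% Section 3: compute that depends on section 1', ...
    'b = a * 10;'}, newline);

% Split into cells at lines that begin with '%%'.
src_lines = strsplit(snapshot, newline);
is_header = startsWith(strtrim(src_lines), '%%');
cell_id   = cumsum(is_header);           % cell index for every line

n_cells = max(cell_id);
status  = repmat({'ok'}, 1, n_cells);

for k = 1:n_cells
    code_k = strjoin(src_lines(cell_id == k), newline);
    try
        out = evalc(code_k);             % shared workspace, output captured
    catch err
        status{k} = err.identifier;      % record the failure, keep going
    end
end
```

Because every cell runs in the same workspace, section 3 still sees `a` from section 1 even though section 2 threw. A real harness would also need to keep its own bookkeeping variables from colliding with names the snapshot defines.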
First CI run of the hardened nightly surfaced 3 failures, all the same environment gap: read_nifti_volume falls back to niftiinfo (Image Processing Toolbox), which the runner did not provision (only Statistics + Signal were installed), so plot(obj) failed with Undefined function 'niftiinfo'.

- Provision Image_Processing_Toolbox in tests-walkthroughs.yml so plot() and NIfTI I/O actually run on CI.
- Safety net: canlab_classify_environment_error now buckets an UndefinedFunction error for a known optional-toolbox function (niftiinfo, niftiread, niftiwrite, cfg_getfile) as 'capability' -> skipped, not failed, so a missing toolbox can never spuriously redden the nightly. Genuine missing-function bugs still classify as 'genuine'.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
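The 'capability' rule from this commit might look roughly like the sketch below. It covers only that one rule and assumes the function name can be recovered as the first single-quoted token of the error message. The real heuristics in canlab_classify_environment_error.m are not shown on this page, and the sample exceptions are constructed for illustration.

```matlab
% Known optional-toolbox functions, as listed in the commit message.
optional_fns = {'niftiinfo', 'niftiread', 'niftiwrite', 'cfg_getfile'};

% Illustrative errors (constructed here, not caught from real runs).
errs = { ...
    MException('MATLAB:UndefinedFunction', ...
        'Undefined function ''niftiinfo'' for input arguments of type ''char''.'), ...
    MException('MATLAB:UndefinedFunction', ...
        'Undefined function ''my_typo_fn'' for input arguments of type ''double''.'), ...
    MException('demo:other', 'Index exceeds array bounds.')};

buckets = cell(size(errs));
for i = 1:numel(errs)
    err    = errs{i};
    bucket = 'genuine';                              % default: fails the test
    if strcmp(err.identifier, 'MATLAB:UndefinedFunction')
        % First single-quoted token in the message is taken as the function name.
        tok = regexp(err.message, '''([^'']+)''', 'tokens', 'once');
        if ~isempty(tok) && ismember(tok{1}, optional_fns)
            bucket = 'capability';                   % skipped, not failed
        end
    end
    buckets{i} = bucket;
end
```

Keeping the allow-list explicit means an undefined function that is not on the list, such as a typo in a tutorial, still falls through to 'genuine' and fails the nightly.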
Problem

The nightly tests-walkthroughs job has never passed (42/42 red). It ran the CANlab_help_examples tutorials end-to-end via evalc(script) with no hardening, so the first orthviews / surface / interactive-prompt / missing-data error on the headless runner failed the entire test. The walkthroughs were doubling as unit tests without the headless-CI hardening the per-push canlab_test_help_examples suite already has.

Approach

Decouple the tests from the live tutorials and harden them:

- Verbatim snapshots of the 10 walkthroughs under walkthroughs/private/ (genpath-excluded, so they never shadow the real tutorials when both repos are checked out on CI). Refresh by overwriting from example_help_files/.
- helpers/canlab_run_walkthrough_snapshot.m: runs a snapshot %%-cell by cell, headless, each cell in its own try/catch in a shared workspace, so a graphics-only section that fails on a headless runner does not abort the compute sections.
- helpers/canlab_classify_environment_error.m: buckets caught errors into graphics / input / data / cascade / genuine. Environment buckets are skipped (Incomplete); only genuine errors fail, with an informative report (section #, offending line, error id).
- Rewrite the 10 canlab_test_walkthrough_*.m wrappers to run their snapshot through the harness.
- walkthroughs/README.md documents the design, refresh process, and cadence rationale.

Cadence decision: keep as a separate nightly tier

Measured full-suite wall-time ~6.8 min on a fast workstation → est. ~15–20 min on the GitHub Linux runner. Folding that (plus graphics/data-dependent flakiness) into the fast per-push gate would slow every PR and make the required check unreliable. These stay nightly; the per-push tests suite is unaffected (the is_walkthrough filter excludes them by default).

Result

Full suite: 10 passed, 0 failed, 0 incomplete. Two genuine tutorial bugs this surfaced are fixed in canlab/CANlab_help_examples#3 and the snapshots here refreshed accordingly:

- write() without 'overwrite' (walkthrough 3)
- .dat_descrip → .metadata_table (walkthrough 4b)

🤖 Generated with Claude Code