Add glm_map object class: scikit-learn-style mass-univariate GLM estimator by torwager · Pull Request #77 · canlab/CanlabCore

torwager · 2026-06-18T21:39:15Z

Summary

Adds a new object class glm_map — a scikit-learn-style estimator for mass-univariate GLM / multiple regression on neuroimaging data. One object bundles:

Design specification — wraps an fmri_glm_design_matrix in .design (1st-level / event mode) or holds a direct .X matrix (2nd-level / group mode), plus contrasts.
Fitted result maps — betas, t, contrast estimates, and contrast t as statistic_image objects, plus sigma, dfe, residuals.
Design diagnostics — VIF, contrast VIF (cVIF), leverage, condition number, rank, and a redundant/near-collinear column report.

glm_map is a standalone container (composition, not an image_vector subclass); fmri_data.regress remains the compute engine. Several attributes (X, TR, onsets, durations, regressor_names, num_*, is_fitted) are true MATLAB Dependent properties that read through to the wrapped design, so there is a single source of truth.

Workflow

% 2nd-level / group (direct design)
g = glm_map('X', X, 'level', 2, 'regressor_names', {'intercept' 'cov'});
g = add_contrasts(g, [0 1], {'cov_effect'});
g = fit(g, fmri_data_obj);          % wraps fmri_data.regress
diagnostics(g); table(g, 'contrast'); montage(g, 'contrast_t');

% 1st-level / event design
g = glm_map(fmri_glm_design_matrix(TR, 'onsets', ons, 'condition_names', names));
g = build_design(g);                % onsets -> X via basis convolution
g = fit(g, timeseries_obj);

% Import an SPM12/SPM25 first-level model
g = import_SPM(glm_map, '/path/SPM.mat');

Commits

Scaffold: classdef, stored + Dependent properties, constructor, disp; implemented diagnostics, add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map.
fit: orchestration over fmri_data.regress (robust/AR/threshold/contrasts/names), unpacks outputs into the result maps, runs full diagnostics; AR gated on is_timeseries. Plus multi-image display resolution.
build_design: event/1st-level design via fmri_glm_design_matrix.build; hardened Dependent accessors against the pre-build 0×0 xX struct.
import_SPM: SPM12/SPM25 first-level import (struct / SPM.mat path / directory), optional beta loading.
Docs: glm_map added to CLAUDE.md class list and @image_vector see-also.

Testing

All methods were exercised live in MATLAB (static analysis clean):

Direct/group fit on load_image_set('emotionreg') → dfe=28, betas [35676×2], diagnostics populated, atlas-labeled table output.
Event-mode: glm_map(fmri_glm_design_matrix) → build_design → X [200×19] from onsets → fit.
SPM import: synthetic SPM struct → onsets/condition-names/regressor-names read through → fit → dfe = nscan − nregressors.

Notes: fit, build_design, and import_SPM are fully implemented. Known rough edges are inherited from downstream code (not glm_map): statistic_image.threshold warns on multi-image cluster-extent thresholding, and region/table errors when zero regions survive a threshold.

🤖 Generated with Claude Code

New scikit-learn-style estimator class that bundles GLM design specification, fitted result maps (statistic_image), and design diagnostics in one container. Composition over fmri_glm_design_matrix (wrapped in .design) and fmri_data.regress (the compute engine). - classdef with stored properties (design/level/is_timeseries, betas/t/contrast maps, vif/leverage/collinearity diagnostics, provenance) and true Dependent accessors (X, TR, onsets, durations, regressor_names, num_*, is_fitted) that read through to .design. - Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/ condition number/rank/redundant-column report), add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map. - Documented stubs with planned field mappings: fit (wraps regress), build_design (wraps fmri_glm_design_matrix.build), import_SPM (SPM12/SPM25 -> .design). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

fit() is now a working orchestration layer: validates the design, attaches obj.X to the fmri_data, assembles regress() arguments (robust/AR/threshold/contrasts/names), runs regress, and unpacks the outputs into the object's statistic_image maps (betas, t, contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/ collinearity set. AR models are gated on is_timeseries. Also resolve multi-image result maps for display: select_map now parses an optional image index (bare scalar or 'wh_image'), montage selects it when given, and table auto-selects a single image (required because statistic_image.table handles one image at a time). Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28, betas [35676x2], diagnostics populated, threshold and atlas-labeled table() both working. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

build_design() delegates to fmri_glm_design_matrix.build, which convolves onsets/durations with the basis set and assembles design.xX.X. After the call the Dependent obj.X and obj.regressor_names read through to the built design, so event-mode fit() works the same as direct mode. Also harden the Dependent X / regressor_names accessors: the wrapped design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so guard with isscalar() before indexing .X / .name to avoid an "isempty: not enough input arguments" error pre-build. Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19] from onsets, regressor names read through, event-mode fit() completes. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

import_SPM() populates a glm_map from an SPM structure, an SPM.mat path, or a directory containing one. Because fmri_glm_design_matrix mirrors SPM's schema, the import is a guarded copy of substructs into the wrapped design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/ covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true. Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas, labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25 auxiliary-field differences. Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor names read through correctly, and fit() against a matching timeseries yields dfe = nscan - nregressors. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Add glm_map to the object-architecture class list in CLAUDE.md and to the image_vector :See also: block. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

12 functiontests covering construction and Dependent property read-through, design diagnostics (incl. rank-deficiency detection), direct-mode (2nd-level) fit over fmri_data.regress with contrasts, threshold delegation, event-mode build_design via the wrapped fmri_glm_design_matrix, import_SPM + event-mode fit on a synthetic SPM struct, and the main input-validation error paths (no design, AR without timeseries, contrast size mismatch). Auto-discovered by canlab_run_all_tests. All 12 pass (~5s). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Restructure glm_map around fmri_data.regress's results structure and make regress return a glm_map object instead of a plain struct. - glm_map properties now mirror the regress `out` fields, with nested .input_parameters / .input_image_metadata / .diagnostics structs, .df and .sigma kept as fmri_data, and out-struct-name aliases (.b, .con_t, .contrast_images, .resid, .variable_names, .C). Constructor re-casts a regress-style struct; disp lists all properties. - fmri_data.regress returns glm_map(out) (struct-compatible via aliases), replaces deprecated getvif with VIF.m, and adds contrast VIFs (cVIF) to out.diagnostics.Contrast_variance_inflation_factors. - diagnostics: consolidate on Variance_inflation_factors / Leverages names and add per-observation Cook's distance (Cooks_distance); fit auto-computes residuals for it and drops them unless requested. - summary: print a narrative report (analysis, model, diagnostics, and per-predictor/contrast significant-voxel counts at threshold). - Add docs/workflows/regression_with_glm_map.m walkthrough; update unit tests and CLAUDE.md. Co-Authored-By: Claude Fable 5 <[email protected]>

…ss examples Use a real predictor (Reappraisal_Success) from the emotionreg metadata_table and demonstrate the full set of fmri_data.regress example options against the returned glm_map object: regressor/analysis naming, displaying result maps (montage/surface/orthviews/get_wh_image), nested .diagnostics with a VIF/ leverage plot, a 2nd-level CSF nuisance covariate, gray-matter masking + FDR, outlier detection and exclusion, robust regression, residual output, brain-predicts-behavior and map-writing (commented), the estimator path with fit/summary, re-thresholding, and a first-level event design. Fix find/replace artifacts (.obj -> .dat) from the prior revision. Verified all computational sections run end-to-end on the sample dataset. Co-Authored-By: Claude Fable 5 <[email protected]>

Add @glm_map/validate_object, which fills any missing fields in the nested option/diagnostic structs (.input_parameters, .input_image_metadata, .diagnostics incl. .collinearity_report, and .fit_parameters) from canonical field templates without overwriting existing values. The templates are the single source of truth for the valid field names produced by fmri_data.regress and glm_map.diagnostics. - glm_map constructor now calls validate_object before returning, so even a freshly constructed (or struct-recast) object always exposes the complete schema for every nested struct. - fmri_data.regress calls validate_object on the glm_map output before returning it. Also picks up a docs/workflows tweak (run diagnostics(g) before reading .diagnostics fields). glm_map unit tests pass (14/14). Co-Authored-By: Claude Fable 5 <[email protected]>

…_map.replace_basis_set Rebuild the design-matrix convolution on a high-resolution (16 samples/sec, TR-independent) pipeline and audit/fix the class methods. get_session_X: build a 16 Hz onset/epoch indicator per condition, convolve with the per-condition basis (resampled to 16 Hz via xBF.dt), and downsample to TR with getPredictors('dsrate', res*TR) -- the same engine onsets2fmridesign uses. Matches onsets2fmridesign to corr >= 0.999 across TR = 0.8/1/1.3/2/2.5 (incl. fractional TRs), handles event durations (epochs), parametric modulators, multiple sessions, and per-condition basis sets. Default basis is now the canonical SPM HRF sampled at 16 Hz. Audit fixes: - build_single_trial: same hi-res timing fix; index xX(1) (dot-assign to an empty struct failed); correct off-by-one in the intercept (iB) indices. - build/build_single_trial check_model: make the " for Condition N" basis-name suffix idempotent so repeated build() calls don't keep appending. - add: condition_names is now correctly shared across sessions (or accepts session-major names), per the documented behavior. - replace_basis_set: harmonize struct fields between basis builders (spm_get_bf vs fmri_spline_basis) so assignment no longer errors, and expand xBF to one-per-condition so replacing one condition keeps the others. - import_onsets: implement (was a non-functional stub) reading onset/duration/ name columns from a CSV/Excel file or table. - plot: drop a dead `bf = obj.xBF.bf` line that errored for multi-condition xBF. glm_map: add replace_basis_set, which delegates to the design's replace_basis_set, rebuilds X, clears fit results/contrasts/diagnostics that no longer apply (restoring the empty schema via validate_object), and either re-fits (if data supplied) or refreshes design diagnostics. Tests: add event-design timing checks (canonical HRF -> 3 cols; match onsets2fmridesign across TRs) and a replace_basis_set test. 16/16 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…lotDesign hook; import_onsets FSL/SPM inputs Regressor roles: add wh_interest / wh_nuisance / wh_intercept Dependent indicators to glm_map. Event designs read the partition from design.xX (iH = of interest, iC/iG = nuisance, iB = intercept); direct/group designs detect the intercept (constant columns) and use a new nuisance_columns property to mark covariates of no interest. diagnostics(): now computes VIFs/cVIFs and the condition number for the full design AND for the regressors of interest only (nuisance covariates removed), storing both (full values stay in the main fields). By default it prints a narrative report that explains each metric's range/interpretation, warns on problematic cases (VIF/cVIF > threshold, condition number > 30/100, high Cook's distance, rank deficiency), and flags when nuisance covariates substantially inflate the of-interest VIFs (indicating correlation between events of interest and nuisance variables). plot_design(): hooks into plotDesign -- when an event design has <= 12 event types of interest, draws color-coded regressor line plots with event/duration boxes for the events of interest, alongside a heat map of the full design matrix (incl. nuisance covariates). All design heat maps use 'YDir','reverse' (observation 0 at top) and axis tight. import_onsets(): handles FSL/tabular files (CSV/Excel/table) with onset, duration, and a condition column that may be a string name OR an integer event-type code, and SPM-style cell arrays of onsets, durations, and parametric modulators (one cell per condition / event type). validate_object: diagnostics template extended with the new fields. Tests: add role-indicator, interest-only-diagnostics, and import_onsets variant tests. 19/19 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…nt type Replace the plotDesign call (which re-convolved onsets with its own canonical HRF and ignored the model's basis set) with direct rendering of the object's built design matrix. Each event type of interest gets its own panel showing its actual regressor(s) -- one line per basis function, so multi-basis designs (e.g. spline or HRF+derivatives) display correctly -- with filled boxes marking each event and its duration (box-drawing borrowed from drawbox). The full design-matrix heat map (incl. nuisance covariates, YDir reverse + axis tight) is shown alongside. Per-condition columns are located by basis-function counts at the front of the design (verified A->cols 1:4, B->col 5 for a 4-spline + canonical mix). 19/19 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…nterest/nuisance flagging create_orthogonal_contrast_set: new method on @glm_map and @fmri_glm_design_matrix. Uses create_orthogonal_contrast_set.m to assign a set of mutually orthogonal, zero-sum contrasts spanning the regressors of interest (glm_map.wh_interest / design xX.iH), with placeholder names; nuisance and intercept columns get zero weight. If no regressors of interest are defined, it explains how to set them and errors gracefully. diagnostics(): now reports design efficiency for contrasts via calcEfficiency.m -- using entered contrasts, or an auto orthogonal set spanning the regressors of interest when none are entered. When no regressors of interest are defined and no contrasts exist, efficiency is skipped with an informative note (other diagnostics still run). Efficiency fields added to the diagnostics struct / validate_object schema. Import interest/nuisance flagging: events entered via import_onsets become the of-interest regressors (the design's H partition); covariates of no interest become nuisance. import_SPM now derives the interest/nuisance/intercept partition from SPM event semantics (task regressors named '...*bf(...)', session 'constant', everything else nuisance) -- SPM's own xX.iH is typically empty for event-related first-level models -- preserving SPM's original indices under xX.spm_*. glm_map.import_onsets: new pass-through method that imports into the wrapped fmri_glm_design_matrix (FSL/tabular or SPM-style cells), bootstrapping the design from 'TR'/'nscan' if needed, then builds when enough info is available or prints what is missing and how to attach it. Tests: add orthogonal-contrast-set, efficiency, and import/SPM-flagging tests. 22/22 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…ummary, time axis Fix two reported issues and improve summaries: - diagnostics: the condition number is now computed after scaling each column to unit L2 norm (the Belsley scaled condition index). It is scale-invariant like VIF, so a well-conditioned design no longer reports a huge condition number that was purely an artifact of the tiny event regressors vs the unit intercept (e.g. 329 -> 1.74, consistent with VIFs ~1). - plot_design heat map: scale each regressor column to unit norm for display, so small-amplitude event regressors are visible instead of being washed out to black by the large-norm intercept. Event-panel x-axis labelled 'Time (seconds)' (time base is always seconds). - disp() and summary() now report the number of observations and regressors with the of-interest / nuisance / intercept breakdown, and list each regressor's role. (g.X already exposes the design matrix, reading through to design.xX.X; verified by test.) Tests: add a scaled-condition-number regression test (raw cond large, scaled small, VIFs low, g.X == design.xX.X). 23/23 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…ty/method name clash) glm_map had both a .diagnostics property (the results struct, mirroring fmri_data.regress out.diagnostics) and a diagnostics() method that computes it. MATLAB tolerates this (property via dot-access, method via function syntax, as statistic_image does with threshold), but g.diagnostics(args) silently resolves to property indexing and errors, which is a footgun. Rename the verb to run_diagnostics and keep the noun .diagnostics property (the load-bearing name: it mirrors regress's out.diagnostics and is read in 60+ places). Update the internal callers (fit, replace_basis_set), help examples, the workflow doc, CLAUDE.md, and tests. Call as: g = run_diagnostics(g, ...); results are stored in g.diagnostics. 23/23 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…n run_diagnostics, workflow sec 13 plot_design: when diagnostics have been computed, draw a VIF figure in the scn_spm_design_check style -- orange markers with severity reference lines at VIF = 1/2/4/8 -- showing VIFs of the regressors of interest in the full design and (when nuisance indicators exist) without nuisance covariates, plus contrast VIFs (cVIF) when present. Falls back to full-design VIFs only. run_diagnostics: for timeseries designs (is_timeseries) only, compute the cumulative power by frequency for each regressor of interest and each contrast (borrowing cumulative_power / fft_calc from scn_spm_choose_hpfilter), and recommend a high-pass filter cutoff that keeps the maximum low-frequency variance loss below 5% -- separately for regressors and for contrasts. Stored in .diagnostics.hpfilter and printed in the report (with the more conservative of the two as the suggested cutoff). The scaled condition number is unchanged. docs/workflows section 13: add a simple [1 -1] A-vs-B contrast and show plot_design before vs after run_diagnostics (the VIF figure appears after), and display the recommended HP filter. All method calls use run_diagnostics(). Tests: add an HP-filter / cumulative-power test (timeseries vs group). 24/24 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…adows canlab_pad) CanlabCore's pad.m was renamed to canlab_pad.m, but onsets2fmridesign still called pad() in the TR-resolution delta fallback. In R2024b+, 'pad' resolves to MATLAB's builtin string pad, which errors ("First argument must be text") on the numeric onset vector. On CI (no Image Processing Toolbox), padarray is unavailable so the fallback was taken and crashed -- failing canlab_test_glm_map/test_event_design_matches_onsets2fmridesign, which calls onsets2fmridesign as a reference (the whole function runs even though the test uses only the X output). Replace the padarray/pad try-catch with an explicit one-sided zero-pad to the run length (matching canlab_pad), removing the toolbox/version-fragile dependency. X is unaffected (computed before this block). 24/24 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>

…renamed pad.m on R2024b+) CanlabCore's pad.m was renamed to canlab_pad.m; three callers still used pad(), which now resolves to MATLAB's builtin string pad and errors on numeric input in R2024b+ (same root cause as the onsets2fmridesign fix). - smooth_timeseries.m: pad -> canlab_pad (count semantics unchanged; correct). - nonlin_parammod_predfun.m (hrf): pad -> canlab_pad; the 2nd arg is a vector, so canlab_pad pads hrf to its length (unchanged behavior). - nonlin_parammod_predfun.m (yhi): the call passed a *target* length to pad, but canlab_pad takes a *count* of zeros; under the surrounding `if length(yhi) < runDuration` the intent is to pad yhi to runDuration, so use canlab_pad(yhi, runDuration - length(yhi)). Smoke-tested: smooth_timeseries (50->50) and nonlin_parammod_predfun (yhi length == runDuration) both run without the pad crash. Co-Authored-By: Claude Fable 5 <[email protected]>

torwager and others added 18 commits June 17, 2026 19:23

Document glm_map in CLAUDE.md and image_vector see-also

1025764

Add glm_map to the object-architecture class list in CLAUDE.md and to the image_vector :See also: block. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

torwager merged commit b85ff7c into master Jun 19, 2026
1 check passed

torwager deleted the glm_map_object branch June 19, 2026 17:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add glm_map object class: scikit-learn-style mass-univariate GLM estimator#77

Add glm_map object class: scikit-learn-style mass-univariate GLM estimator#77
torwager merged 18 commits into
masterfrom
glm_map_object

torwager commented Jun 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

torwager commented Jun 18, 2026

Summary

Workflow

Commits

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant