Add glm_map object class: scikit-learn-style mass-univariate GLM estimator#77
Merged
Conversation
New scikit-learn-style estimator class that bundles GLM design specification, fitted result maps (statistic_image), and design diagnostics in one container. Composition over fmri_glm_design_matrix (wrapped in .design) and fmri_data.regress (the compute engine). - classdef with stored properties (design/level/is_timeseries, betas/t/contrast maps, vif/leverage/collinearity diagnostics, provenance) and true Dependent accessors (X, TR, onsets, durations, regressor_names, num_*, is_fitted) that read through to .design. - Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/ condition number/rank/redundant-column report), add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map. - Documented stubs with planned field mappings: fit (wraps regress), build_design (wraps fmri_glm_design_matrix.build), import_SPM (SPM12/SPM25 -> .design). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
fit() is now a working orchestration layer: validates the design,
attaches obj.X to the fmri_data, assembles regress() arguments
(robust/AR/threshold/contrasts/names), runs regress, and unpacks the
outputs into the object's statistic_image maps (betas, t,
contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records
fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/
collinearity set. AR models are gated on is_timeseries.
Also resolve multi-image result maps for display: select_map now parses
an optional image index (bare scalar or 'wh_image'), montage selects it
when given, and table auto-selects a single image (required because
statistic_image.table handles one image at a time).
Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28,
betas [35676x2], diagnostics populated, threshold and atlas-labeled
table() both working.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
build_design() delegates to fmri_glm_design_matrix.build, which convolves
onsets/durations with the basis set and assembles design.xX.X. After the
call the Dependent obj.X and obj.regressor_names read through to the built
design, so event-mode fit() works the same as direct mode.
Also harden the Dependent X / regressor_names accessors: the wrapped
design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so
guard with isscalar() before indexing .X / .name to avoid an
"isempty: not enough input arguments" error pre-build.
Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19]
from onsets, regressor names read through, event-mode fit() completes.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
import_SPM() populates a glm_map from an SPM structure, an SPM.mat path, or a directory containing one. Because fmri_glm_design_matrix mirrors SPM's schema, the import is a guarded copy of substructs into the wrapped design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/ covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true. Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas, labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25 auxiliary-field differences. Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor names read through correctly, and fit() against a matching timeseries yields dfe = nscan - nregressors. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Add glm_map to the object-architecture class list in CLAUDE.md and to the image_vector :See also: block. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
12 functiontests covering construction and Dependent property read-through, design diagnostics (incl. rank-deficiency detection), direct-mode (2nd-level) fit over fmri_data.regress with contrasts, threshold delegation, event-mode build_design via the wrapped fmri_glm_design_matrix, import_SPM + event-mode fit on a synthetic SPM struct, and the main input-validation error paths (no design, AR without timeseries, contrast size mismatch). Auto-discovered by canlab_run_all_tests. All 12 pass (~5s). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Restructure glm_map around fmri_data.regress's results structure and make regress return a glm_map object instead of a plain struct. - glm_map properties now mirror the regress `out` fields, with nested .input_parameters / .input_image_metadata / .diagnostics structs, .df and .sigma kept as fmri_data, and out-struct-name aliases (.b, .con_t, .contrast_images, .resid, .variable_names, .C). Constructor re-casts a regress-style struct; disp lists all properties. - fmri_data.regress returns glm_map(out) (struct-compatible via aliases), replaces deprecated getvif with VIF.m, and adds contrast VIFs (cVIF) to out.diagnostics.Contrast_variance_inflation_factors. - diagnostics: consolidate on Variance_inflation_factors / Leverages names and add per-observation Cook's distance (Cooks_distance); fit auto-computes residuals for it and drops them unless requested. - summary: print a narrative report (analysis, model, diagnostics, and per-predictor/contrast significant-voxel counts at threshold). - Add docs/workflows/regression_with_glm_map.m walkthrough; update unit tests and CLAUDE.md. Co-Authored-By: Claude Fable 5 <[email protected]>
…ss examples Use a real predictor (Reappraisal_Success) from the emotionreg metadata_table and demonstrate the full set of fmri_data.regress example options against the returned glm_map object: regressor/analysis naming, displaying result maps (montage/surface/orthviews/get_wh_image), nested .diagnostics with a VIF/ leverage plot, a 2nd-level CSF nuisance covariate, gray-matter masking + FDR, outlier detection and exclusion, robust regression, residual output, brain-predicts-behavior and map-writing (commented), the estimator path with fit/summary, re-thresholding, and a first-level event design. Fix find/replace artifacts (.obj -> .dat) from the prior revision. Verified all computational sections run end-to-end on the sample dataset. Co-Authored-By: Claude Fable 5 <[email protected]>
Add @glm_map/validate_object, which fills any missing fields in the nested option/diagnostic structs (.input_parameters, .input_image_metadata, .diagnostics incl. .collinearity_report, and .fit_parameters) from canonical field templates without overwriting existing values. The templates are the single source of truth for the valid field names produced by fmri_data.regress and glm_map.diagnostics. - glm_map constructor now calls validate_object before returning, so even a freshly constructed (or struct-recast) object always exposes the complete schema for every nested struct. - fmri_data.regress calls validate_object on the glm_map output before returning it. Also picks up a docs/workflows tweak (run diagnostics(g) before reading .diagnostics fields). glm_map unit tests pass (14/14). Co-Authored-By: Claude Fable 5 <[email protected]>
…_map.replace_basis_set
Rebuild the design-matrix convolution on a high-resolution (16 samples/sec,
TR-independent) pipeline and audit/fix the class methods.
get_session_X: build a 16 Hz onset/epoch indicator per condition, convolve
with the per-condition basis (resampled to 16 Hz via xBF.dt), and downsample
to TR with getPredictors('dsrate', res*TR) -- the same engine onsets2fmridesign
uses. Matches onsets2fmridesign to corr >= 0.999 across TR = 0.8/1/1.3/2/2.5
(incl. fractional TRs), handles event durations (epochs), parametric
modulators, multiple sessions, and per-condition basis sets. Default basis is
now the canonical SPM HRF sampled at 16 Hz.
Audit fixes:
- build_single_trial: same hi-res timing fix; index xX(1) (dot-assign to an
empty struct failed); correct off-by-one in the intercept (iB) indices.
- build/build_single_trial check_model: make the " for Condition N" basis-name
suffix idempotent so repeated build() calls don't keep appending.
- add: condition_names is now correctly shared across sessions (or accepts
session-major names), per the documented behavior.
- replace_basis_set: harmonize struct fields between basis builders
(spm_get_bf vs fmri_spline_basis) so assignment no longer errors, and expand
xBF to one-per-condition so replacing one condition keeps the others.
- import_onsets: implement (was a non-functional stub) reading onset/duration/
name columns from a CSV/Excel file or table.
- plot: drop a dead `bf = obj.xBF.bf` line that errored for multi-condition xBF.
glm_map: add replace_basis_set, which delegates to the design's
replace_basis_set, rebuilds X, clears fit results/contrasts/diagnostics that no
longer apply (restoring the empty schema via validate_object), and either
re-fits (if data supplied) or refreshes design diagnostics.
Tests: add event-design timing checks (canonical HRF -> 3 cols; match
onsets2fmridesign across TRs) and a replace_basis_set test. 16/16 glm_map
tests pass.
Co-Authored-By: Claude Fable 5 <[email protected]>
…lotDesign hook; import_onsets FSL/SPM inputs Regressor roles: add wh_interest / wh_nuisance / wh_intercept Dependent indicators to glm_map. Event designs read the partition from design.xX (iH = of interest, iC/iG = nuisance, iB = intercept); direct/group designs detect the intercept (constant columns) and use a new nuisance_columns property to mark covariates of no interest. diagnostics(): now computes VIFs/cVIFs and the condition number for the full design AND for the regressors of interest only (nuisance covariates removed), storing both (full values stay in the main fields). By default it prints a narrative report that explains each metric's range/interpretation, warns on problematic cases (VIF/cVIF > threshold, condition number > 30/100, high Cook's distance, rank deficiency), and flags when nuisance covariates substantially inflate the of-interest VIFs (indicating correlation between events of interest and nuisance variables). plot_design(): hooks into plotDesign -- when an event design has <= 12 event types of interest, draws color-coded regressor line plots with event/duration boxes for the events of interest, alongside a heat map of the full design matrix (incl. nuisance covariates). All design heat maps use 'YDir','reverse' (observation 0 at top) and axis tight. import_onsets(): handles FSL/tabular files (CSV/Excel/table) with onset, duration, and a condition column that may be a string name OR an integer event-type code, and SPM-style cell arrays of onsets, durations, and parametric modulators (one cell per condition / event type). validate_object: diagnostics template extended with the new fields. Tests: add role-indicator, interest-only-diagnostics, and import_onsets variant tests. 19/19 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…nt type Replace the plotDesign call (which re-convolved onsets with its own canonical HRF and ignored the model's basis set) with direct rendering of the object's built design matrix. Each event type of interest gets its own panel showing its actual regressor(s) -- one line per basis function, so multi-basis designs (e.g. spline or HRF+derivatives) display correctly -- with filled boxes marking each event and its duration (box-drawing borrowed from drawbox). The full design-matrix heat map (incl. nuisance covariates, YDir reverse + axis tight) is shown alongside. Per-condition columns are located by basis-function counts at the front of the design (verified A->cols 1:4, B->col 5 for a 4-spline + canonical mix). 19/19 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…nterest/nuisance flagging create_orthogonal_contrast_set: new method on @glm_map and @fmri_glm_design_matrix. Uses create_orthogonal_contrast_set.m to assign a set of mutually orthogonal, zero-sum contrasts spanning the regressors of interest (glm_map.wh_interest / design xX.iH), with placeholder names; nuisance and intercept columns get zero weight. If no regressors of interest are defined, it explains how to set them and errors gracefully. diagnostics(): now reports design efficiency for contrasts via calcEfficiency.m -- using entered contrasts, or an auto orthogonal set spanning the regressors of interest when none are entered. When no regressors of interest are defined and no contrasts exist, efficiency is skipped with an informative note (other diagnostics still run). Efficiency fields added to the diagnostics struct / validate_object schema. Import interest/nuisance flagging: events entered via import_onsets become the of-interest regressors (the design's H partition); covariates of no interest become nuisance. import_SPM now derives the interest/nuisance/intercept partition from SPM event semantics (task regressors named '...*bf(...)', session 'constant', everything else nuisance) -- SPM's own xX.iH is typically empty for event-related first-level models -- preserving SPM's original indices under xX.spm_*. glm_map.import_onsets: new pass-through method that imports into the wrapped fmri_glm_design_matrix (FSL/tabular or SPM-style cells), bootstrapping the design from 'TR'/'nscan' if needed, then builds when enough info is available or prints what is missing and how to attach it. Tests: add orthogonal-contrast-set, efficiency, and import/SPM-flagging tests. 22/22 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…ummary, time axis Fix two reported issues and improve summaries: - diagnostics: the condition number is now computed after scaling each column to unit L2 norm (the Belsley scaled condition index). It is scale-invariant like VIF, so a well-conditioned design no longer reports a huge condition number that was purely an artifact of the tiny event regressors vs the unit intercept (e.g. 329 -> 1.74, consistent with VIFs ~1). - plot_design heat map: scale each regressor column to unit norm for display, so small-amplitude event regressors are visible instead of being washed out to black by the large-norm intercept. Event-panel x-axis labelled 'Time (seconds)' (time base is always seconds). - disp() and summary() now report the number of observations and regressors with the of-interest / nuisance / intercept breakdown, and list each regressor's role. (g.X already exposes the design matrix, reading through to design.xX.X; verified by test.) Tests: add a scaled-condition-number regression test (raw cond large, scaled small, VIFs low, g.X == design.xX.X). 23/23 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…ty/method name clash) glm_map had both a .diagnostics property (the results struct, mirroring fmri_data.regress out.diagnostics) and a diagnostics() method that computes it. MATLAB tolerates this (property via dot-access, method via function syntax, as statistic_image does with threshold), but g.diagnostics(args) silently resolves to property indexing and errors, which is a footgun. Rename the verb to run_diagnostics and keep the noun .diagnostics property (the load-bearing name: it mirrors regress's out.diagnostics and is read in 60+ places). Update the internal callers (fit, replace_basis_set), help examples, the workflow doc, CLAUDE.md, and tests. Call as: g = run_diagnostics(g, ...); results are stored in g.diagnostics. 23/23 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…n run_diagnostics, workflow sec 13 plot_design: when diagnostics have been computed, draw a VIF figure in the scn_spm_design_check style -- orange markers with severity reference lines at VIF = 1/2/4/8 -- showing VIFs of the regressors of interest in the full design and (when nuisance indicators exist) without nuisance covariates, plus contrast VIFs (cVIF) when present. Falls back to full-design VIFs only. run_diagnostics: for timeseries designs (is_timeseries) only, compute the cumulative power by frequency for each regressor of interest and each contrast (borrowing cumulative_power / fft_calc from scn_spm_choose_hpfilter), and recommend a high-pass filter cutoff that keeps the maximum low-frequency variance loss below 5% -- separately for regressors and for contrasts. Stored in .diagnostics.hpfilter and printed in the report (with the more conservative of the two as the suggested cutoff). The scaled condition number is unchanged. docs/workflows section 13: add a simple [1 -1] A-vs-B contrast and show plot_design before vs after run_diagnostics (the VIF figure appears after), and display the recommended HP filter. All method calls use run_diagnostics(). Tests: add an HP-filter / cumulative-power test (timeseries vs group). 24/24 glm_map tests pass. Co-Authored-By: Claude Fable 5 <[email protected]>
…adows canlab_pad)
CanlabCore's pad.m was renamed to canlab_pad.m, but onsets2fmridesign still
called pad() in the TR-resolution delta fallback. In R2024b+, 'pad' resolves to
MATLAB's builtin string pad, which errors ("First argument must be text") on the
numeric onset vector. On CI (no Image Processing Toolbox), padarray is
unavailable so the fallback was taken and crashed -- failing
canlab_test_glm_map/test_event_design_matches_onsets2fmridesign, which calls
onsets2fmridesign as a reference (the whole function runs even though the test
uses only the X output).
Replace the padarray/pad try-catch with an explicit one-sided zero-pad to the
run length (matching canlab_pad), removing the toolbox/version-fragile
dependency. X is unaffected (computed before this block). 24/24 glm_map tests
pass.
Co-Authored-By: Claude Fable 5 <[email protected]>
…renamed pad.m on R2024b+) CanlabCore's pad.m was renamed to canlab_pad.m; three callers still used pad(), which now resolves to MATLAB's builtin string pad and errors on numeric input in R2024b+ (same root cause as the onsets2fmridesign fix). - smooth_timeseries.m: pad -> canlab_pad (count semantics unchanged; correct). - nonlin_parammod_predfun.m (hrf): pad -> canlab_pad; the 2nd arg is a vector, so canlab_pad pads hrf to its length (unchanged behavior). - nonlin_parammod_predfun.m (yhi): the call passed a *target* length to pad, but canlab_pad takes a *count* of zeros; under the surrounding `if length(yhi) < runDuration` the intent is to pad yhi to runDuration, so use canlab_pad(yhi, runDuration - length(yhi)). Smoke-tested: smooth_timeseries (50->50) and nonlin_parammod_predfun (yhi length == runDuration) both run without the pad crash. Co-Authored-By: Claude Fable 5 <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a new object class
glm_map— a scikit-learn-style estimator for mass-univariate GLM / multiple regression on neuroimaging data. One object bundles:fmri_glm_design_matrixin.design(1st-level / event mode) or holds a direct.Xmatrix (2nd-level / group mode), plus contrasts.statistic_imageobjects, plussigma,dfe, residuals.glm_mapis a standalone container (composition, not animage_vectorsubclass);fmri_data.regressremains the compute engine. Several attributes (X,TR,onsets,durations,regressor_names,num_*,is_fitted) are true MATLAB Dependent properties that read through to the wrapped design, so there is a single source of truth.Workflow
Commits
disp; implementeddiagnostics,add_contrasts,threshold,montage,table,plot_design,summary,check_properties, privateselect_map.fit: orchestration overfmri_data.regress(robust/AR/threshold/contrasts/names), unpacks outputs into the result maps, runs full diagnostics; AR gated onis_timeseries. Plus multi-image display resolution.build_design: event/1st-level design viafmri_glm_design_matrix.build; hardened Dependent accessors against the pre-build 0×0xXstruct.import_SPM: SPM12/SPM25 first-level import (struct /SPM.matpath / directory), optional beta loading.glm_mapadded toCLAUDE.mdclass list and@image_vectorsee-also.Testing
All methods were exercised live in MATLAB (static analysis clean):
load_image_set('emotionreg')→dfe=28, betas[35676×2], diagnostics populated, atlas-labeledtableoutput.glm_map(fmri_glm_design_matrix)→build_design→ X[200×19]from onsets →fit.fit→dfe = nscan − nregressors.Notes:
fit,build_design, andimport_SPMare fully implemented. Known rough edges are inherited from downstream code (notglm_map):statistic_image.thresholdwarns on multi-image cluster-extent thresholding, andregion/tableerrors when zero regions survive a threshold.🤖 Generated with Claude Code