Skip to content

Add glm_map object class: scikit-learn-style mass-univariate GLM estimator#77

Merged
torwager merged 18 commits into
masterfrom
glm_map_object
Jun 19, 2026
Merged

Add glm_map object class: scikit-learn-style mass-univariate GLM estimator#77
torwager merged 18 commits into
masterfrom
glm_map_object

Conversation

@torwager

Copy link
Copy Markdown
Contributor

Summary

Adds a new object class glm_map — a scikit-learn-style estimator for mass-univariate GLM / multiple regression on neuroimaging data. One object bundles:

  1. Design specification — wraps an fmri_glm_design_matrix in .design (1st-level / event mode) or holds a direct .X matrix (2nd-level / group mode), plus contrasts.
  2. Fitted result maps — betas, t, contrast estimates, and contrast t as statistic_image objects, plus sigma, dfe, residuals.
  3. Design diagnostics — VIF, contrast VIF (cVIF), leverage, condition number, rank, and a redundant/near-collinear column report.

glm_map is a standalone container (composition, not an image_vector subclass); fmri_data.regress remains the compute engine. Several attributes (X, TR, onsets, durations, regressor_names, num_*, is_fitted) are true MATLAB Dependent properties that read through to the wrapped design, so there is a single source of truth.

Workflow

% 2nd-level / group (direct design)
g = glm_map('X', X, 'level', 2, 'regressor_names', {'intercept' 'cov'});
g = add_contrasts(g, [0 1], {'cov_effect'});
g = fit(g, fmri_data_obj);          % wraps fmri_data.regress
diagnostics(g); table(g, 'contrast'); montage(g, 'contrast_t');

% 1st-level / event design
g = glm_map(fmri_glm_design_matrix(TR, 'onsets', ons, 'condition_names', names));
g = build_design(g);                % onsets -> X via basis convolution
g = fit(g, timeseries_obj);

% Import an SPM12/SPM25 first-level model
g = import_SPM(glm_map, '/path/SPM.mat');

Commits

  • Scaffold: classdef, stored + Dependent properties, constructor, disp; implemented diagnostics, add_contrasts, threshold, montage, table, plot_design, summary, check_properties, private select_map.
  • fit: orchestration over fmri_data.regress (robust/AR/threshold/contrasts/names), unpacks outputs into the result maps, runs full diagnostics; AR gated on is_timeseries. Plus multi-image display resolution.
  • build_design: event/1st-level design via fmri_glm_design_matrix.build; hardened Dependent accessors against the pre-build 0×0 xX struct.
  • import_SPM: SPM12/SPM25 first-level import (struct / SPM.mat path / directory), optional beta loading.
  • Docs: glm_map added to CLAUDE.md class list and @image_vector see-also.

Testing

All methods were exercised live in MATLAB (static analysis clean):

  • Direct/group fit on load_image_set('emotionreg')dfe=28, betas [35676×2], diagnostics populated, atlas-labeled table output.
  • Event-mode: glm_map(fmri_glm_design_matrix)build_design → X [200×19] from onsets → fit.
  • SPM import: synthetic SPM struct → onsets/condition-names/regressor-names read through → fitdfe = nscan − nregressors.

Notes: fit, build_design, and import_SPM are fully implemented. Known rough edges are inherited from downstream code (not glm_map): statistic_image.threshold warns on multi-image cluster-extent thresholding, and region/table errors when zero regions survive a threshold.

🤖 Generated with Claude Code

torwager and others added 18 commits June 17, 2026 19:23
New scikit-learn-style estimator class that bundles GLM design
specification, fitted result maps (statistic_image), and design
diagnostics in one container. Composition over fmri_glm_design_matrix
(wrapped in .design) and fmri_data.regress (the compute engine).

- classdef with stored properties (design/level/is_timeseries,
  betas/t/contrast maps, vif/leverage/collinearity diagnostics,
  provenance) and true Dependent accessors (X, TR, onsets, durations,
  regressor_names, num_*, is_fitted) that read through to .design.
- Implemented + MATLAB-verified: diagnostics (VIF/cVIF/leverage/
  condition number/rank/redundant-column report), add_contrasts,
  threshold, montage, table, plot_design, summary, check_properties,
  private select_map.
- Documented stubs with planned field mappings: fit (wraps regress),
  build_design (wraps fmri_glm_design_matrix.build), import_SPM
  (SPM12/SPM25 -> .design).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
fit() is now a working orchestration layer: validates the design,
attaches obj.X to the fmri_data, assembles regress() arguments
(robust/AR/threshold/contrasts/names), runs regress, and unpacks the
outputs into the object's statistic_image maps (betas, t,
contrast_estimates, contrast_t) plus sigma, dfe, residuals. Records
fit_parameters and runs diagnostics() for the full VIF/cVIF/leverage/
collinearity set. AR models are gated on is_timeseries.

Also resolve multi-image result maps for display: select_map now parses
an optional image index (bare scalar or 'wh_image'), montage selects it
when given, and table auto-selects a single image (required because
statistic_image.table handles one image at a time).

Verified end-to-end on load_image_set('emotionreg'): fit -> dfe=28,
betas [35676x2], diagnostics populated, threshold and atlas-labeled
table() both working.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
build_design() delegates to fmri_glm_design_matrix.build, which convolves
onsets/durations with the basis set and assembles design.xX.X. After the
call the Dependent obj.X and obj.regressor_names read through to the built
design, so event-mode fit() works the same as direct mode.

Also harden the Dependent X / regressor_names accessors: the wrapped
design seeds xX as a 0x0 struct array (constructor uses 'name',{}), so
guard with isscalar() before indexing .X / .name to avoid an
"isempty: not enough input arguments" error pre-build.

Verified: glm_map(fmri_glm_design_matrix) -> build_design -> X [200x19]
from onsets, regressor names read through, event-mode fit() completes.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
import_SPM() populates a glm_map from an SPM structure, an SPM.mat path,
or a directory containing one. Because fmri_glm_design_matrix mirrors
SPM's schema, the import is a guarded copy of substructs into the wrapped
design: xY.RT -> TR, nscan, xBF, Sess (onsets/durations/names/pmods/
covariates), and xX (matrix + names). Sets level=1 and is_timeseries=true.
Optional 'load_betas' reads beta_*.nii/.img from SPM.swd into obj.betas,
labeled by xX.name. Whole-substruct copy is resilient to SPM12 vs SPM25
auxiliary-field differences.

Verified on a synthetic SPM struct: TR/onsets/condition_names/regressor
names read through correctly, and fit() against a matching timeseries
yields dfe = nscan - nregressors.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Add glm_map to the object-architecture class list in CLAUDE.md and to the
image_vector :See also: block.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
12 functiontests covering construction and Dependent property
read-through, design diagnostics (incl. rank-deficiency detection),
direct-mode (2nd-level) fit over fmri_data.regress with contrasts,
threshold delegation, event-mode build_design via the wrapped
fmri_glm_design_matrix, import_SPM + event-mode fit on a synthetic SPM
struct, and the main input-validation error paths (no design, AR without
timeseries, contrast size mismatch).

Auto-discovered by canlab_run_all_tests. All 12 pass (~5s).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Restructure glm_map around fmri_data.regress's results structure and make
regress return a glm_map object instead of a plain struct.

- glm_map properties now mirror the regress `out` fields, with nested
  .input_parameters / .input_image_metadata / .diagnostics structs, .df and
  .sigma kept as fmri_data, and out-struct-name aliases (.b, .con_t,
  .contrast_images, .resid, .variable_names, .C). Constructor re-casts a
  regress-style struct; disp lists all properties.
- fmri_data.regress returns glm_map(out) (struct-compatible via aliases),
  replaces deprecated getvif with VIF.m, and adds contrast VIFs (cVIF) to
  out.diagnostics.Contrast_variance_inflation_factors.
- diagnostics: consolidate on Variance_inflation_factors / Leverages names
  and add per-observation Cook's distance (Cooks_distance); fit auto-computes
  residuals for it and drops them unless requested.
- summary: print a narrative report (analysis, model, diagnostics, and
  per-predictor/contrast significant-voxel counts at threshold).
- Add docs/workflows/regression_with_glm_map.m walkthrough; update unit
  tests and CLAUDE.md.

Co-Authored-By: Claude Fable 5 <[email protected]>
…ss examples

Use a real predictor (Reappraisal_Success) from the emotionreg metadata_table
and demonstrate the full set of fmri_data.regress example options against the
returned glm_map object: regressor/analysis naming, displaying result maps
(montage/surface/orthviews/get_wh_image), nested .diagnostics with a VIF/
leverage plot, a 2nd-level CSF nuisance covariate, gray-matter masking + FDR,
outlier detection and exclusion, robust regression, residual output,
brain-predicts-behavior and map-writing (commented), the estimator path with
fit/summary, re-thresholding, and a first-level event design. Fix find/replace
artifacts (.obj -> .dat) from the prior revision. Verified all computational
sections run end-to-end on the sample dataset.

Co-Authored-By: Claude Fable 5 <[email protected]>
Add @glm_map/validate_object, which fills any missing fields in the nested
option/diagnostic structs (.input_parameters, .input_image_metadata,
.diagnostics incl. .collinearity_report, and .fit_parameters) from canonical
field templates without overwriting existing values. The templates are the
single source of truth for the valid field names produced by
fmri_data.regress and glm_map.diagnostics.

- glm_map constructor now calls validate_object before returning, so even a
  freshly constructed (or struct-recast) object always exposes the complete
  schema for every nested struct.
- fmri_data.regress calls validate_object on the glm_map output before
  returning it.

Also picks up a docs/workflows tweak (run diagnostics(g) before reading
.diagnostics fields). glm_map unit tests pass (14/14).

Co-Authored-By: Claude Fable 5 <[email protected]>
…_map.replace_basis_set

Rebuild the design-matrix convolution on a high-resolution (16 samples/sec,
TR-independent) pipeline and audit/fix the class methods.

get_session_X: build a 16 Hz onset/epoch indicator per condition, convolve
with the per-condition basis (resampled to 16 Hz via xBF.dt), and downsample
to TR with getPredictors('dsrate', res*TR) -- the same engine onsets2fmridesign
uses. Matches onsets2fmridesign to corr >= 0.999 across TR = 0.8/1/1.3/2/2.5
(incl. fractional TRs), handles event durations (epochs), parametric
modulators, multiple sessions, and per-condition basis sets. Default basis is
now the canonical SPM HRF sampled at 16 Hz.

Audit fixes:
- build_single_trial: same hi-res timing fix; index xX(1) (dot-assign to an
  empty struct failed); correct off-by-one in the intercept (iB) indices.
- build/build_single_trial check_model: make the " for Condition N" basis-name
  suffix idempotent so repeated build() calls don't keep appending.
- add: condition_names is now correctly shared across sessions (or accepts
  session-major names), per the documented behavior.
- replace_basis_set: harmonize struct fields between basis builders
  (spm_get_bf vs fmri_spline_basis) so assignment no longer errors, and expand
  xBF to one-per-condition so replacing one condition keeps the others.
- import_onsets: implement (was a non-functional stub) reading onset/duration/
  name columns from a CSV/Excel file or table.
- plot: drop a dead `bf = obj.xBF.bf` line that errored for multi-condition xBF.

glm_map: add replace_basis_set, which delegates to the design's
replace_basis_set, rebuilds X, clears fit results/contrasts/diagnostics that no
longer apply (restoring the empty schema via validate_object), and either
re-fits (if data supplied) or refreshes design diagnostics.

Tests: add event-design timing checks (canonical HRF -> 3 cols; match
onsets2fmridesign across TRs) and a replace_basis_set test. 16/16 glm_map
tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…lotDesign hook; import_onsets FSL/SPM inputs

Regressor roles: add wh_interest / wh_nuisance / wh_intercept Dependent
indicators to glm_map. Event designs read the partition from design.xX
(iH = of interest, iC/iG = nuisance, iB = intercept); direct/group designs
detect the intercept (constant columns) and use a new nuisance_columns
property to mark covariates of no interest.

diagnostics(): now computes VIFs/cVIFs and the condition number for the full
design AND for the regressors of interest only (nuisance covariates removed),
storing both (full values stay in the main fields). By default it prints a
narrative report that explains each metric's range/interpretation, warns on
problematic cases (VIF/cVIF > threshold, condition number > 30/100, high
Cook's distance, rank deficiency), and flags when nuisance covariates
substantially inflate the of-interest VIFs (indicating correlation between
events of interest and nuisance variables).

plot_design(): hooks into plotDesign -- when an event design has <= 12 event
types of interest, draws color-coded regressor line plots with event/duration
boxes for the events of interest, alongside a heat map of the full design
matrix (incl. nuisance covariates). All design heat maps use
'YDir','reverse' (observation 0 at top) and axis tight.

import_onsets(): handles FSL/tabular files (CSV/Excel/table) with onset,
duration, and a condition column that may be a string name OR an integer
event-type code, and SPM-style cell arrays of onsets, durations, and
parametric modulators (one cell per condition / event type).

validate_object: diagnostics template extended with the new fields.
Tests: add role-indicator, interest-only-diagnostics, and import_onsets
variant tests. 19/19 glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…nt type

Replace the plotDesign call (which re-convolved onsets with its own canonical
HRF and ignored the model's basis set) with direct rendering of the object's
built design matrix. Each event type of interest gets its own panel showing
its actual regressor(s) -- one line per basis function, so multi-basis designs
(e.g. spline or HRF+derivatives) display correctly -- with filled boxes marking
each event and its duration (box-drawing borrowed from drawbox). The full
design-matrix heat map (incl. nuisance covariates, YDir reverse + axis tight)
is shown alongside. Per-condition columns are located by basis-function counts
at the front of the design (verified A->cols 1:4, B->col 5 for a 4-spline +
canonical mix). 19/19 glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…nterest/nuisance flagging

create_orthogonal_contrast_set: new method on @glm_map and
@fmri_glm_design_matrix. Uses create_orthogonal_contrast_set.m to assign a set
of mutually orthogonal, zero-sum contrasts spanning the regressors of interest
(glm_map.wh_interest / design xX.iH), with placeholder names; nuisance and
intercept columns get zero weight. If no regressors of interest are defined,
it explains how to set them and errors gracefully.

diagnostics(): now reports design efficiency for contrasts via
calcEfficiency.m -- using entered contrasts, or an auto orthogonal set
spanning the regressors of interest when none are entered. When no regressors
of interest are defined and no contrasts exist, efficiency is skipped with an
informative note (other diagnostics still run). Efficiency fields added to the
diagnostics struct / validate_object schema.

Import interest/nuisance flagging: events entered via import_onsets become the
of-interest regressors (the design's H partition); covariates of no interest
become nuisance. import_SPM now derives the interest/nuisance/intercept
partition from SPM event semantics (task regressors named '...*bf(...)',
session 'constant', everything else nuisance) -- SPM's own xX.iH is typically
empty for event-related first-level models -- preserving SPM's original
indices under xX.spm_*.

glm_map.import_onsets: new pass-through method that imports into the wrapped
fmri_glm_design_matrix (FSL/tabular or SPM-style cells), bootstrapping the
design from 'TR'/'nscan' if needed, then builds when enough info is available
or prints what is missing and how to attach it.

Tests: add orthogonal-contrast-set, efficiency, and import/SPM-flagging tests.
22/22 glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…ummary, time axis

Fix two reported issues and improve summaries:

- diagnostics: the condition number is now computed after scaling each column
  to unit L2 norm (the Belsley scaled condition index). It is scale-invariant
  like VIF, so a well-conditioned design no longer reports a huge condition
  number that was purely an artifact of the tiny event regressors vs the unit
  intercept (e.g. 329 -> 1.74, consistent with VIFs ~1).

- plot_design heat map: scale each regressor column to unit norm for display,
  so small-amplitude event regressors are visible instead of being washed out
  to black by the large-norm intercept. Event-panel x-axis labelled
  'Time (seconds)' (time base is always seconds).

- disp() and summary() now report the number of observations and regressors
  with the of-interest / nuisance / intercept breakdown, and list each
  regressor's role. (g.X already exposes the design matrix, reading through to
  design.xX.X; verified by test.)

Tests: add a scaled-condition-number regression test (raw cond large, scaled
small, VIFs low, g.X == design.xX.X). 23/23 glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…ty/method name clash)

glm_map had both a .diagnostics property (the results struct, mirroring
fmri_data.regress out.diagnostics) and a diagnostics() method that computes it.
MATLAB tolerates this (property via dot-access, method via function syntax,
as statistic_image does with threshold), but g.diagnostics(args) silently
resolves to property indexing and errors, which is a footgun.

Rename the verb to run_diagnostics and keep the noun .diagnostics property
(the load-bearing name: it mirrors regress's out.diagnostics and is read in 60+
places). Update the internal callers (fit, replace_basis_set), help examples,
the workflow doc, CLAUDE.md, and tests. Call as: g = run_diagnostics(g, ...);
results are stored in g.diagnostics. 23/23 glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…n run_diagnostics, workflow sec 13

plot_design: when diagnostics have been computed, draw a VIF figure in the
scn_spm_design_check style -- orange markers with severity reference lines at
VIF = 1/2/4/8 -- showing VIFs of the regressors of interest in the full design
and (when nuisance indicators exist) without nuisance covariates, plus contrast
VIFs (cVIF) when present. Falls back to full-design VIFs only.

run_diagnostics: for timeseries designs (is_timeseries) only, compute the
cumulative power by frequency for each regressor of interest and each contrast
(borrowing cumulative_power / fft_calc from scn_spm_choose_hpfilter), and
recommend a high-pass filter cutoff that keeps the maximum low-frequency
variance loss below 5% -- separately for regressors and for contrasts. Stored
in .diagnostics.hpfilter and printed in the report (with the more conservative
of the two as the suggested cutoff). The scaled condition number is unchanged.

docs/workflows section 13: add a simple [1 -1] A-vs-B contrast and show
plot_design before vs after run_diagnostics (the VIF figure appears after),
and display the recommended HP filter. All method calls use run_diagnostics().

Tests: add an HP-filter / cumulative-power test (timeseries vs group). 24/24
glm_map tests pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…adows canlab_pad)

CanlabCore's pad.m was renamed to canlab_pad.m, but onsets2fmridesign still
called pad() in the TR-resolution delta fallback. In R2024b+, 'pad' resolves to
MATLAB's builtin string pad, which errors ("First argument must be text") on the
numeric onset vector. On CI (no Image Processing Toolbox), padarray is
unavailable so the fallback was taken and crashed -- failing
canlab_test_glm_map/test_event_design_matches_onsets2fmridesign, which calls
onsets2fmridesign as a reference (the whole function runs even though the test
uses only the X output).

Replace the padarray/pad try-catch with an explicit one-sided zero-pad to the
run length (matching canlab_pad), removing the toolbox/version-fragile
dependency. X is unaffected (computed before this block). 24/24 glm_map tests
pass.

Co-Authored-By: Claude Fable 5 <[email protected]>
…renamed pad.m on R2024b+)

CanlabCore's pad.m was renamed to canlab_pad.m; three callers still used pad(),
which now resolves to MATLAB's builtin string pad and errors on numeric input
in R2024b+ (same root cause as the onsets2fmridesign fix).

- smooth_timeseries.m: pad -> canlab_pad (count semantics unchanged; correct).
- nonlin_parammod_predfun.m (hrf): pad -> canlab_pad; the 2nd arg is a vector,
  so canlab_pad pads hrf to its length (unchanged behavior).
- nonlin_parammod_predfun.m (yhi): the call passed a *target* length to pad,
  but canlab_pad takes a *count* of zeros; under the surrounding
  `if length(yhi) < runDuration` the intent is to pad yhi to runDuration, so
  use canlab_pad(yhi, runDuration - length(yhi)).

Smoke-tested: smooth_timeseries (50->50) and nonlin_parammod_predfun
(yhi length == runDuration) both run without the pad crash.

Co-Authored-By: Claude Fable 5 <[email protected]>
@torwager torwager merged commit b85ff7c into master Jun 19, 2026
1 check passed
@torwager torwager deleted the glm_map_object branch June 19, 2026 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant