
[ENH] add online update to BaggingRegressor #1064

Open
patelchaitany wants to merge 1 commit into sktime:main from patelchaitany:enh/bagging-regressor-update

@patelchaitany

Member

Reference Issues/PRs

Partially addresses #1049.

What does this implement/fix? Explain your changes.

Adds online update support to BaggingRegressor.

On update, each fitted bagged clone is updated on a row subsample of the incoming batch, drawn with the same n_samples and bootstrap settings as in fit. The feature subsets cols_[i] chosen in fit are reused; column sampling is not repeated.

The meta-estimator sets capability:update=True so that the public update path runs. Meaningful incremental learning still depends on the inner regressor: for a batch-only inner regressor, update is effectively a no-op.

When bootstrap=False, the subsample size is capped at the update batch size, so small update batches do not raise an error.
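To make the row subsampling concrete, here is a minimal sketch of how one clone's update subsample might be drawn. It is not the code in this PR: the helper name draw_update_rows is hypothetical, and it assumes that a float n_samples is a fraction of the batch while an int is an absolute count. It uses only the standard library.

```python
import random


def draw_update_rows(n_rows, n_samples, bootstrap, rng):
    """Return row positions for one clone's update subsample.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Assumption: float n_samples is a fraction of the batch,
    # int n_samples is an absolute number of rows.
    if isinstance(n_samples, float):
        size = max(1, round(n_samples * n_rows))
    else:
        size = int(n_samples)

    if bootstrap:
        # Sampling with replacement: any size is valid,
        # even one larger than the batch.
        return [rng.randrange(n_rows) for _ in range(size)]

    # Sampling without replacement: cap the size at the batch size
    # so that a small update batch does not raise an error.
    size = min(size, n_rows)
    return rng.sample(range(n_rows), size)


rng = random.Random(0)
small_batch = draw_update_rows(n_rows=5, n_samples=20, bootstrap=False, rng=rng)
print(sorted(small_batch))  # all 5 rows, each exactly once
```

In this sketch, the selected positions would then be used to slice the update batch before passing the rows, restricted to the columns cols_[i], to the i-th clone.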

Docstring updated to describe update behaviour.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  • Whether _update row subsampling matches fit semantics (including bootstrap=False on small update batches).

Did you add any tests for the change?

No dedicated tests were added. The change is covered by the existing test_online_update in test_all_regressors.py, which runs for BaggingRegressor via get_test_params().

Any other comments?

Follow-up for #1049 may include EnbpiRegressor and a River wrapper for online inner estimators.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

@fkiraly fkiraly left a comment

Collaborator

Thanks!

For the algorithm to make sense, in the case where n_samples is an integer, a fraction n_samples / n should be computed in _fit, and applied to the sample in _update.

That is, the row selection probability should remain the same for _fit and _update.

With this insight, it should also be possible to merge the two "resolve size" methods into a single one.
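One possible reading of this suggestion, as a rough sketch: resolve n_samples to a fraction once at fit time, then use a single size resolver for both fit and update. The names resolve_fraction and resolve_size are hypothetical, and the rounding and the floor of one row are assumptions rather than anything stated in the review.

```python
def resolve_fraction(n_samples, n_fit_rows):
    """Row selection fraction, fixed at fit time (hypothetical helper)."""
    # Assumption: a float n_samples already is a fraction;
    # an int is converted using the number of rows seen in fit.
    if isinstance(n_samples, float):
        return n_samples
    return n_samples / n_fit_rows


def resolve_size(fraction, n_rows, bootstrap):
    """Subsample size for a data set of n_rows, used in fit and update alike."""
    # Assumption: round to the nearest row count, but draw at least one row.
    size = max(1, round(fraction * n_rows))
    if not bootstrap:
        # Without replacement, the subsample cannot exceed the data.
        size = min(size, n_rows)
    return size


# n_samples=50 on 200 fit rows gives a fraction of 0.25 ...
fraction = resolve_fraction(50, 200)
# ... which is then applied unchanged to a 40-row update batch.
print(resolve_size(fraction, 200, bootstrap=False))  # 50
print(resolve_size(fraction, 40, bootstrap=False))   # 10
```

With the fraction stored at fit time, the per-row selection probability is the same in fit and in update, regardless of the update batch size.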

Implement _update so each bagged clone is updated on a row subsample of
the incoming batch (same n_samples and bootstrap as fit) with cols_[i]
fixed from fit. Set capability:update on the meta-estimator.

Partially addresses sktime#1049.
@patelchaitany patelchaitany force-pushed the enh/bagging-regressor-update branch from 1520b1e to 4e1b46f Compare June 8, 2026 07:16
@patelchaitany patelchaitany requested a review from fkiraly June 8, 2026 07:16

Labels

enhancement module:regression probabilistic regression module


3 participants