[ENH] add _normalize_dist_str utility for consistent distribution string aliases #1045

Open
paramsureliya wants to merge 4 commits into sktime:main from paramsureliya:ENH/normalize-dist-str

@paramsureliya

Part of #1023

Reference Issues/PRs

Part of #1023: a first step toward uniformizing distribution string handling across all probabilistic regressors.

What does this implement/fix? Explain your changes.

Adds a shared _normalize_dist_str utility (skpro/regression/_dist_utils.py) that maps case-insensitive distribution string aliases to the canonical capitalized class name used internally in skpro, e.g. "gaussian" -> "Normal", "lognormal" -> "LogNormal", "t" -> "TDistribution".

Integrates this into NGBoostAdapter (inherited by both NGBoostRegressor and NGBoostSurvival) as a proof of concept. Normalization happens at point-of-use inside the adapter methods, not in __init__, so get_params()/clone() compatibility is preserved.
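The point-of-use pattern can be sketched as follows. `AdapterSketch`, `_normalize`, and `_dist_to_use` are hypothetical stand-ins, not the actual NGBoostAdapter code; the sketch only shows why storing the raw argument in `__init__` and normalizing into a local variable keeps the constructor parameters stable.

```python
# Hypothetical stand-ins; not the actual skpro adapter code.
_ALIASES = {"gaussian": "Normal", "lognormal": "LogNormal", "t": "TDistribution"}


def _normalize(dist):
    """Tiny stand-in for the shared utility: alias -> canonical name."""
    if isinstance(dist, str):
        return _ALIASES.get(dist.lower(), dist)
    return dist


class AdapterSketch:
    def __init__(self, dist="Normal"):
        # Store the argument exactly as passed; no normalization here.
        self.dist = dist

    def get_params(self):
        return {"dist": self.dist}

    def _dist_to_use(self):
        # Normalize into a local variable at point-of-use; self.dist is untouched.
        dist = _normalize(self.dist)
        return dist


est = AdapterSketch(dist="gaussian")
canonical = est._dist_to_use()                # canonical name for internal use
rebuilt = AdapterSketch(**est.get_params())   # reconstruct from stored params
```

Because `__init__` stores the argument unmodified, rebuilding the estimator from `get_params()` yields the same parameters the user passed, which is the property sklearn-style cloning relies on.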

The remaining regressors from #1023 (XGBoostLSS, ResidualDouble, CyclicBoosting, GAMRegressor, GLMRegressor, GlumRegressor) will be addressed in follow-up PRs.

Does your contribution introduce a new dependency? If yes, which one?

No new dependencies.

What should a reviewer concentrate their feedback on?

  • The alias map in _dist_utils.py: are any aliases missing or incorrectly mapped?
  • The integration pattern in NGBoostAdapter: normalization into a local dist variable at point-of-use, leaving self.dist unchanged for sklearn compatibility.

Did you add any tests for the change?

Yes: skpro/regression/tests/test_dist_utils.py, with 57 tests across 3 classes:

  • TestNormalizeDistStr: unit tests for every alias, canonical passthrough, non-string passthrough, unknown string warning, and idempotency
  • TestCrossRegressorAliasConsistency: invariant tests ensuring the same alias resolves identically regardless of which regressor calls the function
  • TestNGBoostRegressorAliases: end-to-end integration tests for NGBoostRegressor (auto-skipped if ngboost not installed)
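The kinds of unit checks listed for TestNormalizeDistStr might look roughly like the sketch below. It uses plain assertions and a hypothetical stand-in `normalize` function rather than the real utility or the PR's test classes, so names and the exact warning behavior are assumptions.

```python
import warnings

# Hypothetical stand-in for the utility under test; the real one lives in
# skpro/regression/_dist_utils.py and may behave differently.
_ALIASES = {"gaussian": "Normal", "lognormal": "LogNormal", "t": "TDistribution"}
_CANONICAL = {name.lower(): name for name in _ALIASES.values()}


def normalize(dist):
    if not isinstance(dist, str):
        return dist
    key = dist.lower()
    if key in _ALIASES:
        return _ALIASES[key]
    if key in _CANONICAL:
        return _CANONICAL[key]
    warnings.warn(f"Unknown distribution string {dist!r}")
    return dist


def check_aliases():
    for alias, canonical in _ALIASES.items():
        assert normalize(alias) == canonical
        assert normalize(alias.upper()) == canonical  # case-insensitive


def check_passthrough():
    assert normalize("Normal") == "Normal"  # canonical passthrough
    sentinel = object()
    assert normalize(sentinel) is sentinel  # non-string passthrough


def check_idempotent():
    for alias in _ALIASES:
        once = normalize(alias)
        assert normalize(once) == once  # normalizing twice changes nothing


def check_unknown_warns():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = normalize("not-a-distribution")
    assert result == "not-a-distribution"
    assert len(caught) == 1


for check in (check_aliases, check_passthrough, check_idempotent, check_unknown_warns):
    check()
```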

Any other comments?

None.

- Add skpro/regression/_dist_utils.py with _normalize_dist_str that maps
  case-insensitive aliases to canonical distribution class names
  (e.g. 'gaussian' -> 'Normal', 'lognormal' -> 'LogNormal')
- Integrate into NGBoostAdapter so NGBoostRegressor and NGBoostSurvival
  both accept all aliases transparently
- Update docstrings in NGBoostRegressor and NGBoostSurvival
- Add test_dist_utils.py with 57 unit and integration tests

Part of sktime#1023

@fkiraly fkiraly left a comment

Collaborator

This is great!

If it is already done though, we should apply it to all places where distribution (string) aliasing happens, consistently.

Further, I would only advertise the "canonical" string in the docstrings, in order to uniformize future usage. As "canonical" strings, I would use the class names as in skpro. Alternative aliases for downwards compatibility can still function but be unadvertised.

@paramsureliya

Author

Hi @fkiraly, I've applied _normalize_dist_str to all remaining regressors. Can you please review it?

@paramsureliya paramsureliya requested a review from fkiraly May 6, 2026 14:57

@fkiraly fkiraly left a comment

Collaborator

Thanks!

Question: why are you reformatting or changing docstrings that looked like they were good? E.g., why are you changing NGBoostRegressor, and why are you introducing newlines in unrelated places? Looks like AI use.

@paramsureliya

Author

Hi @fkiraly, I fixed the isort failure.
Regarding the NGBoostRegressor docstring change: I updated it based on your earlier feedback to "only advertise the canonical string in the docstrings." Happy to revert it if you prefer to keep the original format there, just let me know.

@paramsureliya paramsureliya requested a review from fkiraly May 15, 2026 09:29

2 participants