[ENH] add _normalize_dist_str utility for consistent distribution string aliases#1045
[ENH] add _normalize_dist_str utility for consistent distribution string aliases#1045paramsureliya wants to merge 4 commits into
Conversation
- Add skpro/regression/_dist_utils.py with _normalize_dist_str that maps case-insensitive aliases to canonical distribution class names (e.g. 'gaussian' -> 'Normal', 'lognormal' -> 'LogNormal') - Integrate into NGBoostAdapter so NGBoostRegressor and NGBoostSurvival both accept all aliases transparently - Update docstrings in NGBoostRegressor and NGBoostSurvival - Add test_dist_utils.py with 57 unit and integration tests Part of sktime#1023
fkiraly
left a comment
There was a problem hiding this comment.
This is great!
If it is already done though, we should apply it to all places where distribution (string) aliasing happens, consistently.
Further, I would only advertise the "canonical" string in the docstrings, in order to uniformize future usage. As "canonical" strings, I would use the class names as in skpro. Alternative aliases for downwards compatibility can still function but be unadvertised.
|
Hi @fkiraly I've applied _normalize_dist_str to all remaining regressors. can you please review it. |
fkiraly
left a comment
There was a problem hiding this comment.
Thanks!
Question, why are you reformatting or changing docstrings that looked like they aere good? E.g., why are you changing NGBoostRegressor, and why are you introducing newlines in places unrelated? Looks like AI use.
|
Hi @fkiraly fixed the isort failure. |
Part of #1023
Reference Issues/PRs
Part of #1023 first step toward uniformizing distribution string handling across all probabilistic regressors.
What does this implement/fix? Explain your changes.
Adds a shared
_normalize_dist_strutility (skpro/regression/_dist_utils.py) that maps case-insensitive distribution string aliases to the canonical capitalized class name used internally in skpro, e.g."gaussian"→"Normal","lognormal"→"LogNormal","t"→"TDistribution".Integrates this into
NGBoostAdapter(inherited by bothNGBoostRegressorandNGBoostSurvival) as a proof of concept. Normalization happens at point-of-use inside the adapter methods, not in__init__, soget_params()/clone()compatibility is preserved.The remaining regressors from #1023 (XGBoostLSS, ResidualDouble, CyclicBoosting, GAMRegressor, GLMRegressor, GlumRegressor) will be addressed in follow-up PRs.
Does your contribution introduce a new dependency? If yes, which one?
No new dependencies.
What should a reviewer concentrate their feedback on?
_dist_utils.pyany missing or incorrectly mapped aliases?NGBoostAdapternormalization into a localdistvariable at point-of-use, leavingself.distunchanged for sklearn compatibility.Did you add any tests for the change?
Yes
skpro/regression/tests/test_dist_utils.pywith 57 tests across 3 classes:TestNormalizeDistStr: unit tests for every alias, canonical passthrough, non-string passthrough, unknown string warning, and idempotencyTestCrossRegressorAliasConsistency: invariant tests ensuring the same alias resolves identically regardless of which regressor calls the functionTestNGBoostRegressorAliases: end-to-end integration tests for NGBoostRegressor (auto-skipped if ngboost not installed)Any other comments?
None.