Describe the bug
When there are too few observations for the requested number of CV folds, SGLCV fails with an uninformative UnboundLocalError. This happens with groupyr 0.2.6; if I recall correctly, I didn't have the problem with 0.2.4.
Steps/Code to Reproduce
import numpy as np
import groupyr as gr
y = np.array([8.35686197e-01, 7.79143707e-01, 9.68885893e-01, 6.00364059e-01,
8.90818433e-01, 4.50071502e-01, 5.50324868e-04, 3.23702083e-01,
3.26413651e-01])
X = np.array([[0.95834536, 0.24640152, 0.91383425, 0.36952137],
[0.18028435, 0.34682591, 0.43773007, 0.7074315],
[0.54305304, 0.55150522, 0.03017366, 0.07321698],
[0.49662785, 0.17114838, 0.61342598, 0.15094963],
[0.66625233, 0.38015984, 0.51422898, 0.66124242],
[0.95193769, 0.10298654, 0.03773045, 0.21904723],
[0.34889582, 0.04983091, 0.13862843, 0.23390294],
[0.05570983, 0.65507907, 0.74365214, 0.99539654],
[0.01563651, 0.75173544, 0.56747472, 0.31385082]]
)
l1_ratio = 0.0008299164840661392
groups = [np.array([0, 1]), np.array([2, 3])]
model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=5,
    random_state=1234,
).fit(X=X, y=y)
Expected Results
A clear error message explaining why the fit failed.
Actual Results
/path/to/lib/python3.8/site-packages/sklearn/metrics/_regression.py:796: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
warnings.warn(msg, UndefinedMetricWarning)
[the UndefinedMetricWarning is repeated several times]
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/path/to/lib/python3.8/site-packages/groupyr/sgl.py", line 1120, in fit
self.l1_ratio_ = best_l1_ratio
UnboundLocalError: local variable 'best_l1_ratio' referenced before assignment
However, if I use fewer folds, it works:
model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=3,
    random_state=1234,
).fit(X=X, y=y)
Comment
I think the error occurs because one test fold contains only a single observation (9 observations split into 5 folds), which makes the R^2 score undefined and, I guess, later leads to uncaught errors in the groupyr code. I'm not well versed in scikit-learn, so I don't know whether a fix belongs in scikit-learn or in groupyr. Either way, it would be nice to get an informative error message instead of an error caused by groupyr internals.
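To make the guess above concrete, here is a minimal standard-library sketch. It is not groupyr's code: the helper names `kfold_test_sizes`, `check_min_fold_size` and `pick_best` are hypothetical, and the selection loop in `pick_best` is only an assumption about how `best_l1_ratio` could end up unbound (a NaN score never wins a comparison). The fold-size rule follows the one documented for scikit-learn's KFold.

```python
import math


def kfold_test_sizes(n_samples, n_splits):
    """Test-fold sizes for a K-fold split.

    Follows the rule documented for scikit-learn's KFold: the first
    n_samples % n_splits folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def check_min_fold_size(n_samples, n_splits, min_size=2):
    """Hypothetical guard that fails early with a readable message."""
    sizes = kfold_test_sizes(n_samples, n_splits)
    if min(sizes) < min_size:
        raise ValueError(
            f"cv={n_splits} with {n_samples} samples gives test folds of "
            f"sizes {sizes}; R^2 needs at least {min_size} samples per fold."
        )
    return sizes


def pick_best(scores_by_ratio):
    """Assumed selection pattern: keep the l1_ratio with the highest score."""
    best_score = -math.inf
    for ratio, score in scores_by_ratio.items():
        if score > best_score:  # never true when score is NaN
            best_score = score
            best_l1_ratio = ratio
    return best_l1_ratio  # unbound if no comparison ever succeeded


print(kfold_test_sizes(9, 5))  # the last fold holds a single observation
print(kfold_test_sizes(9, 3))  # every fold holds three observations
```

With the data from this report, `check_min_fold_size(9, 5)` would raise a ValueError up front, while `pick_best({0.0008299164840661392: float("nan")})` fails with an UnboundLocalError, much like the traceback above. Whether such a check belongs in groupyr or in scikit-learn is still an open question for me.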
Versions
groupyr 0.2.6
scikit-learn 1.0.2
scikit-optimize 0.9.0