SGLCV fails with too few observations for CV #73

Description

@jonas-hag

Describe the bug

When there are too few observations for the CV, SGLCV fails with an uninformative UnboundLocalError. This happens with groupyr 0.2.6; if I recall correctly, I didn't have the problem with 0.2.4.

Steps/Code to Reproduce

import numpy as np
import groupyr as gr

y = np.array([
    8.35686197e-01, 7.79143707e-01, 9.68885893e-01, 6.00364059e-01,
    8.90818433e-01, 4.50071502e-01, 5.50324868e-04, 3.23702083e-01,
    3.26413651e-01,
])
X = np.array([
    [0.95834536, 0.24640152, 0.91383425, 0.36952137],
    [0.18028435, 0.34682591, 0.43773007, 0.7074315],
    [0.54305304, 0.55150522, 0.03017366, 0.07321698],
    [0.49662785, 0.17114838, 0.61342598, 0.15094963],
    [0.66625233, 0.38015984, 0.51422898, 0.66124242],
    [0.95193769, 0.10298654, 0.03773045, 0.21904723],
    [0.34889582, 0.04983091, 0.13862843, 0.23390294],
    [0.05570983, 0.65507907, 0.74365214, 0.99539654],
    [0.01563651, 0.75173544, 0.56747472, 0.31385082],
])
l1_ratio = 0.0008299164840661392
groups = [np.array([0, 1]), np.array([2, 3])]

# 9 observations with cv=5
model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=5,
    random_state=1234,
).fit(X=X, y=y)

Expected Results

A clear error message explaining why the fit failed.

Actual Results

/path/to/lib/python3.8/site-packages/sklearn/metrics/_regression.py:796: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
[the UndefinedMetricWarning is repeated several times]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/path/to/lib/python3.8/site-packages/groupyr/sgl.py", line 1120, in fit
    self.l1_ratio_ = best_l1_ratio
UnboundLocalError: local variable 'best_l1_ratio' referenced before assignment

However, if I use fewer folds, it works:

model = gr.SGLCV(
    l1_ratio=l1_ratio,
    groups=groups,
    scale_l2_by="group_length",
    cv=3,
    random_state=1234,
).fit(X=X, y=y)

Comment

I think the error occurs because one fold contains only a single observation, which I guess makes the R^2 score undefined and later triggers uncaught errors in the groupyr code. I'm not well versed in scikit-learn, so I don't know whether a fix would belong in scikit-learn or in groupyr. Either way, it would be nice to get an informative error message instead of an error from groupyr internals.
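To make the guess above concrete, here is a minimal, library-free sketch. It reproduces the test-fold sizes that scikit-learn's KFold documents (the first n_samples % n_splits folds get one extra sample), so 9 observations with cv=5 leave one fold with a single observation. The helper names kfold_test_sizes, check_fold_sizes and pick_best are hypothetical and are not part of groupyr or scikit-learn. The selection loop in pick_best is only an assumption about how a NaN score could leave the "best" variable unassigned; it is not taken from the groupyr source.

```python
import math


def kfold_test_sizes(n_samples, n_splits):
    """Test-fold sizes following the rule documented for sklearn's KFold."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


def check_fold_sizes(n_samples, n_splits, min_test_size=2):
    """Hypothetical pre-fit guard that fails early with a readable message."""
    sizes = kfold_test_sizes(n_samples, n_splits)
    if min(sizes) < min_test_size:
        raise ValueError(
            f"cv={n_splits} with {n_samples} samples gives test folds of "
            f"sizes {sizes}; R^2 needs at least {min_test_size} samples per "
            "fold. Use fewer folds or more data."
        )
    return sizes


def pick_best(scores):
    """Hypothetical selection loop (an assumption, not groupyr's code).

    Any comparison with NaN is False, so if every score is NaN the
    'best' index is never assigned and stays None.
    """
    best_score, best_index = -math.inf, None
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_index = score, i
    return best_index


print(kfold_test_sizes(9, 5))           # [2, 2, 2, 2, 1]
print(kfold_test_sizes(9, 3))           # [3, 3, 3]
print(pick_best([math.nan, math.nan]))  # None
```

A guard like check_fold_sizes(9, 5) would raise a ValueError before fitting, which is the kind of message I was hoping for, while check_fold_sizes(9, 3) passes.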

Versions

groupyr 0.2.6
scikit-learn 1.0.2
scikit-optimize 0.9.0
