Fix randomized PCA transform after fit #1024

Open: lanarkite99 wants to merge 1 commit into dask:main from lanarkite99:resume/dask-ml-pca-randomized-transform

@lanarkite99

Fixes #1023

This PR fixes dask_ml.decomposition.PCA.transform after fit when svd_solver="randomized" is used with newer scikit-learn versions.

  • Dask-ML's PCA subclasses sklearn's PCA, but its custom initializer did not define power_iteration_normalizer.
  • Newer sklearn tag logic reads this attribute inside PCA.__sklearn_tags__, which is reached by check_is_fitted during transform.
  • As a result, fit(...).transform(...) failed with an AttributeError.

The fix sets power_iteration_normalizer to sklearn's default value, "auto", preserving current Dask-ML behavior while keeping the estimator compatible with sklearn's inherited tag logic.
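
The failure mode and the fix can be illustrated with a minimal, self-contained sketch. The classes below (`BasePCA`, `BrokenPCA`, `FixedPCA`) and the `_tags` method are hypothetical stand-ins written for this illustration. They are not sklearn's or Dask-ML's actual code, and the real tag logic is more involved. The sketch only assumes what the description above states: a parent method reads `power_iteration_normalizer`, and a subclass initializer that never sets it causes an `AttributeError` during `transform`.

```python
class BasePCA:
    """Hypothetical stand-in for a parent estimator (not sklearn's code)."""

    def __init__(self, n_components=None, svd_solver="auto",
                 power_iteration_normalizer="auto"):
        self.n_components = n_components
        self.svd_solver = svd_solver
        self.power_iteration_normalizer = power_iteration_normalizer

    def _tags(self):
        # Simplified stand-in for tag logic that reads an attribute
        # normally set by the parent initializer.
        return {"normalizer": self.power_iteration_normalizer}

    def fit(self, X):
        self.fitted_ = True
        return self

    def transform(self, X):
        # Stand-in for the fitted check that reaches the tag logic.
        self._tags()
        return X


class BrokenPCA(BasePCA):
    """Subclass with a custom initializer that omits the attribute."""

    def __init__(self, n_components=None, svd_solver="auto"):
        # The parent initializer is deliberately not called here, so
        # power_iteration_normalizer is never defined on the instance.
        self.n_components = n_components
        self.svd_solver = svd_solver


class FixedPCA(BrokenPCA):
    """Same initializer, plus the one-line fix described above."""

    def __init__(self, n_components=None, svd_solver="auto"):
        super().__init__(n_components=n_components, svd_solver=svd_solver)
        # Match the parent's default so inherited tag logic keeps working.
        self.power_iteration_normalizer = "auto"
```

In this sketch, `BrokenPCA(svd_solver="randomized").fit(X).transform(X)` raises `AttributeError`, while the same call on `FixedPCA` returns normally. Setting the attribute to the parent's default, rather than exposing it as a new constructor argument, leaves the subclass's public signature unchanged.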

Added a regression test covering randomized PCA fit followed by transform.

Tested with:

python -m pytest tests/test_pca.py::test_pca_randomized_transform_after_fit tests/test_pca.py::test_pca_randomized_solver -q

Development

Successfully merging this pull request may close these issues.

PCA.transform fails if svd_solver="randomized"
