Skip to content

Switch ServiceProvider reads in controllers to cached listers#5686

Open
deads2k wants to merge 1 commit into
Azure:mainfrom
deads2k:cs-192-switch-to-SP-cache
Open

Switch ServiceProvider reads in controllers to cached listers#5686
deads2k wants to merge 1 commit into
Azure:mainfrom
deads2k:cs-192-switch-to-SP-cache

Conversation

@deads2k

@deads2k deads2k commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Consumer controllers used to call database.GetOrCreateServiceProviderCluster / GetOrCreateServiceProviderNodePool on every sync, which both hit Cosmos and scattered the create path across many sites. Replace those calls with cached ServiceProvider{Cluster,NodePool}Lister Gets; when the document is absent the syncer returns nil and is re-enqueued by the existing ServiceProvider* informer event handlers once the document arrives.

A single dedicated controller per kind owns creation: CreateServiceProviderCluster and CreateServiceProviderNodePool consult the lister cache first and call GetOrCreate only on a miss, so the create-with-conflict-fallback pattern lives in exactly one place per resource.

Consumer controllers used to call database.GetOrCreateServiceProviderCluster /
GetOrCreateServiceProviderNodePool on every sync, which both hit Cosmos and
scattered the create path across many sites. Replace those calls with cached
ServiceProvider{Cluster,NodePool}Lister Gets; when the document is absent the
syncer returns nil and is re-enqueued by the existing ServiceProvider*
informer event handlers once the document arrives.

A single dedicated controller per kind owns creation: CreateServiceProviderCluster
and CreateServiceProviderNodePool consult the lister cache first and call
GetOrCreate only on a miss, so the create-with-conflict-fallback pattern lives
in exactly one place per resource.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings June 16, 2026 22:48
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors backend controllers to stop calling database.GetOrCreateServiceProvider{Cluster,NodePool} during steady-state reconciles (which hits Cosmos) and instead read ServiceProvider docs via cached listers. Creation of missing ServiceProvider documents is centralized into two dedicated controllers so the create-with-conflict-fallback pattern lives in one place per resource kind.

Changes:

  • Updated validation/upgrade controllers to read ServiceProvider{Cluster,NodePool} via cached listers and bail out on NotFound.
  • Added dedicated creator controllers for ServiceProviderCluster and ServiceProviderNodePool to own document creation.
  • Updated and extended unit tests and lister test helpers to reflect the new lister-first behavior and resource name constants.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
backend/pkg/listertesting/db_listers.go Adds DB-backed ServiceProviderNodePool lister and switches ServiceProviderCluster Get to use the constant resource name.
backend/pkg/controllers/validationcontrollers/nodepool_validation_controller.go Uses ServiceProviderNodePool lister Get instead of GetOrCreate; deep-copies before Replace.
backend/pkg/controllers/validationcontrollers/nodepool_validation_controller_test.go Seeds ServiceProviderNodePool and updates tests to match lister-driven reads.
backend/pkg/controllers/validationcontrollers/cluster_validation_controller.go Uses ServiceProviderCluster lister Get instead of GetOrCreate; deep-copies before Replace.
backend/pkg/controllers/upgradecontrollers/trigger_node_pool_upgrade_controller.go Uses ServiceProviderNodePool lister Get instead of GetOrCreate.
backend/pkg/controllers/upgradecontrollers/trigger_control_plane_upgrade_controller.go Uses ServiceProviderCluster lister Get instead of GetOrCreate.
backend/pkg/controllers/upgradecontrollers/nodepool_version_controller.go Uses ServiceProvider{Cluster,NodePool} lister Gets; deep-copies SPNP before mutations.
backend/pkg/controllers/upgradecontrollers/nodepool_version_controller_test.go Seeds ServiceProvider docs and switches helper logic to GetOrCreate+Replace patterns.
backend/pkg/controllers/upgradecontrollers/control_plane_desired_version_controller.go Uses SPC lister Get; changes admission node-pool prefetch to use SPNP lister and “ready” gating.
backend/pkg/controllers/upgradecontrollers/control_plane_desired_version_controller_test.go Updates test syncer construction to include DB-backed ServiceProvider listers and seeds SP resources in fixtures.
backend/pkg/controllers/upgradecontrollers/control_plane_active_version_controller.go Uses SPC lister Get and deep-copy before Replace to avoid mutating cached objects.
backend/pkg/controllers/upgradecontrollers/control_plane_active_version_controller_test.go Seeds SPC and wires DB-backed SPC lister into tests.
backend/pkg/controllers/create_service_provider_nodepool_controller.go New controller that creates SPNP documents on cache miss, owning GetOrCreate usage.
backend/pkg/controllers/create_service_provider_nodepool_controller_test.go New tests covering SPNP creator behavior across NotFound/error/idempotency cases.
backend/pkg/controllers/create_service_provider_cluster_controller.go New controller that creates SPC documents on cache miss, owning GetOrCreate usage.
backend/pkg/controllers/create_service_provider_cluster_controller_test.go New tests covering SPC creator behavior across NotFound/error/idempotency cases.
backend/pkg/controllers/create_nodepool_scoped_read_desires_controller.go Switches to SPC lister Get (no longer GetOrCreate) to resolve management cluster placement.
backend/pkg/controllers/create_cluster_scoped_read_desires_controller.go Switches to SPC lister Get (no longer GetOrCreate) to resolve management cluster placement.
backend/pkg/app/backend.go Wires new listers into controller constructors and starts the new creator controllers.

Comment on lines +359 to +362
// Returns (_, false, nil) when any node pool is missing its ServiceProviderNodePool —
// CreateServiceProviderNodePool will populate it and the controller will be
// re-enqueued via the ServiceProviderNodePool informer. Skipping admission
// avoids using stale or missing node-pool version data for skew validation.
Comment on lines +99 to 102
if database.IsNotFoundError(err) {
// CreateServiceProviderCluster will populate it; we'll be re-enqueued via the ServiceProviderCluster informer.
return nil
}
Comment on lines +156 to +159
if database.IsNotFoundError(err) {
// CreateServiceProviderCluster will populate it; we'll be re-enqueued via the ServiceProviderCluster informer.
return nil
}
@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

@deads2k: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify fe1eea6 link true /test verify
ci/prow/lint fe1eea6 link true /test lint
ci/prow/e2e-parallel fe1eea6 link true /test e2e-parallel

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants