
OCPBUGS-88531: Restart cloud-network-config-controller on restart-date annotation#8738

Closed
bryan-cox wants to merge 2 commits into openshift:main from bryan-cox:OCPBUGS-88531

Conversation

@bryan-cox

@bryan-cox bryan-cox commented Jun 15, 2026

Member

What this PR does / why we need it:

Note: The primary fix for this bug is in CNO: openshift/cluster-network-operator#3030

This PR is a companion fix in hypershift that:

  1. Fixes a pre-existing value-copy bug in SetRestartAnnotationAndPatch: podMeta := patch.Spec.Template.ObjectMeta creates a value copy, so when podMeta.Annotations is nil, the newly allocated map is assigned only to the copy and is lost. Fixed by working directly on patch.Spec.Template.ObjectMeta.Annotations.

  2. Adds cloud-network-config-controller restart as a stopgap in CPO's cleanupClusterNetworkOperatorResources — this follows the existing pattern for multus-admission-controller, network-node-identity, and ovnkube-control-plane. Once the CNO fix lands, all CPO restart calls for CNO operands can be removed (tracked by the existing TODO comment at line 2026).

  3. Documents the four CNO-managed components that were missing from the restart documentation.
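
The value-copy bug in item 1 can be reproduced without any Kubernetes dependencies. The sketch below is illustrative only: ObjectMeta and PodTemplate are simplified stand-ins for the real API types, and the two helper functions are hypothetical names, not the actual SetRestartAnnotationAndPatch code.

```go
package main

import "fmt"

const restartDateAnnotation = "hypershift.openshift.io/restart-date"

// ObjectMeta and PodTemplate are simplified stand-ins (assumption) for the
// Kubernetes metadata and pod template types; only Annotations matters here.
type ObjectMeta struct {
	Annotations map[string]string
}

type PodTemplate struct {
	ObjectMeta ObjectMeta
}

// setAnnotationOnCopy reproduces the bug pattern: the struct is copied by
// value, so a map allocated on the copy never reaches the original template.
func setAnnotationOnCopy(tmpl *PodTemplate, key, value string) {
	podMeta := tmpl.ObjectMeta // value copy of the struct
	if podMeta.Annotations == nil {
		podMeta.Annotations = map[string]string{} // visible only on the copy
	}
	podMeta.Annotations[key] = value
}

// setAnnotationInPlace mirrors the described fix: it initializes and writes
// the Annotations field on the template itself.
func setAnnotationInPlace(tmpl *PodTemplate, key, value string) {
	if tmpl.ObjectMeta.Annotations == nil {
		tmpl.ObjectMeta.Annotations = map[string]string{}
	}
	tmpl.ObjectMeta.Annotations[key] = value
}

func main() {
	buggy := &PodTemplate{}
	setAnnotationOnCopy(buggy, restartDateAnnotation, "2026-06-15T00:00:00Z")
	fmt.Println("copy path, annotations set:", len(buggy.ObjectMeta.Annotations))

	fixed := &PodTemplate{}
	setAnnotationInPlace(fixed, restartDateAnnotation, "2026-06-15T00:00:00Z")
	fmt.Println("in-place path, annotations set:", len(fixed.ObjectMeta.Annotations))
}
```

When the annotations map already exists, the copy shares the same underlying map, so the write still lands on the original. That is why the problem only shows up for pod templates with nil annotations.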

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/OCPBUGS-88531

Special notes for your reviewer:

The architecturally correct fix is in CNO (PR #3030), where CNO reads the restart-date annotation from the HCP CR and injects it into all rendered operand pod templates. This hypershift PR provides the stopgap and bug fix.

The cleanupClusterNetworkOperatorResources function in CPO has a long-standing TODO asking "why is this not done in CNO?" — the CNO PR addresses that for the restart-date case.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code via /jira:solve OCPBUGS-88531

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor
📝 Walkthrough


The change extends the restart annotation reconciliation in cleanupClusterNetworkOperatorResources to also target the cloud-network-config-controller deployment on cloud platforms (AWS, Azure, GCP, OpenStack). A new constant cloudNetworkConfigController and an exported helper function CloudNetworkConfigControllerDeployment(namespace string) are added to the CNO manifests package. The controller calls cnov2.SetRestartAnnotationAndPatch on this deployment when RestartDateAnnotation is present, with supporting refactoring to the annotation patching logic for clarity. The restart-control-plane-components documentation is updated to list cloud-network-config-controller and to expand the surrounding component entries.
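
The platform gating described in the walkthrough could look roughly like the sketch below. This is an assumption-laden illustration: PlatformType and its constants are local stand-ins for the HyperShift API types, and the signature of platformHasCloudNetworkConfigController is guessed from the test name that appears in this thread, not taken from the PR diff.

```go
package main

import "fmt"

// PlatformType is a local stand-in (assumption) for the HyperShift platform
// type; the constant values are illustrative.
type PlatformType string

const (
	AWSPlatform       PlatformType = "AWS"
	AzurePlatform     PlatformType = "Azure"
	GCPPlatform       PlatformType = "GCP"
	OpenStackPlatform PlatformType = "OpenStack"
	NonePlatform      PlatformType = "None"
)

// platformHasCloudNetworkConfigController reports whether the platform is one
// of the cloud platforms named in the walkthrough. Hypothetical signature.
func platformHasCloudNetworkConfigController(p PlatformType) bool {
	switch p {
	case AWSPlatform, AzurePlatform, GCPPlatform, OpenStackPlatform:
		return true
	default:
		return false
	}
}

func main() {
	for _, p := range []PlatformType{AWSPlatform, NonePlatform} {
		fmt.Printf("%s: %v\n", p, platformHasCloudNetworkConfigController(p))
	}
}
```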

🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ⚠️ Warning | Tests lack meaningful assertion failure messages (violates requirement 4) and discard the AddToScheme error (violates requirement 5 and PR review comment). | Add assertion messages to all Expect() calls and properly handle the AddToScheme error instead of discarding it with _. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding restart functionality for the cloud-network-config-controller deployment when the restart-date annotation is set. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Tests use standard Go testing, not Ginkgo. Test names are static and descriptive, containing no dynamic information that could change between runs. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces no deployment specs or scheduling constraints. New function returns bare Deployment object with only name/namespace metadata, matching existing patterns. CNO operator manages actual d... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No Ginkgo e2e tests were added in this PR. The tests added (TestSetRestartAnnotationAndPatch and TestPlatformHasCloudNetworkConfigController) are standard Go unit tests using testing.T, not Ginkgo... |
| No-Weak-Crypto | ✅ Passed | No weak cryptographic usage found. PR contains only Kubernetes deployment patching, annotation management, and documentation updates with no cryptographic operations. |
| Container-Privileges | ✅ Passed | PR does not introduce privileged containers, hostPID/hostNetwork/hostIPC, SYS_ADMIN capabilities, root without justification, or allowPrivilegeEscalation. Changes only involve restart annotations a... |
| No-Sensitive-Data-In-Logs | ✅ Passed | No logging that exposes passwords, tokens, API keys, PII, session IDs, internal hostnames, or customer data was found in the PR changes. Error messages are generic and use standard error wrapping. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8738 June 15, 2026 15:42 Inactive
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 15, 2026
@openshift-ci-robot


@bryan-cox: This pull request references Jira Issue OCPBUGS-88531, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

What this PR does / why we need it:

When hypershift.openshift.io/restart-date is set on a HostedCluster, cleanupClusterNetworkOperatorResources restarts three CNO-managed deployments (multus-admission-controller, network-node-identity, ovnkube-control-plane) but omits cloud-network-config-controller.

cloud-network-config-controller is a CNO operand deployed on cloud platforms (AWS/Azure/GCP/OpenStack) that uses cloud API credentials and a kubeconfig to the hosted cluster API — both subject to rotation. Without restarting it, the controller continues running with stale credentials after credential rotation.

This follows the same pattern as the ovnkube-control-plane fix in 9e1e73e — adding a SetRestartAnnotationAndPatch call with a corresponding manifest function. SetRestartAnnotationAndPatch returns nil for not-found deployments, so no platform-specific conditional is needed.
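
A minimal sketch of the not-found-tolerant behaviour described above, assuming an in-memory stand-in for the API server. fakeCluster, errNotFound, and setRestartAnnotation are hypothetical names for illustration; the real SetRestartAnnotationAndPatch works against a Kubernetes client and is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

const restartDateAnnotation = "hypershift.openshift.io/restart-date"

// errNotFound stands in for a Kubernetes NotFound API error (assumption).
var errNotFound = errors.New("deployment not found")

// fakeCluster is a hypothetical in-memory stand-in for the API server:
// deployment name -> pod template annotations.
type fakeCluster map[string]map[string]string

func (c fakeCluster) get(name string) (map[string]string, error) {
	annotations, ok := c[name]
	if !ok {
		return nil, errNotFound
	}
	return annotations, nil
}

// setRestartAnnotation treats a missing deployment as a no-op, so callers can
// invoke it for every operand without checking the platform first.
func setRestartAnnotation(c fakeCluster, name, restartDate string) error {
	annotations, err := c.get(name)
	if errors.Is(err, errNotFound) {
		return nil // nothing to restart on this platform
	}
	if err != nil {
		return fmt.Errorf("failed to get deployment %s: %w", name, err)
	}
	if annotations == nil {
		// Initialize the map on the stored object, not on a local copy.
		annotations = map[string]string{}
		c[name] = annotations
	}
	annotations[restartDateAnnotation] = restartDate
	return nil
}

func main() {
	// A cluster without cloud-network-config-controller, e.g. a non-cloud platform.
	cluster := fakeCluster{
		"multus-admission-controller": nil,
		"network-node-identity":       {"existing-key": "existing-value"},
		"ovnkube-control-plane":       {},
	}
	names := []string{
		"cloud-network-config-controller",
		"multus-admission-controller",
		"network-node-identity",
		"ovnkube-control-plane",
	}
	for _, name := range names {
		if err := setRestartAnnotation(cluster, name, "2026-06-15T00:00:00Z"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("deployments tracked:", len(cluster))
}
```

The trade-off of this design is that a misspelled deployment name is silently ignored as well, which is one reason to keep the names in a shared manifests package.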

Also adds the four CNO-managed components (cloud-network-config-controller, multus-admission-controller, network-node-identity, ovnkube-control-plane) that were missing from the restart documentation.

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/OCPBUGS-88531

Special notes for your reviewer:

This is the same class of omission as ovnkube-control-plane, fixed in 9e1e73eeac. The fix follows the identical pattern.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

🤖 Generated with Claude Code via /jira:solve OCPBUGS-88531

Summary by CodeRabbit

  • New Features

  • Extended control plane component restart functionality to include the cloud-network-config-controller deployment for AWS, Azure, GCP, and OpenStack.

  • Documentation

  • Updated restart instructions to add cloud-network-config-controller and refreshed/reshuffled the subsequent component list.

  • Bug Fixes

  • Improved robustness when applying restart annotations to deployment pod template annotations, ensuring annotations are correctly initialized before patching.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@codecov

codecov Bot commented Jun 15, 2026


Codecov Report

❌ Patch coverage is 21.42857% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.67%. Comparing base (712ba58) to head (1ac48ef).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...or/controllers/hostedcontrolplane/manifests/cno.go | 0.00% | 7 Missing ⚠️ |
| ...ostedcontrolplane/hostedcontrolplane_controller.go | 0.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8738   +/-   ##
=======================================
  Coverage   41.66%   41.67%           
=======================================
  Files         758      758           
  Lines       93929    93939   +10     
=======================================
+ Hits        39135    39147   +12     
+ Misses      52046    52043    -3     
- Partials     2748     2749    +1     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...controllers/hostedcontrolplane/v2/cno/component.go | 15.38% <100.00%> (+10.29%) ⬆️ |
| ...ostedcontrolplane/hostedcontrolplane_controller.go | 45.63% <0.00%> (-0.08%) ⬇️ |
| ...or/controllers/hostedcontrolplane/manifests/cno.go | 0.00% <0.00%> (ø) |

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 34.96% <ø> (ø) |
| cpo-hostedcontrolplane | 44.05% <21.42%> (+0.04%) ⬆️ |
| cpo-other | 43.45% <ø> (ø) |
| hypershift-operator | 51.65% <ø> (ø) |
| other | 31.56% <ø> (ø) |


@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8738 June 15, 2026 15:48 Inactive
bryan-cox and others added 2 commits June 15, 2026 11:56
…otation

cloud-network-config-controller is a CNO operand deployed on cloud
platforms (AWS/Azure/GCP/OpenStack) that uses rotatable cloud API
credentials and a kubeconfig. It was omitted from the restart-date
annotation handling in cleanupClusterNetworkOperatorResources, leaving
it running with stale credentials after rotation.

This follows the same pattern as the ovnkube-control-plane fix in
9e1e73e. SetRestartAnnotationAndPatch returns nil for not-found
deployments, so no platform-specific conditional is needed.

Also fixes a pre-existing value-copy bug in SetRestartAnnotationAndPatch
where ObjectMeta was copied by value, causing the annotation assignment
to be lost when pod template annotations were nil.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add cloud-network-config-controller, multus-admission-controller,
network-node-identity, and ovnkube-control-plane to the documented
list of components restarted by the restart-date annotation. These
were already restarted but missing from the documentation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added do-not-merge/needs-area area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation labels Jun 15, 2026
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bryan-cox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go (1)

42-53: ⚡ Quick win

Assert existing annotations are preserved, not just the restart key.

The “existing annotations” case currently only checks the new restart annotation at Line 97. Add an assertion that existing-key is still present to protect against map replacement regressions.

Suggested assertion addition
 				g.Expect(err).ToNot(HaveOccurred())
 				g.Expect(updated.Spec.Template.ObjectMeta.Annotations).To(HaveKeyWithValue(hyperv1.RestartDateAnnotation, tt.restartAnnotation))
+				if tt.name == "When deployment exists with existing annotations it should set the restart annotation" {
+					g.Expect(updated.Spec.Template.ObjectMeta.Annotations).To(HaveKeyWithValue("existing-key", "existing-value"))
+				}
 			}

As per coding guidelines, “Unit test any code changes and additions, and include e2e tests when changes impact consumer behavior.”

Also applies to: 97-97

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go`
around lines 42 - 53, The test case "When deployment exists with existing
annotations it should set the restart annotation" only verifies that the restart
annotation is set (at line 97), but does not verify that the pre-existing
annotations are preserved. Add an assertion after the restart annotation check
to confirm that the existing-key annotation with value existing-value is still
present in the deployment's template annotations. This protects against
regressions where the annotation map might be replaced instead of updated.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6cfedae9-454e-40f0-b68d-10847e8bc145

📥 Commits

Reviewing files that changed from the base of the PR and between 2ebe1ad and 1ac48ef.

⛔ Files ignored due to path filters (1)
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
📒 Files selected for processing (5)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/cno.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go
  • docs/content/how-to/restart-control-plane-components.md
✅ Files skipped from review due to trivial changes (1)
  • docs/content/how-to/restart-control-plane-components.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • control-plane-operator/controllers/hostedcontrolplane/v2/cno/component.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/cno.go


func TestSetRestartAnnotationAndPatch(t *testing.T) {
scheme := runtime.NewScheme()
_ = appsv1.AddToScheme(scheme)

Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find and examine the target test file
fd -a 'component_test.go' control-plane-operator/controllers/hostedcontrolplane/v2/cno/

Repository: openshift/hypershift

Length of output: 162


🏁 Script executed:

# Read the specific file to see the context around line 21
wc -l control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go && \
head -30 control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go

Repository: openshift/hypershift

Length of output: 816


🏁 Script executed:

# Check if there are other AddToScheme calls in similar test files and how they're handled
rg -A 2 -B 2 "AddToScheme" control-plane-operator/controllers/hostedcontrolplane/v2/ --type go

Repository: openshift/hypershift

Length of output: 17118


Handle AddToScheme error in test setup instead of discarding it.

Line 21 discards the error return from appsv1.AddToScheme(scheme). If scheme registration fails, subsequent test assertions may fail for the wrong reason, making debugging harder.

Suggested fix
 	scheme := runtime.NewScheme()
-	_ = appsv1.AddToScheme(scheme)
+	if err := appsv1.AddToScheme(scheme); err != nil {
+		t.Fatalf("failed to add apps/v1 to scheme: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/cno/component_test.go`
at line 21, The error returned by appsv1.AddToScheme(scheme) on line 21 is being
discarded with a blank identifier, which can mask failures during test setup.
Instead of discarding the error, check if AddToScheme returns a non-nil error
and fail the test immediately using the appropriate test failure method (such as
t.Fatalf or Expect/Require from your testing library) to provide clear
diagnostics if scheme registration fails during test initialization.

Source: Coding guidelines

@bryan-cox

Member Author

Closing in favor of the CNO fix: openshift/cluster-network-operator#3030

CNO is the correct owner for restarting its operands when the restart-date annotation changes.

@bryan-cox bryan-cox closed this Jun 15, 2026
@openshift-ci-robot


@bryan-cox: This pull request references Jira Issue OCPBUGS-88531. The bug has been updated to no longer refer to the pull request using the external bug tracker.


@mgencur

mgencur commented Jun 16, 2026

Contributor

Just a note. I'm adding this test and the fix that is very similar to what was in this PR: #8733

@mgencur

mgencur commented Jun 16, 2026

Contributor

I think some parts of this PR would still be useful:


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
