Skip to content

CNTRLPLANE-3553: Wire osImageStream into NodePool controller (hash, token, status, validation)#8730

Draft
sdminonne wants to merge 5 commits into
openshift:mainfrom
sdminonne:CNTRLPLANE-3553-osImageStream-controller
Draft

CNTRLPLANE-3553: Wire osImageStream into NodePool controller (hash, token, status, validation)#8730
sdminonne wants to merge 5 commits into
openshift:mainfrom
sdminonne:CNTRLPLANE-3553-osImageStream-controller

Conversation

@sdminonne

@sdminonne sdminonne commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Wire spec.osImageStream and status.osImageStream into the NodePool
reconciliation loop: config hash (triggering rollouts on stream change),
token secret propagation, status reporting, and boot image function
parameter threading.

Aligns with the dual-stream RHEL NodePool enhancement
Phase 2 controller plumbing.

Dependencies

Changes

Config hash integration (config.go)

  • Add rhelStream to rolloutConfig, Hash(), and HashWithoutVersion()
  • Normalize rhelStream in NewConfigGenerator: when the explicit
    spec.osImageStream.name matches the version-derived default (e.g.
    "rhel-9" on a 4.x release), it is kept as "" so the hash doesn't
    change and no spurious rollout is triggered. Only a non-default stream
    produces a different hash
  • Extract defaultRHELStream() for the normalization comparison

Stream resolution (osstream.go)

  • getRHELStream() returns:
    • The explicit spec.osImageStream.name when set
    • "rhel-10" when unset and release >= 5.0
    • "" when unset and release < 5.0 (legacy single-stream behavior,
      no OSImageStream CR generated)
  • validateOSImageStream() checks that the stream name, if set, is one
    of rhel-9 or rhel-10

Validation (conditions.go)

  • osImageStream validation runs inside validMachineConfigCondition
    (not a separate condition), consistent with the enhancement design
    where stream validation has access to the ConfigGenerator context

Token secret (token.go)

  • Write os-stream key into the token secret for downstream ignition

Status propagation (capi.go)

  • Set status.osImageStream on rollout completion in both Replace
    (MachineDeployment) and InPlace (MachineSet) paths

Boot image threading (aws.go, gcp.go)

  • Thread rhelStream parameter through defaultNodePoolAMI,
    defaultNodePoolGCPImage, and their callers. Parameter is accepted
    but not yet consumed — marked with TODO([CNTRLPLANE-3553](https://redhat.atlassian.net/browse/CNTRLPLANE-3553)) for when
    multi-stream metadata parsing lands

What's not yet implemented (future work)

  • usesRunc guard (explicit rhel-10 + runc → error, implicit >=5.0 + runc → fallback to rhel-9)
  • Multi-stream boot image metadata parsing (CNTRLPLANE-3553)
  • Ignition server plumbing (OSImageStream CR generation in GetPayload())

Test plan

  • TestGetRHELStream — covers explicit stream, 4.x/5.x/6.x defaults, unparsable version
  • TestDefaultRHELStream — covers version-based derivation
  • TestValidateOSImageStream — covers empty, valid, and invalid stream names
  • TestHash / TestHashWithoutVersion — covers non-default stream changing the hash
  • TestReconcileMachineDeploymentStatus — covers status.osImageStream on rollout completion
  • make verify passes clean

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for specifying and managing RHEL OS image streams (RHEL-9 and RHEL-10) in NodePools with automatic derivation from release image versions
    • OS image stream configuration is now tracked in NodePool status
    • Implemented validation rules to prevent removal of configured OS image streams
  • Tests

    • Added comprehensive unit tests for OS image stream derivation and validation

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 12, 2026

Copy link
Copy Markdown

@sdminonne: This pull request references CNTRLPLANE-3553 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Wire spec.osImageStream into the NodePool reconciliation loop: config hash
(triggering rollouts on stream change), token secret propagation, status
reporting, and boot image function parameter threading.

DRAFT — depends on #8669 and #8719 merging first.
This PR will be rebased and conflict-resolved after those land.
Opening early for design review on the config hash, token secret,
and boot image threading approach.

Dependencies

This PR overlaps with both on condition constants and stream resolution logic.
After they merge, the overlapping pieces (osstream.go condition constants,
validOSImageStreamCondition allowlist) will be dropped and replaced with
the implementations from those PRs.

What this PR adds (unique to this PR)

  • Config hash integration (config.go): add rhelStream to rolloutConfig,
    Hash(), and HashWithoutVersion() — backward compatible when empty
  • Token secret (token.go): write os-stream key for downstream ignition
  • Status propagation (capi.go): set status.osImageStream on rollout
    completion in both Replace (MachineDeployment) and InPlace (MachineSet) paths
  • Boot image threading: add rhelStream parameter to defaultNodePoolAMI,
    defaultNodePoolGCPImage, and all callers (aws.go, gcp.go, token.go).
    Parameter is accepted but not yet used — marked with TODO([CNTRLPLANE-3553](https://redhat.atlassian.net/browse/CNTRLPLANE-3553))
    for when StreamForName() from CNTRLPLANE-3552: Multi-stream CoreOS metadata parsing and stream resolution #8669 is wired in

What overlaps (will be resolved on rebase)

Test plan

  • go test ./hypershift-operator/controllers/nodepool/... -count=1 — all pass
  • make lint-fix — 0 issues
  • go build ./... — full repo builds
  • CI will fail until dependencies merge — expected for draft

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR wires RHEL OS image stream support through the NodePool controller. It adds OSImageStreamRHEL9 ("rhel-9") and OSImageStreamRHEL10 ("rhel-10") API constants, a CEL validation rule preventing removal of osImageStream once set, and a new osstream.go with defaultRHELStream, getRHELStream, and validateOSImageStream helpers. The rhelStream value is incorporated into rollout config hashing via a new rolloutConfig.rhelStream field. Validation of the stream name is added to validMachineConfigCondition. The rhelStream parameter is threaded through defaultNodePoolAMI, defaultNodePoolGCPImage, and their call chains on AWS and GCP. A new TokenSecretOSStreamKey constant is added and the value is stored in token secrets and used by Karpenter AMI label generation. On rollout completion, nodePool.Status.OSImageStream is populated via getRHELStream in the reconciler, MachineDeployment, and MachineSet paths.

Possibly related PRs

  • openshift/hypershift#8669: Adds the initial GetRHELStream helper and rhel-9/rhel-10 constants that this PR extends with NodePool reconciliation wiring, config hashing, and image-selection propagation.

Suggested reviewers

  • muraee
  • Nirshal
  • sjenning
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: wiring osImageStream into the NodePool controller across multiple concerns (hash, token, status, validation).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in osstream_test.go and modified test sections (config_test.go, capi_test.go) are stable and deterministic, following "When , it should " pattern with n...
Test Structure And Quality ✅ Passed The custom check requests review of Ginkgo test code, but this PR contains only standard Go testing.T tests with Gomega assertions. No Ginkgo tests (Describe/It blocks) are present.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies NodePool controller for OS image stream support (config hashing, token propagation, status reporting) with no pod/deployment manifests, affinity rules, topology constraints, PDBs, or sc...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. The new osstream_test.go contains only standard Go unit tests using testing.T and Gomega, with no IPv4 assumptions or external connectivity requirements.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected. PR uses FNV-1a hashing (non-cryptograph...
Container-Privileges ✅ Passed PR contains only Go controller logic and API type changes; no Kubernetes manifests, container specs, or security context configurations are present.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data (passwords, tokens, API keys, PII, session IDs) exposed in logs. Error messages only expose non-sensitive release versions and OS stream names.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform and removed do-not-merge/needs-area labels Jun 12, 2026
@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.23077% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.81%. Comparing base (7507291) to head (ebde395).

Files with missing lines Patch % Lines
...rshift-operator/controllers/nodepool/conditions.go 0.00% 10 Missing ⚠️
hypershift-operator/controllers/nodepool/config.go 44.44% 4 Missing and 1 partial ⚠️
hypershift-operator/controllers/nodepool/capi.go 50.00% 3 Missing ⚠️
...erator/controllers/nodepool/nodepool_controller.go 66.66% 3 Missing ⚠️
hypershift-operator/controllers/nodepool/aws.go 83.33% 1 Missing ⚠️
hypershift-operator/controllers/nodepool/gcp.go 75.00% 1 Missing ⚠️
hypershift-operator/controllers/nodepool/token.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8730      +/-   ##
==========================================
+ Coverage   41.79%   41.81%   +0.01%     
==========================================
  Files         759      760       +1     
  Lines       94037    94099      +62     
==========================================
+ Hits        39304    39344      +40     
- Misses      51983    52004      +21     
- Partials     2750     2751       +1     
Files with missing lines Coverage Δ
...pershift-operator/controllers/nodepool/osstream.go 100.00% <100.00%> (ø)
hypershift-operator/controllers/nodepool/aws.go 70.02% <83.33%> (ø)
hypershift-operator/controllers/nodepool/gcp.go 66.21% <75.00%> (-0.30%) ⬇️
hypershift-operator/controllers/nodepool/token.go 82.59% <75.00%> (+0.05%) ⬆️
hypershift-operator/controllers/nodepool/capi.go 71.65% <50.00%> (-0.13%) ⬇️
...erator/controllers/nodepool/nodepool_controller.go 43.37% <66.66%> (+0.11%) ⬆️
hypershift-operator/controllers/nodepool/config.go 83.77% <44.44%> (-1.75%) ⬇️
...rshift-operator/controllers/nodepool/conditions.go 53.27% <0.00%> (-0.66%) ⬇️
Flag Coverage Δ
cmd-support 35.11% <ø> (ø)
cpo-hostedcontrolplane 44.10% <ø> (ø)
cpo-other 43.45% <ø> (ø)
hypershift-operator 51.90% <69.23%> (+0.03%) ⬆️
other 31.56% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hypershift-operator/controllers/nodepool/config.go (1)

88-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Thread the resolved OS stream, not the raw spec field.

getRHELStream() is the contract for this feature: when spec.osImageStream is unset, the controller derives rhel-9/rhel-10 from the release. These sites currently cache/forward nodePool.Spec.OSImageStream.Name directly, so the value collapses to "" for defaulted NodePools. That leaves the hash path out of sync with status.osImageStream today, and it will feed the wrong stream into AWS/GCP image selection as soon as the dependent multi-stream metadata work starts using rhelStream after rebase.

Compute the effective stream once from getRHELStream(nodePool, releaseImage) and thread that value through rolloutConfig, CAPI, and the platform validation helpers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/config.go` around lines 88 - 94,
Compute the effective RHEL stream once by calling getRHELStream(nodePool,
releaseImage) and pass that computed value into rolloutConfig.rhelStream instead
of using nodePool.Spec.OSImageStream.Name; also propagate the same computed
value into any CAPI-related code paths and platform validation helper calls that
previously read spec.osImageStream so all consumers use the resolved stream
(e.g., update usages in rolloutConfig initialization, CAPI helpers, and platform
validation functions to accept/consume the new resolved rhelStream variable).
🧹 Nitpick comments (2)
hypershift-operator/controllers/nodepool/osstream_test.go (1)

17-80: ⚡ Quick win

Add test coverage for getRHELStream error path.

TestGetRHELStream only covers success cases. Add a test case for when releaseImage.Version() returns an invalid semver string, to verify that getRHELStream returns an error as expected (Line 26-29 in osstream.go handles this, but it's untested).

📋 Proposed test case for invalid semver
 		expectedStream: "rhel-10",
 	},
+	{
+		name: "When spec.osImageStream.Name is empty and version is invalid, it should return error",
+		nodePool: &hyperv1.NodePool{
+			Spec: hyperv1.NodePoolSpec{},
+		},
+		releaseImage: &releaseinfo.ReleaseImage{
+			ImageStream: &imageapi.ImageStream{ObjectMeta: metav1.ObjectMeta{Name: "invalid-version"}},
+		},
+		expectErr: true,
+	},
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/osstream_test.go` around lines 17 -
80, TestGetRHELStream lacks coverage for the error path in getRHELStream when
releaseImage.Version() yields an invalid semver; add a new test case in
TestGetRHELStream that supplies a releaseinfo.ReleaseImage whose
ImageStream.ObjectMeta.Name is an invalid version string (e.g., "not-a-semver"),
call getRHELStream(nodePool, releaseImage) and assert that an error is returned
(Expect(err).To(HaveOccurred())); reference the existing test harness and types
(TestGetRHELStream, getRHELStream, releaseinfo.ReleaseImage,
imageapi.ImageStream/ObjectMeta.Name) so the new case is consistent with the
other table-driven cases.
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

407-409: ⚡ Quick win

Log or surface getRHELStream errors in all three status-update paths.

In nodepool_controller.go (Line 407-409), capi.go reconcileMachineDeploymentStatus (Line 617-619), and capi.go reconcileMachineSet (Line 1018-1020), errors from getRHELStream are silently ignored with if err == nil guards. The shared root cause is missing error visibility: when OS stream resolution fails (e.g., invalid semver in release version), Status.OSImageStream remains unset or stale and no log or condition surfaces the issue to the user. As per coding guidelines, "Always check errors in Go — don't ignore them." Add logging or set a condition in each path to surface failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines
407 - 409, getRHELStream errors are being ignored in three places
(nodepool_controller.go when setting nodePool.Status.OSImageStream, capi.go
reconcileMachineDeploymentStatus, and capi.go reconcileMachineSet); update each
location to handle the error path: when getRHELStream returns an error, log the
error with the controller logger (including err.Error()) and set a clear status
condition (e.g., "OSImageStreamResolutionFailed") on the owning object with the
error message so users see the failure, and when getRHELStream succeeds clear
that condition and set Status.OSImageStream as before; reference the
getRHELStream call sites and the Status.OSImageStream update points and ensure
status updates are persisted.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/nodepool/osstream.go`:
- Around line 21-35: The getRHELStream function dereferences nodePool
(nodePool.Spec.OSImageStream.Name) and releaseImage (releaseImage.Version())
without nil checks; add defensive nil guards at the top of getRHELStream to
return a clear error if nodePool or releaseImage is nil, and also guard
nodePool.Spec or nodePool.Spec.OSImageStream as needed before accessing Name to
avoid panics, then proceed with semver.Parse on releaseImage.Version() only when
releaseImage is non-nil.

---

Outside diff comments:
In `@hypershift-operator/controllers/nodepool/config.go`:
- Around line 88-94: Compute the effective RHEL stream once by calling
getRHELStream(nodePool, releaseImage) and pass that computed value into
rolloutConfig.rhelStream instead of using nodePool.Spec.OSImageStream.Name; also
propagate the same computed value into any CAPI-related code paths and platform
validation helper calls that previously read spec.osImageStream so all consumers
use the resolved stream (e.g., update usages in rolloutConfig initialization,
CAPI helpers, and platform validation functions to accept/consume the new
resolved rhelStream variable).

---

Nitpick comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 407-409: getRHELStream errors are being ignored in three places
(nodepool_controller.go when setting nodePool.Status.OSImageStream, capi.go
reconcileMachineDeploymentStatus, and capi.go reconcileMachineSet); update each
location to handle the error path: when getRHELStream returns an error, log the
error with the controller logger (including err.Error()) and set a clear status
condition (e.g., "OSImageStreamResolutionFailed") on the owning object with the
error message so users see the failure, and when getRHELStream succeeds clear
that condition and set Status.OSImageStream as before; reference the
getRHELStream call sites and the Status.OSImageStream update points and ensure
status updates are persisted.

In `@hypershift-operator/controllers/nodepool/osstream_test.go`:
- Around line 17-80: TestGetRHELStream lacks coverage for the error path in
getRHELStream when releaseImage.Version() yields an invalid semver; add a new
test case in TestGetRHELStream that supplies a releaseinfo.ReleaseImage whose
ImageStream.ObjectMeta.Name is an invalid version string (e.g., "not-a-semver"),
call getRHELStream(nodePool, releaseImage) and assert that an error is returned
(Expect(err).To(HaveOccurred())); reference the existing test harness and types
(TestGetRHELStream, getRHELStream, releaseinfo.ReleaseImage,
imageapi.ImageStream/ObjectMeta.Name) so the new case is consistent with the
other table-driven cases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e87df744-44ee-4acb-9a91-4746ce49edd8

📥 Commits

Reviewing files that changed from the base of the PR and between 39f04eb and afe4b10.

⛔ Files ignored due to path filters (1)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (13)
  • api/hypershift/v1beta1/nodepool_conditions.go
  • hypershift-operator/controllers/nodepool/aws.go
  • hypershift-operator/controllers/nodepool/aws_test.go
  • hypershift-operator/controllers/nodepool/capi.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/gcp.go
  • hypershift-operator/controllers/nodepool/gcp_test.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/nodepool_controller_test.go
  • hypershift-operator/controllers/nodepool/osstream.go
  • hypershift-operator/controllers/nodepool/osstream_test.go
  • hypershift-operator/controllers/nodepool/token.go
  • hypershift-operator/controllers/nodepool/token_test.go

Comment on lines +21 to +35
func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) {
if nodePool.Spec.OSImageStream.Name != "" {
return nodePool.Spec.OSImageStream.Name, nil
}

version, err := semver.Parse(releaseImage.Version())
if err != nil {
return "", fmt.Errorf("failed to parse release image version %q: %w", releaseImage.Version(), err)
}

if version.Major >= 5 {
return "rhel-10", nil
}
return "rhel-9", nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add nil-safety guards to prevent panics.

getRHELStream dereferences nodePool (Line 22) and releaseImage (Line 26) without nil checks. If either parameter is nil, the function will panic. Current callers appear to pass valid pointers, but defensive validation would prevent runtime panics if the contract changes or a new caller is added.

🛡️ Proposed fix to add nil guards
 func getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage) (string, error) {
+	if nodePool == nil {
+		return "", fmt.Errorf("nodePool cannot be nil")
+	}
+	if releaseImage == nil {
+		return "", fmt.Errorf("releaseImage cannot be nil")
+	}
 	if nodePool.Spec.OSImageStream.Name != "" {
 		return nodePool.Spec.OSImageStream.Name, nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/osstream.go` around lines 21 - 35,
The getRHELStream function dereferences nodePool
(nodePool.Spec.OSImageStream.Name) and releaseImage (releaseImage.Version())
without nil checks; add defensive nil guards at the top of getRHELStream to
return a clear error if nodePool or releaseImage is nil, and also guard
nodePool.Spec or nodePool.Spec.OSImageStream as needed before accessing Name to
avoid panics, then proceed with semver.Parse on releaseImage.Version() only when
releaseImage is non-nil.

@sdminonne sdminonne force-pushed the CNTRLPLANE-3553-osImageStream-controller branch from 03a5cef to 8e027da Compare June 17, 2026 07:40
@sdminonne sdminonne changed the title CNTRLPLANE-3553: Wire osImageStream into NodePool controller (config hash, token, boot image) CNTRLPLANE-3553: Wire osImageStream into NodePool controller (hash, token, status, validation) Jun 17, 2026
@openshift-ci openshift-ci Bot added the area/cli Indicates the PR includes changes for CLI label Jun 17, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
api/hypershift/v1beta1/nodepool_types.go (1)

51-60: 💤 Low value

Consider nil check for defensive safety.

The function dereferences nodePool at Line 52 without a nil check. While current callers in context snippets (conditions.go:390) pass valid pointers, adding a guard would prevent panics if a new caller is added or the contract changes.

🛡️ Optional defensive nil guard
 func validateOSImageStream(nodePool *hyperv1.NodePool) error {
+	if nodePool == nil {
+		return fmt.Errorf("nodePool cannot be nil")
+	}
 	name := nodePool.Spec.OSImageStream.Name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/hypershift/v1beta1/nodepool_types.go` around lines 51 - 60, Locate the
function that dereferences the nodePool parameter (likely in the
nodepool_types.go file around the specified line range) and add a nil check
guard before the dereference operation. The check should verify that nodePool is
not nil before attempting to access its fields or methods, returning an
appropriate error or early exit if the pointer is nil. This defensive check
prevents potential panics if the function contract changes or new callers pass
nil values in the future.
hypershift-operator/controllers/nodepool/capi_test.go (2)

2771-2773: ⚡ Quick win

Add nil-safety to OSImageStream assertion.

The assertion at line 2772 directly accesses nodePool.Status.OSImageStream.Name without first checking if nodePool.Status.OSImageStream is nil. If the status field is unexpectedly nil when expectedOSImageStream is non-empty, this will panic rather than provide a clear test failure message.

Safer assertion pattern
 if tc.expectedOSImageStream != "" {
+    g.Expect(nodePool.Status.OSImageStream).ToNot(BeNil(), "OSImageStream should be set when stream is expected")
     g.Expect(nodePool.Status.OSImageStream.Name).To(Equal(tc.expectedOSImageStream))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/capi_test.go` around lines 2771 -
2773, In the test assertion block where tc.expectedOSImageStream is checked, add
a nil-safety check before accessing the Name field. When
tc.expectedOSImageStream is non-empty, first verify that
nodePool.Status.OSImageStream is not nil before attempting to access
nodePool.Status.OSImageStream.Name in the Expect call. This will prevent a panic
and provide a clear test failure message if the OSImageStream field is
unexpectedly nil.

2636-2674: ⚡ Quick win

Add test case to verify OSImageStream status population on rollout completion.

The test infrastructure includes an expectedOSImageStream field and conditional assertion (lines 2771-2773), but no test case actually verifies that nodePool.Status.OSImageStream is populated when the MachineDeployment completes. The test case at line 2652 sets expectedOSImageStream: "", which skips the assertion entirely.

According to the layer description, reconcileMachineDeploymentStatus should call getRHELStream and populate nodePool.Status.OSImageStream when the deployment reaches completion. Add a test case that sets a non-empty expectedOSImageStream value and verifies the status field is correctly populated.

Consider also updating the test name at line 2652 to reflect what is actually being verified, or add a separate test case specifically for OSImageStream status population.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/capi_test.go` around lines 2636 -
2674, Add a new test case to the testCases slice in the
TestReconcileMachineDeploymentStatus function that verifies OSImageStream status
population when a MachineDeployment completes. This test case should have a
non-empty expectedOSImageStream value (such as a valid RHEL stream identifier)
and match the completion conditions of the existing test case (all replica
counts at 3, ObservedGeneration matching Generation). This will ensure the
conditional assertion at lines 2771-2773 that checks
nodePool.Status.OSImageStream is actually exercised with a meaningful
verification rather than being skipped due to an empty expected value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Around line 406-408: In the nodepool_controller.go file, the call to
getRHELStream is silently ignoring errors by only handling the success case with
if err == nil. Instead of dropping the error, properly handle failure cases by
either logging the error with context and returning it to trigger
reconciliation, or updating the NodePool status to reflect the error condition.
This ensures that stream-resolution failures are visible and the status remains
synchronized with the actual state rather than remaining stale and unset.

In `@hypershift-operator/controllers/nodepool/token.go`:
- Around line 358-359: The assignment of TokenSecretOSStreamKey to
tokenSecret.Data only executes within a conditional block that checks if
tokenSecret.Data is nil, meaning existing token secrets never get the os-stream
field populated on reconciliation. Move the line
tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.rhelStream) outside of the
nil-check conditional so it runs on every reconciliation, ensuring all token
secrets have the os-stream field persisted consistently. Additionally, add unit
tests to verify that the os-stream field is set during both initial creation and
subsequent reconciliations, and add e2e tests to validate this behavior impacts
consumer behavior correctly.

---

Nitpick comments:
In `@api/hypershift/v1beta1/nodepool_types.go`:
- Around line 51-60: Locate the function that dereferences the nodePool
parameter (likely in the nodepool_types.go file around the specified line range)
and add a nil check guard before the dereference operation. The check should
verify that nodePool is not nil before attempting to access its fields or
methods, returning an appropriate error or early exit if the pointer is nil.
This defensive check prevents potential panics if the function contract changes
or new callers pass nil values in the future.

In `@hypershift-operator/controllers/nodepool/capi_test.go`:
- Around line 2771-2773: In the test assertion block where
tc.expectedOSImageStream is checked, add a nil-safety check before accessing the
Name field. When tc.expectedOSImageStream is non-empty, first verify that
nodePool.Status.OSImageStream is not nil before attempting to access
nodePool.Status.OSImageStream.Name in the Expect call. This will prevent a panic
and provide a clear test failure message if the OSImageStream field is
unexpectedly nil.
- Around line 2636-2674: Add a new test case to the testCases slice in the
TestReconcileMachineDeploymentStatus function that verifies OSImageStream status
population when a MachineDeployment completes. This test case should have a
non-empty expectedOSImageStream value (such as a valid RHEL stream identifier)
and match the completion conditions of the existing test case (all replica
counts at 3, ObservedGeneration matching Generation). This will ensure the
conditional assertion at lines 2771-2773 that checks
nodePool.Status.OSImageStream is actually exercised with a meaningful
verification rather than being skipped due to an empty expected value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8ebdba94-e5e5-4bf0-8e2a-0fa112fff63f

📥 Commits

Reviewing files that changed from the base of the PR and between 03a5cef and 8e027da.

⛔ Files ignored due to path filters (5)
  • api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yaml is excluded by !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (16)
  • api/hypershift/v1beta1/nodepool_types.go
  • hypershift-operator/controllers/nodepool/aws.go
  • hypershift-operator/controllers/nodepool/aws_test.go
  • hypershift-operator/controllers/nodepool/capi.go
  • hypershift-operator/controllers/nodepool/capi_test.go
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/config_test.go
  • hypershift-operator/controllers/nodepool/gcp.go
  • hypershift-operator/controllers/nodepool/gcp_test.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/nodepool/nodepool_controller_test.go
  • hypershift-operator/controllers/nodepool/osstream.go
  • hypershift-operator/controllers/nodepool/osstream_test.go
  • hypershift-operator/controllers/nodepool/token.go
  • hypershift-operator/controllers/nodepool/token_test.go
✅ Files skipped from review due to trivial changes (2)
  • hypershift-operator/controllers/nodepool/nodepool_controller_test.go
  • hypershift-operator/controllers/nodepool/aws_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/aws.go
  • hypershift-operator/controllers/nodepool/gcp_test.go
  • hypershift-operator/controllers/nodepool/gcp.go
  • hypershift-operator/controllers/nodepool/capi.go
  • hypershift-operator/controllers/nodepool/config_test.go

Comment on lines +406 to +408
if stream, err := getRHELStream(nodePool, releaseImage); err == nil {
nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently drop getRHELStream errors when updating status.

On Line 406, ignoring the error makes stream-resolution failures invisible and can leave status.osImageStream stale/unset without any reconcile signal.

As per coding guidelines, "Never ignore error returns".

Suggested fix
-		if stream, err := getRHELStream(nodePool, releaseImage); err == nil {
-			nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
-		}
+		stream, err := getRHELStream(nodePool, releaseImage)
+		if err != nil {
+			return ctrl.Result{}, fmt.Errorf("failed to resolve OS image stream: %w", err)
+		}
+		nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{Name: stream}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` around lines
406 - 408, In the nodepool_controller.go file, the call to getRHELStream is
silently ignoring errors by only handling the success case with if err == nil.
Instead of dropping the error, properly handle failure cases by either logging
the error with context and returning it to trigger reconciliation, or updating
the NodePool status to reflect the error condition. This ensures that
stream-resolution failures are visible and the status remains synchronized with
the actual state rather than remaining stale and unset.

Source: Coding guidelines

Comment on lines +358 to 359
tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.rhelStream)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist os-stream on every token-secret reconcile, not only on create.

Line 358 runs only when tokenSecret.Data == nil. Existing token secrets can miss os-stream indefinitely, which weakens the new cross-component secret contract.

As per coding guidelines, "Include unit tests for any code changes and additions, and include e2e tests when changes impact consumer behavior".

Suggested fix
-		tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.rhelStream)
 	}
+	tokenSecret.Data[TokenSecretOSStreamKey] = []byte(t.rhelStream)
 	// TODO (alberto): Only apply this on creation and change the hash generation to only use triggering upgrade fields.
 	// We let this change to happen inplace now as the tokenSecret and the mcs config use the whole spec.Config for the comparing hash.
 	// Otherwise if something which does not trigger a new token generation from spec.Config changes, like .IDP, both hashes would mismatch forever.
 	tokenSecret.Data[TokenSecretHCConfigurationHashKey] = t.globalConfigHash
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/token.go` around lines 358 - 359,
The assignment of TokenSecretOSStreamKey to tokenSecret.Data only executes
within a conditional block that checks if tokenSecret.Data is nil, meaning
existing token secrets never get the os-stream field populated on
reconciliation. Move the line tokenSecret.Data[TokenSecretOSStreamKey] =
[]byte(t.rhelStream) outside of the nil-check conditional so it runs on every
reconciliation, ensuring all token secrets have the os-stream field persisted
consistently. Additionally, add unit tests to verify that the os-stream field is
set during both initial creation and subsequent reconciliations, and add e2e
tests to validate this behavior impacts consumer behavior correctly.

Source: Coding guidelines

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown

I now have all the information needed. Here is the final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

hypershift-operator/controllers/nodepool/stream_test.go:11:6: TestGetRHELStream redeclared in this block
	hypershift-operator/controllers/nodepool/osstream_test.go:16:6: other declaration of TestGetRHELStream
FAIL	github.com/openshift/hypershift/hypershift-operator/controllers/nodepool [build failed]

hypershift-operator/controllers/nodepool/capi.go:856:6: func generateMachineTemplateName is unused (U1000)

Summary

Both jobs fail due to the same root cause: a duplicate Go test function name (TestGetRHELStream). PR #8730 adds a new file osstream_test.go containing TestGetRHELStream, but an identically-named function already exists in stream_test.go (introduced by the recently-merged dependency PR #8669 / CNTRLPLANE-3552). Since both files are in the same Go package (nodepool), Go refuses to compile. The verify job also catches a secondary issue: generateMachineTemplateName in capi.go:856 is unused (a pre-existing condition on main now surfaced by staticcheck).

Root Cause

Error 1 — Duplicate TestGetRHELStream (compilation failure, affects both jobs):

PR #8669 (CNTRLPLANE-3552, merged 2026-06-17T00:06:30Z) added hypershift-operator/controllers/nodepool/stream_test.go containing TestGetRHELStream at line 11. This test uses the old function signature: getRHELStream(explicitStream string, releaseVersion semver.Version, usesRunc bool).

PR #8730 then adds hypershift-operator/controllers/nodepool/osstream_test.go with a new TestGetRHELStream at line 16, testing the refactored function signature: getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage).

The PR author refactored getRHELStream to accept higher-level types (NodePool, ReleaseImage) instead of primitive arguments, and wrote new tests in osstream_test.go — but did not delete the old test file stream_test.go that was added by the dependency PR. Both files are in the same Go package (package nodepool), and Go does not allow two functions with the same name in a package, causing a compile-time error.

Error 2 — Unused generateMachineTemplateName (staticcheck U1000, affects verify job only):

The function generateMachineTemplateName at capi.go:856 is defined in production code but only called from test code (capi_test.go). Staticcheck's U1000 rule ignores test-file usages when determining if a function is used, so it flags this as unused. This is a pre-existing issue on main (the function was already only called from tests before this PR), but it surfaces now because the verify job runs staticcheck on the package that this PR modifies.

Recommendations
  1. Delete stream_test.go: Remove hypershift-operator/controllers/nodepool/stream_test.go entirely. The new osstream_test.go replaces its test coverage with the updated function signature. The old test file references parameters (explicitStream string, releaseVersion semver.Version, usesRunc bool) that no longer match the refactored getRHELStream function.

  2. Fix generateMachineTemplateName unused warning: Either:

    • Move generateMachineTemplateName to the test file (capi_test.go) where it is actually used, or
    • Export it if it's intended as part of the package API, or
    • Remove it if it's no longer needed and update the test to use an alternative
  3. Rebase onto latest main: Since PR CNTRLPLANE-3552: Multi-stream CoreOS metadata parsing and stream resolution #8669 merged recently and introduced stream_test.go, ensure the branch is rebased on current main to have the latest state of the conflicting file.

Evidence
Evidence Detail
Duplicate function TestGetRHELStream defined at stream_test.go:11 AND osstream_test.go:16 in same package nodepool
Old test file origin stream_test.go added by PR #8669 (CNTRLPLANE-3552), merged 2026-06-17T00:06:30Z
New test file origin osstream_test.go added by this PR #8730 (CNTRLPLANE-3553) as a new file
Signature mismatch Old: getRHELStream(explicitStream string, releaseVersion semver.Version, usesRunc bool) — New: getRHELStream(nodePool *hyperv1.NodePool, releaseImage *releaseinfo.ReleaseImage)
Unused function generateMachineTemplateName at capi.go:856 — only caller is in capi_test.go (staticcheck U1000)
Unit test exit FAIL github.com/openshift/hypershift/hypershift-operator/controllers/nodepool [build failed]make test-shard exit code 2
Verify exit make staticcheck exit code 2 at Makefile:561
PR dependency PR #8730 declares PR #8719 as dependency, but stream_test.go was introduced by PR #8669 (a sibling PR in the same feature)

sdminonne and others added 5 commits June 17, 2026 10:24
Add a FeatureGateAwareXValidation CEL rule on NodePoolSpec that prevents
removing the osImageStream field once set, closing the two-step bypass
for optional immutable fields on feature-gated types. Add an envtest
case covering the removal scenario.

Also add OSImageStreamRHEL9 and OSImageStreamRHEL10 constants for
consistent use of stream name strings across the codebase.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Wire the spec.osImageStream and status.osImageStream API fields
(added in CNTRLPLANE-3017) into the NodePool reconciliation loop so
that an explicit stream name is validated, included in the config hash
(triggering rollouts), propagated to the token secret, and reported
in status.

- Add ValidOSImageStream condition and InvalidOSImageStream reason
  constants to the API.
- Create osstream.go with getRHELStream() (derives stream from spec
  or release version) and validOSImageStreamCondition().
- Add rhelStream to rolloutConfig, Hash(), and HashWithoutVersion()
  for rollout triggering (backward compatible when empty).
- Write os-stream key into the token secret for downstream ignition.
- Set status.osImageStream on rollout completion in both Replace
  (MachineDeployment) and InPlace (MachineSet) paths.
- Thread rhelStream parameter through boot image resolution functions
  (defaultNodePoolAMI, defaultNodePoolGCPImage, and their callers)
  in preparation for multi-stream metadata parsing (CNTRLPLANE-3553).

Signed-off-by: Salvatore Dario Minonne <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…us, and error path

- Add "When rhelStream is set, it should change the hash" case to
  TestHash and TestHashWithoutVersion, verifying that changing
  rhelStream changes the hash and that empty rhelStream preserves
  backward-compatible values.
- Add expectedOSImageStream assertion to TestReconcileMachineDeploymentStatus,
  verifying status.osImageStream is set to rhel-9 on rollout completion
  with a 4.x release.
- Add error path case to TestGetRHELStream for unparseable version strings.

Signed-off-by: Salvatore Dario Minonne <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…hancement

Fix getRHELStream() to return "" for unset+<5.0 (legacy single-stream
behavior where no OSImageStream CR is generated), matching the
enhancement spec. For unset+>=5.0 it returns "rhel-10".

Extract defaultRHELStream() for version-based derivation, used by
NewConfigGenerator to normalize the hash: when the user explicitly sets
osImageStream to the version-derived default (e.g. "rhel-9" on 4.x),
the hash stays unchanged and no spurious rollout is triggered. Only a
non-default stream produces a different hash.

Move osImageStream validation from a separate ValidOSImageStream
condition into validMachineConfigCondition, consistent with the
enhancement design where stream validation runs alongside config
validation and has access to the ConfigGenerator context.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The nilerr linter flagged that validateOSImageStream error was checked
but the function returned nil instead of the error. Propagate the error
to match the pattern used by other validation failures in the same
function.

Signed-off-by: Salvatore Dario Minonne <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sdminonne sdminonne force-pushed the CNTRLPLANE-3553-osImageStream-controller branch from 49e64cb to ebde395 Compare June 17, 2026 08:31
@sdminonne

Copy link
Copy Markdown
Contributor Author

I'll rebase this once #8699 will be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/cli Indicates the PR includes changes for CLI area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants