Skip to content

DNM: CNTRLPLANE-3612: Add dual-stream RHEL support in HCP#8734

Open
hypershift-jira-solve-ci[bot] wants to merge 9 commits into
openshift:mainfrom
hypershift-community:fix-CNTRLPLANE-3612
Open

DNM: CNTRLPLANE-3612: Add dual-stream RHEL support in HCP#8734
hypershift-jira-solve-ci[bot] wants to merge 9 commits into
openshift:mainfrom
hypershift-community:fix-CNTRLPLANE-3612

Conversation

@hypershift-jira-solve-ci

@hypershift-jira-solve-ci hypershift-jira-solve-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

What this PR does / why we need it:

Adds dual-stream RHEL (9/10) support to the NodePool and ignition server pipelines so that HCP clusters running OCP 5.0+ can resolve, validate, and propagate the correct OS stream for worker nodes.

Key changes:

  • OS stream resolution (getRHELStream): resolves the RHEL stream from the user's explicit spec.osImageStream.name, the release version, and whether the NodePool uses runc. Defaults to rhel-10 on OCP ≥5.0; falls back to rhel-9 when runc is configured (RHEL 10 drops runc support). Pre-5.0 releases retain legacy behavior (empty stream).
  • Runc detection: ConfigGenerator inspects ContainerRuntimeConfig manifests during config parsing to detect runc usage, informing stream resolution.
  • Token secret plumbing: the resolved osStream is stored in the token secret under a new os-stream data key. TokenSecretReconciler reads it and passes it to the ignition provider via an updated GetPayload() interface.
  • Ignition payload generation: writes a 99_osimagestream.yaml manifest (an OSImageStream CR) into the MCC input directory when osStream is non-empty, enabling MCC bootstrap to discover available OS streams via OCI label inspection.
  • Validation and status: validMachineConfigCondition validates the stream selection early (e.g., rhel-10 on pre-5.0, runc + rhel-10) and surfaces errors as NodePool conditions. OS stream inference from CAPI Machine NodeInfo writes to status.osImageStream for observability.
  • Config hash integration: rhelStream is included in HashWithoutVersion() so stream-only changes trigger config update detection and CAPI reconciliation.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3612

Special notes for your reviewer:

  • The rhelStream field in rolloutConfig uses the spec value (not the resolved value) to prevent fleet-wide rollouts when the default stream changes across releases.
  • The inferStreamFromOSImage() pattern is anchored to require dot-separated version format to reduce false-positive risk.
  • When no explicit stream is set and the release is ≥5.0 with runc, a status condition message surfaces the implicit rhel-9 fallback.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-3612](https://redhat.atlassian.net/browse/CNTRLPLANE-3612)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added dual-stream RHEL OS stream resolution for NodePools (RHEL 9 / RHEL 10), including container-runtime-aware behavior.
    • Token ignition payload generation now carries the resolved OS stream, enabling dual-stream rendering.
    • Enhanced NodePool status by inferring the OS image stream from existing machine nodes.
  • Bug Fixes

    • Improved validation and rollout behavior so OS stream changes correctly trigger rollouts and misconfigurations are rejected with clear conditions.
  • Documentation

    • Added a guide covering dual-stream RHEL OS provisioning, rules, and troubleshooting.

OpenShift CI Bot and others added 6 commits June 15, 2026 09:03
…stream support

Implements the core getRHELStream() function that resolves the RHEL OS stream
based on the user's explicit spec.osImageStream.name selection, the release
version, and whether the NodePool configuration uses runc.

Resolution rules:
- Explicit rhel-10 on pre-5.0 releases returns an error
- Explicit rhel-10 with runc config returns an error (RHEL 10 drops runc)
- No explicit stream + pre-5.0 returns empty (legacy behavior)
- No explicit stream + >=5.0 defaults to rhel-10
- No explicit stream + >=5.0 + runc falls back to rhel-9

Also adds usesRunc detection to ConfigGenerator by inspecting ContainerRuntimeConfig
manifests during config parsing, and adds rhelStream to rolloutConfig for config
hash computation (populated from spec only, not resolved value, to prevent fleet-wide
rollouts when the default stream changes across releases).

CNTRLPLANE-3612

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…vider

Adds osStream field to Token struct, resolved via getRHELStream() during
NewToken(). The resolved stream is stored in the token secret under a new
"os-stream" data key so the ignition server can use it during payload generation.

Updates the IgnitionProvider.GetPayload() interface to accept an osStream
parameter. The TokenSecretReconciler reads the os-stream key from the token
secret and passes it through to the ignition provider.

This is the plumbing layer that connects the NodePool controller's stream
resolution to the ignition server's payload generation pipeline.

CNTRLPLANE-3612

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Writes a 99_osimagestream.yaml manifest into the MCC input directory when
the osStream parameter is non-empty. This manifest contains an OSImageStream
CR with the resolved defaultStream, which MCC bootstrap uses to discover
available OS streams via OCI label inspection and render MachineConfigs
with the correct osImageURL.

The CR is written at the end of runMCO (after copyMCOOutputToMCC) so it is
available when runMCC processes the MCC input directory. When osStream is
empty (pre-5.0 legacy behavior), no CR is written.

CNTRLPLANE-3612

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Extends validMachineConfigCondition to validate the OS stream selection
before proceeding with token creation. This catches invalid configurations
early (e.g., rhel-10 on pre-5.0 releases, runc + rhel-10) and surfaces them
as NodePool conditions.

Adds OS stream inference from CAPI Machine NodeInfo: RHCOS version strings
starting with "4" indicate RHEL 9, while versions starting with "5" indicate
RHEL 10. The inferred stream is written to status.osImageStream to give
users visibility into which RHEL version their nodes are actually running.

CNTRLPLANE-3612

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Covers:
- getRHELStream(): all resolution paths including defaults, explicit
  selections, runc fallback, and error cases
- inferStreamFromOSImage(): RHCOS version string parsing for RHEL 9/10
- inferOSStreamFromMachines(): aggregation across multiple CAPI Machines
- writeOSImageStreamCR(): OSImageStream manifest generation
- fakeIgnitionProvider: updated to match new GetPayload signature

CNTRLPLANE-3612

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add rhelStream to HashWithoutVersion() so pure OS stream changes
  trigger config update detection and CAPI reconciliation
- Fix %q to %s in writeOSImageStreamCR() to produce idiomatic YAML
- Deduplicate TokenSecretOSStreamKey constant (define in ignserver only)
- Make error handling consistent in validMachineConfigCondition (both
  version parse and stream validation now return Result{}, nil)
- Surface implicit rhel-9 fallback as a status condition message
- Anchor inferStreamFromOSImage() pattern to require dot-separated
  version format, reducing false-positive risk
- Fix tie-break doc comment to say "plurality" instead of "majority"
- Fix inconsistent NewGomegaWithT usage in ignition-server tests
- Add tests: rhelStream in Hash/HashWithoutVersion, usesRunc detection,
  TokenSecretOSStreamKey in token secret, empty-string no-op for
  writeOSImageStreamCR, mixed-stream tie-break for inferOSStreamFromMachines

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 15, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 15, 2026

Copy link
Copy Markdown

@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-3612 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds dual-stream RHEL (9/10) support to the NodePool and ignition server pipelines so that HCP clusters running OCP 5.0+ can resolve, validate, and propagate the correct OS stream for worker nodes.

Key changes:

  • OS stream resolution (getRHELStream): resolves the RHEL stream from the user's explicit spec.osImageStream.name, the release version, and whether the NodePool uses runc. Defaults to rhel-10 on OCP ≥5.0; falls back to rhel-9 when runc is configured (RHEL 10 drops runc support). Pre-5.0 releases retain legacy behavior (empty stream).
  • Runc detection: ConfigGenerator inspects ContainerRuntimeConfig manifests during config parsing to detect runc usage, informing stream resolution.
  • Token secret plumbing: the resolved osStream is stored in the token secret under a new os-stream data key. TokenSecretReconciler reads it and passes it to the ignition provider via an updated GetPayload() interface.
  • Ignition payload generation: writes a 99_osimagestream.yaml manifest (an OSImageStream CR) into the MCC input directory when osStream is non-empty, enabling MCC bootstrap to discover available OS streams via OCI label inspection.
  • Validation and status: validMachineConfigCondition validates the stream selection early (e.g., rhel-10 on pre-5.0, runc + rhel-10) and surfaces errors as NodePool conditions. OS stream inference from CAPI Machine NodeInfo writes to status.osImageStream for observability.
  • Config hash integration: rhelStream is included in HashWithoutVersion() so stream-only changes trigger config update detection and CAPI reconciliation.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3612

Special notes for your reviewer:

  • The rhelStream field in rolloutConfig uses the spec value (not the resolved value) to prevent fleet-wide rollouts when the default stream changes across releases.
  • The inferStreamFromOSImage() pattern is anchored to require dot-separated version format to reduce false-positive risk.
  • When no explicit stream is set and the release is ≥5.0 with runc, a status condition message surfaces the implicit rhel-9 fallback.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-3612](https://redhat.atlassian.net/browse/CNTRLPLANE-3612)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5a924ab8-54a4-4ecc-bde4-2eeb891b4d71

📥 Commits

Reviewing files that changed from the base of the PR and between 93f8b05 and 677760c.

⛔ Files ignored due to path filters (1)
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
📒 Files selected for processing (3)
  • docs/content/how-to/automated-machine-management/dual-stream-rhel.md
  • docs/mkdocs.yml
  • hypershift-operator/controllers/nodepool/token_test.go
✅ Files skipped from review due to trivial changes (1)
  • docs/mkdocs.yml

📝 Walkthrough

Walkthrough

This PR implements dual-stream RHEL OS support (RHEL 9/10) for HyperShift NodePools. A new getRHELStream function resolves the appropriate RHEL stream from the NodePool spec, OCP release semver version, and whether the node pool's container runtime is runc (which is incompatible with RHEL 10). The resolved stream is incorporated into rollout hashing, validated during NodePool condition checks, stored in the token Secret under a new TokenSecretOSStreamKey, and passed through to the ignition server where it triggers generation of an OSImageStream CR (99_osimagestream.yaml) for MCC to consume. Additionally, setNodesInfoStatus now infers and sets nodePool.Status.OSImageStream by parsing running machines' OSImage strings. Supporting documentation describes the feature, resolution matrix, transition rules, and common upgrade scenarios.

Suggested reviewers

  • muraee
  • Nirshal
🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Assertion messages are missing from most Ginkgo Expect statements (e.g., g.Expect(err).ToNot(HaveOccurred()) lacks diagnostic context), making failures harder to diagnose. Add meaningful failure messages to Expect assertions: use Expect(...).To(..., "message") or Expect(...).ToNot(..., "message") to clarify what went wrong, especially for non-error assertions.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'DNM: CNTRLPLANE-3612: Add dual-stream RHEL support in HCP' is directly related to the core changeset, which implements dual-stream RHEL (9/10) support. However, the 'DNM' prefix indicates Do-Not-Merge status, which is a meta-concern rather than a clarity issue about the actual changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR contains only standard Go table-driven unit tests (using t.Run() and test case structs), not Ginkgo tests. No Ginkgo imports or It()/Describe()/Context()/When() declarations found. All test...
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only controller logic and configuration generation, introducing no pod scheduling constraints, topology assumptions, or operator deployments affecting SNO/TNF/TNA/HyperShift topologies.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new tests are standard Go unit tests (TestGetRHELStream, TestUsesRuncDetection, TestInferStreamFromOSImage, TestInferOSStreamFromMachines, TestWriteOSI...
No-Weak-Crypto ✅ Passed No weak crypto detected: no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB usage, no custom crypto implementations, no non-constant-time secret comparisons. Hashing uses FNV-1a algorithm; tokens use safe map l...
Container-Privileges ✅ Passed PR introduces no Kubernetes manifests with privileged settings (privileged: true, hostPID/Network/IPC, SYS_ADMIN, allowPrivilegeEscalation). Changes are OS stream resolution logic and non-privilege...
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data (passwords, tokens, API keys, PII, session IDs, hostnames, or customer data) is exposed in logs. The osStream parameter (e.g., "rhel-9", "rhel-10") is not sensitive and is never l...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from Nirshal and cblecker June 15, 2026 09:39
@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Jun 15, 2026
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hypershift-jira-solve-ci[bot]
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jun 15, 2026
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 58.82353% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.69%. Comparing base (712ba58) to head (677760c).

Files with missing lines Patch % Lines
...rshift-operator/controllers/nodepool/conditions.go 0.00% 43 Missing ⚠️
...ypershift-operator/controllers/nodepool/version.go 80.39% 7 Missing and 3 partials ⚠️
...ition-server/controllers/local_ignitionprovider.go 66.66% 6 Missing and 1 partial ⚠️
hypershift-operator/controllers/nodepool/token.go 60.00% 4 Missing and 2 partials ⚠️
hypershift-operator/controllers/nodepool/config.go 72.72% 2 Missing and 1 partial ⚠️
ignition-server/cmd/run_local_ignitionprovider.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8734      +/-   ##
==========================================
+ Coverage   41.66%   41.69%   +0.03%     
==========================================
  Files         758      759       +1     
  Lines       93929    94083     +154     
==========================================
+ Hits        39135    39232      +97     
- Misses      52046    52096      +50     
- Partials     2748     2755       +7     
Files with missing lines Coverage Δ
...shift-operator/controllers/nodepool/rhel_stream.go 100.00% <100.00%> (ø)
...ition-server/controllers/tokensecret_controller.go 59.18% <100.00%> (+0.16%) ⬆️
ignition-server/cmd/run_local_ignitionprovider.go 0.00% <0.00%> (ø)
hypershift-operator/controllers/nodepool/config.go 84.78% <72.72%> (-0.74%) ⬇️
hypershift-operator/controllers/nodepool/token.go 81.51% <60.00%> (-1.03%) ⬇️
...ition-server/controllers/local_ignitionprovider.go 39.17% <66.66%> (+1.02%) ⬆️
...ypershift-operator/controllers/nodepool/version.go 88.39% <80.39%> (-6.69%) ⬇️
...rshift-operator/controllers/nodepool/conditions.go 51.70% <0.00%> (-2.23%) ⬇️
Flag Coverage Δ
cmd-support 34.96% <ø> (ø)
cpo-hostedcontrolplane 44.00% <ø> (ø)
cpo-other 43.45% <ø> (ø)
hypershift-operator 51.70% <57.53%> (+0.04%) ⬆️
other 31.65% <66.66%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
hypershift-operator/controllers/nodepool/version_test.go (2)

329-365: 💤 Low value

Use Gomega for test assertions to maintain consistency.

This test uses t.Errorf for assertions, while the rest of the file uses Gomega (g.Expect). For consistency and better error messages, consider switching to Gomega:

♻️ Suggested refactor
 func TestInferStreamFromOSImage(t *testing.T) {
+	g := NewWithT(t)
+
 	tests := []struct {
 		name    string
 		osImage string
 		want    string
 	}{
 		// ... test cases ...
 	}
 
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			got := inferStreamFromOSImage(tt.osImage)
-			if got != tt.want {
-				t.Errorf("inferStreamFromOSImage(%q) = %v, want %v", tt.osImage, got, tt.want)
-			}
+			g.Expect(got).To(Equal(tt.want))
 		})
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/version_test.go` around lines 329 -
365, The TestInferStreamFromOSImage function uses manual t.Errorf assertions
instead of Gomega for consistency with the rest of the test file. Replace the if
statement that checks if got != tt.want with a Gomega g.Expect assertion to
match the testing style used elsewhere in the file. The assertion should compare
the result of inferStreamFromOSImage(tt.osImage) against tt.want using Gomega's
assertion methods.

367-427: 💤 Low value

Use Gomega for test assertions to maintain consistency.

This test uses t.Errorf for assertions, while the rest of the file uses Gomega (g.Expect). For consistency and better error messages, consider switching to Gomega:

♻️ Suggested refactor
 func TestInferOSStreamFromMachines(t *testing.T) {
+	g := NewWithT(t)
+
 	tests := []struct {
 		name     string
 		machines []*v1beta1.Machine
 		want     string
 	}{
 		// ... test cases ...
 	}
 
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			got := inferOSStreamFromMachines(tt.machines)
-			if got != tt.want {
-				t.Errorf("inferOSStreamFromMachines() = %v, want %v", got, tt.want)
-			}
+			g.Expect(got).To(Equal(tt.want))
 		})
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/version_test.go` around lines 367 -
427, The TestInferOSStreamFromMachines function currently uses t.Errorf for
assertions instead of Gomeka assertions like the rest of the file. Replace the
t.Errorf assertion inside the test's inner function with a Gomega assertion
using g.Expect and To(Equal()) pattern to maintain consistency with the rest of
the test file and provide better error messages when assertions fail.
ignition-server/controllers/local_ignitionprovider.go (1)

797-819: ⚡ Quick win

Consider validating osStream before YAML interpolation.

The osStream value is directly interpolated into the YAML string without validation. While in practice it's constrained to constants like "rhel-9" or "rhel-10" from upstream getRHELStream(), adding defensive validation would make the code more robust against future changes:

🛡️ Suggested validation
 func writeOSImageStreamCR(mccDir, osStream string) error {
 	if osStream == "" {
 		return nil
 	}
+	// Validate osStream to prevent YAML injection
+	if osStream != "rhel-9" && osStream != "rhel-10" {
+		return fmt.Errorf("invalid osStream value: %q (expected rhel-9 or rhel-10)", osStream)
+	}
+
 	osImageStreamCR := fmt.Sprintf(`apiVersion: machineconfiguration.openshift.io/v1alpha1
 kind: OSImageStream
 metadata:
   name: cluster
 spec:
   defaultStream: %s
 `, osStream)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ignition-server/controllers/local_ignitionprovider.go` around lines 797 -
819, The writeOSImageStreamCR function directly interpolates the osStream
parameter into YAML without validating its contents, which could be vulnerable
if the function is called with unexpected values in the future. Add defensive
validation before the fmt.Sprintf call to ensure osStream contains only expected
characters (such as alphanumeric characters, hyphens, and underscores). If
validation fails, return an error with a descriptive message indicating that the
osStream value is invalid. This makes the code more robust against future
changes while maintaining the current behavior for valid inputs like "rhel-9"
and "rhel-10".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/nodepool/token.go`:
- Around line 120-129: Add a nil check for configGenerator.releaseImage before
calling the Version() method on it in the NewToken function. If
configGenerator.releaseImage is nil, return an error with a descriptive message
(similar to the existing error handling pattern used for parseErr) rather than
allowing a panic to occur when dereferencing the nil pointer.

In `@hypershift-operator/controllers/nodepool/version.go`:
- Around line 102-107: The OSImageStream status field is only updated when
inferOSStreamFromMachines returns a non-empty stream, causing stale data to
persist when the function returns an empty string. Remove the conditional check
and always set nodePool.Status.OSImageStream to ensure the status accurately
reflects the current inferred stream state, whether it contains a valid stream
name or is empty.

---

Nitpick comments:
In `@hypershift-operator/controllers/nodepool/version_test.go`:
- Around line 329-365: The TestInferStreamFromOSImage function uses manual
t.Errorf assertions instead of Gomega for consistency with the rest of the test
file. Replace the if statement that checks if got != tt.want with a Gomega
g.Expect assertion to match the testing style used elsewhere in the file. The
assertion should compare the result of inferStreamFromOSImage(tt.osImage)
against tt.want using Gomega's assertion methods.
- Around line 367-427: The TestInferOSStreamFromMachines function currently uses
t.Errorf for assertions instead of Gomeka assertions like the rest of the file.
Replace the t.Errorf assertion inside the test's inner function with a Gomega
assertion using g.Expect and To(Equal()) pattern to maintain consistency with
the rest of the test file and provide better error messages when assertions
fail.

In `@ignition-server/controllers/local_ignitionprovider.go`:
- Around line 797-819: The writeOSImageStreamCR function directly interpolates
the osStream parameter into YAML without validating its contents, which could be
vulnerable if the function is called with unexpected values in the future. Add
defensive validation before the fmt.Sprintf call to ensure osStream contains
only expected characters (such as alphanumeric characters, hyphens, and
underscores). If validation fails, return an error with a descriptive message
indicating that the osStream value is invalid. This makes the code more robust
against future changes while maintaining the current behavior for valid inputs
like "rhel-9" and "rhel-10".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 37db3fde-e3f9-426c-94f0-818e787d8440

📥 Commits

Reviewing files that changed from the base of the PR and between 712ba58 and 93f8b05.

📒 Files selected for processing (14)
  • hypershift-operator/controllers/nodepool/conditions.go
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/config_test.go
  • hypershift-operator/controllers/nodepool/rhel_stream.go
  • hypershift-operator/controllers/nodepool/rhel_stream_test.go
  • hypershift-operator/controllers/nodepool/token.go
  • hypershift-operator/controllers/nodepool/token_test.go
  • hypershift-operator/controllers/nodepool/version.go
  • hypershift-operator/controllers/nodepool/version_test.go
  • ignition-server/cmd/run_local_ignitionprovider.go
  • ignition-server/controllers/local_ignitionprovider.go
  • ignition-server/controllers/local_ignitionprovider_test.go
  • ignition-server/controllers/tokensecret_controller.go
  • ignition-server/controllers/tokensecret_controller_test.go

Comment on lines +120 to +129
// Resolve OS stream based on spec, release version, and runc usage.
var osStream string
releaseVersion, parseErr := semver.Parse(configGenerator.releaseImage.Version())
if parseErr != nil {
return nil, fmt.Errorf("failed to parse release version for OS stream resolution: %w", parseErr)
}
osStream, err = getRHELStream(configGenerator.nodePool.Spec.OSImageStream.Name, releaseVersion, configGenerator.usesRunc)
if err != nil {
return nil, fmt.Errorf("failed to resolve OS stream: %w", err)
}

@coderabbitai coderabbitai Bot Jun 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard NewToken against nil embedded rollout data before resolving OS stream.

NewToken dereferences configGenerator.releaseImage.Version() unconditionally. Because ConfigGenerator embeds *rolloutConfig, a nil rolloutConfig (or nil releaseImage) will panic instead of returning a validation error.

🐛 Proposed fix
 func NewToken(ctx context.Context, configGenerator *ConfigGenerator, cpoCapabilities *CPOCapabilities) (*Token, error) {
 	if configGenerator == nil {
 		return nil, fmt.Errorf("configGenerator can't be nil")
 	}
+	if configGenerator.rolloutConfig == nil || configGenerator.releaseImage == nil {
+		return nil, fmt.Errorf("configGenerator release image can't be nil")
+	}
+	if configGenerator.nodePool == nil {
+		return nil, fmt.Errorf("configGenerator nodePool can't be nil")
+	}
 
 	if cpoCapabilities == nil {
 		return nil, fmt.Errorf("cpoCapabilities can't be nil")
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Resolve OS stream based on spec, release version, and runc usage.
var osStream string
releaseVersion, parseErr := semver.Parse(configGenerator.releaseImage.Version())
if parseErr != nil {
return nil, fmt.Errorf("failed to parse release version for OS stream resolution: %w", parseErr)
}
osStream, err = getRHELStream(configGenerator.nodePool.Spec.OSImageStream.Name, releaseVersion, configGenerator.usesRunc)
if err != nil {
return nil, fmt.Errorf("failed to resolve OS stream: %w", err)
}
func NewToken(ctx context.Context, configGenerator *ConfigGenerator, cpoCapabilities *CPOCapabilities) (*Token, error) {
if configGenerator == nil {
return nil, fmt.Errorf("configGenerator can't be nil")
}
if configGenerator.rolloutConfig == nil || configGenerator.releaseImage == nil {
return nil, fmt.Errorf("configGenerator release image can't be nil")
}
if configGenerator.nodePool == nil {
return nil, fmt.Errorf("configGenerator nodePool can't be nil")
}
if cpoCapabilities == nil {
return nil, fmt.Errorf("cpoCapabilities can't be nil")
}
// Resolve OS stream based on spec, release version, and runc usage.
var osStream string
releaseVersion, parseErr := semver.Parse(configGenerator.releaseImage.Version())
if parseErr != nil {
return nil, fmt.Errorf("failed to parse release version for OS stream resolution: %w", parseErr)
}
osStream, err = getRHELStream(configGenerator.nodePool.Spec.OSImageStream.Name, releaseVersion, configGenerator.usesRunc)
if err != nil {
return nil, fmt.Errorf("failed to resolve OS stream: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/token.go` around lines 120 - 129,
Add a nil check for configGenerator.releaseImage before calling the Version()
method on it in the NewToken function. If configGenerator.releaseImage is nil,
return an error with a descriptive message (similar to the existing error
handling pattern used for parseErr) rather than allowing a panic to occur when
dereferencing the nil pointer.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern is valid in principle, but these nil cases are already guarded by NewConfigGenerator — the only constructor for ConfigGenerator:

  • releaseImage != nil is validated at config.go:80-82
  • rolloutConfig is always initialized as part of the struct (it embeds releaseImage), so it cannot be nil when NewConfigGenerator succeeds
  • nodePool is a required constructor parameter and all three call sites (nodepool_controller.go:371, nodepool_controller.go:465, secret_janitor.go:122) provide a non-nil value obtained from the reconcile loop

Adding redundant nil checks in NewToken would duplicate existing validation without increasing safety, since NewToken is not called with a manually-constructed ConfigGenerator outside of tests (and tests already set these fields). Skipping this change to keep the validation in one place.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped: comment is from another GitHub bot.

Comment on lines +102 to +107
// Infer OS stream from node OSImage and set status.osImageStream.
if stream := inferOSStreamFromMachines(machines); stream != "" {
nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{
Name: stream,
}
}

@coderabbitai coderabbitai Bot Jun 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Status field is not cleared when OS stream inference returns empty, potentially leaving stale data.

When inferOSStreamFromMachines returns an empty string (e.g., no machines with NodeInfo, or all machines scaled to zero), the code skips updating nodePool.Status.OSImageStream. This leaves the previous value in place even though it no longer reflects the current machine state.

Consider always setting the field to ensure status accurately reflects the current inferred stream (or lack thereof):

-	// Infer OS stream from node OSImage and set status.osImageStream.
-	if stream := inferOSStreamFromMachines(machines); stream != "" {
-		nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{
-			Name: stream,
-		}
-	}
+	// Infer OS stream from node OSImage and set status.osImageStream.
+	stream := inferOSStreamFromMachines(machines)
+	nodePool.Status.OSImageStream = hyperv1.OSImageStreamReference{
+		Name: stream,
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/version.go` around lines 102 - 107,
The OSImageStream status field is only updated when inferOSStreamFromMachines
returns a non-empty stream, causing stale data to persist when the function
returns an empty string. Remove the conditional check and always set
nodePool.Status.OSImageStream to ensure the status accurately reflects the
current inferred stream state, whether it contains a valid stream name or is
empty.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is intentional. Preserving the last-known OS stream during transient states is the correct design choice here:

  1. During initial provisioning: Before machines register NodeInfo, inferOSStreamFromMachines returns empty. Clearing the status would remove useful observability data that was already set from the spec-driven resolution path.
  2. During scale-to-zero: When all machines are scaled down, there's no NodeInfo to inspect. Retaining the last observed stream is more useful for operators than showing an empty field.
  3. The status field documents what was observed: The status.osImageStream comment says "reports the OS stream observed on the nodes" — when no observation is possible, preserving the last known value is consistent with how other Kubernetes status fields behave (e.g., status.version isn't cleared during rollouts).

An empty inferOSStreamFromMachines result means "no data available," not "the stream changed to nothing." Unconditionally setting it would introduce misleading status transitions.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped: comment is from another GitHub bot.

@enxebre enxebre changed the title CNTRLPLANE-3612: Add dual-stream RHEL support in HCP DNM: CNTRLPLANE-3612: Add dual-stream RHEL support in HCP Jun 15, 2026
@enxebre

enxebre commented Jun 15, 2026

Copy link
Copy Markdown
Member

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 15, 2026
@enxebre

enxebre commented Jun 15, 2026

Copy link
Copy Markdown
Member

add a new section in the automated-machine-management docs capturing the UX for the different dualstream permutations captured in the enhacement

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown
Author

Confirmed. The base branch (before this PR) did not access configGenerator.releaseImage in NewToken(). The PR inserts new code at lines 120–128 that accesses configGenerator.releaseImage.Version() — but the test's ConfigGenerator does not set the embedded *rolloutConfig, so releaseImage is nil, causing the SIGSEGV.

Now I have everything needed for the report.

Test Failure Analysis Complete

Job Information

  • Prow Job: test / Unit Tests (hypershift-operator)
  • Build ID: 27537405518 / job 81395274701
  • PR: #8734 — DNM: CNTRLPLANE-3612: Add dual-stream RHEL support in HCP
  • Test: TestNewToken/when_all_input_is_given_it_should_create_token_successfully
  • Package: github.com/openshift/hypershift/hypershift-operator/controllers/nodepool

Test Failure Analysis

Error

--- FAIL: TestNewToken (0.02s)
    --- FAIL: TestNewToken/when_all_input_is_given_it_should_create_token_successfully (0.02s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x266e1a8]

goroutine 454 [running]:
github.com/openshift/hypershift/hypershift-operator/controllers/nodepool.NewToken({0x346ab58, 0xc000c86140}, 0xc0011c2600, 0xc000dcf30e)
	/home/runner/_work/hypershift/hypershift/hypershift-operator/controllers/nodepool/token.go:122 +0x378

Summary

The PR adds OS stream resolution logic to NewToken() in token.go that calls configGenerator.releaseImage.Version() at line 122. However, the existing test TestNewToken/when_all_input_is_given_it_should_create_token_successfully constructs a ConfigGenerator struct without setting the embedded *rolloutConfig field (which contains releaseImage). Since rolloutConfig is an embedded pointer (*rolloutConfig) that defaults to nil, accessing configGenerator.releaseImage dereferences the nil pointer, causing a SIGSEGV panic.

Root Cause

The PR inserts new code at token.go:120-128 inside NewToken():

// Resolve OS stream based on spec, release version, and runc usage.
var osStream string
releaseVersion, parseErr := semver.Parse(configGenerator.releaseImage.Version())  // LINE 122

The ConfigGenerator struct embeds *rolloutConfig as a pointer:

type ConfigGenerator struct {
    client.Client
    hostedCluster         *hyperv1.HostedCluster
    nodePool              *hyperv1.NodePool
    controlplaneNamespace string
    usesRunc              bool
    *rolloutConfig  // <-- embedded pointer, nil by default
}

And rolloutConfig contains releaseImage:

type rolloutConfig struct {
    releaseImage              *releaseinfo.ReleaseImage
    // ... other fields
}

The test at token_test.go:77 constructs the ConfigGenerator directly without setting rolloutConfig:

configGenerator: &ConfigGenerator{
    hostedCluster: &hyperv1.HostedCluster{...},
    nodePool:      &hyperv1.NodePool{},
    controlplaneNamespace: controlplaneNamespace,
},

Before this PR, NewToken() never accessed configGenerator.releaseImage, so the nil rolloutConfig was harmless. The PR adds configGenerator.releaseImage.Version() which dereferences the nil embedded *rolloutConfig, triggering SIGSEGV.

The fix requires updating the test to include a rolloutConfig with a valid releaseImage, for example:

configGenerator: &ConfigGenerator{
    hostedCluster: &hyperv1.HostedCluster{...},
    nodePool:      &hyperv1.NodePool{},
    controlplaneNamespace: controlplaneNamespace,
    rolloutConfig: &rolloutConfig{
        releaseImage: &releaseinfo.ReleaseImage{
            ImageStream: &imageapi.ImageStream{
                ObjectMeta: metav1.ObjectMeta{Name: "4.17.0"},
            },
        },
    },
},
Recommendations
  1. Fix the test: Add a rolloutConfig with a valid releaseImage to the "when all input is given it should create token successfully" test case in token_test.go:77. The releaseImage must have an ImageStream with a valid semver Name (e.g., "4.17.0" or "5.0.0") since NewToken() now calls semver.Parse(configGenerator.releaseImage.Version()).

  2. Add a nil-guard: Consider adding a nil check for configGenerator.rolloutConfig or configGenerator.releaseImage at the start of NewToken() (alongside the existing configGenerator == nil check) to produce a clear error message instead of a panic.

  3. Audit other tests: Check whether other TestNewToken sub-tests that construct ConfigGenerator without rolloutConfig also need updating — the error-path tests (missing pull secret, missing trust bundle, etc.) may hit the nil dereference before reaching their expected error if the new OS stream code runs first. Review test execution order to confirm.

Evidence
Evidence Detail
Panic location token.go:122configGenerator.releaseImage.Version()
Signal SIGSEGV: segmentation violation code=0x1 addr=0x0 (nil pointer dereference)
Failing test TestNewToken/when_all_input_is_given_it_should_create_token_successfully (token_test.go:77)
Root cause Test constructs ConfigGenerator without embedded *rolloutConfig, leaving releaseImage nil
PR change Lines added at token.go:120-128 access configGenerator.releaseImage.Version() for OS stream resolution
Base branch behavior Pre-PR NewToken() never accessed configGenerator.releaseImage — nil rolloutConfig was safe
Package github.com/openshift/hypershift/hypershift-operator/controllers/nodepool (exit code 1, FAIL)
All other packages Passed (34+ packages OK)

Document the UX for all dual-stream RHEL permutations including the OS
stream resolution matrix, stream transitions, runc compatibility,
observability via status fields and conditions, and common upgrade
scenarios.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@hypershift-jira-solve-ci

Copy link
Copy Markdown
Author

Re: #8734 (comment)

Done. Added docs/content/how-to/automated-machine-management/dual-stream-rhel.md documenting the UX for all dual-stream permutations:

  • OS stream resolution matrix — every combination of spec.osImageStream.name, release version, and runc usage with the resolved outcome
  • Stream transitions — allowed forward upgrades (rhel-9 → rhel-10) and blocked downgrades (CEL validation)
  • runc compatibility — error vs. automatic fallback behavior
  • Observabilitystatus.osImageStream inference and ValidMachineConfig condition messages
  • Common scenarios — OCP 4.x→5.0 upgrade, mixed OS pools, and RHEL 9→10 migration steps

Also added the page to the mkdocs nav under "Automated Machine Management".


AI-assisted response via Claude Code

OpenShift CI Bot and others added 2 commits June 15, 2026 10:31
…stTokenReconcile

The dual-stream RHEL support added OS stream resolution to NewToken()
which requires configGenerator.releaseImage to be set. Three
TestNewToken cases (success, missing ignition endpoint, and missing CA
cert) reached the new code path without a rolloutConfig, causing a nil
pointer dereference on configGenerator.releaseImage.Version().

Additionally, TestTokenReconcile used ImageStream Name "4.17" which is
not valid semver — semver.Parse requires Major.Minor.Patch format.

Fix by adding rolloutConfig with a valid releaseImage to the affected
TestNewToken cases and updating TestTokenReconcile to use "4.17.0".

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@openshift-ci openshift-ci Bot added the area/documentation Indicates the PR includes changes for documentation label Jun 15, 2026
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

@hypershift-jira-solve-ci[bot]: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@cblecker

Copy link
Copy Markdown
Member

/uncc

@openshift-ci openshift-ci Bot removed the request for review from cblecker June 16, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants