CNTRLPLANE-3552: Multi-stream CoreOS metadata parsing and stream resolution#8669
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@jparrill: This pull request references CNTRLPLANE-3552 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces multi-stream RHEL CoreOS image metadata support to the hypershift operator. It adds constants and resolution logic for rhel-9 and rhel-10 streams ( Sequence DiagramsequenceDiagram
participant NodePool as NodePool Controller
participant GetRHELStream as GetRHELStream
participant RegistryClient as RegistryClientProvider
participant Deserialize as DeserializeImageMetadata
participant ReleaseImage as ReleaseImage
participant StreamForName as StreamForName
participant ImageResolver as Image Resolver (kubevirt/openstack/powervs)
NodePool->>GetRHELStream: explicitStream, releaseVersion, usesRunc
GetRHELStream-->>NodePool: resolved stream name (or error)
RegistryClient->>Deserialize: ConfigMap data bytes
Deserialize-->>RegistryClient: (singleStreamMeta, multiStreamMap, error)
RegistryClient->>ReleaseImage: populate StreamMetadata + OSStreams
NodePool->>StreamForName: streamName
StreamForName->>ReleaseImage: lookup in StreamMetadata or OSStreams
StreamForName-->>NodePool: CoreOSStreamMetadata (or error)
NodePool->>ImageResolver: resolved metadata
ImageResolver->>ImageResolver: validate StreamMetadata not nil
ImageResolver-->>NodePool: image artifact (AMI, GCP image, etc.)
Suggested reviewers
🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@support/releaseinfo/deserialize.go`:
- Around line 41-48: The code currently initializes a zero-value
CoreOSStreamMetadata and returns its address even when the "stream" key is
absent; update the logic in the json unmarshal block and final return so that
when hasStreamData is false the function returns (nil, osStreams, nil) instead
of &coreOSMeta. Concretely, only allocate or populate coreOSMeta when
hasStreamData and json.Unmarshal succeeds (inside the if hasStreamData { ... }
block using json.Unmarshal into a local variable), and change the final return
to return the pointer to that populated struct or nil when hasStreamData was
false (i.e., return nil, osStreams, nil).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 39f31f55-c3e5-4714-a09f-53a65264d50e
📒 Files selected for processing (11)
hypershift-operator/controllers/nodepool/stream.gohypershift-operator/controllers/nodepool/stream_test.gosupport/releaseinfo/deserialize.gosupport/releaseinfo/deserialize_test.gosupport/releaseinfo/fixtures/5.0-installer-coreos-bootimages.yamlsupport/releaseinfo/fixtures/fixtures.gosupport/releaseinfo/registry_mirror_provider.gosupport/releaseinfo/registryclient_provider.gosupport/releaseinfo/releaseinfo.gosupport/releaseinfo/releaseinfo_test.gotest/integration/osstreams/osstreams_test.go
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8669 +/- ##
==========================================
+ Coverage 41.67% 41.79% +0.11%
==========================================
Files 758 759 +1
Lines 93945 94037 +92
==========================================
+ Hits 39155 39304 +149
+ Misses 52043 51983 -60
- Partials 2747 2750 +3
... and 5 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
6651f3c to
3121c68
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
support/releaseinfo/deserialize_test.go (2)
20-26: ⚡ Quick winMark test helper as a helper for cleaner failure locations.
testConfigMapis a helper but does not mark itself witht.Helper(), which makes failures point to the helper body instead of the calling test.Suggested change
-func testConfigMap(dataFields map[string]string) []byte { +func testConfigMap(t *testing.T, dataFields map[string]string) []byte { + t.Helper() cm := "apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: test\ndata:\n" for k, v := range dataFields { cm += fmt.Sprintf(" %s: %s\n", k, v) } return []byte(cm) }// Update call sites in this file: data: testConfigMap(t, map[string]string{ ... })As per coding guidelines, "Use
t.Helper()in Go helper functions to improve error tracebacks".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@support/releaseinfo/deserialize_test.go` around lines 20 - 26, testConfigMap is a test helper but doesn't call t.Helper() or accept *testing.T, so failures point into the helper; change the signature of testConfigMap to accept t *testing.T (e.g., testConfigMap(t *testing.T, dataFields map[string]string)), call t.Helper() at the top of that function, and update all call sites in this file (places that call testConfigMap(...)) to pass the test variable (e.g., testConfigMap(t, map[string]string{...})).
28-123: ⚡ Quick winRun these unit tests in parallel.
Both test functions and their subtests can run independently; adding
t.Parallel()aligns with repository test guidance and reduces suite runtime.Suggested change
func TestDeserializeImageMetadata(t *testing.T) { + t.Parallel() tests := []struct { ... } for _, tt := range tests { + tt := tt t.Run(tt.name, func(t *testing.T) { + t.Parallel() g := NewWithT(t) ... }) } } func TestDeserializeImageMetadataMultiStreamContent(t *testing.T) { + t.Parallel() defaultStream, osStreams, err := DeserializeImageMetadata(fixtures.CoreOSBootImagesYAML_5_0) ... for _, tt := range tests { + tt := tt t.Run(tt.name, func(t *testing.T) { + t.Parallel() tt.assert(NewWithT(t)) }) } }As per coding guidelines, "Unit tests should use race detection and parallel execution".
Also applies to: 125-229
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@support/releaseinfo/deserialize_test.go` around lines 28 - 123, Add parallel execution to the tests: call t.Parallel() at the start of the top-level TestDeserializeImageMetadata function and also inside each subtest goroutine (the func passed to t.Run) so subtests run in parallel; do the same for the other test function referenced (the one around lines 125-229). Locate the test functions by name (TestDeserializeImageMetadata and the other test function in the same file) and add t.Parallel() both at the top of each test and at the start of each t.Run subtest closure, ensuring no shared mutable state is assumed.hypershift-operator/controllers/nodepool/stream_test.go (1)
11-171: ⚡ Quick winEnable parallel execution for this table-driven test.
This suite is a good candidate for
t.Parallel()at both parent and subtest level.Suggested change
func TestGetRHELStream(t *testing.T) { + t.Parallel() tests := []struct { ... } for _, tt := range tests { + tt := tt t.Run(tt.name, func(t *testing.T) { + t.Parallel() g := NewWithT(t) result, err := GetRHELStream(tt.explicitStream, tt.releaseVersion, tt.usesRunc) ... }) } }As per coding guidelines, "Unit tests should use race detection and parallel execution".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/stream_test.go` around lines 11 - 171, The test TestGetRHELStream should enable parallel execution to follow guidelines; add t.Parallel() as the first statement in TestGetRHELStream and also call t.Parallel() at the start of each subtest inside the t.Run closure so each case runs concurrently; locate TestGetRHELStream and the anonymous func passed to t.Run around the table loop and insert the t.Parallel() calls while ensuring no shared mutable state is accessed when invoking GetRHELStream.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/stream_test.go`:
- Around line 11-171: The test TestGetRHELStream should enable parallel
execution to follow guidelines; add t.Parallel() as the first statement in
TestGetRHELStream and also call t.Parallel() at the start of each subtest inside
the t.Run closure so each case runs concurrently; locate TestGetRHELStream and
the anonymous func passed to t.Run around the table loop and insert the
t.Parallel() calls while ensuring no shared mutable state is accessed when
invoking GetRHELStream.
In `@support/releaseinfo/deserialize_test.go`:
- Around line 20-26: testConfigMap is a test helper but doesn't call t.Helper()
or accept *testing.T, so failures point into the helper; change the signature of
testConfigMap to accept t *testing.T (e.g., testConfigMap(t *testing.T,
dataFields map[string]string)), call t.Helper() at the top of that function, and
update all call sites in this file (places that call testConfigMap(...)) to pass
the test variable (e.g., testConfigMap(t, map[string]string{...})).
- Around line 28-123: Add parallel execution to the tests: call t.Parallel() at
the start of the top-level TestDeserializeImageMetadata function and also inside
each subtest goroutine (the func passed to t.Run) so subtests run in parallel;
do the same for the other test function referenced (the one around lines
125-229). Locate the test functions by name (TestDeserializeImageMetadata and
the other test function in the same file) and add t.Parallel() both at the top
of each test and at the start of each t.Run subtest closure, ensuring no shared
mutable state is assumed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c73114df-faea-4ddf-bd9d-f0e03b64c080
📒 Files selected for processing (11)
hypershift-operator/controllers/nodepool/stream.gohypershift-operator/controllers/nodepool/stream_test.gosupport/releaseinfo/deserialize.gosupport/releaseinfo/deserialize_test.gosupport/releaseinfo/fixtures/5.0-installer-coreos-bootimages.yamlsupport/releaseinfo/fixtures/fixtures.gosupport/releaseinfo/registry_mirror_provider.gosupport/releaseinfo/registryclient_provider.gosupport/releaseinfo/releaseinfo.gosupport/releaseinfo/releaseinfo_test.gotest/integration/osstreams/osstreams_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
- support/releaseinfo/fixtures/fixtures.go
- support/releaseinfo/releaseinfo_test.go
- support/releaseinfo/releaseinfo.go
- support/releaseinfo/registryclient_provider.go
- support/releaseinfo/registry_mirror_provider.go
- hypershift-operator/controllers/nodepool/stream.go
- support/releaseinfo/deserialize.go
- support/releaseinfo/fixtures/5.0-installer-coreos-bootimages.yaml
- test/integration/osstreams/osstreams_test.go
|
/hold We need to discuss what's the best approach for the dependent PR #8673 |
ce055e3 to
6eadd8e
Compare
|
Scheduling tests matching the |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest |
3a6e145 to
3c153cb
Compare
|
Rebased |
| } | ||
|
|
||
| if !isOCP5Plus { | ||
| return "", nil |
There was a problem hiding this comment.
I wonder if we should be explicit here, we can always follow up.
There was a problem hiding this comment.
Good point. The function's job is to resolve the stream, and for <5.0 that's always rhel-9. Returning empty string here is intentional though — it signals "legacy single-stream behavior" to callers. The config hash (Phase 2, CNTRLPLANE-3553) only includes the stream when spec.osImageStream.name is explicitly set by the user, so this return value never enters the hash. Happy to make it explicit if you prefer — it won't cause rollouts either way.
| return nil, err | ||
| } | ||
|
|
||
| if coreOSMeta == nil && len(osStreams) > 0 { |
There was a problem hiding this comment.
where is this choice coming from? can we validate with the owner of this api what's the expectation and contract for consumers?
Then at minimum add a //comment with the rationale for that choice
There was a problem hiding this comment.
Validated with the MCO team (Slack thread). The legacy stream key in the coreos-bootimages ConfigMap is frozen to rhel-9 and will be present until rhel-9 EoL (OCP 5.3). There's no "default stream" field in the ConfigMap itself — the installer injects a separate OSImageStream CR with spec.defaultStream (installer source), but HyperShift doesn't use installer assets, so we generate our own in Phase 2 (CNTRLPLANE-3553).
The fallback now explicitly looks up rhel-9 (instead of iterating the map) and lives in DeserializeImageMetadata — removed the redundant copy from Lookup() since DeserializeImageMetadata already handles it before returning. Added a comment with the rationale and the Slack reference.
d28408c to
ee41ea1
Compare
| if i.OSStreams == nil { | ||
| return nil, fmt.Errorf("stream %q not found: no multi-stream metadata available", name) | ||
| } | ||
| meta, ok := i.OSStreams[name] |
There was a problem hiding this comment.
for < 5 with explicit rhel9 will need to fallback to return i.StreamMetadata as i.OSStreams[name] won't exist, right?
There was a problem hiding this comment.
You're right — the legacy stream key is guaranteed present until OCP 5.3 (confirmed with MCO team), so the fallback was dead code. And when stream does eventually go away, consumers should explicitly pick their stream via StreamForName() rather than silently getting rhel-9.
Removed the fallback from DeserializeImageMetadata. Updated StreamForName() so that when OSStreams is not nil (>= 5.0 payload), it only looks in OSStreams and errors if the stream isn't found. When OSStreams is nil (< 5.0 payload), it falls back to StreamMetadata as you described.
…m resolution Add support for parsing the new multi-stream boot image ConfigMap format introduced in OCP 5.0 payloads. The ConfigMap now carries a "streams" key alongside the legacy "stream" key, mapping stream names (rhel-9, rhel-10) to per-architecture boot image metadata. - Update DeserializeImageMetadata to parse both "streams" and "stream" keys, returning the parsed OSStreams map alongside the default metadata - Add OSStreams field to ReleaseImage for holding per-stream metadata - Add StreamForName convenience method on ReleaseImage for stream lookup - Add GetRHELStream pure function implementing the stream resolution table from the dual-stream RHEL enhancement - Add 5.0 boot image fixture extracted from 5.0.0-ec.2 release payload - Fallback: when ConfigMap has only "streams" without "stream", populate StreamMetadata from the first available stream to prevent nil panics in platform consumers Signed-off-by: Juan Manuel Parrilla Madrid <[email protected]> Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Juan Manuel Parrilla Madrid <[email protected]>
ee41ea1 to
ba2f870
Compare
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre, jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
Scheduling tests matching the |
|
/retest-required |
|
/retest-required |
|
The PR changes CoreOS metadata parsing and stream resolution code — entirely in the HyperShift operator and release info support layer. These changes are not used during the management cluster installation (IPI install), which is what failed. The failure is in the I now have enough evidence to produce the final report. This is a management cluster infrastructure installation failure — the bootstrap process timed out because the single worker node ( Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe management cluster IPI installation failed during the Root CauseThe root cause is a management cluster bootstrap timeout due to the worker node not joining the cluster. The failure chain is:
Why this is unrelated to PR #8669: The PR modifies CoreOS metadata parsing in This is likely a CI infrastructure flake — Recommendations
Evidence
|
|
/override ci/prow/e2e-kubevirt-aws-ovn-reduced |
|
/verified by unit tests |
|
@jparrill: Overrode contexts on behalf of jparrill: ci/prow/e2e-kubevirt-aws-ovn-reduced DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@jparrill: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@jparrill: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What this PR does / why we need it:
Add multi-stream CoreOS boot image parsing and RHEL stream resolution logic for the dual-stream RHEL 9/10 NodePool feature (Phase 1 of CNTRLPLANE-3018).
DeserializeImageMetadatato parse both the legacystreamkey (OCP < 5.0) and the newstreamskey (OCP >= 5.0) from the boot image ConfigMapOSStreamsfield toReleaseImageandStreamForName()method for stream-specific boot image lookupGetRHELStream()pure function implementing the stream resolution table from the dual-stream RHEL enhancement5.0.0-ec.2release payloadstreamswithoutstream, populate StreamMetadata from the first available stream to prevent nil panics in platform consumersThis is preparatory code — not reachable in production until Phase 2 connects
GetRHELStreaminto the NodePool controller (NewToken(),validMachineConfigCondition). The multi-stream parsing only activates when a >= 5.0 payload carries thestreamsConfigMap key.Stream Resolution Table
""(legacy)rhel-10rhel-9(fallback)rhel-9rhel-9rhel-10rhel-10rhel-10rhel-10Which issue(s) this PR fixes:
Fixes CNTRLPLANE-3552
Special notes for your reviewer:
5.0.0-ec.2payload (not synthetic)GetRHELStreamis exported for future Phase 2 consumersOSStreamsfield lives onReleaseImage(not onstream.Stream) to keep the upstream struct cleanstream.Streamfromcoreos/stream-metadata-go(introduced by CNTRLPLANE-3020: Adopt coreos/stream-metadata-go upstream library #8673)StreamRHEL9/StreamRHEL10exported for reuseChecklist:
Test plan
Unit tests — parsing and stream resolution:
Verification:
🤖 Generated with Claude Code