CNTRLPLANE-3023: Add CEL rule to prevent osImageStream removal#8719
CNTRLPLANE-3023: Add CEL rule to prevent osImageStream removal#8719sdminonne wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@sdminonne: This pull request references CNTRLPLANE-3023 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (5)
📒 Files selected for processing (1)
📝 WalkthroughWalkthroughIn 🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8719 +/- ##
=======================================
Coverage 41.66% 41.66%
=======================================
Files 758 758
Lines 93929 93929
=======================================
Hits 39135 39135
Misses 52046 52046
Partials 2748 2748
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
api/hypershift/v1beta1/nodepool_types.go (1)
242-261: ⚡ Quick winClarify default OS stream behavior in documentation.
Line 252 states "the pool uses the release version's default stream (rhel-9 for OCP < 5.0, rhel-10 for OCP >= 5.0)". This is misleading: for OCP >= 5.0,
AvailableOSImageStreams()returns both["rhel-9", "rhel-10"], not a single default.When
osImageStreamis omitted, the actual default selection is implementation-defined by downstream platform code, not by this API field. Consider revising the documentation to:// When omitted, the pool uses platform-specific default OS images. // For OCP < 5.0, only rhel-9 is available. // For OCP >= 5.0, both rhel-9 and rhel-10 are available; set this field // explicitly to select a non-default stream.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/hypershift/v1beta1/nodepool_types.go` around lines 242 - 261, Doc comment for the OSImageStream field is misleading about defaults; update the comment above the OSImageStream (osImageStream) field to state that when omitted the pool uses platform-specific defaults, that AvailableOSImageStreams() returns only rhel-9 for OCP < 5.0 but both rhel-9 and rhel-10 for OCP >= 5.0, and recommend that callers explicitly set OSImageStream to pick a non-default stream (use the suggested replacement wording from the review to replace the existing paragraph). Ensure references to OSImageStream and AvailableOSImageStreams() remain accurate and keep the CEL validation note intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@api/hypershift/v1beta1/nodepool_types.go`:
- Around line 242-261: Doc comment for the OSImageStream field is misleading
about defaults; update the comment above the OSImageStream (osImageStream) field
to state that when omitted the pool uses platform-specific defaults, that
AvailableOSImageStreams() returns only rhel-9 for OCP < 5.0 but both rhel-9 and
rhel-10 for OCP >= 5.0, and recommend that callers explicitly set OSImageStream
to pick a non-default stream (use the suggested replacement wording from the
review to replace the existing paragraph). Ensure references to OSImageStream
and AvailableOSImageStreams() remain accurate and keep the CEL validation note
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 212bbd47-181c-4e0a-8242-848f008b0507
⛔ Files ignored due to path filters (20)
api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yamlis excluded by!**/zz_generated*api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**client/applyconfiguration/hypershift/v1beta1/nodepoolspec.gois excluded by!client/**client/applyconfiguration/hypershift/v1beta1/nodepoolstatus.gois excluded by!client/**client/applyconfiguration/hypershift/v1beta1/osimagestreamreference.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**cmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-Hypershift-Default.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-Hypershift-TechPreviewNoUpgrade.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-SelfManagedHA-Default.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/payload-manifests/featuregates/featureGate-SelfManagedHA-TechPreviewNoUpgrade.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/featuregated.nodepools.osimagestream.testsuite.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamldocs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_conditions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yamlis excluded by!vendor/**,!**/vendor/**,!**/zz_generated*
📒 Files selected for processing (11)
api/hypershift/v1beta1/featuregates/featureGate-Hypershift-Default.yamlapi/hypershift/v1beta1/featuregates/featureGate-Hypershift-TechPreviewNoUpgrade.yamlapi/hypershift/v1beta1/featuregates/featureGate-SelfManagedHA-Default.yamlapi/hypershift/v1beta1/featuregates/featureGate-SelfManagedHA-TechPreviewNoUpgrade.yamlapi/hypershift/v1beta1/nodepool_conditions.goapi/hypershift/v1beta1/nodepool_types.gohypershift-operator/controllers/nodepool/conditions.gohypershift-operator/controllers/nodepool/conditions_test.gohypershift-operator/controllers/nodepool/nodepool_controller.gosupport/releaseinfo/releaseinfo.gosupport/releaseinfo/releaseinfo_test.go
|
/retest |
bef198c to
76f9b0f
Compare
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
| CIDRConflictReason = "CIDRConflict" | ||
| NodePoolKubeVirtLiveMigratableReason = "KubeVirtNodesNotLiveMigratable" | ||
| NodePoolUnsupportedSkewReason = "UnsupportedSkew" | ||
| NodePoolOSImageStreamRemovalReason = "OSImageStreamRemoved" |
There was a problem hiding this comment.
nit: The alignment on these three is off compared to the rest of the const block. The existing constants column-align the = sign — these push it further right. make fmt won't catch it (gofmt doesn't enforce alignment across different-length names in a const block), so it's a manual fix.
Also, the constant name NodePoolOSImageStreamRemovalReason says "Removal" (noun) but its value is "OSImageStreamRemoved" (past tense). The other two are consistent with themselves (DowngradeReason/"...Downgrade", NotInPayloadReason/"...NotInPayload"). Minor, but worth picking one convention.
|
|
||
| // --- Phase 3: Valid. Update status latch and set condition True --- | ||
|
|
||
| if specStream != "" { |
There was a problem hiding this comment.
This is the first condition function in this file that mutates a non-condition status field — all the others only call SetStatusCondition. The latch makes sense here because it has to be set atomically with the validation passing, but it's a departure from the pattern. A one-line comment explaining why would help future readers, e.g.:
// Latch must be set here, not in the reconcile body, because transition guards
// depend on it being updated only after validation passes.| return i.ImageStream.Name | ||
| } | ||
|
|
||
| // AvailableOSImageStreams returns the OS image streams available in this release payload. |
There was a problem hiding this comment.
This is a version heuristic, not actual payload introspection — it doesn't look at StreamMetadata or image tags. That's fine for TechPreview, but worth a TODO so it's not forgotten:
// TODO(CNTRLPLANE-3023): Replace version heuristic with payload metadata
// introspection when release images carry OS stream manifests.Also, consider defining "rhel-9" and "rhel-10" as constants in nodepool_types.go (like ArchitectureAMD64 = "amd64" and UpgradeTypeReplace). They're used here and in conditions.go — centralizing them reduces the chance of typos and makes the hardcoded downgrade check more maintainable.
|
|
||
| // NodePoolValidOSImageStreamConditionType signals if the osImageStream requested in the | ||
| // NodePool spec is valid. This covers two classes of validation: | ||
| // 1. Transition guards (controller-level complement to CEL): prevents removing the field |
There was a problem hiding this comment.
If we want to enforce this "prevents removing the field", this needs to be implemented via CEL
| // 2. Payload validation: verifies that the specified stream is available in the NodePool's | ||
| // release payload (e.g., rhel-10 is only available in 5.0+ payloads). | ||
| // A failure here requires the user to change the osImageStream field to a valid value. | ||
| NodePoolValidOSImageStreamConditionType = "ValidOSImageStream" |
There was a problem hiding this comment.
this is derailing from the enhancement, if something needs adjustment please create a PR against the enhacement
There was a problem hiding this comment.
Right. I'm removing it.
Add a FeatureGateAwareXValidation CEL rule on NodePoolSpec that prevents removing the osImageStream field once set, closing the two-step bypass for optional immutable fields on feature-gated types. Add an envtest case covering the removal scenario. Also add OSImageStreamRHEL9 and OSImageStreamRHEL10 constants for consistent use of stream name strings across the codebase. Co-Authored-By: Claude Opus 4.6 <[email protected]>
4496f29 to
a59f54d
Compare
|
@enxebre removed the erroneous condition definitions. Left only the CEL rule to prevent stream removal once set. |
|
@sdminonne: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre, sdminonne The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
/verified by unit-tests |
|
@sdminonne: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling tests matching the |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
Test Resultse2e-aws
e2e-aks
Failed TestsTotal failed tests: 3
|
|
Now I have a clear picture. Let me assemble the timeline and present the final analysis. The key findings are:
Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThis is an infrastructure/CI environment failure, not a code regression from PR #8719. The cluster was configured with:
The bootstrap process timed out because the ingress controller's router deployment could not start (0/1 replicas available). The router pod requires a schedulable worker node, and the single c5n.metal worker node did not become ready in time. Failure chain:
Why c5n.metal was slow/failed to join: Bare metal instances (c5n.metal) in AWS have significantly longer provisioning times compared to standard virtualized instances. Combined with the full RHCOS image download, ignition processing, and MachineConfig application on a bare-metal instance, the single worker exceeded the 45-minute bootstrap timeout. PR correlation: NONE. The PR adds CEL validation rules to NodePool CRD schemas. These changes:
Recommendations
Evidence
|
|
/retest-required |
Summary
FeatureGateAwareXValidationCEL rule onNodePoolSpecthat prevents removing theosImageStreamfield once setOSImageStreamRHEL9andOSImageStreamRHEL10constants for consistent stream name usageDescription
Optional immutable fields on feature-gated types are subject to a two-step bypass: a user can (1) remove the field, then (2) re-add it with a different value. The existing field-level CEL rule on
osImageStreamprevents single-step downgrades (rhel-10 → rhel-9), but a field-level transition rule (oldSelf/self) does not fire when the field is removed entirely — the validator requires botholdSelfandselfto be present.This change adds a parent-level CEL rule on
NodePoolSpecusing+openshift:validation:FeatureGateAwareXValidation(gated onOSStreams) that rejects any update that removesosImageStreamonce it has been set:The
FeatureGateAwareXValidationmarker is used instead of+kubebuilder:validation:XValidationbecause theosImageStreamfield is stripped from the CRD schema in the Default variant (where the feature gate is disabled). A regular XValidation rule referencingosImageStreamwould fail at CRD installation time in that variant.Test plan
make update && make verifypasses clean🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Enhancements