OCPBUGS-87991: validate additionalNetworks name format in KubeVirt NodePools#8710
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-87991, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughIn 🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8710 +/- ##
=======================================
Coverage 41.66% 41.66%
=======================================
Files 758 758
Lines 93929 93929
=======================================
Hits 39135 39135
Misses 52046 52046
Partials 2748 2748
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go (1)
142-162: ⚡ Quick winConsider validating namespace and name parts against Kubernetes DNS subdomain rules.
The current validation checks that parts are non-empty after trimming, but doesn't enforce Kubernetes DNS subdomain naming conventions (RFC 1123). This means names like
"my ns/nad"(with space) or"my_ns/nad"(with underscore) pass validation but will cause VM startup failures when KubeVirt attempts to reference a non-existent NetworkAttachmentDefinition.Kubernetes resource names must be lowercase alphanumeric with
-and., starting and ending with alphanumeric. Adding a regex check like^[a-z0-9]([-a-z0-9]*[a-z0-9])?$for each part would catch these at admission time with clearer error messages.♻️ Example validation with DNS subdomain rules
+import ( + "regexp" +) + +var dnsSubdomainRegex = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`) + func ValidateAdditionalNetworks(networks []hyperv1.KubevirtNetwork) error { if len(networks) == 0 { return nil } seen := make(map[string]bool, len(networks)) for idx, network := range networks { parts := strings.Split(network.Name, "/") if len(parts) != 2 || strings.TrimSpace(parts[0]) == "" || strings.TrimSpace(parts[1]) == "" { return fmt.Errorf("additionalNetworks[%d].name %q must be in the format <namespace>/<name>", idx, network.Name) } + namespace := strings.TrimSpace(parts[0]) + name := strings.TrimSpace(parts[1]) + if !dnsSubdomainRegex.MatchString(namespace) { + return fmt.Errorf("additionalNetworks[%d].name %q has invalid namespace part %q (must be a valid DNS subdomain)", idx, network.Name, namespace) + } + if !dnsSubdomainRegex.MatchString(name) { + return fmt.Errorf("additionalNetworks[%d].name %q has invalid name part %q (must be a valid DNS subdomain)", idx, network.Name, name) + } if seen[network.Name] { return fmt.Errorf("additionalNetworks[%d].name %q is duplicated", idx, network.Name) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go` around lines 142 - 162, The ValidateAdditionalNetworks function currently only checks for non-empty namespace/name parts; update it to enforce Kubernetes DNS subdomain/name rules by validating each part (the namespace and the name from strings.Split(network.Name, "/")) against the RFC1123 regex (e.g. ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$), and return clear formatted errors like "additionalNetworks[%d].name %q: namespace %q is invalid" or "…: name %q is invalid" when a part fails; keep the existing duplicate check and virtualMachineInterfaceName length check intact, and reference the same symbols (ValidateAdditionalNetworks, virtualMachineInterfaceName) so the change is localized to this function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/kubevirt/kubevirt.go`:
- Around line 142-162: The ValidateAdditionalNetworks function currently only
checks for non-empty namespace/name parts; update it to enforce Kubernetes DNS
subdomain/name rules by validating each part (the namespace and the name from
strings.Split(network.Name, "/")) against the RFC1123 regex (e.g.
^[a-z0-9]([-a-z0-9]*[a-z0-9])?$), and return clear formatted errors like
"additionalNetworks[%d].name %q: namespace %q is invalid" or "…: name %q is
invalid" when a part fails; keep the existing duplicate check and
virtualMachineInterfaceName length check intact, and reference the same symbols
(ValidateAdditionalNetworks, virtualMachineInterfaceName) so the change is
localized to this function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 098416f3-4a27-47b2-8089-b95cf2b80aec
📒 Files selected for processing (3)
hypershift-operator/controllers/hostedcluster/hostedcluster_webhook.gohypershift-operator/controllers/nodepool/kubevirt/kubevirt.gohypershift-operator/controllers/nodepool/kubevirt/kubevirt_test.go
|
I now have all the information needed to produce the final report. The root cause is clear. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe gitlint check failed because the commit message title Root CauseThe commit message title uses a The openshift/hypershift repository enforces Conventional Commits via gitlint's built-in [general]
contrib=contrib-title-conventional-commits
[contrib-title-conventional-commits]
types = fix,feat,chore,docs,style,refactor,perf,test,revert,ci,buildThe CT1 rule requires the title to match the pattern This is purely a commit message formatting issue — it is not a code or test problem. RecommendationsAmend the commit message to use the Conventional Commits format. Since this PR adds validation logic (a new feature), the title should use the Alternatively, if this is considered a bug fix (enforcing a contract that should have existed), use To amend: Evidence
|
bryan-cox
left a comment
There was a problem hiding this comment.
Thanks for tackling this — the validation gap is real and the user experience improvement is clear. The checks themselves are the right ones to add.
However, this validation should live in the API types (CEL markers) rather than the webhook. Our project rule (.claude/rules/webhook-validation.md) is explicit:
"HyperShift uses CRD CEL validation rules instead of webhooks. The webhook exists only for KubeVirt platform-specific needs (defaulting and JSON patch annotation validation). Do not add new validation or defaulting logic here."
All three checks are expressible declaratively:
1. Format check — CEL regex on KubevirtNetwork.Name:
// +kubebuilder:validation:XValidation:rule="self.matches('^[^/]+/[^/]+$')",message="name must be in the format <namespace>/<name> to reference a multus network attachment definition"There's direct precedent for this pattern in our Azure subnet ID validation (api/hypershift/v1beta1/azure.go).
2. Uniqueness — use Kubernetes list map semantics on AdditionalNetworks:
// +listType=map
// +listMapKey=nameNo CEL needed. Schema-level enforcement with proper server-side-apply merge semantics.
3. Interface name length — tighten MaxLength instead of computing it in Go. The generated name is iface{N}_{name} where the prefix is at most 8 chars (iface20_ at MaxItems=20). So:
// +kubebuilder:validation:MaxLength=55This replaces the entire Go length check. It's slightly conservative but avoids coupling the API to the controller's internal virtualMachineInterfaceName() implementation.
What I'd suggest:
- Move format + uniqueness to API type markers (blocking — per project convention)
- Tighten
MaxLengthfrom 255 to 55 onName - Remove the webhook additions
- Keep the
PlatformValidation()call as a defense-in-depth safety net - Add envtest YAML test coverage for the new CEL rules (see
test/envtest/README.md)
Happy to help if you have questions about the CEL/envtest patterns — there are good examples to follow in the existing codebase.
|
Thanks @bryan-cox — you're right, this belongs in the API types, not the webhook. Redesigning from scratch. Here's the approach I'm planning — wanted to align before writing code, especially since you mentioned there are good examples to follow. Planned changes1. Format check — CEL regex on
|
|
1. Examples to follow for CEL/envtest patterns: For the envtest YAML test suites, the existing NodePool test suites are the best reference:
Create your new suite as For CEL marker patterns on the API types, For the Run 2. MaxLength 255 → 55 ratcheting: This is acceptable. Names >55 already fail at runtime due to the 63-char interface name limit, so you're moving the failure left to admission time. CRD ratcheting handles the transition — unchanged values pass through on update. The math checks out: worst-case prefix is 3. A few things to flag on the approach: CEL regex: Defense-in-depth: I'd push back on keeping Webhook cleanup: Make sure the removal includes reverting any changes to |
The additionalNetworks[].name field in KubeVirt NodePools requires Multus NAD
references in <namespace>/<name> format, but no validation enforces this. Invalid
names are silently accepted — NodePool reports ValidPlatformConfig=True — while
VMs fail to start with errors buried 4 layers deep in the hosted control plane
namespace (VM conditions set by virt-controller).
Add admission-time CEL validation on the KubevirtNetwork API type:
- Format: XValidation regex enforcing exactly one slash with non-empty,
non-whitespace namespace and name segments
- Uniqueness: listType=map with listMapKey=name for schema-level duplicate
rejection
- Length: MaxLength reduced from 255 to 55 to ensure the generated interface
name (iface{idx+1}_{name}, max prefix iface20_ at MaxItems=20) stays within
the 63-character Linux interface name limit
Fixes: https://redhat.atlassian.net/browse/OCPBUGS-87991
53adcee to
9580fed
Compare
|
Thanks @bryan-cox — redesigned from scratch based on your feedback. Force-pushed a single commit. What changedRemoved
Added (API types only)
TestsNew What was NOT keptPer your guidance: no defense-in-depth /cc @bryan-cox |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: chdeshpa-hue The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/hypershift/v1beta1/kubevirt.go`:
- Around line 204-205: The XValidation rule introduced in
api/hypershift/v1beta1/kubevirt.go for validating the multus network attachment
definition reference format lacks corresponding envtest coverage. Following the
coding guidelines that require all API CEL validations to be covered with
envtests, add envtest cases to validate both valid and invalid inputs for the
format constraint that ensures the field matches the pattern for namespace/name
format. Refer to test/envtest/README.md for guidance on structuring these tests,
and ensure the tests verify that the validation rule correctly accepts properly
formatted namespace/name references and rejects improperly formatted ones.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 09a3bf35-1698-497b-8b4b-9654ce14e281
⛔ Files ignored due to path filters (9)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/AAA_ungated.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/GCPPlatform.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OSStreams.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OpenStack.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**cmd/install/assets/crds/hypershift-operator/tests/nodepools.hypershift.openshift.io/stable.nodepools.kubevirt.testsuite.yamlis excluded by!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/kubevirt.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (1)
api/hypershift/v1beta1/kubevirt.go
| // +kubebuilder:validation:MaxLength=55 | ||
| // +kubebuilder:validation:XValidation:rule="self.matches('^[^/\\\\s]+/[^/\\\\s]+$')",message="name must be in the format <namespace>/<name> to reference a multus network attachment definition" |
There was a problem hiding this comment.
Add envtest coverage for the new CEL/list-map validation contract.
This introduces admission-time validation behavior (XValidation) and schema semantics (listType=map/listMapKey=name), but no corresponding envtest evidence is present in the provided changes.
As per coding guidelines, "All API CEL validations must be covered with envtests, see test/envtest/README.md for details."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/hypershift/v1beta1/kubevirt.go` around lines 204 - 205, The XValidation
rule introduced in api/hypershift/v1beta1/kubevirt.go for validating the multus
network attachment definition reference format lacks corresponding envtest
coverage. Following the coding guidelines that require all API CEL validations
to be covered with envtests, add envtest cases to validate both valid and
invalid inputs for the format constraint that ensures the field matches the
pattern for namespace/name format. Refer to test/envtest/README.md for guidance
on structuring these tests, and ensure the tests verify that the validation rule
correctly accepts properly formatted namespace/name references and rejects
improperly formatted ones.
Source: Coding guidelines
|
@chdeshpa-hue: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
KubeVirt NodePools accept invalid
additionalNetworks[].namevalues without any error at admission or reconcile time. The field requires Multus NAD references in<namespace>/<name>format, but no validation enforces this. Users who omit the namespace prefix get a HostedCluster that appears healthy (ValidPlatformConfig: True "All is well") while VMs silently fail to start with errors buried in the hosted control plane namespace.This PR adds
ValidateAdditionalNetworks()called from both:oc applytimePlatformValidation()→ reconcile-time safety netChecks added:
/with non-empty namespace and name segmentsProblem
Without this fix, the debugging path requires navigating 6 layers before finding the root cause (VM conditions in the HC namespace set by virt-controller). NodePool status actively misleads with "All is well". Time to root cause ranges from 30 minutes (expert) to impossible (app team without HC namespace RBAC).
User Experience After Fix
Testing
gofmtandgo vetcleanBackward Compatibility
additionalNetworksconfigured → validation skipped entirelyns/nameentries → pass all checks, zero behavior changeFixes: https://redhat.atlassian.net/browse/OCPBUGS-87991
Made with Cursor
Summary by CodeRabbit