OCPBUGS-77307: Generate KubeVirt nmstate network config conditionally by qinqon · Pull Request #8365 · openshift/hypershift

qinqon · 2026-04-29T08:26:34Z

What this PR does / why we need it:

When a HyperShift KubeVirt NodePool uses the default pod network (AttachDefaultNetwork=true, the default), the MCO templates unconditionally generate nmstate configuration files that disable IPv6 autoconf and set up the fe80::1 ARP proxy gateway route. This is correct for the default pod network, where OVN-Kubernetes assigns IPv6 addresses via stateful DHCPv6.

However, when the NodePool uses multus as the primary network (AttachDefaultNetwork=false), these configurations break SLAAC and prevent nodes from getting IPv6 addresses in dual-stack setups.

This PR moves KubeVirt nmstate network configuration ownership from the MCO templates into the HyperShift nodepool controller, making it conditional:

Default pod network: Generates a MachineConfig (01-kubevirt-network) with the nmstate files that disable IPv6 autoconf and configure the ARP proxy gateway (same behavior as current MCO templates).
Multus primary network: Generates an override MachineConfig that replaces the MCO-rendered nmstate files with no-op content, allowing standard SLAAC to work.

The override approach ensures the fix works immediately even before the corresponding MCO cleanup PR (which removes the unconditional templates) is merged.
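
The branching described above can be sketched as follows. This is a minimal illustration and not the code in this PR: the `KubevirtPlatformSpec` type, the `ConfigKind` values, and both function names are hypothetical stand-ins, and the only behavior taken from the description is that an unset `AttachDefaultNetwork` counts as true.

```go
package main

import "fmt"

// KubevirtPlatformSpec is a hypothetical, trimmed-down stand-in for the
// KubeVirt section of a NodePool spec; only the field discussed here is kept.
type KubevirtPlatformSpec struct {
	// AttachDefaultNetwork is a pointer so that "unset" can mean "true".
	AttachDefaultNetwork *bool
}

// ConfigKind names which MachineConfig variant would be generated.
type ConfigKind string

const (
	// NetworkConfig: nmstate files that disable IPv6 autoconf and add the fe80::1 route.
	NetworkConfig ConfigKind = "network"
	// OverrideConfig: no-op nmstate files so that SLAAC keeps working.
	OverrideConfig ConfigKind = "override"
)

// usesDefaultPodNetwork treats an unset value the same as true.
func usesDefaultPodNetwork(spec KubevirtPlatformSpec) bool {
	return spec.AttachDefaultNetwork == nil || *spec.AttachDefaultNetwork
}

// selectConfigKind picks the variant for the given (hypothetical) spec.
func selectConfigKind(spec KubevirtPlatformSpec) ConfigKind {
	if usesDefaultPodNetwork(spec) {
		return NetworkConfig
	}
	return OverrideConfig
}

func main() {
	multusPrimary := false
	fmt.Println(selectConfigKind(KubevirtPlatformSpec{}))                                     // network
	fmt.Println(selectConfigKind(KubevirtPlatformSpec{AttachDefaultNetwork: &multusPrimary})) // override
}
```

Keeping the nil check in one small helper means both the generator and any caller agree on what "default" means.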

Depends-On:

fix(e2e): set NodePoolReplicas=1 for KubeVirt TestNodePool #8380

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/OCPBUGS-77307

Special notes for your reviewer:

This PR is paired with an MCO PR that removes the unconditional kubevirt nmstate templates: the MCO PR depends on this one being merged first. Once both land:

This PR provides the nmstate config conditionally
The MCO PR removes the now-redundant templates, after which the GenerateNetworkOverrideMachineConfig function can be removed in a follow-up

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests.

openshift-merge-bot · 2026-04-29T08:26:38Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-04-29T08:26:40Z

@qinqon: This pull request references Jira Issue OCPBUGS-77307, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci · 2026-04-29T08:26:46Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-04-29T08:26:48Z

📝 Walkthrough

Walkthrough

The PR adds platform-specific MachineConfig generation for KubeVirt to the NodePool config assembly. generateMCORawConfig now calls getPlatformConfigs, which, for KubeVirt, invokes kubevirtPlatformConfig to produce either a network MachineConfig (when AttachDefaultNetwork is true or nil) embedding nmstate Ignition files, or an override MachineConfig that neutralizes MCO-rendered nmstate (when multus is primary). The generated MachineConfig YAML is wrapped into ConfigMap-like entries and appended before parsing; generation errors are propagated. Unit tests exercise generation, override behavior, and helper utilities.

Sequence Diagram(s)

sequenceDiagram
    participant NodePool
    participant ConfigGen as generateMCORawConfig
    participant PlatformGen as kubevirtPlatformConfig
    participant MachineConfig
    participant MCO

    NodePool->>ConfigGen: request raw MCO config
    ConfigGen->>PlatformGen: getPlatformConfigs(nodePool)
    alt Platform is KubeVirt and AttachDefaultNetwork true/nil
        PlatformGen->>PlatformGen: build Ignition with nmstate files
        PlatformGen->>MachineConfig: serialize MachineConfig (network)
    else Platform is KubeVirt and multus primary
        PlatformGen->>PlatformGen: build no-op nmstate override
        PlatformGen->>MachineConfig: serialize MachineConfig (override)
    else Other platform
        PlatformGen-->>ConfigGen: return empty
    end
    PlatformGen-->>ConfigGen: return ConfigMap/MachineConfig YAML
    ConfigGen->>MachineConfig: append platform config
    ConfigGen-->>NodePool: return combined raw config
    NodePool->>MCO: apply MachineConfig YAML

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (11 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title directly reflects the main change: introducing conditional generation of KubeVirt nmstate network configuration based on the NodePool's primary network attachment setting.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names in network_test.go are stable and deterministic, using static descriptive strings with no dynamic content, timestamps, UUIDs, or IP addresses.
Test Structure And Quality	✅ Passed	The custom check is designed for Ginkgo test code, but the PR adds standard Go unit tests using table-driven testing with testing.T. The check is not applicable.
Microshift Test Compatibility	✅ Passed	The pull request adds only standard Go unit tests, not Ginkgo e2e tests, so the custom check for new Ginkgo e2e tests is not applicable.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	PR adds standard Go unit tests using testing package, not Ginkgo e2e tests. SNO compatibility check applies only to Ginkgo e2e tests.
Topology-Aware Scheduling Compatibility	✅ Passed	PR introduces no topology-aware scheduling constraints; changes limited to node-level MachineConfig objects with network configuration for KubeVirt platforms.
Ote Binary Stdout Contract	✅ Passed	This pull request modifies controller code in HyperShift operator, not OTE binary code, with no process-level entries or stdout writes.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR adds unit tests (not Ginkgo e2e tests) to network_test.go using standard Go Test* naming conventions. Ginkgo e2e test check is not applicable.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

codecov · 2026-04-29T08:30:15Z

Codecov Report

❌ Patch coverage is 83.33333% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.94%. Comparing base (9b67f7b) to head (2885536).

Files with missing lines	Patch %	Lines
...-operator/controllers/nodepool/kubevirt/network.go	86.11%	10 Missing and 5 partials ⚠️
hypershift-operator/controllers/nodepool/config.go	75.00%	6 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8365      +/-   ##
==========================================
+ Coverage   45.84%   45.94%   +0.10%     
==========================================
  Files         440      441       +1     
  Lines       52824    52968     +144     
==========================================
+ Hits        24218    24338     +120     
- Misses      26816    26832      +16     
- Partials     1790     1798       +8

Files with missing lines	Coverage Δ
hypershift-operator/controllers/nodepool/config.go	`84.04% <75.00%> (-1.48%)`	⬇️
...-operator/controllers/nodepool/kubevirt/network.go	`86.11% <86.11%> (ø)`

Flag	Coverage Δ
cpo-hostedcontrolplane	`41.80% <ø> (ø)`
cpo-other	`41.39% <ø> (ø)`
hypershift-operator	`51.02% <83.33%> (+0.19%)`	⬆️

Flags with carried forward coverage won't be shown.

coderabbitai

🧹 Nitpick comments (1)

hypershift-operator/controllers/nodepool/kubevirt/network.go (1)
225-230: Inconsistent YAML serialization approach.

GenerateNetworkMachineConfig uses api.CompatibleYAMLEncode (line 111) while this function uses api.YamlSerializer.Encode directly. This inconsistency could lead to subtle differences in output format and potentially affect hash stability.
♻️ Suggested fix to use consistent serialization
-	buf := &bytes.Buffer{}
-	if err := api.YamlSerializer.Encode(mc, buf); err != nil {
+	encoded, err := api.CompatibleYAMLEncode(mc, api.YamlSerializer)
+	if err != nil {
 		return "", fmt.Errorf("failed to serialize kubevirt network override machine config: %w", err)
 	}
 
-	return buf.String(), nil
+	return string(encoded), nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines
225 - 230, The serialization in this function uses api.YamlSerializer.Encode
directly which is inconsistent with GenerateNetworkMachineConfig that uses
api.CompatibleYAMLEncode; update this function to call api.CompatibleYAMLEncode
when encoding the machine config (mc) into the buffer (buf) so the output format
and hash stability match the other code path, and propagate any returned error
in the same manner as the existing fmt.Errorf wrapping.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 39e42667-f313-4562-94f9-4339d98c2375

📥 Commits

Reviewing files that changed from the base of the PR and between 60802b1 and 10f4aad.

📒 Files selected for processing (3)

hypershift-operator/controllers/nodepool/config.go
hypershift-operator/controllers/nodepool/kubevirt/network.go
hypershift-operator/controllers/nodepool/kubevirt/network_test.go

coderabbitai

🧹 Nitpick comments (1)

hypershift-operator/controllers/nodepool/kubevirt/network.go (1)
225-228: Inconsistent YAML serialization method.

GenerateNetworkMachineConfig (line 111) uses api.CompatibleYAMLEncode(mc, api.YamlSerializer) while this function uses api.YamlSerializer.Encode(mc, buf) directly. Both functions generate the same object type (MachineConfig) and should use consistent serialization to ensure identical YAML formatting behavior.
♻️ Proposed fix for consistency
-	buf := &bytes.Buffer{}
-	if err := api.YamlSerializer.Encode(mc, buf); err != nil {
+	encoded, err := api.CompatibleYAMLEncode(mc, api.YamlSerializer)
+	if err != nil {
 		return "", fmt.Errorf("failed to serialize kubevirt network override machine config: %w", err)
 	}
 
-	return buf.String(), nil
+	return string(encoded), nil
After applying this change, the bytes import on line 4 can be removed if it's no longer used elsewhere in the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines
225 - 228, The YAML serialization in this function is inconsistent with
GenerateNetworkMachineConfig: replace the manual bytes.Buffer +
api.YamlSerializer.Encode(mc, buf) pattern with the same helper call used
elsewhere — api.CompatibleYAMLEncode(mc, api.YamlSerializer) — so mc (the
MachineConfig) is encoded with the same formatting behavior; remove the
now-unused bytes import if it is no longer referenced after the change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 645f2beb-0f31-4789-bd8f-fe3ec5e1aa9e

📥 Commits

Reviewing files that changed from the base of the PR and between 10f4aad and 4cdc740.

📒 Files selected for processing (3)

hypershift-operator/controllers/nodepool/config.go
hypershift-operator/controllers/nodepool/kubevirt/network.go
hypershift-operator/controllers/nodepool/kubevirt/network_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

hypershift-operator/controllers/nodepool/kubevirt/network_test.go

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

hypershift-operator/controllers/nodepool/kubevirt/network_test.go (1)
13-26: Decode the generated object structurally instead of scanning YAML lines.

This helper is tied to the current YAML/data-URL formatting, so harmless quoting or wrapping changes can fail the tests even when the MachineConfig is still valid. Parsing the YAML into MachineConfig, then decoding Spec.Config.Raw, would make these assertions much less brittle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network_test.go` around
lines 13 - 26, The test helper decodeBase64Content is brittle because it scans
YAML lines; replace its implementation to parse the YAML into a MachineConfig
object and return the config payload from Spec.Config.Raw instead of
string-scanning. Specifically, in decodeBase64Content: unmarshal the config YAML
into the machineconfigv1.MachineConfig type (or a minimal struct with
Spec.Config as a runtime.RawExtension), then return string(mc.Spec.Config.Raw)
(or the Raw field) so the test reads the structured Spec.Config.Raw payload; add
the necessary imports for the MachineConfig type and YAML unmarshalling.
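
As a rough illustration of the structural approach suggested in this comment, the sketch below covers only the second half of it: given the raw Ignition JSON (what would come out of Spec.Config.Raw), it parses the document and decodes each file's base64 data URL. The `ignitionConfig` struct, the `decodeFileContents` name, and the sample path are assumptions made for this example; it uses only the Go standard library and does not reproduce the test helper in the PR.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// ignitionConfig models only the parts of an Ignition document needed here.
type ignitionConfig struct {
	Storage struct {
		Files []struct {
			Path     string `json:"path"`
			Contents struct {
				Source string `json:"source"`
			} `json:"contents"`
		} `json:"files"`
	} `json:"storage"`
}

// decodeFileContents parses raw Ignition JSON and returns the decoded
// contents of every file, keyed by path. It expects base64 data URLs.
func decodeFileContents(raw []byte) (map[string]string, error) {
	var cfg ignitionConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parsing ignition config: %w", err)
	}
	out := make(map[string]string, len(cfg.Storage.Files))
	for _, f := range cfg.Storage.Files {
		// Everything after ";base64," is the encoded payload.
		_, payload, ok := strings.Cut(f.Contents.Source, ";base64,")
		if !ok {
			return nil, fmt.Errorf("file %s: source is not a base64 data URL", f.Path)
		}
		decoded, err := base64.StdEncoding.DecodeString(payload)
		if err != nil {
			return nil, fmt.Errorf("file %s: decoding contents: %w", f.Path, err)
		}
		out[f.Path] = string(decoded)
	}
	return out, nil
}

func main() {
	// Hypothetical path and contents, used only to exercise the helper.
	payload := base64.StdEncoding.EncodeToString([]byte("autoconf: false\n"))
	raw := fmt.Sprintf(`{"storage":{"files":[{"path":"/etc/nmstate/example.yml","contents":{"source":"data:text/plain;charset=utf-8;base64,%s"}}]}}`, payload)

	files, err := decodeFileContents([]byte(raw))
	if err != nil {
		panic(err)
	}
	fmt.Print(files["/etc/nmstate/example.yml"])
}
```

Assertions written against the returned map do not depend on how the YAML encoder quotes or wraps the data URL.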
hypershift-operator/controllers/nodepool/kubevirt/network.go (1)
83-116: Extract the shared MachineConfig assembly path.

Lines 83-116 and Lines 198-231 duplicate the same ignition serialization, MachineConfig construction, label defaulting, and YAML encoding. Pulling that into one helper will keep the default and override branches from drifting.

Also applies to: 198-231
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines 83
- 116, Duplicate logic that serializes an ignition config, constructs a
MachineConfig (including setting Name via kubevirtNetworkMachineConfigName),
calls ignition.SetMachineConfigLabels, sets Spec.Config.Raw, APIVersion and
Kind, and YAML-encodes it should be extracted into a single helper (e.g.,
buildKubevirtNetworkMachineConfig or encodeMachineConfigFromIgnition) that
accepts the ignition.Config or the serialized bytes and returns the encoded YAML
string (or error). Replace the duplicated blocks (the block using
serializeIgnitionConfig, mcfgv1.MachineConfig, ignition.SetMachineConfigLabels,
and api.CompatibleYAMLEncode) with calls to that helper in both places; ensure
the helper preserves setting mc.Spec.Config.Raw = serializedConfig,
mc.ObjectMeta.Name = kubevirtNetworkMachineConfigName, mc.APIVersion =
mcfgv1.SchemeGroupVersion.String(), mc.Kind = "MachineConfig", and forwards
errors from serializeIgnitionConfig and api.CompatibleYAMLEncode.
hypershift-operator/controllers/nodepool/config.go (1)
162-167: This also changes the rollout hash for default-network KubeVirt pools.

Because Line 121 and Line 129 hash cg.mcoRawConfig, appending a platform MachineConfig here will force a rollout for every KubeVirt NodePool, not just the multus-primary ones. If that churn is expected, it would be good to call it out in the upgrade plan/release notes; otherwise this needs version gating around the paired MCO change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/config.go` around lines 162 - 167,
Appending platform-specific MachineConfigs unconditionally causes
cg.mcoRawConfig-based rollout hashes (see uses at cg.mcoRawConfig) to change for
all KubeVirt NodePools; restrict this so only multus-primary pools cause the
append or add version gating around the paired MCO change. Modify the code
around cg.getPlatformConfigs() and the call site where configs are appended so
you either (a) early-return or skip calling cg.getPlatformConfigs()/appending
platformConfigs unless the NodePool is the multus-primary type (check the
NodePool spec/labels), or (b) guard the append behind a feature/version flag
tied to the MCO rollout change, ensuring cg.mcoRawConfig is not mutated or
included in the rollout hash for default-network pools.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hypershift-operator/controllers/nodepool/kubevirt/network.go`:
- Around line 66-73: Add a nil guard at the start of exported helpers so they
return the neutral result instead of panicking; for example, in
GenerateNetworkMachineConfig check if nodePool == nil and immediately return "",
nil, and apply the same pattern to the other exported helper functions in this
file (the ones around lines 156-164 and 181-189) so they return their respective
neutral values (empty string or false) when nodePool is nil before dereferencing
nodePool.Spec.
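
A minimal sketch of the guard pattern described in this inline comment is shown below. The `NodePool` and `NodePoolSpec` types are simplified placeholders and both function names are hypothetical; what the sketch takes from the comment is only the idea of returning a neutral value (`""`, `nil` or `false`) before dereferencing a nil pointer.

```go
package main

import "fmt"

// NodePoolSpec and NodePool are simplified placeholders for the real API types.
type NodePoolSpec struct {
	AttachDefaultNetwork *bool
}

type NodePool struct {
	Spec NodePoolSpec
}

// attachesDefaultNetwork returns the neutral value false for a nil NodePool
// instead of panicking on np.Spec.
func attachesDefaultNetwork(np *NodePool) bool {
	if np == nil {
		return false
	}
	return np.Spec.AttachDefaultNetwork == nil || *np.Spec.AttachDefaultNetwork
}

// networkMachineConfigName returns "", nil for a nil NodePool, mirroring the
// neutral result suggested for the exported generator.
func networkMachineConfigName(np *NodePool) (string, error) {
	if np == nil {
		return "", nil
	}
	if !attachesDefaultNetwork(np) {
		return "", nil
	}
	return "01-kubevirt-network", nil
}

func main() {
	name, err := networkMachineConfigName(nil)
	fmt.Printf("%q %v\n", name, err) // "" <nil>

	name, err = networkMachineConfigName(&NodePool{})
	fmt.Printf("%q %v\n", name, err) // "01-kubevirt-network" <nil>
}
```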

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79da125b-65ec-4797-9598-6989684bd0d1

📥 Commits

Reviewing files that changed from the base of the PR and between 4cdc740 and efa02a6.

📒 Files selected for processing (3)

hypershift-operator/controllers/nodepool/config.go
hypershift-operator/controllers/nodepool/kubevirt/network.go
hypershift-operator/controllers/nodepool/kubevirt/network_test.go

openshift-ci · 2026-04-29T09:16:11Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qinqon
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS [qinqon]

Need more approvers for rest parts.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…onally

When a HyperShift KubeVirt NodePool uses the default pod network (AttachDefaultNetwork=true, the default), generate a MachineConfig with nmstate files that disable IPv6 autoconf and set the fe80::1 ARP proxy gateway route. This is required because OVN-Kubernetes assigns IPv6 via DHCPv6 stateful, not SLAAC.

When the NodePool uses multus as the primary network (AttachDefaultNetwork=false), generate an override MachineConfig that replaces the MCO-rendered nmstate files with no-op content, allowing standard network auto-configuration (SLAAC) to work.

This fixes dual-stack HCP on KubeVirt clusters using multus where nodes were configured with ipv6.method=dhcp instead of ipv6.method=auto, causing SLAAC to fail and nodes not getting IPv6 addresses.

Signed-off-by: Enrique Llorente <[email protected]>
Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>

…net tests

Extend KubeVirtAdvancedMultinetTest and KubeVirtMultinetTest to verify that nmstate network configuration is conditionally applied based on the AttachDefaultNetwork setting.

When AttachDefaultNetwork=false (multus primary network), a privileged DaemonSet checks via nmstatectl that autoconf: false is NOT present, confirming the pod-network-specific nmstate config is not applied.

When the default network is attached (normal cluster), the DaemonSet verifies that autoconf: false IS present, confirming the nmstate network configuration is correctly applied.

Both tests reuse existing e2e infrastructure: CorrelateDaemonSet for node targeting and eventuallyDaemonSetRollsOut for readiness waiting.

Commit-Message-Assisted-by: Claude (via Claude Code)
Signed-off-by: Enrique Llorente <[email protected]>
Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>
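
The presence/absence check described in this commit message could look roughly like the sketch below. The `verifyAutoconf` function and the sample output fragments are invented for illustration; the real e2e tests run the check from a privileged DaemonSet and are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// autoconfDisabled is the marker the commit message says the tests look for.
const autoconfDisabled = "autoconf: false"

// verifyAutoconf checks captured nmstatectl output against the expectation
// for the NodePool's network mode (hypothetical helper).
func verifyAutoconf(nmstateOutput string, defaultNetworkAttached bool) error {
	found := strings.Contains(nmstateOutput, autoconfDisabled)
	switch {
	case defaultNetworkAttached && !found:
		return fmt.Errorf("expected %q with the default pod network, but it is missing", autoconfDisabled)
	case !defaultNetworkAttached && found:
		return fmt.Errorf("unexpected %q with multus as the primary network", autoconfDisabled)
	}
	return nil
}

func main() {
	// Made-up output fragments, not real nmstatectl output.
	podNetwork := "ipv6:\n  enabled: true\n  autoconf: false\n"
	multus := "ipv6:\n  enabled: true\n  autoconf: true\n"

	fmt.Println(verifyAutoconf(podNetwork, true)) // <nil>
	fmt.Println(verifyAutoconf(multus, false))    // <nil>
	fmt.Println(verifyAutoconf(podNetwork, false) != nil)
}
```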

…tack

On KubeVirt, CNO requires worker nodes to probe the network MTU before deploying its operands (ovnkube-control-plane, network-node-identity, multus-admission-controller). Without at least one worker node, these deployments are never created, causing the CNO RolloutComplete condition to stay False and controlPlaneVersion to remain Partial indefinitely.

This is the same issue OpenStack already works around by setting NodePoolReplicas=1. Apply the same workaround for KubeVirt.

Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>
Signed-off-by: Enrique Llorente <[email protected]>

qinqon · 2026-05-29T09:58:06Z

/test e2e-kubevirt-aws-ovn

qinqon · 2026-05-29T12:07:00Z

/retest

qinqon · 2026-05-29T14:16:35Z

/test e2e-kubevirt-aws-ovn

openshift-ci · 2026-05-29T17:56:45Z

@qinqon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-kubevirt-aws-ovn	`2885536`	link	false	`/test e2e-kubevirt-aws-ovn`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

hypershift-jira-solve-ci · 2026-05-29T19:07:16Z

Test Failure Analysis Complete

Job Information

Prow Job: pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn
Build ID: 2060364759168978944
Target: e2e-kubevirt-aws-ovn
PR: #8365 (OCPBUGS-77307: Generate KubeVirt nmstate network config conditionally)
Result: 159 tests, 28 skipped, 8 failures (2 root test failures, rest are parent cascade)

Test Failure Analysis

Error

1) TestNodePool/HostedCluster0/Main/TestNodePoolReplaceUpgrade (2703.04s):
   Failed to wait for 1 nodes to become ready for NodePool
   e2e-clusters-sxk46/node-pool-9ld2t-test-replaceupgrade in 45m0s: context deadline exceeded
   observed **v1.Node collection invalid: expected 1 nodes, got 0

2) TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest (1210.05s):
   Failed to wait for NodePool e2e-clusters-sm2sq/node-pool-b7x68-test-additional-trust-bundle-propagation
   to stop updating in 20m0s: context deadline exceeded
   wanted UpdatingConfig=False, got UpdatingConfig=True
   AllMachinesReady=False: Unschedulable (Insufficient memory, Insufficient devices.kubevirt.io/kvm)

Summary

Two pre-existing KubeVirt platform flaky tests failed due to infrastructure resource constraints — neither failure is related to the PR #8365 changes. The PR modifies KubeVirt nmstate network config generation (config.go, kubevirt/network.go) and the multinet e2e tests (nodepool_kv_multinet_test.go, nodepool_kv_advanced_multinet_test.go). All tests directly exercising the PR's code changes — KubeVirtNodeMultinetTest (1265s, PASSED) and KubeVirtNodeAdvancedMultinetTest (PASSED) — passed successfully. The two failing tests (TestNodePoolReplaceUpgrade in nodepool_upgrade_test.go and TestAdditionalTrustBundlePropagation in nodepool_additionalTrustBundlePropagation_test.go) reside in files not touched by this PR and failed due to KubeVirt VM scheduling issues (insufficient memory and KVM devices on management cluster nodes).

Root Cause

Both failures stem from KubeVirt VM scheduling resource exhaustion on the management cluster, not from any code change in PR #8365:

Failure 1 — TestNodePoolReplaceUpgrade: During a replace upgrade, the test creates a new NodePool (node-pool-9ld2t-test-replaceupgrade) and waits 45 minutes for 1 node to become ready. The node never materialized — 0 of 1 expected nodes appeared. The AllMachinesReady condition showed VMNotReady, indicating the replacement KubeVirt VM could not be scheduled or provisioned on the management cluster. This is a known flaky pattern on resource-constrained KubeVirt clusters where multiple hosted clusters run in parallel (5 hosted clusters were active simultaneously in this run: TestCreateCluster, TestAutoscaling, and 3 TestNodePool HostedClusters).

Failure 2 — TestAdditionalTrustBundlePropagation: After updating the hosted cluster with an additional trust bundle, the NodePool entered UpdatingConfig=True and never completed the config rollout within 20 minutes. The root cause is explicit in the conditions: AllMachinesReady=False: Unschedulable: 0/4 nodes are available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm. The replacement VM created during the config update could not be scheduled because the management cluster had exhausted both memory and KVM device capacity. One machine (8klrvn) was stuck at WaitingForNodeRef (provisioned but no node registered), while another (c8gtsq) was stuck at WaitingForBootstrapData (not even able to begin provisioning).

Why these are unrelated to PR #8365:

TestNodePoolReplaceUpgrade lives in nodepool_upgrade_test.go — not modified by this PR
TestAdditionalTrustBundlePropagation lives in nodepool_additionalTrustBundlePropagation_test.go — not modified by this PR
The PR's changes to nodepool_test.go only add a hostedClusterClient parameter to NewKubeVirtMultinetTest and NewKubeVirtAdvancedMultinetTest constructors — completely unrelated code paths
All tests directly exercising the PR's code (KubeVirtNodeMultinetTest, KubeVirtNodeAdvancedMultinetTest) passed
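The file-level part of this argument can be sketched as a simple set intersection. The Go snippet below is a hypothetical illustration (the function name is an assumption, not tooling used by this job) that checks whether any failing test file appears in the PR's changed-file list reported in this analysis:

```go
package main

import "fmt"

// touchedByPR is a hypothetical helper: it returns the failing test files
// that also appear in the PR's list of changed files.
func touchedByPR(changed, failing []string) []string {
	set := make(map[string]struct{}, len(changed))
	for _, f := range changed {
		set[f] = struct{}{}
	}
	var overlap []string
	for _, f := range failing {
		if _, ok := set[f]; ok {
			overlap = append(overlap, f)
		}
	}
	return overlap
}

func main() {
	// File names as listed in this analysis.
	changed := []string{
		"config.go", "config_test.go",
		"kubevirt/network.go", "kubevirt/network_test.go",
		"nodepool_kv_multinet_test.go", "nodepool_kv_advanced_multinet_test.go",
		"nodepool_test.go",
	}
	failing := []string{
		"nodepool_upgrade_test.go",
		"nodepool_additionalTrustBundlePropagation_test.go",
	}
	fmt.Println("failing files touched by PR:", len(touchedByPR(changed, failing)))
}
```

An empty intersection is only one signal. It does not by itself rule out an indirect effect through shared code such as nodepool_test.go, which is why the scheduling conditions above carry the main weight of the argument.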

Recommendations

Re-trigger the job — these are infrastructure-level flakes caused by resource exhaustion, not code regressions. A retry on a less-loaded cluster should pass.
Safe to merge — the PR's functional changes are validated by the passing KubeVirtNodeMultinetTest and KubeVirtNodeAdvancedMultinetTest tests. The two failures are in completely separate test files and code paths.
Consider filing a flake issue for TestNodePoolReplaceUpgrade on KubeVirt — the 45-minute timeout for replace upgrades is borderline when the management cluster runs 5 hosted clusters concurrently with KVM resource contention.

Evidence

| Evidence | Detail |
| --- | --- |
| Failure 1 test | `TestNodePool/HostedCluster0/Main/TestNodePoolReplaceUpgrade` (2703s) |
| Failure 1 file | `nodepool_upgrade_test.go`, NOT modified by PR #8365 |
| Failure 1 error | `Failed to wait for 1 nodes to become ready in 45m0s: expected 1 nodes, got 0` |
| Failure 1 cause | KubeVirt VM stuck in `VMNotReady`; could not be scheduled on the management cluster |
| Failure 2 test | `TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest` (1210s) |
| Failure 2 file | `nodepool_additionalTrustBundlePropagation_test.go`, NOT modified by PR #8365 |
| Failure 2 error | `UpdatingConfig=True stuck, AllMachinesReady=False: Unschedulable` |
| Failure 2 cause | `0/4 nodes available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm` |
| PR-related tests | `KubeVirtNodeMultinetTest` ✅ PASSED (1265s), `KubeVirtNodeAdvancedMultinetTest` ✅ PASSED |
| PR changed files | `config.go`, `config_test.go`, `kubevirt/network.go`, `kubevirt/network_test.go`, `nodepool_kv_multinet_test.go`, `nodepool_kv_advanced_multinet_test.go`, `nodepool_test.go` |
| Concurrent load | 5 hosted clusters active simultaneously (TestCreateCluster, TestAutoscaling, 3× TestNodePool) |
| Cluster resource | Management cluster had only 4 nodes and had exhausted KVM device and memory capacity |

openshift-ci-robot added the jira/valid-reference ("Indicates that this PR references a valid Jira ticket of any type.") and jira/invalid-bug ("Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.") labels Apr 29, 2026

openshift-ci Bot added the do-not-merge/work-in-progress ("Indicates that a PR should not merge because it is a work in progress.") label Apr 29, 2026

openshift-ci Bot added the do-not-merge/needs-area label Apr 29, 2026

qinqon mentioned this pull request Apr 29, 2026

NO-JIRA: Remove unconditional KubeVirt nmstate templates openshift/machine-config-operator#5893

Draft

openshift-ci Bot added the area/hypershift-operator ("Indicates the PR includes changes for the hypershift operator and API - outside an OCP release") and area/platform/kubevirt ("PR/issue for KubeVirt (KubevirtPlatform) platform") labels and removed the do-not-merge/needs-area label Apr 29, 2026

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 10f4aad to 4cdc740 on April 29, 2026 08:36

coderabbitai Bot reviewed Apr 29, 2026

qinqon marked this pull request as ready for review April 29, 2026 08:44

openshift-ci Bot removed the do-not-merge/work-in-progress ("Indicates that a PR should not merge because it is a work in progress.") label Apr 29, 2026

openshift-ci Bot requested review from enxebre and muraee April 29, 2026 08:45

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 4cdc740 to efa02a6 on April 29, 2026 08:50

coderabbitai Bot reviewed Apr 29, 2026

Comment thread hypershift-operator/controllers/nodepool/kubevirt/network.go

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from f46757b to f3f31bf on April 29, 2026 09:15

openshift-ci Bot added the area/api ("Indicates the PR includes changes for the API") label Apr 29, 2026

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch 2 times, most recently from d40cfff to aa28671 on April 29, 2026 09:58

qinqon mentioned this pull request Apr 29, 2026

Flaky CI: codecov/project #8367

Closed

openshift-ci Bot added the area/testing ("Indicates the PR includes changes for e2e testing") label Apr 29, 2026

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from b82d83d to e103f0e on April 29, 2026 11:34

qinqon mentioned this pull request Apr 30, 2026

Flaky CI: codecov/project #8391

Open

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch 9 times, most recently from f1181d2 to 2d2cfd7 on May 4, 2026 11:17

qinqon mentioned this pull request May 5, 2026

Flaky CI: ci/prow/e2e-kubevirt-aws-ovn / TestNodePool/HostedCluster0/Main/KubeVirtCacheTest #8416

Closed

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch 2 times, most recently from 7b1c179 to aebf849 on May 6, 2026 15:34

This was referenced May 11, 2026

fix: detect and break review retry loops when agent finds no changes qinqon/oompa#164

Closed

feat: structured CI failure analysis with job info, root cause, and evidence sections qinqon/oompa#177

Closed

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from aebf849 to 1947eb9 on May 28, 2026 12:45

qinqon mentioned this pull request May 28, 2026

fix: CI dedup relies solely on comment markers, re-investigates when comments are deleted qinqon/oompa#200

Closed

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 1947eb9 to 65c4090 on May 28, 2026 14:03

This was referenced May 28, 2026

NO-JIRA: feat(skills): add validate-pr-override-images skill #8616

Merged

CNTRLPLANE-2775: Expose KAS availability and latency metrics from the control-plane-operator #7749

Open

OCPBUGS-86661: Konnectivity retry proxy connection on timeout #8579

Open

qinqon and others added 3 commits May 29, 2026 11:32

qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 65c4090 to 2885536 on May 29, 2026 09:32
