
OCPBUGS-77307: Generate KubeVirt nmstate network config conditionally#8365

Open
qinqon wants to merge 3 commits into
openshift:mainfrom
qinqon:OCPBUGS-77307-kubevirt-conditional-nmstate

Conversation

@qinqon

@qinqon qinqon commented Apr 29, 2026

Contributor

What this PR does / why we need it:

Currently, the MCO templates unconditionally generate nmstate configuration files for HyperShift KubeVirt NodePools that disable IPv6 autoconf and set up the fe80::1 ARP proxy gateway route. This is correct when the NodePool uses the default pod network (AttachDefaultNetwork=true, the default), where OVN-Kubernetes assigns IPv6 addresses via stateful DHCPv6.

However, when the NodePool uses multus as the primary network (AttachDefaultNetwork=false), these configurations break SLAAC and prevent nodes from getting IPv6 addresses in dual-stack setups.

This PR moves KubeVirt nmstate network configuration ownership from the MCO templates into the HyperShift nodepool controller, making it conditional:

  • Default pod network: Generates a MachineConfig (01-kubevirt-network) with the nmstate files that disable IPv6 autoconf and configure the ARP proxy gateway (the same behavior as the current MCO templates).
  • Multus primary network: Generates an override MachineConfig that replaces the MCO-rendered nmstate files with no-op content, allowing standard SLAAC to work.
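A minimal Go sketch of the selection rule described above, assuming only that AttachDefaultNetwork is an optional boolean where nil means the default (true). The networkMode type and selectNetworkMode function are hypothetical names for illustration; in the PR the two outcomes come from GenerateNetworkMachineConfig and GenerateNetworkOverrideMachineConfig.

```go
package main

import "fmt"

// networkMode is a hypothetical type used only in this sketch.
type networkMode int

const (
	// modeDefaultPodNetwork: emit 01-kubevirt-network with the nmstate files
	// that disable IPv6 autoconf and add the fe80::1 gateway route.
	modeDefaultPodNetwork networkMode = iota
	// modeMultusOverride: emit an override that replaces the MCO-rendered
	// nmstate files with no-op content so SLAAC can work.
	modeMultusOverride
)

// selectNetworkMode applies the rule from the description: a nil
// AttachDefaultNetwork falls back to the default, which is true.
func selectNetworkMode(attachDefaultNetwork *bool) networkMode {
	if attachDefaultNetwork == nil || *attachDefaultNetwork {
		return modeDefaultPodNetwork
	}
	return modeMultusOverride
}

func main() {
	yes, no := true, false
	names := map[networkMode]string{
		modeDefaultPodNetwork: "default pod network",
		modeMultusOverride:    "multus override",
	}
	fmt.Println("nil:  ", names[selectNetworkMode(nil)])
	fmt.Println("true: ", names[selectNetworkMode(&yes)])
	fmt.Println("false:", names[selectNetworkMode(&no)])
}
```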

The override approach ensures the fix works immediately, even before the corresponding MCO cleanup PR (which removes the unconditional templates) is merged.
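The override only works if the HyperShift-generated MachineConfig takes precedence over the MCO-rendered one for the same file paths. The sketch below models that precedence under an assumption the PR does not spell out: configs are merged in name order, and a later config replaces an earlier file with the same path. The types, config names, and file path are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// file and machineConfig are simplified stand-ins, not the MCO API types.
type file struct {
	Path     string
	Contents string
}

type machineConfig struct {
	Name  string
	Files []file
}

// mergeFiles applies configs in name order; a later config replaces any
// file with the same path. ASSUMPTION: this models the precedence the
// override relies on; it is not the MCO implementation.
func mergeFiles(configs []machineConfig) map[string]string {
	sorted := append([]machineConfig(nil), configs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	merged := make(map[string]string)
	for _, mc := range sorted {
		for _, f := range mc.Files {
			merged[f.Path] = f.Contents // a later name overwrites an earlier one
		}
	}
	return merged
}

func main() {
	const path = "/etc/nmstate/example.yml" // hypothetical path
	rendered := machineConfig{Name: "00-worker", Files: []file{{Path: path, Contents: "autoconf: false"}}}
	override := machineConfig{Name: "99-example-override", Files: []file{{Path: path, Contents: "# no-op"}}}
	// Input order does not matter; only the names decide precedence.
	fmt.Println(mergeFiles([]machineConfig{override, rendered})[path])
}
```

Under this model the override's name must sort after the config it neutralizes; a name that sorts earlier would silently lose.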

Depends-On:

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/OCPBUGS-77307

Special notes for your reviewer:

This PR is paired with an MCO PR that removes the unconditional KubeVirt nmstate templates; the MCO PR depends on this one and must be merged after it. Once both land:

  • This PR provides the nmstate config conditionally
  • The MCO PR removes the now-redundant templates, after which the GenerateNetworkOverrideMachineConfig function can be removed in a follow-up

Checklist:

  • Subject and description added to both the commit and the PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 29, 2026
@openshift-ci-robot


@qinqon: This pull request references Jira Issue OCPBUGS-77307, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2026
@openshift-ci

openshift-ci Bot commented Apr 29, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Contributor
📝 Walkthrough

Walkthrough

The PR adds platform-specific MachineConfig generation for KubeVirt to the NodePool config assembly. generateMCORawConfig now calls getPlatformConfigs, which, for KubeVirt, invokes kubevirtPlatformConfig to produce either a network MachineConfig (when AttachDefaultNetwork is true or nil) embedding nmstate Ignition files, or an override MachineConfig that neutralizes MCO-rendered nmstate (when multus is primary). The generated MachineConfig YAML is wrapped into ConfigMap-like entries and appended before parsing; generation errors are propagated. Unit tests exercise generation, override behavior, and helper utilities.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant NodePool
    participant ConfigGen as generateMCORawConfig
    participant PlatformGen as kubevirtPlatformConfig
    participant MachineConfig
    participant MCO

    NodePool->>ConfigGen: request raw MCO config
    ConfigGen->>PlatformGen: getPlatformConfigs(nodePool)
    alt Platform is KubeVirt and AttachDefaultNetwork true/nil
        PlatformGen->>PlatformGen: build Ignition with nmstate files
        PlatformGen->>MachineConfig: serialize MachineConfig (network)
    else Platform is KubeVirt and multus primary
        PlatformGen->>PlatformGen: build no-op nmstate override
        PlatformGen->>MachineConfig: serialize MachineConfig (override)
    else Other platform
        PlatformGen-->>ConfigGen: return empty
    end
    PlatformGen-->>ConfigGen: return ConfigMap/MachineConfig YAML
    ConfigGen->>MachineConfig: append platform config
    ConfigGen-->>NodePool: return combined raw config
    NodePool->>MCO: apply MachineConfig YAML
```
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title directly reflects the main change: introducing conditional generation of KubeVirt nmstate network configuration based on the NodePool's primary network attachment setting. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names in network_test.go are stable and deterministic, using static descriptive strings with no dynamic content, timestamps, UUIDs, or IP addresses. |
| Test Structure And Quality | ✅ Passed | The custom check is designed for Ginkgo test code, but the PR adds standard Go unit tests using table-driven testing with testing.T. The check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | The pull request adds only standard Go unit tests, not Ginkgo e2e tests, so the custom check for new Ginkgo e2e tests is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds standard Go unit tests using testing package, not Ginkgo e2e tests. SNO compatibility check applies only to Ginkgo e2e tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces no topology-aware scheduling constraints; changes limited to node-level MachineConfig objects with network configuration for KubeVirt platforms. |
| Ote Binary Stdout Contract | ✅ Passed | This pull request modifies controller code in HyperShift operator, not OTE binary code, with no process-level entries or stdout writes. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds unit tests (not Ginkgo e2e tests) to network_test.go using standard Go Test* naming conventions. Ginkgo e2e test check is not applicable. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform and removed do-not-merge/needs-area labels Apr 29, 2026
@codecov

codecov Bot commented Apr 29, 2026


Codecov Report

❌ Patch coverage is 83.33333% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.94%. Comparing base (9b67f7b) to head (2885536).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...-operator/controllers/nodepool/kubevirt/network.go | 86.11% | 10 Missing and 5 partials ⚠️ |
| hypershift-operator/controllers/nodepool/config.go | 75.00% | 6 Missing and 3 partials ⚠️ |

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #8365      +/-   ##
==========================================
+ Coverage   45.84%   45.94%   +0.10%     
==========================================
  Files         440      441       +1     
  Lines       52824    52968     +144     
==========================================
+ Hits        24218    24338     +120     
- Misses      26816    26832      +16     
- Partials     1790     1798       +8     
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| hypershift-operator/controllers/nodepool/config.go | 84.04% <75.00%> (-1.48%) ⬇️ |
| ...-operator/controllers/nodepool/kubevirt/network.go | 86.11% <86.11%> (ø) |

| Flag | Coverage Δ |
| --- | --- |
| cpo-hostedcontrolplane | 41.80% <ø> (ø) |
| cpo-other | 41.39% <ø> (ø) |
| hypershift-operator | 51.02% <83.33%> (+0.19%) ⬆️ |


@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 10f4aad to 4cdc740 April 29, 2026 08:36

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/kubevirt/network.go (1)

225-230: Inconsistent YAML serialization approach.

GenerateNetworkMachineConfig uses api.CompatibleYAMLEncode (line 111) while this function uses api.YamlSerializer.Encode directly. This inconsistency could lead to subtle differences in output format and potentially affect hash stability.

♻️ Suggested fix to use consistent serialization
```diff
-	buf := &bytes.Buffer{}
-	if err := api.YamlSerializer.Encode(mc, buf); err != nil {
+	encoded, err := api.CompatibleYAMLEncode(mc, api.YamlSerializer)
+	if err != nil {
 		return "", fmt.Errorf("failed to serialize kubevirt network override machine config: %w", err)
 	}
 
-	return buf.String(), nil
+	return string(encoded), nil
```
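Why encoder choice matters for hash stability can be shown with the standard library alone. This sketch uses encoding/json (compact versus indented output) as a stand-in for the two YAML paths; it does not reproduce api.CompatibleYAMLEncode or api.YamlSerializer, whose exact output differences are not shown in the review. Two encodings that decode to the same object still produce different hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// machineConfig is a minimal stand-in for the object being serialized.
type machineConfig struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
}

// shortHash returns the first 8 bytes of the SHA-256 digest in hex.
func shortHash(b []byte) string {
	sum := sha256.Sum256(b)
	return fmt.Sprintf("%x", sum[:8])
}

// encodeTwoWays serializes the same object with two encoders that are
// semantically equivalent but format their output differently.
func encodeTwoWays(mc machineConfig) (compact, indented []byte, err error) {
	if compact, err = json.Marshal(mc); err != nil {
		return nil, nil, err
	}
	if indented, err = json.MarshalIndent(mc, "", "  "); err != nil {
		return nil, nil, err
	}
	return compact, indented, nil
}

func main() {
	mc := machineConfig{APIVersion: "machineconfiguration.openshift.io/v1", Kind: "MachineConfig", Name: "01-kubevirt-network"}
	compact, indented, err := encodeTwoWays(mc)
	if err != nil {
		panic(err)
	}
	var a, b machineConfig
	if err := json.Unmarshal(compact, &a); err != nil {
		panic(err)
	}
	if err := json.Unmarshal(indented, &b); err != nil {
		panic(err)
	}
	fmt.Println("same object:", a == b)                                 // true
	fmt.Println("same hash:", shortHash(compact) == shortHash(indented)) // false
}
```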
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines
225 - 230, The serialization in this function uses api.YamlSerializer.Encode
directly which is inconsistent with GenerateNetworkMachineConfig that uses
api.CompatibleYAMLEncode; update this function to call api.CompatibleYAMLEncode
when encoding the machine config (mc) into the buffer (buf) so the output format
and hash stability match the other code path, and propagate any returned error
in the same manner as the existing fmt.Errorf wrapping.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 39e42667-f313-4562-94f9-4339d98c2375

📥 Commits

Reviewing files that changed from the base of the PR and between 60802b1 and 10f4aad.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/kubevirt/network.go
  • hypershift-operator/controllers/nodepool/kubevirt/network_test.go

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/kubevirt/network.go (1)

225-228: Inconsistent YAML serialization method.

GenerateNetworkMachineConfig (line 111) uses api.CompatibleYAMLEncode(mc, api.YamlSerializer) while this function uses api.YamlSerializer.Encode(mc, buf) directly. Both functions generate the same object type (MachineConfig) and should use consistent serialization to ensure identical YAML formatting behavior.

♻️ Proposed fix for consistency
```diff
-	buf := &bytes.Buffer{}
-	if err := api.YamlSerializer.Encode(mc, buf); err != nil {
+	encoded, err := api.CompatibleYAMLEncode(mc, api.YamlSerializer)
+	if err != nil {
 		return "", fmt.Errorf("failed to serialize kubevirt network override machine config: %w", err)
 	}
 
-	return buf.String(), nil
+	return string(encoded), nil
```

After applying this change, the bytes import on line 4 can be removed if it's no longer used elsewhere in the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines
225 - 228, The YAML serialization in this function is inconsistent with
GenerateNetworkMachineConfig: replace the manual bytes.Buffer +
api.YamlSerializer.Encode(mc, buf) pattern with the same helper call used
elsewhere — api.CompatibleYAMLEncode(mc, api.YamlSerializer) — so mc (the
MachineConfig) is encoded with the same formatting behavior; remove the
now-unused bytes import if it is no longer referenced after the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 645f2beb-0f31-4789-bd8f-fe3ec5e1aa9e

📥 Commits

Reviewing files that changed from the base of the PR and between 10f4aad and 4cdc740.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/kubevirt/network.go
  • hypershift-operator/controllers/nodepool/kubevirt/network_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • hypershift-operator/controllers/nodepool/kubevirt/network_test.go

@qinqon qinqon marked this pull request as ready for review April 29, 2026 08:44
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2026
@openshift-ci openshift-ci Bot requested review from enxebre and muraee April 29, 2026 08:45
@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 4cdc740 to efa02a6 April 29, 2026 08:50

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (3)
hypershift-operator/controllers/nodepool/kubevirt/network_test.go (1)

13-26: Decode the generated object structurally instead of scanning YAML lines.

This helper is tied to the current YAML/data-URL formatting, so harmless quoting or wrapping changes can fail the tests even when the MachineConfig is still valid. Parsing the YAML into MachineConfig, then decoding Spec.Config.Raw, would make these assertions much less brittle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network_test.go` around
lines 13 - 26, The test helper decodeBase64Content is brittle because it scans
YAML lines; replace its implementation to parse the YAML into a MachineConfig
object and return the config payload from Spec.Config.Raw instead of
string-scanning. Specifically, in decodeBase64Content: unmarshal the config YAML
into the machineconfigv1.MachineConfig type (or a minimal struct with
Spec.Config as a runtime.RawExtension), then return string(mc.Spec.Config.Raw)
(or the Raw field) so the test reads the structured Spec.Config.Raw payload; add
the necessary imports for the MachineConfig type and YAML unmarshalling.
hypershift-operator/controllers/nodepool/kubevirt/network.go (1)

83-116: Extract the shared MachineConfig assembly path.

Lines 83-116 and 198-231 duplicate the same ignition serialization, MachineConfig construction, label defaulting, and YAML encoding. Pulling that into one helper will keep the default and override branches from drifting.

Also applies to: 198-231

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/kubevirt/network.go` around lines 83
- 116, Duplicate logic that serializes an ignition config, constructs a
MachineConfig (including setting Name via kubevirtNetworkMachineConfigName),
calls ignition.SetMachineConfigLabels, sets Spec.Config.Raw, APIVersion and
Kind, and YAML-encodes it should be extracted into a single helper (e.g.,
buildKubevirtNetworkMachineConfig or encodeMachineConfigFromIgnition) that
accepts the ignition.Config or the serialized bytes and returns the encoded YAML
string (or error). Replace the duplicated blocks (the block using
serializeIgnitionConfig, mcfgv1.MachineConfig, ignition.SetMachineConfigLabels,
and api.CompatibleYAMLEncode) with calls to that helper in both places; ensure
the helper preserves setting mc.Spec.Config.Raw = serializedConfig,
mc.ObjectMeta.Name = kubevirtNetworkMachineConfigName, mc.APIVersion =
mcfgv1.SchemeGroupVersion.String(), mc.Kind = "MachineConfig", and forwards
errors from serializeIgnitionConfig and api.CompatibleYAMLEncode.
hypershift-operator/controllers/nodepool/config.go (1)

162-167: This also changes the rollout hash for default-network KubeVirt pools.

Because Line 121 and Line 129 hash cg.mcoRawConfig, appending a platform MachineConfig here will force a rollout for every KubeVirt NodePool, not just the multus-primary ones. If that churn is expected, it would be good to call it out in the upgrade plan/release notes; otherwise this needs version gating around the paired MCO change.
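The reviewer's point, reduced to its core: if the rollout hash covers the whole raw config, appending any entry changes the hash, even when that entry reproduces behavior the nodes already had. rolloutHash below is a hypothetical stand-in for the hash over cg.mcoRawConfig, not HyperShift's actual hashing code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// rolloutHash stands in for the hash over cg.mcoRawConfig: any byte change
// in the concatenated config yields a new value, i.e. a new rollout.
func rolloutHash(rawConfigs []string) string {
	sum := sha256.Sum256([]byte(strings.Join(rawConfigs, "\n---\n")))
	return fmt.Sprintf("%x", sum[:8])
}

// withPlatformConfig returns a copy of base with the platform entry
// appended, leaving base untouched.
func withPlatformConfig(base []string, platformConfig string) []string {
	return append(append([]string(nil), base...), platformConfig)
}

func main() {
	existing := []string{"kind: MachineConfig\nmetadata:\n  name: user-config"}
	before := rolloutHash(existing)
	after := rolloutHash(withPlatformConfig(existing, "kind: MachineConfig\nmetadata:\n  name: 01-kubevirt-network"))
	fmt.Println("rollout triggered:", before != after) // true
}
```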

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/nodepool/config.go` around lines 162 - 167,
Appending platform-specific MachineConfigs unconditionally causes
cg.mcoRawConfig-based rollout hashes (see uses at cg.mcoRawConfig) to change for
all KubeVirt NodePools; restrict this so only multus-primary pools cause the
append or add version gating around the paired MCO change. Modify the code
around cg.getPlatformConfigs() and the call site where configs are appended so
you either (a) early-return or skip calling cg.getPlatformConfigs()/appending
platformConfigs unless the NodePool is the multus-primary type (check the
NodePool spec/labels), or (b) guard the append behind a feature/version flag
tied to the MCO rollout change, ensuring cg.mcoRawConfig is not mutated or
included in the rollout hash for default-network pools.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hypershift-operator/controllers/nodepool/kubevirt/network.go`:
- Around line 66-73: Add a nil guard at the start of exported helpers so they
return the neutral result instead of panicking; for example, in
GenerateNetworkMachineConfig check if nodePool == nil and immediately return "",
nil, and apply the same pattern to the other exported helper functions in this
file (the ones around lines 156-164 and 181-189) so they return their respective
neutral values (empty string or false) when nodePool is nil before dereferencing
nodePool.Spec.
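A sketch of the suggested nil guard. The nodePool types are simplified, hypothetical stand-ins for the HyperShift API (only the fields needed here), and generateNetworkMachineConfig returns a placeholder string instead of real MachineConfig YAML; what matters is that a nil input yields the neutral result instead of a panic.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the NodePool API types.
type kubevirtPlatform struct {
	AttachDefaultNetwork *bool
}

type nodePoolPlatform struct {
	Kubevirt *kubevirtPlatform
}

type nodePoolSpec struct {
	Platform nodePoolPlatform
}

type nodePool struct {
	Spec nodePoolSpec
}

// isMultusPrimary returns the neutral value (false) instead of panicking
// when the NodePool or its KubeVirt platform block is nil.
func isMultusPrimary(np *nodePool) bool {
	if np == nil || np.Spec.Platform.Kubevirt == nil {
		return false
	}
	attach := np.Spec.Platform.Kubevirt.AttachDefaultNetwork
	return attach != nil && !*attach
}

// generateNetworkMachineConfig returns ("", nil) for a nil NodePool before
// touching nodePool.Spec.
func generateNetworkMachineConfig(np *nodePool) (string, error) {
	if np == nil {
		return "", nil
	}
	if isMultusPrimary(np) {
		return "", nil // multus-primary pools get the override instead
	}
	return "# placeholder for the 01-kubevirt-network MachineConfig YAML", nil
}

func main() {
	off := false
	multus := &nodePool{Spec: nodePoolSpec{Platform: nodePoolPlatform{Kubevirt: &kubevirtPlatform{AttachDefaultNetwork: &off}}}}
	fmt.Println(isMultusPrimary(nil), isMultusPrimary(&nodePool{}), isMultusPrimary(multus)) // false false true
	out, err := generateNetworkMachineConfig(nil)
	fmt.Printf("%q %v\n", out, err) // "" <nil>
}
```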


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79da125b-65ec-4797-9598-6989684bd0d1

📥 Commits

Reviewing files that changed from the base of the PR and between 4cdc740 and efa02a6.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/nodepool/config.go
  • hypershift-operator/controllers/nodepool/kubevirt/network.go
  • hypershift-operator/controllers/nodepool/kubevirt/network_test.go

Comment thread hypershift-operator/controllers/nodepool/kubevirt/network.go
@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from f46757b to f3f31bf April 29, 2026 09:15
@openshift-ci openshift-ci Bot added the area/api Indicates the PR includes changes for the API label Apr 29, 2026
@openshift-ci

openshift-ci Bot commented Apr 29, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qinqon
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:
  • OWNERS [qinqon]

    Need more approvers for rest parts.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch 2 times, most recently from d40cfff to aa28671 April 29, 2026 09:58
@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label Apr 29, 2026
@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from b82d83d to e103f0e April 29, 2026 11:34
qinqon and others added 3 commits May 29, 2026 11:32
…onally

When a HyperShift KubeVirt NodePool uses the default pod network
(AttachDefaultNetwork=true, the default), generate a MachineConfig
with nmstate files that disable IPv6 autoconf and set the fe80::1
ARP proxy gateway route. This is required because OVN-Kubernetes
assigns IPv6 via DHCPv6 stateful, not SLAAC.

When the NodePool uses multus as the primary network
(AttachDefaultNetwork=false), generate an override MachineConfig
that replaces the MCO-rendered nmstate files with no-op content,
allowing standard network auto-configuration (SLAAC) to work.

This fixes dual-stack HCP on KubeVirt clusters using multus where
nodes were configured with ipv6.method=dhcp instead of
ipv6.method=auto, causing SLAAC to fail and nodes not getting
IPv6 addresses.

Signed-off-by: Enrique Llorente <[email protected]>
Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>
…net tests

Extend KubeVirtAdvancedMultinetTest and KubeVirtMultinetTest to verify
that nmstate network configuration is conditionally applied based on
the AttachDefaultNetwork setting.

When AttachDefaultNetwork=false (multus primary network), a privileged
DaemonSet checks via nmstatectl that autoconf: false is NOT present,
confirming the pod-network-specific nmstate config is not applied.

When the default network is attached (normal cluster), the DaemonSet
verifies that autoconf: false IS present, confirming the nmstate
network configuration is correctly applied.
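A Go sketch of the assertion the DaemonSet performs, assuming the check is a plain line match for autoconf: false in nmstatectl output. hasAutoconfDisabled, checkNode, and the sample output are hypothetical; the real e2e test may parse the state differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sample is illustrative nmstate-style output, not captured from a node.
const sample = `interfaces:
- name: enp1s0
  ipv6:
    enabled: true
    dhcp: true
    autoconf: false
`

// hasAutoconfDisabled reports whether any line, ignoring indentation,
// is exactly "autoconf: false".
func hasAutoconfDisabled(nmstateOutput string) bool {
	sc := bufio.NewScanner(strings.NewReader(nmstateOutput))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == "autoconf: false" {
			return true
		}
	}
	return false
}

// checkNode expects autoconf: false to be present exactly when the
// NodePool attaches the default pod network.
func checkNode(nmstateOutput string, attachDefaultNetwork bool) error {
	present := hasAutoconfDisabled(nmstateOutput)
	if present != attachDefaultNetwork {
		return fmt.Errorf("autoconf: false present=%t, expected %t for AttachDefaultNetwork=%t",
			present, attachDefaultNetwork, attachDefaultNetwork)
	}
	return nil
}

func main() {
	fmt.Println(checkNode(sample, true))         // <nil>
	fmt.Println(checkNode(sample, false) != nil) // true
}
```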

Both tests reuse existing e2e infrastructure: CorrelateDaemonSet for
node targeting and eventuallyDaemonSetRollsOut for readiness waiting.

Commit-Message-Assisted-by: Claude (via Claude Code)
Signed-off-by: Enrique Llorente <[email protected]>
Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>
…tack

On KubeVirt, CNO requires worker nodes to probe the network MTU
before deploying its operands (ovnkube-control-plane,
network-node-identity, multus-admission-controller). Without at
least one worker node, these deployments are never created, causing
the CNO RolloutComplete condition to stay False and
controlPlaneVersion to remain Partial indefinitely.

This is the same issue OpenStack already works around by setting
NodePoolReplicas=1. Apply the same workaround for KubeVirt.
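The workaround amounts to a per-platform override of the initial replica count. initialNodePoolReplicas and the platform constants are hypothetical names for illustration; the e2e option the commit mentions is NodePoolReplicas, and this sketch assumes it is simply set to 1 for both platforms.

```go
package main

import "fmt"

// Hypothetical platform names for this sketch.
const (
	platformAWS       = "AWS"
	platformOpenStack = "OpenStack"
	platformKubeVirt  = "KubeVirt"
)

// initialNodePoolReplicas forces one worker on platforms where CNO must
// probe the network MTU on a worker before it deploys its operands; other
// platforms keep the requested count.
func initialNodePoolReplicas(platform string, requested int32) int32 {
	switch platform {
	case platformOpenStack, platformKubeVirt:
		return 1
	default:
		return requested
	}
}

func main() {
	for _, p := range []string{platformAWS, platformOpenStack, platformKubeVirt} {
		fmt.Printf("%s: %d\n", p, initialNodePoolReplicas(p, 0))
	}
}
```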

Co-Authored-By: Claude Opus 4 (claude-opus-4-6) <[email protected]>
Signed-off-by: Enrique Llorente <[email protected]>
@qinqon qinqon force-pushed the OCPBUGS-77307-kubevirt-conditional-nmstate branch from 65c4090 to 2885536 May 29, 2026 09:32
@qinqon

qinqon commented May 29, 2026

Contributor Author

/test e2e-kubevirt-aws-ovn

@qinqon

qinqon commented May 29, 2026

Contributor Author

/retest

@qinqon

qinqon commented May 29, 2026

Contributor Author

/test e2e-kubevirt-aws-ovn

@openshift-ci

openshift-ci Bot commented May 29, 2026

Contributor

@qinqon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-kubevirt-aws-ovn | 2885536 | link | false | /test e2e-kubevirt-aws-ovn |


@hypershift-jira-solve-ci


The PR's changes to nodepool_test.go only modify the constructor calls for NewKubeVirtMultinetTest and NewKubeVirtAdvancedMultinetTest (adding a hostedClusterClient parameter) and some platform handling. These changes do NOT affect TestNodePoolReplaceUpgrade or TestAdditionalTrustBundlePropagation.


Test Failure Analysis Complete


Test Failure Analysis

Error

1) TestNodePool/HostedCluster0/Main/TestNodePoolReplaceUpgrade (2703.04s):
   Failed to wait for 1 nodes to become ready for NodePool
   e2e-clusters-sxk46/node-pool-9ld2t-test-replaceupgrade in 45m0s: context deadline exceeded
   observed **v1.Node collection invalid: expected 1 nodes, got 0

2) TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest (1210.05s):
   Failed to wait for NodePool e2e-clusters-sm2sq/node-pool-b7x68-test-additional-trust-bundle-propagation
   to stop updating in 20m0s: context deadline exceeded
   wanted UpdatingConfig=False, got UpdatingConfig=True
   AllMachinesReady=False: Unschedulable (Insufficient memory, Insufficient devices.kubevirt.io/kvm)

Summary

Two pre-existing KubeVirt platform flaky tests failed due to infrastructure resource constraints — neither failure is related to the PR #8365 changes. The PR modifies KubeVirt nmstate network config generation (config.go, kubevirt/network.go) and the multinet e2e tests (nodepool_kv_multinet_test.go, nodepool_kv_advanced_multinet_test.go). All tests directly exercising the PR's code changes — KubeVirtNodeMultinetTest (1265s, PASSED) and KubeVirtNodeAdvancedMultinetTest (PASSED) — passed successfully. The two failing tests (TestNodePoolReplaceUpgrade in nodepool_upgrade_test.go and TestAdditionalTrustBundlePropagation in nodepool_additionalTrustBundlePropagation_test.go) reside in files not touched by this PR and failed due to KubeVirt VM scheduling issues (insufficient memory and KVM devices on management cluster nodes).

Root Cause

Both failures stem from KubeVirt VM scheduling resource exhaustion on the management cluster, not from any code change in PR #8365:

Failure 1 — TestNodePoolReplaceUpgrade: During a replace upgrade, the test creates a new NodePool (node-pool-9ld2t-test-replaceupgrade) and waits 45 minutes for 1 node to become ready. The node never materialized — 0 of 1 expected nodes appeared. The AllMachinesReady condition showed VMNotReady, indicating the replacement KubeVirt VM could not be scheduled or provisioned on the management cluster. This is a known flaky pattern on resource-constrained KubeVirt clusters where multiple hosted clusters run in parallel (5 hosted clusters were active simultaneously in this run: TestCreateCluster, TestAutoscaling, and 3 TestNodePool HostedClusters).

Failure 2 — TestAdditionalTrustBundlePropagation: After updating the hosted cluster with an additional trust bundle, the NodePool entered UpdatingConfig=True and never completed the config rollout within 20 minutes. The root cause is explicit in the conditions: AllMachinesReady=False: Unschedulable: 0/4 nodes are available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm. The replacement VM created during the config update could not be scheduled because the management cluster had exhausted both memory and KVM device capacity. One machine (8klrvn) was stuck at WaitingForNodeRef (provisioned but no node registered), while another (c8gtsq) was stuck at WaitingForBootstrapData (not even able to begin provisioning).
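
The scheduler message in Failure 2 has a regular shape, `A/T nodes are available: <count> <reason>, ...`, which makes it easy to tally when triaging many KubeVirt flakes at once. The parser below is an illustrative sketch that assumes exactly that shape and ignores anything after the first `. ` (recent Kubernetes schedulers may append a `preemption: ...` clause). `parseSchedulingMessage` is a hypothetical name, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// schedulingFailure holds the parsed pieces of a message such as
// "0/4 nodes are available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm."
type schedulingFailure struct {
	Available, Total int
	Reasons          map[string]int // reason -> number of nodes rejecting the pod for it
}

// parseSchedulingMessage assumes the "A/T nodes are available: n reason, ..." shape.
// Anything after the first ". " (for example a "preemption: ..." suffix) is ignored.
func parseSchedulingMessage(msg string) (schedulingFailure, error) {
	var f schedulingFailure
	head, tail, ok := strings.Cut(msg, " nodes are available: ")
	if !ok {
		return f, fmt.Errorf("unrecognized message: %q", msg)
	}
	if _, err := fmt.Sscanf(head, "%d/%d", &f.Available, &f.Total); err != nil {
		return f, fmt.Errorf("parsing node counts: %w", err)
	}
	tail, _, _ = strings.Cut(tail, ". ")
	tail = strings.TrimSuffix(tail, ".")
	f.Reasons = map[string]int{}
	for _, part := range strings.Split(tail, ", ") {
		countStr, reason, ok := strings.Cut(strings.TrimSpace(part), " ")
		if !ok {
			return f, fmt.Errorf("unrecognized reason %q", part)
		}
		n, err := strconv.Atoi(countStr)
		if err != nil {
			return f, fmt.Errorf("parsing count in %q: %w", part, err)
		}
		f.Reasons[reason] += n
	}
	return f, nil
}

func main() {
	f, err := parseSchedulingMessage("0/4 nodes are available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm.")
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Available, f.Total, f.Reasons["Insufficient memory"], f.Reasons["Insufficient devices.kubevirt.io/kvm"]) // 0 4 1 3
}
```

For this run, the per-reason counts (1 + 3) add up to all 4 management cluster nodes, which is consistent with the capacity-exhaustion reading above.
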

Why these are unrelated to PR #8365:

  • TestNodePoolReplaceUpgrade lives in nodepool_upgrade_test.go — not modified by this PR
  • TestAdditionalTrustBundlePropagation lives in nodepool_additionalTrustBundlePropagation_test.go — not modified by this PR
  • The PR's changes to nodepool_test.go only add a hostedClusterClient parameter to NewKubeVirtMultinetTest and NewKubeVirtAdvancedMultinetTest constructors — completely unrelated code paths
  • All tests directly exercising the PR's code (KubeVirtNodeMultinetTest, KubeVirtNodeAdvancedMultinetTest) passed
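
The argument in this list is a set-membership check: a failure is treated as unrelated when the file defining the failing test is absent from the PR's changed-file list. Below is a minimal sketch of that triage step using the file names from this report. `classify` is a hypothetical helper. It is only a first-pass heuristic, since a test in an untouched file can still execute code the PR changed.

```go
package main

import (
	"fmt"
	"sort"
)

// classify splits failing tests (test name -> defining file) into those
// defined in files the PR touched and those defined elsewhere. It is a
// heuristic: a test in an untouched file can still run code the PR changed.
func classify(failing map[string]string, changed []string) (touched, untouched []string) {
	changedSet := make(map[string]bool, len(changed))
	for _, f := range changed {
		changedSet[f] = true
	}
	for test, file := range failing {
		if changedSet[file] {
			touched = append(touched, test)
		} else {
			untouched = append(untouched, test)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(touched)
	sort.Strings(untouched)
	return touched, untouched
}

func main() {
	changed := []string{
		"config.go", "config_test.go", "kubevirt/network.go", "kubevirt/network_test.go",
		"nodepool_kv_multinet_test.go", "nodepool_kv_advanced_multinet_test.go", "nodepool_test.go",
	}
	failing := map[string]string{
		"TestNodePoolReplaceUpgrade":           "nodepool_upgrade_test.go",
		"TestAdditionalTrustBundlePropagation": "nodepool_additionalTrustBundlePropagation_test.go",
	}
	touched, untouched := classify(failing, changed)
	fmt.Println("in changed files:", touched)
	fmt.Println("in untouched files:", untouched)
}
```
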

Recommendations

  1. Re-trigger the job — these are infrastructure-level flakes caused by resource exhaustion, not code regressions. A retry on a less-loaded cluster should pass.
  2. Safe to merge — the PR's functional changes are validated by the passing KubeVirtNodeMultinetTest and KubeVirtNodeAdvancedMultinetTest tests. The two failures are in completely separate test files and code paths.
  3. Consider filing a flake issue for TestNodePoolReplaceUpgrade on KubeVirt — the 45-minute timeout for replace upgrades is borderline when the management cluster runs 5 hosted clusters concurrently with KVM resource contention.

Evidence

| Evidence | Detail |
|---|---|
| Failure 1 test | TestNodePool/HostedCluster0/Main/TestNodePoolReplaceUpgrade (2703s) |
| Failure 1 file | nodepool_upgrade_test.go (NOT modified by PR #8365) |
| Failure 1 error | Failed to wait for 1 nodes to become ready in 45m0s: expected 1 nodes, got 0 |
| Failure 1 cause | KubeVirt VM stuck in VMNotReady; it could not be scheduled on the management cluster |
| Failure 2 test | TestNodePool/HostedCluster2/Main/TestAdditionalTrustBundlePropagation/AdditionalTrustBundlePropagationTest (1210s) |
| Failure 2 file | nodepool_additionalTrustBundlePropagation_test.go (NOT modified by PR #8365) |
| Failure 2 error | UpdatingConfig=True stuck, AllMachinesReady=False: Unschedulable |
| Failure 2 cause | 0/4 nodes available: 1 Insufficient memory, 3 Insufficient devices.kubevirt.io/kvm |
| PR-related tests | KubeVirtNodeMultinetTest ✅ PASSED (1265s), KubeVirtNodeAdvancedMultinetTest ✅ PASSED |
| PR changed files | config.go, config_test.go, kubevirt/network.go, kubevirt/network_test.go, nodepool_kv_multinet_test.go, nodepool_kv_advanced_multinet_test.go, nodepool_test.go |
| Concurrent load | 5 hosted clusters active simultaneously (TestCreateCluster, TestAutoscaling, 3× TestNodePool) |
| Cluster resources | Management cluster had only 4 nodes; KVM device and memory capacity were exhausted |


Labels

  • area/api: Indicates the PR includes changes for the API
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • area/platform/kubevirt: PR/issue for KubeVirt (KubevirtPlatform) platform
  • area/testing: Indicates the PR includes changes for e2e testing
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.


3 participants