CNF-23565: Dedicate CPU resources for DPDK-based vSwitch/vRouter#2001

Open
Tal-or wants to merge 1 commit into openshift:master from Tal-or:dedicate_cpus_for_dpdk_vswitch
Conversation

@Tal-or
Contributor

@Tal-or Tal-or commented May 7, 2026

Adds enhancement proposal for dedicating CPUs exclusively for infrastructure networking workloads (OVS-DPDK, OpenPErouter). Introduces two new PerformanceProfile API fields: spec.cpu.dedicated and spec.net.disableOvsDynamicPinning.
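
For illustration, the two proposed fields might appear in a profile along the following lines. This is a hypothetical sketch based only on the field names in this description; the exact schema, API version, values, and defaults are defined by the enhancement itself, not confirmed here:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: dpdk-vswitch
spec:
  cpu:
    reserved: "0,4"      # existing field: housekeeping / system daemons
    isolated: "2-3,6-7"  # existing field: application workloads
    dedicated: "1,5"     # proposed: exclusively for infra networking (e.g. OVS-DPDK PMD threads)
  net:
    disableOvsDynamicPinning: true  # proposed: opt out of OVN-Kubernetes dynamic OVS pinning
```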

Tracking: CNF-22582, RFE-8921

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak [email protected]

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 7, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 7, 2026

@Tal-or: This pull request references CNF-23565 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


In response to this:

Adds enhancement proposal for dedicating CPUs exclusively for infrastructure networking workloads (OVS-DPDK, OpenPErouter). Introduces two new PerformanceProfile API fields: spec.cpu.dedicated and spec.net.disableOvsDynamicPinning.

Tracking: CNF-22582, RFE-8921

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from Miciah and jmguzik May 7, 2026 11:39
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lmzuccarelli for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Tal-or
Contributor Author

Tal-or commented May 7, 2026

@JM1

…Router

Adds enhancement proposal for dedicating CPUs exclusively for infrastructure
networking workloads (OVS-DPDK, OpenPErouter). Introduces two new PerformanceProfile
API fields: spec.cpu.dedicated and spec.net.disableOvsDynamicPinning.

Tracking: CNF-22582, RFE-8921

AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0

Signed-off-by: Talor Itzhak <[email protected]>
@Tal-or Tal-or force-pushed the dedicate_cpus_for_dpdk_vswitch branch from 192566c to c941e7f Compare May 7, 2026 13:07
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

@Tal-or: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

scheduling (all QoS classes), OS daemons, and kernel housekeeping.
- Automatically ban dedicated CPUs from irqbalance and configure `isolcpus=domain,managed_irq`
to prevent hardware interrupts and kernel scheduler interference on dedicated CPUs.
- Provide the ability to disable OVN-Kubernetes dynamic OVS thread pinning when static CPU
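
Banning CPUs from irqbalance is conventionally expressed as a hexadecimal CPU bitmask (e.g. the `IRQBALANCE_BANNED_CPUS` environment setting). As a rough sketch of the mapping involved, independent of the operator's actual implementation:

```python
def cpus_to_mask(cpus: str) -> str:
    """Convert a CPU list like "1,5" or "2-3,6-7" to an irqbalance hex mask.

    One bit per CPU id: CPU n sets bit n of the mask.
    """
    mask = 0
    for part in cpus.split(","):
        lo, _, hi = part.partition("-")
        for cpu in range(int(lo), int(hi or lo) + 1):
            mask |= 1 << cpu
    return f"{mask:x}"

print(cpus_to_mask("1,5"))      # -> 22  (bits 1 and 5: 0b100010)
print(cpus_to_mask("2-3,6-7"))  # -> cc  (bits 2,3,6,7: 0b11001100)
```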
Contributor


I would reword this.

OVS dynamic pinning and OVS-DPDK can coexist; it is just that the user probably does not need two high-performance networking stacks. So this option is not related to dedicated CPUs per se.

Also, disabling dynamic pinning by default would prevent alternative use cases where, for example, industrial controller apps run on dedicated CPUs but the network still uses classic OVS.

Contributor Author


OVS dynamic pinning and OVS-DPDK can coexist, it is just that the user probably does not need two high performance networking stacks. So this option is no related to dedicated cpus per-se.

But having OVS's systemd services affined to the dedicated CPUs might impact performance if OVS-DPDK is being used (unless the OVS networking stack is shut down completely).

I agree about the rest

these, Burstable and BestEffort QoS pods can still be scheduled on dedicated CPUs through
kernel cpuset inheritance, breaking the isolation guarantee.
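
The prerequisite alluded to here is the kubelet's static CPU manager with strict reservation. As a hedged sketch, assuming the "restrict-reserved" option discussed in this thread refers to the upstream `strict-cpu-reservation` CPU manager policy option (values illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: "0,4"  # illustrative value
cpuManagerPolicyOptions:
  strict-cpu-reservation: "true"  # keeps Burstable/BestEffort pods off reserved CPUs
```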

**Note:** Validation webhook enforcement of this prerequisite is deferred to a future iteration
Contributor


The PerformanceProfile could check the infrastructure mode and report an error when dedicated is used and WP is not present.

Contributor


(better than a hook I think)

Contributor Author


Hmm, checking the kubelet restrict-reserved option should be possible as well, because we own the KubeletConfig.

Are you aware of a case where a PerformanceProfile is applied but the KubeletConfig is managed by a different component?

Contributor

@jmencak jmencak left a comment


Looks good to me overall. Have a couple of questions to improve my understanding of the problem and found a few nits.

- Ensure the feature is orthogonal to existing dynamic OVS pinning — both modes must be able to
coexist in the cluster, with the choice made per PerformanceProfile.
- Integrate with TuneD so that dedicated CPUs are added to `isolcpus` and receive the same
kernel-level isolation as existing isolated CPUs.
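
In tuned.conf terms, the kernel-level piece could look roughly like the following fragment. This is illustrative only; the profile the operator actually generates, and the variable names it uses, will differ:

```ini
[bootloader]
# Append dedicated CPUs (hypothetically 1 and 5) to the isolcpus kernel argument
cmdline_dedicated=+isolcpus=domain,managed_irq,1,5
```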
Contributor


as existing isolated CPUs

I believe I understand what is meant here, but I wonder if it would be clearer if we said something like:

kernel-level isolation as isolated CPU sets.

This would make it clear we are talking about the existing reserved vs. isolated PerformanceProfile API functionality mentioned earlier in this enhancement.

## Proposal

This proposal introduces two new fields to the PerformanceProfile API and corresponding changes
to the node-tuning-operator controllers that generate Kubelet configuration, TuneD profiles, and
Contributor


TuneD profiles

TuneD (daemon) profiles or Tuned resources (tuneds.tuned.openshift.io) or both?

Also:
Nit: s/node-tuning-operator/Node Tuning Operator/g

threads. Reserved CPU 0 and its sibling 4 handle system daemons. The remaining CPUs
(2-3, 6-7) are isolated for application workloads.
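
The layout above partitions an 8-CPU node into reserved (0,4), dedicated, and isolated (2-3,6-7) pools; assuming the dedicated pool is CPUs 1 and 5 (implied but not shown in this excerpt), a quick sketch of the disjointness-and-coverage invariant such a partition must satisfy (illustrative, not operator code):

```python
def parse(cpus: str) -> set[int]:
    """Expand a CPU list like "2-3,6-7" into a set of CPU ids."""
    out: set[int] = set()
    for part in cpus.split(","):
        lo, _, hi = part.partition("-")
        out.update(range(int(lo), int(hi or lo) + 1))
    return out

reserved, dedicated, isolated = parse("0,4"), parse("1,5"), parse("2-3,6-7")

# The three pools must not overlap, and together must cover all online CPUs.
assert not (reserved & dedicated or reserved & isolated or dedicated & isolated)
assert reserved | dedicated | isolated == set(range(8))
```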

3. The node-tuning-operator reconciles the PerformanceProfile and generates:
Contributor


Nit:
s/node-tuning-operator/Node Tuning Operator/g

There are several more occurrences of this throughout this enhancement. Please replace all apart from links of course.


### Topology Considerations

#### Hypershift / Hosted Control Planes
Contributor


Nit:
s/Hypershift/HyperShift/g

The TuneD profile updates the systemd CPU affinity mask to exclude dedicated CPUs. This is done
via the `[sysctl]` or `[systemd]` TuneD plugin, similar to how the existing `cpu-partitioning`
TuneD profile confines system services to housekeeping CPUs
(see [tuned cpu-partitioning profile](https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf#L28)
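
For reference, the cpu-partitioning approach amounts to confining the systemd manager (and thus its services) to housekeeping CPUs, conceptually equivalent to a drop-in like this (path and values illustrative; the actual profile applies it via TuneD's systemd plugin):

```ini
# /etc/systemd/system.conf.d/99-cpu-affinity.conf (illustrative path)
[Manager]
CPUAffinity=0 4
```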
Contributor


This is true; however, we're also setting the systemd.cpu_affinity kernel command-line parameter. From what I recall, this was introduced to help reduce "early timers". It might be worth mentioning so that we have more complete information here.


### Non-Goals

- Managing the lifecycle of OVS-DPDK processes themselves (PMD thread creation, DPDK EAL
Contributor


I understand this is a non-goal; however, I'd like to understand how the OVS-DPDK processes run in OpenShift. Are they not managed by the kubelet at all? Do they run as regular userspace processes outside of OpenShift's control?

