CNF-23565: Dedicate CPU resources for DPDK-based vSwitch/vRouter #2001
Tal-or wants to merge 1 commit into
@Tal-or: This pull request references CNF-23565 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED
…Router
Adds enhancement proposal for dedicating CPUs exclusively for infrastructure networking workloads (OVS-DPDK, OpenPErouter). Introduces two new PerformanceProfile API fields: spec.cpu.dedicated and spec.net.disableOvsDynamicPinning.
Tracking: CNF-22582, RFE-8921
AIA Human-AI blend, New content, Human-initiated, Reviewed, Claude Opus 4.6 v1.0
Signed-off-by: Talor Itzhak <[email protected]>
192566c to c941e7f
@Tal-or: all tests passed!
| scheduling (all QoS classes), OS daemons, and kernel housekeeping.
| - Automatically ban dedicated CPUs from irqbalance and configure `isolcpus=domain,managed_irq`
| to prevent hardware interrupts and kernel scheduler interference on dedicated CPUs.
| - Provide the ability to disable OVN-Kubernetes dynamic OVS thread pinning when static CPU
I would reword this.
OVS dynamic pinning and OVS-DPDK can coexist; it is just that the user probably does not need two high-performance networking stacks. So this option is not related to dedicated CPUs per se.
And disabling dynamic pinning by default would prevent alternative use cases where, e.g., industrial controller apps run on dedicated CPUs but the network still uses classic OVS.
OVS dynamic pinning and OVS-DPDK can coexist; it is just that the user probably does not need two high-performance networking stacks. So this option is not related to dedicated CPUs per se.
But having OVS's systemd services affined to the dedicated CPUs might impact performance if OVS-DPDK is being used (unless the OVS networking stack is shut down completely).
I agree about the rest.
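For readers following this thread, here is a minimal Python sketch of what the quoted bullet describes: deriving an `isolcpus=domain,managed_irq,...` kernel argument and a hex CPU bitmask of the kind an irqbalance ban setting accepts. The helper names and the sample dedicated list `1,5` are assumptions made for illustration; none of this is taken from the enhancement text or from the Node Tuning Operator code.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Parse a kernel-style CPU list such as "1,4-6" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Collapse a set of CPU ids back into a compact list such as "1,4-6"."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def isolcpus_arg(cpus: set[int]) -> str:
    """Build the kernel argument with the flags named in the quoted bullet."""
    return "isolcpus=domain,managed_irq," + format_cpu_list(cpus)


def irq_ban_mask(cpus: set[int]) -> str:
    """Hex bitmask with one bit per banned CPU (no 32-bit grouping here)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


dedicated = parse_cpu_list("1,5")  # hypothetical dedicated set
print(isolcpus_arg(dedicated))
print(irq_ban_mask(dedicated))
```

The sketch keeps the two outputs separate on purpose: the kernel argument and the irqbalance ban are independent mechanisms, which matches the point in the thread that the OVS pinning switch is a different concern from CPU isolation itself.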
| these, Burstable and BestEffort QoS pods can still be scheduled on dedicated CPUs through
| kernel cpuset inheritance, breaking the isolation guarantee.
|
| **Note:** Validation webhook enforcement of this prerequisite is deferred to a future iteration
The PerformanceProfile could check the infrastructure mode and report an error when dedicated is used and WP is not present.
(better than a hook I think)
Hmm, and checking the kubelet restrict-reserved option should be possible as well, because we own the kubeletconfig.
Are you aware of a case where a PP is applied but the kubeletconfig is managed by a different component?
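A reconcile-time check along the lines suggested in this thread might look like the Python sketch below. Everything in it is hypothetical: the dictionary shape of the profile, the `cpu_partitioning` argument and its `AllNodes` value, and the `restrict_reserved` flag are stand-ins for whatever the operator would actually read, and "WP" is read here as workload partitioning, which is itself an assumption.

```python
def validate_dedicated(profile: dict, cpu_partitioning: str,
                       restrict_reserved: bool) -> list[str]:
    """Return human-readable errors; an empty list means the profile passes.

    profile           -- PerformanceProfile as a plain dict (hypothetical shape)
    cpu_partitioning  -- cluster partitioning mode (assumed: "AllNodes" = enabled)
    restrict_reserved -- whether the kubelet option from the thread is set
    """
    errors: list[str] = []
    dedicated = profile.get("spec", {}).get("cpu", {}).get("dedicated")
    if not dedicated:
        # Nothing to validate when the new field is unset.
        return errors
    if cpu_partitioning != "AllNodes":
        errors.append(
            "spec.cpu.dedicated is set but workload partitioning is not enabled"
        )
    if not restrict_reserved:
        errors.append(
            "spec.cpu.dedicated is set but the kubelet restrict-reserved "
            "option is not enabled"
        )
    return errors


profile = {"spec": {"cpu": {"reserved": "0,4", "dedicated": "1,5"}}}
for problem in validate_dedicated(profile, "None", restrict_reserved=True):
    print(problem)
```

Returning a list rather than raising on the first failure would let the controller surface every unmet prerequisite in a single status condition.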
jmencak left a comment
Looks good to me overall. Have a couple of questions to improve my understanding of the problem and found a few nits.
| - Ensure the feature is orthogonal to existing dynamic OVS pinning — both modes must be able to
| coexist in the cluster, with the choice made per PerformanceProfile.
| - Integrate with TuneD so that dedicated CPUs are added to `isolcpus` and receive the same
| kernel-level isolation as existing isolated CPUs.
as existing isolated CPUs
I believe I understand what is meant here, but I wonder if it would be clearer if we said something like:
kernel-level isolation as isolated CPU sets.
This would make it clear we talk about the existing reserved vs. isolated PerformanceProfile API functionality -- mentioned in this enhancement above.
| ## Proposal
|
| This proposal introduces two new fields to the PerformanceProfile API and corresponding changes
| to the node-tuning-operator controllers that generate Kubelet configuration, TuneD profiles, and
TuneD profiles
TuneD (daemon) profiles or Tuned resources (tuneds.tuned.openshift.io) or both?
Also:
Nit: s/node-tuning-operator/Node Tuning Operator/g
| threads. Reserved CPU 0 and its sibling 4 handle system daemons. The remaining CPUs
| (2-3, 6-7) are isolated for application workloads.
|
| 3. The node-tuning-operator reconciles the PerformanceProfile and generates:
Nit:
s/node-tuning-operator/Node Tuning Operator/g
There are several more occurrences of this throughout this enhancement. Please replace all apart from links of course.
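The CPU layout in the excerpt quoted above can be checked mechanically. A minimal Python sketch follows, assuming an 8-CPU node and assuming the dedicated set is `1,5`; the quoted excerpt does not show the dedicated list, so that value is inferred here as whatever remains after the reserved (`0,4`) and isolated (`2-3,6-7`) sets. The function names are illustrative only.

```python
from itertools import combinations


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a CPU list such as "2-3,6-7" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def check_partition(all_cpus: set[int], **sets: set[int]) -> list[str]:
    """Report overlaps between named CPU sets and any CPUs left unassigned."""
    problems = []
    for (a_name, a), (b_name, b) in combinations(sets.items(), 2):
        overlap = a & b
        if overlap:
            problems.append(f"{a_name} and {b_name} overlap on {sorted(overlap)}")
    unassigned = all_cpus - set().union(*sets.values())
    if unassigned:
        problems.append(f"unassigned CPUs: {sorted(unassigned)}")
    return problems


problems = check_partition(
    set(range(8)),                         # assumed 8-CPU node
    reserved=parse_cpu_list("0,4"),
    isolated=parse_cpu_list("2-3,6-7"),
    dedicated=parse_cpu_list("1,5"),       # inferred, not in the excerpt
)
print(problems or "layout is a clean partition")
```

Whether unassigned CPUs should count as an error is a design choice the excerpt does not settle; the sketch simply reports them.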
| ### Topology Considerations
|
| #### Hypershift / Hosted Control Planes
Nit:
s/Hypershift/HyperShift/g
| The TuneD profile updates the systemd CPU affinity mask to exclude dedicated CPUs. This is done
| via the `[sysctl]` or `[systemd]` TuneD plugin, similar to how the existing `cpu-partitioning`
| TuneD profile confines system services to housekeeping CPUs
| (see [tuned cpu-partitioning profile](https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf#L28)
This is true; however, we're also setting the systemd.cpu_affinity kernel command-line parameter. FWIR, this was introduced to help reduce "early timers". It might be worth a mention so that we have more complete information here.
| ### Non-Goals
|
| - Managing the lifecycle of OVS-DPDK processes themselves (PMD thread creation, DPDK EAL
I understand this is a non-goal, however, I'd like to understand how the OVS-DPDK processes run in OpenShift. So, they're not managed by kubelet at all? Do they run as regular userspace processes outside of OpenShift control?