---
title: Memory saturation occurs after upgrade to Kubernetes 1.25
description: Resolve pod failures caused by memory saturation and out-of-memory errors after you upgrade an Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
ms.date: 04/17/2025
editor: v-jsitser,momajed
ms.reviewer: aritraghosh, cssakscic, v-leedennis,momajed
ms.service: azure-kubernetes-service
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---

- Similarly, if you're using .NET, upgrade to [.NET version 5.0](https://devblogs.microsoft.com/dotnet/announcing-net-5-0/#containers) or a later version.

- If you see a higher eviction rate on the pods, [use higher limits and requests for pods](/azure/aks/developer-best-practices-resource-management#define-pod-resource-requests-and-limits).

- `cgroup` v2 uses a different API than `cgroup` v1. If any applications directly access the `cgroup` file system, update them to later versions that support `cgroup` v2. For example:

  - **Third-party monitoring and security agents**:

    Some monitoring and security agents depend on the `cgroup` file system. Update these agents to versions that support `cgroup` v2.

  - **Java applications**:

    Use versions that fully support `cgroup` v2:
    - OpenJDK/HotSpot: `jdk8u372`, `11.0.16`, `15`, and later versions.
    - IBM Semeru Runtimes: `8.0.382.0`, `11.0.20.0`, `17.0.8.0`, and later versions.
    - IBM Java: `8.0.8.6` and later versions.

  - **uber-go/automaxprocs**:

    If you're using the `uber-go/automaxprocs` package, make sure that the version is `v1.5.1` or later.

- As a temporary alternative, you can revert the `cgroup` version on your nodes by using a DaemonSet. For more information, see [Revert to cgroup v1 DaemonSet](https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml).

> [!IMPORTANT]
> - Use the DaemonSet cautiously. Test it in a lower environment before you apply it to production to ensure compatibility and prevent disruptions.
> - By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the `cgroup` change.
> - To control how the DaemonSet is applied, configure a `nodeSelector` to target specific nodes.
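For example, to limit the revert DaemonSet to one node pool, you can add a `nodeSelector` to its pod template. The following sketch is illustrative only: the `agentpool: nodepool1` label value is an assumption, so replace it with a label that actually matches the nodes that you want to target.

```yaml
# Excerpt of a DaemonSet pod template. Only the nodeSelector lines are the
# point of this sketch; the agentpool value is an example, not a fixed name.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        agentpool: nodepool1   # Hypothetical node pool label; adjust to your cluster.
```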

> [!NOTE]
> If you experience only an increase in memory use without any of the other symptoms that are mentioned in the "Symptoms" section, you don't have to take any action.
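To check which `cgroup` version a node is using (for example, before and after you apply the revert DaemonSet), you can inspect the file system type that's mounted at `/sys/fs/cgroup` from inside a pod that runs on that node. This is a sketch that assumes `kubectl` access and the public `ubuntu` image; pin the pod to a specific node with `nodeSelector` or `nodeName` if you need to check a particular node.

```shell
# Run a one-off pod and print the file system type of the cgroup mount.
kubectl run cgroup-check --rm -it --restart=Never --image=ubuntu -- \
    stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" indicates cgroup v2; "tmpfs" indicates cgroup v1.
```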
## Status

We're actively working with the Kubernetes community to resolve the underlying issue. You can track progress in [kubernetes/kubernetes issue #118916](https://github.com/kubernetes/kubernetes/issues/118916).

As part of the resolution, we plan to adjust the eviction thresholds or update [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.

## Reference

- [Node memory usage on cgroupv2 reported higher than cgroupv1](https://github.com/kubernetes/kubernetes/issues/118916) (GitHub issue)

[!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]