Commit adee6ae

Merge pull request #8731 from mosbahmajed/workitem-86530
AB#5508: Update aks-memory-saturation-after-upgrade.md
2 parents 2b3bfe0 + 4ee662a commit adee6ae

1 file changed

Lines changed: 39 additions & 3 deletions

support/azure/azure-kubernetes/create-upgrade-delete/aks-memory-saturation-after-upgrade.md

@@ -1,9 +1,9 @@
 ---
 title: Memory saturation occurs after upgrade to Kubernetes 1.25
 description: Resolve pod failures caused by memory saturation and out-of-memory errors after you upgrade an Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
-ms.date: 06/14/2023
+ms.date: 05/06/2025
 editor: v-jsitser
-ms.reviewer: aritraghosh, cssakscic, v-leedennis
+ms.reviewer: aritraghosh, cssakscic, v-leedennis, momajed
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 ---
@@ -34,13 +34,49 @@ Performance degradation can occur in apps that run in the following environments

 ## Solution

+> [!NOTE]
+> If you only experience increased memory usage and no other symptoms that are mentioned in the [Symptoms](#symptoms) section, no action is needed.
+
 Beginning in the release of Kubernetes 1.25, the [cgroup version 2 API](https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/) has reached general availability (GA). AKS now uses Ubuntu Linux version 22.04. By default, version 22.04 uses cgroup version 2 API. To make sure the cgroup version 2 API is available for use in other environments to prevent the memory saturation issue, follow this guidance:

 - If you run Java applications, [upgrade to a Java version that supports cgroup version 2](https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/#migrate-to-cgroup-v2) and follow the guidance in [Containerize your Java applications](/azure/developer/java/containers/overview). You might be able to update the base image in certain versions in which the fix has been backported. Use a version or framework that natively supports cgroup version 2. For Azure customers, Microsoft officially supports [Eclipse Temurin](https://adoptium.net/) binaries (Java 8) and [Microsoft Build of OpenJDK](https://www.microsoft.com/openjdk) binaries (Java 11+).

 - Similarly, if you're using .NET, upgrade to [.NET version 5.0](https://devblogs.microsoft.com/dotnet/announcing-net-5-0/#containers) or a later version.

-In addition, to enable pods to use more resources, increase their memory requests and limits.
+- If you see a higher eviction rate on the pods, [use higher limits and requests for the pods](/azure/aks/developer-best-practices-resource-management#define-pod-resource-requests-and-limits).
+
+- `cgroup` v2 uses a different API than `cgroup` v1. If there are any applications that directly access the `cgroup` file system, update them to later versions that support `cgroup` v2. For example:
+
+- **Third-party monitoring and security agents**:
+
+Some monitoring and security agents depend on the `cgroup` file system. Update these agents to versions that support `cgroup` v2.
+
+- **Java applications**:
+
+Use versions that fully support `cgroup` v2:
+- OpenJDK/HotSpot: `jdk8u372`, `11.0.16`, `15`, and later versions.
+- IBM Semeru Runtimes: `8.0.382.0`, `11.0.20.0`, `17.0.8.0`, and later versions.
+- IBM Java: `8.0.8.6` and later versions.
+
+- **uber-go/automaxprocs**:
+If you're using the `uber-go/automaxprocs` package, ensure the version is `v1.5.1` or later.
+
+- An alternative temporary solution is to revert the `cgroup` version on your nodes by using the DaemonSet. For more information, see [Revert to cgroup v1 DaemonSet](https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml).
+
+> [!IMPORTANT]
+> - Use the DaemonSet cautiously. Test it in a lower environment before applying to production to ensure compatibility and avoid disruptions.
+> - By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the `cgroup` change.
+> - To control how the DaemonSet is applied, configure a `nodeSelector` to target specific nodes.
+
+## Status
+
+Microsoft is working with the Kubernetes community to resolve the issue. Track progress at [Azure/AKS Issue #3443](https://github.com/kubernetes/kubernetes/issues/118916).
+
+As part of the resolution, the plan is to adjust the eviction thresholds or update [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
+
+## Reference
+
+- [Node memory usage on cgroupv2 reported higher than cgroupv1](https://github.com/kubernetes/kubernetes/issues/118916) (GitHub issue)

 [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]

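
The added guidance asks that applications reading the `cgroup` file system directly be updated for `cgroup` v2. As a rough illustration of what such an update involves, the following Python sketch tells the two layouts apart and reads the memory limit from the matching file. It is not part of the commit. The function names and the `root` parameter are hypothetical, and the sketch assumes the conventional mount point `/sys/fs/cgroup`, where `cgroup.controllers` exists only on the v2 unified hierarchy, `memory.max` holds the v2 limit, and `memory/memory.limit_in_bytes` holds the v1 limit.

```python
from pathlib import Path


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1.

    Assumption: the presence of `cgroup.controllers` directly under `root`
    is used as the v2 marker; anything else is treated as v1.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def read_memory_limit(root="/sys/fs/cgroup"):
    """Return the memory limit in bytes, or None if unlimited or unreadable."""
    base = Path(root)
    if detect_cgroup_version(root) == 2:
        limit_file = base / "memory.max"  # v2 file name
    else:
        limit_file = base / "memory" / "memory.limit_in_bytes"  # v1 file name
    try:
        raw = limit_file.read_text().strip()
    except OSError:
        # File missing or not readable (for example, on the host root cgroup).
        return None
    if raw == "max":
        # cgroup v2 writes the literal string "max" when no limit is set.
        return None
    try:
        return int(raw)
    except ValueError:
        return None


if __name__ == "__main__":
    print("cgroup version:", detect_cgroup_version())
    print("memory limit (bytes):", read_memory_limit())
```

Code that hard-codes only the v1 path silently finds no limit on a v2 node, which is one way a runtime can end up sizing its heap from the node's memory instead of the container's. On v1, an unset limit appears as a very large integer rather than `max`; this sketch returns that integer unchanged.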

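
The OpenJDK/HotSpot minimums listed in the diff (`jdk8u372`, `11.0.16`, `15`, and later) can be checked mechanically, for example in a CI step that inspects a base image. The Python sketch below is one possible reading of that list and is not part of the commit. The function name is hypothetical, it covers only OpenJDK/HotSpot version strings (not IBM Semeru or IBM Java), and it assumes that feature releases 9, 10, 12, 13, and 14 are unsupported because the list doesn't mention them.

```python
import re


def openjdk_supports_cgroup_v2(version: str) -> bool:
    """Best-effort check of an OpenJDK/HotSpot version string against the
    minimums jdk8u372, 11.0.16, and 15 quoted in the article."""
    # Java 8 uses the legacy schemes "1.8.0_372", "8u372", or "jdk8u372",
    # optionally followed by a build suffix such as "-b07".
    m = re.fullmatch(r"(?:jdk)?(?:1\.8\.0_|8u)(\d+)(?:-.*)?", version)
    if m:
        return int(m.group(1)) >= 372

    # Java 9 and later use "major[.minor[.patch]]", for example "11.0.16".
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if not m:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(g) if g else 0 for g in m.groups())

    if major >= 15:
        return True
    if major == 11:
        return (minor, patch) >= (0, 16)
    # Assumption: every other release line lacks full cgroup v2 support.
    return False


if __name__ == "__main__":
    for v in ("1.8.0_362", "jdk8u372", "11.0.15", "11.0.16", "14.0.2", "17.0.8"):
        print(v, openjdk_supports_cgroup_v2(v))
```

Vendor builds sometimes backport fixes under their own numbering, so a result of `False` from this sketch is a prompt to check the vendor's release notes, not a definitive answer.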