Commit b81f441

Merge branch 'main' into dapr
2 parents 08d4d4c + 47168b0 commit b81f441

337 files changed

Lines changed: 7843 additions & 4383 deletions


.openpublishing.redirection.json

Lines changed: 328 additions & 0 deletions
Large diffs are not rendered by default.

support/azure/.openpublishing.redirection.azure.json

Lines changed: 4 additions & 0 deletions
@@ -6299,6 +6299,10 @@
     {
       "source_path": "virtual-machines/linux/linux-vm-no-boot-hyper-v-driver-issues.md",
       "redirect_url": "/troubleshoot/azure/virtual-machines/linux/troubleshoot-lis-driver-issues-on-linux-vms"
+    },
+    {
+      "source_path": "azure-kubernetes/create-upgrade-delete/error-using-feature-requiring-virtual-machine-scale-set.md",
+      "redirect_url": "/troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes"
     }
   ]
 }

support/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Azure Kubernetes Service cluster/node is in a failed state
 description: Helps troubleshoot an issue where an Azure Kubernetes Service (AKS) cluster/node is in a failed state.
-ms.date: 04/01/2024
+ms.date: 03/10/2025
 ms.reviewer: chiragpa, nickoman, v-weizhu, v-six, aritraghosh
 ms.service: azure-kubernetes-service
 keywords:
@@ -114,7 +114,7 @@ If you prefer to use Azure CLI to view the activity log for a failed cluster, fo
 
 In the Azure portal, navigate to your AKS cluster resource and select **Diagnose and solve problems** from the left menu. You'll see a list of categories and scenarios that you can select to run diagnostic checks and get recommended solutions.
 
-In the Azure CLI, use the `az aks collect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
+In the Azure CLI, use the `az aks kollect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
 
 In the **Diagnose and Solve Problems** blade, you can select **Cluster Issues** as the category. If any issues are detected, you'll see a list of possible solutions that you can follow to fix them.
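The `az aks kollect` invocation described above might be composed as in the following sketch. All resource names are placeholders, and the sketch only prints the command instead of executing it, because a real run requires an Azure subscription and the Azure CLI:

```shell
# Sketch of the `az aks kollect` invocation described above.
# CLUSTER_NAME, RESOURCE_GROUP, and STORAGE_ACCOUNT are placeholders.
CLUSTER_NAME="myAKSCluster"
RESOURCE_GROUP="myResourceGroup"
STORAGE_ACCOUNT="mydiagstore"

CMD="az aks kollect --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --storage-account $STORAGE_ACCOUNT"
# Print instead of executing, so the sketch is safe to run anywhere.
echo "$CMD"
```

Against a live cluster, run the printed command directly; the collected diagnostic data is uploaded to the specified storage account.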

support/azure/azure-kubernetes/availability-performance/node-not-ready-then-recovers.md

Lines changed: 10 additions & 5 deletions
@@ -1,19 +1,19 @@
 ---
 title: Node not ready but then recovers
 description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
-ms.date: 12/09/2024
-ms.reviewer: rissing, chiragpa, momajed, v-leedennis
+ms.date: 2/25/2024
+ms.reviewer: rissing, chiragpa, momajed, v-leedennis, novictor
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries
 
-This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "NotReady" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
 
 ## Cause
 
-There are several scenarios that could cause a "Not Ready" state to occur:
+There are several scenarios that could cause a "NotReady" state to occur:
 
 - The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
 
@@ -24,7 +24,12 @@ There are several scenarios that could cause a "Not Ready" state to occur:
 
 ## Resolution
 
-Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+To resolve this issue, follow these steps:
+
+1. Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the issue.
+2. Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+3. Verify the node's network configuration to make sure that there are no connectivity issues.
+4. Check the node's resource usage, such as CPU, memory, and disk, to identify potential constraints. For more information, see [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze#view-performance-directly-from-a-cluster).
 
 For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
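The first resolution step (scanning `kubectl describe node` output for problem conditions) can be sketched as follows. The sample conditions text is hypothetical; against a live cluster, pipe the real `kubectl describe node <node-name>` output instead:

```shell
# Filter node conditions for signs of trouble. The here-doc below is a
# hypothetical sample; in practice, replace sample_conditions with:
#   kubectl describe node <node-name>
sample_conditions() {
cat <<'EOF'
  Ready            False   KubeletNotReady  PLEG is not healthy
  MemoryPressure   False   KubeletHasSufficientMemory
  DiskPressure     False   KubeletHasNoDiskPressure
EOF
}

# Flag any condition that suggests the node is unhealthy.
sample_conditions | grep -E 'NotReady|Pressure.*True' || echo "no obvious problems"
```

Here the filter surfaces the `KubeletNotReady` line, which points to the kubelet (not the workload) as the component to investigate.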

support/azure/azure-kubernetes/connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md

Lines changed: 5 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
 title: Can't access the cluster API server using authorized IP ranges
 description: Troubleshoot problems accessing the cluster API server when you use authorized IP address ranges in Azure Kubernetes Service (AKS).
-ms.date: 11/18/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis
+ms.date: 03/26/2025
+ms.reviewer: chiragpa, nickoman, wonkilee, v-leedennis
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot access issues to the cluster API server when I use authorized IP address ranges so that I can work with my Azure Kubernetes Service (AKS) cluster successfully.
@@ -14,7 +14,9 @@ This article discusses how to resolve a scenario in which you can't use authoriz
 
 ## Symptoms
 
-If you try to create or manage an AKS cluster, you can't access the cluster API server.
+If you try to create or manage resources in an AKS cluster, you can't access the cluster API server. When you run `kubectl`, you receive the following error message:
+
+> Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
 
 ## Cause
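A common first check for this symptom is confirming which public IP you actually egress from, then comparing it against the cluster's authorized ranges. A minimal sketch; the lookup endpoint and the fallback address are illustrative, not part of the article:

```shell
# Determine the public IP you egress from. The lookup service and the
# fallback placeholder value are illustrative assumptions.
MY_IP=$(curl -fsS --max-time 5 https://ifconfig.me 2>/dev/null || echo "203.0.113.10")
echo "Egress IP: $MY_IP"

# Then list the cluster's authorized ranges (requires the Azure CLI and
# access to the cluster's resource group):
#   az aks show --resource-group <rg> --name <cluster> \
#     --query apiServerAccessProfile.authorizedIpRanges --output tsv
```

If the egress IP isn't inside any authorized range, the `dial tcp ... i/o timeout` symptom above is expected.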

support/azure/azure-kubernetes/connectivity/error-from-server-error-dialing-backend-dial-tcp.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: 'Error from server: error dialing backend: dial tcp'
 description: 'Troubleshoot the error dialing backend: dial tcp error that blocks you from using kubectl commands or other tools when you connect to the API server.'
-ms.date: 10/21/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis, pihe
+ms.date: 03/05/2025
+ms.reviewer: chiragpa, nickoman, v-leedennis, pihe, mariusbutuc
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot the "Error from server: error dialing backend: dial tcp" error so that I can connect to the API server or use the `kubectl logs` command to get logs.

support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 4 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: Tunnel connectivity issues
 description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
-ms.date: 09/26/2024
-ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu
+ms.date: 03/23/2025
+ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
 ms.service: azure-kubernetes-service
 keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
 #Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
@@ -29,6 +29,8 @@ You receive an error message that resembles the following examples about port 10
 
 > Error from server: error dialing backend: dial tcp \<aks-node-ip>:10250: i/o timeout
 
+> Error from server: Get "https\://\<aks-node-name>:10250/containerLogs/\<namespace>/\<pod-name>/\<container-name>": http: server gave HTTP response to HTTPS client
+
 The Kubernetes API server uses port 10250 to connect to a node's kubelet to retrieve the logs. If port 10250 is blocked, the kubectl logs and other features will only work for pods that run on the nodes in which the tunnel component is scheduled. For more information, see [Kubernetes ports and protocols: Worker nodes](https://kubernetes.io/docs/reference/ports-and-protocols/#node).
 
 Because the tunnel components or the connectivity between the server and client can't be established, functionality such as the following won't work as expected:
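Whether port 10250 is reachable from a given network location can be checked with a quick TCP probe. A sketch assuming a bash-compatible shell with coreutils `timeout`; the node IP is a placeholder:

```shell
# Probe a node's kubelet port (10250). NODE_IP is a placeholder; from a
# location where the port is blocked, the probe times out, which is the
# expected failure mode described above.
NODE_IP="10.240.0.4"
if timeout 3 bash -c "exec 3<>/dev/tcp/$NODE_IP/10250" 2>/dev/null; then
  echo "port 10250 reachable"
else
  echo "port 10250 blocked or host unreachable"
fi
```

Run this from a pod or jump host inside the cluster's network to distinguish an NSG/firewall block from a tunnel-component failure.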

support/azure/azure-kubernetes/create-upgrade-delete/aks-common-issues-faq.yml

Lines changed: 7 additions & 5 deletions
@@ -3,8 +3,8 @@ metadata:
   title: Azure Kubernetes Service (AKS) common issues FAQ
   description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
   ms.topic: faq
-  ms.date: 11/14/2023
-  ms.reviewer: chiragpa, nickoman, v-leedennis
+  ms.date: 03/06/2025
+  ms.reviewer: chiragpa, nickoman, jotavar, v-leedennis, v-weizhu
   ms.service: azure-kubernetes-service
   ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)

@@ -26,8 +26,7 @@ sections:
   - question: |
       Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
     answer: |
-      If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
-
+      No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see [Operations FAQ](/azure/aks/faq#operations).
- question: |
3231
What naming restrictions are enforced for AKS resources and parameters?
3332
answer: |
@@ -42,7 +41,10 @@ sections:
       - AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.
 
       - The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.
-
+
+      For more information about naming conventions, see the following resources:
+      - [Naming rules and restrictions for Azure resources](/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
+      - [Abbreviation recommendations for Azure resources](/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
 additionalContent: |
   [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]

support/azure/azure-kubernetes/create-upgrade-delete/aks-increased-memory-usage-cgroup-v2.md

Lines changed: 31 additions & 5 deletions
@@ -1,8 +1,8 @@
 ---
 title: Increased memory usage reported in Kubernetes 1.25 or later versions
 description: Resolve an increase in memory usage that's reported after you upgrade an Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
-ms.date: 07/13/2023
-editor: v-jsitser
+ms.date: 03/03/2025
+editor: momajed
 ms.reviewer: aritraghosh, cssakscic, v-leedennis
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -23,23 +23,49 @@ You experience one or more of the following symptoms:
 
 ## Cause
 
-This increase is caused by a change in memory accounting within version 2 of the Linux control group (cgroup) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default cgroup version for Kubernetes 1.25 on AKS.
+This increase is caused by a change in memory accounting within version 2 of the Linux control group (`cgroup`) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default `cgroup` version for Kubernetes 1.25 on AKS.
 
 > [!NOTE]
-> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of cgroup v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
+> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of `cgroup` v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
 
 ## Solution
 
 - If you observe frequent memory pressure on the nodes, upgrade your subscription to increase the amount of memory that's available to your virtual machines (VMs).
 
 - If you see a higher eviction rate on the pods, [use higher limits and requests for pods](/azure/aks/developer-best-practices-resource-management#define-pod-resource-requests-and-limits).
 
+- `cgroup` v2 uses a different API than `cgroup` v1. If there are any applications that directly access the `cgroup` file system, update them to later versions that support `cgroup` v2. For example:
+
+  - **Third-party monitoring and security agents**:
+
+    Some monitoring and security agents depend on the `cgroup` file system. Update these agents to versions that support `cgroup` v2.
+
+  - **Java applications**:
+
+    Use versions that fully support `cgroup` v2:
+    - OpenJDK/HotSpot: `jdk8u372`, `11.0.16`, `15`, and later versions.
+    - IBM Semeru Runtimes: `8.0.382.0`, `11.0.20.0`, `17.0.8.0`, and later versions.
+    - IBM Java: `8.0.8.6` and later versions.
+
+  - **uber-go/automaxprocs**:
+
+    If you're using the `uber-go/automaxprocs` package, ensure that the version is `v1.5.1` or later.
+
+- An alternative temporary solution is to revert the `cgroup` version on your nodes by using a DaemonSet. For more information, see [Revert to cgroup v1 DaemonSet](https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml).
+
+  > [!IMPORTANT]
+  > - Use the DaemonSet cautiously. Test it in a lower environment before applying it to production to ensure compatibility and prevent disruptions.
+  > - By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the `cgroup` change.
+  > - To control how the DaemonSet is applied, configure a `nodeSelector` to target specific nodes.
 
 > [!NOTE]
 > If you experience only an increase in memory use without any of the other symptoms that are mentioned in the "Symptoms" section, you don't have to take any action.
 
 ## Status
 
-We're actively working with the Kubernetes community to fix the underlying issue, and we'll keep you updated on our progress. We also plan to change the eviction thresholds or [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
+We're actively working with the Kubernetes community to resolve the underlying issue. Progress on this effort can be tracked at [kubernetes/kubernetes#118916](https://github.com/kubernetes/kubernetes/issues/118916).
+
+As part of the resolution, we plan to adjust the eviction thresholds or update [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
 
 ## Reference
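If you're unsure which `cgroup` version a node is running, a quick check is to inspect the filesystem type mounted at `/sys/fs/cgroup` on the node (for example, from a debug pod started with `kubectl debug node/<node-name>`). A sketch assuming a Linux host with coreutils `stat`:

```shell
# Report which cgroup version this Linux host uses: cgroup2fs indicates
# cgroup v2; tmpfs indicates the cgroup v1 hierarchy.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unknown")
case "$fstype" in
  cgroup2fs) echo "cgroup v2" ;;
  tmpfs)     echo "cgroup v1" ;;
  *)         echo "undetermined: $fstype" ;;
esac
```

On an AKS node running Kubernetes 1.25 or later, this would be expected to report `cgroup v2`.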

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+title: Troubleshoot the Throttled Error Code (429)
+description: Learn how to resolve the Throttled error (status 429) when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.date: 03/05/2025
+ms.reviewer: jovieir, chiragpa, v-weizhu
+ms.service: azure-kubernetes-service
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the Throttled error code (status 429) so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
+---
+# Troubleshoot the Throttled error code (429)
+
+This article discusses how to identify and resolve the `Throttled` error (status 429) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.
+
+## Symptoms
+
+When you try to create an AKS cluster, you receive the following "The PutManagedClusterHandler.PUT request limit has been exceeded" error message that shows a "SubCode" value of **Throttled** and a "Status" value of **429**:
+
+> Category: ClientError;
+>
+> SubCode: Throttled;
+>
+> OrginalError: autorest/azure: Service returned an error. **Status=429**
+>
+> **Code="Throttled"**
+>
+> Message="> The PutManagedClusterHandler.PUT request limit has been exceeded for SubID='*\<subscription-id-guid>*', please retry again in X seconds. For more information, please visit aka.ms/aks/throttling";
+
+Request throttling can occur on various Azure components, so the error message might be different depending on the type of resource where this issue occurs.
+
+Resource provider throttling is independent of Azure Resource Manager (ARM) throttling and is tailored to the operations of a specific resource provider. In this scenario, the throttling is specific to the AKS resource provider and applies only to operations related to AKS resources.
+
+## Cause
+
+AKS requests are throttled. For information about how AKS limits work and the specific limits per hour, see [Throttling limits on AKS resource provider APIs](/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis).
+
+## Solution
+
+To resolve this issue, examine and modify the access pattern of the throttled subscription. The following table lists the possible access patterns and corresponding solutions.
+
+| Access pattern | Solution |
+| -------------- | -------- |
+| Automated scripts constantly run LIST operations against managedCluster resources. | Run the scripts less frequently. |
+| Users attempt to deploy multiple AKS clusters in a short period of time. | Space out deployments or use different subscriptions. |
+| Users attempt to modify the same AKS cluster multiple times consecutively. | Space out operations. Ensure successful completion before initiating another one. |
+| Users attempt to add, modify, or delete one or more agentPools on the same AKS cluster. | Space out operations. Ensure successful completion before initiating another one. |
+
+## More information
+
+[General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)
+
+[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
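The "space out operations" advice in the new article's solution table amounts to retrying with a backoff once a 429 is returned. A minimal sketch of that pattern; `do_request` is a hypothetical stand-in for the AKS API call and is simulated here so that the sketch runs anywhere:

```shell
# Retry with exponential backoff after simulated throttling (HTTP 429).
# do_request stands in for the real AKS operation; it "succeeds" on the
# third attempt to illustrate the retry loop.
attempt=0
do_request() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # simulated: fail twice, then succeed
}

delay=1
while ! do_request; do
  echo "throttled; retrying in ${delay}s"
  sleep "$delay"
  delay=$((delay * 2))   # exponential backoff between retries
done
echo "succeeded after $attempt attempts"
```

In a real script, honor any "retry again in X seconds" hint from the error message rather than a fixed starting delay.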
