Commit 773bc29

Merge pull request #8381 from MicrosoftDocs/main
Auto push to live 2025-03-05 18:00:02
2 parents 865e462 + ffd12bc commit 773bc29

4 files changed

Lines changed: 128 additions & 21 deletions

File tree

support/azure/azure-kubernetes/create-upgrade-delete/aks-common-issues-faq.yml

Lines changed: 7 additions & 5 deletions
```diff
@@ -3,8 +3,8 @@ metadata:
   title: Azure Kubernetes Service (AKS) common issues FAQ
   description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
   ms.topic: faq
-  ms.date: 11/14/2023
-  ms.reviewer: chiragpa, nickoman, v-leedennis
+  ms.date: 03/06/2025
+  ms.reviewer: chiragpa, nickoman, jotavar, v-leedennis, v-weizhu
   ms.service: azure-kubernetes-service
   ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -26,8 +26,7 @@ sections:
   - question: |
       Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
     answer: |
-      If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
-
+      No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see [Operations FAQ](/azure/aks/faq#operations).
   - question: |
       What naming restrictions are enforced for AKS resources and parameters?
     answer: |
@@ -42,7 +41,10 @@ sections:
       - AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.

      - The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.
-
+
+      For more information about naming conventions, see the following resources:
+      - [Naming rules and restrictions for Azure resources](/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
+      - [Abbreviation recommendations for Azure resources](/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
   additionalContent: |
     [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
```
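The node pool naming rules added above are easy to get wrong. As a quick sketch, the stated constraints (all lowercase, starts with a letter, letters and digits only, 1-12 characters for Linux pools and 1-6 for Windows pools) can be checked with a small helper; this is a hypothetical illustration, not part of any AKS tooling:

```python
import re

def is_valid_nodepool_name(name: str, os_type: str = "Linux") -> bool:
    """Validate an AKS node pool name against the documented rules:
    all lowercase, starts with a letter, only letters and digits,
    1-12 characters for Linux pools and 1-6 for Windows pools."""
    max_len = 12 if os_type == "Linux" else 6
    return bool(re.fullmatch(r"[a-z][a-z0-9]*", name)) and len(name) <= max_len

print(is_valid_nodepool_name("nodepool1"))            # True
print(is_valid_nodepool_name("NodePool1"))            # False: uppercase letters
print(is_valid_nodepool_name("win1234", "Windows"))   # False: 7 characters > 6
```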
Lines changed: 50 additions & 0 deletions
```diff
@@ -0,0 +1,50 @@
+---
+title: Troubleshoot the Throttled Error Code (429)
+description: Learn how to resolve the Throttled error (status 429) when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.date: 03/05/2025
+ms.reviewer: jovieir, chiragpa, v-weizhu
+ms.service: azure-kubernetes-service
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the Throttled error code (status 429) so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
+---
+# Troubleshoot the Throttled error code (429)
+
+This article discusses how to identify and resolve the `Throttled` error (status 429) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.
+
+## Symptoms
+
+When you try to create an AKS cluster, you receive the following "The PutManagedClusterHandler.PUT request limit has been exceeded" error message that shows a "SubCode" value of **Throttled** and a "Status" value of **429**:
+
+> Category: ClientError;
+>
+> SubCode: Throttled;
+>
+> OrginalError: autorest/azure: Service returned an error. **Status=429**
+>
+> **Code="Throttled"**
+>
+> Message="> The PutManagedClusterHandler.PUT request limit has been exceeded for SubID='*\<subscription-id-guid>*', please retry again in X seconds. For more information, please visit aka.ms/aks/throttling";
+
+Request throttling can occur on various Azure components, so the error message might differ depending on the type of resource where this issue occurs.
+
+Resource provider throttling is independent of Azure Resource Manager (ARM) throttling and is tailored to the operations of a specific resource provider. In this scenario, the throttling is specific to the AKS resource provider and applies only to operations on AKS resources.
+
+## Cause
+
+AKS requests are throttled. For information about how AKS limits work and the specific limits per hour, see [Throttling limits on AKS resource provider APIs](/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis).
+
+## Solution
+
+To resolve this issue, examine and modify the access pattern of the throttled subscription. The following table lists the possible access patterns and corresponding solutions.
+
+| Access pattern | Solution |
+| -------------- | -------- |
+| Automated scripts constantly run LIST operations against managedCluster resources. | Run the scripts less frequently. |
+| Users attempt to deploy multiple AKS clusters in a short period of time. | Space out deployments or use different subscriptions. |
+| Users attempt to modify the same AKS cluster multiple times consecutively. | Space out operations. Ensure successful completion before initiating another one. |
+| Users attempt to add, modify, or delete one or more agentPools on the same AKS cluster. | Space out operations. Ensure successful completion before initiating another one. |
+
+## More information
+
+[General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)
+
+[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
```
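The "space out operations" advice in the table above amounts to client-side retry with backoff, honoring the "retry again in X seconds" hint when the service provides one. A minimal sketch, where the `submit` callable and the `ThrottledError` type are illustrative stand-ins rather than AKS SDK specifics:

```python
import time

class ThrottledError(Exception):
    """Illustrative stand-in for a 429 response from a throttled API."""
    def __init__(self, retry_after=None):
        self.retry_after = retry_after

def call_with_backoff(submit, max_attempts=5, base_delay=1.0):
    """Retry a throttled operation, doubling the wait after each 429.

    `submit` is any callable that raises ThrottledError on a 429 response.
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise
            # Honor the service's retry hint if present; otherwise back off exponentially.
            delay = err.retry_after or base_delay * (2 ** attempt)
            time.sleep(delay)
```

A real client would wrap the actual deployment call (for example, an ARM SDK request) in `submit` and read the retry hint from the 429 response headers.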
Lines changed: 69 additions & 15 deletions
```diff
@@ -1,9 +1,9 @@
 ---
 title: Troubleshoot UpgradeFailed errors due to eviction failures caused by PDBs
 description: Learn how to troubleshoot UpgradeFailed errors due to eviction failures caused by Pod Disruption Budgets when you try to upgrade an Azure Kubernetes Service cluster.
-ms.date: 02/23/2025
+ms.date: 03/06/2025
 editor: v-jsitser
-ms.reviewer: chiragpa, v-leedennis, v-weizhu
+ms.reviewer: chiragpa, jotavar, v-leedennis, v-weizhu
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 #Customer intent: As an Azure Kubernetes Services (AKS) user, I want to troubleshoot an Azure Kubernetes Service cluster upgrade that failed because of eviction failures caused by Pod Disruption Budgets so that I can upgrade the cluster successfully.
```
```diff
@@ -15,44 +15,98 @@ This article discusses how to identify and resolve UpgradeFailed errors due to e
 
 ## Prerequisites
 
-This article requires Azure CLI version 2.0.65 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
+This article requires Azure CLI version 2.67.0 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
 
 For more detailed information about the upgrade process, see the "Upgrade an AKS cluster" section in [Upgrade an Azure Kubernetes Service (AKS) cluster](/azure/aks/upgrade-cluster#upgrade-an-aks-cluster).
 
 ## Symptoms
 
-An AKS cluster upgrade operation fails with the following error message:
+An AKS cluster upgrade operation fails with one of the following error messages:
 
-> Code: UpgradeFailed
-> Message: Drain node \<node-name> failed when evicting pod \<pod-name>. Eviction failed with Too many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See `http://aka.ms/aks/debugdrainfailures`. Original error: API call to Kubernetes API Server failed.
+- > (UpgradeFailed) Drain `node aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.
+
+- > Code: UpgradeFailed
+  > Message: Drain node `aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.
 
 ## Cause
 
-This error might occur if a pod is protected by the Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained.
+This error might occur if a pod is protected by the Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained. After several attempts, the upgrade operation fails, and the cluster or node pool falls into a `Failed` state.
+
+Check the `ALLOWED DISRUPTIONS` value in the PDB configuration. The value should be `1` or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets). For example, you can check the workload and its PDB as follows. Notice that the `ALLOWED DISRUPTIONS` column doesn't allow any disruption. If the `ALLOWED DISRUPTIONS` value is `0`, pods aren't evicted and the node drain fails during the upgrade process:
+
+```console
+$ kubectl get deployments.apps nginx
+NAME    READY   UP-TO-DATE   AVAILABLE   AGE
+nginx   2/2     2            2           62s
+
+$ kubectl get pod
+NAME                     READY   STATUS    RESTARTS   AGE
+nginx-7854ff8877-gbr4m   1/1     Running   0          68s
+nginx-7854ff8877-gnltd   1/1     Running   0          68s
+
+$ kubectl get pdb
+NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
+nginx-pdb   2               N/A               0                     24s
+```
 
-To test this situation, run `kubectl get pdb -A`, and then check the **Allowed Disruption** value. The value should be **1** or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets).
+You can also check for entries in the Kubernetes events by running `kubectl get events | grep -i drain`. Output that contains the message "Eviction blocked by Too Many Requests (usually a pdb)" confirms the cause:
+
+```console
+$ kubectl get events | grep -i drain
+LAST SEEN   TYPE      REASON   OBJECT                                         MESSAGE
+(...)
+32m         Normal    Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Draining node: aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx
+2m57s       Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+12m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+32m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+32m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+31m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+```
 
-If the **Allowed Disruption** value is **0**, the node drain will fail during the upgrade process.
```
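The restrictive state shown in the `kubectl get pdb` output above arises when a PDB's `minAvailable` equals the workload's replica count. As an illustration only, a minimal manifest that reproduces that state for the example `nginx` deployment might look like the following (the `app: nginx` selector label is an assumption, since the deployment's labels aren't shown):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2          # equals the deployment's replica count, so ALLOWED DISRUPTIONS is 0
  selector:
    matchLabels:
      app: nginx           # assumed label; match your deployment's pod labels
```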

```diff
 To resolve this issue, use one of the following solutions.
 
 ## Solution 1: Enable pods to drain
 
 1. Adjust the PDB to enable pod draining. Generally, the allowed disruption is controlled by the `Min Available / Max unavailable` or `Running pods / Replicas` parameter. You can modify the `Min Available / Max unavailable` parameter at the PDB level or increase the number of `Running pods / Replicas` to push the allowed disruption value to **1** or greater.
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
+
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
```
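For an integer `minAvailable` PDB, the allowed disruptions are essentially the healthy pods minus `minAvailable`, which is why either raising the replica count or lowering `minAvailable` unblocks the drain. A tiny sketch of that arithmetic:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Allowed disruptions for an integer minAvailable PDB: how many
    healthy pods can be evicted while keeping min_available running."""
    return max(0, healthy_pods - min_available)

# 2 replicas with minAvailable=2 blocks the drain...
print(allowed_disruptions(2, 2))  # 0
# ...while adding a replica or relaxing the PDB unblocks it.
print(allowed_disruptions(3, 2))  # 1
print(allowed_disruptions(2, 1))  # 1
```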

```diff
 ## Solution 2: Back up, delete, and redeploy the PDB
 
-1. Take a backup of the PDB `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb_backup.yaml`, and then delete the PDB `kubectl delete pdb <pdb-name> -n /<pdb-namespace>`. After the upgrade is finished, you can redeploy the PDB `kubectl apply -f pdb_backup.yaml`.
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+1. Take a backup of the PDB(s) by running `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb-name-backup.yaml`, and then delete the PDB by running `kubectl delete pdb <pdb-name> -n <pdb-namespace>`. After the new upgrade attempt is finished, you can redeploy the PDB by applying the backup file: `kubectl apply -f pdb-name-backup.yaml`.
+2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
+
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
```

```diff
-## Solution 3: Delete the pods that can't be drained
+## Solution 3: Delete the pods that can't be drained or scale the workload down to zero (0)
 
 1. Delete the pods that can't be drained.
 
    > [!NOTE]
-   > If the pods were created by a deployment or StatefulSet, they'll be controlled by a ReplicaSet. If that's the case, you might have to delete the deployment or StatefulSet. Before you do that, we recommend that you make a backup: `kubectl get <kubernetes-object> <name> -n <namespace> -o yaml > backup.yaml`.
+   > If the pods are created by a Deployment or StatefulSet, they're controlled by a ReplicaSet. If that's the case, you might have to delete the Deployment or StatefulSet, or scale its replicas to zero (0). Before you do that, we recommend that you make a backup: `kubectl get <deployment.apps -or- statefulset.apps> <name> -n <namespace> -o yaml > backup.yaml`.
+
+2. To scale down, you can run `kubectl scale --replicas=0 <deployment.apps -or- statefulset.apps> <name> -n <namespace>` before the reconciliation.
+
+3. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
 
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
 
 [!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
```

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -39,6 +39,8 @@
       href: create-upgrade-delete/error-code-serviceprincipalvalidationclienterror.md
     - name: SubscriptionRequestsThrottled error (429)
       href: create-upgrade-delete/error-code-subscriptionrequeststhrottled.md
+    - name: Throttled error (429)
+      href: create-upgrade-delete/error-code-aksrequeststhrottled.md
     - name: SubnetWithExternalResourcesCannotBeUsedByOtherResources error
       href: create-upgrade-delete/subnet-with-external-resources-cannot-be-used-by-other-resources.md
     - name: Troubleshoot the AKSCapacityError error code
@@ -370,4 +372,3 @@
       href: error-codes/vmextensionerror-vhdfilenotfound.md
     - name: UnsatisfiablePDB error
       href: error-codes/unsatisfiablepdb-error.md
-
```
