Commit 773bc29

Merge pull request #8381 from MicrosoftDocs/main
Auto push to live 2025-03-05 18:00:02
2 parents 865e462 + ffd12bc commit 773bc29

4 files changed

Lines changed: 128 additions & 21 deletions

File tree

support/azure/azure-kubernetes/create-upgrade-delete/aks-common-issues-faq.yml

Lines changed: 7 additions & 5 deletions
```diff
@@ -3,8 +3,8 @@ metadata:
   title: Azure Kubernetes Service (AKS) common issues FAQ
   description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
   ms.topic: faq
-  ms.date: 11/14/2023
-  ms.reviewer: chiragpa, nickoman, v-leedennis
+  ms.date: 03/06/2025
+  ms.reviewer: chiragpa, nickoman, jotavar, v-leedennis, v-weizhu
   ms.service: azure-kubernetes-service
   ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -26,8 +26,7 @@ sections:
   - question: |
       Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
     answer: |
-      If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
-
+      No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see [Operations FAQ](/azure/aks/faq#operations).
   - question: |
       What naming restrictions are enforced for AKS resources and parameters?
     answer: |
@@ -42,7 +41,10 @@ sections:
       - AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.

      - The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.
-
+
+      For more information about naming conventions, see the following resources:
+      - [Naming rules and restrictions for Azure resources](/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
+      - [Abbreviation recommendations for Azure resources](/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
   additionalContent: |
     [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
```
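The node pool naming rules added above are easy to get wrong. As a quick sketch, the stated constraints (all lowercase, starts with a letter, letters and digits only, 1-12 characters for Linux pools and 1-6 for Windows pools) can be checked with a small helper; this is a hypothetical illustration, not part of any AKS tooling:

```python
import re

def is_valid_nodepool_name(name: str, os_type: str = "Linux") -> bool:
    """Validate an AKS node pool name against the documented rules:
    all lowercase, starts with a letter, only letters and digits,
    1-12 characters for Linux pools and 1-6 for Windows pools."""
    max_len = 12 if os_type == "Linux" else 6
    return bool(re.fullmatch(r"[a-z][a-z0-9]*", name)) and len(name) <= max_len

print(is_valid_nodepool_name("nodepool1"))            # True
print(is_valid_nodepool_name("NodePool1"))            # False: uppercase letters
print(is_valid_nodepool_name("win1234", "Windows"))   # False: 7 characters > 6
```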
Lines changed: 50 additions & 0 deletions
```diff
@@ -0,0 +1,50 @@
+---
+title: Troubleshoot the Throttled Error Code (429)
+description: Learn how to resolve the Throttled error (status 429) when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.date: 03/05/2025
+ms.reviewer: jovieir, chiragpa, v-weizhu
+ms.service: azure-kubernetes-service
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the Throttled error code (status 429) so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
+---
+# Troubleshoot the Throttled error code (429)
+
+This article discusses how to identify and resolve the `Throttled` error (status 429) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.
+
+## Symptoms
+
+When you try to create an AKS cluster, you receive the following "The PutManagedClusterHandler.PUT request limit has been exceeded" error message that shows a "SubCode" value of **Throttled** and a "Status" value of **429**:
+
+> Category: ClientError;
+>
+> SubCode: Throttled;
+>
+> OrginalError: autorest/azure: Service returned an error. **Status=429**
+>
+> **Code="Throttled"**
+>
+> Message="> The PutManagedClusterHandler.PUT request limit has been exceeded for SubID='*\<subscription-id-guid>*', please retry again in X seconds. For more information, please visit aka.ms/aks/throttling";
+
+Request throttling can occur on various Azure components, so the error message might differ depending on the type of resource where this issue occurs.
+
+Resource provider throttling is independent of Azure Resource Manager (ARM) throttling and is tailored to the operations of a specific resource provider. In this scenario, the throttling is specific to the AKS resource provider and applies only to operations on AKS resources.
+
+## Cause
+
+AKS requests are throttled. For information about how AKS limits work and the specific limits per hour, see [Throttling limits on AKS resource provider APIs](/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis).
+
+## Solution
+
+To resolve this issue, examine and modify the access pattern of the throttled subscription. The following table lists the possible access patterns and corresponding solutions.
+
+| Access pattern | Solution |
+| -------------- | -------- |
+| Automated scripts constantly run LIST operations against managedCluster resources. | Run the scripts less frequently. |
+| Users attempt to deploy multiple AKS clusters in a short period of time. | Space out deployments or use different subscriptions. |
+| Users attempt to modify the same AKS cluster multiple times consecutively. | Space out operations. Ensure successful completion before initiating another one. |
+| Users attempt to add, modify, or delete one or more agentPools on the same AKS cluster. | Space out operations. Ensure successful completion before initiating another one. |
+
+## More information
+
+[General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)
+
+[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
```
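The "space out operations" advice in the table above amounts to client-side retry with backoff, honoring the "retry again in X seconds" hint when the service provides one. A minimal sketch, where the `submit` callable and the `ThrottledError` type are illustrative stand-ins rather than AKS SDK specifics:

```python
import time

class ThrottledError(Exception):
    """Illustrative stand-in for a 429 response from a throttled API."""
    def __init__(self, retry_after=None):
        self.retry_after = retry_after

def call_with_backoff(submit, max_attempts=5, base_delay=1.0):
    """Retry a throttled operation, doubling the wait after each 429.

    `submit` is any callable that raises ThrottledError on a 429 response.
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise
            # Honor the service's retry hint if present; otherwise back off exponentially.
            delay = err.retry_after or base_delay * (2 ** attempt)
            time.sleep(delay)
```

A real client would wrap the actual deployment call (for example, an ARM SDK request) in `submit` and read the retry hint from the 429 response headers.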
Lines changed: 69 additions & 15 deletions
```diff
@@ -1,9 +1,9 @@
 ---
 title: Troubleshoot UpgradeFailed errors due to eviction failures caused by PDBs
 description: Learn how to troubleshoot UpgradeFailed errors due to eviction failures caused by Pod Disruption Budgets when you try to upgrade an Azure Kubernetes Service cluster.
-ms.date: 02/23/2025
+ms.date: 03/06/2025
 editor: v-jsitser
-ms.reviewer: chiragpa, v-leedennis, v-weizhu
+ms.reviewer: chiragpa, jotavar, v-leedennis, v-weizhu
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
 #Customer intent: As an Azure Kubernetes Services (AKS) user, I want to troubleshoot an Azure Kubernetes Service cluster upgrade that failed because of eviction failures caused by Pod Disruption Budgets so that I can upgrade the cluster successfully.
```
```diff
@@ -15,44 +15,98 @@ This article discusses how to identify and resolve UpgradeFailed errors due to e
 
 ## Prerequisites
 
-This article requires Azure CLI version 2.0.65 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
+This article requires Azure CLI version 2.67.0 or a later version. To find the version number, run `az --version`. If you have to install or upgrade Azure CLI, see [How to install the Azure CLI](/cli/azure/install-azure-cli).
 
 For more detailed information about the upgrade process, see the "Upgrade an AKS cluster" section in [Upgrade an Azure Kubernetes Service (AKS) cluster](/azure/aks/upgrade-cluster#upgrade-an-aks-cluster).
 
 ## Symptoms
 
-An AKS cluster upgrade operation fails with the following error message:
+An AKS cluster upgrade operation fails with one of the following error messages:
 
-> Code: UpgradeFailed
-> Message: Drain node \<node-name> failed when evicting pod \<pod-name>. Eviction failed with Too many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See `http://aka.ms/aks/debugdrainfailures`. Original error: API call to Kubernetes API Server failed.
+- > (UpgradeFailed) Drain `node aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.
+
+- > Code: UpgradeFailed
+  > Message: Drain node `aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx` failed when evicting pod `<pod-name>` failed with Too Many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See https://aka.ms/aks/debugdrainfailures. Original error: Cannot evict pod as it would violate the pod's disruption budget.. PDB debug info: `<namespace>/<pod-name>` blocked by pdb `<pdb-name>` with 0 unready pods.
 
 ## Cause
 
-This error might occur if a pod is protected by the Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained.
+This error might occur if a pod is protected by the Pod Disruption Budget (PDB) policy. In this situation, the pod resists being drained. After several attempts, the upgrade operation fails, and the cluster or node pool falls into a `Failed` state.
+
+Check the `ALLOWED DISRUPTIONS` value in the PDB configuration. The value should be `1` or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets). For example, you can check the workload and its PDB as follows. Notice that the `ALLOWED DISRUPTIONS` column doesn't allow any disruption. If the `ALLOWED DISRUPTIONS` value is `0`, pods aren't evicted and the node drain fails during the upgrade process:
+
+```console
+$ kubectl get deployments.apps nginx
+NAME    READY   UP-TO-DATE   AVAILABLE   AGE
+nginx   2/2     2            2           62s
+
+$ kubectl get pod
+NAME                     READY   STATUS    RESTARTS   AGE
+nginx-7854ff8877-gbr4m   1/1     Running   0          68s
+nginx-7854ff8877-gnltd   1/1     Running   0          68s
+
+$ kubectl get pdb
+NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
+nginx-pdb   2               N/A               0                     24s
+```
 
-To test this situation, run `kubectl get pdb -A`, and then check the **Allowed Disruption** value. The value should be **1** or greater. For more information, see [Plan for availability using pod disruption budgets](/azure/aks/operator-best-practices-scheduler#plan-for-availability-using-pod-disruption-budgets).
+You can also check for entries in the Kubernetes events by running `kubectl get events | grep -i drain`. Output that contains the message "Eviction blocked by Too Many Requests (usually a pdb)" confirms the cause:
+
+```console
+$ kubectl get events | grep -i drain
+LAST SEEN   TYPE      REASON   OBJECT                                         MESSAGE
+(...)
+32m         Normal    Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Draining node: aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx
+2m57s       Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+12m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+32m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+32m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+31m         Warning   Drain    node/aks-<nodepool-name>-xxxxxxxx-vmssxxxxxx   Eviction blocked by Too Many Requests (usually a pdb): <pod-name>
+```
 
-If the **Allowed Disruption** value is **0**, the node drain will fail during the upgrade process.
```
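The restrictive state shown in the `kubectl get pdb` output above arises when a PDB's `minAvailable` equals the workload's replica count. As an illustration only, a minimal manifest that reproduces that state for the example `nginx` deployment might look like the following (the `app: nginx` selector label is an assumption, since the deployment's labels aren't shown):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2          # equals the deployment's replica count, so ALLOWED DISRUPTIONS is 0
  selector:
    matchLabels:
      app: nginx           # assumed label; match your deployment's pod labels
```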

```diff
 To resolve this issue, use one of the following solutions.
 
 ## Solution 1: Enable pods to drain
 
 1. Adjust the PDB to enable pod draining. Generally, the allowed disruption is controlled by the `Min Available / Max unavailable` or `Running pods / Replicas` parameter. You can modify the `Min Available / Max unavailable` parameter at the PDB level or increase the number of `Running pods / Replicas` to push the allowed disruption value to **1** or greater.
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
+
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
```
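For an integer `minAvailable` PDB, the allowed disruptions are essentially the healthy pods minus `minAvailable`, which is why either raising the replica count or lowering `minAvailable` unblocks the drain. A tiny sketch of that arithmetic:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Allowed disruptions for an integer minAvailable PDB: how many
    healthy pods can be evicted while keeping min_available running."""
    return max(0, healthy_pods - min_available)

# 2 replicas with minAvailable=2 blocks the drain...
print(allowed_disruptions(2, 2))  # 0
# ...while adding a replica or relaxing the PDB unblocks it.
print(allowed_disruptions(3, 2))  # 1
print(allowed_disruptions(2, 1))  # 1
```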

```diff
 ## Solution 2: Back up, delete, and redeploy the PDB
 
-1. Take a backup of the PDB `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb_backup.yaml`, and then delete the PDB `kubectl delete pdb <pdb-name> -n /<pdb-namespace>`. After the upgrade is finished, you can redeploy the PDB `kubectl apply -f pdb_backup.yaml`.
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+1. Take a backup of the PDB(s) by running `kubectl get pdb <pdb-name> -n <pdb-namespace> -o yaml > pdb-name-backup.yaml`, and then delete the PDB by running `kubectl delete pdb <pdb-name> -n <pdb-namespace>`. After the new upgrade attempt is finished, you can redeploy the PDB by applying the backup file: `kubectl apply -f pdb-name-backup.yaml`.
+2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
+
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
```

```diff
-## Solution 3: Delete the pods that can't be drained
+## Solution 3: Delete the pods that can't be drained or scale the workload down to zero (0)
 
 1. Delete the pods that can't be drained.
 
    > [!NOTE]
-   > If the pods were created by a deployment or StatefulSet, they'll be controlled by a ReplicaSet. If that's the case, you might have to delete the deployment or StatefulSet. Before you do that, we recommend that you make a backup: `kubectl get <kubernetes-object> <name> -n <namespace> -o yaml > backup.yaml`.
+   > If the pods are created by a Deployment or StatefulSet, they're controlled by a ReplicaSet. If that's the case, you might have to delete the Deployment or StatefulSet, or scale its replicas to zero (0). Before you do that, we recommend that you make a backup: `kubectl get <deployment.apps -or- statefulset.apps> <name> -n <namespace> -o yaml > backup.yaml`.
+
+2. To scale down, you can run `kubectl scale --replicas=0 <deployment.apps -or- statefulset.apps> <name> -n <namespace>` before the reconciliation.
+
+3. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process triggers a reconciliation.
 
-2. Try again to upgrade the AKS cluster to the same version that you tried to upgrade to previously. This process will trigger a reconciliation.
+   ```console
+   $ az aks upgrade --name <aksName> --resource-group <resourceGroupName>
+   Are you sure you want to perform this operation? (y/N): y
+   Cluster currently in failed state. Proceeding with upgrade to existing version 1.28.3 to attempt resolution of failed cluster state.
+   Since control-plane-only argument is not specified, this will upgrade the control plane AND all nodepools to version . Continue? (y/N): y
+   ```
 
 [!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
```

support/azure/azure-kubernetes/toc.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -39,6 +39,8 @@
       href: create-upgrade-delete/error-code-serviceprincipalvalidationclienterror.md
     - name: SubscriptionRequestsThrottled error (429)
       href: create-upgrade-delete/error-code-subscriptionrequeststhrottled.md
+    - name: Throttled error (429)
+      href: create-upgrade-delete/error-code-aksrequeststhrottled.md
     - name: SubnetWithExternalResourcesCannotBeUsedByOtherResources error
       href: create-upgrade-delete/subnet-with-external-resources-cannot-be-used-by-other-resources.md
     - name: Troubleshoot the AKSCapacityError error code
@@ -370,4 +372,3 @@
       href: error-codes/vmextensionerror-vhdfilenotfound.md
     - name: UnsatisfiablePDB error
       href: error-codes/unsatisfiablepdb-error.md
-
```
