Commit af1b766

Update zonalallocation-allocationfailed-error.md
1 parent 05cd86a commit af1b766

1 file changed

Lines changed: 35 additions & 28 deletions

support/azure/azure-kubernetes/error-codes/zonalallocation-allocationfailed-error.md

@@ -5,7 +5,7 @@ ms.date: 09/05/2024
55
author: axelgMS
66
ms.author: axelg
77
editor: v-jsitser
8-
ms.reviewer: rissing, chiragpa, erbookbi, v-weizhu
8+
ms.reviewer: rissing, chiragpa, erbookbi, v-weizhu, v-ryanberg
99
ms.service: azure-kubernetes-service
1010
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
1111
---
@@ -21,39 +21,25 @@ This article describes how to identify and resolve the `ZonalAllocationFailed`,
2121

2222
## Symptoms
2323

24-
When you try to create an AKS cluster, you receive the following error message:
24+
When you try to create, upgrade, or scale up a cluster, you receive one of the following error messages:
2525

26-
> Reconcile vmss agent pool error: VMSSAgentPoolReconciler retry failed:
27-
>
28-
> Category: InternalError;
29-
>
30-
> SubCode: ZonalAllocationFailed;
31-
>
32-
> Dependency: Microsoft.Compute/VirtualMachineScaleSet;
33-
>
34-
> OrginalError: Code="ZonalAllocationFailed"
35-
>
36-
> Message="**Allocation failed. We do not have sufficient capacity for the requested VM size in this zone.** Read more about improving likelihood of allocation success at <https://aka.ms/allocation-guidance>";
37-
>
38-
> AKSTeam: NodeProvisioning
26+
Code: `ZonalAllocationFailed`
3927

40-
Or, when you try to upgrade or scale up a cluster, you receive the following error message:
28+
Message: "Allocation failed. We do not have sufficient capacity for the requested VM size in this zone. Read more about improving likelihood of allocation success at https://aka.ms/allocation-guidance. Please note that allocation failures can also arise if a proximity placement group is associated with this VMSS. See https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/error-code-zonalallocationfailed-allocationfailed for more details. This is not AKS controlled behavior, please ask help to VMSS team for allocation failure. If the error is due to capacity constrain, consider upgrade with maxUnavailable instead of maxSurge, details: aka.ms/aks/maxUnavailable."
4129

42-
> Code="OverconstrainedAllocationRequest"
43-
>
44-
> Message="**Allocation failed. VM(s) with the following constraints cannot be allocated, because the condition is too restrictive.** Please remove some constraints and try again."
30+
Code: `AllocationFailed`
4531

46-
Or, when you use dedicated hosts in a cluster and try to create or scale up a node pool, you receive the following error message:
32+
Message: "The VM allocation failed due to an internal error. Please retry later or try deploying to a different location. Please note that allocation failures can also arise if a proximity placement group is associated with this VMSS. See https://learn.microsoft.com/troubleshoot/azure/azure-kubernetes/error-code-zonalallocationfailed-allocationfailed for more details.This is not AKS controlled behavior, please ask help to VMSS team for allocation failure."
4733

48-
> Code="AllocationFailed"
49-
>
50-
> Message="**Allocation failed. VM allocation to the dedicated host failed. Please ensure that the dedicated host has enough capacity or try allocating elsewhere.**"
34+
Code: `OverconstrainedAllocationRequest`
5135

52-
## Cause 1: Limited zone availability in a SKU
36+
Message: "Create or update VMSS failed. Allocation failed. VM(s) with the following constraints cannot be allocated, because the condition is too restrictive. Please remove some constraints and try again. Constraints applied are: - Differencing (Ephemeral) Disks - Networking Constraints (such as Accelerated Networking or IPv6) - VM Size"
37+
38+
### Cause 1: Limited zone availability in a SKU
5339

5440
You're trying to deploy, upgrade or scale up a cluster in a zone that has limited availability for the specific SKU.
5541

56-
## Solution 1: Use a different SKU, zone, or region
42+
### Solution 1: Use a different SKU, zone, or region
5743

5844
Try one or more of the following methods:
5945

@@ -64,6 +50,28 @@ Try one or more of the following methods:
6450

6551
For more information about how to fix this error, see [Resolve errors for SKU not available](/azure/azure-resource-manager/troubleshooting/error-sku-not-available).
6652

53+
### Solution 2: Dynamically scale using Node Auto Provisioning
54+
55+
[Node Auto Provisioning](/azure/aks/node-auto-provisioning) allows you to automatically provision VM SKUs based on your workload needs. If a SKU isn't available due to capacity constraints, Node Auto Provisioning (NAP) selects another SKU type based on the specifications provided in the custom resource definitions (CRDs) like `NodePool` and `AKSNodeClass`. This can be helpful for scaling scenarios when certain SKU capacity becomes limited. For more information on configuring your NAP cluster, see [Configure node pools for node auto-provisioning (NAP) in Azure Kubernetes Service (AKS)](/azure/aks/node-auto-provisioning-node-pools) and [Configure AKSNodeClass resources for node auto-provisioning (NAP) in Azure Kubernetes Service (AKS)](https://learn.microsoft.com/azure/aks/node-auto-provisioning-aksnodeclass).
56+
57+
### Solution 3: Upgrade in place using `MaxUnavailable`
58+
59+
If you don’t need surge nodes during upgrades, see [Customize unavailable nodes](/azure/aks/upgrade-aks-node-pools-rolling#customize-unavailable-nodes) for information on how to upgrade with the existing capacity. Set `MaxUnavailable` to a value greater than 0 and set `MaxSurge` equal to 0. Existing nodes are then cordoned and drained one at a time and pods are evicted to remaining nodes. No buffer node is created.
60+
61+
### Solution 4: Use deployment recommender in portal for new cluster creates
62+
63+
During AKS cluster creation in the Azure portal, if the selected node pool SKU isn't available in the chosen region and zones, the deployment recommender recommends an alternative SKU, zones, and region combination that has availability.
64+
65+
### Solution 5: Use priority expanders with cluster-autoscaler
66+
67+
The cluster-autoscaler [priority expander](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md) lets you define an ordered list of node pools to attempt scaling in sequence. For example: Spot pools first (cost optimization), then on-demand pools (availability fallback). The cluster autoscaler tries to scale the highest-priority pool first. If scaling fails (for example, due to an allocation failure), it tries the next pool.
68+
69+
**Limitations**
70+
71+
- The cluster autoscaler doesn't create new node pools. It only works with existing pools. If you want dynamic SKU provisioning, use NAP, which can create pools based on SKU availability.
72+
73+
- Priority expander works at node pool level, not SKU level. You must pre-create pools for each SKU family you want to use.
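
The fallback behavior described in this solution can be illustrated with a minimal Python sketch. It is not the cluster-autoscaler implementation: `AllocationError`, `scale_with_fallback`, `fake_try_scale`, and the pool names `spotpool` and `ondemandpool` are hypothetical names introduced only for illustration. The sketch assumes the pools already exist and are listed from highest to lowest priority.

```python
from typing import Callable, Optional, Sequence


class AllocationError(Exception):
    """Hypothetical stand-in for an allocation failure such as ZonalAllocationFailed."""


def scale_with_fallback(
    pools: Sequence[str], try_scale: Callable[[str], None]
) -> Optional[str]:
    """Try each existing pool in priority order; return the first pool that scales.

    Returns None if every pool fails. No new pools are created, which mirrors
    the limitation that only pre-created pools can be used.
    """
    for pool in pools:
        try:
            try_scale(pool)
            return pool
        except AllocationError:
            # Allocation failed for this pool; fall through to the next one.
            continue
    return None


def fake_try_scale(pool: str) -> None:
    """Simulated scale-up in which the Spot pool has no capacity (assumption)."""
    if pool == "spotpool":
        raise AllocationError("ZonalAllocationFailed")


# Pools are listed highest priority first: Spot, then on-demand.
chosen = scale_with_fallback(["spotpool", "ondemandpool"], fake_try_scale)
print(chosen)
```

In this simulated run, the Spot pool fails to allocate, so the on-demand pool is selected as the fallback.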
74+
6775
## Cause 2: Too many constraints for a virtual machine to accommodate
6876

6977
If you receive an `OverconstrainedAllocationRequest` error code, the Azure Compute platform can't allocate a new virtual machine (VM) to accommodate the required constraints. These constraints usually (but not always) include the following items:
@@ -75,15 +83,15 @@ If you receive an `OverconstrainedAllocationRequest` error code, the Azure Compu
7583
- Ephemeral disk
7684
- Proximity placement group (PPG)
7785

78-
## Solution 2: Don't associate a proximity placement group with the node pool
86+
## Solution: Don't associate a proximity placement group with the node pool
7987

8088
If you receive an `OverconstrainedAllocationRequest` error code, you can try to create a new node pool that isn't associated with a proximity placement group.
8189

8290
## Cause 3: Not enough dedicated hosts or fault domains
8391

8492
You're trying to deploy a node pool in a dedicated host group that has limited capacity or doesn't satisfy the fault domain constraint.
8593

86-
## Solution 3: Ensure you have enough dedicated hosts for your AKS nodes/VMSS
94+
## Solution: Ensure you have enough dedicated hosts for your AKS nodes/VMSS
8795

8896
As per [Planning for ADH Capacity on AKS](/azure/aks/use-azure-dedicated-hosts#planning-for-adh-capacity-on-aks), you're responsible for planning enough dedicated hosts to span as many fault domains as required by your AKS VMSS. For example, if the AKS VMSS is created with *FaultDomainCount=2*, you need at least two dedicated hosts in different fault domains (*FaultDomain 0* and *FaultDomain 1*).
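
The fault domain requirement in the example above can be checked with a short Python sketch. The function name `missing_fault_domains` and its inputs are hypothetical. The sketch assumes that fault domains are numbered from 0 to *FaultDomainCount* minus 1 and that you already know which fault domain each dedicated host is in.

```python
from typing import Iterable, Set


def missing_fault_domains(
    fault_domain_count: int, host_fault_domains: Iterable[int]
) -> Set[int]:
    """Return the fault domains that have no dedicated host.

    fault_domain_count: the FaultDomainCount of the AKS VMSS.
    host_fault_domains: the fault domain of each dedicated host in the group.
    """
    required = set(range(fault_domain_count))
    return required - set(host_fault_domains)


# Two hosts, but both are in FaultDomain 0, so FaultDomain 1 isn't covered.
print(missing_fault_domains(2, [0, 0]))

# One host in each fault domain: nothing is missing.
print(missing_fault_domains(2, [0, 1]))
```

An empty result means that every required fault domain has at least one dedicated host. It doesn't confirm that the hosts have enough capacity for the requested VM size.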
8997

@@ -101,4 +109,3 @@ We have identified several methods to improve how we load-balance under a high-r
101109

102110
- [Fix an AllocationFailed or ZonalAllocationFailed error when you create, restart, or resize Virtual Machine Scale Sets in Azure](../../virtual-machine-scale-sets/allocationfailed-or-zonalallocationfailed.md)
103111

104-
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
