Commit b81f441

Merge branch 'main' into dapr
2 parents 08d4d4c + 47168b0 commit b81f441

337 files changed

Lines changed: 7843 additions & 4383 deletions


.openpublishing.redirection.json

Lines changed: 328 additions & 0 deletions
Large diffs are not rendered by default.

support/azure/.openpublishing.redirection.azure.json

Lines changed: 4 additions & 0 deletions
@@ -6299,6 +6299,10 @@
     {
       "source_path": "virtual-machines/linux/linux-vm-no-boot-hyper-v-driver-issues.md",
       "redirect_url": "/troubleshoot/azure/virtual-machines/linux/troubleshoot-lis-driver-issues-on-linux-vms"
+    },
+    {
+      "source_path": "azure-kubernetes/create-upgrade-delete/error-using-feature-requiring-virtual-machine-scale-set.md",
+      "redirect_url": "/troubleshoot/azure/azure-kubernetes/welcome-azure-kubernetes"
     }
   ]
 }

support/azure/azure-kubernetes/availability-performance/cluster-node-virtual-machine-failed-state.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Azure Kubernetes Service cluster/node is in a failed state
 description: Helps troubleshoot an issue where an Azure Kubernetes Service (AKS) cluster/node is in a failed state.
-ms.date: 04/01/2024
+ms.date: 03/10/2025
 ms.reviewer: chiragpa, nickoman, v-weizhu, v-six, aritraghosh
 ms.service: azure-kubernetes-service
 keywords:
@@ -114,7 +114,7 @@ If you prefer to use Azure CLI to view the activity log for a failed cluster, fo
 
 In the Azure portal, navigate to your AKS cluster resource and select **Diagnose and solve problems** from the left menu. You'll see a list of categories and scenarios that you can select to run diagnostic checks and get recommended solutions.
 
-In the Azure CLI, use the `az aks collect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
+In the Azure CLI, use the `az aks kollect` command with the `--name` and `--resource-group` parameters to collect diagnostic data from your cluster nodes. You can also use the `--storage-account` and `--sas-token` parameters to specify an Azure Storage account where the data will be uploaded. The output will include a link to the **Diagnose and Solve Problems** blade where you can view the results and suggested actions.
 
 In the **Diagnose and Solve Problems** blade, you can select **Cluster Issues** as the category. If any issues are detected, you'll see a list of possible solutions that you can follow to fix them.
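The `az aks kollect` invocation described above might be composed as in the following sketch. All resource names are placeholders, and the sketch only prints the command instead of executing it, because a real run requires an Azure subscription and the Azure CLI:

```shell
# Sketch of the `az aks kollect` invocation described above.
# CLUSTER_NAME, RESOURCE_GROUP, and STORAGE_ACCOUNT are placeholders.
CLUSTER_NAME="myAKSCluster"
RESOURCE_GROUP="myResourceGroup"
STORAGE_ACCOUNT="mydiagstore"

CMD="az aks kollect --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP --storage-account $STORAGE_ACCOUNT"
# Print instead of executing, so the sketch is safe to run anywhere.
echo "$CMD"
```

Against a live cluster, run the printed command directly; the collected diagnostic data is uploaded to the specified storage account.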

support/azure/azure-kubernetes/availability-performance/node-not-ready-then-recovers.md

Lines changed: 10 additions & 5 deletions
@@ -1,19 +1,19 @@
 ---
 title: Node not ready but then recovers
 description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
-ms.date: 12/09/2024
-ms.reviewer: rissing, chiragpa, momajed, v-leedennis
+ms.date: 2/25/2024
+ms.reviewer: rissing, chiragpa, momajed, v-leedennis, novictor
 ms.service: azure-kubernetes-service
 #Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries
 
-This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "NotReady" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.
 
 ## Cause
 
-There are several scenarios that could cause a "Not Ready" state to occur:
+There are several scenarios that could cause a "NotReady" state to occur:
 
 - The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.
 
@@ -24,7 +24,12 @@ There are several scenarios that could cause a "Not Ready" state to occur:
 
 ## Resolution
 
-Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+To resolve this issue, follow these steps:
+
+1. Run `kubectl describe node <node-name>` to review detailed information about the node's status. Look for any error messages or warnings that might indicate the root cause of the issue.
+2. Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+3. Verify the node's network configuration to make sure that there are no connectivity issues.
+4. Check the node's resource usage, such as CPU, memory, and disk, to identify potential constraints. For more information, see [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze#view-performance-directly-from-a-cluster).
 
 For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
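The first resolution step (scanning `kubectl describe node` output for problem conditions) can be sketched as follows. The sample conditions text is hypothetical; against a live cluster, pipe the real `kubectl describe node <node-name>` output instead:

```shell
# Filter node conditions for signs of trouble. The here-doc below is a
# hypothetical sample; in practice, replace sample_conditions with:
#   kubectl describe node <node-name>
sample_conditions() {
cat <<'EOF'
  Ready            False   KubeletNotReady  PLEG is not healthy
  MemoryPressure   False   KubeletHasSufficientMemory
  DiskPressure     False   KubeletHasNoDiskPressure
EOF
}

# Flag any condition that suggests the node is unhealthy.
sample_conditions | grep -E 'NotReady|Pressure.*True' || echo "no obvious problems"
```

Here the filter surfaces the `KubeletNotReady` line, which points to the kubelet (not the workload) as the component to investigate.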

support/azure/azure-kubernetes/connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md

Lines changed: 5 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
 title: Can't access the cluster API server using authorized IP ranges
 description: Troubleshoot problems accessing the cluster API server when you use authorized IP address ranges in Azure Kubernetes Service (AKS).
-ms.date: 11/18/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis
+ms.date: 03/26/2025
+ms.reviewer: chiragpa, nickoman, wonkilee, v-leedennis
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot access issues to the cluster API server when I use authorized IP address ranges so that I can work with my Azure Kubernetes Service (AKS) cluster successfully.
@@ -14,7 +14,9 @@ This article discusses how to resolve a scenario in which you can't use authoriz
 
 ## Symptoms
 
-If you try to create or manage an AKS cluster, you can't access the cluster API server.
+If you try to create or manage resources in an AKS cluster, you can't access the cluster API server. When you run `kubectl`, you receive the following error message:
+
+> Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
 
 ## Cause
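A common first check for this symptom is confirming which public IP you actually egress from, then comparing it against the cluster's authorized ranges. A minimal sketch; the lookup endpoint and the fallback address are illustrative, not part of the article:

```shell
# Determine the public IP you egress from. The lookup service and the
# fallback placeholder value are illustrative assumptions.
MY_IP=$(curl -fsS --max-time 5 https://ifconfig.me 2>/dev/null || echo "203.0.113.10")
echo "Egress IP: $MY_IP"

# Then list the cluster's authorized ranges (requires the Azure CLI and
# access to the cluster's resource group):
#   az aks show --resource-group <rg> --name <cluster> \
#     --query apiServerAccessProfile.authorizedIpRanges --output tsv
```

If the egress IP isn't inside any authorized range, the `dial tcp ... i/o timeout` symptom above is expected.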

support/azure/azure-kubernetes/connectivity/error-from-server-error-dialing-backend-dial-tcp.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: 'Error from server: error dialing backend: dial tcp'
 description: 'Troubleshoot the error dialing backend: dial tcp error that blocks you from using kubectl commands or other tools when you connect to the API server.'
-ms.date: 10/21/2024
-ms.reviewer: chiragpa, nickoman, v-leedennis, pihe
+ms.date: 03/05/2025
+ms.reviewer: chiragpa, nickoman, v-leedennis, pihe, mariusbutuc
 ms.service: azure-kubernetes-service
 keywords:
 #Customer intent: As an Azure Kubernetes user, I want to troubleshoot the "Error from server: error dialing backend: dial tcp" error so that I can connect to the API server or use the `kubectl logs` command to get logs.

support/azure/azure-kubernetes/connectivity/tunnel-connectivity-issues.md

Lines changed: 4 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: Tunnel connectivity issues
 description: Resolve communication issues that are related to tunnel connectivity in an Azure Kubernetes Service (AKS) cluster.
-ms.date: 09/26/2024
-ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu
+ms.date: 03/23/2025
+ms.reviewer: chiragpa, andbar, v-leedennis, v-weizhu, albarqaw
 ms.service: azure-kubernetes-service
 keywords: Azure Kubernetes Service, AKS cluster, Kubernetes cluster, tunnels, connectivity, tunnel-front, aks-link
 #Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
@@ -29,6 +29,8 @@ You receive an error message that resembles the following examples about port 10
 
 > Error from server: error dialing backend: dial tcp \<aks-node-ip>:10250: i/o timeout
 
+> Error from server: Get "https\://\<aks-node-name>:10250/containerLogs/\<namespace>/\<pod-name>/\<container-name>": http: server gave HTTP response to HTTPS client
+
 The Kubernetes API server uses port 10250 to connect to a node's kubelet to retrieve the logs. If port 10250 is blocked, the kubectl logs and other features will only work for pods that run on the nodes in which the tunnel component is scheduled. For more information, see [Kubernetes ports and protocols: Worker nodes](https://kubernetes.io/docs/reference/ports-and-protocols/#node).
 
 Because the tunnel components or the connectivity between the server and client can't be established, functionality such as the following won't work as expected:
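Whether port 10250 is reachable from a given network location can be checked with a quick TCP probe. A sketch assuming a bash-compatible shell with coreutils `timeout`; the node IP is a placeholder:

```shell
# Probe a node's kubelet port (10250). NODE_IP is a placeholder; from a
# location where the port is blocked, the probe times out, which is the
# expected failure mode described above.
NODE_IP="10.240.0.4"
if timeout 3 bash -c "exec 3<>/dev/tcp/$NODE_IP/10250" 2>/dev/null; then
  echo "port 10250 reachable"
else
  echo "port 10250 blocked or host unreachable"
fi
```

Run this from a pod or jump host inside the cluster's network to distinguish an NSG/firewall block from a tunnel-component failure.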

support/azure/azure-kubernetes/create-upgrade-delete/aks-common-issues-faq.yml

Lines changed: 7 additions & 5 deletions
@@ -3,8 +3,8 @@ metadata:
   title: Azure Kubernetes Service (AKS) common issues FAQ
   description: Review a list of frequently asked questions (FAQ) about common issues when you're working with an Azure Kubernetes Service (AKS) cluster.
   ms.topic: faq
-  ms.date: 11/14/2023
-  ms.reviewer: chiragpa, nickoman, v-leedennis
+  ms.date: 03/06/2025
+  ms.reviewer: chiragpa, nickoman, jotavar, v-leedennis, v-weizhu
   ms.service: azure-kubernetes-service
   ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)

@@ -26,8 +26,7 @@ sections:
   - question: |
       Can I move my cluster to a different subscription, or move my subscription with my cluster to a new tenant?
     answer: |
-      If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint.
-
+      No. If you've moved your AKS cluster to a different subscription or the cluster's subscription to a new tenant, the cluster won't function because of missing cluster identity permissions. AKS doesn't support moving clusters across subscriptions or tenants because of this constraint. For more information, see [Operations FAQ](/azure/aks/faq#operations).
- question: |
3231
What naming restrictions are enforced for AKS resources and parameters?
3332
answer: |
@@ -42,7 +41,10 @@ sections:
       - AKS node pool names must be all lowercase. The names must be 1-12 characters in length for Linux node pools and 1-6 characters for Windows node pools. A name must start with a letter, and the only allowed characters are letters and numbers.
 
       - The *admin-username*, which sets the administrator user name for Linux nodes, must start with a letter. This user name may only contain letters, numbers, hyphens, and underscores. It has a maximum length of 32 characters.
-
+
+      For more information about naming conventions, see the following resources:
+      - [Naming rules and restrictions for Azure resources](/azure/azure-resource-manager/management/resource-name-rules#microsoftcontainerservice)
+      - [Abbreviation recommendations for Azure resources](/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#containers)
 additionalContent: |
   [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]

support/azure/azure-kubernetes/create-upgrade-delete/aks-increased-memory-usage-cgroup-v2.md

Lines changed: 31 additions & 5 deletions
@@ -1,8 +1,8 @@
 ---
 title: Increased memory usage reported in Kubernetes 1.25 or later versions
 description: Resolve an increase in memory usage that's reported after you upgrade an Azure Kubernetes Service (AKS) cluster to Kubernetes 1.25.x.
-ms.date: 07/13/2023
-editor: v-jsitser
+ms.date: 03/03/2025
+editor: momajed
 ms.reviewer: aritraghosh, cssakscic, v-leedennis
 ms.service: azure-kubernetes-service
 ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
@@ -23,23 +23,49 @@ You experience one or more of the following symptoms:
 
 ## Cause
 
-This increase is caused by a change in memory accounting within version 2 of the Linux control group (cgroup) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default cgroup version for Kubernetes 1.25 on AKS.
+This increase is caused by a change in memory accounting within version 2 of the Linux control group (`cgroup`) API. [Cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/) is now the default `cgroup` version for Kubernetes 1.25 on AKS.
 
 > [!NOTE]
-> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of cgroup v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
+> This issue is distinct from the memory saturation in nodes that's caused by applications or frameworks that aren't aware of `cgroup` v2. For more information, see [Memory saturation occurs in pods after cluster upgrade to Kubernetes 1.25](./aks-memory-saturation-after-upgrade.md).
 
 ## Solution
 
 - If you observe frequent memory pressure on the nodes, upgrade your subscription to increase the amount of memory that's available to your virtual machines (VMs).
 
 - If you see a higher eviction rate on the pods, [use higher limits and requests for pods](/azure/aks/developer-best-practices-resource-management#define-pod-resource-requests-and-limits).
 
+- `cgroup` v2 uses a different API than `cgroup` v1. If there are any applications that directly access the `cgroup` file system, update them to later versions that support `cgroup` v2. For example:
+
+  - **Third-party monitoring and security agents**:
+
+    Some monitoring and security agents depend on the `cgroup` file system. Update these agents to versions that support `cgroup` v2.
+
+  - **Java applications**:
+
+    Use versions that fully support `cgroup` v2:
+    - OpenJDK/HotSpot: `jdk8u372`, `11.0.16`, `15`, and later versions.
+    - IBM Semeru Runtimes: `8.0.382.0`, `11.0.20.0`, `17.0.8.0`, and later versions.
+    - IBM Java: `8.0.8.6` and later versions.
+
+  - **uber-go/automaxprocs**:
+
+    If you're using the `uber-go/automaxprocs` package, ensure that the version is `v1.5.1` or later.
+
+- An alternative temporary solution is to revert the `cgroup` version on your nodes by using a DaemonSet. For more information, see [Revert to cgroup v1 DaemonSet](https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml).
+
+  > [!IMPORTANT]
+  > - Use the DaemonSet cautiously. Test it in a lower environment before applying it to production to ensure compatibility and prevent disruptions.
+  > - By default, the DaemonSet applies to all nodes in the cluster and reboots them to implement the `cgroup` change.
+  > - To control how the DaemonSet is applied, configure a `nodeSelector` to target specific nodes.
 
 > [!NOTE]
 > If you experience only an increase in memory use without any of the other symptoms that are mentioned in the "Symptoms" section, you don't have to take any action.
 
 ## Status
 
-We're actively working with the Kubernetes community to fix the underlying issue, and we'll keep you updated on our progress. We also plan to change the eviction thresholds or [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
+We're actively working with the Kubernetes community to resolve the underlying issue. Progress on this effort can be tracked at [kubernetes/kubernetes#118916](https://github.com/kubernetes/kubernetes/issues/118916).
+
+As part of the resolution, we plan to adjust the eviction thresholds or update [resource reservations](/azure/aks/concepts-clusters-workloads#resource-reservations), depending on the outcome of the fix.
 
 ## Reference
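If you're unsure which `cgroup` version a node is running, a quick check is to inspect the filesystem type mounted at `/sys/fs/cgroup` on the node (for example, from a debug pod started with `kubectl debug node/<node-name>`). A sketch assuming a Linux host with coreutils `stat`:

```shell
# Report which cgroup version this Linux host uses: cgroup2fs indicates
# cgroup v2; tmpfs indicates the cgroup v1 hierarchy.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "unknown")
case "$fstype" in
  cgroup2fs) echo "cgroup v2" ;;
  tmpfs)     echo "cgroup v1" ;;
  *)         echo "undetermined: $fstype" ;;
esac
```

On an AKS node running Kubernetes 1.25 or later, this would be expected to report `cgroup v2`.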

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+title: Troubleshoot the Throttled Error Code (429)
+description: Learn how to resolve the Throttled error (status 429) when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.date: 03/05/2025
+ms.reviewer: jovieir, chiragpa, v-weizhu
+ms.service: azure-kubernetes-service
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the Throttled error code (status 429) so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
+ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
+---
+# Troubleshoot the Throttled error code (429)
+
+This article discusses how to identify and resolve the `Throttled` error (status 429) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.
+
+## Symptoms
+
+When you try to create an AKS cluster, you receive the following "The PutManagedClusterHandler.PUT request limit has been exceeded" error message that shows a "SubCode" value of **Throttled** and a "Status" value of **429**:
+
+> Category: ClientError;
+>
+> SubCode: Throttled;
+>
+> OrginalError: autorest/azure: Service returned an error. **Status=429**
+>
+> **Code="Throttled"**
+>
+> Message="> The PutManagedClusterHandler.PUT request limit has been exceeded for SubID='*\<subscription-id-guid>*', please retry again in X seconds. For more information, please visit aka.ms/aks/throttling";
+
+Request throttling can occur on various Azure components, so the error message might be different depending on the type of resource where this issue occurs.
+
+Resource provider throttling is independent of Azure Resource Manager (ARM) throttling and is tailored to the operations of a specific resource provider. In this scenario, the throttling is specific to the AKS resource provider and applies only to operations related to AKS resources.
+
+## Cause
+
+AKS requests are throttled. For information about how AKS limits work and the specific limits per hour, see [Throttling limits on AKS resource provider APIs](/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis).
+
+## Solution
+
+To resolve this issue, examine and modify the access pattern of the throttled subscription. The following table lists the possible access patterns and corresponding solutions.
+
+| Access pattern | Solution |
+| -------------- | -------- |
+| Automated scripts constantly run LIST operations against managedCluster resources. | Run the scripts less frequently. |
+| Users attempt to deploy multiple AKS clusters in a short period of time. | Space out deployments or use different subscriptions. |
+| Users attempt to modify the same AKS cluster multiple times consecutively. | Space out operations. Ensure successful completion before initiating another one. |
+| Users attempt to add, modify, or delete one or more agentPools on the same AKS cluster. | Space out operations. Ensure successful completion before initiating another one. |
+
+## More information
+
+[General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)
+
+[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
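The "space out operations" advice in the new article's solution table amounts to retrying with a backoff once a 429 is returned. A minimal sketch of that pattern; `do_request` is a hypothetical stand-in for the AKS API call and is simulated here so that the sketch runs anywhere:

```shell
# Retry with exponential backoff after simulated throttling (HTTP 429).
# do_request stands in for the real AKS operation; it "succeeds" on the
# third attempt to illustrate the retry loop.
attempt=0
do_request() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # simulated: fail twice, then succeed
}

delay=1
while ! do_request; do
  echo "throttled; retrying in ${delay}s"
  sleep "$delay"
  delay=$((delay * 2))   # exponential backoff between retries
done
echo "succeeded after $attempt attempts"
```

In a real script, honor any "retry again in X seconds" hint from the error message rather than a fixed starting delay.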
