Commit 37918d5

Author: Simonx Xu
Merge pull request #7824 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/SupportArticles-docs (branch main)
2 parents 7ba4ac8 + 0e27e63 commit 37918d5

1 file changed

Lines changed: 13 additions & 18 deletions

@@ -1,41 +1,36 @@
 ---
 title: Node not ready but then recovers
-description: Troubleshoot scenarios in the status of an Azure Kubernetes Service (AKS) cluster node is Node Not Ready, but then the node recovers.
-ms.date: 04/15/2022
+description: Troubleshoot scenarios in which the status of an AKS cluster node is Node Not Ready, but then the node recovers.
+ms.date: 12/09/2024
 ms.reviewer: rissing, chiragpa, momajed, v-leedennis
 ms.service: azure-kubernetes-service
-#Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an Azure Kubernetes Service (AKS) cluster.
+#Customer intent: As an Azure Kubernetes user, I want to prevent the Node Not Ready status for nodes that later recover so that I can avoid future errors within an AKS cluster.
 ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot Node Not Ready failures that are followed by recoveries

-This article helps troubleshoot scenarios in which a node within a Microsoft Azure Kubernetes Service (AKS) cluster shows the Node Not Ready status, but then automatically recovers to a healthy state.
-
-## Symptoms
-
-You notice that your application stops responding while the node is reporting that it has a Not Ready status. However, the node recovers automatically, and now, it's looking for a root cause analysis (RCA).
+This article provides a guide to troubleshoot and resolve "Node Not Ready" issues in Azure Kubernetes Service (AKS) clusters. When a node enters a "Not Ready" state, it can disrupt the application's functionality and cause it to stop responding. Typically, the node recovers automatically after a short period. However, to prevent recurring issues and maintain a stable environment, it's important to understand the underlying causes to be able to implement effective resolutions.

 ## Cause

-Possible causes of this issue include the following scenarios:
+There are several scenarios that could cause a "Not Ready" state to occur:

-- The API server isn't available, and you're using a readiness probe for the deployment.
+- The unavailability of the API server. This causes the readiness probe to fail. This prevents the pod from being attached to the service so that traffic is no longer forwarded to the pod instance.

-If a pod is running but isn't ready, that situation means that the readiness probe is failing. If the readiness probe fails, the pod isn't attached to the service, and traffic isn't forwarded to the pod instance.
-
-- Virtual machine (VM) host faults occur. To determine whether VM host faults occurred, check the following information sources:
+- Virtual machine (VM) host faults. To determine whether VM host faults occurred, check the following information sources:
 - [AKS diagnostics](/azure/aks/concepts-diagnostics)
 - [Azure status](https://status.azure.com/)
 - Azure notifications (for any recent outages or maintenance periods)

+## Resolution
+
+Check the API server availability by running the `kubectl get apiservices` command. Make sure that the readiness probe is correctly configured in the deployment YAML file.
+
+For further steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
+
 ## Prevention

 To prevent this issue from occurring in the future, take one or more of the following actions:

 - Make sure that your service tier is fully paid for.
 - Reduce the number of `watch` and `get` requests to the API server.
-- Replace the node pool with a healthy node pool.
-
-## More information
-
-- For general troubleshooting steps, see [Basic troubleshooting of Node Not Ready failures](node-not-ready-basic-troubleshooting.md).
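
Note on the Resolution step added in this commit: the `kubectl get apiservices` check could be scripted along the following lines. This is only a rough sketch and is not part of the commit. It assumes `kubectl` is on the PATH and already pointed at the cluster, and the helper names `failing_items`, `kubectl_json`, and `main` are hypothetical names chosen for illustration. It reads the JSON that `kubectl get <resource> -o json` returns and reports items whose status condition (`Available` for API services, `Ready` for nodes) is anything other than `"True"`.

```python
import json
import subprocess


def failing_items(listing, condition_type):
    """Return names of items whose given status condition is not "True".

    `listing` is the parsed JSON of `kubectl get <resource> -o json`.
    A missing condition is treated as "Unknown" and therefore reported.
    """
    failing = []
    for item in listing.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        status = next(
            (c.get("status") for c in conditions if c.get("type") == condition_type),
            "Unknown",
        )
        if status != "True":
            failing.append(item["metadata"]["name"])
    return failing


def kubectl_json(resource):
    """Run `kubectl get <resource> -o json` and return the parsed output."""
    result = subprocess.run(
        ["kubectl", "get", resource, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def main():
    # Hypothetical entry point: requires access to a live cluster.
    print("Unavailable API services:",
          failing_items(kubectl_json("apiservices"), "Available"))
    print("Nodes not ready:",
          failing_items(kubectl_json("nodes"), "Ready"))
```

Calling `main()` against a cluster prints both lists. Because the node recovers on its own in this scenario, a single run may show nothing, so running the check periodically and logging the results is probably more useful than a one-off call.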
