Commit 37040d5

Merge pull request #9467 from claudiogodoy99/main
AB#7010: Troubleshoot Performance Issues with the Managed NGINX Ingress Controller in AKS
2 parents 89cc765 + 8b54dec

2 files changed: 163 additions & 0 deletions
---
title: Troubleshoot Performance Issues in the Managed NGINX Ingress Controller in AKS
description: Step-by-step guide to identify and resolve performance issues in the Managed NGINX Ingress Controller in AKS.
ms.reviewer: claudiogodoy
ms.service: azure-kubernetes-service
ms.date: 05/24/2025
---
# Troubleshoot performance issues in the managed NGINX ingress controller
The [managed NGINX ingress controller](/azure/aks/app-routing) is a routing add-on that routes HTTP and HTTPS traffic to applications that run on an [Azure Kubernetes Service (AKS)](/azure/aks/) cluster.

The routing layer itself can be the root cause of performance problems. This article provides step-by-step guidance to troubleshoot NGINX ingress controller performance issues, including common symptoms, root cause analysis, and configuration adjustments.
## Prerequisites

Before you start, make sure that you have the following tool installed:

- Kubernetes CLI (`kubectl`). To install it by using the Azure CLI, run the `az aks install-cli` command.
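Before you continue, you can confirm that the `kubectl` binary is available. This is a minimal sketch, not part of the original article; the `status` variable name is illustrative:

```shell
# Check whether kubectl is on the PATH; if it's missing, run `az aks install-cli`.
if command -v kubectl >/dev/null 2>&1; then
  status="installed"
else
  status="missing"
fi
echo "kubectl is ${status}"
```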
## Symptoms

You receive an HTTP gateway error code, such as `502` or `504`, or a response-time error message when you run the NGINX ingress controller. This behavior can indicate an NGINX exhaustion problem.

You observe a significant difference between your service response time and the end-to-end response time. This difference can indicate latency that's added by NGINX, and therefore an NGINX exhaustion problem.
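To quantify the second symptom, compare the two measurements directly. The following sketch uses hypothetical latency values (the numbers and variable names are illustrative, not from this article) to compute the overhead that the ingress layer adds:

```shell
# Hypothetical measurements, in milliseconds:
e2e_ms=850   # end-to-end response time, measured through the ingress
svc_ms=120   # service response time, measured against the pod directly
overhead_ms=$((e2e_ms - svc_ms))
echo "ingress overhead: ${overhead_ms} ms"
# A large, sustained overhead suggests that the ingress layer itself adds latency.
```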
## Cause

The most common cause of performance issues in the NGINX ingress controller is CPU exhaustion. During a load spike in the system, a good troubleshooting method is to monitor [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) behavior. By default, the routing add-on creates a [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) that's named `app-routing-system`.
## Resolution

To troubleshoot the issue, follow these steps.
### Step 1: Verify horizontal pod autoscaler (HPA) behavior

1. Get the HPA name:

   ```console
   kubectl get hpa -n app-routing-system
   ```

2. Monitor the HPA behavior:

   ```console
   kubectl get hpa <HPA_NAME> -n app-routing-system -w
   ```

3. Evaluate the results:

   ```console
   $ kubectl get hpa <HPA_NAME> -n app-routing-system -w
   NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
   nginx   Deployment/nginx   cpu: 83%/70%    1         2         1          77m
   nginx   Deployment/nginx   cpu: 83%/70%    1         2         2          77m
   nginx   Deployment/nginx   cpu: 106%/70%   1         2         2          79m
   nginx   Deployment/nginx   cpu: 133%/70%   1         2         2          80m
   ```

   The **TARGETS** column shows the current CPU usage relative to the threshold at which the HPA scales up the pods. In this example, the HPA reached the maximum of two replicas while CPU usage kept rising above the threshold. This behavior has several possible causes:

   - The HPA reached the maximum number of pods.
   - No nodes are available on which to schedule the pods.
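The **TARGETS** comparison can also be scripted. This is a minimal sketch, assuming you've already captured a single TARGETS value; the `targets` string here is hard-coded from the sample output above, and in practice you'd extract it from the `kubectl get hpa` output:

```shell
# Parse a TARGETS value such as "cpu: 83%/70%" and flag scale-up pressure
# when current usage exceeds the target.
targets="cpu: 83%/70%"
usage=${targets#*: }     # -> "83%/70%"
usage=${usage%%\%*}      # -> "83"
target=${targets##*/}    # -> "70%"
target=${target%\%}      # -> "70"
if [ "$usage" -gt "$target" ]; then
  echo "CPU usage ${usage}% exceeds target ${target}%: expect the HPA to scale up"
fi
```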
### Step 2: Look for pods in a pending state

If your evaluation reveals that the NGINX HPA didn't reach the maximum number of pods, the [kube-scheduler](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/#kube-scheduler) might not be able to find available nodes on which to schedule the NGINX pods. To find pending pods, run the following command:

```console
kubectl get pod --field-selector=status.phase=Pending -n app-routing-system
```

> [!NOTE]
> If there are pending pods, the cluster might be experiencing a resource exhaustion problem. For more information, see [Troubleshoot pod scheduler errors in Azure Kubernetes Service](../availability-performance/troubleshoot-pod-scheduler-errors.md).
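If you want to act on the result programmatically, you can count the matching lines. This sketch uses a hypothetical captured pod line (the pod name and ages are illustrative, not from a real cluster):

```shell
# Count pending pods from captured output of the command above.
pending_output='nginx-5d7f6c8b9d-x2x7q   0/1   Pending   0   5m'
pending_count=$(printf '%s\n' "$pending_output" | grep -c 'Pending')
if [ "$pending_count" -gt 0 ]; then
  echo "${pending_count} pending pod(s) found: investigate node capacity"
fi
```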
### Step 3: Check whether limits are applied to the NGINX deployment

Any misconfiguration of the NGINX [resource limits or requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) can cause the HPA to scale up more pods than it needs. To check the limits, follow these steps:

1. Describe the NGINX deployment:

   ```console
   kubectl describe deploy nginx -n app-routing-system
   ```

2. Verify the requests and limits:

   ```console
   $ kubectl describe deploy nginx -n app-routing-system
   Name:       nginx
   ....
   Selector:   app=nginx
   ....
   Pod Template:
     ....
     Containers:
       controller:
         ...
         Limits:
           ...
         Requests:
           ...
   ```
## More information

By default, the current version of the NGINX ingress controller doesn't set limits for the NGINX pods. The controller requests `500m` of CPU, which the HPA uses as its scaling baseline. We recommend that you don't change these settings directly in the deployment definition.

If the HPA reaches the maximum number of pods, and the deployment requests and limits remain unchanged, configure the [custom resource definition (CRD)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that's named [NginxIngressController](https://github.com/Azure/aks-app-routing-operator/blob/main/config/crd/bases/approuting.kubernetes.azure.com_nginxingresscontrollers.yaml).
### Configuration options

The following configuration options directly affect the HPA behavior.

| Property | Type | Description | Required | Default |
|---|---|---|---|---|
| `scaling` | object | Configuration for scaling the controller. Contains the nested properties that follow. | No | Not applicable |
| `maxReplicas` | integer | Upper limit for the number of replicas. | No | `100` |
| `minReplicas` | integer | Lower limit for the number of replicas. | No | `2` |
| `threshold` | string | Scaling threshold that defines how aggressively to scale. Options include `rapid`, `steady`, and `balanced`. | No | `balanced` |
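For reference, a complete resource that sets these properties might look like the following sketch. The `apiVersion` value and the resource name `nginx` are assumptions based on the CRD linked above, not confirmed by this article; verify them against your cluster with `kubectl get nginxingresscontroller`:

```yaml
apiVersion: approuting.kubernetes.azure.com/v1alpha1   # assumed version; check your cluster
kind: NginxIngressController
metadata:
  name: nginx          # assumed name of the default controller resource
spec:
  scaling:
    maxReplicas: 10
    minReplicas: 2
    threshold: "balanced"
```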
### Apply the configuration

1. Edit the `NginxIngressController` custom resource:

   ```console
   kubectl edit nginxingresscontroller -n app-routing-system
   ```

2. Add or modify the scaling configuration:

   ```yaml
   spec:
     scaling:
       maxReplicas: 10
       minReplicas: 2
       threshold: "balanced"
   ```

3. To apply the changes, save them, and then exit the editor.

4. Verify the changes:

   ```console
   kubectl get hpa -n app-routing-system
   ```

The HPA automatically updates, based on your new configuration, and the NGINX ingress controller scales according to the specified parameters.
## References

- [Learn more about Azure Kubernetes Service (AKS) best practices](/azure/aks/best-practices)
- [Monitor your Kubernetes cluster performance with Container insights](/azure/azure-monitor/containers/container-insights-analyze)
- [NGINX ingress controller](https://github.com/kubernetes/ingress-nginx)

[!INCLUDE [Third-party information disclaimer](../../../includes/third-party-disclaimer.md)]

[!INCLUDE [Third-party contact information disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/toc.yml

@@ -153,6 +153,8 @@ items:
      href: load-bal-ingress-c/create-unmanaged-ingress-controller.md
  - name: Troubleshoot Application Gateway Ingress Controller connectivity
      href: load-bal-ingress-c/troubleshoot-app-gateway-ingress-controller-connectivity-issues.md
+ - name: Troubleshoot Performance Issues with the Managed NGINX Ingress Controller in AKS
+     href: load-bal-ingress-c/troubleshoot-performance-ingress.md
  - name: Troubleshoot Kubernetes control plane
    items: