---
title: Istio Service Mesh Add-on Gateway API Ingress Troubleshooting
3
+
description: Learn how to do Gateway API ingress troubleshooting on the Istio service mesh add-on for Azure Kubernetes Service (AKS).
4
+
ms.date: 08/26/2025
5
+
author: nshankar13
6
+
ms.author: nshankar
7
+
ms.reviewer: jkatariya
8
+
ms.service: azure-kubernetes-service
9
+
ms.topic: troubleshooting-general
10
+
ms.custom: sap:Extensions, Policies and Add-Ons
11
+
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot Gateway-API based ingress gateways of the Istio add-on so that I can use the Istio service mesh successfully.
12
+
---
13
+
14
+
# Istio service mesh add-on Gateway API ingress troubleshooting
15
+
16
+
This article discusses how to troubleshoot ingress gateways that are configured by using the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) for the Istio service mesh add-on.
17
+
18
+
## Overview
19
+
20
+
Similar to the [classic Istio ingress gateways](./istio-add-on-ingress-gateway.md), Gateway API-based ingress gateways for the Istio add-on are Envoy-based reverse proxies. Users must have the [AKS Managed Gateway API CRDs](/azure/aks/managed-gateway-api) installed on their cluster before they can use the Istio add-on for Gateway API-based ingress.
21
+
22
+
## Before troubleshooting
23
+
24
+
Before you proceed, take the following actions:
25
+
26
+
- Install the [Managed Gateway API CRDs](/azure/aks/managed-gateway-api) on your cluster.
27
+
- Make sure that you have the Istio add-on installed and are on ASM minor revision `asm-1-26` or a later revision. Follow the [installation guide](/azure/aks/istio-deploy-addon) to enable the Istio add-on and the [upgrade documentation](/azure/aks/istio-upgrade) to upgrade your mesh to `asm-1-26` if you're on an earlier revision.
28
+
29
+
## Networking, firewall, and load balancer errors troubleshooting
30
+
31
+
### Step 1: Make sure that Azure Load Balancer health probes are configured appropriately
32
+
33
+
In some cases, traffic from Azure Load Balancer to the Istio Gateway API Deployment is blocked because of failing health probes. You can address this issue by adding [Azure LoadBalancer annotations](https://cloud-provider-azure.sigs.k8s.io/topics/loadbalancer/) for the health probe path/port/protocol directly to the `Gateway` object, or by [customizing](#gateway-resource-customization-troubleshooting) the `GatewayClass`-level ConfigMap or the per-`Gateway` ConfigMap.
You can also check whether health probes are failing by inspecting the `LoadBalancer` resource in the infrastructure resource group for the cluster. In the Azure portal, look under `Settings/Properties`.
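
As a minimal sketch of the annotation approach, the following `Gateway` sets a health probe protocol annotation directly on the object. The gateway name, listener, and the choice of a TCP probe on port 80 are assumptions for illustration only. Pick the annotation keys and values that match your own listener ports from the Azure LoadBalancer annotations reference.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway            # hypothetical gateway name
  annotations:
    # Assumption: probe port 80 over plain TCP instead of an HTTP path check.
    service.beta.kubernetes.io/port_80_health-probe_protocol: "tcp"
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
```

After you apply a change like this one, recheck the probe status on the load balancer before you move on to the firewall and NSG checks.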
65
+
66
+
### Step 2: Make sure no firewall or NSG rules block ingress traffic
67
+
68
+
Verify that no [firewall](/azure/firewall/protect-azure-kubernetes-service) or [Network Security Group (NSG)](/azure/virtual-network/network-security-groups-overview) rules block traffic to the ingress gateway.
69
+
70
+
Check whether you set restrictions that allow traffic to only the subnets of your user node pools. If the Gateway API pods are scheduled onto [system node pools](/azure/aks/use-system-pools?tabs=azure-cli), incoming traffic to these pods could be blocked. You can address this issue by allowing traffic to the subnets of your system node pools.
71
+
72
+
## Gateway configuration troubleshooting
73
+
74
+
### Step 1: Make sure the gatewayClassName is set to `istio`
75
+
76
+
Verify that all `Gateways` you created have the `spec.gatewayClassName` set to `istio`.
77
+
78
+
### Step 2: Verify cross-namespace references
79
+
80
+
Depending on the namespace that the `Gateway` and respective Routes are deployed in, the `Gateway` `spec.listeners.allowedRoutes` value should be set accordingly to allow Routes from only the same namespace or across different namespaces. Likewise, the `spec.parentRefs` value for Routes should reference the correct `Gateway` and provide the appropriate namespace for cross-namespace `Gateway` references. For more information, see the Gateway API docs on [cross-namespace routing](https://gateway-api.sigs.k8s.io/guides/multiple-ns/).
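
The following sketch shows one way that a cross-namespace attachment might look. The namespaces (`infra`, `apps`), the resource names, the namespace label, and the backend service port are all hypothetical. The `Gateway` admits Routes from any namespace that carries the selected label, and the `HTTPRoute` names the `Gateway` namespace explicitly in `parentRefs`.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway             # hypothetical name
  namespace: infra                 # hypothetical namespace
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Selector             # use Same to allow only same-namespace Routes
        selector:
          matchLabels:
            shared-gateway-access: "true"   # hypothetical namespace label
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin
  namespace: apps                  # must carry the label selected above
spec:
  parentRefs:
  - name: shared-gateway
    namespace: infra               # required because the Gateway is in another namespace
  rules:
  - backendRefs:
    - name: httpbin                # hypothetical backend Service
      port: 8000
```

If the Route doesn't attach, compare the labels on the Route's namespace with the listener's selector, and check the `status.parents` section of the Route for the reported reason.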
81
+
82
+
### Step 3: Inspect the `Gateway` for programming errors
83
+
84
+
If the `Programmed` condition of the `Gateway` has a status of `False` or `Unknown`, inspect the `Gateway` object for more details, such as the `reason` and `message` fields of each status condition. To do this, run `kubectl get gateway <gateway-name> -n <gateway-namespace> -o yaml` and `kubectl describe gateway <gateway-name> -n <gateway-namespace>`.
85
+
86
+
### Step 4: Inspect `istiod` and `Gateway` logs for errors
87
+
88
+
The `istiod` logs might contain more details about errors that relate to `Gateway` programming. If the gateway is programmed successfully and its Deployment and pods are created, but other issues occur, inspect the `Gateway` pod logs for potential errors. The name of the `Gateway` Deployment follows the format `<gateway-name>-istio`.
89
+
90
+
## Minor revision upgrades and revision label troubleshooting
91
+
92
+
By default, if two control planes are deployed on the cluster simultaneously during an [Istio add-on minor revision upgrade](/azure/aks/istio-upgrade), the higher revision takes ownership of any `Gateway` resources that aren't labeled with a specific ASM revision. The following example shows a `Gateway` that's labeled explicitly with the `asm-1-26` revision:
93
+
94
+
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway
  labels:
    istio.io/rev: asm-1-26
spec:
  gatewayClassName: istio
```
104
+
105
+
During the minor revision upgrade, verify that the pods and deployments for the gateway are automatically updated to have the new proxy minor image version that corresponds to the later control plane minor revision. If this condition isn't true, try to restart the Deployment.
106
+
107
+
If your gateways are labeled explicitly with an ASM revision, relabel them accordingly before you finish or roll back the upgrade operation.
108
+
109
+
## Gateway resource customization troubleshooting
110
+
111
+
The Istio add-on supports [customization of the resources](/azure/aks/istio-gateway-api#resource-customizations) that are created for the gateways, as follows:
112
+
113
+
- Deployment
114
+
- Service
115
+
- Horizontal Pod Autoscaler (HPA)
116
+
- PodDisruptionBudget (PDB)
117
+
118
+
Follow these troubleshooting steps for issues that relate to configuring the `Gateway` resources.
119
+
120
+
### Step 1: Make sure that customization fields are on the allowlist
121
+
122
+
Make sure that the customizations for both `GatewayClass`-level ConfigMaps and `Gateway`-level ConfigMaps include only fields that are on the [allowlist](/azure/aks/istio-gateway-api#resource-customization-allowlist) for the specific resource.
123
+
124
+
### Step 2: Make sure that GatewayClass-level ConfigMap is configured correctly
125
+
126
+
The `GatewayClass`-level ConfigMap `istio-gateway-class-defaults` is automatically deployed in the `aks-istio-system` namespace by the Istio add-on when the Managed Gateway API installation is enabled on the cluster. It can take up to five minutes for the `istio-gateway-class-defaults` ConfigMap to be deployed after you install the Managed Gateway API CRDs.
127
+
128
+
If you're editing this ConfigMap, make sure that you keep the `gateway.istio.io/defaults-for-class` label set to `istio`. You can have only one `GatewayClass`-level ConfigMap deployed at a time.
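
For orientation, an edited `GatewayClass`-level ConfigMap might look roughly like the following sketch. The HPA replica counts are illustrative values only, and the sketch assumes that the fields shown are on the allowlist for your add-on version. The ConfigMap name, namespace, and label come from the description above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-gateway-class-defaults
  namespace: aks-istio-system
  labels:
    # Keep this label. Without it, the ConfigMap isn't treated as the class default.
    gateway.istio.io/defaults-for-class: istio
data:
  # Each key holds a YAML overlay for one generated resource type.
  horizontalPodAutoscaler: |
    spec:
      minReplicas: 2               # illustrative value
      maxReplicas: 5               # illustrative value
```
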
If both the `GatewayClass`-level ConfigMap and a `Gateway`-level ConfigMap are deployed, the `Gateway`-level ConfigMap customizations take precedence. Make sure that the desired resource customizations for the gateway are set in the `Gateway`-level ConfigMap. Also, verify that the `spec.infrastructure.parametersRef` field references the correct ConfigMap for that gateway.
If the `Gateway` customizations don't propagate to their respective resources, verify that the ConfigMap spec is valid in terms of indentation, correct field names, spelling, and so on. You should also inspect the `istiod` logs to see whether any issues affect template rendering or resource creation for the gateways.
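
A `Gateway`-level customization can be sketched as follows. The ConfigMap name `gw-options`, the gateway name, and the replica counts are hypothetical, and the sketch assumes that the overlay fields are on the allowlist. The point to verify is that `spec.infrastructure.parametersRef` names a ConfigMap in the same namespace as the `Gateway`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gw-options                 # hypothetical name
data:
  horizontalPodAutoscaler: |
    spec:
      minReplicas: 3               # illustrative value
      maxReplicas: 6               # illustrative value
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway            # hypothetical name
spec:
  gatewayClassName: istio
  infrastructure:
    parametersRef:
      group: ""                    # core API group, because the target is a ConfigMap
      kind: ConfigMap
      name: gw-options
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```

Because the overlay is stored as a multi-line string under each `data` key, an indentation error inside the string isn't caught when the ConfigMap is applied. It surfaces only when `istiod` renders the resources.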
support/azure/azure-kubernetes/extensions/istio-add-on-general-troubleshooting.md
+22 -12 lines changed: 22 additions and 12 deletions
@@ -1,5 +1,5 @@
1
1
---
2
-
title: General Istio service mesh add-on troubleshooting
2
+
title: General Istio Service Mesh Add-on Troubleshooting
3
3
description: Learn how to do general troubleshooting of the Istio service mesh add-on for Azure Kubernetes Service (AKS).
4
4
ms.date: 03/18/2025
5
5
author: nshankar13
@@ -13,7 +13,7 @@ ms.custom: sap:Extensions, Policies and Add-Ons
13
13
---
14
14
# General troubleshooting of the Istio service mesh add-on
15
15
16
-
This article discusses general strategies (that use `kubectl`, `istioctl`, and other tools) to troubleshoot issues that are related to the Istio service mesh add-on for Microsoft Azure Kubernetes Service (AKS). This article also provides a list of possible error messages, reasons for error occurrences, and recommendations to resolve these errors.
16
+
This article discusses general strategies (that use `kubectl`, `istioctl`, and other tools) to troubleshoot issues that are related to the Istio service mesh add-on for Microsoft Azure Kubernetes Service (AKS). This article also provides a list of possible error messages, reasons for error occurrences, and recommendations to resolve these errors.
The Istio pod is managed by a deployment. It's automatically re-created and redeployed after you delete it directly. Therefore, deleting the pod is an alternative method for restarting the pod.
51
51
52
52
> [!NOTE]
53
-
> Alternatively, you can restart the deployment directly by running the following [kubectl rollout restart](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_restart/) command:
53
+
> You can also restart the deployment directly by running the following [kubectl rollout restart](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_restart/) command:
If Istiod isn't scheduled, or if the pod isn't responding, you might want to check the status of the deployment and the replica sets. To do this, run the [kubectl get](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_get/) command:
61
+
If Istiod isn't scheduled, or if the pod isn't responding, you might want to check the status of the deployment and the replica sets. To do this step, run the [kubectl get](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_get/) command:
62
62
63
63
```bash
64
64
kubectl get <resource-type> [[--selector app=istiod] | [<resource-name>]]
The following troubleshooting steps describe how to collect information and debug your mesh environment by running various `istioctl` commands.
115
+
The following troubleshooting steps discuss how to collect information and debug your mesh environment by running various `istioctl` commands.
116
116
117
117
All `istioctl` commands must be run together with the `--istioNamespace aks-istio-system` flag to point to the AKS add-on installation of Istio.
118
118
@@ -229,25 +229,35 @@ To address common traffic management and security misconfiguration issues that I
229
229
230
230
For links to discussion about other issues, such as sidecar injection, observability, and upgrades, see [Common problems](https://istio.io/latest/docs/ops/common-problems/) on the Istio documentation site.
231
231
232
-
### Step 3: Avoid CoreDNS overload
232
+
### Step 3: Verify protocol selection
233
+
234
+
Although Istio can proxy any TCP-based protocol and automatically detects HTTP and HTTP/2 traffic, in certain cases, the protocol in the `Service` spec might have to be [explicitly declared](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/) to unblock communication issues. To do this, set the protocol in the port `name` (by using the `<protocol>[-<suffix>]` format) or in the `appProtocol` field. If both are set, `appProtocol` takes precedence. For instance, certain scenarios might require you to set the protocol to `tcp` to proxy traffic as raw TCP, as opposed to HTTP or HTTPS.
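
As an illustration, the following `Service` declares its port as raw TCP in both supported ways. The service name, selector, and port number are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-backend                 # hypothetical name
spec:
  selector:
    app: my-backend
  ports:
  - name: tcp-backend              # protocol declared through the port name prefix
    port: 9000
    targetPort: 9000
    appProtocol: tcp               # explicit declaration that overrides the port name
```

Setting only one of the two fields is enough. Both are shown here to make the precedence visible.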
235
+
236
+
### Step 4: Avoid CoreDNS overload
233
237
234
238
Issues that relate to CoreDNS overload might require you to change certain Istio DNS settings, such as the `dnsRefreshRate` field in the Istio MeshConfig definition.
235
239
236
-
### Step 4: Fix pod and sidecar race conditions
240
+
### Step 5: Fix pod and sidecar race conditions
241
+
242
+
If your application pod starts before the Envoy sidecar starts, the application might become unresponsive, or it might restart. For instructions to avoid this problem, see [Pod or containers start with network issues if istio-proxy is not ready](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready). Specifically, setting the `holdApplicationUntilProxyStarts` MeshConfig field under `defaultConfig` to `true` can help prevent these race conditions.
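
A minimal sketch of this setting in the add-on's shared MeshConfig follows. It assumes that the mesh is on revision `asm-1-26`, so that the ConfigMap is named `istio-shared-configmap-asm-1-26`. Substitute the revision that's installed on your cluster.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-shared-configmap-asm-1-26   # assumption: revision asm-1-26
  namespace: aks-istio-system
data:
  mesh: |-
    defaultConfig:
      # Delay application container startup until the sidecar proxy is ready.
      holdApplicationUntilProxyStarts: true
```

The setting applies to newly injected pods, so existing workloads have to be restarted before they pick it up.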
243
+
244
+
### Step 6: Verify OutboundTrafficPolicy mode and Service Entry configuration for outbound access
245
+
246
+
Issues that relate to outbound access or [egress gateways](./istio-add-on-egress-gateway.md) might occur because of Istio settings that govern access to external services. Check whether `outboundTrafficPolicy.mode` is set to `REGISTRY_ONLY` in either the [Shared MeshConfig](./istio-add-on-meshconfig.md) or a `Sidecar` custom resource. If it is, you must explicitly declare a `ServiceEntry` for each external service to enable outbound access. When you use egress gateways, the resolution for the `ServiceEntry` must be set to `DNS`.
237
247
238
-
If your application pod starts before the Envoy sidecar starts, the application might become unresponsive, or it might restart. For instructions about how to avoid this problem, see [Pod or containers start with network issues if istio-proxy is not ready](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready). Specifically, setting the `holdApplicationUntilProxyStarts` MeshConfig field under `defaultConfig` to `true` can help prevent these race conditions.
248
+
Also, keep in mind that, by default, a `ServiceEntry` is exported to all namespaces. To restrict the scope of a `ServiceEntry` to a particular namespace, use the `exportTo` field in the [spec](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-export_to).
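
The following sketch shows a `ServiceEntry` that combines these points. The resource name, namespace, and external host are hypothetical. On older mesh revisions, the `networking.istio.io/v1beta1` API version might be needed instead of `v1`.

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-api               # hypothetical name
  namespace: apps                  # hypothetical namespace
spec:
  hosts:
  - api.example.com                # hypothetical external host
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS                  # required when traffic goes through an egress gateway
  exportTo:
  - "."                            # restrict visibility to the ServiceEntry's own namespace
```

Omitting `exportTo` leaves the entry visible to every namespace, which is the default behavior described above.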
239
249
240
-
### Step 5: Configure a Service Entry when using an HTTP proxy for outbound traffic
250
+
### Step 7: Configure a Service Entry when using an HTTP proxy for outbound traffic
241
251
242
-
If your cluster uses an HTTP proxy for outbound internet access, you'll have to configure a Service Entry. For more information, see [HTTP proxy support in Azure Kubernetes Service](/azure/aks/http-proxy#istio-add-on-http-proxy-for-external-services).
252
+
If your cluster uses an HTTP proxy for outbound internet access, you have to configure a Service Entry. For more information, see [HTTP proxy support in Azure Kubernetes Service](/azure/aks/http-proxy#istio-add-on-http-proxy-for-external-services).
243
253
244
-
### Step 6: Enable Envoy access logging
254
+
### Step 8: Enable Envoy access logging
245
255
246
256
Enabling Envoy [access logging](https://istio.io/latest/docs/tasks/observability/logs/access-log/) helps identify and pinpoint issues in the gateways and sidecar proxies. For more information about logging and telemetry collection for the Istio add-on, see the documentation on [mesh configuration](/azure/aks/istio-meshconfig), [Telemetry API](/azure/aks/istio-telemetry), and [Istio metrics collection](/azure/aks/istio-metrics-managed-prometheus).
247
257
248
258
## Error messages
249
259
250
-
The following table contains a list of possible error messages (for deploying the add-on, enabling ingress gateways, and performing upgrades), the reason why an error occurred, and recommendations for resolving the error.
260
+
The following table contains a list of possible error messages (for deploying the add-on, enabling ingress gateways, and performing upgrades), the reason why an error occurs, and recommendations to resolve the error.