
Commit e3272d0

Merge pull request #9612 from nshankar13/nshankar/add-gateway-api-tsgs
AB#8042: Add user-facing TSG for Managed Gateway API and Istio Gateway API
2 parents 8386694 + b16fb30 commit e3272d0

6 files changed

Lines changed: 257 additions & 62 deletions


support/azure/azure-kubernetes/extensions/istio-add-on-egress-gateway.md

Lines changed: 44 additions & 40 deletions
Large diffs are not rendered by default.
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: Istio Service Mesh Add-on Gateway API Ingress Troubleshooting
description: Learn how to troubleshoot Gateway API-based ingress for the Istio service mesh add-on for Azure Kubernetes Service (AKS).
ms.date: 08/26/2025
author: nshankar13
ms.author: nshankar
ms.reviewer: jkatariya
ms.service: azure-kubernetes-service
ms.topic: troubleshooting-general
ms.custom: sap:Extensions, Policies and Add-Ons
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot Gateway API-based ingress gateways of the Istio add-on so that I can use the Istio service mesh successfully.
---

# Istio service mesh add-on Gateway API ingress troubleshooting

This article discusses how to troubleshoot ingress gateways that are configured by using the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) for the Istio service mesh add-on.

## Overview

Similar to the [classic Istio ingress gateways](./istio-add-on-ingress-gateway.md), Gateway API-based ingress gateways for the Istio add-on are Envoy-based reverse proxies. Users must have the [AKS Managed Gateway API CRDs](/azure/aks/managed-gateway-api) installed on their cluster before they can use the Istio add-on for Gateway API-based ingress.

## Before troubleshooting

Before you proceed, take the following actions:

- Install the [Managed Gateway API CRDs](/azure/aks/managed-gateway-api) on your cluster.
- Make sure that the Istio add-on is installed and that your mesh is on ASM minor revision `asm-1-26` or a later revision. To enable the add-on, follow the [installation guide](/azure/aks/istio-deploy-addon). If you're on an earlier revision, follow the [upgrade documentation](/azure/aks/istio-upgrade) to upgrade your mesh to `asm-1-26`.

## Networking, firewall, and load balancer errors troubleshooting

### Step 1: Make sure that Azure Load Balancer health probes are configured appropriately

In some cases, failing health probes block traffic from Azure Load Balancer to the Istio Gateway API deployment. To address this issue, add the [Azure Load Balancer annotations](https://cloud-provider-azure.sigs.k8s.io/topics/loadbalancer/) for the health probe path, port, and protocol directly to the `Gateway` object, or [customize](#gateway-resource-customization-troubleshooting) the `GatewayClass`-level ConfigMap or the per-`Gateway` ConfigMap.

Gateway customization:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
# ... (other fields omitted)
spec:
  infrastructure:
    annotations:
      # Probe Envoy's readiness endpoint (/healthz/ready on port 15021) over HTTP for the port-80 rule.
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz/ready"
      service.beta.kubernetes.io/port_80_health-probe_protocol: http
      service.beta.kubernetes.io/port_80_health-probe_port: "15021"
```

ConfigMap customization:

```yaml
apiVersion: v1
kind: ConfigMap
# metadata omitted: use the name and namespace of the GatewayClass-level or per-Gateway ConfigMap
data:
  # The value is a YAML document (as a string) that is applied to the generated Service.
  service: |
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz/ready"
        service.beta.kubernetes.io/port_80_health-probe_protocol: http
        service.beta.kubernetes.io/port_80_health-probe_port: "15021"
```

You can also check whether health probes are failing by inspecting the load balancer in the cluster's infrastructure resource group in the Azure portal, under `Settings/Properties`.

### Step 2: Make sure no firewall or NSG rules block ingress traffic

Verify that no [firewall](/azure/firewall/protect-azure-kubernetes-service) or [network security group (NSG)](/azure/virtual-network/network-security-groups-overview) rules block traffic to the ingress gateway.

Double-check whether you set restrictions that allow traffic only to the subnets of your user node pools. If the gateway pods are scheduled onto [system node pools](/azure/aks/use-system-pools?tabs=azure-cli), incoming traffic to these pods could be blocked. To address this issue, also allow traffic to the subnets of your system node pools.

## Gateway configuration troubleshooting

### Step 1: Make sure that `gatewayClassName` is set to `istio`

Verify that every `Gateway` you created has `spec.gatewayClassName` set to `istio`.

### Step 2: Verify cross-namespace references

Set the `spec.listeners.allowedRoutes` field of the `Gateway` based on the namespaces that the `Gateway` and its routes are deployed in, so that it allows routes from the same namespace only or from other namespaces as well. Likewise, the `spec.parentRefs` field of each route must reference the correct `Gateway` and, for cross-namespace references, must specify the namespace of that `Gateway`. For more information, see the Gateway API documentation on [cross-namespace routing](https://gateway-api.sigs.k8s.io/guides/multiple-ns/).
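
As a minimal sketch of the cross-namespace case, the following manifests attach an `HTTPRoute` in one namespace to a `Gateway` in another. The names `shared-gateway`, `gateway-ns`, `app-ns`, and the `httpbin` backend Service on port 8000 are hypothetical placeholders, not values that the add-on requires.

```yaml
# Hypothetical names: shared-gateway, gateway-ns, app-ns, httpbin.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: gateway-ns
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All   # the default, Same, accepts only routes in gateway-ns
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin
  namespace: app-ns
spec:
  parentRefs:
  - name: shared-gateway
    namespace: gateway-ns   # needed because the Gateway is in a different namespace
  rules:
  - backendRefs:
    - name: httpbin
      port: 8000
```

If `parentRefs` omits `namespace`, the route looks for the `Gateway` in its own namespace, and if the listener keeps `from: Same`, a route in another namespace isn't attached.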

### Step 3: Inspect the `Gateway` for programming errors

If the `Programmed` condition of the `Gateway` has a status of `False` or `Unknown`, inspect the `Gateway` object for more details. To do so, run `kubectl get gateway <gateway-name> -n <gateway-namespace> -o yaml` and `kubectl describe gateway <gateway-name> -n <gateway-namespace>`.

### Step 4: Inspect `istiod` and `Gateway` logs for errors

The `istiod` logs might contain additional details about `Gateway` programming errors. If the gateway is programmed successfully and its deployment is created but other issues occur, inspect the `Gateway` pod logs for errors. The gateway deployment name follows the format `<gateway-name>-istio`.

## Minor revision upgrades and revision label troubleshooting

By default, during an [Istio add-on minor revision upgrade](/azure/aks/istio-upgrade), if two control planes are deployed on the cluster at the same time, the higher revision takes ownership of any `Gateway` resources that aren't labeled with a specific ASM revision. To pin a gateway to a specific revision, add the `istio.io/rev` label, as in the following example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway
  labels:
    istio.io/rev: asm-1-26   # pins this gateway to the asm-1-26 control plane
spec:
  gatewayClassName: istio
```

During the minor revision upgrade, verify that the gateway deployment and its pods are automatically updated to the proxy image version that corresponds to the newer control plane revision. If they aren't, try restarting the deployment.

If your gateways are explicitly labeled with an ASM revision, relabel them with the revision that you're keeping before you complete or roll back the upgrade operation.

## Gateway resource customization troubleshooting

The Istio add-on supports [customization of the resources](/azure/aks/istio-gateway-api#resource-customizations) that are created for the gateways, as follows:

- Deployment
- Service
- Horizontal Pod Autoscaler (HPA)
- PodDisruptionBudget (PDB)

Follow these troubleshooting steps for issues that relate to customizing the resources that are created for a `Gateway`.

### Step 1: Make sure that customization fields are on the allowlist

Make sure that the customizations for both `GatewayClass`-level ConfigMaps and `Gateway`-level ConfigMaps include only fields that are on the [allowlist](/azure/aks/istio-gateway-api#resource-customization-allowlist) for the specific resource.

### Step 2: Make sure that the GatewayClass-level ConfigMap is configured correctly

The Istio add-on automatically deploys the `GatewayClass`-level ConfigMap, `istio-gateway-class-defaults`, in the `aks-istio-system` namespace when the Managed Gateway API installation is enabled on the cluster. It can take up to five minutes for this ConfigMap to be deployed after you install the Managed Gateway API CRDs.

If you edit this ConfigMap, make sure that you keep the `gateway.istio.io/defaults-for-class` label set to `istio`. Only one `GatewayClass`-level ConfigMap can be deployed at a time.
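
For reference, the following sketch shows the shape of the `GatewayClass`-level ConfigMap with the label that must be preserved. The `horizontalPodAutoscaler` entry and its values are a hypothetical customization; confirm that any field you set is on the allowlist before you rely on it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-gateway-class-defaults
  namespace: aks-istio-system
  labels:
    # Marks this ConfigMap as the defaults for the istio GatewayClass; keep it set to istio.
    gateway.istio.io/defaults-for-class: istio
data:
  # Hypothetical customization: each key holds a YAML document as a string.
  horizontalPodAutoscaler: |
    spec:
      minReplicas: 2
      maxReplicas: 5
```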

### Step 3: Verify gateway-level ConfigMap customizations

If both the `GatewayClass`-level ConfigMap and a `Gateway`-level ConfigMap are deployed, the `Gateway`-level ConfigMap customizations take precedence. Make sure that the desired resource customizations for the gateway are set in the `Gateway`-level ConfigMap. Also, verify that the `spec.infrastructure.parametersRef` field of the `Gateway` references the correct ConfigMap.
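
The following is a sketch of a `Gateway`-level ConfigMap and the `parametersRef` that points to it. The names `httpbin-gateway-options` and `httpbin-gateway`, the `default` namespace, and the HPA values are assumptions for illustration only.

```yaml
# Hypothetical names and values for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: httpbin-gateway-options
  namespace: default   # parametersRef has no namespace field, so keep this in the Gateway's namespace
data:
  horizontalPodAutoscaler: |
    spec:
      minReplicas: 3
      maxReplicas: 6
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: default
spec:
  gatewayClassName: istio
  infrastructure:
    parametersRef:
      group: ""          # core API group, because the target is a ConfigMap
      kind: ConfigMap
      name: httpbin-gateway-options
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```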

### Step 4: Inspect gateway resource propagation errors

If the `Gateway` customizations don't propagate to their respective resources, verify that the ConfigMap data is valid YAML by checking indentation, field names, and spelling. Also inspect the `istiod` logs for any issues that affect template rendering or resource creation for the gateways.

support/azure/azure-kubernetes/extensions/istio-add-on-general-troubleshooting.md

Lines changed: 22 additions & 12 deletions
@@ -1,5 +1,5 @@
 ---
-title: General Istio service mesh add-on troubleshooting
+title: General Istio Service Mesh Add-on Troubleshooting
 description: Learn how to do general troubleshooting of the Istio service mesh add-on for Azure Kubernetes Service (AKS).
 ms.date: 03/18/2025
 author: nshankar13
@@ -13,7 +13,7 @@ ms.custom: sap:Extensions, Policies and Add-Ons
 ---
 # General troubleshooting of the Istio service mesh add-on

-This article discusses general strategies (that use `kubectl`, `istioctl`, and other tools) to troubleshoot issues that are related to the Istio service mesh add-on for Microsoft Azure Kubernetes Service (AKS). This article also provides a list of possible error messages, reasons for error occurrences, and recommendations to resolve these errors.
+This article discusses general strategies (that use `kubectl`, `istioctl`, and other tools) to troubleshoot issues that are related to the Istio service mesh add-on for Microsoft Azure Kubernetes Service (AKS). This article also provides a list of possible error messages, reasons for error occurrences, and recommendations to resolve these errors.

 ## Prerequisites

@@ -50,15 +50,15 @@ kubectl delete pods <istio-pod> --namespace aks-istio-system
 The Istio pod is managed by a deployment. It's automatically re-created and redeployed after you delete it directly. Therefore, deleting the pod is an alternative method for restarting the pod.

 > [!NOTE]
-> Alternatively, you can restart the deployment directly by running the following [kubectl rollout restart](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_restart/) command:
+> You can also restart the deployment directly by running the following [kubectl rollout restart](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_restart/) command:
 >
 > ```bash
 > kubectl rollout restart deployment <istiod-asm-revision> --namespace aks-istio-system
 > ```

 ### Step 3: Check the status of resources

-If Istiod isn't scheduled, or if the pod isn't responding, you might want to check the status of the deployment and the replica sets. To do this, run the [kubectl get](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_get/) command:
+If Istiod isn't scheduled, or if the pod isn't responding, you might want to check the status of the deployment and the replica sets. To do this step, run the [kubectl get](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_get/) command:

 ```bash
 kubectl get <resource-type> [[--selector app=istiod] | [<resource-name>]]
@@ -112,7 +112,7 @@ kubectl logs <pod-name> --namespace <pod-namespace> --container istio-proxy

 ## Troubleshooting checklist: Using istioctl

-The following troubleshooting steps describe how to collect information and debug your mesh environment by running various `istioctl` commands.
+The following troubleshooting steps discuss how to collect information and debug your mesh environment by running various `istioctl` commands.

 All `istioctl` commands must be run together with the `--istioNamespace aks-istio-system` flag to point to the AKS add-on installation of Istio.

@@ -229,25 +229,35 @@ To address common traffic management and security misconfiguration issues that I

 For links to discussion about other issues, such as sidecar injection, observability, and upgrades, see [Common problems](https://istio.io/latest/docs/ops/common-problems/) on the Istio documentation site.

-### Step 3: Avoid CoreDNS overload
+### Step 3: Verify protocol selection
+
+Although Istio can automatically detect HTTP and HTTP/2 traffic (other traffic is treated as plain TCP), in some cases you might have to [explicitly declare](https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/) the protocol in the `Service` spec to resolve communication issues. You can declare it either as the prefix of the port `name` or in the port's `appProtocol` field. If both are set, `appProtocol` takes precedence. For example, some scenarios might require you to set the protocol to `tcp` so that traffic is proxied as raw TCP instead of HTTP or HTTPS.
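
The following sketch shows both ways to declare the protocol on a `Service` port. The Service name, namespace, selector, and port numbers are hypothetical placeholders.

```yaml
# Hypothetical Service; names and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-tcp-service
  namespace: my-app
spec:
  selector:
    app: my-tcp-app
  ports:
  - name: tcp-backend     # protocol as the port-name prefix: <protocol>[-<suffix>]
    port: 9000
    targetPort: 9000
    appProtocol: tcp      # if both are set, appProtocol wins over the name prefix
```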
+
+### Step 4: Avoid CoreDNS overload

 Issues that relate to CoreDNS overload might require you to change certain Istio DNS settings, such as the `dnsRefreshRate` field in the Istio MeshConfig definition.
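
For the Istio add-on, MeshConfig settings are supplied through the shared MeshConfig ConfigMap. The following is a sketch that assumes the `istio-shared-configmap-<asm-revision>` naming in the `aks-istio-system` namespace and revision `asm-1-26`; the `dnsRefreshRate` value is a hypothetical example, not a recommendation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-shared-configmap-asm-1-26   # assumed: istio-shared-configmap-<asm-revision>
  namespace: aks-istio-system
data:
  mesh: |-
    # Hypothetical value: refresh DNS-resolved Envoy clusters less often.
    dnsRefreshRate: 300s
```

If a shared MeshConfig ConfigMap already exists for the revision, add the field to its existing `mesh` value rather than creating a second ConfigMap.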

-### Step 4: Fix pod and sidecar race conditions
+### Step 5: Fix pod and sidecar race conditions
+
+If your application pod starts before the Envoy sidecar starts, the application might become unresponsive, or it might restart. For instructions to avoid this problem, see [Pod or containers start with network issues if istio-proxy is not ready](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready). Specifically, setting the `holdApplicationUntilProxyStarts` MeshConfig field under `defaultConfig` to `true` can help prevent these race conditions.
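
As a sketch, assuming the shared MeshConfig ConfigMap for revision `asm-1-26` (named with the `istio-shared-configmap-<asm-revision>` convention), the setting sits under `defaultConfig` in the `mesh` value:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-shared-configmap-asm-1-26   # assumed name for revision asm-1-26
  namespace: aks-istio-system
data:
  mesh: |-
    defaultConfig:
      # Start the application containers only after the istio-proxy sidecar is ready.
      holdApplicationUntilProxyStarts: true
```

Because `defaultConfig` applies to proxies as they're injected, existing pods typically need to be restarted before the setting takes effect.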
+
+### Step 6: Verify OutboundTrafficPolicy mode and Service Entry configuration for outbound access
+
+Issues that relate to outbound access or [egress gateways](./istio-add-on-egress-gateway.md) might be caused by Istio settings for external services. Check whether `outboundTrafficPolicy.mode` is set to `REGISTRY_ONLY` in either the [shared MeshConfig](./istio-add-on-meshconfig.md) or a `Sidecar` custom resource. If it is, you must explicitly declare a `ServiceEntry` for each external service to enable outbound access to it. When you use egress gateways, set the `resolution` field of the `ServiceEntry` to `DNS`.
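
A sketch of the MeshConfig setting to look for follows. It assumes the shared MeshConfig ConfigMap for revision `asm-1-26`; a `Sidecar` resource can set the same mode under its own `spec.outboundTrafficPolicy`.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-shared-configmap-asm-1-26   # assumed name for revision asm-1-26
  namespace: aks-istio-system
data:
  mesh: |-
    outboundTrafficPolicy:
      # Only hosts in the mesh service registry, including ServiceEntry hosts, are reachable.
      mode: REGISTRY_ONLY
```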

-If your application pod starts before the Envoy sidecar starts, the application might become unresponsive, or it might restart. For instructions about how to avoid this problem, see [Pod or containers start with network issues if istio-proxy is not ready](https://istio.io/latest/docs/ops/common-problems/injection/#pod-or-containers-start-with-network-issues-if-istio-proxy-is-not-ready). Specifically, setting the `holdApplicationUntilProxyStarts` MeshConfig field under `defaultConfig` to `true` can help prevent these race conditions.
+Also, keep in mind that, by default, `ServiceEntry` resources are exported to all namespaces. To restrict a `ServiceEntry` to particular namespaces, use the `exportTo` field in its [spec](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-export_to).
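
The following is a sketch of a `ServiceEntry` that combines the points above: DNS resolution for use with an egress gateway, and `exportTo` to limit its visibility. The host, port, name, and namespace are hypothetical, and the `networking.istio.io/v1` API version assumes a revision based on Istio 1.22 or later.

```yaml
# Hypothetical ServiceEntry; host, port, and namespace are placeholders.
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-api
  namespace: my-app
spec:
  hosts:
  - api.example.com
  ports:
  - number: 443
    name: tls
    protocol: TLS
  location: MESH_EXTERNAL
  resolution: DNS      # required by this article when traffic goes through an egress gateway
  exportTo:
  - "."                # visible only in the namespace where the ServiceEntry is defined
```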

-### Step 5: Configure a Service Entry when using an HTTP proxy for outbound traffic
+### Step 7: Configure a Service Entry when using an HTTP proxy for outbound traffic

-If your cluster uses an HTTP proxy for outbound internet access, you'll have to configure a Service Entry. For more information, see [HTTP proxy support in Azure Kubernetes Service](/azure/aks/http-proxy#istio-add-on-http-proxy-for-external-services).
+If your cluster uses an HTTP proxy for outbound internet access, you have to configure a Service Entry. For more information, see [HTTP proxy support in Azure Kubernetes Service](/azure/aks/http-proxy#istio-add-on-http-proxy-for-external-services).

-### Step 6: Enable Envoy access logging
+### Step 8: Enable Envoy access logging

 Enabling Envoy [access logging](https://istio.io/latest/docs/tasks/observability/logs/access-log/) helps identify and pinpoint issues in the gateways and sidecar proxies. For more information about logging and telemetry collection for the Istio add-on, see the documentation on [mesh configuration](/azure/aks/istio-meshconfig), [Telemetry API](/azure/aks/istio-telemetry), and [Istio metrics collection](/azure/aks/istio-metrics-managed-prometheus).
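
As a sketch, a mesh-wide `Telemetry` resource in the add-on's root namespace can enable Envoy access logs by using the built-in `envoy` provider. The resource name is a hypothetical placeholder, and the `telemetry.istio.io/v1` API version assumes a revision based on Istio 1.22 or later.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-access-logging        # hypothetical name
  namespace: aks-istio-system      # root namespace, so the policy applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy                  # built-in provider that writes access logs to stdout
```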

 ## Error messages

-The following table contains a list of possible error messages (for deploying the add-on, enabling ingress gateways, and performing upgrades), the reason why an error occurred, and recommendations for resolving the error.
+The following table contains a list of possible error messages (for deploying the add-on, enabling ingress gateways, and performing upgrades), the reason why an error occurs, and recommendations to resolve the error.

 | Error | Reason | Recommendations |
 |--|--|--|

0 commit comments

Comments
 (0)