Commit 3cfa05a

AKS: Consolidate DNS troubleshooting guides
Signed-off-by: Qasim Sarfraz <[email protected]>
1 parent 71784ec commit 3cfa05a

7 files changed

Lines changed: 338 additions & 208 deletions

.openpublishing.redirection.json

Lines changed: 12 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -13745,6 +13745,18 @@
1374513745
{
1374613746
"source_path": "support/dynamics-365/commerce/ecommerce-storefront/pickup-store-link-missing.md",
1374713747
"redirect_url": "/previous-versions/troubleshoot/dynamics-365/commerce/ecommerce-storefront/pickup-store-link-missing"
13748+
},
13749+
{
13750+
"source_path": "support/azure/azure-kubernetes/connectivity/basic-troubleshooting-dns-resolution-problems.md",
13751+
"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/basic-troubleshooting-dns-resolution-problems"
13752+
},
13753+
{
13754+
"source_path": "support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md",
13755+
"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time"
13756+
},
13757+
{
13758+
"source_path": "support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md",
13759+
"redirect_url": "/troubleshoot/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node"
1374813760
}
1374913761
]
1375013762
}

support/azure/azure-kubernetes/connectivity/basic-troubleshooting-outbound-connections.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@ For basic troubleshooting for egress traffic from an AKS cluster, follow these s
116116

117117
1. [Check whether the cluster can reach any other external endpoint](./troubleshoot-connections-endpoints-outside-virtual-network.md).
118118

119-
1. [Check whether a network policy is blocking the traffic](./troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
119+
1. [Check whether a network policy is blocking the traffic](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
120120

121121
1. [Check whether an NSG is blocking the traffic](./traffic-between-node-pools-is-blocked.md).
122122

@@ -278,7 +278,7 @@ To verify that the endpoint is reachable from the node where the problematic pod
278278
IP4Address : 23.200.197.152
279279
```
280280

281-
In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
281+
In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
282282

283283
If the DNS resolution is successful, continue to the network tests. Otherwise, verify the DNS configuration for the cluster.
284284

support/azure/azure-kubernetes/connectivity/basic-troubleshooting-dns-resolution-problems.md renamed to support/azure/azure-kubernetes/connectivity/dns/basic-troubleshooting-dns-resolution-problems.md

Lines changed: 98 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -4,14 +4,14 @@ description: Learn how to create a troubleshooting workflow to fix DNS resolutio
44
author: sturrent
55
ms.author: seturren
66
ms.date: 08/09/2024
7-
ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu
7+
ms.reviewer: v-rekhanain, v-leedennis, josebl, v-weizhu, qasimsarfraz
88
editor: v-jsitser
99
ms.service: azure-kubernetes-service
1010
ms.custom: sap:Connectivity
1111
ms.topic: troubleshooting-general
1212
#Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix DNS resolution problems in Azure Kubernetes Service (AKS).
1313
---
14-
# Basic troubleshooting of DNS resolution problems in AKS
14+
# Troubleshoot DNS resolution problems in AKS
1515

1616
This article discusses how to create a troubleshooting workflow to fix Domain Name System (DNS) resolution problems in Microsoft Azure Kubernetes Service (AKS).
1717

@@ -82,9 +82,9 @@ To start the process, run tests from a test pod against each layer.
8282
spec:
8383
containers:
8484
- name: aks-test
85-
image: contoso/debian-ssh
85+
image: debian:stable
8686
command: ["/bin/sh"]
87-
args: ["-c", "while true; do sleep 1000; done"]
87+
args: ["-c", "apt-get update && apt-get install -y dnsutils && while true; do sleep 1000; done"]
8888
EOF
8989
```
9090
@@ -94,7 +94,7 @@ To start the process, run tests from a test pod against each layer.
9494
kubectl get pod --namespace kube-system --selector k8s-app=kube-dns --output wide
9595
```
9696
97-
1. Connect to the test pod and test the DNS resolution against each CoreDNS pod IP address by running the following commands:
97+
1. Connect to the test pod (using `kubectl exec -it aks-test -- bash`) and test the DNS resolution against each CoreDNS pod IP address by running the following commands:
9898
9999
```bash
100100
# Placeholder values
@@ -109,6 +109,8 @@ To start the process, run tests from a test pod against each layer.
109109
done
110110
```
111111
112+
For more information about troubleshooting DNS resolution problems from the pod level, see [Troubleshoot DNS resolution failures from inside the pod](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
113+
112114
##### Test the DNS resolution at CoreDNS service level
113115
114116
1. Retrieve the CoreDNS service IP address by running the following `kubectl get` command:
@@ -161,7 +163,50 @@ To start the process, run tests from a test pod against each layer.
161163
162164
Examine the DNS server configuration of the virtual network, and determine whether the servers can resolve the record in question.
163165
164-
#### Part 2: Review the health and performance of nodes
166+
#### Part 2: Review the health and performance of CoreDNS pods and nodes
167+
168+
##### Review the health and performance of CoreDNS pods
169+
170+
You can use kubectl commands to check the health and performance of CoreDNS pods. Start by verifying that the CoreDNS pods are running:
171+
172+
```bash
173+
kubectl get pods -l k8s-app=kube-dns -n kube-system
174+
```
175+
176+
Check whether the CoreDNS pods are overused:
177+
178+
```bash
179+
kubectl top pods -n kube-system -l k8s-app=kube-dns
180+
```
181+
182+
```output
183+
NAME CPU(cores) MEMORY(bytes)
184+
coredns-dc97c5f55-424f7 3m 23Mi
185+
coredns-dc97c5f55-wbh4q 3m 25Mi
186+
```
187+
188+
Verify that the nodes that host the CoreDNS pods aren't overused. First, get the names of those nodes:
189+
190+
```bash
191+
kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].spec.nodeName}'
192+
```
193+
194+
Check the usage of these nodes:
195+
196+
```bash
197+
kubectl top nodes
198+
```
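
The two commands above can be combined so that usage is shown only for the nodes that host CoreDNS pods. The following is a minimal sketch, not part of the original guide: the function name `top_coredns_nodes` is hypothetical, and it assumes that the metrics API that `kubectl top` relies on is available in the cluster.

```bash
# Hypothetical helper: show resource usage only for nodes that host CoreDNS pods.
top_coredns_nodes() {
  # The jsonpath expression prints node names separated by spaces.
  # Split them into lines and remove duplicates (two pods can share a node).
  kubectl get pods -n kube-system -l k8s-app=kube-dns \
    -o jsonpath='{.items[*].spec.nodeName}' |
    tr ' ' '\n' | sort -u |
    while read -r node; do
      if [ -n "$node" ]; then
        kubectl top node "$node"
      fi
    done
}
```

After you define the function in your shell, run `top_coredns_nodes` to print one `kubectl top node` result per node.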
199+
200+
Review the logs for the CoreDNS pods:
201+
202+
```bash
203+
kubectl logs -l k8s-app=kube-dns -n kube-system
204+
```
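
When a label selector is used, `kubectl logs` returns only the last 10 lines per pod by default. The following sketch, which isn't part of the original guide, requests the full logs and keeps only error lines. The function name `coredns_error_lines` is hypothetical, and the `[ERROR]` marker is an assumption about how your CoreDNS version tags error messages.

```bash
# Hypothetical helper: print only the error lines from all CoreDNS pods.
coredns_error_lines() {
  # --tail=-1 returns the full log instead of the 10-line default that applies
  # when a selector is used; --prefix labels each line with its source pod.
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=-1 --prefix |
    grep -F '[ERROR]'  # Assumption: CoreDNS marks errors with "[ERROR]".
}
```

If the function prints nothing, no line in the current logs matched the assumed marker.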
205+
206+
> [!NOTE]
207+
> To see more debugging information, enable verbose logging in CoreDNS. For instructions, see [Troubleshooting CoreDNS customizations in AKS](/azure/aks/coredns-custom#troubleshooting).
208+
209+
##### Review the health and performance of nodes
165210
166211
You might first notice DNS resolution performance problems as intermittent errors, such as time-outs. The main causes of this problem include resource exhaustion and I/O throttling within nodes that host the CoreDNS pods or the client pod.
167212
@@ -213,21 +258,57 @@ Allocated resources:
213258
214259
To get a better picture of resource usage at the pod and node level, you can also use Container insights and other cloud-native tools in Azure. For more information, see [Monitor Kubernetes clusters using Azure services and cloud native tools](/azure/azure-monitor/containers/monitor-kubernetes).
215260
216-
#### Part 3: Capture DNS traffic and review DNS resolution performance
261+
#### Part 3: Analyze DNS traffic and review DNS resolution performance
262+
263+
Analyzing DNS traffic can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
217264
218-
A network traffic capture can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
265+
There are two main ways to analyze DNS traffic:
219266
220-
Many traffic-capturing tools are available to assist this process, including the following tools:
267+
- Use real-time DNS analysis tools, such as [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget), to analyze the DNS traffic as it occurs.
268+
- Use traffic capture tools, such as [Retina Capture](https://retina.sh/docs/Troubleshooting/capture) and [Dumpy](https://github.com/larryTheSlap/dumpy), to collect the DNS traffic, and then analyze it in a network packet analyzer tool, such as Wireshark.
221269
222-
- [Retina Capture](https://retina.sh/docs/Troubleshooting/capture)
270+
In both approaches, the goal is to understand the health and performance of DNS responses by using DNS response codes, response times, and other metrics. Choose the approach that best fits your needs.
223271
224-
- [Dumpy](https://github.com/larryTheSlap/dumpy) - an open source traffic capture plug-in for Kubernetes
272+
##### Real-time DNS traffic analysis
273+
274+
This section uses [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget) to analyze the DNS traffic in real time. To install Inspektor Gadget in your cluster, see [How to install Inspektor Gadget in an AKS cluster](../../logs/capture-system-insights-from-aks.md#how-to-install-inspektor-gadget-in-an-aks-cluster).
275+
To trace DNS traffic across all namespaces, run the following commands:
276+
277+
```bash
278+
# Get the version of Gadget
279+
GADGET_VERSION=$(kubectl gadget version | grep Server | awk '{print $3}')
280+
# Run the trace_dns gadget
281+
kubectl gadget run trace_dns:$GADGET_VERSION --all-namespaces --fields "src,dst,name,qr,qtype,id,rcode,latency_ns"
282+
```
283+
284+
The `--fields` parameter is a comma-separated list of the fields to display. The following list describes the fields that are used in the command:
285+
- `src`: The source of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
286+
- `dst`: The destination of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
287+
- `name`: The name of the DNS request.
288+
- `qr`: The query/response flag.
289+
- `qtype`: The type of the DNS request.
290+
- `id`: The ID of the DNS request, which is used to match the request and response.
291+
- `rcode`: The response code of the DNS request.
292+
- `latency_ns`: The latency of the DNS request.
293+
294+
The command output resembles the following example:
295+
296+
```output
297+
SRC DST NAME QR QTYPE ID RCODE LATENCY_NS
298+
p/default/aks-test:33141 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A c215 0ns
299+
p/kube-system/coredns-57d886c994-r2… 168.63.129.16:53 db.contoso.com. Q A 323c 0ns
300+
168.63.129.16:53 p/kube-system/coredns-57d886c994-r2… db.contoso.com. R A 323c NameErr… 13.64ms
301+
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:33141 db.contoso.com. R A c215 NameErr… 0ns
302+
p/default/aks-test:56921 p/kube-system/coredns-57d886c994-r2… db.contoso.com. Q A 6574 0ns
303+
p/kube-system/coredns-57d886c994-r2… p/default/aks-test:56921 db.contoso.com. R A 6574 NameErr… 0ns
304+
```
225305
226-
- [Inspektor Gadget](https://go.microsoft.com/fwlink/?linkid=2260072) - allows checking DNS problems in real time. For more information, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
306+
Use the `ID` field to match each query to its response and to identify queries that never received a response. The `RCODE` field shows the response code of the DNS request, and the `LATENCY_NS` field shows the latency of the DNS request. Together, these fields help you assess the health and performance of DNS responses.
307+
For more information about real-time DNS analysis, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
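
Matching queries to responses by `ID` can be scripted against saved trace output. The following is a minimal sketch under stated assumptions, not part of the original guide: the function name `unanswered_dns_ids` is hypothetical, and it assumes whitespace-separated columns in the order shown in the sample output (`QR` in the fourth column and `ID` in the sixth). If your output layout differs, adjust the column numbers.

```bash
# Hypothetical helper: list the IDs of DNS queries that never got a response.
# Reads trace_dns column output from standard input.
# Assumes the column order SRC DST NAME QR QTYPE ID ... from the sample output.
unanswered_dns_ids() {
  awk '
    $4 == "Q" { asked[$6] = 1 }      # Remember every query ID.
    $4 == "R" { answered[$6] = 1 }   # Remember every response ID.
    END {
      for (id in asked)
        if (!(id in answered)) print id
    }
  ' | sort
}
```

For example, if you saved the trace output to a file (here a hypothetical `dns-trace.txt`), run `unanswered_dns_ids < dns-trace.txt`. An empty result means that every captured query has a matching response.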
227308
228-
In this article, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
309+
##### Capture DNS traffic
229310
230-
##### Network traffic capture commands
311+
In this section, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
231312
232313
To collect the captures from the test client pod, run the following Dumpy command:
233314
@@ -511,9 +592,9 @@ Observe the results of implementing your action plan. At this point, your action
511592
512593
If these troubleshooting steps don't resolve the problem, repeat the troubleshooting steps as necessary.
513594
514-
[!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
595+
[!INCLUDE [Third-party disclaimer](../../../../includes/third-party-disclaimer.md)]
515596
516-
[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
597+
[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)]
517598
518-
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
599+
[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]
519600

support/azure/azure-kubernetes/connectivity/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md renamed to support/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -261,6 +261,6 @@ We recommend that you don't combine Azure DNS with custom DNS servers in the vir
261261
262262
For more information, see [Name resolution that uses your own DNS server](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances#name-resolution-that-uses-your-own-dns-server).
263263
264-
[!INCLUDE [Third-party contact disclaimer](../../../includes/third-party-contact-disclaimer.md)]
264+
[!INCLUDE [Third-party contact disclaimer](../../../../includes/third-party-contact-disclaimer.md)]
265265
266-
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
266+
[!INCLUDE [Azure Help Support](../../../../includes/azure-help-support.md)]
