support/azure/azure-kubernetes/connectivity/basic-troubleshooting-outbound-connections.md
Lines changed: 2 additions & 2 deletions
@@ -116,7 +116,7 @@ For basic troubleshooting for egress traffic from an AKS cluster, follow these s

 1. [Check whether the cluster can reach any other external endpoint](./troubleshoot-connections-endpoints-outside-virtual-network.md).

-1. [Check whether a network policy is blocking the traffic](./troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
+1. [Check whether a network policy is blocking the traffic](./dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).

 1. [Check whether an NSG is blocking the traffic](./traffic-between-node-pools-is-blocked.md).

@@ -278,7 +278,7 @@ To verify that the endpoint is reachable from the node where the problematic pod
 IP4Address : 23.200.197.152
 ```

-In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).
+In one unusual scenario that involves DNS resolution, the DNS queries get a correct response from the node but fail from the pod. For this scenario, you might consider [checking DNS resolution failures from inside the pod but not from the worker node](dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md). If you want to inspect DNS resolution for an endpoint across the cluster, you can consider [checking DNS resolution status across the cluster](dns/troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md#step-3-verify-the-health-of-the-upstream-dns-servers).

 If the DNS resolution is successful, continue to the network tests. Otherwise, verify the DNS configuration for the cluster.
 #Customer intent: As an Azure Kubernetes user, I want to learn how to create a troubleshooting workflow so that I can fix DNS resolution problems in Azure Kubernetes Service (AKS).
 ---
-# Basic troubleshooting of DNS resolution problems in AKS
+# Troubleshoot DNS resolution problems in AKS

 This article discusses how to create a troubleshooting workflow to fix Domain Name System (DNS) resolution problems in Microsoft Azure Kubernetes Service (AKS).

@@ -82,9 +82,9 @@ To start the process, run tests from a test pod against each layer.
 spec:
   containers:
   - name: aks-test
-    image: contoso/debian-ssh
+    image: debian:stable
     command: ["/bin/sh"]
-    args: ["-c", "while true; do sleep 1000; done"]
+    args: ["-c", "apt-get update && apt-get install -y dnsutils && while true; do sleep 1000; done"]
 EOF
 ```

@@ -94,7 +94,7 @@ To start the process, run tests from a test pod against each layer.
 kubectl get pod --namespace kube-system --selector k8s-app=kube-dns --output wide
 ```

-1. Connect to the test pod and test the DNS resolution against each CoreDNS pod IP address by running the following commands:
+1. Connect to the test pod (using `kubectl exec -it aks-test -- bash`) and test the DNS resolution against each CoreDNS pod IP address by running the following commands:

 ```bash
 # Placeholder values
@@ -109,6 +109,8 @@ To start the process, run tests from a test pod against each layer.
 done
 ```

+For more information about troubleshooting DNS resolution problems from the pod level, see [Troubleshoot DNS resolution failures from inside the pod](troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md).
+
 ##### Test the DNS resolution at CoreDNS service level

 1. Retrieve the CoreDNS service IP address by running the following `kubectl get` command:
@@ -161,7 +163,50 @@ To start the process, run tests from a test pod against each layer.

 Examine the DNS server configuration of the virtual network, and determine whether the servers can resolve the record in question.

-#### Part 2: Review the health and performance of nodes
+#### Part 2: Review the health and performance of CoreDNS pods and nodes
+
+##### Review the health and performance of CoreDNS pods
+
+You can use kubectl commands to check the health and performance of CoreDNS pods. Start by verifying that the CoreDNS pods are running:
+
+```bash
+kubectl get pods -l k8s-app=kube-dns -n kube-system
+```
+
+Check whether the CoreDNS pods are overused:
+
+```bash
+kubectl top pods -n kube-system -l k8s-app=kube-dns
+```
+
+```output
+NAME                      CPU(cores)   MEMORY(bytes)
+coredns-dc97c5f55-424f7   3m           23Mi
+coredns-dc97c5f55-wbh4q   3m           25Mi
+```
+
+Verify that the nodes that host the CoreDNS pods aren't overused. First, get the names of the nodes that host the CoreDNS pods:
+
+```bash
+kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[*].spec.nodeName}'
+```
+
+Check the usage of these nodes:
+
+```bash
+kubectl top nodes
+```
+
+Review the logs of the CoreDNS pods:
+
+```bash
+kubectl logs -l k8s-app=kube-dns -n kube-system
+```
+
+> [!NOTE]
+> To see more debugging information, enable verbose logs in CoreDNS. For instructions, see [Troubleshooting CoreDNS customizations in AKS](/azure/aks/coredns-custom#troubleshooting).
+
+##### Review the health and performance of nodes

 You might first notice DNS resolution performance problems as intermittent errors, such as time-outs. The main causes of this problem include resource exhaustion and I/O throttling within nodes that host the CoreDNS pods or the client pod.

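The `kubectl top pods` check in the hunk above relies on reading the CPU column by eye. As a minimal sketch (not part of the AKS documentation or kubectl), the hypothetical shell function `flag_busy_coredns_pods` reads that output on stdin and prints only the pods whose CPU usage is at or above a threshold. The 500-millicore default is an arbitrary assumption, not an AKS recommendation; derive a real threshold from the CPU requests and limits of your CoreDNS pods.

```bash
#!/usr/bin/env bash
# flag_busy_coredns_pods is a hypothetical helper, not an AKS or kubectl command.
# Input: `kubectl top pods` output on stdin (NAME, CPU(cores), MEMORY(bytes) columns).
# Argument: CPU threshold in millicores (assumed default: 500).
flag_busy_coredns_pods() {
  local threshold="${1:-500}"
  awk -v limit="$threshold" '
    NR == 1 { next }                 # skip the header row
    {
      cpu = $2
      if (cpu ~ /m$/) {              # values such as "3m" are millicores
        sub(/m$/, "", cpu)
        cpu = cpu + 0
      } else {                       # assumption: a bare number means whole cores
        cpu = cpu * 1000
      }
      if (cpu >= limit + 0) printf "%s %dm %s\n", $1, cpu, $3
    }'
}

# Example (requires access to the cluster and the Kubernetes metrics server):
# kubectl top pods -n kube-system -l k8s-app=kube-dns | flag_busy_coredns_pods 500
```

With the sample output shown above (3m for each pod), the function prints nothing, which matches the expectation that those CoreDNS pods aren't overused.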
@@ -213,21 +258,57 @@ Allocated resources:

 To get a better picture of resource usage at the pod and node level, you can also use Container insights and other cloud-native tools in Azure. For more information, see [Monitor Kubernetes clusters using Azure services and cloud native tools](/azure/azure-monitor/containers/monitor-kubernetes).

-#### Part 3: Capture DNS traffic and review DNS resolution performance
+#### Part 3: Analyze DNS traffic and review DNS resolution performance
+
+Analyzing DNS traffic can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.

-A network traffic capture can help you understand how your AKS cluster is handling the DNS queries. Ideally, you want to reproduce the problem on a test pod while you capture the traffic from this test pod and on each of the CoreDNS pods.
+There are two main ways to analyze DNS traffic:

-Many traffic-capturing tools are available to assist this process, including the following tools:
+- Using real-time DNS analysis tools (for example, [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget)) to analyze DNS traffic in real time.
+- Using traffic capture tools (for example, [Retina Capture](https://retina.sh/docs/Troubleshooting/capture) or [Dumpy](https://github.com/larryTheSlap/dumpy)) to collect DNS traffic and then analyze it in a network packet analyzer, such as Wireshark.

+In both approaches, the goal is to understand the health and performance of DNS responses by using DNS response codes, response times, and other metrics. Choose the approach that best fits your needs.

-- [Dumpy](https://github.com/larryTheSlap/dumpy) - an open source traffic capture plug-in for Kubernetes
+##### Real-time DNS traffic analysis
+
+In this section, we use [Inspektor Gadget](../../logs/capture-system-insights-from-aks.md#what-is-inspektor-gadget) to analyze DNS traffic in real time. To install Inspektor Gadget in your cluster, follow [How to install Inspektor Gadget in an AKS cluster](../../logs/capture-system-insights-from-aks.md#how-to-install-inspektor-gadget-in-an-aks-cluster).
+To trace DNS traffic across all namespaces, run the following commands:
+
+```bash
+# Get the version of the Inspektor Gadget server
+GADGET_VERSION=$(kubectl gadget version | grep Server | awk '{print $3}')
+# Run the trace_dns gadget
+kubectl gadget run trace_dns:$GADGET_VERSION --all-namespaces --fields "src,dst,name,qr,qtype,id,rcode,latency_ns"
+```
+
+The `--fields` option takes a comma-separated list of the fields to display. The following list describes the fields that are used in the command:
+- `src`: The source of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
+- `dst`: The destination of the request with Kubernetes information (`<kind>/<namespace>/<name>:<port>`).
+- `name`: The domain name that the DNS request queries.
+- `qr`: The query/response flag.
+- `qtype`: The record type of the DNS request (for example, `A`).
+- `id`: The ID of the DNS request, which is used to match the request and response.
+- `rcode`: The response code of the DNS request.
+- `latency_ns`: The latency of the DNS request, in nanoseconds.
+
+The output resembles the following:
+
+```output
+SRC                                   DST                                   NAME             QR QTYPE ID   RCODE    LATENCY_NS
+p/default/aks-test:33141              p/kube-system/coredns-57d886c994-r2…  db.contoso.com.  Q  A     c215          0ns
+p/kube-system/coredns-57d886c994-r2…  168.63.129.16:53                      db.contoso.com.  Q  A     323c          0ns
+168.63.129.16:53                      p/kube-system/coredns-57d886c994-r2…  db.contoso.com.  R  A     323c NameErr… 13.64ms
+p/kube-system/coredns-57d886c994-r2…  p/default/aks-test:33141              db.contoso.com.  R  A     c215 NameErr… 0ns
+p/default/aks-test:56921              p/kube-system/coredns-57d886c994-r2…  db.contoso.com.  Q  A     6574          0ns
+p/kube-system/coredns-57d886c994-r2…  p/default/aks-test:56921              db.contoso.com.  R  A     6574 NameErr… 0ns
+```

-- [Inspektor Gadget](https://go.microsoft.com/fwlink/?linkid=2260072) - allows checking DNS problems in real time. For more information, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).
+Use the `ID` field to match each query with its response and to find queries that didn't receive a response. The `RCODE` field shows the response code of the DNS request, and the `LATENCY_NS` field shows the latency of the DNS request in nanoseconds. Together, these fields can help you understand the health and performance of DNS responses.
+For more information about real-time DNS analysis, see [Troubleshoot DNS failures across an AKS cluster in real time](troubleshoot-dns-failures-across-an-aks-cluster-in-real-time.md).

-In this article, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).
+##### Capture DNS traffic

-##### Network traffic capture commands
+In this section, we use Dumpy as an example of how to collect DNS traffic captures from each CoreDNS pod and a client DNS pod (in this case, the `aks-test` pod).

 To collect the captures from the test client pod, run the following Dumpy command:

@@ -511,9 +592,9 @@ Observe the results of implementing your action plan. At this point, your action

 If these troubleshooting steps don't resolve the problem, repeat the troubleshooting steps as necessary.
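The trace_dns section earlier in this diff suggests matching queries and responses by `ID` and reading `RCODE`. When a trace is saved to a text file, that matching can be scripted. The hypothetical `summarize_trace_dns` function below is a minimal sketch, not part of Inspektor Gadget. It assumes whitespace-separated columns in the order set by `--fields "src,dst,name,qr,qtype,id,rcode,latency_ns"`, and that `RCODE` is empty on query rows, as in the sample output.

```bash
#!/usr/bin/env bash
# summarize_trace_dns is a hypothetical helper, not an Inspektor Gadget command.
# Input on stdin: trace_dns output captured with
#   --fields "src,dst,name,qr,qtype,id,rcode,latency_ns"
# Assumption: columns are whitespace-separated and RCODE is blank on query rows,
# so query rows have 7 fields and response rows have 8.
summarize_trace_dns() {
  awk '
    $1 == "SRC" { next }                     # skip the header row
    $4 == "Q" { pending[$6] = $3; next }     # remember query ID -> queried name
    $4 == "R" {
      delete pending[$6]                     # this ID received a response
      rcode = (NF >= 8) ? $7 : "(none)"
      count[rcode]++                         # tally responses per RCODE
    }
    END {
      for (id in pending) printf "unanswered %s %s\n", id, pending[id]
      for (rc in count) printf "rcode %s %d\n", rc, count[rc]
    }'
}

# Example (assumes the trace output was saved as plain text to trace.txt):
# summarize_trace_dns < trace.txt
```

Keying on `ID` alone is a simplification: IDs can repeat over a long capture, so pairing on the source port as well would be more robust. The output order of the `END` block isn't defined by awk, so pipe the result through `sort` if you need a stable order.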
support/azure/azure-kubernetes/connectivity/dns/troubleshoot-dns-failure-from-pod-but-not-from-worker-node.md
Lines changed: 2 additions & 2 deletions
@@ -261,6 +261,6 @@ We recommend that you don't combine Azure DNS with custom DNS servers in the vir

 For more information, see [Name resolution that uses your own DNS server](/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances#name-resolution-that-uses-your-own-dns-server).