- Reviewing [Network Security Group][network-security-group-docs] rules for required traffic.
201
-
- Verifying subnet configuration in `AKSNodeClass`. For more information, see [AKSNodeClass documentation][aksnodeclass-subnet-config].
202
-
- Restarting CNI plugin pods.
203
-
- Checking `CoreDNS` configuration. For more information, see [CoreDNS documentation][coredns-troubleshoot].
200
+
- Review [Network Security Group][network-security-group-docs] rules for required traffic.
201
+
- Verify subnet configuration in `AKSNodeClass`. For more information, see [AKSNodeClass documentation][aksnodeclass-subnet-config].
202
+
- Restart CNI plugin pods.
203
+
- Check `CoreDNS` configuration. For more information, see [CoreDNS documentation][coredns-troubleshoot].
204
204
205
205
### DNS service IP issues
206
206
207
207
>[!NOTE]
208
-
>The `--dns-service-ip` parameter is only supported for NAP (Node Auto Provisioning) clusters and is not available for self-hosted Karpenter installations.
208
+
>The `--dns-service-ip` parameter is only supported for NAP clusters and isn't available for self-hosted Karpenter installations.
209
209
210
-
**Symptoms**: Pods can't resolve DNS names or kubelet fails to register with API server due to DNS resolution failures.
210
+
**Symptoms**
211
+
212
+
Pods can't resolve DNS names, or kubelet fails to register with the API server because of DNS resolution failures.
213
+
214
+
**Debugging steps**
215
+
216
+
1. **Check kubelet DNS configuration**
211
217
212
-
**Debugging Steps**:
218
+
Run the following command:
213
219
214
-
1. **Check kubelet DNS configuration**:
215
220
```azurecli-interactive
216
221
# SSH to the Karpenter node and check kubelet config
217
222
sudo cat /var/lib/kubelet/config.yaml | grep -A 5 clusterDNS
5. **Validate network connectivity to DNS service**:
269
+
5. **Validate network connectivity to DNS service**
270
+
271
+
Run the following command:
272
+
256
273
```azurecli-interactive
257
274
# From the Karpenter node, test connectivity to DNS service
258
275
telnet 10.0.0.10 53 # Replace with your actual DNS service IP
259
276
# Or using nc if telnet is not available
260
277
nc -zv 10.0.0.10 53
261
278
```
262
279
263
-
**Common Causes**:
264
-
- Incorrect `--dns-service-ip` parameter in AKSNodeClass
265
-
- DNS service IP not in the service CIDR range
266
-
- Network connectivity issues between node and DNS service
267
-
- CoreDNS pods not running or misconfigured
268
-
- Firewall rules blocking DNS traffic
280
+
**Common causes**
269
281
270
-
**Solutions**:
271
-
- Verify `--dns-service-ip` matches the actual DNS service: `kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'`
272
-
- Ensure DNS service IP is within the service CIDR range specified during cluster creation
273
-
- Check that Karpenter nodes can reach the service subnet
274
-
- Restart CoreDNS pods if they're in error state: `kubectl rollout restart deployment/coredns -n kube-system`
275
-
- Verify NSG rules allow traffic on port 53 (TCP/UDP)
276
-
- Run a connectivity analysis with the [Azure Virtual Network Verifier][connectivity-tool] tool to validate outbound connectivity
282
+
Common causes include:
277
283
278
-
## Azure-Specific Issues
284
+
- Incorrect `--dns-service-ip` parameter in `AKSNodeClass`.
285
+
- DNS service IP isn't in the service Classless Inter-Domain Routing (CIDR) range.
286
+
- Network connectivity issues between node and DNS service.
287
+
- `CoreDNS` pods not running or misconfigured.
288
+
- Firewall rules block DNS traffic.
279
289
280
-
### Spot VM Issues
290
+
**Solutions**
291
+
292
+
Solutions include:
281
293
282
-
**Symptoms**: Unexpected node terminations when using spot instances.
294
+
- Verify `--dns-service-ip` matches the actual DNS service. Do this with the following command: `kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'`
295
+
- Ensure DNS service IP is within the service CIDR range specified during cluster creation.
296
+
- Check that Karpenter nodes can reach the service subnet.
297
+
- Restart `CoreDNS` pods if they're in an error state. Do this with the following command: `kubectl rollout restart deployment/coredns -n kube-system`
298
+
- Verify that NSG rules allow traffic on port 53 for both TCP and User Datagram Protocol (UDP).
299
+
- Run a connectivity analysis with [Azure Virtual Network Verifier](/azure/virtual-network-manager/overview) to validate outbound connectivity.
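
The check that the DNS service IP falls inside the service CIDR can also be done mechanically. The following is a minimal POSIX shell sketch, not part of the original guidance: the helper names `ip_to_int` and `ip_in_cidr` are hypothetical, the addresses are placeholders, and it assumes well-formed IPv4 input with no leading zeros in any octet.

```shell
# Hypothetical helpers (not part of kubectl or the Azure CLI).
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) when IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)

# Example with placeholder values; substitute the cluster's real values.
if ip_in_cidr 10.0.0.10 10.0.0.0/16; then
  echo "DNS service IP is inside the service CIDR"
else
  echo "DNS service IP is outside the service CIDR"
fi
```

The first argument could come from the `kubectl get svc` command shown in the list above. The service CIDR is the value chosen at cluster creation; on AKS it is typically readable with `az aks show --resource-group <resource-group> --name <cluster> --query networkProfile.serviceCidr -o tsv`, where both names are placeholders.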
283
300
284
-
**Debugging Steps**:
301
+
## Azure-specific issues
285
302
286
-
1. **Check node events**:
303
+
### Spot virtual machine (VM) issues
304
+
305
+
**Symptoms**
306
+
307
+
Unexpected node terminations occur when using spot instances.
308
+
309
+
**Debugging steps**
310
+
311
+
1. **Check node events**
312
+
313
+
Run the following command:
287
314
288
315
```azurecli-interactive
289
316
kubectl get events | grep -i "spot\|evict"
290
317
```
291
318
292
-
2. **Monitor spot VM pricing**:
319
+
2. **Monitor spot VM pricing**
320
+
321
+
Run the following command:
293
322
294
323
```azurecli-interactive
295
324
az vm list-sizes --location <region> --query "[?contains(name, 'Standard_D2s_v3')]"
296
325
```
297
326
298
-
**Solutions**:
299
-
- Use diverse instance types for better availability
300
-
- Implement proper pod disruption budgets
301
-
- Consider mixed spot/on-demand strategies
302
-
- Use workloads tolerant of node preemption
327
+
**Solutions**
328
+
329
+
Solutions include:
330
+
331
+
- Use diverse instance types for better availability.
332
+
- Implement proper pod disruption budgets.
333
+
- Consider mixed spot and on-demand strategies.
334
+
- Use workloads tolerant of node preemption.
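
As an illustration of the pod disruption budget suggestion, the following shell sketch prints a `PodDisruptionBudget` manifest built from the standard Kubernetes `policy/v1` API. The function name `render_pdb`, the `app` label key, and the example values `web` and `2` are assumptions chosen for illustration, not values from this article.

```shell
# Hypothetical helper: print a PodDisruptionBudget manifest for pods
# labeled app=<name>, keeping at least <min> of them available during
# voluntary disruptions such as node drains.
render_pdb() {
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: $1-pdb
spec:
  minAvailable: $2
  selector:
    matchLabels:
      app: $1
EOF
}

# Print the manifest for a hypothetical "web" workload.
render_pdb web 2
```

To apply the result, the output could be piped to `kubectl apply -f -`. A budget like this constrains evictions made through the Kubernetes eviction API; it can't prevent the platform from reclaiming a spot VM.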
303
335
304
-
### Quota Exceeded
336
+
### Quota exceeded
305
337
306
-
**Symptoms**: VM creation fails with quota exceeded errors.
338
+
**Symptoms**
339
+
340
+
VM creation fails with quota exceeded errors.
307
341
308
-
**Debugging Steps**:
342
+
**Debugging steps**
309
343
310
-
1. **Check current quota usage**:
344
+
1. **Check current quota usage**
345
+
346
+
Run the following command:
347
+
311
348
```azurecli-interactive
312
349
az vm list-usage --location <region> --query "[?currentValue >= limit]"
313
350
```
314
351
315
-
**Solutions**:
316
-
- Request quota increases through Azure portal
317
-
- Expand NodePool CRD to more VM sizes. See [NodePool configuration documentation][nap-nodepool-docs] for details. For example, A NodePool specification which allows for D-family virtual machines is less likely to hit quota errors that stop VM creation, compared to a NodePool specification specific to only one exact VM Size.
318
-
319
-
[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
- Expand nodepool custom resource definitions (CRDs) to more VM sizes. For more information, see [NodePool configuration documentation][nap-nodepool-docs]. For example, a nodepool specification that allows D-family VMs is less likely to hit quota errors that stop VM creation than a specification limited to one exact VM size.
333
358
359
+
[!INCLUDE [Azure Help Support](~/includes/azure-help-support.md)]