**support/azure/azure-kubernetes/connectivity/cannot-access-cluster-api-server-using-authorized-ip-ranges.md** (5 additions, 3 deletions)
@@ -1,8 +1,8 @@
---
title: Can't access the cluster API server using authorized IP ranges
description: Troubleshoot problems accessing the cluster API server when you use authorized IP address ranges in Azure Kubernetes Service (AKS).
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot access issues to the cluster API server when I use authorized IP address ranges so that I can work with my Azure Kubernetes Service (AKS) cluster successfully.
@@ -14,7 +14,9 @@ This article discusses how to resolve a scenario in which you can't use authoriz
## Symptoms

-If you try to create or manage an AKS cluster, you can't access the cluster API server.
+If you try to create or manage resources in an AKS cluster, you can't access the cluster API server. When you run `kubectl`, you receive the following error message:
+
+> Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout
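
A quick way to confirm this cause is to compare the cluster's authorized ranges against the public IP address that your `kubectl` traffic leaves from. The following is a minimal sketch that uses the Azure CLI; the resource group and cluster names are placeholders, and any "what is my IP" service works in place of `ifconfig.me`:

```bash
# List the authorized IP ranges that are configured on the API server.
az aks show --resource-group <resource-group> --name <cluster-name> \
    --query apiServerAccessProfile.authorizedIpRanges

# Show the public IP address that your client egresses from. Verify that it
# falls inside one of the ranges that the previous command returns.
curl -s https://ifconfig.me
```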
**support/azure/azure-kubernetes/connectivity/error-from-server-error-dialing-backend-dial-tcp.md** (2 additions, 2 deletions)
@@ -1,8 +1,8 @@
---
title: 'Error from server: error dialing backend: dial tcp'
description: 'Troubleshoot the error dialing backend: dial tcp error that blocks you from using kubectl commands or other tools when you connect to the API server.'
-#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the "Error from server: error dialing backend: dial tcp" error so that I can connect to the API server or use the `kubectl logs` command to get logs.
+#Customer intent: As an Azure Kubernetes user, I want to troubleshoot errors that occur after I restrict egress traffic so that I can access my AKS cluster successfully.
@@ -18,19 +18,21 @@ Certain commands of the [kubectl](https://kubernetes.io/docs/reference/kubectl/)
## Cause

-When you restrict egress traffic from an AKS cluster, your settings must comply with [required Outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress). If your settings are in conflict with any of these rules, the symptoms of egress traffic restriction issues occur.
+When you restrict egress traffic from an AKS cluster, your settings must comply with the required outbound network and fully qualified domain name (FQDN) rules for AKS clusters. If your settings conflict with any of these rules, egress traffic restriction issues occur.

## Solution

-Verify that your configuration doesn't conflict with any of the [required Outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress) for the following items:
+Verify that your configuration doesn't conflict with any of the [required outbound network and FQDN rules for AKS clusters](/azure/aks/outbound-rules-control-egress) for the following items:

- Outbound ports
- Network rules
-- Fully qualified domain names (FQDNs)
+- FQDNs
- Application rules

+Check for conflicts with these rules in the network security group (NSG), firewall, or appliance that AKS traffic passes through, according to your configuration.
+
> [!NOTE]
-> The AKS outbound dependencies are almost entirely defined by using FQDNs. These FQDNs don't have static addresses behind them. The lack of static addresses means that you can't use network security groups (NSGs) to restrict outbound traffic from an AKS cluster.
+> The AKS outbound dependencies are almost entirely defined by using FQDNs, and these FQDNs don't have static addresses behind them. The lack of static addresses means that you can't use NSGs to restrict outbound traffic from an AKS cluster. Additionally, allowing only the IP addresses that are currently resolved from the required FQDNs, together with a deny-all NSG rule, isn't enough to restrict outbound traffic. Because the IP addresses aren't static, they can change later and cause issues.
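
To check whether a specific required FQDN is reachable from inside the cluster, one option is to run a short-lived test pod and probe the endpoint directly. The following is a minimal sketch; `mcr.microsoft.com` stands in here for any of the documented required FQDNs, and the `busybox` image is only an example:

```bash
# Start a throwaway pod, then test DNS resolution and HTTPS reachability for
# a required FQDN. If either step fails, an egress rule is likely blocking it.
kubectl run egress-test --image=busybox:1.36 --restart=Never --rm -it -- \
    sh -c "nslookup mcr.microsoft.com && wget -q --spider https://mcr.microsoft.com"
```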
#Customer intent: As an Azure Kubernetes user, I want to avoid tunnel connectivity issues so that I can use an Azure Kubernetes Service (AKS) cluster successfully.
ms.custom: sap:Connectivity
---
@@ -251,6 +251,80 @@ If everything is OK within the application, you'll have to adjust the allocated
You can set up a new cluster to use a Managed Network Address Translation (NAT) Gateway for outbound connections. For more information, see [Create an AKS cluster with a Managed NAT Gateway](/azure/aks/nat-gateway#create-an-aks-cluster-with-a-managed-nat-gateway).
## Cause 6: Konnectivity agent performance issues with cluster growth

As the cluster grows, the performance of the Konnectivity agents might degrade because of increased network traffic, more requests, or resource constraints.

> [!NOTE]
> This cause applies only to the `konnectivity-agent` pods.
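
To look at the agents that this cause affects, you can list the pods and watch for restarts or pending replicas. The following is a minimal sketch; the `app=konnectivity-agent` label selector is an assumption, so verify it against your cluster first:

```bash
# List the Konnectivity agent pods, their restart counts, and node placement.
# The label selector is an assumption; if it returns nothing, fall back to:
#   kubectl get pods -n kube-system | grep konnectivity
kubectl get pods -n kube-system -l app=konnectivity-agent -o wide
```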
### Solution 6: Use the Cluster Proportional Autoscaler for the Konnectivity agent

To manage scalability challenges in large clusters, we implement the Cluster Proportional Autoscaler for the Konnectivity agents. This approach aligns with industry standards and best practices, and it ensures optimal resource usage and enhanced performance.

**Why this change was made**

Previously, the Konnectivity agent had a fixed replica count that could create a bottleneck as the cluster grew. By implementing the Cluster Proportional Autoscaler, we enable the replica count to adjust dynamically, based on node-scaling rules, to provide optimal performance and resource usage.
**How the Cluster Proportional Autoscaler works**

The Cluster Proportional Autoscaler uses a ladder configuration to determine the number of Konnectivity agent replicas based on the cluster size. The ladder configuration is defined in the `konnectivity-agent-autoscaler` configmap in the `kube-system` namespace. Here's an example of the ladder configuration:

```
"nodesToReplicas": [
    [1, 2],
    [100, 3],
    [250, 4],
    [500, 5],
    [1000, 6],
    [5000, 10]
]
```
This configuration makes sure that the number of replicas scales appropriately with the number of nodes in the cluster. For example, a cluster that has between 250 and 499 nodes runs four agent replicas. This scaling provides optimal resource allocation and improved networking reliability.
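
To see the ladder that your own cluster currently uses, you can read the configmap directly. A minimal sketch:

```bash
# Print the konnectivity-agent-autoscaler configmap, including the ladder
# that maps node counts to replica counts.
kubectl get configmap konnectivity-agent-autoscaler -n kube-system -o yaml
```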
**How to use the Cluster Proportional Autoscaler**

You can override the default values by updating the `konnectivity-agent-autoscaler` configmap in the `kube-system` namespace. Here's a sample command to update the configmap:

```bash
kubectl edit configmap konnectivity-agent-autoscaler -n kube-system
```

This command opens the configmap in an editor so that you can make the necessary changes.
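
If you script the change instead of editing interactively, a `kubectl patch` also works. The following is a sketch only: it assumes that the ladder JSON is stored under the configmap's `ladder` data key, as in the upstream Cluster Proportional Autoscaler, so inspect the configmap first and adjust the key and values to match:

```bash
# Non-interactive sketch: merge a new ladder into the configmap.
# The "ladder" data key is an assumption; confirm it by inspecting the
# configmap before you run this command.
kubectl patch configmap konnectivity-agent-autoscaler -n kube-system --type merge \
    -p '{"data":{"ladder":"{\"nodesToReplicas\":[[1,2],[100,3],[250,4],[500,5],[1000,6],[5000,10]]}"}}'
```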
**What you should check**

You have to monitor for Out of Memory (OOM) kills on the nodes, because misconfiguration of the Cluster Proportional Autoscaler can cause insufficient memory allocation for the Konnectivity agents. OOM kills occur for the following key reasons:

- **High memory usage:** As the cluster grows, the memory usage of the Konnectivity agents can increase significantly, especially during peak loads or when the agents handle large numbers of connections. If the Cluster Proportional Autoscaler configuration doesn't scale the replicas appropriately, the agents might run out of memory.

- **Fixed resource limits:** If the resource requests and limits for the Konnectivity agents are set too low, the agents might not have enough memory to handle the workload, which leads to OOM kills. Misconfigured Cluster Proportional Autoscaler settings can worsen this issue by not providing enough replicas to distribute the load.

- **Cluster size and workload variability:** The CPU and memory that the Konnectivity agents need can vary widely, depending on the size of the cluster and the workload. If the Cluster Proportional Autoscaler ladder configuration isn't right-sized and adaptively resized for the cluster's usage patterns, memory overcommitment and OOM kills can occur.

To identify and troubleshoot OOM kills, follow these steps:
1. Check for OOM kills on your nodes by running the following command:

    ```bash
    kubectl get events --all-namespaces | grep -i 'oomkill'
    ```

2. Inspect the node resource usage to make sure that the nodes aren't running out of memory:

    ```bash
    kubectl top nodes
    ```

3. Review the pod resource requests and limits to make sure that the Konnectivity agent pods have appropriate values set to prevent OOM kills:

    ```bash
    kubectl get pod <pod-name> -n kube-system -o yaml | grep -A5 "resources:"
    ```

4. If necessary, adjust the resource requests and limits for the Konnectivity agent pods by editing the deployment:
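
    The following is a minimal sketch of that edit; it assumes the default `konnectivity-agent` deployment name in the `kube-system` namespace:

    ```bash
    # Open the Konnectivity agent deployment and raise resources.requests and
    # resources.limits for the agent container as needed. The deployment name
    # is an assumption; verify it with "kubectl get deploy -n kube-system".
    kubectl edit deployment konnectivity-agent -n kube-system
    ```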