
Commit 4ffe66e

Merge branch 'MicrosoftDocs:main' into Branch-CI5828
2 parents 0fe49f9 + 29a6248 commit 4ffe66e

13 files changed

Lines changed: 254 additions & 7 deletions

File tree

support/azure/azure-kubernetes/availability-performance/identify-high-cpu-consuming-containers-aks.md

Lines changed: 4 additions & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Identify CPU saturation in AKS clusters
+title: Identify high CPU utilization in AKS clusters
 description: Troubleshoot high CPU that the node and containers consume in an AKS cluster.
 ms.date: 08/30/2024
 ms.reviewer: chiragpa, v-weizhu
@@ -8,6 +8,9 @@ ms.custom: sap:Node/node pool availability and performance
 ---
 # Troubleshoot high CPU usage in AKS clusters

+> [!NOTE]
+> This article discusses high CPU utilization. In many situations, CPU Pressure Stall Information (PSI) metrics provide a more accurate indication of CPU pressure than utilization alone. For more information, see [Troubleshoot CPU pressure in AKS clusters using PSI metrics](troubleshoot-node-cpu-pressure-psi.md).
+
 High CPU usage is a symptom of one or more applications or processes that require so much CPU time that the performance or usability of the machine is impacted. High CPU usage can occur in many ways, but it's mostly caused by user configuration.

 When a node in an [Azure Kubernetes Service (AKS)](/azure/aks/intro-kubernetes) cluster experiences high CPU usage, the applications running on it can experience degradation in performance and reliability. Applications or processes also become unstable, which may lead to issues beyond slow responses.
[Image file changed: 96.5 KB (preview not available)]
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
---
title: Troubleshoot CPU Pressure in AKS Clusters Using PSI Metrics
description: Provides troubleshooting guidance for CPU pressure using PSI metrics in an AKS cluster.
ms.date: 05/21/2025
ms.reviewer: aritraghosh, dafell, alvinli, v-weizhu
ms.service: azure-kubernetes-service
ms.custom: sap:Node/node pool availability and performance
---

# Troubleshoot CPU pressure in AKS clusters using PSI metrics

CPU pressure is a more accurate indicator of resource contention than traditional CPU utilization metrics. Although high CPU usage shows resource consumption, it doesn't necessarily indicate performance problems. In an Azure Kubernetes Service (AKS) cluster, understanding CPU pressure through Pressure Stall Information (PSI) metrics helps identify true resource contention issues.

When a node in an AKS cluster experiences CPU pressure, applications might suffer from poor performance even when CPU utilization appears moderate. PSI metrics provide insight into actual resource contention by measuring task delays rather than just resource consumption.

This article helps you monitor CPU pressure by using PSI metrics and provides best practices to resolve resource contention issues.

## Symptoms

The following table outlines the common symptoms of CPU pressure:

|Symptom | Description |
|---|---|
|Increased application latency|Services respond more slowly even when CPU utilization appears moderate.|
|Throttled containers|Containers experience delays in processing despite having CPU resources available on the node.|
|Degraded performance|Applications experience unpredictable performance variations that don't correlate with CPU usage percentages.|

## Troubleshooting checklist

To identify and resolve CPU pressure issues, follow these steps:

### Step 1: Enable and monitor PSI metrics

Use one of the following methods to access PSI metrics:

- In a web browser, use Azure Monitor managed Prometheus or another monitoring solution to query PSI metrics.
- In a console, use the Kubernetes command-line tool (`kubectl`).

### [Browser](#tab/browser)

Azure Monitor managed Prometheus provides a way to monitor PSI metrics:

1. Enable Azure Monitor managed Prometheus for your AKS cluster by following the instructions in [Enable Prometheus and Grafana](/azure/azure-monitor/containers/kubernetes-monitoring-enable#enable-prometheus-and-grafana).

   To enable customized scrape metrics for Prometheus, see [Scrape configs](/azure/azure-monitor/containers/prometheus-metrics-scrape-configuration#scrape-configs). We recommend setting the minimal ingestion profile to `false` and node exporter scraping to `true`.

2. In the [Azure portal](https://portal.azure.com), navigate to the Azure Monitor workspace that's associated with the AKS cluster.

   :::image type="content" source="media/troubleshoot-node-cpu-pressure-psi/configure-azure-monitor-for-containers.png" alt-text="Screenshot that shows how to navigate to the Azure Monitor workspace." lightbox="media/troubleshoot-node-cpu-pressure-psi/configure-azure-monitor-for-containers.png":::

3. Under **Monitoring**, select **Metrics**.

4. Select **Prometheus metrics** as the data source.

   > [!NOTE]
   > To use the metrics, you need to enable them in Azure Monitor managed Prometheus. These metrics are exposed by Node Exporter or cAdvisor.

5. Query specific PSI metrics in the Prometheus explorer:

   - For node-level CPU pressure, query the `node_pressure_cpu_waiting_seconds_total` metric by using Prometheus Query Language (PromQL).

     :::image type="content" source="media/troubleshoot-node-cpu-pressure-psi/node-level-cpu-pressure.png" alt-text="Screenshot that shows how to query node-level CPU pressure." lightbox="media/troubleshoot-node-cpu-pressure-psi/node-level-cpu-pressure.png":::

   - For pod-level CPU pressure, query the `container_cpu_cfs_throttled_seconds_total` metric.

6. Calculate the PSI-some percentage (the percentage of time that at least one task is stalled on CPU):

   `rate(node_pressure_cpu_waiting_seconds_total[5m]) * 100`

> [!NOTE]
> Some container-level metrics, such as `container_pressure_cpu_waiting_seconds_total` and `container_pressure_cpu_stalled_seconds_total`, aren't available in AKS because they're part of the Kubelet PSI feature gate, which is in the alpha state. AKS will begin supporting the feature when it reaches the beta stage.
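If you prefer to script the PSI-some check from step 6, you can send the same PromQL expression to any Prometheus-compatible query API. The following is a minimal sketch that assumes a locally reachable Prometheus endpoint (for example, through port-forwarding); the URL is a placeholder, and querying an Azure Monitor workspace endpoint additionally requires authentication.

```bash
# Sketch: evaluate the PSI-some percentage over the last 5 minutes against a
# Prometheus-compatible HTTP API. Replace the placeholder URL with your own
# reachable Prometheus server.
curl -s "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(node_pressure_cpu_waiting_seconds_total[5m]) * 100'
```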
### [Command Line](#tab/command-line)

Access PSI metrics safely by using kubectl, without requiring Secure Shell (SSH) access:

1. Use the Kubernetes proxy and the node metrics API:

   ```bash
   # Start the Kubernetes API proxy in a separate terminal
   kubectl proxy

   # Access the node metrics API
   kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
   ```

2. For more detailed PSI metrics, use the `kubectl debug` feature to create a temporary debug pod:

   ```bash
   # Create a debug pod that mounts the host filesystem
   kubectl debug node/<node_name> -it --image=busybox

   # Once inside the debug pod, check PSI metrics
   cat /host/proc/pressure/cpu
   ```

   Here's an example of the command output:

   ```output
   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
   ```

   - The `some` line indicates the percentage of time that at least one task is stalled on CPU.
   - The `full` line indicates the percentage of time that all tasks are stalled on CPU.
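To catch brief pressure spikes that the 10-second, 60-second, and 300-second averages can smooth over, you can sample the PSI file repeatedly from inside the debug pod. This is a minimal sketch; adjust the interval to suit your investigation.

```bash
# Sketch: print the node's CPU PSI values every 5 seconds from inside the
# debug pod so that short-lived pressure spikes become visible.
while true; do
  date
  cat /host/proc/pressure/cpu
  sleep 5
done
```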
---

### Step 2: Review best practices to prevent CPU pressure

Review the following table to learn how to implement best practices for avoiding CPU pressure:

| Best practice | Description |
|---|---|
|Focus on PSI metrics instead of utilization|Use PSI metrics as your primary indicator of resource contention rather than CPU utilization percentages. For more information, see [PSI - Pressure Stall Information](https://docs.kernel.org/accounting/psi.html).|
|Identify pods utilizing the most CPU|Isolate the pods that are utilizing the most CPU, and identify solutions to reduce pressure. For more information, see [Troubleshoot high CPU usage in AKS clusters](./identify-high-cpu-consuming-containers-aks.md).|
|Minimize CPU limits|Consider removing CPU limits and relying on [Linux's Completely Fair Scheduler](https://docs.kernel.org/scheduler/sched-design-CFS.html) with CPU shares based on requests, as shown in the sketch after this table. For more information, see [Resource Management for Pods and Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).|
|Use appropriate Quality of Service (QoS) classes|Set the right QoS class for each pod based on its importance and contention sensitivity. For more information, see [Configure Quality of Service for Pods](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/).|
|Optimize pod placement|Use pod anti-affinity rules to avoid placing CPU-intensive workloads on the same nodes. For more information, see [Assigning Pods to Nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).|
|Monitor for brief pressure spikes|Short pressure spikes can indicate issues even when average utilization appears acceptable. For more information, see [Resource metrics pipeline](https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/).|
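The following manifest is a hypothetical illustration of the "minimize CPU limits" practice: the container declares a CPU request, which determines its CFS share, but sets no CPU limit, so it can consume idle CPU without being throttled. The pod name, image, and resource values are placeholders to adapt to your workload.

```bash
# Hypothetical example: a pod that sets a CPU request but no CPU limit.
# The request drives the container's CFS share; omitting the limit avoids
# CFS throttling when idle CPU is available on the node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cpu-request-only-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
EOF
```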
## Key PSI metrics to monitor

> [!NOTE]
> If a node's CPU usage is moderate but the containers on the node experience CFS throttling, increase the containers' resource limits, or remove the limits and rely on the [Linux Completely Fair Scheduler (CFS)](https://docs.kernel.org/scheduler/sched-design-CFS.html) algorithm.

### Node-level PSI metrics

- `node_pressure_cpu_waiting_seconds_total`: The cumulative time that tasks wait for CPU.
- `node_cpu_seconds_total`: Traditional CPU utilization, for comparison.

### Container-level PSI indicators

- `container_cpu_cfs_throttled_periods_total`: The number of periods in which a container is throttled.
- `container_cpu_cfs_throttled_seconds_total`: The total time that a container is throttled.
- Throttling percentage: `rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m]) * 100`
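You can evaluate the throttling-percentage expression the same way as the earlier node-level query. This sketch again assumes a reachable Prometheus-compatible endpoint; the URL is a placeholder.

```bash
# Sketch: per-container CFS throttling percentage over the last 5 minutes.
# Replace the placeholder URL with your own Prometheus server endpoint.
curl -s "http://localhost:9090/api/v1/query" \
  --data-urlencode 'query=rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m]) * 100'
```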
## Why use PSI metrics?

AKS uses PSI metrics as an indicator of CPU pressure instead of load average for several reasons:

- On oversized, multi-core nodes, load average often underreports CPU saturation.
- On chattier, containerized nodes, load average can over-signal, leading to alert fatigue.
- Because load average doesn't have per-cgroup visibility, noisy pods can hide behind a low system-wide average.

## References

- [Linux PSI documentation](https://docs.kernel.org/accounting/psi.html)
- [Kubernetes resource management](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
- [AKS performance best practices](/azure/aks/concepts-clusters-workloads)
- [Enable Prometheus and Grafana](/azure/azure-monitor/containers/kubernetes-monitoring-enable#enable-prometheus-and-grafana)
- [Quality of Service in Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/)
- [Linux Completely Fair Scheduler](https://docs.kernel.org/scheduler/sched-design-CFS.html)

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/azure-kubernetes/toc.yml

Lines changed: 3 additions & 1 deletion
@@ -162,8 +162,10 @@
       href: availability-performance/container-image-pull-performance.md
     - name: AKS cluster/node is in failed state
       href: availability-performance/cluster-node-virtual-machine-failed-state.md
-    - name: Identify nodes and containers consuming high CPU
+    - name: Identify nodes and containers utilizing high CPU
       href: availability-performance/identify-high-cpu-consuming-containers-aks.md
+    - name: Identify containers facing high CPU pressure and throttling
+      href: availability-performance/troubleshoot-node-cpu-pressure-psi.md
     - name: Identify memory saturation in AKS clusters
       href: availability-performance/identify-memory-saturation-aks.md
     - name: Troubleshoot high memory consumption due to Linux kernel behaviors

support/azure/azure-storage/files/file-sync/file-sync-troubleshoot-installation.md

Lines changed: 1 addition & 1 deletion
@@ -216,7 +216,7 @@ Reset-StorageSyncServer
 ```

 > [!Note]
-> If the server is part of a cluster, use the `Reset-StorageSyncServer` `-CleanClusterRegistration` parameter to remove the server from the Azure File Sync cluster registration detail.
+> If the server is part of a cluster, the `Reset-StorageSyncServer` `-CleanClusterRegistration` parameter will unregister all servers in the cluster.

 <a id="web-site-not-trusted"></a>**When I register a server, I see numerous "web site not trusted" responses. Why?**

support/azure/azure-storage/files/performance/files-troubleshoot-performance.md

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Azure Files performance troubleshooting guide
 description: Troubleshoot performance issues with Azure file shares and discover potential causes and associated workarounds for these problems.
 ms.service: azure-file-storage
 ms.custom: sap:Performance, linux-related-content
-ms.date: 01/23/2025
+ms.date: 05/21/2025
 ms.reviewer: kendownie, v-weizhu
 #Customer intent: As a system admin, I want to troubleshoot performance issues with Azure file shares to improve performance for applications and users.
 ---
@@ -120,7 +120,7 @@ If you're using a premium file share, increase the provisioned file share size t

 If the majority of your requests are metadata-centric (such as `createfile`, `openfile`, `closefile`, `queryinfo`, or `querydirectory`), the latency will be worse than that of read/write operations.

-To determine whether most of your requests are metadata-centric, start by following steps 1-4 as previously outlined in Cause 1. For step 5, instead of adding a filter for **Response type**, add a property filter for **API name**.
+To determine whether most of your requests are metadata-centric, start by following steps 1-4 as previously outlined in Cause 1. For step 5, instead of adding a filter for **Response type**, add a property filter for **API name**. For more information, see [Monitor utilization by metadata IOPS](/azure/storage/files/analyze-files-metrics?tabs=azure-portal#monitor-utilization-by-metadata-iops).

 :::image type="content" source="media/files-troubleshoot-performance/metadata-metrics.png" alt-text="Screenshot that shows the 'API name' property filter.":::
[Image file changed: 20.1 KB (preview not available)]
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
---
title: Windows VM Startup Gets Stuck on "Please wait for the Group Policy Client" in Azure
description: Provides troubleshooting steps for an Azure virtual machine (VM) that gets stuck in startup on the "Please wait for the Group Policy Client" screen.
ms.date: 05/14/2025
author: cwhitley-MSFT
ms.author: cwhitley
ms.service: azure-virtual-machines
ms.collection: windows
ms.custom: sap:My VM is not booting
---

# VM startup gets stuck at "Please wait for the Group Policy Client"

**Applies to:** :heavy_check_mark: Windows VMs

This article discusses an issue that causes a Microsoft Azure virtual machine (VM) to get stuck during startup on the **Please wait for the Group Policy Client** screen.

## Symptoms

A Windows VM doesn't start. When you use [Boot diagnostics](./boot-diagnostics.md) to view the screenshot of the VM, you see that the Windows operating system displays the message "Please wait for the Group Policy Client."

:::image type="content" source="media/please-wait-for-the-group-policy-client/please-wait-for-the-group-policy-client.png" alt-text="Screenshot of the Windows operating system displaying the message 'Please wait for the Group Policy Client'.":::

## Cause

When a Windows VM starts, it might take some time to apply Group Policy system settings. If the VM is applying many policies, or if the policies are complex, this process can take longer than usual.

We recommend that you allow up to one hour for the VM to finish applying these settings. If the VM remains stuck on the same screen after that time, more troubleshooting might be necessary to identify the specific cause of the issue.

## Collect a memory dump file for troubleshooting

For this scenario, Azure Support requires a memory dump file to troubleshoot and diagnose the issue.

To collect a memory dump file, follow the steps in [this article](./collect-os-memory-dump-file.md). Then, [create a support request](https://ms.portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/overview?DMC=troubleshoot).

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]

support/azure/virtual-machines/windows/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -348,6 +348,8 @@
     href: azure-vm-cannot-rdp-driver-irql-not-less-equal.md
   - name: Azure VM cannot RDP - working on features
     href: azure-vm-cannot-rdp-working-features.md
+  - name: Azure VM startup hangs at "Please wait for the Group Policy Client"
+    href: please-wait-for-the-group-policy-client.md

   - name: Cannot start or stop my VM
     items:
