| 1 | +--- |
| 2 | +title: Shared Volumes Don't Respond During Planned Cluster Node Drain |
| 3 | +description: Resolves issues that occur during a planned cluster node drain operation if Cluster Shared Volumes stop responding. |
| 4 | +ms.date: 10/06/2025 |
| 5 | +author: kaushika-msft |
| 6 | +ms.author: kaushika |
| 7 | +manager: dcscontentpm |
| 8 | +audience: itpro |
| 9 | +ms.topic: troubleshooting |
| 10 | +ms.reviewer: kaushika |
| 11 | +ms.custom: |
| 12 | +- sap: virtualization and hyper-v\high availability virtual machines |
| 13 | +- pcy: Virtualization\high availability virtual machines |
| 14 | +appliesto: |
| 15 | + - <a href=https://learn.microsoft.com/windows/release-health/windows-server-release-info target=_blank>Supported versions of Windows Server</a> |
| 16 | +--- |
| 17 | + |
| 18 | +# Cluster Shared Volumes don't respond during a planned cluster node drain |
| 19 | + |
| 20 | +## Summary |
| 21 | + |
| 22 | +This article resolves issues that might occur during a planned cluster node drain operation if Cluster Shared Volumes (CSVs) stop responding and enter a pending offline state. This situation can disrupt I/O operations and cause the virtual machines (VMs) that are hosted on the affected volumes to fail. |
| 23 | + |
| 24 | +## Symptoms |
| 25 | + |
| 26 | +During a planned cluster node drain operation, you might encounter one or more of the following symptoms: |
| 27 | + |
| 28 | +- CSVs become unresponsive and get stuck in a pending offline state. |
| 29 | +- I/O operations are paused for approximately 20–30 minutes. |
| 30 | +- The Resource Hosting Subsystem (RHS) process is terminated, which causes the affected node to be evicted from the cluster. |
| 31 | +- The affected node is the quorum owner, so overall cluster management becomes unresponsive. |
| 32 | +- All VMs that are hosted on the affected volumes fail. |
| 33 | +- Other volumes on the same node fail over successfully and are unaffected. |
| 34 | +- Logs indicate repeated timeouts and resource failures for the affected volumes. |
| 35 | +- Network-related issues occur, including packet loss that's detected by the NetFT (Network Fault Tolerant) adapter. |
| 36 | +- SMB (Server Message Block) multichannel connectivity can't be established because of inconsistent adapter settings. |
| 37 | + |
| 38 | +## Cause |
| 39 | + |
| 40 | +The root cause of this issue is a combination of factors: |
| 41 | + |
| 42 | +- The node that's undergoing the drain operation is the cluster owner. This condition amplifies the effect of the operation. |
| 43 | +- File locks on the affected volumes hinder their migration and cause timeouts and subsequent failures. |
| 44 | +- Network congestion occurs. The NetFT adapter reports packet loss during the failover attempt. |
| 45 | +- Inconsistent network adapter settings across nodes prevent SMB multichannel connectivity. |
| 46 | +- The resource drain process triggers resource failures, which terminate the RHS process and initiate cluster recovery operations. |
| 47 | + |
| 48 | +## Resolution |
| 49 | + |
| 50 | +To resolve these issues and prevent future occurrences, follow these steps: |
| 51 | + |
| 52 | +1. Log analysis and diagnostics: Collect and analyze cluster logs, cluster validation reports, and failure minidump data to identify contributing factors. |
| 53 | +2. Network configuration: |
| 54 | + |
| 55 | + - Make sure that network adapter settings are uniform across all cluster nodes to enable SMB multichannel connectivity. |
| 56 | + - Increase the network bandwidth or reduce congestion to avoid packet loss during failover operations. |
| 57 | + |
| 58 | +3. Cluster ownership considerations: |
| 59 | + |
| 60 | + - Plan node drain operations carefully. |
| 61 | + - Before you start maintenance, make sure that critical roles, such as quorum ownership, are moved to other nodes. |
| 62 | + |
| 63 | +4. Preventive actions: |
| 64 | + |
| 65 | + - Review file lock mechanisms to reduce the risk of migration failures. |
| 66 | + - Perform regular cluster validation tests to identify and resolve potential inconsistencies or misconfigurations. |
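
The network configuration step above asks for uniform adapter settings across all cluster nodes. As a rough illustration only, the following Python sketch compares per-node settings and reports the ones that differ. It assumes you have already exported each node's adapter settings into a dictionary by some other means. The node names, setting names, and values shown here are hypothetical sample data, not output from any real cluster or tool.

```python
from typing import Any, Dict


def find_inconsistent_settings(
    nodes: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Return {setting: {node: value}} for each setting that differs across nodes.

    A setting that is missing on a node is reported with the value None.
    """
    # Collect every setting name that appears on at least one node.
    all_settings = set()
    for settings in nodes.values():
        all_settings.update(settings)

    result: Dict[str, Dict[str, Any]] = {}
    for name in sorted(all_settings):
        values = {node: settings.get(name) for node, settings in nodes.items()}
        # More than one distinct value means the nodes disagree on this setting.
        if len(set(values.values())) > 1:
            result[name] = values
    return result


# Hypothetical sample data: one dictionary of adapter settings per cluster node.
sample_nodes = {
    "NODE1": {"RSS": "Enabled", "JumboPacket": 9014, "LinkSpeedGbps": 10},
    "NODE2": {"RSS": "Disabled", "JumboPacket": 9014, "LinkSpeedGbps": 10},
    "NODE3": {"RSS": "Enabled", "JumboPacket": 1514, "LinkSpeedGbps": 10},
}

mismatches = find_inconsistent_settings(sample_nodes)
for setting, per_node in mismatches.items():
    print(f"{setting}: {per_node}")
```

In this sample, `RSS` and `JumboPacket` are reported because at least one node disagrees with the others, and `LinkSpeedGbps` is not reported because all nodes match. A check of this kind only surfaces differences. Deciding which value is correct for the environment remains a manual step.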