---
title: Shared Volumes Don't Respond During Planned Cluster Node Drain
description: Resolves issues that occur during a planned cluster node drain operation if Cluster Shared Volumes stop responding.
ms.date: 10/06/2025
author: kaushika-msft
ms.author: kaushika
manager: dcscontentpm
audience: itpro
ms.topic: troubleshooting
ms.reviewer: kaushika
ms.custom:
- sap: virtualization and hyper-v\high availability virtual machines
- pcy: Virtualization\high availability virtual machines
appliesto:
- <a href="https://learn.microsoft.com/windows/release-health/windows-server-release-info" target="_blank">Supported versions of Windows Server</a>
---

# Cluster Shared Volumes don't respond during a planned cluster node drain

## Summary

This article resolves issues that might occur during a planned cluster node drain operation if Cluster Shared Volumes (CSVs) stop responding and enter a pending offline state. This situation can disrupt I/O operations and cause the virtual machines (VMs) that are hosted on the affected volumes to fail.

## Symptoms

During the planned cluster node drain operation, you encounter the following symptoms:

- CSVs become unresponsive and get stuck in a pending offline state.
- I/O operations are paused for approximately 20–30 minutes.
- The resource-hosting subsystem (RHS) process is terminated, which causes the eviction of the affected node from the cluster.
- The affected node is the quorum owner. This condition causes unresponsiveness in overall cluster management.
- All VMs that are hosted on the affected volumes fail.
- Other volumes on the same node fail over successfully and are unaffected.
- Logs indicate repeated timeouts and resource failures for the affected volumes.
- Network-related issues occur, including packet loss that's detected by the Network Fault Tolerant (NetFT) adapter.
- SMB (Server Message Block) multichannel connectivity can't be established because of inconsistent adapter settings.
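
As a quick first check, you can confirm the volume state and capture the relevant time window of the cluster log from PowerShell. This is an illustrative sketch that assumes the FailoverClusters PowerShell module is available on a cluster node; the destination folder is a placeholder:

```powershell
# Run on a cluster node (requires the FailoverClusters PowerShell module).

# Check the state of each Cluster Shared Volume. A stuck volume typically
# shows a state other than Online (for example, Offline or a pending state).
Get-ClusterSharedVolume | Format-Table Name, State, OwnerNode

# Generate the cluster log for the last 30 minutes from every node into
# C:\Temp (placeholder path) for later analysis of timeouts and failures.
Get-ClusterLog -Destination "C:\Temp" -TimeSpan 30
```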

## Cause

The root cause of this issue is a combination of factors:

- The node that's undergoing the drain operation is the cluster owner. This condition amplifies the effect of the operation.
- File locks on the affected volumes hinder their migration and cause timeouts and subsequent failures.
- Network congestion occurs. The NetFT adapter reports packet loss during the failover attempt.
- Inconsistent network adapter settings across nodes prevent SMB multichannel connectivity.
- The resource drain process triggers resource failures, which causes termination of the RHS process and initiates cluster recovery operations.

## Resolution

To resolve these issues and prevent future occurrences, follow these steps:

1. Log analysis and diagnostics: Collect and analyze cluster logs, cluster validation reports, and failure minidump data to identify contributing factors.
2. Network configuration:

   - Make sure that network adapter settings are uniform across all cluster nodes to enable SMB multichannel connectivity.
   - Increase the network bandwidth or reduce congestion to avoid packet loss during failover operations.

3. Cluster ownership consideration:

   - Plan node drain operations carefully.
   - Before you start maintenance, make sure that critical roles, such as quorum ownership, are moved to other nodes.

4. Preventive actions:

   - Review file lock mechanisms to reduce the risk of migration failures.
   - Perform regular cluster validation tests to identify and resolve potential inconsistencies or misconfigurations.
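
The network check and the ownership move described in the steps above can be sketched in PowerShell as follows. This is an illustrative example, not a definitive procedure: it assumes the FailoverClusters PowerShell module and PowerShell remoting are available, and the node names `Node1` and `Node2` are placeholders that you must replace with real node names:

```powershell
# Run on a cluster node before a planned drain.

# 1. Compare adapter settings across all nodes. SMB multichannel requires
#    consistent adapter configurations (for example, speed and status).
Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
    Get-NetAdapter | Select-Object Name, LinkSpeed, Status
} | Format-Table PSComputerName, Name, LinkSpeed, Status

# Verify that SMB multichannel connections are established.
Get-SmbMultichannelConnection

# 2. Move the cluster core resources (including quorum ownership) off the
#    node that you plan to drain.
Move-ClusterGroup -Name "Cluster Group" -Node "Node2"

# 3. Drain the node, perform maintenance, and then resume it.
Suspend-ClusterNode -Name "Node1" -Drain -Wait
# ...perform maintenance...
Resume-ClusterNode -Name "Node1" -Failback Immediate
```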
