- <a href="https://learn.microsoft.com/windows/release-health/windows-server-release-info" target="_blank">Windows Server 2025</a>
---
# Cluster Shared Volume goes offline after a node or storage component goes offline
This article discusses a situation in which the Cluster Shared Volume (CSV) of a cluster goes offline after other components go offline. The article includes steps to resolve the issue and, if necessary, recover any affected virtual machines (VMs).
## Symptoms
This issue starts under the following circumstances:
1. A cluster node or storage component becomes unavailable, but I/O operations continue. For example, a disk array fails or requires maintenance.
1. As I/O operations continue, metadata records accumulate.
1. When the metadata records reach their allocated limits, I/O operations fail.
1. The associated CSV enters a Failed state.
1. Every 15 minutes (the default setting), the cluster tries to bring the CSV online. If the Virtual Machine Management Service (VMMS) manages VMs on the cluster, VMMS periodically tries to start the VMs.
1. After 30 minutes, VMMS stops trying to start the VMs. Any VMs that use the affected CSV can't automatically recover.
## Cause
A recent change in cluster behavior affects how the CSV responds in the situation that's mentioned in the "Symptoms" section. Previously, when metadata records accumulated to the allocated limits, I/O operations could stop responding indefinitely. Because of the change, I/O operations now fail in this situation instead of stopping. The I/O failure, in turn, causes the CSV to go offline and enter a Failed state.
## Recovery
### Method 1: Restore the offline component and automatically recover the cluster

After you restore the offline node or storage, the following steps occur automatically:
1. Automatic repair processes start, and then the volume becomes available.
> [!IMPORTANT]
> After the cluster recovers, you might have to manually start any VMMS-managed VMs that use the cluster. After the cluster is down for 30 minutes, VMMS stops automatically trying to restart the VMs.
### Method 2: Replace the offline component and manually recover the cluster
If you can't restore the missing node or storage, follow these steps to manually recover the cluster.

Run the following steps as a cluster administrator on a node that has full access to the storage pool.
1. On a cluster node that has full access to the storage pool, open an administrative PowerShell Command Prompt window.
1. To get the properties of the affected storage pool, run the following command at the PowerShell command prompt:
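
   For example, a minimal sketch; the `<Pool>` placeholder and the property list shown here are illustrative, not necessarily the exact command:

   ```powershell
   # Illustrative sketch: read the storage pool's cluster resource properties.
   # Replace <Pool> with the name of the storage pool resource.
   Get-ClusterResource -Name "<Pool>" | Format-List Name, State, OwnerGroup, ResourceType
   ```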
> [!NOTE]
>
> - In this cmdlet, \<Pool> is the name of the storage pool resource.
> - Later steps in this procedure use properties such as the name and owner group of the resource.
1. To get the properties of the storage pool's virtual disks and CSVs, run the following command:
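
   One possible form of such a command; the cmdlet choices and property names here are assumptions:

   ```powershell
   # Illustrative sketch: list the pool's virtual disks and the cluster's CSV resources.
   Get-StoragePool -FriendlyName "<Pool>" | Get-VirtualDisk |
       Format-List FriendlyName, UniqueId, OperationalStatus, HealthStatus
   Get-ClusterSharedVolume | Format-List Name, OwnerNode, State
   ```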
> [!NOTE]
> If you intend to reuse name and ID information for any resources that you replace, you can use `Get-ClusterResource` and `Get-ClusterParameter` to get that information.
1. Whether you're replacing a node or just storage, run the following cmdlets to add unpooled disks to the storage pool:
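
   For example, a minimal sketch that adds every disk that reports `CanPool = True` (the pool name is a placeholder):

   ```powershell
   # Illustrative sketch: find disks that aren't in any pool yet, and add them to the pool.
   $unpooled = Get-PhysicalDisk -CanPool $true
   Add-PhysicalDisk -StoragePoolFriendlyName "<Pool>" -PhysicalDisks $unpooled
   ```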
1. Monitor the virtual disks by running `Get-VirtualDisk` and looking for `OperationalStatus = InService` in the cmdlet output. When the `OperationalStatus` parameter is clear for all the virtual disks, go to the next step.
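
   If you prefer to script the wait, a small illustrative loop that polls until no virtual disk still reports `InService`:

   ```powershell
   # Illustrative sketch: wait until no virtual disk is still being repaired.
   while (Get-VirtualDisk | Where-Object { $_.OperationalStatus -eq 'InService' }) {
       Start-Sleep -Seconds 30
   }
   ```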
1. To move the affected storage pool (that you identified previously) to the current node, run a PowerShell cmdlet that resembles the following command:
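
   For example (illustrative only; `<Pool owner group>` is the OwnerGroup value that you noted in step 1):

   ```powershell
   # Illustrative sketch: move the pool's owner group to the local node.
   Move-ClusterGroup -Name "<Pool owner group>" -Node $env:COMPUTERNAME
   ```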
1. To move the failed disk and CSV resources to the current node, run `Move-ClusterResource` again for each physical disk and CSV resource. To see the OwnerGroup value of the CSV, run `Get-ClusterSharedVolume | Get-ClusterGroup`.
1. To remove all cluster virtual disks and CSVs from cluster management, run the following PowerShell commands in sequence:
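
   An illustrative sequence (the filter and the `-Force` switch are assumptions; adjust them to the resources that you identified earlier):

   ```powershell
   # Illustrative sketch: convert CSVs back to ordinary disk resources,
   # then take the disk resources out of cluster management.
   Get-ClusterSharedVolume | Remove-ClusterSharedVolume
   Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq 'Physical Disk' } |
       Remove-ClusterResource -Force
   ```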
1. To remove the storage pool from cluster management, run the `Remove-ClusterResource` command for the storage pool objects that you identified in step 1 of this procedure.
1. To make the storage pool writable, run the following commands:
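
   For example (the pool name is a placeholder):

   ```powershell
   # Illustrative sketch: clear the read-only flag so that the pool accepts changes.
   Get-StoragePool -FriendlyName "<Pool>" | Set-StoragePool -IsReadOnly $false
   ```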
1. Use the `Get-StorageJob` cmdlet to monitor the storage jobs that are related to repair. After the jobs start (the percentage completed is greater than 0), go to the next step.
1. To restore the storage pool to cluster management, run the following commands:
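
   An illustrative sketch; the resource and group names are placeholders, and your pool might require additional parameters:

   ```powershell
   # Illustrative sketch: re-create the pool as a cluster resource and bring it online.
   Add-ClusterResource -Name "<Pool>" -ResourceType "Storage Pool" -Group "<Pool owner group>"
   Start-ClusterResource -Name "<Pool>"
   ```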
1. Restore all non-failed virtual disks to cluster management. If any of the virtual disks from the previous step were previously configured as CSVs, convert them to CSVs.
For example, you can bring back any of the virtual disk or CSV resources that you identified in step 2 that weren't in a failed state. To restore these resources, use the `virtualdiskid` and `name` property values from step 2, and then run commands that resemble the following script excerpt:
```powershell
$virtualdiskname = "ClusterPerformanceHistory"
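
# The rest of this excerpt is an illustrative sketch, not necessarily the
# original script: it assumes that the resource is re-created from the name
# and virtualdiskid values that you recorded in step 2.
$virtualdiskid = "<virtualdiskid from step 2>"
$resource = Add-ClusterResource -Name $virtualdiskname -ResourceType "Physical Disk" -Group "Available Storage"
$resource | Set-ClusterParameter -Name VirtualDiskId -Value $virtualdiskid
Start-ClusterResource -Name $virtualdiskname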
```

1. Monitor the retired physical disks by running the following command:

```powershell
Get-PhysicalDisk -Usage Retired | ft DeviceId, Usage, VirtualDiskFootprint
```
When the footprint reaches zero, go to the next step.
1. Restore the previously failed virtual disks to cluster management. If any of these virtual disks were previously configured as CSVs, convert them to CSVs.
1. Bring the virtual disks from the previous step online, and configure them as read/write.
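
   One illustrative way to do this by using the Storage cmdlets (this sketch assumes that the disks surface locally after their cluster resources are online):

   ```powershell
   # Illustrative sketch: bring each disk online, and then clear its read-only flag.
   Get-VirtualDisk | Get-Disk | ForEach-Object {
       Set-Disk -Number $_.Number -IsOffline $false
       Set-Disk -Number $_.Number -IsReadOnly $false
   }
   ```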
> [!IMPORTANT]
> After the cluster recovers, you might have to manually start any VMMS-managed VMs that use the cluster. After the cluster is down for 30 minutes, VMMS stops automatically trying to restart the VMs.
## Status
This behavior is by design in Windows Server 2025. It's intended to prevent indefinite I/O unresponsiveness.