If you can't restore the missing node or storage, follow these steps to manually recover the cluster.

> [!IMPORTANT]
>
> - This procedure temporarily takes all volumes in the pool offline.
> - You can use this procedure for either Storage Spaces or failover clusters. Every step that applies to virtual disks or volumes also applies to Cluster Virtual Disks and Cluster Shared Volumes.

Run the following steps as a cluster administrator on a node that has full access to the storage pool.

1. On a cluster node that has full access to the storage pool, open an administrative PowerShell command prompt window.

1. To get the properties of the affected storage pool, run the following command at the PowerShell command prompt:

   ```powershell
   Get-ClusterResource <Pool>
   ```

   > [!NOTE]
   >
   > - In this cmdlet, `<Pool>` is the name of the storage pool resource.
   > - Later steps in this procedure use properties such as the name and the owner group of the resource.

1. To get the properties of the storage pool's virtual disks and CSVs, run the following commands:

   ```powershell
   Get-ClusterResource | Where-Object ResourceType -eq "Physical Disk"
   Get-ClusterSharedVolume
   ```

1. Review the properties to determine which resources are in a "Failed" state.

   > [!NOTE]
   > If you intend to reuse name and ID information for any resources that you replace, you can use `Get-ClusterResource` and `Get-ClusterParameter` to get that information.

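
   For example, one way to keep that information for later reuse is to save it to a file, as the scripted version of this procedure did. The following is a minimal sketch that assumes the cluster cmdlets are available and that each disk resource exposes a `VirtualDiskId` cluster parameter; the `disk-info.json` file name is an arbitrary choice:

   ```powershell
   # Save each physical disk resource's name, state, and VirtualDiskId so the
   # values survive an interruption and can be reused when you re-create resources.
   $diskInfo = Get-ClusterResource |
       Where-Object ResourceType -eq 'Physical Disk' |
       ForEach-Object {
           [pscustomobject]@{
               Name          = $_.Name
               State         = $_.State.ToString()
               VirtualDiskId = ($_ | Get-ClusterParameter -Name VirtualDiskId).Value
           }
       }
   $diskInfo | ConvertTo-Json | Set-Content -Path .\disk-info.json
   ```
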
1. Whether you're replacing a node or just storage, run the following cmdlets to add the unpooled disks to the storage pool:

   ```powershell
   Get-PhysicalDisk -CanPool $true
   Add-PhysicalDisk -StoragePoolFriendlyName <Pool> -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
   ```

1. Monitor the virtual disks by running `Get-VirtualDisk` and looking for `OperationalStatus = InService` in the cmdlet output. When the `OperationalStatus` parameter is clear for all of the virtual disks, continue to the next step.
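
   If you prefer to script this wait, a minimal sketch (the polling interval is an arbitrary choice) is:

   ```powershell
   # Poll until no virtual disk reports an OperationalStatus of InService.
   while (Get-VirtualDisk | Where-Object OperationalStatus -eq 'InService') {
       Start-Sleep -Seconds 60  # arbitrary polling interval
   }
   ```
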

1. To move the affected storage pool (that you identified previously) to the current node, run a PowerShell cmdlet that resembles `Move-ClusterGroup -Name <OwnerGroup> -Node <CurrentNode>`.

   > [!NOTE]
   > In this command, `<CurrentNode>` is the name of the node that you're working from, and `<OwnerGroup>` is the value of the OwnerGroup property of the storage pool resource.

1. To move the failed disk and CSV resources to the current node, run `Move-ClusterGroup` for each group that owns a failed physical disk or CSV resource. To find the group that owns a CSV, run `Get-ClusterSharedVolume | Get-ClusterGroup`.

1. To remove all cluster virtual disks and CSVs from cluster management, run `Remove-ClusterSharedVolume` for each CSV, and then run `Remove-ClusterResource` for each cluster virtual disk (Physical Disk) resource.

1. To remove the storage pool from cluster management, run the `Remove-ClusterResource` command for the storage pool objects that you identified in the first step of this procedure.

1. To make the storage pool writable, run the following command:

   ```powershell
   Get-StoragePool <Pool> | Set-StoragePool -IsReadOnly $false
   ```

1. Use the `Get-StorageJob` cmdlet to monitor the storage jobs that are related to repair. When the jobs have started (percentage complete is greater than 0), continue to the next step.
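
   A minimal scripted version of this wait (the polling interval is an arbitrary choice) might look like:

   ```powershell
   # Wait until at least one storage job reports progress greater than zero.
   while (-not (Get-StorageJob | Where-Object PercentComplete -gt 0)) {
       Start-Sleep -Seconds 10  # arbitrary polling interval
   }
   ```
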

1. To add the storage pool back to cluster management, run commands that resemble the following:

   ```powershell
   Add-ClusterResource -Name <Pool> -ResourceType "Storage Pool" -Group <OwnerGroup>
   Start-ClusterResource -Name <Pool>
   ```

1. Add all non-failed virtual disks back to cluster management. If any of the virtual disks from the previous step were previously configured as CSVs, convert them to CSVs.

   For example, you can bring back any of the virtual disk or CSV resources that you identified in step 2 that were not in a failed state. To do this, use the `VirtualDiskId` and `Name` property values from step 2 and run commands that resemble the following script excerpt:

   ```powershell
   Add-ClusterResource -Name <Name> -ResourceType "Physical Disk" -Group <OwnerGroup>
   Get-ClusterResource -Name <Name> | Set-ClusterParameter -Name VirtualDiskId -Value <VirtualDiskId>
   ```

   You can use `Add-ClusterSharedVolume` to reconfigure the CSVs.

1. Monitor the virtual disks by running `Get-VirtualDisk` and looking for `OperationalStatus = InService` in the cmdlet output. When the `OperationalStatus` parameter is clear for all of the virtual disks, continue to the next step.
1. To bring the virtual disks online and configure them as read/write, run the following commands:

   ```powershell
   Get-VirtualDisk <FriendlyName> | Get-Disk | Set-Disk -IsReadOnly $false
   Get-VirtualDisk <FriendlyName> | Get-Disk | Set-Disk -IsOffline $false
   ```

1. Monitor the virtual disk footprints of the retired physical disks by running the following cmdlet:

   ```powershell
   Get-PhysicalDisk -Usage Retired | Format-Table DeviceId, Usage, VirtualDiskFootprint
   ```

   When the footprint reaches zero, continue to the next step.

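
   As with the earlier monitoring steps, you can script this wait; a minimal sketch (the polling interval is an arbitrary choice):

   ```powershell
   # Poll until no retired physical disk still carries a virtual disk footprint.
   while (Get-PhysicalDisk -Usage Retired |
          Where-Object VirtualDiskFootprint -gt 0) {
       Start-Sleep -Seconds 60  # arbitrary polling interval
   }
   ```
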
1. Add the previously failed virtual disks back to cluster management. If any of these virtual disks were previously configured as CSVs, convert them to CSVs.

1. Bring the virtual disks from the previous step online and configure them as read/write.
> [!IMPORTANT]
> After the cluster recovers, you might have to manually start any virtual machines that the Virtual Machine Management Service (VMMS) manages. VMMS retries bringing the virtual machines online for 30 minutes after the cluster goes down. After that window, VMMS stops retrying, and the affected virtual machines remain in a failed state until you start them manually.