Commit bd3457c

Update troubleshoot-rolling-upgrades.md
Edit review per CI 7835
1 parent 6a1e5ea commit bd3457c

1 file changed

Lines changed: 54 additions & 50 deletions

@@ -1,6 +1,6 @@
 ---
 title: Troubleshoot Rolling Upgrade Issues
-description: Describes how to troubleshoot rolling upgrade issues.
+description: Discusses how to troubleshoot rolling upgrade issues.
 ms.date: 12/05/2025
 manager: dcscontentpm
 audience: itpro
@@ -15,17 +15,17 @@ ms.custom:
 
 ## Summary
 
-This article provides a structured troubleshooting approach for addressing common issues encountered during rolling upgrades in Windows Server Failover Clustering (WSFC), Storage Spaces Direct, SQL Server Always On availability groups, and Hyper-V.
+This article provides a structured troubleshooting method to resolve common issues that you might encounter during rolling upgrades in Windows Server Failover Clustering (WSFC), Storage Spaces Direct, SQL Server Always On availability groups, and Hyper-V.
 
-Rolling upgrades are essential for maintaining and upgrading systems with minimal downtime. However, challenges like compatibility and configuration errors can impact availability and potentially cause data loss.
+Rolling upgrades are essential for maintaining and upgrading systems with minimal downtime. However, challenges such as compatibility issues and configuration errors can affect availability and potentially cause data loss.
 
 ## Prerequisites
 
-Before starting a rolling upgrade:
+Before you start a rolling upgrade:
 
 - Verify that the rolling upgrade feature is supported for your workload and operating system (OS) versions.
-- Confirm all cluster nodes are healthy using the `Get-ClusterNode` PowerShell command.
-- Ensure you have up-to-date backups, including:
+- Verify that all cluster nodes are healthy by using the `Get-ClusterNode` PowerShell command.
+- Make sure that you have up-to-date backups, including:
 - System state
 - Cluster configuration
 - User data
@@ -34,82 +34,86 @@ Before starting a rolling upgrade:
 
 ### Address rolling upgrade failures
 
-1. Move core resources to another node using Failover Cluster Manager or the `Move-ClusterGroup` PowerShell command.
-2. Use `Suspend-ClusterNode -Drain` to migrate roles and resources off the node.
-3. Check cluster logs for dependencies or errors blocking the operation.
+1. Move core resources to another node by using Failover Cluster Manager or the `Move-ClusterGroup` PowerShell command.
+2. Migrate roles and resources off the node by using `Suspend-ClusterNode -Drain`.
+3. Check cluster logs for dependencies or errors that might block the operation.
 
 ## Troubleshooting checklist
 
-1. **Review prerequisites**: Ensure the environment meets all prerequisites previously cited in this article.
+1. **Review prerequisites**: Make sure that the environment meets all prerequisites that are mentioned in this article.
 
-2. **Validate cluster status**: Run `Test-Cluster` and resolve any validation warnings or errors.
-- Verify the current cluster functional level using `Get-Cluster | Select ClusterFunctionalLevel`.
+2. **Validate cluster status**: Run `Test-Cluster`, and resolve any validation warnings or errors that it reports.
+- Verify the current cluster functional level by using `Get-Cluster | Select ClusterFunctionalLevel`.
 - Validate network connectivity among all nodes.
 
 3. **Plan and sequence upgrades**: Document the sequence of node upgrades (one node at a time).
-- Move cluster roles (like virtual machines (VMs), availability groups, or file shares) off the node being upgraded.
-- Update all nodes with the latest supported patches or hotfixes for the current OS.
+- Move cluster roles (such as virtual machines (VMs), availability groups, or file shares) off the node that's being upgraded.
+- Update all nodes to the latest supported updates or hotfixes for the current OS.
 
 4. **Communicate with stakeholders**: Inform stakeholders and schedule maintenance windows.
-- Notify monitoring teams to avoid unnecessary alerts.
+- Notify monitoring teams in order to avoid unnecessary alerts.
 
-5. **Ensure application awareness**: Confirm application compatibility for workloads like SQL Server, Hyper-V, or file services.
-- Inform application owners of planned upgrades.
+5. **Ensure application awareness**: Verify application compatibility for workloads such as SQL Server, Hyper-V, or file services.
+- Inform application owners about planned upgrades.
 
 6. **Conduct pre-upgrade tests**: Review logs for Windows, applications, clusters, and storage to identify any pre-existing issues.
 
 ## Common issues and their respective solutions
 
-### 1. Rolling upgrade fails to start or node can't be evicted
+### 1. Rolling upgrade doesn't start or node can't be evicted
 
 **Symptoms**
 
-You're unable to pause, drain, or remove a node from the cluster. Errors like "Node ... cannot be removed from the cluster ..." appear.
+You can't pause, drain, or remove a node from the cluster. You receive error messages such as the following example:
+
+> Node... cannot be removed from the cluster.
 
 **Cause**
 
 The node hosts core cluster resources, dependencies are misconfigured, or the cluster is unstable.
 
 **Solution**
 
-1. Move core resources to another node using Failover Cluster Manager or `Move-ClusterGroup`.
-2. Use `Suspend-ClusterNode -Drain` to move roles and resources.
-3. Ensure the node isn't the last up-to-date or quorum node.
+1. Move core resources to another node by using Failover Cluster Manager or `Move-ClusterGroup`.
+2. Move roles and resources by running `Suspend-ClusterNode -Drain`.
+3. Make sure that the node isn't the last up-to-date or quorum node.
 4. Check cluster logs for blocking dependencies.
 
-### 2. Failure adding upgraded node back to cluster
+### 2. Can't add upgraded node back to cluster
 
 **Symptoms**
 
-Errors like "A node attempted to join a failover cluster but failed due to incompatibility…" or version mismatch messages appear.
+You receive a version mismatch message or error messages such as the following example:
+
+> A node attempted to join a failover cluster but failed due to incompatibility.
 
 **Cause**
 
-Unsupported OS version mix or unpatched node.
+Unsupported OS version mix or nonupdated node.
 
-**Solution**
+**Solution**
 
 1. Verify the supported OS and cluster version matrix.
-2. Patch the node to the latest cumulative update (CU).
+2. Update the node to the latest cumulative update (CU).
 3. Upgrade the OS versions sequentially (for example, 2016 → 2019 → 2022).
-4. Use `Get-ClusterLog` to identify versioning errors.
+4. Identify versioning errors by using `Get-ClusterLog`.
 
-### 3. Resource or service fails to come online
+### 3. Resource or service doesn't come online
 
 **Symptoms**
 
-Resources like VMs or file shares enter a failed or offline state post-upgrade. Common Event IDs include `1069`, `1146`, and `1230`.
+Resources such as VMs or file shares enter a failed or offline state post-upgrade. Common Event IDs include `1069`, `1146`, and `1230`.
 
 **Cause**
 
 Misconfiguration during upgrade, missing registry keys or files, or service account failures.
 
-**Solution**
+**Solution**
 
 1. Check cluster events in Failover Cluster Manager.
-2. Validate resource owner configurations using `Get-ClusterResource | Get-ClusterOwnerNode`.
-3. Repair or recreate missing dependencies.
-4. Restart cluster services with `Restart-Service ClusSvc`.
+2. Verify resource owner configurations by running `Get-ClusterResource | Get-ClusterOwnerNode`.
+3. Repair or re-create missing dependencies.
+4. Restart cluster services by running `Restart-Service ClusSvc`.
 
 ### 4. Quorum or communication loss
 
@@ -123,48 +127,48 @@ Network partition, firewall configuration, or quorum misconfiguration.
 
 **Solution**
 
-1. Ensure all required ports are open.
+1. Make sure that all required ports are open.
 2. Check network, DNS, and routing configurations.
-3. Check quorum settings with `Get-ClusterQuorum` and update them if necessary.
-4. Run `Validate-Cluster` to identify root causes.
+3. Check quorum settings by running `Get-ClusterQuorum`. Update settings as appropriate.
+4. To identify root causes, run `Test-Cluster`.
 
-### 5. Patch or update failure or known bug
+### 5. Update failure or known bug
 
 **Symptoms**
 
-Cluster services crash post-update or resources fail due to a known problematic update.
+Cluster services stop responding after an update, or resources fail because of a known problematic update.
 
 **Cause**
 
-Microsoft updates or patches causing cluster instability.
+A Microsoft update causes cluster instability.
 
 **Solution**
 
 1. Review Microsoft Knowledge Base (KB) articles for known issues.
-2. Remove problematic updates if needed.
-3. Apply recommended hotfixes or wait for updated patches.
-4. Open a support case if still unresolved.
+2. Remove problematic updates, if necessary.
+3. Apply recommended hotfixes or wait for new updates.
+4. Open a support case if the issue remains unresolved.
 
 ### 6. Cluster validation or functional level errors
 
 **Symptoms**
 
-Unable to update the cluster functional level or validation fails.
+Can't update the cluster functional level, or validation fails.
 
 **Cause**
 
 Mixed OS versions, incomplete upgrades, or outdated drivers.
 
 **Solution**
 
-1. Update all nodes and ensure they're joined to the cluster.
-2. Update hardware drivers (like network and storage) and firmware.
-3. Use `Update-ClusterFunctionalLevel` to complete the upgrade.
+1. Update all nodes, and make sure that they're joined to the cluster.
+2. Update hardware drivers (such as network and storage) and firmware.
+3. Complete the upgrade by using `Update-ClusterFunctionalLevel`.
 4. Review logs for driver or validation failures.
 
 ## Advanced troubleshooting and data collection
 
-For persistent or complex issues, collect the following data:
+For persistent or complex issues, collect the following data.
 
 **Cluster logs**
 
@@ -203,7 +207,7 @@ Get-ClusterLog -TimeSpan 24:00 -Destination
 
 ```
 
-**Patch or update history**
+**Update history**
 
 ```powershell
 
@@ -215,4 +219,4 @@ Get-HotFix | Export-Csv \Hotfix.csv
 
 - [Upgrade a Windows Server failover cluster with a cluster OS rolling upgrade](/windows-server/failover-clustering/cluster-operating-system-rolling-upgrade)
 - [Update-ClusterFunctionalLevel](/powershell/module/failoverclusters/update-clusterfunctionallevel)
-- [Known issues - KB5062557](https://support.microsoft.com/help/5062557)
+- [Known issues - KB5062557](https://support.microsoft.com/help/5062557)
