support/azure/virtual-machines/linux/troubleshoot-unexpected-node-reboots-pacemaker-suse.md
Lines changed: 12 additions & 8 deletions
@@ -46,13 +46,16 @@ Aug 21 01:47:31 node 02 corosync[15241]: [TOTEM ] Token has not been received
46
46
```
47
47
48
48
### Cause
49
-
It's noted that the unexpected node reboot is observed due to Network Maintenance activity or an outage. For confirmation, the timestamp can be matched by reviewing the [Azure Maintenance Notification](/azure/virtual-machines/linux/maintenance-notifications) in Azure Portal. For more information about Azure Scheduled Events, see [Azure Metadata Service: Scheduled Events for Linux VMs](/azure/virtual-machines/linux/scheduled-events).
49
+
The unexpected node reboot is caused by network maintenance activity or an outage. To confirm this, match the reboot timestamp against the [Azure Maintenance Notification](/azure/virtual-machines/linux/maintenance-notifications) in the Azure portal. For more information about Azure Scheduled Events, see [Azure Metadata Service: Scheduled Events for Linux VMs](/azure/virtual-machines/linux/scheduled-events).
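
The timestamp comparison can be sketched as a small shell helper. This is a minimal illustration, not part of the article's procedure: the function name `in_maintenance_window` is hypothetical, it assumes GNU `date` (as shipped on SUSE Linux Enterprise Server), and it assumes that all three timestamps are in UTC and include a year, which the corosync log line doesn't.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the reboot timestamp falls inside
# the maintenance window, fails (exit 1) otherwise, and returns 2 if a
# timestamp can't be parsed. Assumes GNU date and UTC timestamps.
in_maintenance_window() {
    local reboot start end
    reboot=$(date -u -d "$1" +%s) || return 2
    start=$(date -u -d "$2" +%s) || return 2
    end=$(date -u -d "$3" +%s) || return 2
    [ "$reboot" -ge "$start" ] && [ "$reboot" -le "$end" ]
}

# Example with made-up values (reboot time, window start, window end):
# in_maintenance_window "2024-08-21 01:47:31" "2024-08-21 01:30:00" "2024-08-21 02:00:00" \
#     && echo "Reboot falls inside the maintenance window"
```

The window start and end times come from the maintenance notification; the reboot time comes from the node's system log.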
50
50
51
51
#### Resolution
52
-
If the unexpected reboot timestamp aligns with a maintenance activity, the analysis confirms that the cluster was impacted by either platform or network maintenance. For further assistance or additional queries, you can open a support request by following these [instructions](#next-steps).
52
+
If the unexpected reboot timestamp aligns with a maintenance activity, the analysis confirms that either platform or network maintenance impacted the cluster.
53
+
54
+
For further assistance or other queries, you can open a support request by following these [instructions](#next-steps).
53
55
54
56
### Scenario 2: Cluster Misconfiguration
55
-
The cluster nodes experience unexpected failover or node reboots and it's often observed due to cluster misconfiguration affecting the stability of Pacemaker Clusters.
57
+
The cluster nodes experience unexpected failovers or node reboots, often caused by cluster misconfigurations that affect the stability of Pacemaker Clusters.
58
+
56
59
The cluster configuration can be reviewed by running the following command:
57
60
```bash
58
61
sudo crm configure show
@@ -62,8 +65,8 @@ sudo crm configure show
62
65
Unexpected reboots in an Azure SUSE Pacemaker cluster often occur due to misconfigurations:
63
66
64
67
1. Incorrect STONITH configuration:
65
-
- No STONITH or fencing fisconfigured: Not configuring STONITH properly can lead to nodes being marked as unhealthy and triggering unnecessary reboots.
66
-
- Wrong STONITH resource rettings: Incorrect parameters for Azure fencing agents like `fence_azure_arm` can cause nodes to reboot unexpectedly during failovers.
68
+
- No STONITH or fencing misconfigured: Not configuring STONITH properly can lead to nodes being marked as unhealthy and triggering unnecessary reboots.
69
+
- Wrong STONITH resource settings: Incorrect parameters for Azure fencing agents like `fence_azure_arm` can cause nodes to reboot unexpectedly during failovers.
67
70
- Insufficient permissions: The Azure resource group or credentials used for fencing may lack required permissions, causing STONITH failures.
68
71
69
72
2. Missing/Incorrect resource constraints:
@@ -106,9 +109,10 @@ Unexpected reboots in an Azure SUSE Pacemaker cluster often occur due to misconf
106
109
crm configure property maintenance-mode=false
107
110
```
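
After changing cluster properties, it can help to confirm the resulting state. The following is a rough sketch under stated assumptions: `check_cluster_properties` is a hypothetical helper, and it assumes that `crm configure show` prints properties as `name=value` pairs (optionally quoted), which may differ between crmsh versions. It only scans text and doesn't change the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `crm configure show` output on stdin and prints
# one line per checked property. Assumes properties appear as name=value,
# with optional double quotes around the value.
check_cluster_properties() {
    local config
    config=$(cat)

    if printf '%s\n' "$config" | grep -Eq 'stonith-enabled="?true"?'; then
        echo "OK: stonith-enabled is true"
    else
        echo "WARN: stonith-enabled=true not found"
    fi

    if printf '%s\n' "$config" | grep -Eq 'maintenance-mode="?true"?'; then
        echo "WARN: cluster is still in maintenance mode"
    else
        echo "OK: maintenance-mode is not set to true"
    fi
}

# Example usage on a cluster node:
# sudo crm configure show | check_cluster_properties
```

A warning from this sketch is only a prompt to read the full configuration; it isn't a substitute for reviewing the STONITH resource and constraints themselves.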
108
111
### Scenario 3: Migration from On-premises to Azure
109
-
When migrating a SUSE Pacemaker cluster from on-premises to Azure, unexpected reboots can arise from specific misconfigurations or overlooked dependencies. Below are common mistakes in this category:
112
+
When migrating a SUSE Pacemaker cluster from on-premises to Azure, unexpected reboots can arise from specific misconfigurations or overlooked dependencies.
110
113
111
114
### Cause
115
+
The following are common mistakes in this category:
112
116
113
117
1. Incomplete or incorrect STONITH configuration:
114
118
- No STONITH or fencing misconfigured: Not configuring STONITH (Shoot-The-Other-Node-In-The-Head) properly can lead to nodes being marked as unhealthy and triggering unnecessary reboots.
@@ -183,7 +187,7 @@ The `ASCS/ERS` resource is considered the application for SAP Netweaver clusters
183
187
#### Resolution
184
188
- To identify the root cause of the issue, it's essential to review the [OS performance](collect-performance-metrics-from-a-linux-system.md).
185
189
- Particular attention should be given to memory pressure and to storage devices and their configuration, especially if SAP Netweaver is hosted on Network File System (NFS), Azure NetApp Files (ANF), or Azure Files.
186
-
- Once external factors, such as platform or network outages, are ruled out, it's recommended to engage the application vendor for trace call analysis and log review.
190
+
- Once external factors, such as platform or network outages, are ruled out, it's recommended to engage the application vendor for trace call analysis and log review.
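
As a quick first look at memory pressure, the share of available memory can be read from `/proc/meminfo`. This is only a sketch: `mem_available_pct` is a hypothetical helper, it relies on the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`, and any threshold applied to its result is an assumption rather than guidance from this article. It doesn't replace the performance data collection linked above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads /proc/meminfo-formatted text on stdin and prints
# available memory as a whole-number percentage of total memory.
# Exits non-zero if MemTotal is missing or zero.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total > 0) printf "%d\n", (avail * 100) / total
             else exit 1
         }'
}

# Example usage on a cluster node:
# mem_available_pct < /proc/meminfo
```

A persistently low value around the time of the failure suggests looking more closely at memory usage on that node.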
187
191
188
192
## Next steps
189
193
For additional help, open a support request by using the following instructions. When you submit your request, attach [supportconfig](https://documentation.suse.com/smart/systems-management/html/supportconfig/index.html) and [hb_report](https://www.suse.com/support/kb/doc/?id=000019142) logs for troubleshooting.
@@ -192,4 +196,4 @@ For additional help, open a support request by using the following instructions.