support/azure/virtual-machines/linux/troubleshoot-unexpected-node-reboots-pacemaker-suse.md
+12 -11 (12 additions & 11 deletions)
@@ -46,10 +46,10 @@ Aug 21 01:47:27 node 02 corosync[15241]: [KNET ] host: host: 2 has no active
 Aug 21 01:47:31 node 02 corosync[15241]: [TOTEM ] Token has not been received in 30000 ms
 ```
 
-### Cause
+### Cause for scenario 1
 The unexpected node reboot is noted as a result of a Network Maintenance activity or an outage. For confirmation, the timestamp can be matched by reviewing the [Azure Maintenance Notification](/azure/virtual-machines/linux/maintenance-notifications) in Azure Portal. For more information about Azure Scheduled Events, see [Azure Metadata Service: Scheduled Events for Linux VMs](/azure/virtual-machines/linux/scheduled-events).
 
-### Resolution
+### Resolution for scenario 1
 If the unexpected reboot timestamp aligns with a maintenance activity, the analysis confirms that either platform or network maintenance impacted the cluster.
 
 For further assistance or other queries, you can open a support request by following these [instructions](#next-steps).
@@ -62,7 +62,7 @@ The cluster configuration can be reviewed by running the following command:
 sudo crm configure show
 ```
 
-## Cause
+###Cause for scenario 2
 Unexpected reboots in an Azure SUSE Pacemaker cluster often occur due to misconfigurations:
 
 1. Incorrect STONITH configuration:
@@ -92,7 +92,7 @@ Unexpected reboots in an Azure SUSE Pacemaker cluster often occur due to misconf
 sudo cat /etc/corosync/corosync.conf
 ```
 
-### Resolution
+### Resolution for scenario 2
 - It's necessary to follow the proper guidelines outlined for setting up a [SUSE Pacemaker Cluster](#prerequisites). Additionally, ensure that appropriate resources are allocated for applications such as [SAP HANA](/azure/sap/workloads/sap-hana-high-availability) or [SAP NetWeaver](/azure/sap/workloads/high-availability-guide-suse), as specified in our Microsoft documentation.
 - Steps to make necessary changes to the cluster configuration:
 1. Stop the application on both the nodes.
@@ -112,7 +112,7 @@ Unexpected reboots in an Azure SUSE Pacemaker cluster often occur due to misconf
 ## Scenario 3: Migration from On-premises to Azure
 When migrating a SUSE Pacemaker cluster from on-premises to Azure, unexpected reboots can arise from specific misconfigurations or overlooked dependencies.
 
-### Cause
+### Cause for scenario 3
 The following are common mistakes in this category:
 
 1. Incomplete or incorrect STONITH configuration:
@@ -144,8 +144,9 @@ The following are common mistakes in this category:
 For more information, see [Network security group test](/azure/virtual-machines/network-security-group-test)
 
 
-### Resolution
-- It's necessary to follow the proper guidelines outlined for setting up a [SUSE Pacemaker Cluster](#prerequisites). Additionally, ensure that appropriate resources are allocated for applications such as [SAP HANA](/azure/sap/workloads/sap-hana-high-availability) or [SAP NetWeaver](/azure/sap/workloads/high-availability-guide-suse), as specified in our Microsoft documentation.
+### Resolution for scenario 3
+
+Follow the guidelines outlined to set up a [SUSE Pacemaker Cluster](#prerequisites). Additionally, ensure that appropriate resources are allocated for applications such as [SAP HANA](/azure/sap/workloads/sap-hana-high-availability) or [SAP NetWeaver](/azure/sap/workloads/high-availability-guide-suse), as specified in our Microsoft documentation.
 
 ## Scenario 4: `HANA_CALL` timeout after 60 seconds
 
@@ -158,10 +159,10 @@ The Azure SUSE Pacemaker Cluster is running SAP HANA as application and experien
 2024-06-04T09:25:38.736748+00:00 node01 SAPHana(rsc_SAPHana_H00_HDB02)[99475]: ERROR: ACT: check_for_primary: we didn't expect node_status to be: DUMP <00000000 0a |.|#01200000001>
 ```
 
-### Cause
+### Cause for scenario 4
 The SAP HANA timeout messages are commonly considered internal application timeouts, and the SAP vendor should be engaged.
 
-### Resolution
+### Resolution for scenario 4
 - To identify the root cause of the issue, it's essential to review the [OS performance](collect-performance-metrics-from-a-linux-system.md).
 - Particular attention should be given to memory pressure and storage devices, their configuration, especially if HANA is hosted on Network File System (NFS), Azure NetApp Files (ANF), or Azure Files.
 - Once external factors, such as platform or network outages, are ruled out, engaging the application vendor for trace call analysis and log review is recommended.
@@ -182,10 +183,10 @@ The Azure SUSE Pacemaker Cluster is running SAP Netweaver ASCS/ERS as applicatio
 2024-11-09T07:39:42.828955-05:00 node 01 pacemaker-schedulerd[2406]: warning: Unexpected result (not running) was recorded for start of RSC_SAP_ASCS00 on node01 at Nov 9 07:39:42 2024
 ```
 
-### Cause
+### Cause for scenario 5
 The `ASCS/ERS` resource is considered the application for SAP Netweaver clusters. When the corresponding cluster monitoring resource times out, it triggers a failover process.
 
-### Resolution
+### Resolution scenario 5
 - To identify the root cause of the issue, it's essential to review the [OS performance](collect-performance-metrics-from-a-linux-system.md).
 - Particular attention should be given to memory pressure and storage devices, their configuration especially if SAP Netweaver is hosted on Network File System (NFS), Azure NetApp Files (ANF), or Azure Files.
 - Once external factors, such as platform or network outages, are ruled out, engaging the application vendor for trace call analysis and log review is recommended.