diff --git a/support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md b/support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md index bb390de737f..6a26013c1db 100644 --- a/support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md +++ b/support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md @@ -5,7 +5,7 @@ ms.reviewer: rnirek,srsakthi ms.author: rnirek author: skarthikeyan7-msft ms.topic: troubleshooting -ms.date: 02/24/2025 +ms.date: 09/01/2026 ms.service: azure-virtual-machines ms.collection: linux ms.custom: sap:Issue with Pacemaker clustering, and fencing @@ -29,19 +29,18 @@ This article discusses the most common causes of startup issues in RedHat Enterp sudo pcs status ``` ```output - Cluster name: my_cluster - Status of pacemakerd: 'Pacemaker is running' (last updated 2024-06-25 16:34:49 -04:00) + Cluster name: rhel-cluster Cluster Summary: - * Stack: corosync - * **Current DC: NONE** - * Last updated: Tue Jun 25 14:34:49 2024 - * Last change: Tue Jun 25 14:29:51 2024 by root via cibadmin on node-0 - * 2 nodes configured - * 9 resource instances configured + * Stack: corosync (Pacemaker is running) + * Current DC: vm-mce-hana02 (version 2.1.9-1.2.el9_6-49aab9983) - partition with quorum + * Last updated: Wed Jan 7 09:17:50 2026 + * Last change: Wed Jan 7 08:38:25 2026 by root via root on vm-mce-hana01 + * 2 nodes configured + * 9 resource instances configured Node List: - * **Node node-0: UNCLEAN (offline)** - * **Node node-1: UNCLEAN (offline)** + * Node vm-mce-hana01: UNCLEAN (offline) + * Node vm-mce-hana02: UNCLEAN (offline) ``` - `sudo pcs quorum status` returns the following error message: @@ -49,10 +48,16 @@ This article discusses the most common causes of startup issues in RedHat Enterp ```bash sudo pcs quorum status ``` + ```output Error: Unable to get quorum status: Unable to start votequorum status tracking: CS_ERR_BAD_HANDLE - Check for the error: Corosync quorum is not configured in /var/log/messeges: - Jun 16 11:17:53 node-0 pacemaker-controld[509433]: error: Corosync quorum is not configured + ``` + +- Check for the error: Corosync quorum is not configured in `/var/log/messeges`: + + ```output + Jan 7 09:20:50 vm-mce-hana01 pacemaker-controld[28982]: error: Corosync quorum is not configured + Jan 7 09:21:20 vm-mce-hana01 pacemaker-controld[29026]: error: Corosync quorum is not configured ``` ### Cause for scenario 1 @@ -64,6 +69,7 @@ The following `/etc/corosync/corosync.conf` setting enables **VoteQuorum** servi ```output quorum { provider: corosync_votequorum + two_node: 1 } ``` @@ -90,38 +96,39 @@ quorum { ``` ```output totem { - version: 2 - cluster_name: my_cluster - transport: knet - token: 30000 - crypto_cipher: aes256 - crypto_hash: sha256 - cluster_uuid: afd62fe2045b43b9a102de76fdf4659a + version: 2 + cluster_name: rhel-cluster + transport: knet + token: 30000 + crypto_cipher: aes256 + crypto_hash: sha256 + cluster_uuid: df29c619ef55405ab6dc9ba1364ea315 } + nodelist { - node { - ring0_addr: node-0 - name: node-0 - nodeid: 1 + node { + ring0_addr: vm-mce-hana01 + name: vm-mce-hana01 + nodeid: 1 + } + + node { + ring0_addr: vm-mce-hana02 + name: vm-mce-hana02 + nodeid: 2 + } } - node { - ring0_addr: node-1 - name: node-1 - nodeid: 2 + quorum { + provider: corosync_votequorum + two_node: 1 } - } - - **quorum { - provider: corosync_votequorum - two_node: 1 - }** logging { - to_logfile: yes - logfile: /var/log/cluster/corosync.log - to_syslog: yes - timestamp: on + to_logfile: yes + logfile: /var/log/cluster/corosync.log + to_syslog: yes + timestamp: on } ``` @@ -149,11 +156,13 @@ quorum { A virtual IP resource (`IPaddr2` resource) didn't start or stop in Pacemaker. -The following error entries are logged in `/var/log/pacemaker.log`: +The following error entries are logged in `/var/log/pacemaker/pacemaker.log`: ```output -25167 IPaddr2(VIP)[16985]: 2024/09/07_15:44:19 ERROR: Unable to find nic or netmask. -25168 IPaddr2(VIP)[16985]: 2024/09/07_15:44:19 ERROR: [findif] failed +Jan 03 13:46:30 IPaddr2(vip_HN1_00)[345260]: ERROR: [findif] failed +Jan 03 13:46:30.641 vm-mce-hana01 pacemaker-execd [2162] (log_op_output) info: vip_HN1_00_monitor_10000[345260] error output [ ocf-exit-reason:[findif] failed ] +Jan 03 13:47:18.067 vm-mce-hana01 pacemaker-schedulerd[2164] (unpack_rsc_op_failure) warning: Unexpected result (error: [findif] failed) was recorded for monitor of vip_HN1_00 on vm-mce-hana01 at Jan 3 13:46:20 2026 | exit-status=1 id=vip_HN1_00_last_failure_0 +Jan 03 13:47:18 IPaddr2(vip_HN1_00)[347048]: WARNING: [findif] failed ``` The error can also be observed while running `sudo pcs status`: @@ -162,7 +171,8 @@ The error can also be observed while running `sudo pcs status`: sudo pcs status ``` ```output -vip_HN1_03_start_0 on node-1 'unknown error' (1): call=30, status=complete, exit-reason='[findif] failed', last-rc-change='Thu Jan 07 17:25:52 2025', queued=0ms, exec=57ms +Failed Resource Actions: + * vip_HN1_00 10s-interval monitor on vm-mce-hana01 returned 'error' ([findif] failed) at Sat Jan 3 14:04:58 2026 after 37ms ``` ### Cause for scenario 2 @@ -176,32 +186,38 @@ vip_HN1_03_start_0 on node-1 'unknown error' (1): call=30, status=complete, exit Check the `IPaddr2` settings: ```bash - sudo pcs resource show vip_HN1_03 + sudo pcs resource config vip_HN1_00 ``` ```output - Resource: vip_HN1_03 (class=ocf provider=heartbeat type=IPaddr2) - Attributes: ip=172.17.10.10 cidr_netmask=24 nic=ens6 - Operations: start interval=0s timeout=20s (vip_HN1_03-start-timeout-20s) - stop interval=0s timeout=20s (vip_HN1_03-stop-timeout-20s) - monitor interval=10s timeout=20s (vip_HN1_03-monitor-interval-10s) + Resource: vip_HN1_00 (class=ocf provider=heartbeat type=IPaddr2) + Attributes: vip_HN1_00-instance_attributes + cidr_netmask=24 + ip=10.33.0.9 + Operations: + monitor: vip_HN1_00-monitor-interval-10s + interval=10s timeout=20s + start: vip_HN1_00-start-interval-0s + interval=0s timeout=20s + stop: vip_HN1_00-stop-interval-0s + interval=0s timeout=20s ``` -3. Try to determine the `NIC` information manually. In this example, based on the IP address and netmask, we can successfully find `ens6` from the route table: +3. Try to determine the `NIC` information manually. In this example, based on the IP address and netmask, we can successfully find `eth0` from the route table: ```bash - sudo ip -o -f inet route list match 172.17.10.10/24 scope link + sudo ip -o -f inet route list match 10.33.0.9/24 scope link ``` ```output - 172.17.10.0/24 dev ens6 proto kernel src 172.17.10.7 + 10.33.0.0/24 dev eth0 proto kernel src 10.33.0.5 metric 100 ``` -4. In a situation in which `NIC (ens6)` is down, you can't manually find the `NIC` information, and that might cause `[findif]` to fail. Replace `172.17.10.10/24` and `ens6` as appropriate. +4. In a situation in which `NIC (eth0)` is down, you can't manually find the `NIC` information, and that might cause `[findif]` to fail. Replace `10.33.0.9/24` and `eth0` as appropriate. ```bash - sudo ip link set ens6 down + sudo ip link set eth0 down ``` ```bash - sudo ip -o -f inet route list match 172.17.10.10/24 scope link + sudo ip -o -f inet route list match 10.33.0.9/24 scope link ``` ### Resolution for scenario 2 @@ -219,7 +235,7 @@ If a route that matches the `VIP` isn't in the default routing table, you can sp 3. Update the `NIC` resources: ```bash - sudo pcs resource update vip_HN1_03 nic=ens6 + sudo pcs resource update vip_HN1_00 nic=eth0 ``` 4. Restart the `NIC `resources: @@ -240,16 +256,21 @@ If a route that matches the `VIP` isn't in the default routing table, you can sp 6. Verify the `IP` resource: ```bash - sudo pcs resource show vip_HN1_03 + sudo pcs resource config show vip_HN1_00 ``` ```output - Warning: This command is deprecated and will be removed. Please use 'pcs resource config' instead. - Resource: vip_HN1_03 (class=ocf provider=heartbeat type=IPaddr2) - Attributes: cidr_netmask=32 ip=172.17.223.36 nic=vlan10 - Meta Attrs: resource-stickiness=INFINITY - Operations: monitor interval=10s timeout=20s (vip_HN1_03-monitor-interval-10s) - start interval=0s timeout=20s (vip_HN1_03-start-interval-0s) - stop interval=0s timeout=20s (vip_HN1_03-stop-interval-0s) + Resource: vip_HN1_00 (class=ocf provider=heartbeat type=IPaddr2) + Attributes: vip_HN1_00-instance_attributes + cidr_netmask=24 + ip=10.33.0.9 + nic=eth0 + Operations: + monitor: vip_HN1_00-monitor-interval-10s + interval=10s timeout=20s + start: vip_HN1_00-start-interval-0s + interval=0s timeout=20s + stop: vip_HN1_00-stop-interval-0s + interval=0s timeout=20s ``` For more information about this scenario, see the following Red Hat article: ["ERROR: [findif] failed" shown in Pacemaker'](https://access.redhat.com/solutions/2119711). @@ -260,7 +281,7 @@ For more information about this scenario, see the following Red Hat article: ["E SAP HANA DB doesn't start, and it returns an `unknown error` error message. -1. In the `/var/log/messages` log section, an `SRHOOK=SFAIL` entry is logged. This indicates that the cluster nodes are out of sync. +1. In the `/var/log/messages` log section, an `SRHOOK()=SFAIL` entry is logged. This indicates that the cluster nodes are out of sync. 2. The secondary cluster node is in `WAITING4PRIM` status. ```bash @@ -270,12 +291,12 @@ SAP HANA DB doesn't start, and it returns an `unknown error` error message. * Node node-0 (1): + hana_XXX_clone_state : PROMOTED + hana_XXX_sync_state : PRIM - + hana_XXX_roles : 2:P:master1:master:worker:slave + + hana_XXX_roles : 4:P:master1:master:worker:master * Node node-1 (2): + hana_XXX_clone_state : WAITING4PRIM + hana_XX_sync_state : SFAIL - + hana_XXX_roles : 2:S:master1:master:worker:slave + + hana_XXX_roles : 4:S:master1:master:worker:slave ``` 3. When you run `sudo pcs status`, the cluster status is shown as follows: @@ -284,49 +305,58 @@ SAP HANA DB doesn't start, and it returns an `unknown error` error message. sudo pcs status ``` ```output - 2 nodes configured - 8 resources configured - - Online: [ node-0 node-1 ] + * 2 nodes configured + * 9 resource instances configured + + Node List: + * Online: [ vm-mce-hana01 vm-mce-hana02 ] + + Full List of Resources: + * rsc_st_azure (stonith:fence_azure_arm): Started vm-mce-hana01 + * Clone Set: health-azure-events-clone [health-azure-events]: + * Started: [ vm-mce-hana01 vm-mce-hana02 ] + * Resource Group: g_ip_HN1_00: + * nc_HN1_00 (ocf:heartbeat:azure-lb): Started vm-mce-hana02 + * vip_HN1_00 (ocf:heartbeat:IPaddr2): Started vm-mce-hana02 + * Clone Set: SAPHanaTopology_HN1_00-clone [SAPHanaTopology_HN1_00]: + * Started: [ vm-mce-hana01 vm-mce-hana02 ] + * Clone Set: SAPHana_HN1_00-clone [SAPHana_HN1_00] (promotable): + * Promoted: [ vm-mce-hana01 ] + * Stopped: [ vm-mce-hana02 ] - Full list of resources: - - rsc_st_azure (stonith:fence_azure_arm): Started node-1 - Clone Set: cln_SAPHanaTopology [rsc_SAPHanaTopology] - Started: [ node-0 node-1 ] - Master/Slave Set: msl_SAPHana [rsc_SAPHana] - Master: [ node-1 ] - Slave: [ node-0 ] - Resource Group: g_ip_HN1_HBD00 - vip_HN1_HBD00 (ocf::heartbeat:IPaddr2): Started node-0 - nc_HN1_HBD00 (ocf::heartbeat:azure-lb): Started node-0 - - Failed Resource Actions: - * rsc_SAPHana_monitor_61000 on node-0 'unknown error' (1): call=32, status=complete, exitreason='', - last-rc-change='Sat May 22 09:29:20 2021', queued=0ms, exec=0ms - * rsc_SAPHana_start_0 on node-1 'not running' (7): call=55, status=complete, exitreason='', - last-rc-change='Sat May 22 09:36:32 2021', queued=0ms, exec=3093ms + * SAPHana_HN1_00 start on vm-mce-hana02 returned 'not running' at Wed Jan 7 13:39:14 2026 after 739ms ``` ### Cause for scenario 3, symptom 1 -Pacemaker can't start the SAP HANA resource if there are `SYN` failures between the primary and secondary nodes: +Pacemaker can't start the SAP HANA resource if there are `SYN` failures between the primary and secondary nodes. Given command shows the replication status of the SAP HANA database as `UNKNOWN` which confirms replication failed between the HANA DB. ```bash -sudo SAPHanaSR-showAttr + sudo su - hn1adm -c "python /usr/sap/HN1/HDB00/exe/python_support/systemReplicationStatus.py" ``` ```output -Global cib-time maintenance --------------------------------------------- -global Fri Aug 23 11:47:32 2024 false - -Hosts clone_state lpa_fh9_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -node-0 DEMOTED 10 online logreplay node-1 4:S:master1:master:worker:master 5 SITEA syncmem SOK 2.00.046.00.1581325702 node-0 -node-1 PROMOTED 1693237652 online logreplay node-0 4:P:master1:master:worker:master 150 SITEA syncmem PRIM 2.00.046.00.1581325702 node-1 + |Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | + | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | + |-------- |------------- |----- |------------ |--------- |------- |--------- |------------- |--------- |--------- |--------- |------------------ |----------- |----------- |-------------- |------------ | + |SYSTEMDB |vm-mce-hana01 |30001 |nameserver | 1 | 1 |Site1 |vm-mce-hana02 | 30001 | 2 |Site2 |CONNECTION TIMEOUT |SYNC |UNKNOWN | | False | + |HN1 |vm-mce-hana01 |30007 |xsengine | 2 | 1 |Site1 |vm-mce-hana02 | 30007 | 2 |Site2 |CONNECTION TIMEOUT |SYNC |UNKNOWN | | False | + |HN1 |vm-mce-hana01 |30003 |indexserver | 3 | 1 |Site1 |vm-mce-hana02 | 30003 | 2 |Site2 |CONNECTION TIMEOUT |SYNC |UNKNOWN | | False | + + status system replication site "2": UNKNOWN + overall system replication status: UNKNOWN + + Local System Replication State + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + mode: PRIMARY + site id: 1 + site name: Site1 ``` +> [!Important] +> `SAPHanaSR-showAttr` is supported only up to RHEL 7, deprecated in RHEL 8.4+, and completely removed in RHEL 9. Therefore alternative commands should be used to find the replication status like `sudo pcs status --full` or 'crm_attribute'. + ### Workaround for scenario 3, symptom 1 The SAP HANA resource can't be started by Pacemaker if there are `SYN` failures between the primary and secondary cluster nodes. To mitigate this issue, you must manually enable `SYN` between the primary and secondary nodes. @@ -344,33 +374,34 @@ The SAP HANA resource can't be started by Pacemaker if there are `SYN` failures 3. Check the SAP HANA DB and processes state: - a. Verify that both the primary and secondary nodes are running the SAP database and related SAP processes. One should function as the primary node and the other as the secondary. This makes sure that the databases on both nodes remain synchronized. + a. Verify that both the primary and secondary nodes are running the SAP database and related SAP processes. One should function as the primary node and the other as the secondary. This makes sure that the databases on both nodes remain synchronized. - b. Run `HDB info` on each node to check the SAP-related processes that are running in the node. The SAP administrator should be able to confirm that the required process are running on both of the nodes. + b. Switch to `adm` and run `HDB info` on each node to check the SAP-related processes that are running in the node. The SAP administrator should be able to confirm that the required process are running on both of the nodes. ```bash HDB info ``` ```output - USER PID PPID %CPU VSZ RSS COMMAND - a00adm 5183 5178 0.0 87684 1804 sshd: a00adm@pts/0 - a00adm 5184 5183 0.0 14808 3624 \_ -sh - a00adm 5994 5184 0.0 13200 1824 \_ /bin/sh /usr/sap/A00/HDB00/HDB info - a00adm 6019 5994 0.0 26668 1356 \_ ps fx -U a00adm -o user,pid,ppid,pcpu,vsz,rss,args - a00adm 5369 1 0.0 20932 1644 sapstart pf=/usr/sap/A00/SYS/profile/A00_HDB00_node-0 - a00adm 5377 5369 1.8 582944 292720 \_ /usr/sap/A00/HDB00/node-0/trace/hdb.sapA00_HDB00 -d -nw -f /usr/sap/A00/HDB00/node-0/daemon.ini pf=/usr/sap/A00/SYS/profile/A00_HDB00_node-0 - a00adm 5394 5377 9.3 3930388 1146444 \_ hdbnameserver - a00adm 5548 5377 21.3 2943472 529672 \_ hdbcompileserver - a00adm 5550 5377 4.4 2838792 465664 \_ hdbpreprocessor - a00adm 5571 5377 91.6 7151116 4019640 \_ hdbindexserver - a00adm 5573 5377 21.8 4323488 1203128 \_ hdbxsengine - a00adm 5905 5377 18.9 3182120 710680 \_ hdbwebdispatcher - a00adm 2104 1 0.0 428748 27760 /usr/sap/A00/HDB00/exe/sapstartsrv pf=/usr/sap/A00/SYS/profile/A00_HDB00_node-0 -D -u a00adm - a00adm 2004 1 0.0 31844 2352 /usr/lib/systemd/systemd --user - a00adm 2008 2004 0.0 63796 2620 \_ (sd-pam) + USER PID PPID %CPU VSZ RSS COMMAND + hn1adm 5261 5260 0.0 8892 5376 -sh + hn1adm 13183 5261 0.0 7516 4096 \_ /bin/sh /usr/sap/HN1/HDB00/HDB info + hn1adm 13212 13183 0.0 10020 3712 \_ ps fx -U hn1adm -o user:8,pid:8,ppid:8,pcpu:5,vsz:10,rss:10,args + hn1adm 12034 1 0.3 23900 14204 /usr/lib/systemd/systemd --user + hn1adm 12036 12034 0.0 176000 8732 \_ (sd-pam) + hn1adm 5219 1 0.0 566736 40900 hdbrsutil --start --port 30003 --volume 3 --volumesuffix mnt00001/hdb00003.00003 --identifier 1767858538 + hn1adm 4764 1 0.0 566664 40956 hdbrsutil --start --port 30001 --volume 1 --volumesuffix mnt00001/hdb00001 --identifier 1767858528 + hn1adm 4288 1 0.0 9308 3172 sapstart pf=/usr/sap/HN1/SYS/profile/HN1_HDB00_vm-mce-hana01 + hn1adm 4297 4288 0.2 455668 88620 \_ /usr/sap/HN1/HDB00/vm-mce-hana01/trace/hdb.sapHN1_HDB00 -d -nw -f /usr/sap/HN1/HDB00/vm-mce-hana01/daemon.ini pf=/usr/sap/HN1/SYS/profile/HN1_HDB00_vm-mce-hana01 + hn1adm 4324 4297 10.4 6460472 3496360 \_ hdbnameserver + hn1adm 4999 4297 0.4 935604 181024 \_ hdbcompileserver + hn1adm 5002 4297 0.4 1174076 195664 \_ hdbpreprocessor + hn1adm 5044 4297 19.9 6972960 3707744 \_ hdbindexserver -port 30003 + hn1adm 5047 4297 2.8 4401136 1263592 \_ hdbxsengine -port 30007 + hn1adm 5689 4297 1.2 2237896 436604 \_ hdbwebdispatcher + hn1adm 1482 1 0.0 500636 58040 /usr/sap/HN1/HDB00/exe/sapstartsrv pf=/usr/sap/HN1/SYS/profile/HN1_HDB00_vm-mce-hana01 ``` - c. If the SAP DB and services aren't active on the node, we recommend that you contact your SAP administrator to review and stop the SAP DB services, first in the secondary node and then in the primary node: + c. If the SAP DB and services aren't active on the node, we recommend that you contact your SAP administrator to review and stop the SAP DB services, first in the secondary node and then in the primary node: ```bash sudo HDB stop @@ -378,10 +409,10 @@ The SAP HANA resource can't be started by Pacemaker if there are `SYN` failures or: ```bash - sudo sapcontrol -nr -function stop + sudo sapcontrol -nr -function Stop ``` - d. After the stop operation finishes, start HANA DB in the primary node and then in the secondary node. Modify `` as appropriate. + d. After the stop operation finishes, start HANA DB in the primary node and then in the secondary node. Modify `` as appropriate. ```bash sudo HDB start @@ -389,7 +420,7 @@ The SAP HANA resource can't be started by Pacemaker if there are `SYN` failures or: ```bash - sudo sapcontrol -nr -function start + sudo sapcontrol -nr -function Start ``` 4. If the database nodes are still not synchronized, the SAP administrator should troubleshoot the issue by reviewing the SAP logs to make sure that the database nodes are correctly synchronized. @@ -403,50 +434,33 @@ The SAP HANA resource can't be started by Pacemaker if there are `SYN` failures If the database nodes are still not synchronized, the SAP administrator should troubleshoot the issue by reviewing the SAP logs to make sure that the database nodes are correctly synchronized: ```bash - sudo su - hn1adm -c "python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py" - ``` - ```output - | Host | Port | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary | Replication | Replication | Replication | - | | | | | | | Host | Port | Site ID | Site Name | Active Status | Mode | Status | Status Details | - | ------ | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- | - | node-0 | 30007 | xsengine | 2 | 1 | node-0 | sapn2 | 30007 | 2 | node-1 | YES | SYNC | ACTIVE | | - | node-0 | 30001 | nameserver | 1 | 1 | node-0 | sapn2 | 30001 | 2 | node-1 | YES | SYNC | ACTIVE | | - | node-0 | 30003 | indexserver | 3 | 1 | node-0 | sapn2 | 30003 | 2 | node-1 | YES | SYNC | ACTIVE | | - - status system replication site "2": ACTIVE - overall system replication status: ACTIVE - - Local System Replication State - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - - mode: PRIMARY - site id: 1 - site name: node-0 - ``` - -6. Verify the SAP HANA system replication status again by running the following command: - - ```bash - sudo SAPHanaSR-showAttr + sudo su - hn1adm -c "python /usr/sap/HN1/HDB00/exe/python_support/systemReplicationStatus.py" ``` ```output - Global cib-time maintenance - -------------------------------------------- - global Mon Oct 14 10:25:51 2024 false - - Hosts clone_state lpa_fh9_lpt node_state op_mode remoteHost roles score site srmode sync_state version vhost - ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - node-0 DEMOTED 10 online logreplay node-1 4:S:master1:master:worker:master 5 SITEA syncmem SOK 2.00.046.00.1581325702 node-0 - node-1 PROMOTED 1693237652 online logreplay node-0 4:P:master1:master:worker:master 150 SITEA syncmem PRIM 2.00.046.00.1581325702 node-1 + |Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | + | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | + |-------- |------------- |----- |------------ |--------- |------- |--------- |------------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ | + |SYSTEMDB |vm-mce-hana01 |30001 |nameserver | 1 | 1 |Site1 |vm-mce-hana02 | 30001 | 2 |Site2 |YES |SYNC |ACTIVE | | True | + |HN1 |vm-mce-hana01 |30007 |xsengine | 2 | 1 |Site1 |vm-mce-hana02 | 30007 | 2 |Site2 |YES |SYNC |ACTIVE | | True | + |HN1 |vm-mce-hana01 |30003 |indexserver | 3 | 1 |Site1 |vm-mce-hana02 | 30003 | 2 |Site2 |YES |SYNC |ACTIVE | | True | + + status system replication site "2": ACTIVE + overall system replication status: ACTIVE + + Local System Replication State + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + mode: PRIMARY + site id: 1 + site name: Site1 ``` - -7. Exit the SAP Admin account, and then remove the cluster from maintenance mode: +6. Exit the SAP Admin account, and then remove the cluster from maintenance mode: ```bash sudo pcs property set maintenance-mode=false ``` -8. Make sure that the Pacemaker cluster resources are running successfully. +7. Make sure that the Pacemaker cluster resources are running successfully. ### Scenario 3, Symptom 2: SAP HANA doesn't start because of replication failure @@ -458,49 +472,52 @@ When you run the `sudo pcs status --full` command, the `node attributes` status sudo pcs status --full ``` ```output - Node Attributes: - * Node: node-0 (1): - * hana_XXX_clone_state : UNDEFINED - * hana_XXX_op_mode : logreplay - * hana_XXX_remoteHost : node-1 - * hana_XXX_roles : 1:N:master1::worker: - * hana_XXX_site : SITE1 - * hana_XXX_srah : - - * hana_XXX_srmode : sync - * hana_XXX_version : 2.00.079.00 - * hana_XXX_vhost : node-0 - * lpa_XXX_lpt : 10 - * Node: node-1 (2): - * hana_XXX_clone_state : UNDEFINED - * hana_XXX_op_mode : logreplay - * hana_XXX_remoteHost : node-0 - * hana_XXX_roles : 4:N:master1:master:worker:master - * hana_XXX_site : SITE2 - * hana_XXX_sra : - - * hana_XXX_srah : - - * hana_XXX_srmode : sync - * hana_XXX_sync_state : PRIM - * hana_XXX_version : 2.00.079.00 - * hana_XXX_vhost : node-1 - * lpa_XXX_lpt : 1733552029 - * mastery-SAPHana_XXX_00 : 150 + Node Attributes: + * Node: vm-mce-hana01 (1): + * azName : vm-mce-hana01 + * azure-events-az_curNodeState : AVAILABLE + * hana_XXX_clone_state : PROMOTED + * hana_XXX_op_mode : logreplay + * hana_XXX_remoteHost : vm-mce-hana02 + * hana_XXX_roles : 4:P:master1:master:worker:master + * hana_XXX_site : Site1 + * hana_XXX_srah : - + * hana_XXX_srmode : sync + * hana_XXX_sync_state : PRIM + * hana_XXX_version : 2.00.083.00 + * hana_XXX_vhost : vm-mce-hana01 + * lpa_XXX_lpt : 1767866244 + * master-SAPHana_HN1_00 : 150 + * Node: vm-mce-hana02 (2): + * azName : vm-mce-hana02 + * hana_XXX_clone_state : UNDEFINED + * hana_XXX_op_mode : logreplay + * hana_XXX_remoteHost : vm-mce-hana01 + * hana_XXX_roles : 4:N:master1:master:worker:master + * hana_XXX_site : Site2 + * hana_XXX_srah : - + * hana_XXX_srmode : sync + * hana_XXX_sync_state : SOK + * hana_XXX_version : 2.00.083.00 + * hana_XXX_vhost : vm-mce-hana02 + * lpa_XXX_lpt : 30 ``` -This Migration summary indicates that the SAP HANA resource (SAPHana_XXX_00) failed to start on both nodes (node-0 and node-1). The fail count is set to 1000000 (infinity). +This Migration summary indicates that the SAP HANA resource (SAPHana_XXX_00) failed to start on both nodes (vm-mce-hana01 and vm-mce-hana02). The fail count is set to 1000000 (infinity). ```bash sudo pcs status ``` ```output Migration Summary: - * Node: node-0 (1): - * SAPHana_XXX_00: migration-threshold=1000000 fail-count=1000000 last-failure='Sat Dec 7 15:17:16 2024' - * Node: node-1 (2): - * SAPHana_XXX_00: migration-threshold=1000000 fail-count=1000000 last-failure='Sat Dec 7 15:48:57 2024' + * Node: vm-mce-hana01 (1): + * SAPHana_HN1_00: migration-threshold=5000 fail-count=1 last-failure='Thu Jan 8 07:56:30 2026' + * Node: vm-mce-hana02 (2): + * SAPHana_HN1_00: migration-threshold=5000 fail-count=1000000 last-failure='Thu Jan 8 07:48:34 2026' Failed Resource Actions: - * SAPHana_XXX_00_start_0 on node-1 'not running' (7): call=74, status='complete', last-rc-change='Sat Dec 7 15:17:14 2024', queued=0ms, exec=1715ms - * SAPHana_XXX_00_start_0 on node-0 'not running' (7): call=30, status='complete', last-rc-change='Sat Dec 7 15:49:12 2024', queued=0ms, exec=1680ms + * SAPHana_HN1_00_monitor_59000 on vm-mce-hana01 'promoted (failed)' (9): call=41, status='complete', last-rc-change='Thu Jan 8 07:56:30 2026', queued=0ms, exec=0ms + * SAPHana_HN1_00_start_0 on vm-mce-hana02 'not running' (7): call=38, status='complete', last-rc-change='Thu Jan 8 07:48:34 2026', queued=0ms, exec=745ms ``` ### Cause for scenario 3, symptom 2 @@ -532,10 +549,10 @@ This issue frequently occurs if the database is modified (manually stopped or st sudo hdbnsutil -sr_enable --name= ``` -5. Initialize replication on secondary node. Replace ``, `` and `` as appropriate. +5. Initialize replication on secondary node. Replace ``, `` and `` as appropriate. ```bash - sudo hdbnsutil -sr_register --remoteHost= --remoteInstance= --replicationMode=syncmem --name= + sudo hdbnsutil -sr_register --remoteHost= --remoteInstance= --replicationMode=sync --name= ``` 6. Manually start the database outside the cluster on the secondary node: @@ -572,11 +589,11 @@ SAP HANA Resource Start Failure with error message as shown: 'FAIL: process hdbdaemon HDB Daemon not running' ``` -The `sudo pcs status --full` command can also be used to view this error, as it also resulted the SAP HANA Pacemaker cluster resources failover error. +The `sudo pcs status --full` command can also be used to view this error, as it resulted the SAP HANA Pacemaker cluster resources failover error. ```output Failed Resource Actions: - * SAPHana_XXX_00_start_0 on node-0 'error' (1): call=44, status='complete', exitreason='', last-rc-change='2024-07-07 06:15:45 -08:00', queued=0ms, exec=51659ms + * SAPHana_XXX_00_monitor_0 on vm-mce-hana01 'error' (1): call=25, status='complete', last-rc-change='Thu Jan 2 08:01:54 2026', queued=0ms, exec=1604ms ``` ### Cause for scenario 3, symptom 3 @@ -584,16 +601,17 @@ Failed Resource Actions: A review of the `/var/log/messages` log indicates that `hbddaemon` didn't start because of the following error: ```output -Jun 7 02:25:09 node-0 SAPHana(SAPHana_XXX_00)[12336]: ERROR: ACT: SAPHana Instance ECR-HDB00 start failed: #01201.03.2024 02:25:09#012WaitforStarted#012FAIL: process hdbdaemon HDB Daemon not running -Jun 7 02:25:09 node-0 SAPHana(SAPHana_XXX_00)[12336]: INFO: RA ==== end action start_clone with rc=1 (0.154.0) (25s)==== -Jun 7 02:25:09 node-0 pacemaker-execd[8567]: notice: SAPHana_XXX_00_start_0[12336] error output [ tput: No value for $TERM and no -T specified ] -Jun 7 02:25:09 node-0 pacemaker-execd[8567]: notice: SAPHana_XXX_00_start_0[12336] error output [ tput: No value for $TERM and no -T specified ] -Jun 7 02:25:09 node-0 pacemaker-execd[8567]: notice: SAPHana_XXX_00_start_0[12336] error output [ tput: No value for $TERM and no -T specified ] -Jun 7 02:25:09 node-0 pacemaker-execd[8567]: notice: SAPHana_XXX_00_start_0[12336] error output [ Error performing operation: No such device or address ] -Jun 7 02:25:09 node-0 pacemaker-controld[8570]: notice: Result of start operation for SAPHana_XXX_00 on node-0: error -Jun 7 02:25:09 node-0 pacemaker-controld[8570]: notice: node-0-SAPHana_XXX_00_start_0:33 [ tput: No value for $TERM and no -T specified\ntput: No value for $TERM and no -T specified\ntput: No value for $TERM and no -T specified\nError performing operation: No such device or address\n ] -Jun 7 02:25:09 node-0 pacemaker-attrd[8568]: notice: Setting fail-count-SAPHana_XXX_00#start_0[node-0]: (unset) -> INFINITY -Jun 7 02:25:09 node-0 pacemaker-attrd[8568]: notice: Setting last-failure-SAPHana_XXX_00#start_0[node-0]: (unset) -> 1709288709 +Jan 2 08:01:54 vm-mce-hana01 SAPHana(SAPHana_HXXX_00)[5296]: ERROR: ACT: SAP HANA: Instance HN1-HDB00 start failed: #01202.01.2026 08:01:54#012WaitforStarted#012FAIL: process hdbdaemon HDB Daemon not running +Jan 2 08:01:55 vm-mce-hana01 SAPHana(SAPHana_XXX_00)[5314]: INFO: RA ==== end action start_clone with rc=1 (0.162.3) (22s)==== +Jan 2 08:01:55 vm-mce-hana01 SAPHana(SAPHana_XXX_00)[5320]: error output [ tput: No value for $TERM and no -T specified ] +Jan 2 08:01:55 vm-mce-hana01 SAPHana(SAPHana_XXX_00)[5324]: error output [ tput: No value for $TERM and no -T specified ] +Jan 2 08:01:55 vm-mce-hana01 SAPHana(SAPHana_XXX_00)[5328]: error output [ tput: No value for $TERM and no -T specified ] +Jan 2 08:01:55 vm-mce-hana01 pacemaker-attrd[2161]: error output [ Error performing operation: No such device or address ] +Jan 2 08:01:55 vm-mce-hana01 SAPHana(SAPHana_XXX_00)[5339]: INFO: RA ==== end action start_clone with rc=1 (0.162.3) (22s)==== +Jan 2 08:01:55 vm-mce-hana01 pacemaker-controld[2163]: notice: Result of start operation for SAPHana_XXX_00 on vm-mce-hana01: error +Jan 2 08:01:55 vm-mce-hana01 pacemaker-controld[2163]: notice: node-0-SAPHana_XXX_00_start_0:33 [ tput: No value for $TERM and no -T specified\ntput: No value for $TERM and no -T specified\ntput: No value for $TERM and no -T specified\nError performing operation: No such device or address\n ] +Jan 2 08:01:55 vm-mce-hana01 pacemaker-attrd[2161]: notice: Setting last-failure-SAPHana_XXX_00#start_0[vm-mce-hana01] in instance_attributes: (unset) -> 1767340915 +Jan 2 08:01:55 vm-mce-hana01 pacemaker-attrd[2161]: notice: Setting fail-count-SAPHana_XXX_00#start_0[vm-mce-hana01] in instance_attributes: (unset) -> INFINITY ``` ### Resolution for scenario 3, symptom 3 @@ -607,8 +625,8 @@ See the following Red Hat article: [SAPHana Resource Start Failure with Error 'F ASCS and ERS instances can't start under cluster control. The `/var/log/messages` log indicates The following errors: ```output -Jun 9 23:29:16 nodeci SAPRh2_10[340480]: Unable to change to Directory /usr/sap/RH2/ERS10/work. (Error 2 No such file or directory) [ntservsserver.cpp 3845] -Jun 9 23:29:16 nodeci SAPRH2_00[340486]: Unable to change to Directory /usr/sap/Rh2/ASCS00/work. (Error 2 No such file or directory) [ntservsserver.cpp 3845] +Jan 7 23:29:16 vm-mce-hana01 SAPHN1_01[340480]: Unable to change to Directory /usr/sap/RH2/ERS01/work. (Error 2 No such file or directory) [ntservsserver.cpp 3845] +Jan 7 23:29:16 vm-mce-hana01 SAPHN1_00[340486]: Unable to change to Directory /usr/sap/Rh2/ASCS01/work. (Error 2 No such file or directory) [ntservsserver.cpp 3845] ``` ### Cause for scenario 4 @@ -634,8 +652,8 @@ Because of incorrect `InstanceName` and `START_PROFILE` attributes, the SAP inst sudo cat /usr/sap/sapservices ``` ```output - LD_LIBRARY_PATH=/usr/sap/RH2/ASCS00/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/RH2/ASCS00/exe/sapstartsrv pf=/usr/sap/RH2/SYS/profile/START_ASCS00_nodeci -D -u rh2adm - LD_LIBRARY_PATH=/usr/sap/RH2/ERS10/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/RH2/ERS10/exe/sapstartsrv pf=/usr/sap/RH2/ERS10/profile/START_ERS10_nodersvi -D -u rh2adm + LD_LIBRARY_PATH=/usr/sap/HN1/ASCS01/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/RH2/ASCS00/exe/sapstartsrv pf=/usr/sap/HN1/SYS/profile/START_ASCS01_nodeci -D -u rh2adm + LD_LIBRARY_PATH=/usr/sap/HN1/ERS01/exe:$LD_LIBRARY_PATH;export LD_LIBRARY_PATH;/usr/sap/RH2/ERS10/exe/sapstartsrv pf=/usr/sap/HN1/ERS01/profile/START_ERS01_nodersvi -D -u rh2adm ``` 4. Correct the `InstanceName` and `START_PROFILE` attribute values in the `SAPInstance` cluster configuration resource agent. @@ -643,15 +661,15 @@ Because of incorrect `InstanceName` and `START_PROFILE` attributes, the SAP inst **Example:** ```bash - sudo pcs resource update ASCS_RH2_ASCS00 InstanceName=RH2_ASCS00_nodeci START_PROFILE=/usr/sap/RH2/SYS/profile/START_ASCS00_nodeci + sudo pcs resource update ASCS_RH2_ASCS01 InstanceName=HN1_ASCS01_nodeci START_PROFILE=/usr/sap/HN1/SYS/profile/START_ASCS01_nodeci ``` - Replace `RH2_ASCS00_nodeci` and `/usr/sap/RH2/SYS/profile/START_ASCS00_nodeci` with the appropriate values. + Replace `HN1_ASCS01_nodeci` and `/usr/sap/HN1/SYS/profile/START_ASCS01_nodeci` with the appropriate values. ```bash - sudo pcs resource update ERS_RH2_ERS10 InstanceName=RH2_ERS10_nodersvi START_PROFILE=/usr/sap/RH2/ERS10/profile/START_ERS10_nodersvi + sudo pcs resource update ERS_HN1_ERS01 InstanceName=HN1_ERS01_nodersvi START_PROFILE=/usr/sap/HN1/ERS01/profile/START_ERS01_nodersvi ``` - Replace `RH2_ERS10_nodersvi` and `/usr/sap/RH2/ERS10/profile/START_ERS10_nodersvi` with the appropriate values. + Replace `HN1_ERS01_nodersvi` and `/usr/sap/HN1/ERS01/profile/START_ERS01_nodersvi` with the appropriate values. 5. Remove the cluster from maintenance mode: @@ -667,48 +685,48 @@ After the fencing operation is finished, the affected node typically doesn't rej ### Cause for scenario 5 -After the node is fenced and restarted and has restarted its cluster services, it subsequently receives a message that states, `We were allegedly just fenced`. This causes it to shut down its Pacemaker and Corosync services and prevent the cluster from starting. Node1 initiates a STONITH action against node2. At `03:27:23`, when the network issue is resolved, node2 rejoins the Corosync membership. Consequently, a new two-node membership is established, as shown in `/var/log/messages` for node1: +After the node is fenced and restarted and has restarted its cluster services, it subsequently receives a message that states, `We were allegedly just fenced`. This causes it to shut down its Pacemaker and Corosync services and prevent the cluster from starting. `vm-mce-hana01` initiates a STONITH action against vm-mce-hana02. At `03:27:23`, when the network issue is resolved, vm-mce-hana02 rejoins the Corosync membership. Consequently, a new two-node membership is established, as shown in `/var/log/messages` for `vm-mce-hana01`: ```output -Feb 20 03:26:56 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration. -Feb 20 03:27:23 node1 corosync[1722]: [TOTEM ] A new membership (1.116f4) was formed. Members left: 2 -Feb 20 03:27:24 node1 corosync[1722]: [QUORUM] Members[1]: 1 +Feb 20 03:26:56 vm-mce-hana01 corosync[1722]: [TOTEM ] A processor failed, forming new configuration. +Feb 20 03:27:23 vm-mce-hana01 corosync[1722]: [TOTEM ] A new membership (1.116f4) was formed. Members left: 2 +Feb 20 03:27:24 vm-mce-hana01 corosync[1722]: [QUORUM] Members[1]: 1 ... -Feb 20 03:27:24 node1 pacemaker-schedulerd[1739]: warning: Cluster node node2 will be fenced: peer is no longer part of the cluster +Feb 20 03:27:24 vm-mce-hana01 pacemaker-schedulerd[1739]: warning: Cluster node vm-mce-hana02 will be fenced: peer is no longer part of the cluster ... -Feb 20 03:27:24 node1 pacemaker-fenced[1736]: notice: Delaying 'reboot' action targeting node2 using for 20s -Feb 20 03:27:25 node1 corosync[1722]: [TOTEM ] A new membership (1.116f8) was formed. Members joined: 2 -Feb 20 03:27:25 node1 corosync[1722]: [QUORUM] Members[2]: 1 2 -Feb 20 03:27:25 node1 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service. +Feb 20 03:27:24 vm-mce-hana01 pacemaker-fenced[1736]: notice: Delaying 'reboot' action targeting vm-mce-hana02 using for 20s +Feb 20 03:27:25 vm-mce-hana01 corosync[1722]: [TOTEM ] A new membership (1.116f8) was formed. Members joined: 2 +Feb 20 03:27:25 vm-mce-hana01 corosync[1722]: [QUORUM] Members[2]: 1 2 +Feb 20 03:27:25 vm-mce-hana01 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service. ``` -Node1 received confirmation that node2 was successfully restarted, as shown in `/var/log/messages` for node2. +vm-mce-hana01 received confirmation that vm-mce-hana02 was successfully restarted, as shown in `/var/log/messages` for vm-mce-hana02. ```output -Feb 20 03:27:46 node1 pacemaker-fenced[1736]: notice: Operation 'reboot' [43895] (call 28 from pacemaker-controld.1740) targeting node2 using xvm2 returned 0 (OK) +Feb 20 03:27:46 vm-mce-hana01 pacemaker-fenced[1736]: notice: Operation 'reboot' [43895] (call 28 from pacemaker-controld.1740) targeting vm-mce-hana02 using xvm2 returned 0 (OK) ``` -To fully complete the STONITH action, the system had to deliver the confirmation message to every node. Because node2 rejoined the group at `03:27:25` and no new membership that excluded node2 was yet formed because of the token and consensus timeouts not expiring, the confirmation message is delayed until node2 restarts its cluster services after startup. Upon receiving the message, node2 recognizes that it has been fenced and, consequently, shut down its services as shown: +To fully complete the STONITH action, the system had to deliver the confirmation message to every node. Because vm-mce-hana02 rejoined the group at `03:27:25` and no new membership that excluded vm-mce-hana02 was yet formed because of the token and consensus timeouts not expiring, the confirmation message is delayed until vm-mce-hana02 restarts its cluster services after startup. Upon receiving the message, vm-mce-hana02 recognizes that it has been fenced and, consequently, shut down its services as shown: -`/var/log/messages` in node1: +`/var/log/messages` in vm-mce-hana01: ```output -Feb 20 03:29:02 node1 corosync[1722]: [TOTEM ] A processor failed, forming new configuration. -Feb 20 03:29:10 node1 corosync[1722]: [TOTEM ] A new membership (1.116fc) was formed. Members joined: 2 left: 2 -Feb 20 03:29:10 node1 corosync[1722]: [QUORUM] Members[2]: 1 2 -Feb 20 03:29:10 node1 pacemaker-fenced[1736]: notice: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.1740@node1: OK -Feb 20 03:29:10 node1 pacemaker-controld[1740]: notice: Peer node2 was terminated (reboot) by node1 on behalf of pacemaker-controld.1740: OK +Feb 20 03:29:02 vm-mce-hana01 corosync[1722]: [TOTEM ] A processor failed, forming new configuration. +Feb 20 03:29:10 vm-mce-hana01 corosync[1722]: [TOTEM ] A new membership (1.116fc) was formed. Members joined: 2 left: 2 +Feb 20 03:29:10 vm-mce-hana01 corosync[1722]: [QUORUM] Members[2]: 1 2 +Feb 20 03:29:10 vm-mce-hana01 pacemaker-fenced[1736]: notice: Operation 'reboot' targeting vm-mce-hana02 by vm-mce-hana01 for pacemaker-controld.1740@vm-mce-hana01: OK +Feb 20 03:29:10 vm-mce-hana01 pacemaker-controld[1740]: notice: Peer vm-mce-hana02 was terminated (reboot) by vm-mce-hana01 on behalf of pacemaker-controld.1740: OK ... -Feb 20 03:29:11 node1 corosync[1722]: [CFG ] Node 2 was shut down by sysadmin -Feb 20 03:29:11 node1 corosync[1722]: [TOTEM ] A new membership (1.11700) was formed. Members left: 2 -Feb 20 03:29:11 node1 corosync[1722]: [QUORUM] Members[1]: 1 -Feb 20 03:29:11 node1 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service. +Feb 20 03:29:11 vm-mce-hana01 corosync[1722]: [CFG ] Node 2 was shut down by sysadmin +Feb 20 03:29:11 vm-mce-hana01 corosync[1722]: [TOTEM ] A new membership (1.11700) was formed. Members left: 2 +Feb 20 03:29:11 vm-mce-hana01 corosync[1722]: [QUORUM] Members[1]: 1 +Feb 20 03:29:11 vm-mce-hana01 corosync[1722]: [MAIN ] Completed service synchronization, ready to provide service. ``` -`/var/log/messages` in node2: +`/var/log/messages` in vm-mce-hana02: ```output -Feb 20 03:29:11 [1155] node2 corosync notice [TOTEM ] A new membership (1.116fc) was formed. Members joined: 1 -Feb 20 03:29:11 [1155] node2 corosync notice [QUORUM] Members[2]: 1 2 -Feb 20 03:29:09 node2 pacemaker-controld [1323] (tengine_stonith_notify) crit: We were allegedly just fenced by node1 for node1! +Feb 20 03:29:11 [1155] vm-mce-hana02 corosync notice [TOTEM ] A new membership (1.116fc) was formed. Members joined: 1 +Feb 20 03:29:11 [1155] vm-mce-hana02 corosync notice [QUORUM] Members[2]: 1 2 +Feb 20 03:29:09 vm-mce-hana02 pacemaker-controld [1323] (tengine_stonith_notify) crit: We were allegedly just fenced by vm-mce-hana01 for vm-mce-hana01! ``` ### Resolution for scenario 5