support/azure/virtual-machines/linux/troubleshoot-rhel-pacemaker-cluster-services-resources-startup-issues.md
This article discusses the most common causes of startup issues in Red Hat Enterprise Linux (RHEL) Pacemaker cluster services and resources.
## Scenario 1: Can't start cluster service because of quorum
### Symptom for scenario 1
- Cluster nodes don't join the cluster after a cluster restart
- Nodes are reported as `UNCLEAN (offline)`
Check for the following error in `/var/log/messages`:

```
Jun 16 11:17:53 node-0 pacemaker-controld[509433]: error: Corosync quorum is not configured
```
### Cause for scenario 1
The **VoteQuorum** service is a component of the corosync project. To prevent split-brain scenarios, this service can optionally be loaded into the nodes of a corosync cluster. Each node in the cluster is assigned a number of votes, and cluster operations are allowed to proceed only when a majority of votes is cast. The service must be loaded on either every node or no node; if it's loaded on only a subset of the cluster nodes, the results are unpredictable.
The following `/etc/corosync/corosync.conf` extract enables the **VoteQuorum** service.
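A typical quorum section that loads the **VoteQuorum** service looks like this (illustrative two-node values, not reproduced from the original article):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

The `provider: corosync_votequorum` line is what loads the service. Setting `two_node: 1` relaxes the strict-majority rule for two-node clusters; omit it for clusters with three or more nodes.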
**VoteQuorum** reads its configuration from `/etc/corosync/corosync.conf`. Some values can be changed at runtime and others are read only at corosync startup. It's important that those values are consistent across all nodes that are participating in the cluster. Otherwise, vote quorum behavior is unpredictable.
### Resolution for scenario 1
1. As a precaution, make a full backup or take a snapshot before you make any changes. For more information, see [Azure VM backup](/azure/backup/backup-azure-vms-introduction).
## Scenario 2: A virtual IP resource doesn't start or stop

### Symptom for scenario 2

A virtual IP resource (an `IPaddr2` resource) didn't start or stop in Pacemaker.
Run `sudo pcs status` to check the resource state. The failed action resembles the following:

```
vip_HN1_03_start_0 on node-1 'unknown error' (1): call=30, status=complete, exit-reason='[findif] failed', last-rc-change='Thu Jan 07 17:25:52 2025', queued=0ms, exec=57ms
```
### Cause for scenario 2
1. To choose which network adapter (NIC) to start the `IPaddr2` resource on, `IPaddr2` invokes the `findif()` function, as defined in `/usr/lib/ocf/resource.d/heartbeat/IPaddr2`, which is included in the `resource-agents` package.
```
sudo ip -o -f inet route list match 172.17.10.10/24 scope link
```
> [!Note]
> Replace `172.17.10.10/24` and `ens6` as appropriate.
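Conceptually, the route-matching check that `findif` performs can be sketched as follows. This is an illustrative Python model, not the actual shell code in the resource agent; the `find_nic` function and the sample routes are hypothetical:

```python
# Conceptual model of the findif() check: IPaddr2 selects the NIC whose
# route prefix contains the VIP. If no route matches, the resource start
# fails with "[findif] failed".
import ipaddress

def find_nic(vip, routes):
    """Return the NIC of the most specific route whose prefix contains vip.

    routes: list of (prefix, nic) pairs, e.g. [("172.17.10.0/24", "ens6")].
    Returns None when no route matches (the "[findif] failed" case).
    """
    vip_addr = ipaddress.ip_address(vip)
    best = None
    for prefix, nic in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix (most specific).
        if vip_addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nic)
    return best[1] if best else None

routes = [("172.17.10.0/24", "ens6"), ("10.0.0.0/16", "eth0")]
print(find_nic("172.17.10.10", routes))  # -> ens6
print(find_nic("192.168.1.5", routes))   # -> None (no matching route)
```

This mirrors the `ip route list match` lookup shown above: if that command returns nothing for the VIP, `findif` has no NIC to choose and the resource fails to start.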
### Resolution for scenario 2
If a route that matches the VIP isn't in the default routing table, you can specify the NIC name in the Pacemaker resource configuration so that the agent bypasses the route check:
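For example, assuming the resource is the `vip_HN1_03` resource from the failure output and the correct adapter is `ens6` (both placeholders for your environment), the `nic` parameter of the `IPaddr2` agent can be set like this:

```
sudo pcs resource update vip_HN1_03 nic=ens6
```

After the update, run `sudo pcs resource cleanup vip_HN1_03` so that Pacemaker clears the failed action and retries the start operation.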
## Scenario 3: SAP HANA resource issues

Run `sudo pcs status` to check the resource state. The failed action resembles the following:

```
last-rc-change='Sat May 22 09:36:32 2021', queued=0ms, exec=3093ms
```
### Cause for scenario 3, symptom 1
Pacemaker can't start the SAP HANA resource if there are `SYN` failures between the primary and secondary cluster nodes. To mitigate this issue, you must manually enable `SYN` between the primary and secondary nodes.
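The mitigation commands aren't included in this excerpt. As a hedged sketch, HANA system replication state is typically inspected, and the secondary re-registered if needed, by using `hdbnsutil` as the `<sid>adm` user; all site, host, and instance values below are placeholders, not values from the original article:

```
# Check the current system replication state on each node:
hdbnsutil -sr_state

# If the secondary is no longer registered, re-register it against the primary
# (placeholder values - adjust the site name, remote host, and instance number):
hdbnsutil -sr_register --name=SITE2 --remoteHost=node-0 --remoteInstance=00 \
    --replicationMode=sync --operationMode=logreplay
```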
```
Failed Resource Actions:
* SAPHana_XXX_00_start_0 on node-0 'not running' (7): call=30, status='complete', last-rc-change='Sat Dec 7 15:49:12 2024', queued=0ms, exec=1680ms
```
### Cause for scenario 3, symptom 2
- This issue frequently occurs if the database is modified (manually stopped or started, replication is paused, and so on) while the cluster is in maintenance mode.
### Resolution for scenario 3, symptom 2
> [!Note]
> Steps 1 through 5 should be performed by an SAP administrator.
The output shows no traces other than the indication that `hdbdaemon` didn't start. After you evaluate the output, SAP vendor support should examine the application logs further to determine why the SAP application didn't start.
For more information about this scenario, see the following Red Hat article: [SAPHana Resource Start Failure with Error 'FAIL: process hdbdaemon HDB Daemon not running'](https://access.redhat.com/solutions/7058526).
## Scenario 4: Issues that affect the ASCS and ERS resources
### Symptom for scenario 4
ASCS and ERS instances can't start under cluster control. The `/var/log/messages` log indicates the following errors:
```
Jun 9 23:29:16 nodeci SAPRh2_10[340480]: Unable to change to Directory /usr/sap
Jun 9 23:29:16 nodeci SAPRH2_00[340486]: Unable to change to Directory /usr/sap/Rh2/ASCS00/work. (Error 2 No such file or directory) [ntservsserver.cpp 3845]
```
### Cause for scenario 4
Because of incorrect `InstanceName` and `START_PROFILE` attributes, the SAP instances, such as ASCS and ERS, didn't start under cluster control.
### Resolution for scenario 4
> [!Note]
> This resolution is applicable if `InstanceName` and `START_PROFILE` are separate files.
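As an illustrative sketch, the attributes can be inspected and corrected with `pcs`. The resource name, SID, hostname, and paths below are placeholders inferred from the log excerpt, not values from the original article:

```
# Inspect the currently configured attributes of the SAPInstance resource:
sudo pcs resource config rsc_sap_RH2_ASCS00

# Point InstanceName and START_PROFILE at the correct instance and profile file:
sudo pcs resource update rsc_sap_RH2_ASCS00 \
    InstanceName=RH2_ASCS00_nodeci \
    START_PROFILE=/sapmnt/RH2/profile/RH2_ASCS00_nodeci
```

After you correct the attributes, clean up the failed actions so that the cluster retries the start, for example with `sudo pcs resource cleanup rsc_sap_RH2_ASCS00`.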