Commit abc740d

Fix typos and improve clarity in documentation
1 parent dcb97a4 commit abc740d

1 file changed

Lines changed: 15 additions & 15 deletions

support/azure/virtual-machines/linux/troubleshoot-unexpected-node-reboots-pacemaker-rhel.md
@@ -1,5 +1,5 @@
 ---
-title: Troubleshoot Unexpected Node Reboots in Azure Linux RHEL Pacemaker Cluster
+title: Troubleshoot Unexpected Node Restarts in Azure Linux RHEL Pacemaker Cluster
 description: This article provides troubleshooting steps for resolving unexpected node restarts in RHEL Linux Pacemaker Clusters
 author: rnirek
 ms.author: rnirek
@@ -28,7 +28,7 @@ This article provides guidance for troubleshooting, analysis, and resolution of
 ## Scenario 1: Network outage

 - The cluster nodes are experiencing `corosync` communication errors. This causes continuous retransmissions because of an inability to establish communication between nodes. This issue triggers application time-outs and ultimately causes node fencing and subsequent restarts.
-- Services that are dependent on network connectivity, such as `waagent`, generate communication-related error messages in the logs. This further indicates network-related disruptions.
+- Services that are dependent on network connectivity, such as `waagent`, generate communication-related error entries in the logs. These entries further indicate network-related disruptions.

 The following messages are logged in the `/var/log/messages` log:

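The `corosync` communication symptoms described in this scenario can be verified from either node. A minimal diagnostic sketch, assuming a standard RHEL Pacemaker node with `corosync` running (log paths and match patterns are illustrative):

```bash
# Show corosync ring/link status and any marked faults.
sudo corosync-cfgtool -s

# Review the configured totem token timeout (milliseconds).
sudo corosync-cmapctl | grep -i "totem.token"

# Look for retransmission and membership errors in the system log.
sudo grep -iE "retransmit|TOTEM|membership" /var/log/messages | tail -n 50
```

Frequent retransmit entries combined with token-loss messages typically point to the network disruption described above rather than to a cluster misconfiguration.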
@@ -57,7 +57,7 @@ For further assistance or other inquiries, you can open a support request by fol

 ## Scenario 2: Cluster misconfiguration

-The cluster nodes experience unexpected failovers or node restarts. These are often caused by cluster misconfigurations that affect the stability of Pacemaker Clusters.
+The cluster nodes experience unexpected failovers or node restarts. These issues often occur because of cluster misconfigurations that affect the stability of Pacemaker Clusters.

 To review the cluster configuration, run the following command:
 ```bash
@@ -69,19 +69,19 @@ sudo pcs configure show
 Unexpected restarts in an Azure SUSE Pacemaker cluster often occur because of misconfigurations:

 - Incorrect STONITH configuration:
-    - No STONITH or fencing misconfigured: Not configuring STONITH correctly can cause nodes to be marked as unhealthy and trigger unnecessary restarts.
-    - Wrong STONITH resource settings: Incorrect parameters for Azure fencing agents, such as `fence_azure_arm`, can cause nodes to restart unexpectedly during failovers.
+    - No STONITH or fencing misconfigured: Not configuring STONITH correctly could cause nodes to be marked as unhealthy and trigger unnecessary restarts.
+    - Wrong STONITH resource settings: Incorrect parameters for Azure fencing agents, such as `fence_azure_arm`, could cause nodes to restart unexpectedly during failovers.
     - Insufficient permissions: The Azure resource group or credentials that are used for fencing might lack required permissions and cause STONITH failures.

 - Missing or incorrect resource constraints:
-    Poorly set constraints can cause resources to be redistributed unnecessarily. This can cause node overload and restarts. Misaligned resource dependency configurations can cause nodes to fail or go into a restart loop.
+    Poorly set constraints could cause resources to be redistributed unnecessarily. This situation, in turn, could cause node overload and restarts. Misaligned resource dependency configurations could cause nodes to fail or go into a restart loop.

 - Cluster threshold and time-out misconfigurations:
     - `failure-time-out`, `migration-threshold`, or `monitor-time-out` values might cause nodes to be prematurely restarted.
-    - Heartbeat Timeout Settings: Incorrect `corosync` time-out settings for heartbeat intervals can cause nodes to assume that the other nodes are offline. This can trigger unnecessary restarts.
+    - Heartbeat time-out settings: Incorrect `corosync` time-out settings for heartbeat intervals could cause nodes to assume that the other nodes are offline. This situation can trigger unnecessary restarts.

 - Lack of proper health checks:
-    Not setting correct health check intervals for critical services such as SAP HANA (High-performance ANalytic Application) can cause resource or node failures.
+    Not setting correct health check intervals for critical services such as SAP HANA (High-performance ANalytic Appliance) could cause resource or node failures.

 - Resource agent misconfiguration:
     - Custom resource agents misaligned with cluster: Resource agents that don't adhere to Pacemaker standards can create unpredictable behavior, including node restarts.
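To illustrate the STONITH points above, the following is a hedged sketch of creating and then reviewing a `fence_azure_arm` resource that authenticates with a managed identity. The resource name and Azure identifiers are placeholders, not values from the source:

```bash
# Create a STONITH resource using the Azure ARM fence agent.
# "rsc_st_azure" and the Azure identifiers below are placeholders.
sudo pcs stonith create rsc_st_azure fence_azure_arm \
    msi=true \
    resourceGroup="myResourceGroup" \
    subscriptionId="00000000-0000-0000-0000-000000000000" \
    pcmk_monitor_retries=4 \
    pcmk_action_limit=3 \
    power_timeout=240 \
    pcmk_reboot_timeout=900

# Verify the resulting STONITH configuration.
sudo pcs stonith config
```

If the managed identity lacks the permissions that are required to restart VMs in the resource group, the monitor operation fails, which matches the "Insufficient permissions" case described above.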
@@ -109,9 +109,9 @@ Unexpected restarts in an Azure SUSE Pacemaker cluster often occur because of mi
 ```

 > [!IMPORTANT]
-> When you troubleshoot unexpected node restarts or failures, it's crucial to assess the effect of security tools that are installed on the system. These tools might interfere with cluster operations by blocking essential processes or modifying system files. This could cause instability, unexpected time-outs, or node reboots.
+> When you troubleshoot unexpected node restarts or failures, it's crucial to assess the effect of security tools that are installed on the system. These tools might interfere with cluster operations by blocking essential processes or modifying system files. This situation could cause instability, unexpected time-outs, or node restarts.
 >
-> To mitigate such risks, we recommend that you disable security tools on systems that are running a Pacemaker Cluster, or make sure that appropriate exclusions are configured to prevent conflicts with the cluster and its associated applications.
+> To mitigate such risks, we recommend that you disable security tools on systems that are running a Pacemaker Cluster. Alternatively, you can configure appropriate exclusions to prevent conflicts with the cluster and its associated applications.

## Scenario 3: Migration from on-premises to Azure
@@ -122,9 +122,9 @@ When you migrate a SUSE Pacemaker cluster from on-premises to Azure, unexpected
 The following are common mistakes that are made in this category:

 - Incomplete or incorrect STONITH configuration:
-    - No STONITH or fencing misfconfigured: Not configuring STONITH correctly can cause nodes to be marked as unhealthy and trigger unnecessary restarts.
-    - Wrong STONITH resource settings: Incorrect parameters for Azure fencing agents such as `fence_azure_arm` can cause nodes to restart unexpectedly during failovers.
-    - Insufficient permissions: The Azure resource group or credentials that are used for fencing might lack required permissions and cause STONITH failures. Key Azure-specific parameters, such as subscription ID, resource group, or VM (Virtual Machine) names, must be correctly configured in the fencing agent. Omissions here can cause fencing failures and unexpected restarts.
+    - No STONITH or fencing misconfigured: Not configuring STONITH correctly could cause nodes to be marked as unhealthy and trigger unnecessary restarts.
+    - Wrong STONITH resource settings: Incorrect parameters for Azure fencing agents such as `fence_azure_arm` could cause nodes to restart unexpectedly during failovers.
+    - Insufficient permissions: The Azure resource group or credentials that are used for fencing might lack required permissions and cause STONITH failures. Key Azure-specific parameters, such as subscription ID, resource group, or VM (Virtual Machine) names, must be correctly configured in the fencing agent. Omissions here could cause fencing failures and unexpected restarts.

 For more information, see [Troubleshoot Azure Fence Agent startup issues in RHEL](troubleshoot-azure-fence-agent-rhel.md) and [Troubleshoot SBD service failure in RHEL Pacemaker clusters](troubleshoot-sbd-issues-rhel.md)

@@ -161,7 +161,7 @@ The logs indicate that the STONITH device, `python-user`, triggers the shutdown

 ### Cause of scenario 4

-During an outage, such as a platform or network interruption of the kind that's discussed in [Scenario 1](#scenario-1-network-outage), both nodes try to write to the STONITH device to fence the other because they lose the totem token. Typically, the STONITH device takes instruction from the first available node to write on it in order to shut down the other node. If both nodes are allowed to write to the STONITH device, they might terminate each other.
+During an outage, such as a platform or network interruption (see [Scenario 1](#scenario-1-network-outage)), both nodes try to write to the STONITH device to fence the other because they lose the totem token. Typically, the STONITH device takes instruction from the first available node to write on it in order to shut down the other node. If both nodes are allowed to write to the STONITH device, they might terminate each other.

 ### Resolution for scenario 4

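A common mitigation for this mutual-fencing race is to stagger or prioritize fencing so that only one node can win. The following is a sketch only, assuming a STONITH resource with the placeholder name `rsc_st_azure`:

```bash
# Add a random delay before this fence device acts, so that both nodes
# don't shoot each other at the same instant.
sudo pcs stonith update rsc_st_azure pcmk_delay_max=15

# Alternatively, prefer the node that runs more (or higher-priority)
# resources to survive; requires a recent Pacemaker version.
sudo pcs property set priority-fencing-delay=15s
```

Either approach breaks the symmetry of the race: one node's fencing action lands first, and the surviving node keeps the cluster's resources.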
@@ -247,7 +247,7 @@ The `ASCS/ERS` resource is considered to be the application for SAP Netweaver cl
 - After you rule out external factors, such as platform or network outages, we recommend that you engage the application vendor for trace call analysis and log review.

 ## Next steps
-For additional help, open a support request, and submit your request by attaching [sosreport](https://access.redhat.com/solutions/3592) logs for troubleshooting.
+For more help, open a support request, and submit your request by attaching [sosreport](https://access.redhat.com/solutions/3592) logs for troubleshooting.

 [!INCLUDE [Third-party disclaimer](../../../includes/third-party-disclaimer.md)]
