Commit 39e6392

Merge pull request #313427 from dennispadia/dp-suse-scaleout
Changes in HANA Scale-out document
2 parents c85c997 + d8a88ea commit 39e6392

2 files changed

Lines changed: 489 additions & 412 deletions

articles/sap/workloads/high-availability-guide-suse-pacemaker.md

Lines changed: 70 additions & 39 deletions
@@ -8,7 +8,7 @@ ms.service: sap-on-azure
 ms.subservice: sap-vm-workloads
 ms.topic: article
 ms.custom: devx-track-azurepowershell, linux-related-content
-ms.date: 03/12/2026
+ms.date: 03/19/2026
 ms.author: radeltch
 # Customer intent: "As a system administrator, I want to set up Pacemaker with fencing on SUSE Linux Enterprise Server in Azure, so that I can ensure high availability and reliability for my applications running in the cloud."
 ---
@@ -871,37 +871,52 @@ Make sure to assign the custom role to the service principal at all VM (cluster
 
 ```bash
 sudo crm cluster init
-# ! NTP is not configured to start at system boot.
-# Do you want to continue anyway (y/n)? y
-# /root/.ssh/id_rsa already exists - overwrite (y/n)? n
-# Address for ring0 [10.0.0.6] Select Enter
-# Port for ring0 [5405] Select Enter
-# SBD is already configured to use /dev/disk/by-id/scsi-36001405639245768818458b930abdf69;/dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03;/dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf - overwrite (y/n)? n
+
+# INFO: Detected "microsoft-azure" platform
+# INFO: Loading "default" profile from /etc/crm/profiles.yml
+# INFO: Loading "microsoft-azure" profile from /etc/crm/profiles.yml
+# INFO: The user 'hacluster' will have the login shell configuration changed to /bin/bash
+# Continue (y/n)? y
+# INFO: Address for ring0 [10.0.0.6] Select Enter
+# INFO: Port for ring0 [5405] Select Enter
+# INFO: Do you wish to use SBD (y/n)? y
+# INFO: SBD is already configured to use /dev/disk/by-id/scsi-36001405639245768818458b930abdf69;/dev/disk/by-id/scsi-36001405afb0ba8d3a3c413b8cc2cca03;/dev/disk/by-id/scsi-36001405f88f30e7c9684678bc87fe7bf - overwrite (y/n)? n
 # Do you wish to configure an administration IP (y/n)? n
+# INFO: Do you wish to configure a virtual IP address (y/n)? n
 ```
 
 - If you're *not* using SBD devices for fencing:
 
 ```bash
 sudo crm cluster init
-# ! NTP is not configured to start at system boot.
-# Do you want to continue anyway (y/n)? y
-# /root/.ssh/id_rsa already exists - overwrite (y/n)? n
-# Address for ring0 [10.0.0.6] Select Enter
-# Port for ring0 [5405] Select Enter
-# Do you wish to use SBD (y/n)? n
+
+# INFO: Detected "microsoft-azure" platform
+# INFO: Loading "default" profile from /etc/crm/profiles.yml
+# INFO: Loading "microsoft-azure" profile from /etc/crm/profiles.yml
+# INFO: The user 'hacluster' will have the login shell configuration changed to /bin/bash
+# Continue (y/n)? y
+# INFO: Address for ring0 [10.0.0.6] Select Enter
+# INFO: Port for ring0 [5405] Select Enter
+# INFO: Do you wish to use SBD (y/n)? n
 # WARNING: Not configuring SBD - STONITH will be disabled.
 # Do you wish to configure an administration IP (y/n)? n
+# INFO: Do you wish to configure a virtual IP address (y/n)? n
 ```
 
 14. **[2]** Add the node to the cluster.
 
 ```bash
 sudo crm cluster join
-# ! NTP is not configured to start at system boot.
-# Do you want to continue anyway (y/n)? y
-# IP address or hostname of existing node (for example, 192.168.1.1) []10.0.0.6
-# /root/.ssh/id_rsa already exists - overwrite (y/n)? n
+# INFO: IP address or hostname of existing node (e.g.: 192.168.1.1) []10.0.0.6
+# INFO: The user 'hacluster' will have the login shell configuration changed to /bin/bash
+# INFO: Continue (y/n)? y
+# INFO: Generating SSH key for hacluster
+# INFO: Configuring SSH passwordless with [email protected]
+# INFO: Configuring csync2...done
+# INFO: Merging known_hosts
+# INFO: Probing for new partitions...done
+# INFO: Address for ring0 [10.0.0.7] Select Enter
+# INFO: Done (log saved to /var/log/crmsh/crmsh.log)
 ```
 
 15. **[A]** Change the hacluster password to the same password.
@@ -919,6 +934,7 @@ Make sure to assign the custom role to the service principal at all VM (cluster
 a. Check the following section in the file and adjust, if the values aren't there or are different. Be sure to change the token to 30000 to allow memory-preserving maintenance. For more information, see the "Maintenance for virtual machines in Azure" article for [Linux][virtual-machines-linux-maintenance] or [Windows][virtual-machines-windows-maintenance].
 
 ```text
+{
 [...]
 token: 30000
 token_retransmits_before_loss_const: 10
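
The token value referenced in the hunk above can be spot-checked on a node with a small shell helper. This is only an illustrative sketch and not part of the commit: the function names `corosync_token` and `check_token` are hypothetical, and the default path `/etc/corosync/corosync.conf` is an assumption, because the hunk refers to "the file" without naming it. The sketch also assumes the setting is written as `token: <value>` with a space after the colon.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the value of the first "token:" line in a
# corosync configuration file. Comparing the whole first field avoids
# matching "token_retransmits_before_loss_const:".
corosync_token() {
    local conf="$1"
    awk '$1 == "token:" { print $2; exit }' "$conf"
}

# Hypothetical helper: compare the configured token with an expected value.
# Both defaults are assumptions for illustration (path and 30000 ms).
check_token() {
    local conf="${1:-/etc/corosync/corosync.conf}"
    local expected="${2:-30000}"
    local actual
    actual="$(corosync_token "$conf")"
    if [ "$actual" = "$expected" ]; then
        echo "OK: token is $actual"
    else
        echo "MISMATCH: token is '${actual:-unset}', expected $expected"
        return 1
    fi
}
```

On a cluster node, running `check_token` with no arguments would read the assumed default path; passing a different file and value, for example `check_token ./corosync.conf 30000`, checks a copy instead.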
@@ -968,73 +984,88 @@ Make sure to assign the custom role to the service principal at all VM (cluster
 1. **[1]** If you're using an SBD device (iSCSI target server or Azure shared disk) as a fencing device, run the following commands. Enable the use of a fencing device, and set the fence delay.
 
 ```bash
-sudo crm configure property stonith-timeout=210
-sudo crm configure property stonith-enabled=true
-
 # List the resources to find the name of the SBD device
 sudo crm resource list
 sudo crm resource stop stonith-sbd
 sudo crm configure delete stonith-sbd
+
 sudo crm configure primitive stonith-sbd stonith:external/sbd \
 params pcmk_delay_max="15" \
 op monitor interval="600" timeout="15"
-```
 
-1. **[1]** If you're using an Azure fence agent for fencing, run the following commands. After assigning roles to both cluster nodes, you can configure the fencing devices in the cluster.
+# For SAP HANA scale-out only, configure stonith-sbd using following command
+sudo crm configure primitive stonith-sbd stonith:external/sbd \
+params pcmk_action_limit=-1 \
+op monitor interval="600" timeout="15"
 
-```bash
+sudo crm configure property stonith-timeout=210
 sudo crm configure property stonith-enabled=true
-sudo crm configure property concurrent-fencing=true
 ```
 
+1. **[1]** If you're using an Azure fence agent for fencing, run the following commands. After assigning roles to both cluster nodes, you can configure the fencing devices in the cluster.
+
 > [!NOTE]
 > The 'pcmk_host_map' option is required in the command only if the hostnames and the Azure VM names are *not* identical. Specify the mapping in the format *hostname:vm-name*.
 
-#### [Managed identity](#tab/msi)
+#### [Managed identity](#tab/msi)
 
 ```bash
 # Adjust the command with your subscription ID and resource group of the VM
-
 sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
 params msi=true subscriptionId="subscription ID" resourceGroup="resource group" \
 pcmk_monitor_retries=4 pcmk_action_limit=3 power_timeout=240 pcmk_reboot_timeout=900 pcmk_delay_max=15 pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
 meta failure-timeout=120s \
 op monitor interval=3600 timeout=120
-
-sudo crm configure property stonith-timeout=900
+
+# For SAP HANA scale-out only, configure fence_azure_arm using following command
+sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
+params msi=true subscriptionId="subscription ID" resourceGroup="resource group" \
+pcmk_monitor_retries=4 pcmk_action_limit=-1 power_timeout=240 pcmk_reboot_timeout=900 pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
+meta failure-timeout=120s \
+op monitor interval=3600 timeout=120
 ```
 
-#### [Service principal](#tab/spn)
+#### [Service principal](#tab/spn)
 
 ```bash
 # Adjust the command with your subscription ID, resource group of the VM, tenant ID, service principal application ID and password
-
 sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
 params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="application ID" passwd="password" \
 pcmk_monitor_retries=4 pcmk_action_limit=3 power_timeout=240 pcmk_reboot_timeout=900 pcmk_delay_max=15 pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
 meta failure-timeout=120s \
 op monitor interval=3600 timeout=120
-
-sudo crm configure property stonith-timeout=900
+
+# For SAP HANA scale-out only, configure fence_azure_arm using following command
+sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
+params subscriptionId="subscription ID" resourceGroup="resource group" tenantId="tenant ID" login="application ID" passwd="password" \
+pcmk_monitor_retries=4 pcmk_action_limit=-1 power_timeout=240 pcmk_reboot_timeout=900 pcmk_host_map="prod-cl1-0:prod-cl1-0-vm-name;prod-cl1-1:prod-cl1-1-vm-name" \
+meta failure-timeout=120s \
+op monitor interval=3600 timeout=120
 ```
 
----
+---
+
+```bash
+sudo crm configure property stonith-enabled=true
+sudo crm configure property concurrent-fencing=true
+sudo crm configure property stonith-timeout=900
+```
 
 If you're using fencing device, based on service principal configuration, read [Change from SPN to MSI for Pacemaker clusters using Azure fencing](https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/sap-on-azure-high-availability-change-from-spn-to-msi-for/ba-p/3609278) and learn how to convert to managed identity configuration.
 
-> [!IMPORTANT]
-> The monitoring and fencing operations are deserialized. As a result, if there's a longer-running monitoring operation and simultaneous fencing event, there's no delay to the cluster failover because the monitoring operation is already running.
+> [!IMPORTANT]
+> The monitoring and fencing operations are deserialized. As a result, if there's a longer-running monitoring operation and simultaneous fencing event, there's no delay to the cluster failover because the monitoring operation is already running.
 
-> [!TIP]
->The Azure fence agent requires outbound connectivity to the public endpoints, as documented, along with possible solutions, in [Public endpoint connectivity for VMs using standard ILB](./high-availability-guide-standard-load-balancer-outbound-connections.md).
+> [!TIP]
+> The Azure fence agent requires outbound connectivity to the public endpoints, as documented, along with possible solutions, in [Public endpoint connectivity for VMs using standard ILB](./high-availability-guide-standard-load-balancer-outbound-connections.md).
 
 ## Configure Pacemaker for Azure scheduled events
 
-Azure offers [scheduled events](/azure/virtual-machines/linux/scheduled-events). Scheduled events are provided via the metadata service and allow time for the application to prepare for such events.
+Azure offers [scheduled events](/azure/virtual-machines/linux/scheduled-events). Scheduled events are provided via the metadata service and allow time for the application to prepare for such events.
 
 Resource agent [azure-events-az](https://github.com/ClusterLabs/resource-agents/pull/1161) monitors for scheduled Azure events. If events are detected and the resource agent determines that another cluster node is available, it sets a node-level health attribute `#health-azure` to `-1000000`.
 
-When this special cluster health attribute is set for a node, the node is considered unhealthy by the cluster and all resources are migrated away from the affected node. The location constraint ensures resources with name starting with ‘health-‘ are excluded, as the agent needs to run in this unhealthy state. Once the affected cluster node is free of running cluster resources, scheduled event can execute its action, such as restart, without risk to running resources.
+When this special cluster health attribute is set for a node, the node is considered unhealthy by the cluster and all resources are migrated away from the affected node. The location constraint ensures resources with name starting with ‘health-‘ are excluded, as the agent needs to run in this unhealthy state. Once the affected cluster node is free of running cluster resources, scheduled event can execute its action, such as restart, without risk to running resources.
 
 The `#heath-azure` attribute is set back to `0` on pacemaker startup once all events have been processed, marking the node as healthy again.
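
The two `#health-azure` values described in the last hunk can be mapped to a node state with a few lines of shell. This is a minimal sketch for illustration and not part of the commit: the function name `health_state` is hypothetical, and it covers only the two values the article text mentions (`-1000000` and `0`), reporting anything else as unknown.

```bash
#!/usr/bin/env bash

# Hypothetical helper: classify a node from its #health-azure value.
# Per the article text, the agent sets -1000000 when a scheduled event is
# detected and the value returns to 0 once all events have been processed.
health_state() {
    local value="$1"
    case "$value" in
        0)        echo "healthy" ;;
        -1000000) echo "unhealthy" ;;
        "")       echo "unset" ;;
        *)        echo "unknown" ;;
    esac
}
```

For example, `health_state -1000000` prints `unhealthy`. How the value is read from the cluster is left out on purpose; one option is the node attribute listing from `crm_mon -A1`, and parsing that output is not shown here.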