Commit 1e3cc00

Merge pull request #13 from cybertec-postgresql/multisite-docs
Document how to change existing setups into multisite
2 parents c420cd9 + cbd8db7

1 file changed: docs/multisite.rst (94 additions, 10 deletions)

Using Patroni in multisite mode
===============================

.. _multisite_introduction:

Introduction
++++++++++++

The multisite mode has been developed to increase the resilience of Patroni setups spanning multiple sites against temporary outages. In multisite mode each site runs a separate Patroni cluster with its own DCS, able to perform leader switches (switchovers and failovers) just like a usual Patroni cluster. On top of this, in multisite mode there is a global DCS for leader site election, which coordinates which site is the primary and which is the standby. In each site the local leader instance is responsible for the global leader site election. The site that acquires the leader lock runs Patroni normally; the other sites configure themselves as standby clusters.

.. _multisite_when_to_use:

When to use multisite mode
--------------------------

If network reliability and bandwidth between sites are good and latency is low (<10 ms), multisite mode is most likely not useful. Instead, a single Patroni cluster that spans the two sites will be a simpler and more robust solution.

Multisite mode is useful when automatic cross-site failover is needed, but that failover has to be much more resilient against temporary outages. It is also useful when cluster member IP addresses are not globally routable and cross-site communication needs to pass through an externally visible proxy address.

.. _multisite_dcs_considerations:

DCS considerations
------------------

There are multiple possible ways of setting up DCS for multisite mode, but in every case two separate concerns are covered. One is the local DCS, which backs the site-local actions of Patroni. In addition, there is the global DCS, which is responsible for keeping track of site state.

.. _multisite_global_dcs:

Global DCS
~~~~~~~~~~

The multisite deployment will only be as resilient as the global DCS cluster. The DCS has to maintain quorum (more than half of all nodes connected to each other and able to write the same changes). In the case of a typical 3-node DCS cluster this means the quorum is 2, and if any 2 nodes share a potential failure point (e.g. being attached to the same network component), then that failure will bring the whole multisite cluster into read-only mode within the multisite TTL timeout (see Configuration below).

Let's consider an example where there are 2 datacenters, and two of the three DCS nodes are in datacenter A. If the whole datacenter goes offline (e.g. power outage, fire, network connection to the datacenter severed), then the site in datacenter B will not be able to promote. If that site happened to be the leader at the point of the DCS failure, it would demote itself to avoid a split-brain situation, thus retaining data safety.

In short, this means that to survive a full site outage the system needs to have at least 3 sites. To simplify things, one of the 3 sites is only required to have a single DCS node. If only 2 sites are available, then hosting this third quorum node on public cloud infrastructure is a viable option.

Here is a typical deployment architecture for using multisite mode:

.. image:: _static/multisite-architecture.png

.. _multisite_cross_site_latency:

Cross-site latency
##################

If the network latencies between sites are very high, the DCS might require special tuning. For example, etcd uses a heartbeat interval of 100 ms and an election timeout of 1 s by default. If the round-trip time between sites is more than 100 ms, these values should be increased.
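
For example, assuming a measured cross-site round-trip time of about 150 ms, the etcd configuration of the global DCS nodes might be tuned like this sketch (values are in milliseconds and purely illustrative):

.. code-block:: yaml

    # etcd tuning sketch for ~150 ms cross-site RTT: set the heartbeat close
    # to the maximum RTT, and the election timeout to roughly 10x the heartbeat,
    # following etcd's own tuning guidance.
    heartbeat-interval: 150
    election-timeout: 1500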

.. _multisite_local_dcs:

Local DCS
~~~~~~~~~

This is no different from a usual Patroni setup.

.. _multisite_op_howto:

Operational how-tos
+++++++++++++++++++

.. _multisite_installation:

Installation
------------

.. _multisite_installation_linux:

Linux
~~~~~

.. _multisite_installation_linux_prerequisites:

Prerequisites
#############

Before starting the installation, Python 3 and the matching pip binary have to be installed on the system.

Patroni stores its state and some of its configuration in a distributed configuration store (DCS). You have to install one of the supported solutions, e.g. etcd 3.5 (https://etcd.io/docs/v3.5/install/).

.. _multisite_installation_linux_steps:

Installation steps
##################

As systemd is now the de facto init system across Linux distributions, we use it in the steps below.

#. Download and unpack source from https://github.com/cybertec-postgresql/patroni/archive/refs/heads/multisite.zip
#. ``cd`` to the resulting ``patroni`` directory
#. ``pip install -r requirements.txt``
#. ``pip install psycopg``
#. create Patroni config (see Configuration below)
#. to run Patroni as a systemd service, create a systemd unit config based on the linked example: https://github.com/patroni/patroni/blob/master/extras/startup-scripts/patroni.service
#. start Patroni with ``[sudo] systemctl start patroni``

.. _multisite_installation_windows:

Windows
~~~~~~~

You can use Cybertec's packaged versions from https://github.com/cybertec-postgr…

If you need, for example, a different PostgreSQL version from what's provided, open a GitHub issue there, and a new release will soon be prepared.

.. _multisite_configuration:

Configuration
-------------

Configuring multisite mode is done using a top-level ``multisite`` section in the Patroni configuration file.

The configuration is very similar to the usual Patroni config. In fact, the keys and their respective values under ``multisite`` obey the same rules as those in a conventional configuration.

An example configuration for two Patroni sites (a sketch; only the ``multisite`` section of the first site is shown, with placeholder addresses - the second site's section differs only in ``name`` and ``host``):

.. code-block:: yaml

    multisite:
        name: dc1
        etcd3:
            hosts:
            - 10.0.1.10:2379
            - 10.0.2.10:2379
            - 10.0.3.10:2379
        host: dc1.example.com
        port: 5432
        ttl: 90
        retry_timeout: 40

.. _multisite_config_parameters:

Details of the configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``restore_command``
    PostgreSQL ``restore_command`` to use to fetch WAL files from the remote site.

.. _multisite_config_passwords:

Passwords in the YAML configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As all standby sites replicate from the leader, users and their passwords are the same on each Postgres node. Therefore the YAML configuration should specify the same password for each user under ``postgresql.authentication``.
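
The relevant snippet, identical on every node, might look like this sketch (user names and passwords are placeholders):

.. code-block:: yaml

    postgresql:
        authentication:
            superuser:
                username: postgres
                password: sup3rsecret      # same on every site
            replication:
                username: replicator
                password: replsecret       # same on every site
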
.. _multisite_site_failover:

Site failover
-------------

In case the multisite leader lock is not updated for at least the time specified by the multisite TTL, the standby leader(s) of the other site(s) will try to acquire the lock. If successful, the standby leader will be promoted to a proper leader. As a result, the Postgres primary instance will now be found in a new site.

.. _multisite_restore_order_after_failover:

Restoring the old leader site after site failover
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the problems leading to the site failover are resolved, the old leader site will be able to join the multisite cluster as a standby leader. No automatic attempt is made to restore the original order; if desired, switching back to the old leader site must be done manually, via a site switchover.

.. _multisite_connection_to_primary_after_failover:

Connections to the primary
~~~~~~~~~~~~~~~~~~~~~~~~~~

Applications should be ready to try to connect to the new primary. See :ref:`multisite_connection_to_cluster` for more details.

.. _multisite_site_switchover:

Site switchover
---------------

When circumstances arise that make it necessary to move the Postgres primary from one site to another, one can do so by performing a site switchover. Just like a normal switchover, a site switchover can be initiated using ``patronictl`` (or, alternatively, an API call to the REST API). The CTL command is as simple as::

    patronictl site-switchover

Answer the prompts as you would with other ``patronictl`` commands.

The API call could look like the following (replace 'dc2' with the desired site name)::

    curl --data-binary '{ "target_site": "dc2"}' http://127.0.0.1:8008/site_switchover

Once the site switchover is done, the old leader site will become a standby site automatically.

.. _multisite_connection_to_primary_after_switchover:

Connections to the primary
~~~~~~~~~~~~~~~~~~~~~~~~~~

Applications should be ready to try to connect to the new primary. See :ref:`multisite_connection_to_cluster` for more details.

.. _multisite_connection_to_cluster:

Connecting to a multisite cluster
---------------------------------

There are multiple ways one could set up application connections to a multisite Patroni cluster. We consider here connecting to the primary instance - connections to replicas can be solved with slight modifications.

1. Single IP address using HAProxy

This is the simplest from the application standpoint, but setting it up is the most complex of all listed solutions (extra node(s) for HAProxy itself, and Keepalived for ensuring HAProxy's availability). Unless you need the load balancing features HAProxy provides, you should probably choose one of the other methods.

2. Multi-host connection strings

With this solution, all potential primary instances are listed in the connection string (see the sketch after this list). To ensure connections land on the primary, the connection failover feature of the DB driver should be used (``targetServerType=primary`` for `JDBC <https://jdbc.postgresql.org/documentation/use/#connection-fail-over>`__, ``target_session_attrs="read-write"`` for `libpq <https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-MULTIPLE-HOSTS>`__, ``TargetSessionAttributes.Primary`` for .NET's `Npgsql <https://www.npgsql.org/doc/failover-and-load-balancing.html?tabs=7>`__). The big advantage of this solution is that it doesn't require any extra setup on the DB side. A disadvantage can be that with many nodes (e.g. two sites with three nodes each) it can take a while to open a connection. This is less of a problem when using connection poolers.

3. Per-site endpoint IP combined with multi-host connection strings

`vip-manager <https://github.com/cybertec-postgresql/vip-manager/>`__ provides a relatively easy way of maintaining a single IP address that always points to the leader of a single site. One could set it up for each site, and then use the endpoint IPs in a multi-host connection string as described above. As the number of addresses to check is smaller than in (2), establishing a connection is faster on average. The downside is the added complexity (vip-manager has to be installed on the Patroni nodes and configured to pull the necessary information from DCS).
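
For example, a libpq multi-host connection URI covering the nodes of two sites (host and database names are placeholders) could look like::

    postgresql://app@dc1-pg1.example.com:5432,dc1-pg2.example.com:5432,dc2-pg1.example.com:5432/appdb?target_session_attrs=read-write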

.. _multisite_transforming_standby_to_multisite:

Transforming an existing setup into multisite
---------------------------------------------

If the present setup consists of a standby cluster replicating from a leader site, the following steps have to be performed:

1. Set up the global DCS

   1.1 if a separate DCS cluster is going to be used, set up the new cluster as usual (one node in each of the two Patroni sites, and a third node in a third site)

2. Enable multisite on the leader site's Patroni cluster

   2.1 apply the multisite config (see :ref:`multisite_configuration`) to all nodes' Patroni config files

   2.2 reload the local configuration on the leader site cluster's nodes (``patronictl reload``)

   2.3 check that ``patronictl list`` shows an extra line saying 'Multisite <leader-site> is leader'

3. Enable multisite on the standby cluster

   3.1 repeat the steps from 2. on the standby cluster

   3.2 after reloading the config, you should see ``patronictl list`` saying 'Multisite <standby-site> is standby, replicating from <leader-site>'

4. Remove the ``standby_cluster`` specification from the dynamic config

   4.1 use ``patronictl edit-config`` to remove all lines belonging to the standby cluster definition (the block to remove is sketched below)
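
A sketch of the block removed in step 4.1 (address and port are placeholders):

.. code-block:: yaml

    standby_cluster:
        host: dc1.example.com      # old upstream address
        port: 5432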

If the present setup is one Patroni cluster spanning two sites, first turn that setup into a standby cluster setup, and then perform the above steps to enable multisite.

Moving from an existing Postgres setup to multisite can be achieved by setting up a full multisite cluster that still replicates from the original primary. This is done by using the usual standby cluster specification, this time on the leader site's cluster. On cutover, simply remove the standby cluster specification, thus promoting the leader site.

.. _multisite_glossary:

Glossary
++++++++
