articles/reliability/reliability-postgresql-flexible-server.md
description: Find out about reliability and high availability in Azure Database
author: sunilagarwal
ms.author: anaharris
ms.reviewer: maghan, anaharris
ms.date: 12/21/2023
ms.service: postgresql
ms.topic: conceptual
ms.custom:
This article describes high availability in Azure Database for PostgreSQL - Flexible Server, which includes [availability zones](#availability-zone-support) and [cross-region recovery and business continuity](#cross-region-disaster-recovery-and-business-continuity). For a more detailed overview of reliability in Azure, see [Azure reliability](/azure/architecture/framework/resiliency/overview).

Azure Database for PostgreSQL - Flexible Server offers high availability support by provisioning physically separated primary and standby replicas, either within the same availability zone (zonal) or across availability zones (zone-redundant). This high availability model is designed to ensure that committed data is never lost if a failure occurs, and that the database doesn't become a single point of failure in your software architecture. For more information on high availability and availability zone support, see [Availability zone support](#availability-zone-support).

## Availability zone support
**Zone redundancy:**

- The **zone-redundancy** option is only available in [regions that support availability zones](../postgresql/flexible-server/overview.md#azure-regions).

- Zone redundancy is **not** supported for:

  - Azure Database for PostgreSQL – Single Server SKU.
  - Burstable compute tier.
- Ability to restart the server to pick up any static server parameter changes.

- Periodic maintenance activities, such as minor version upgrades, happen at the standby first. To reduce downtime, the standby is then promoted to primary so that workloads can continue while the maintenance tasks are applied to the remaining node.

### High availability limitations
- The standby server typically recovers WAL files at 40 MB/s. If your workload exceeds this limit, you can encounter extended time for the recovery to complete, either during the failover or while establishing a new standby.

- Configuring for availability zones induces some latency to writes and commits, while read queries aren't affected. The performance impact varies depending on your workload. As a general guideline, the impact on writes and commits can be around 20-30%.

- Restarting the primary database server also restarts the standby replica.

- If logical decoding or logical replication is configured with an availability-configured flexible server, logical replication slots aren't copied over to the standby server in the event of a failover. To maintain logical replication slots and ensure data consistency after a failover, it's recommended to use the PG Failover Slots extension. For more information on how to enable this extension, see the [documentation](../postgresql/flexible-server/concepts-extensions.md#pg_failover_slots-preview).

- Configuring availability zones between private (VNET) and public access with private endpoints isn't supported. You must configure availability zones within a VNET (spanned across availability zones within a region) or public access with private endpoints.

- Availability zones are configured only within a single region. Availability zones can't be configured across regions.
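As a back-of-the-envelope illustration of the 40 MB/s recovery limit above, you can estimate the standby's catch-up time from its current WAL backlog. The sketch below uses a hypothetical lag value; on a live server you could derive the real backlog from `pg_stat_replication` (for example, `pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)`):

```python
# Rough estimate of standby catch-up time, assuming the ~40 MB/s WAL
# recovery rate described above. The lag value below is a hypothetical
# example, not a measured figure.

RECOVERY_RATE_MB_PER_S = 40

def estimated_catchup_seconds(lag_bytes: int,
                              rate_mb_per_s: int = RECOVERY_RATE_MB_PER_S) -> float:
    """Approximate time (seconds) for the standby to replay its WAL backlog."""
    return lag_bytes / (rate_mb_per_s * 1024 * 1024)

# Example: a 12 GiB WAL backlog takes roughly five minutes to replay.
lag = 12 * 1024**3
print(f"~{estimated_catchup_seconds(lag) / 60:.1f} minutes")
```

If your estimated catch-up time is long, that's a signal your write rate may exceed what the standby can replay, extending failover times.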
#### Transaction completion
Application transaction-triggered writes and commits are first logged to the WAL on the primary server. These are then streamed to the standby server using the Postgres streaming protocol. Once the logs are persisted on the standby server's storage, the primary server acknowledges the write completion, and only then is the commit confirmed to the application. This additional round-trip adds latency to your application; the percentage of impact depends on the application. This acknowledgment process doesn't wait for the logs to be applied to the standby server. The standby server is permanently in recovery mode until it's promoted.
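The extra round-trip described above shows up as commit latency. A simple way to see the cost on your own workload is to time a batch of small single-row transactions. The sketch below uses Python's DB-API with the stdlib `sqlite3` driver as a stand-in so it runs anywhere; against a flexible server you'd swap in a PostgreSQL driver (such as psycopg2, with its `%s` placeholders and your own connection string):

```python
import sqlite3
import time

def measure_commit_latency(conn, iterations: int = 100) -> float:
    """Average seconds per commit for small single-row transactions.

    Run this against both an HA-enabled and a non-HA server to see the
    synchronous-replication overhead on your own workload.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS ha_probe (id INTEGER, ts REAL)")
    conn.commit()
    start = time.perf_counter()
    for i in range(iterations):
        cur.execute("INSERT INTO ha_probe (id, ts) VALUES (?, ?)", (i, time.time()))
        conn.commit()  # each commit waits for durability (and, with HA, the standby ack)
    return (time.perf_counter() - start) / iterations

# Stand-in local database; replace with your PostgreSQL driver's connect() call.
conn = sqlite3.connect(":memory:")
print(f"avg commit latency: {measure_commit_latency(conn) * 1e6:.0f} µs")
```

Comparing the averages with and without high availability enabled gives you a workload-specific number instead of the general 20-30% guideline.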
#### Health check

The health of primary and standby servers is continuously monitored, and appropriate actions are taken to remediate issues.

| Status | Description |
| --- | --- |
| **Healthy** | Replication is in a steady state and healthy. |
| **Failing Over** | The database server is in the process of failing over to the standby. |
| **Removing Standby** | The standby server is in the process of being deleted. |
| **Not Enabled** | Zone-redundant high availability isn't enabled. |

> You can enable high availability during server creation or at a later time. If you're enabling or disabling high availability during the post-create stage, it's recommended to operate when the primary server activity is low.
PostgreSQL client applications are connected to the primary server using the DB server name.

For flexible servers configured with high availability, log data is replicated in real time to the standby server. Any user errors on the primary server, such as an accidental drop of a table or incorrect data updates, are replicated to the standby replica. So, you can't use the standby to recover from such logical errors. To recover from such errors, you have to perform a point-in-time restore from the backup. Using a flexible server's point-in-time restore capability, you can restore to a time before the error occurred. For databases configured with high availability, a new database server is restored as a single-zone flexible server with a new, user-provided server name. You can use the restored server for a few use cases:

- You can use the restored server for production and optionally enable high availability with a standby replica in either the same zone or another zone in the same region.

- If you want to restore an object, export it from the restored database server and import it to your production database server.

- If you want to clone your database server for testing and development purposes, or to restore for any other purposes, you can perform the point-in-time restore.
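The export/import use case above is typically done with the standard `pg_dump` and `psql` client tools. The sketch below only builds the commands (the server names, database, and table are illustrative placeholders, not values from this article); actually running them requires the PostgreSQL client tools and valid credentials:

```python
# Sketch of the "export an object from the restored server, import it into
# production" flow using standard pg_dump/psql via subprocess.
# All host names, the database, and the table are hypothetical placeholders.
import subprocess

def build_export_cmd(host: str, user: str, db: str, table: str, out_file: str) -> list:
    # pg_dump -t limits the dump to the single object you want to recover.
    return ["pg_dump", "-h", host, "-U", user, "-d", db, "-t", table, "-f", out_file]

def build_import_cmd(host: str, user: str, db: str, dump_file: str) -> list:
    return ["psql", "-h", host, "-U", user, "-d", db, "-f", dump_file]

export_cmd = build_export_cmd("myserver-restored.postgres.database.azure.com",
                              "myadmin", "appdb", "public.orders", "orders.sql")
import_cmd = build_import_cmd("myserver.postgres.database.azure.com",
                              "myadmin", "appdb", "orders.sql")

# To actually run them (requires PostgreSQL client tools and credentials):
# subprocess.run(export_cmd, check=True)
# subprocess.run(import_cmd, check=True)
print(" ".join(export_cmd))
```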
Planned downtime events include Azure scheduled periodic software updates and minor version upgrades. You can also use a planned failover to return the primary server to a preferred availability zone. When configured with high availability, these operations are first applied to the standby replica while the applications continue to access the primary server. Once the standby replica is updated, primary server connections are drained, and a failover is triggered, which activates the standby replica to be the primary with the same database server name. Client applications have to reconnect with the same database server name to the new primary server and can resume their operations. A new standby server is established in the same zone as the old primary.

For other user-initiated operations, such as scale-compute or scale-storage, the changes are applied on the standby first, followed by the primary. Currently, the service isn't failed over to the standby, so while the scale operation is carried out on the primary server, applications encounter a short downtime.

You can also use this feature to fail over to the standby server with reduced downtime. For example, after an unplanned failover, your primary could be on a different availability zone than the application, and you want to bring the primary server back to the previous zone to colocate it with your application.

When executing this feature, the standby server is first prepared to ensure it's caught up with recent transactions, allowing the application to continue performing reads/writes. The standby is then promoted, and the connections to the primary are severed. Your application can continue to write to the primary while a new standby server is established in the background. The following are the steps involved with planned failover:
| Step | Description | Application downtime? |
| --- | --- | --- |
| 3 | Application writes are blocked when the standby server is close to the primary log sequence number (LSN). | Yes |
| 4 | Standby server is promoted to be an independent server. | Yes |
| 5 | DNS record is updated with the new standby server's IP address. | Yes |
| 6 | Application reconnects and resumes its read/write with the new primary. | No |
| 7 | A new standby server in another zone is established. | No |
| 8 | Standby server starts to recover logs (from Azure Blob) that it missed during its establishment. | No |
| 9 | A steady state between the primary and the standby server is established. | No |
| 10 | Planned failover process is complete. | No |
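Step 6 in the table above is the application's responsibility: reconnect with the same server name once the DNS record points at the promoted standby. A minimal retry loop might look like the sketch below (the `connect` callable and timings are illustrative, not an Azure API; in practice you'd pass your driver's connect call):

```python
import time

def reconnect_with_retry(connect, max_attempts: int = 30, delay_s: float = 2.0):
    """Retry `connect()` until it succeeds, as an application would after
    the DNS record is updated during failover.

    `connect` is any zero-argument callable that returns a connection or
    raises on failure. Each fresh attempt re-resolves the server name, so
    the client picks up the promoted standby's IP address automatically.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as exc:  # catch driver-specific errors in real code
            last_error = exc
            time.sleep(delay_s)
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error

# Demo with a fake connection that fails twice (simulating the failover window).
attempts = {"n": 0}
def fake_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("server not yet available")
    return "connected"

print(reconnect_with_retry(fake_connect, delay_s=0.01))
```

Because the server name stays the same across failover, no connection-string change is needed; only the retry behavior matters.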
Application downtime starts at step #3 and can resume operation after step #5. The rest of the steps happen in the background without affecting application writes and commits.

> With flexible server, you can optionally schedule Azure-initiated maintenance activities by choosing a 60-minute window on a day of your preference when activity on the databases is expected to be low. Azure maintenance tasks, such as patching or minor version upgrades, happen during that window. If you don't choose a custom window, a system-allocated one-hour window between 11 PM and 7 AM local time is selected for your server.
> These Azure-initiated maintenance activities are also performed on the standby replica for flexible servers that are configured with availability zones.

For a list of possible planned downtime events, see [Planned downtime events](/azure/postgresql/flexible-server/concepts-business-continuity#planned-downtime-events).

#### Unplanned failover
The following are the steps during forced failover:

| Step | Description | Application downtime? |
| --- | --- | --- |
| 6 | Once the server is up, the DNS record is updated with the same hostname but using the standby's IP address. | Yes |
| 7 | Application can reconnect to the new primary server and resume operation. | No |
| 8 | A standby server in the preferred zone is established. | No |
| 9 | Standby server starts to recover logs (from Azure Blob) that it missed during its establishment. | No |
| 10 | A steady state between the primary and the standby server is established. | No |
| 11 | Forced failover process is complete. | No |

Application downtime is expected to start after step #1 and persists until step #6 is completed. The rest of the steps happen in the background without affecting the application writes and commits.

> [!IMPORTANT]
> The end-to-end failover process includes (a) failing over to the standby server after the primary failure and (b) establishing a new standby server in a steady state. As your application incurs downtime until the failover to the standby is complete, **measure the downtime from your application/client perspective** instead of the overall end-to-end failover process.
#### Considerations while performing forced failovers

- The overall end-to-end operation time can appear longer than the actual downtime experienced by the application.

  > [!IMPORTANT]
  > Always observe the downtime from the application perspective!

- Don't perform immediate, back-to-back failovers. Wait for at least 15-20 minutes between failovers, allowing the new standby server to be fully established.

- It's recommended that you perform a forced failover during a low-activity period to reduce downtime.
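Since downtime should be observed from the application's perspective, a simple approach is a client-side probe: attempt a trivial operation in a loop and record the time between the first failure and the first subsequent success. The sketch below is generic; `probe` stands in for a real check such as running `SELECT 1` over a fresh connection with your driver:

```python
import time

def measure_downtime(probe, poll_interval_s: float = 1.0,
                     max_seconds: float = 600.0) -> float:
    """Observed downtime in seconds, from the client's perspective.

    `probe` is a zero-argument callable that raises while the server is
    unavailable (for example, opening a connection and running SELECT 1).
    """
    deadline = time.monotonic() + max_seconds
    # Phase 1: wait for the first failure (failover begins).
    while time.monotonic() < deadline:
        try:
            probe()
            time.sleep(poll_interval_s)
        except Exception:
            down_at = time.monotonic()
            break
    else:
        return 0.0  # server never went down within the observation window
    # Phase 2: wait for the first success (failover complete).
    while time.monotonic() < deadline:
        try:
            probe()
            return time.monotonic() - down_at
        except Exception:
            time.sleep(poll_interval_s)
    return float("inf")

# Demo with a fake probe that is "down" for its first few calls.
state = {"calls": 0}
def fake_probe():
    state["calls"] += 1
    if state["calls"] <= 3:
        raise OSError("connection refused")

print(f"observed downtime: {measure_downtime(fake_probe, poll_interval_s=0.01):.2f} s")
```

Running such a probe during a failover drill gives you the application-perspective number the note above asks for, rather than the longer end-to-end operation time.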
### Zone-down experience

**Zonal**: To recover from a zone-level failure, you can [perform point-in-time restore](#point-in-time-restore-of-high-availability-servers) using the backup. You can choose a custom restore point with the latest time to restore the latest data. A new flexible server is deployed in another, nonaffected zone. The time taken to restore depends on the previous backup and the volume of transaction logs to recover.

For more information on point-in-time restore, see [Backup and restore in Azure Database for PostgreSQL - Flexible Server](/azure/postgresql/flexible-server/concepts-backup-restore).

**Zone-redundant**: Flexible server is automatically failed over to the standby server within 60-120 seconds with zero data loss.

## Configurations without availability zones

Although it's not recommended, you can configure your flexible server without high availability enabled. For flexible servers configured without high availability, the service provides locally redundant storage with three copies of data, zone-redundant backup (in regions where it's supported), and built-in server resiliency to automatically restart a crashed server and relocate the server to another physical node. An uptime [SLA of 99.9%](https://azure.microsoft.com/support/legal/sla/postgresql) is offered in this configuration. During planned or unplanned failover events, if the server goes down, the service maintains the availability of the servers using the following automated procedure:

1. A new compute Linux VM is provisioned.
1. The storage with data files is mapped to the new virtual machine.
1. The PostgreSQL database engine is brought online on the new virtual machine.

The picture below shows the transition between VM and storage failure.