Commit 7705ae7

Merge pull request #261885 from nachoalonsoportillo/patch-34
Add several changes
2 parents 07d7037 + 1351d2a commit 7705ae7

1 file changed

Lines changed: 23 additions & 23 deletions

File tree

articles/reliability/reliability-postgresql-flexible-server.md

@@ -5,7 +5,7 @@ description: Find out about reliability and high availability in Azure Database
55
author: sunilagarwal
66
ms.author: anaharris
77
ms.reviewer: maghan, anaharris
8-
ms.date: 08/24/2023
8+
ms.date: 12/21/2023
99
ms.service: postgresql
1010
ms.topic: conceptual
1111
ms.custom:
@@ -22,7 +22,7 @@ ms.custom:
2222

2323
This article describes high availability in Azure Database for PostgreSQL - Flexible Server, which includes [availability zones](#availability-zone-support) and [cross-region recovery and business continuity](#cross-region-disaster-recovery-and-business-continuity). For a more detailed overview of reliability in Azure, see [Azure reliability](/azure/architecture/framework/resiliency/overview).
2424

25-
Azure Database for PostgreSQL: Flexible Server offers high availability support by provisioning physically separate primary and standby replica either within the same availability zone (zonal) or across availability zones (zone-redundant). This high availability model is designed to ensure that committed data is never lost in the case of failures. The model is also designed so that the database doesn't become a single point of failure in your software architecture. For more information on high availability and availability zone support, see [Availability zone support](#availability-zone-support).
25+
Azure Database for PostgreSQL - Flexible Server offers high availability support by provisioning physically separated primary and standby replicas, either within the same availability zone (zonal) or across availability zones (zone-redundant). This high availability model is designed to ensure that committed data is never lost in the case of failures. The model is also designed so that the database doesn't become a single point of failure in your software architecture. For more information on high availability and availability zone support, see [Availability zone support](#availability-zone-support).
2626

2727
## Availability zone support
2828

@@ -47,9 +47,9 @@ Azure Database for PostgreSQL - Flexible Server supports both [zone-redundant an
4747

4848
**Zone redundancy:**
4949

50-
- The **zone-redundancy** option is only available in a [regions that support availability zones](../postgresql/flexible-server/overview.md#azure-regions).
50+
- The **zone-redundancy** option is only available in [regions that support availability zones](../postgresql/flexible-server/overview.md#azure-regions).
5151

52-
- Zone-redundancy zones are **not** supported for:
52+
- Zone-redundancy is **not** supported for:
5353

5454
- Azure Database for PostgreSQL – Single Server SKU.
5555
- Burstable compute tier.
@@ -79,7 +79,7 @@ Azure Database for PostgreSQL - Flexible Server supports both [zone-redundant an
7979

8080
- Ability to restart the server to pick up any static server parameter changes.
8181

82-
- Periodic maintenance activities such as minor version upgrades happen at the standby first and the service failed to reduce downtime.
82+
- Periodic maintenance activities such as minor version upgrades happen at the standby first and, to reduce downtime, the standby is promoted to primary so that workloads can keep running while the maintenance tasks are applied to the remaining node.
8383

8484
### High availability limitations
8585

@@ -91,7 +91,7 @@ Azure Database for PostgreSQL - Flexible Server supports both [zone-redundant an
9191

9292
- The standby server typically recovers WAL files at 40 MB/s. If your workload exceeds this limit, you can encounter extended time for the recovery to complete either during the failover or after establishing a new standby.
9393

94-
- Configuring for availability zones induces some latency to writes and commits—no impact on reading queries. The performance impact varies depending on your workload. As a general guideline, writes and commit impact can be around 20-30% impact.
94+
- Configuring for availability zones induces some latency to writes and commits, while it doesn't produce any impact on reading queries. The performance impact varies depending on your workload. As a general guideline, the impact on writes and commits can be around 20-30%.
9595

9696
- Restarting the primary database server also restarts the standby replica.
9797

@@ -103,7 +103,7 @@ Azure Database for PostgreSQL - Flexible Server supports both [zone-redundant an
103103

104104
- If logical decoding or logical replication is configured with an availability-configured Flexible Server, in the event of a failover to the standby server, the logical replication slots aren't copied over to the standby server. To maintain logical replication slots and ensure data consistency after a failover, it is recommended to use the PG Failover Slots extension. For more information on how to enable this extension, please refer to the [documentation](../postgresql/flexible-server/concepts-extensions.md#pg_failover_slots-preview).
105105

106-
- Configuring availability zones between private (VNET) and public access isn't supported. You must configure availability zones within a VNET (spanned across availability zones within a region) or public access.
106+
- Configuring availability zones between private (VNET) and public access with private endpoints isn't supported. You must configure availability zones within a VNET (spanned across availability zones within a region) or public access with private endpoints.
107107

108108
- Availability zones are configured only within a single region. Availability zones can't be configured across regions.
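
To put the WAL recovery limitation listed above in concrete terms, the following Python sketch does the arithmetic for a typical replay rate of 40 MB/s. The function names and the sample workload are hypothetical, and the actual replay rate depends on the workload and hardware, so treat the result as a rough estimate only.

```python
# Illustrative arithmetic only; 40 MB/s is the typical replay rate quoted above.
TYPICAL_REPLAY_MB_PER_S = 40.0


def replay_seconds(backlog_mb: float,
                   replay_mb_per_s: float = TYPICAL_REPLAY_MB_PER_S) -> float:
    """Time for the standby to replay a WAL backlog of the given size."""
    if replay_mb_per_s <= 0:
        raise ValueError("replay rate must be positive")
    return backlog_mb / replay_mb_per_s


def backlog_growth_mb(wal_mb_per_s: float, duration_s: float,
                      replay_mb_per_s: float = TYPICAL_REPLAY_MB_PER_S) -> float:
    """Backlog built up while WAL is generated faster than it is replayed."""
    return max(0.0, wal_mb_per_s - replay_mb_per_s) * duration_s


if __name__ == "__main__":
    # Hypothetical workload: 60 MB/s of WAL sustained for 10 minutes.
    backlog = backlog_growth_mb(60.0, 600.0)
    print(f"backlog: {backlog:.0f} MB, replay time: {replay_seconds(backlog):.0f} s")
```

A workload that stays below the replay rate produces no backlog in this model, which is why the limitation only matters for sustained write-heavy bursts.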
109109

@@ -125,7 +125,7 @@ To learn how to enable or disable high availability configuration in your flexib
125125

126126
#### Transaction completion
127127

128-
Application transaction-triggered writes and commits are first logged to the WAL on the primary server. It's then streamed to the standby server using the Postgres streaming protocol. Once the logs are persisted on the standby server storage, the primary server is acknowledged for write completion. Only then and the application confirmed the writes. An extra round-trip adds more latency to your application. The percentage of impact depends on the application. This acknowledgment process doesn't wait for the logs to be applied to the standby server. The standby server is permanently in recovery mode until it's promoted.
128+
Application transaction-triggered writes and commits are first logged to the WAL on the primary server. These are then streamed to the standby server using the Postgres streaming protocol. Once the logs are persisted on the standby server storage, the primary server is acknowledged for write completion. Only then is the commit confirmed to the application. This additional round-trip adds more latency to your application. The percentage of impact depends on the application. This acknowledgment process doesn't wait for the logs to be applied to the standby server. The standby server is permanently in recovery mode until it's promoted.
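
The acknowledgment path described above can be sketched as a simple latency model. This is a deliberately simplified, sequential model with made-up timings: the class, field names, and numbers are assumptions for illustration, and a real server can overlap some of these steps.

```python
from dataclasses import dataclass


@dataclass
class CommitTimings:
    """Hypothetical per-commit timings in milliseconds (not measured values)."""
    primary_wal_flush_ms: float   # log the commit to the WAL on the primary
    network_round_trip_ms: float  # primary -> standby -> primary
    standby_wal_flush_ms: float   # persist the WAL on standby storage


def commit_latency_ms(t: CommitTimings, high_availability: bool) -> float:
    """Latency before the application sees its commit confirmed.

    With high availability, the primary waits until the standby has
    persisted the WAL (not until it has applied it) before confirming.
    """
    latency = t.primary_wal_flush_ms
    if high_availability:
        latency += t.network_round_trip_ms + t.standby_wal_flush_ms
    return latency


if __name__ == "__main__":
    timings = CommitTimings(primary_wal_flush_ms=2.0,
                            network_round_trip_ms=1.0,
                            standby_wal_flush_ms=2.0)
    print("without HA:", commit_latency_ms(timings, high_availability=False), "ms")
    print("with HA:   ", commit_latency_ms(timings, high_availability=True), "ms")
```

The model leaves out the time to apply the logs on the standby on purpose, because the acknowledgment doesn't wait for it.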
129129

130130
#### Health check
131131

@@ -146,7 +146,7 @@ The health of primary and standby servers are continuously monitored, and approp
146146
| **Healthy** | Replication is in steady state and healthy. |
147147
| **Failing Over** | The database server is in the process of failing over to the standby. |
148148
| **Removing Standby** | In the process of deleting standby server. |
149-
| **Not Enabled** | Zone redundant high availability isn't enabled. |
149+
| **Not Enabled** | High availability isn't enabled. |
150150

151151
> [!NOTE]
152152
> You can enable high availability during server creation or at a later time as well. If you are enabling or disabling high availability during the post-create stage, operating when the primary server activity is low is recommended.
@@ -166,7 +166,7 @@ PostgreSQL client applications are connected to the primary server using the DB
166166

167167
For flexible servers configured with high availability, log data is replicated in real-time to the standby server. Any user errors on the primary server - such as an accidental drop of a table or incorrect data updates, are replicated to the standby replica. So, you can't use standby to recover from such logical errors. To recover from such errors, you have to perform a point-in-time restore from the backup. Using a flexible server's point-in-time restore capability, you can restore to the time before the error occurred. A new database server is restored as a single-zone flexible server with a new user-provided server name for databases configured with high availability. You can use the restored server for a few use cases:
168168

169-
- You can use the restored server for production and optionally enable zone-redundant high availability.
169+
- You can use the restored server for production and optionally enable high availability with a standby replica in either the same zone or another zone in the same region.
170170

171171
- If you want to restore an object, export it from the restored database server and import it to your production database server.
172172
- If you want to clone your database server for testing and development purposes or to restore for any other purposes, you can perform the point-in-time restore.
@@ -179,11 +179,11 @@ To learn how to do a point-in-time restore of a flexible server, see [Point-in-t
179179

180180
Planned downtime events include Azure scheduled periodic software updates and minor version upgrades. You can also use a planned failover to return the primary server to a preferred availability zone. When configured in high availability, these operations are first applied to the standby replica while the applications continue to access the primary server. Once the standby replica is updated, primary server connections are drained, and a failover is triggered, which activates the standby replica to be the primary with the same database server name. Client applications have to reconnect with the same database server name to the new primary server and can resume their operations. A new standby server is established in the same zone as the old primary.
181181

182-
For other user-initiated operations such as scale-compute or scale-storage, the changes are applied on the standby first, followed by the primary. Currently, the service isn't failed over to the standby, and hence while the scale operation is carried out on the primary server, applications encounters a short downtime.
182+
For other user-initiated operations such as scale-compute or scale-storage, the changes are applied on the standby first, followed by the primary. Currently, the service isn't failed over to the standby, and hence while the scale operation is carried out on the primary server, applications encounter a short downtime.
183183

184-
You can also use this feature to failover to the standby server with reduced downtime. For example, your primary could be on a different availability zone after an unplanned failover than the application. You want to bring the primary server back to the previous zone to colocate with your application.
184+
You can also use this feature to failover to the standby server with reduced downtime. For example, your primary could be on a different availability zone than the application, after an unplanned failover. You want to bring the primary server back to the previous zone to colocate with your application.
185185

186-
When executing this feature, the standby server is first prepared to ensure it's caught up with recent transactions, allowing the application to continue performing reads/writes. The standby is then promoted, and the connections to the primary are severed. Your application can continue to write to the primary while a new standby server is established in the background. The following are the steps involved with planned failover.
186+
When executing this feature, the standby server is first prepared to ensure it's caught up with recent transactions, allowing the application to continue performing reads/writes. The standby is then promoted, and the connections to the primary are severed. Your application can continue to write to the primary while a new standby server is established in the background. The following are the steps involved with planned failover:
187187

188188
| **Step** | **Description** | **App downtime expected?** |
189189
| --- | --- | --- |
@@ -192,9 +192,9 @@ When executing this feature, the standby server is first prepared to ensure it's
192192
| 3 | Application writes are blocked when the standby server is close to the primary log sequence number (LSN). | Yes |
193193
| 4 | Standby server is promoted to be an independent server. | Yes |
194194
| 5 | DNS record is updated with the new standby server's IP address. | Yes |
195-
| 6 | Application to reconnect and resume its read/write with new primary | No |
195+
| 6 | Application to reconnect and resume its read/write with new primary. | No |
196196
| 7 | A new standby server in another zone is established. | No |
197-
| 8 | Standby server starts to recover logs (from Azure BLOB) that it missed during its establishment. | No |
197+
| 8 | Standby server starts to recover logs (from Azure Blob) that it missed during its establishment. | No |
198198
| 9 | A steady state between the primary and the standby server is established. | No |
199199
| 10 | Planned failover process is complete. | No |
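
Because the application has to reconnect to the same server name after the failover, client code usually needs some retry logic. The following is a minimal Python sketch of reconnecting with exponential backoff. The `connect` callable is a hypothetical stand-in for whatever your database driver provides, and `ConnectionError` stands in for the driver's own connection exception, which will differ in practice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       attempts: int = 8,
                       base_delay_s: float = 0.5,
                       max_delay_s: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, backing off between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Fake connect() that fails twice, as it might while DNS is being updated.
    failures_left = [2]

    def fake_connect() -> str:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionError("server not reachable yet")
        return "connected"

    print(connect_with_retry(fake_connect, sleep=lambda s: None))
```

Keeping the server name fixed and only retrying the connection matches the behavior described above, where the DNS record is updated to point at the new primary.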
200200

@@ -204,7 +204,7 @@ Application downtime starts at step #3 and can resume operation post step #5. Th
204204
> With flexible server, you can optionally schedule Azure-initiated maintenance activities by choosing a 60-minute window on a day of your preference where the activities on the databases are expected to be low. Azure maintenance tasks such as patching or minor version upgrades would happen during that window. If you don't choose a custom window, a system allocated 1-hr window between 11 pm - 7 am local time is selected for your server.
205205
> These Azure-initiated maintenance activities are also performed on the standby replica for flexible servers that are configured with availability zones.
206206
207-
For a list of possible planned downtime events, see [Planned downtime events](/azure/postgresql/flexible-server/concepts-business-continuity#planned-downtime-events)
207+
For a list of possible planned downtime events, see [Planned downtime events](/azure/postgresql/flexible-server/concepts-business-continuity#planned-downtime-events).
208208

209209
#### Unplanned failover
210210

@@ -230,40 +230,40 @@ The following are the steps during forced failover:
230230
| 6 | Once the server is up, the DNS record is updated with the same hostname but using the standby's IP address. | Yes |
231231
| 7 | Application can reconnect to the new primary server and resume the operation. | No |
232232
| 8 | A standby server in the preferred zone is established. | No |
233-
| 9 | Standby server starts to recover logs (from Azure BLOB) that it missed during its establishment. | No |
233+
| 9 | Standby server starts to recover logs (from Azure Blob) that it missed during its establishment. | No |
234234
| 10 | A steady state between the primary and the standby server is established. | No |
235235
| 11 | Forced failover process is complete. | No |
236236

237237
Application downtime is expected to start after step #1 and persists until step #6 is completed. The rest of the steps happen in the background without affecting the application writes and commits.
238238

239-
> [!IMPORTANT]
239+
> [!IMPORTANT]
240240
> The end-to-end failover process includes (a) failing over to the standby server after the primary failure and (b) establishing a new standby server in a steady state. As your application incurs downtime until the failover to the standby is complete, **please measure the downtime from your application/client perspective** instead of the overall end-to-end failover process.
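
One way to measure downtime from the client perspective, as the note above recommends, is to probe the server at a fixed interval during the failover and compute the longest run of failed probes afterwards. The sketch below assumes you already have a log of `(timestamp, succeeded)` probe results; how the probes are issued (for example, a trivial query once per second) is up to your application, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def longest_outage_seconds(probes: Iterable[Tuple[float, bool]]) -> float:
    """Longest observed outage in a chronological probe log.

    Each probe is (timestamp_seconds, succeeded). An outage runs from the
    first failed probe to the next successful one. An outage still open at
    the end of the log is not counted, because its end is unknown.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    for timestamp, ok in probes:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


if __name__ == "__main__":
    # Hypothetical probe log: one probe per second, three failures in a row.
    log = [(0.0, True), (1.0, False), (2.0, False), (3.0, False), (4.0, True), (5.0, True)]
    print(f"longest outage: {longest_outage_seconds(log)} s")
```

The resolution of the result is limited by the probe interval, so a shorter interval gives a tighter estimate of the downtime your application actually sees.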
241241
242242
#### Considerations while performing forced failovers
243243

244244
- The overall end-to-end operation time can be seen as longer than the actual downtime experienced by the application.
245245

246-
> [!IMPORTANT]
246+
> [!IMPORTANT]
247247
> Always observe the downtime from the application perspective!
248248
249249
- Don't perform immediate, back-to-back failovers. Wait for at least 15-20 minutes between failovers, allowing the new standby server to be fully established.
250250

251-
- It's recommended that your perform a forced failover during a low-activity period to reduce downtime.
251+
- It's recommended that you perform a forced failover during a low-activity period to reduce downtime.
252252

253253
### Zone-down experience
254254

255-
**Zonal**. To recover from a zone-level failure, you can [perform point-in-time restore](#point-in-time-restore-of-high-availability-servers) using the backup. You can choose a custom restore point with the latest time to restore the latest data. A new flexible server is deployed in another nonaffected zone. The time taken to restore depends on the previous backup and the volume of transaction logs to recover.
255+
**Zonal**: To recover from a zone-level failure, you can [perform point-in-time restore](#point-in-time-restore-of-high-availability-servers) using the backup. You can choose a custom restore point with the latest time to restore the latest data. A new flexible server is deployed in another nonaffected zone. The time taken to restore depends on the previous backup and the volume of transaction logs to recover.
256256

257257
For more information on point-in-time restore, see [Backup and restore in Azure Database for PostgreSQL-Flexible Server](/azure/postgresql/flexible-server/concepts-backup-restore).
258258

259-
**Zone-redundant**. Flexible server is automatically failed over to the standby server within 60-120 s with zero data loss.
259+
**Zone-redundant**: Flexible server is automatically failed over to the standby server within 60-120 seconds with zero data loss.
260260

261261
## Configurations without availability zones
262262

263263
Although it's not recommended, you can configure your flexible server without high availability enabled. For flexible servers configured without high availability, the service provides local redundant storage with three copies of data, zone-redundant backup (in regions where it's supported), and built-in server resiliency to automatically restart a crashed server and relocate the server to another physical node. Uptime [SLA of 99.9%](https://azure.microsoft.com/support/legal/sla/postgresql) is offered in this configuration. During planned or unplanned failover events, if the server goes down, the service maintains the availability of the servers using the following automated procedure:
264264

265265
1. A new compute Linux VM is provisioned.
266-
1. The storage with data files is mapped to the new virtual machine
266+
1. The storage with data files is mapped to the new virtual machine.
267267
1. PostgreSQL database engine is brought online on the new virtual machine.
268268

269269
The picture below shows the transition between VM and storage failure.
