articles/reliability/concept-business-continuity-high-availability-disaster-recovery.md
13 additions & 4 deletions
@@ -4,7 +4,7 @@ description: Understand business continuity, high availability, and disaster rec
 author: anaharris-ms
 ms.service: azure
 ms.topic: conceptual
-ms.date: 01/17/2025
+ms.date: 11/04/2025
 ms.author: anaharris
 ms.custom: subject-reliability
 ms.subservice: azure-reliability
@@ -42,6 +42,8 @@ A business continuity plan doesn't only take into consideration the resiliency f

 Business continuity planning should include the following sequential steps:

+1. **Criticality tier classification**. Workloads can be classified into different *criticality tiers* based on their importance to the business. Each tier has different requirements for availability, and therefore different requirements for business continuity planning. To determine your workload's criticality tier, see [Well-Architected Framework - Select your criticality tier](/azure/well-architected/design-guides/disaster-recovery#select-your-criticality-tier).
+
 1. **Risk identification**. Identify risks to a workload's availability or functionality. Possible risks could be network issues, hardware failures, human error, region outage, etc. Understand the impact of each risk.

 1. **Risk classification**. Classify each risk as either a common risk, which should be factored into plans for HA, or an uncommon risk, which should be part of DR planning.
@@ -83,7 +85,7 @@ Here are some examples:

 Business continuity plans must address both common and uncommon risks.

-- *Common risks* are planned and expected. For example, in a cloud environment it's common for there to be *transient failures* including brief network outages, equipment restarts due to patches, timeouts when a service is busy, and so forth. Because these events happen regularly, workloads need to be resilient to them.
+- *Common risks* are planned and expected. For example, in a cloud environment it's common for there to be *transient failures* or *blips*, including brief network outages, equipment restarts due to patches, timeouts when a service is busy, and so forth. Because these events happen regularly, workloads need to be resilient to them.

 A high availability strategy must consider and control for each risk of this type.

@@ -93,7 +95,8 @@ Business continuity plans must address both common and uncommon risks.

 High availability and disaster recovery are interrelated, and so it's important to plan strategies for both of them together.

-It's important to understand that risk classification depends on workload architecture and the business requirements, and some risks can be classified as HA for one workload and DR for another workload. For example, a full Azure region outage would generally be considered a DR risk to workloads in that region. But for workloads that use multiple Azure regions in an active-active configuration with full replication, redundancy, and automatic region failover, a region outage is classified as an HA risk.
+Risk classification depends on workload architecture and the business requirements, and some risks can be classified as HA for one workload and DR for another workload. For example, a full Azure region outage would generally be considered a DR risk to workloads in that region. But for workloads that use multiple Azure regions in an active-active configuration with full replication, redundancy, and automatic region failover, a region outage is classified as an HA risk.
+

 #### Risk mitigation

@@ -283,7 +286,11 @@ Regardless of the cause of the disaster, it's important that you create a well-d

 DR isn't an automatic feature of Azure. However, many services do provide features and capabilities that you can use to support your DR strategies. You should review the [reliability guides for each Azure service](./overview-reliability-guidance.md) to understand how the service works and its capabilities, and then map those capabilities to your DR plan.

-The following sections list some common elements of a disaster recovery plan, and describe how Azure can help you to achieve them.
+A strong DR plan turns strategy into decisive action. It provides a clear roadmap for responding to disasters, minimizing downtime, and ensuring business continuity.
+
+To make this possible, every DR plan should be documented to include a clear runbook, a well-defined communication plan, and a structured escalation path. To learn more about these DR plan elements, see [Well-Architected Framework - Document your DR plan](/azure/well-architected/design-guides/disaster-recovery#document-your-dr-plan).
+
+The following sections list some common approaches in a disaster recovery plan, and describe how Azure can help you to achieve them.

 #### Failover and failback

@@ -316,6 +323,8 @@ Many Azure data and storage services support backups, such as the following:
 - Many Azure database services, including [Azure SQL Database](./reliability-sql-database.md) and [Azure Cosmos DB](/azure/reliability/reliability-cosmos-db-nosql), have an automated backup capability for your databases.
 - [Azure Key Vault](./reliability-key-vault.md) provides features to back up your secrets, certificates, and keys.

+To learn more about recovery strategies for backup and restore, see [Well-Architected Framework - Recovery strategy for backup and restore](/azure/well-architected/design-guides/disaster-recovery#recovery-strategy-for-backup-and-restore).
+
 #### Automated deployments

 To rapidly deploy and configure required resources in the event of a disaster, use Infrastructure as code (IaC) assets, such as Bicep files, ARM templates, or Terraform configuration files. Using IaC reduces your recovery time and potential for error, compared to manually deploying and configuring resources.