Commit 8fa0310

Merge pull request #313203 from duongau/expressroute-freshness-review-560387-batch4
Expressroute freshness review - batch 4
2 parents e37f00e + 53b5911 commit 8fa0310

9 files changed

Lines changed: 177 additions & 197 deletions

articles/expressroute/design-architecture-for-resiliency.md

Lines changed: 8 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,10 @@
 ---
 title: Design and architect Azure ExpressRoute for resiliency
 description: Learn how to design and architect Azure ExpressRoute for resiliency to ensure high availability and reliability in your network connections between on-premises and Azure.
-services: expressroute
 author: duongau
 ms.service: azure-expressroute
 ms.topic: concept-article
-ms.date: 07/16/2024
+ms.date: 03/12/2026
 ms.author: duau
 ms.custom: ai-usage
 # Customer intent: As a network architect, I want to design Azure ExpressRoute for maximum resiliency, so that I can ensure high availability and reliability for our on-premises connections to Azure, supporting critical workloads without interruption.
@@ -31,7 +30,7 @@ There are three ExpressRoute resiliency architectures that can be utilized to en
 
 ### Maximum resiliency
 
-The Maximum resiliency architecture in ExpressRoute is structured to eliminate any single point of failure within the Microsoft network path. This setup is achieved by configuring a pair of circuits across two distinct locations for site diversity with ExpressRoute. The objective of Maximum resiliency is to enhance reliability, resiliency, and availability, as a result ensuring the highest level of resilience for business and/or mission-critical workloads. For such operations, we recommend that you configure maximum resiliency. This architectural design is recommended as part of the [Well Architected Framework](/azure/well-architected/service-guides/azure-expressroute#reliability) under the reliability pillar. The ExpressRoute engineering team developed a [guided portal experience](expressroute-howto-circuit-portal-resource-manager.md?pivots=expressroute-preview) to assist you in configuring maximum resiliency.
+The Maximum resiliency architecture in ExpressRoute is structured to eliminate any single point of failure within the Microsoft network path. This setup is achieved by configuring a pair of circuits across two distinct locations for site diversity with ExpressRoute. The objective of Maximum resiliency is to enhance reliability, resiliency, and availability, as a result ensuring the highest level of resilience for business and/or mission-critical workloads. For such operations, Microsoft recommends that you configure maximum resiliency. This architectural design is recommended as part of the [Well Architected Framework](/azure/well-architected/service-guides/azure-expressroute#reliability) under the reliability pillar. The ExpressRoute engineering team developed a [guided portal experience](expressroute-howto-circuit-portal-resource-manager.md?pivots=expressroute-preview) to assist you in configuring maximum resiliency.
 
 :::image type="content" source="./media/expressroute-howto-circuit-portal-resource-manager/maximum-resiliency.png" alt-text="Diagram of maximum resiliency for an ExpressRoute connection.":::
 
@@ -43,7 +42,7 @@ High resiliency, also referred to as ExpressRoute Metro, enables the use of mult
 
 ### Standard resiliency
 
-Standard resiliency in ExpressRoute is a single circuit with two connections configured at a single site. Built-in redundancy (Active-Active) is configured to facilitate failover across the two connections of the circuit. Today, ExpressRoute offers two connections at a single peering location. If a failure happens at this site, users might experience loss of connectivity to their Azure workloads. This configuration is also known as *single-homed* as it represents users with an ExpressRoute circuit configured with only one peering location. This configuration is considered the *least* resilient and **not recommended** for business or mission-critical workloads because it doesn't provide site resiliency.
+Standard resiliency in ExpressRoute is a single circuit with two connections configured at a single site. Built-in redundancy (Active-Active) is configured to facilitate failover across the two connections of the circuit. ExpressRoute offers two connections at a single peering location. If a failure happens at this site, users might experience loss of connectivity to their Azure workloads. This configuration is also known as *single-homed* as it represents users with an ExpressRoute circuit configured with only one peering location. This configuration is considered the *least* resilient and **not recommended** for business or mission-critical workloads because it doesn't provide site resiliency.
 
 :::image type="content" source="./media/design-architecture-for-resiliency/standard-resiliency.png" alt-text="Diagram illustrating a single ExpressRoute circuit, with each link configured at a single peering location.":::
 
@@ -53,11 +52,11 @@ Standard resiliency in ExpressRoute is a single circuit with two connections con
 
 Azure offers several features to ensure regional resiliency. One such feature is [availability zones](/azure/reliability/availability-zones-overview). Availability zones protect applications and data from data center failures by spanning across multiple physical locations within a region. Regions and availability zones are central to your application design and resiliency strategy. By utilizing availability zones, you can achieve higher availability and resilience in your deployments. For more information, see [Regions & availability zones](/azure/reliability/overview).
 
-We recommend deploying your [ExpressRoute Virtual Network Gateways](expressroute-about-virtual-network-gateways.md) as zone redundant across availability zones within a region. These availability zones are separate physical locations with independent infrastructure (power, cooling, and networking). The purpose is to protect your on-premises network connectivity to Azure from zone level failures. [Zone-redundant ExpressRoute gateways](../vpn-gateway/about-zone-redundant-vnet-gateways.md?toc=%2Fazure%2Fexpressroute%2Ftoc.json) provide resiliency, scalability, and higher availability for accessing mission-critical services on Azure.
+Microsoft recommends deploying your [ExpressRoute Virtual Network Gateways](expressroute-about-virtual-network-gateways.md) as zone redundant across availability zones within a region. These availability zones are separate physical locations with independent infrastructure (power, cooling, and networking). The purpose is to protect your on-premises network connectivity to Azure from zone level failures. [Zone-redundant ExpressRoute gateways](../vpn-gateway/about-zone-redundant-vnet-gateways.md?toc=%2Fazure%2Fexpressroute%2Ftoc.json) provide resiliency, scalability, and higher availability for accessing mission-critical services on Azure.
 
 Equipment failures or disasters in regional and zonal data centers can affect ExpressRoute gateway deployments in virtual networks. If gateways aren't deployed as zone-redundant, such failures within an Azure data center can affect the ability for users to access their Azure workloads.
 
-If you have an existing non-zone redundant ExpressRoute gateways, there's now the ability to [migrate to an availability zone enabled gateway](gateway-migration.md).
+If you have an existing non-zone redundant ExpressRoute gateways, there's the ability to [migrate to an availability zone enabled gateway](gateway-migration.md).
 
 ## Recommendations
 
@@ -76,7 +75,7 @@ During the initial planning phase, it's crucial to determine whether to configur
 
 #### Evaluate the resiliency of multi-site redundant ExpressRoute circuits
 
-After deploying multi-site redundant ExpressRoute circuits with [maximum resiliency](expressroute-howto-circuit-portal-resource-manager.md), it's essential to ensure that on-premises routes are advertised over the redundant circuits to fully utilize the benefits of multi-site redundancy. To evaluate the resiliency and test the failover of redundant circuits and routes Learn more here.
+After deploying multi-site redundant ExpressRoute circuits with [maximum resiliency](expressroute-howto-circuit-portal-resource-manager.md), it's essential to ensure that on-premises routes are advertised over the redundant circuits to fully utilize the benefits of multi-site redundancy. To evaluate the resiliency and test the failover of redundant circuits and routes, see [Evaluate ExpressRoute circuit resiliency](evaluate-circuit-resiliency.md).
 
 #### Plan for active-active configuration
 
@@ -108,7 +107,7 @@ To maximize availability, both the customer and service provider segments on you
 
 #### Plan for geo-redundancy
 
-For disaster recovery planning, we recommend setting up ExpressRoute circuits in multiple peering locations and regions. ExpressRoute circuits can be created in the same metropolitan area or different metropolitan areas, and different service providers can be used for diverse paths through each circuit. Geo-redundant ExpressRoute circuits are utilized to create a robust backend network connectivity for disaster recovery. To learn more, see [Designing for high availability](designing-for-high-availability-with-expressroute.md).
+For disaster recovery planning, Microsoft recommends setting up ExpressRoute circuits in multiple peering locations and regions. ExpressRoute circuits can be created in the same metropolitan area or different metropolitan areas, and different service providers can be used for diverse paths through each circuit. Geo-redundant ExpressRoute circuits are utilized to create a robust backend network connectivity for disaster recovery. To learn more, see [Designing for high availability](designing-for-high-availability-with-expressroute.md).
 
 > [!NOTE]
 > Using site-to-site VPN as a backup solution for ExpressRoute connectivity is not recommended when dealing with latency-sensitive, mission-critical, or bandwidth-intensive workloads. In such cases, it's advisable to design for disaster recovery with ExpressRoute multi-site resiliency to ensure maximum availability.
@@ -124,7 +123,7 @@ Virtual Network (VNet) Peering provides a more efficient and direct method, enab
 
 #### Configure monitoring & alerting for ExpressRoute circuits
 
-As a baseline, we recommend configuring [Network Insights](expressroute-network-insights.md) within Azure Monitor to view all ExpressRoute circuit metrics, including ExpressRoute Direct and Global Reach. Within the circuits card you can visualize topologies and dependencies for peerings, connections, and gateways. The insights available for circuits include availability, throughput, and packet drops.
+As a baseline, Microsoft recommends configuring [Network Insights](expressroute-network-insights.md) within Azure Monitor to view all ExpressRoute circuit metrics, including ExpressRoute Direct and Global Reach. Within the circuits card you can visualize topologies and dependencies for peerings, connections, and gateways. The insights available for circuits include availability, throughput, and packet drops.
 
 #### Configure service health alerts for ExpressRoute circuit maintenance notifications
 
Lines changed: 10 additions & 24 deletions
@@ -1,11 +1,10 @@
 ---
 title: 'Azure ExpressRoute: Designing for high availability'
 description: This page provides architectural recommendations for high availability while using Azure ExpressRoute.
-services: expressroute
 author: duongau
 ms.service: azure-expressroute
 ms.topic: concept-article
-ms.date: 11/18/2024
+ms.date: 03/16/2026
 ms.author: duau
 # Customer intent: As a network architect, I want to design a high availability Azure ExpressRoute setup, so that I can ensure robust and uninterrupted connectivity for my organization's critical applications and services.
 ---
@@ -21,25 +20,25 @@ Azure ExpressRoute is designed for high availability, providing carrier-grade pr
 
 The following figure illustrates the recommended way to connect using an Azure ExpressRoute circuit to maximize availability.
 
-[![1]][1]
+:::image type="content" source="./media/designing-for-high-availability-with-expressroute/exr-reco.png" alt-text="Diagram of the recommended way to connect using Azure ExpressRoute for high availability.":::
 
 For high availability, it's essential to maintain redundancy throughout the end-to-end network. This means maintaining redundancy within your on-premises network and not compromising redundancy within your service provider network. At a minimum, this involves avoiding single points of network failure. Redundant power and cooling for network devices further improve high availability.
 
 ### First mile physical layer design considerations
 
 If you terminate both the primary and secondary connections of an Azure ExpressRoute circuit on the same Customer Premises Equipment (CPE), you compromise high availability within your on-premises network. Additionally, configuring both connections using the same port of a CPE forces the partner to compromise high availability on their network segment. This can occur by terminating the two connections under different subinterfaces or merging the two connections within the partner network, as illustrated below.
 
-[![2]][2]
+:::image type="content" source="./media/designing-for-high-availability-with-expressroute/suboptimal-lastmile-connectivity.png" alt-text="Diagram of suboptimal last mile connectivity for ExpressRoute circuits.":::
 
 Terminating the primary and secondary connections of an Azure ExpressRoute circuit in different geographical locations can compromise network performance. If traffic is actively load-balanced across connections terminated in different locations, substantial differences in network latency between the two paths can result in suboptimal performance.
 
-For geo-redundant design considerations, see [Designing for disaster recovery with Azure ExpressRoute][DR].
+For geo-redundant design considerations, see [Designing for disaster recovery with Azure ExpressRoute](./designing-for-disaster-recovery-with-expressroute-privatepeering.md).
 
 ### Active-active connections
 
 Microsoft network operates the primary and secondary connections of Azure ExpressRoute circuits in active-active mode. However, you can force the redundant connections to operate in active-passive mode through your route advertisements. Advertising more specific routes and BGP AS path prepending are common techniques to prefer one path over the other.
 
-To improve high availability, it's recommended to operate both connections in active-active mode. This allows Microsoft network to load balance traffic across the connections on a per-flow basis.
+To improve high availability, Microsoft recommends operating both connections in active-active mode. This allows Microsoft network to load balance traffic across the connections on a per-flow basis.
 
 Running connections in active-passive mode risks both connections failing if the active path fails. Common causes for failure include lack of active management of the passive connection and passive connection advertising stale routes.
 
@@ -52,7 +51,7 @@ Alternatively, running connections in active-active mode results in only about h
 
 Microsoft peering is designed for communication between public endpoints. Typically, on-premises private endpoints are Network Address Translated (NATed) with public IPs on the customer or partner network before communicating over Microsoft peering. Using both primary and secondary connections in an active-active setup affects how quickly you recover from a failure in one of the connections. Two different NAT options are illustrated below:
 
-[![3]][3]
+:::image type="content" source="./media/designing-for-high-availability-with-expressroute/nat-options.png" alt-text="Diagram of NAT options for Microsoft peering with ExpressRoute.":::
 
 #### Option 1:
 
@@ -67,7 +66,7 @@ A common NAT pool is used before splitting traffic between the primary and secon
 The NAT pool remains reachable even if the primary or secondary connection fails, allowing the network layer to reroute packets and recover faster.
 
 > [!NOTE]
-> * If using NAT option 1 (independent NAT pools for primary and secondary connections) and mapping a port of an IP address from one NAT pool to an on-premises server, the server will not be reachable via the Azure ExpressRoute circuit if the corresponding connection fails.
+> * If using NAT option 1 (independent NAT pools for primary and secondary connections) and mapping a port of an IP address from one NAT pool to an on-premises server, the server won't be reachable via the Azure ExpressRoute circuit if the corresponding connection fails.
 > * Terminating Azure ExpressRoute BGP connections on stateful devices can cause failover issues during planned or unplanned maintenance by Microsoft or your Azure ExpressRoute Provider. Test your setup to ensure proper failover, and when possible, terminate BGP sessions on stateless devices.
 
 ## Fine-tuning features for private peering
@@ -76,27 +75,14 @@ This section reviews optional features that help improve the high availability o
 
 ### Availability Zone aware Azure ExpressRoute virtual network gateways
 
-An Availability Zone in an Azure region combines a fault domain and an update domain. To achieve the highest resiliency and availability, configure a zone-redundant Azure ExpressRoute virtual network gateway. For more information, see [About zone-redundant virtual network gateways in Azure Availability Zones][zone redundant vgw]. To configure a zone-redundant virtual network gateway, see [Create a zone-redundant virtual network gateway in Azure Availability Zones][conf zone redundant vgw].
+An Availability Zone in an Azure region combines a fault domain and an update domain. To achieve the highest resiliency and availability, configure a zone-redundant Azure ExpressRoute virtual network gateway. For more information, see [About zone-redundant virtual network gateways in Azure Availability Zones](../vpn-gateway/about-zone-redundant-vnet-gateways.md). To configure a zone-redundant virtual network gateway, see [Create a zone-redundant virtual network gateway in Azure Availability Zones](../vpn-gateway/create-zone-redundant-vnet-gateway.md).
 
 ### Improving failure detection time
 
-Azure ExpressRoute supports BFD over private peering, reducing failure detection time over the Layer 2 network between Microsoft Enterprise Edge (MSEEs) and their BGP neighbors on the on-premises side from about 3 minutes (default) to less than a second. Quick failure detection helps hasten recovery. For more information, see [Configure BFD over Azure ExpressRoute][BFD].
+Azure ExpressRoute supports BFD over private peering, reducing failure detection time over the Layer 2 network between Microsoft Enterprise Edge (MSEEs) and their BGP neighbors on the on-premises side from about 3 minutes (default) to less than a second. Quick failure detection helps hasten recovery. For more information, see [Configure BFD over Azure ExpressRoute](./expressroute-bfd.md).
 
 ## Next steps
 
 This article discussed designing for high availability of an Azure ExpressRoute circuit. An Azure ExpressRoute circuit peering point is pinned to a geographical location and can be affected by catastrophic failures impacting the entire location.
 
-For design considerations to build geo-redundant network connectivity to the Microsoft backbone that can withstand catastrophic failures affecting an entire region, see [Designing for disaster recovery with Azure ExpressRoute private peering][DR].
-
-<!--Image References-->
-[1]: ./media/designing-for-high-availability-with-expressroute/exr-reco.png "Recommended way to connect using ExpressRoute"
-[2]: ./media/designing-for-high-availability-with-expressroute/suboptimal-lastmile-connectivity.png "Suboptimal last mile connectivity"
-[3]: ./media/designing-for-high-availability-with-expressroute/nat-options.png "NAT options"
-
-
-<!--Link References-->
-[zone redundant vgw]: ../vpn-gateway/about-zone-redundant-vnet-gateways.md
-[conf zone redundant vgw]: ../vpn-gateway/create-zone-redundant-vnet-gateway.md
-[Configure Global Reach]: ./expressroute-howto-set-global-reach.md
-[BFD]: ./expressroute-bfd.md
-[DR]: ./designing-for-disaster-recovery-with-expressroute-privatepeering.md
+For design considerations to build geo-redundant network connectivity to the Microsoft backbone that can withstand catastrophic failures affecting an entire region, see [Designing for disaster recovery with Azure ExpressRoute private peering](./designing-for-disaster-recovery-with-expressroute-privatepeering.md).
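
The "Improving failure detection time" paragraph in the last hunk compares failure detection of about 3 minutes (default) with less than a second under BFD. The arithmetic behind that comparison can be sketched roughly as follows. This is an illustrative sketch only: the class and function names are hypothetical, and the timer values (a 180-second BGP hold time, a 300 ms BFD interval with a multiplier of 3) are assumptions chosen to match the figures in the text, not values taken from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpTimers:
    """Hypothetical container for BGP timers (values are assumptions)."""
    keepalive_s: float = 60.0
    hold_s: float = 180.0  # assumed default; matches "about 3 minutes"


@dataclass(frozen=True)
class BfdTimers:
    """Hypothetical container for BFD timers (values are assumptions)."""
    interval_ms: float = 300.0
    multiplier: int = 3


def bgp_detection_time_s(timers: BgpTimers) -> float:
    # Without BFD, a silent neighbor is declared down once the hold timer expires.
    return timers.hold_s


def bfd_detection_time_s(timers: BfdTimers) -> float:
    # With BFD, the session is declared down after `multiplier` consecutive
    # missed control packets, each expected every `interval_ms`.
    return timers.interval_ms * timers.multiplier / 1000.0


if __name__ == "__main__":
    print(f"BGP hold timer only: {bgp_detection_time_s(BgpTimers()):.1f} s")
    print(f"With BFD:            {bfd_detection_time_s(BfdTimers()):.1f} s")
```

Under these assumed values, detection drops from 180 seconds to 0.9 seconds, which is consistent with the "about 3 minutes" and "less than a second" figures in the paragraph. Actual timers are negotiated between peers and may differ.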
