**learn-pr/wwl-data-ai/scale-containers-azure-container-apps/includes/1-introduction.md** (1 addition, 1 deletion)
Containerized applications require dynamic scaling to handle varying workloads while controlling costs. This module guides you through configuring automatic horizontal scaling in Azure Container Apps to build responsive, cost-efficient container deployments that adapt to real-time demand.
Imagine you're a developer building an order processing service for an e-commerce platform. The application experiences predictable traffic spikes during sales events and unpredictable bursts when marketing campaigns launch. Your current deployment uses fixed resources, leading to poor response times during peak periods and wasted capacity during quiet hours. The operations team reports that costs increased substantially because the application runs at full capacity around the clock. Meanwhile, customer complaints about slow checkout times increased during flash sales.
Your team decides to implement automatic scaling in Azure Container Apps. You need the application to scale out rapidly when HTTP requests increase, process messages from Azure Service Bus queues during order fulfillment, and scale back to zero during idle periods to minimize costs. The platform must handle both synchronous API traffic and asynchronous background processing with different scaling behaviors. Leadership expects the solution to reduce infrastructure costs by at least 40% while maintaining response times under 200 milliseconds during peak load.
**learn-pr/wwl-data-ai/scale-containers-azure-container-apps/includes/2-configure-scale-rules.md** (5 additions, 5 deletions)
Azure Container Apps is powered by [KEDA (Kubernetes Event-driven Autoscaling)](https://keda.sh/), which provides the underlying scaling infrastructure. When you configure scale rules, the platform translates your settings into KEDA specifications that monitor your defined triggers and adjust replica counts accordingly. Each replica is an instance of your container app that runs independently and can handle requests.
The default scale behavior creates up to 10 replicas with a minimum of zero when ingress is enabled and no custom rules are defined. If ingress is disabled and you don't specify a minimum replica count or custom scale rule, your container app scales to zero and can't restart because there's no trigger to activate it. You can configure the minimum to one or more replicas to ensure your application remains available without waiting for scale-up.
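For example, to pin a floor of one replica on an existing app, the scale limits can be set directly with `az containerapp update` (the app and resource group names here are placeholders):

```bash
az containerapp update \
  --name my-container-app \
  --resource-group my-resource-group \
  --min-replicas 1 \
  --max-replicas 10
```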
Billing in Azure Container Apps depends on replica count. When your application scales to zero, you incur no compute charges. Replicas that are running but not actively processing requests are billed at a lower idle rate. Setting a minimum replica count of one or more ensures availability but increases costs compared to scale-to-zero configurations.
## Configure HTTP scale rules
HTTP scaling adjusts replica count based on concurrent HTTP requests to your container app. The platform calculates concurrent requests by counting the number of requests received in the past 15 seconds and dividing by 15. When this value exceeds your configured threshold, the platform creates additional replicas to handle the load.
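As a rough illustration of that arithmetic (a sketch of the documented formula only, not the platform's actual scaler code; the function name and the clamping to the configured limits are my own):

```python
import math

def desired_replicas(requests_last_15s: int, threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate the HTTP scaler's decision: average concurrency over
    the 15-second window, divided by the per-replica threshold."""
    concurrent = requests_last_15s / 15          # average concurrent requests
    needed = math.ceil(concurrent / threshold)   # replicas to stay under threshold
    return max(min_replicas, min(needed, max_replicas))

# 1,500 requests in 15 seconds = 100 concurrent; threshold 10 -> 10 replicas
print(desired_replicas(1500, 10, 0, 30))
```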
The default HTTP concurrency threshold is 10 requests per replica. You can adjust this value based on your application's capacity and response time requirements. Lower thresholds trigger scaling earlier, providing more headroom but potentially creating more replicas than necessary. Higher thresholds maximize utilization of each replica but might cause latency increases before new replicas are available.
HTTP scaling is appropriate for synchronous API workloads and web applications where request volume directly correlates with resource needs. This scaling type supports scale-to-zero, meaning your application can have zero replicas when no requests arrive and automatically start replicas when traffic resumes. Container Apps jobs can't use HTTP scaling rules because jobs don't expose HTTP endpoints.
The following command creates a container app with an HTTP scale rule that triggers scaling when concurrent requests exceed 50 per replica:
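The command itself was elided from this diff view; a representative version, with placeholder app, resource group, environment, and image names, might look like:

```bash
az containerapp create \
  --name order-api \
  --resource-group my-resource-group \
  --environment my-environment \
  --image myregistry.azurecr.io/order-api:latest \
  --ingress external \
  --target-port 8080 \
  --min-replicas 0 \
  --max-replicas 10 \
  --scale-rule-name http-rule \
  --scale-rule-type http \
  --scale-rule-http-concurrency 50
```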
CPU and memory scaling adjust replica count based on resource utilization across your container app replicas. These rules are implemented as KEDA custom scalers and trigger scaling when average utilization exceeds your configured percentage threshold. CPU scaling monitors processor utilization, while memory scaling monitors memory consumption.
Resource-based scaling has a critical limitation: CPU and memory rules can't scale your application to zero. The platform requires at least one running replica to measure utilization, so these scaling types always maintain a minimum of one replica regardless of your configured minimum. If you need scale-to-zero capability, combine resource scaling with HTTP or event-driven rules, or use HTTP scaling as your primary trigger.
CPU scaling is appropriate for compute-intensive workloads such as image processing, video transcoding, or machine learning inference where processor utilization directly indicates capacity needs. Memory scaling suits applications with memory-intensive operations like caching, data aggregation, or processing large datasets where memory consumption reflects workload intensity.
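A CPU rule in the Container Apps YAML `scale` section might look like the following sketch (rule name and thresholds are illustrative; CPU rules are expressed as a `custom` rule of type `cpu` with a utilization percentage):

```yaml
scale:
  minReplicas: 1       # resource-based rules can't scale to zero
  maxReplicas: 10
  rules:
    - name: cpu-scaling
      custom:
        type: cpu
        metadata:
          type: Utilization
          value: "75"   # scale out above 75% average CPU utilization
```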
The scaling algorithm in Azure Container Apps uses several timing parameters that affect how quickly your application responds to load changes. Understanding these parameters helps you configure rules that balance responsiveness with stability.
The polling interval determines how frequently the platform checks your scale triggers. For custom scalers including CPU, memory, and event-driven triggers, the polling interval is 30 seconds. HTTP and TCP rules use a 15-second calculation window. This means changes in load might not trigger scaling for up to 30 seconds after they occur.
The cool-down period is how long the platform waits after the last scaling event before considering scale-down to zero replicas. The default cool-down period is 300 seconds (five minutes). This delay prevents rapid scale-down when traffic temporarily drops and helps avoid repeated scale-up and scale-down cycles for bursty workloads.
**learn-pr/wwl-data-ai/scale-containers-azure-container-apps/includes/3-event-driven-scaling-keda.md** (2 additions, 2 deletions)
Event-driven scaling enables your container apps to respond to external signals beyond HTTP traffic. Azure Container Apps integrates with KEDA (Kubernetes Event-driven Autoscaling) to provide scaling based on message queues, event streams, and other Azure services. This capability is essential for applications that process asynchronous workloads where scaling based on request volume alone doesn't reflect actual work being performed.
## Understand KEDA integration
The metadata parameters for Event Hubs scaling include `consumerGroup`, `unprocessedEventThreshold`, and `checkpointStrategy`. The `unprocessedEventThreshold` sets the number of unprocessed events per partition that triggers scaling. The `checkpointStrategy` specifies how the scaler determines checkpoint positions, with `blobMetadata` being the recommended approach for applications using Azure Blob Storage for checkpointing.
Event Hubs partitions affect the maximum effective replica count. Since each partition can be read by only one consumer at a time within a consumer group, your application can't benefit from more replicas than partitions. If your Event Hub has 32 partitions, setting `maxReplicas` higher than 32 provides no additional scaling benefit.
The following YAML configuration demonstrates Event Hubs scaling with checkpoint-based lag monitoring:
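The configuration itself was elided from this diff view; a representative sketch, assuming a connection-string secret named `eventhub-connection` and a blob container named `checkpoints` (all names here are placeholders), might look like:

```yaml
scale:
  minReplicas: 0
  maxReplicas: 32          # match the Event Hub's partition count
  rules:
    - name: eventhub-rule
      custom:
        type: azure-eventhub
        metadata:
          consumerGroup: $Default
          unprocessedEventThreshold: "100"   # unprocessed events per partition
          checkpointStrategy: blobMetadata   # read checkpoints from blob metadata
          blobContainer: checkpoints
        auth:
          - secretRef: eventhub-connection
            triggerParameter: connection
```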
**learn-pr/wwl-data-ai/scale-containers-azure-container-apps/includes/4-keda-scalers-custom-workloads.md** (5 additions, 5 deletions)
When evaluating whether a specific scaler meets your requirements, consider the authentication methods it supports, the metrics it exposes, and how those metrics translate to replica counts. Review the [KEDA scalers documentation](https://keda.sh/docs/scalers/) for detailed specifications of each scaler, including required and optional metadata parameters, supported authentication mechanisms, and example configurations.
Scalers are categorized by their maintainer. Microsoft maintains Azure-native scalers with direct support. Community-maintained scalers receive contributions from the open-source community and might have varying levels of documentation and support. External scalers run as separate components and require additional deployment steps not covered by the built-in Container Apps configuration.
## Configure Apache Kafka scaling
Apache Kafka scaling triggers replica changes based on consumer group lag. The scaler monitors the difference between the latest offset in each partition and the committed offset of your consumer group. When lag accumulates, the scaler increases replica count to process messages faster and reduce the backlog.
The key metadata parameters for Kafka scaling include `bootstrapServers`, `consumerGroup`, `topic`, and `lagThreshold`. The `lagThreshold` parameter sets the lag per partition that triggers scaling. For example, if you set `lagThreshold` to 100 and your consumer group has 500 messages of lag across partitions, the scaler calculates that five replicas are needed.
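The lag arithmetic can be sketched as follows (an illustration of the calculation described above, not KEDA's actual implementation; the cap at the partition count reflects KEDA's default behavior of not running more consumers than partitions):

```python
import math

def kafka_replicas(total_lag: int, lag_threshold: int,
                   partitions: int, max_replicas: int) -> int:
    """Approximate the Kafka scaler's target: enough replicas to bring
    per-replica lag under the threshold, capped by partition count."""
    needed = math.ceil(total_lag / lag_threshold)
    return min(needed, partitions, max_replicas)

print(kafka_replicas(500, 100, 8, 10))  # the example from the text: 5
```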
Kafka authentication typically uses SASL mechanisms. You configure credentials as Container Apps secrets and reference them in the scaler authentication settings. The scaler supports SASL/PLAIN, SASL/SCRAM, and TLS authentication depending on your Kafka cluster configuration.
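Putting the metadata and authentication together, a Kafka rule might be sketched like this (broker address, topic, consumer group, and secret names are placeholders; `username` and `password` are the KEDA Kafka scaler's trigger parameters for SASL credentials):

```yaml
scale:
  minReplicas: 0
  maxReplicas: 10
  rules:
    - name: kafka-rule
      custom:
        type: kafka
        metadata:
          bootstrapServers: kafka.example.com:9092
          consumerGroup: order-processors
          topic: orders
          lagThreshold: "100"    # lag per partition that triggers scale-out
          sasl: plaintext
        auth:
          - secretRef: kafka-username
            triggerParameter: username
          - secretRef: kafka-password
            triggerParameter: password
```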
Follow these steps to convert a KEDA scaler specification:
1. Configure scale limits using `--min-replicas` and `--max-replicas`. These correspond to the `minReplicaCount` and `maxReplicaCount` in KEDA ScaledObject specifications.
The Container Apps format differs from native KEDA in several ways. Container Apps doesn't support the full TriggerAuthentication resource type; instead, you reference secrets directly in the scale rule. Some advanced KEDA features like external scalers or custom scaling intervals might not be available or might require different configuration approaches.
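As an illustration of the secret-reference style (assuming an app that already has a secret named `sb-connection` holding a Service Bus connection string; all resource names are placeholders), a converted Service Bus rule on the CLI might look like:

```bash
az containerapp update \
  --name order-processor \
  --resource-group my-resource-group \
  --min-replicas 0 \
  --max-replicas 30 \
  --scale-rule-name servicebus-rule \
  --scale-rule-type azure-servicebus \
  --scale-rule-metadata "queueName=orders" "namespace=my-namespace" "messageCount=20" \
  --scale-rule-auth "connection=sb-connection"
```

Each `--scale-rule-auth` entry maps a KEDA trigger parameter to a Container Apps secret name, replacing the TriggerAuthentication resource used in native KEDA.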
## Best practices
- **Start with Azure-native scalers:** Azure Service Bus, Event Hubs, and Storage Queue scalers have first-party support and are maintained by Microsoft. Use these for Azure resources before considering community-maintained alternatives.
- **Test scaler behavior in staging:** Custom scalers might have unexpected polling or threshold behaviors. Validate scaling patterns in a non-production environment before deploying to production. Monitor how quickly scaling responds to load changes and verify that thresholds produce the expected replica counts.
- **Combine scheduled and reactive scaling:** Use cron scalers to establish baseline capacity before known peak periods. Add event-driven or HTTP scalers to handle variations and unexpected spikes. This combination ensures capacity is available when needed while still responding to actual demand.
- **Document scaler configurations:** Custom scaler metadata isn't self-documenting. Maintain documentation that explains why specific thresholds were chosen, how authentication is configured, and what metrics drive scaling decisions. This documentation helps team members understand and maintain scaling configurations over time.
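The scheduled-baseline pattern from the best practices above can be sketched with KEDA's cron scaler (the timezone, window, and replica count are illustrative assumptions):

```yaml
scale:
  minReplicas: 0
  maxReplicas: 10
  rules:
    - name: business-hours-baseline
      custom:
        type: cron
        metadata:
          timezone: America/New_York
          start: 0 8 * * *        # hold baseline capacity from 8 AM...
          end: 0 18 * * *         # ...until 6 PM each day
          desiredReplicas: "5"
```

Outside the cron window, any HTTP or event-driven rules on the same app continue to handle reactive scaling, including scale to zero.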