feat(ecs_service): native ECS deployment strategies (rolling/blue_green/linear/canary) by mabadir · Pull Request #5 · flightcontrolhq/modules

mabadir · 2026-06-04T16:16:48Z

Summary

Replaces the dual-service / CodeDeploy-era blue/green design with the native ECS deployment controller strategies: rolling, blue_green, linear, and canary — and makes the strategy a per-deployment decision that requires no Terraform changes to switch.

Native strategies (`b4ee71a`)

deployment_controller.type = "ECS" always — the CodeDeploy/EXTERNAL mapping is removed
New deployment_type values linear / canary alongside rolling / blue_green
New deployment_strategy_config variable (bake time, canary %, linear step %) that seeds deployment_configuration at create time
New test_listener_rule_arn variable for blue/green test-traffic validation (TEST_TRAFFIC_SHIFT stages)
ECS infrastructure IAM role with AmazonECSInfrastructureRolePolicyForLoadBalancers — ECS assumes it to rewrite listener rules and (de)register targets during traffic shifts
load_balancer.advanced_configuration wiring (production/alternate target group, production + optional test listener rule, role)
deployment_configuration added to ignore_changes: the Flightcontrol deploy manager passes the authoritative configuration (strategy, bake times, pause lifecycle hooks) on every UpdateService call, so Terraform must not fight it
Requires AWS provider >= 6.21 (linear_configuration / canary_configuration)

Strategy is per-deployment (`3c72a84`)

Whenever a load balancer is attached, the module now always provisions:

the production (tg-1) + alternate (tg-2) target-group pair (the rolling-only aws_lb_target_group.this is removed)
the ECS infrastructure role
the service's advanced_configuration

Rolling deployments serve from the production target group only and never touch the alternate, so any service can switch between all four strategies on a single UpdateService call. deployment_type is purely the create-time seed.

Trade-off: every load-balanced rolling service carries one idle target group and one unused IAM role.

Outputs

production_target_group_arn / alternate_target_group_arn (+ names), ecs_infrastructure_role_arn
target_group_arn retained as an alias of the production target group
Removed: blue/green dual-service outputs

Tests

Fixed the module test suite so it actually runs: basic_service was failing on main (mock provider emits invalid JSON policies / ARNs), which skipped every other run. Added mock_data / mock_resource defaults and fixed two stale assertions.

tofu validate ✅ — tofu test ✅ 11/11 passed

Breaking changes

deployment_controller is always ECS; services created with the old CodeDeploy blue/green mapping are not migrated (pre-GA, clean removal)
aws_lb_target_group.this removed — existing load-balanced services will see target-group recreation on upgrade
Blue/green dual-service variables/outputs removed

🤖 Generated with Claude Code

Greptile Summary

This PR replaces the CodeDeploy/dual-service blue-green design with native ECS deployment controller strategies (rolling, blue_green, linear, canary) and restructures target-group provisioning so the strategy is a per-deployment decision rather than a Terraform-level choice.

Always-on infra: production (tg-1) + alternate (tg-2) target groups, an ECS infrastructure IAM role, and load_balancer.advanced_configuration are provisioned for every LB-attached service, regardless of deployment_type; deployment_configuration and load_balancer are placed in ignore_changes so the Flightcontrol deploy manager owns the authoritative configuration.
New variables and outputs: deployment_strategy_config, test_listener_rule_arn, and new named outputs (production_target_group_arn, alternate_target_group_arn, ecs_infrastructure_role_arn) alongside backward-compatible aliases for existing callers; deployment_type now accepts linear and canary in addition to rolling and blue_green.
Test suite fixed: mock provider defaults and two stale assertions corrected; new canary_deployment and linear_deployment test cases added (all plan-only against mocked providers).

Confidence Score: 3/5

Two unresolved questions about real-world AWS API behavior make this risky to merge without additional validation.

The core architectural bet — that AWS accepts advanced_configuration on a CreateService call where deployment_configuration is absent (rolling default) — is not exercised by any real apply; all 11 tests run against mocked providers with command = plan. If AWS rejects this combination, every load-balanced rolling service (the module default) would fail to provision. Separately, production_listener_rule is hardcoded to the first ALB rule, so services with multiple listener rules silently get incomplete traffic shifting during blue_green/linear/canary deployments without any validation or warning. The infrastructure and IAM changes themselves are clean and the structural refactoring is well-thought-out.

compute/ecs_service/ecs_service.tf — the advanced_configuration block and the hardcoded alb["0"] reference are the two areas that need the most scrutiny before this lands in production.

Important Files Changed

Filename	Overview
compute/ecs_service/ecs_service.tf	Core ECS service resource: `advanced_configuration` now always wired for every LB-attached service; `production_listener_rule` hardcoded to first ALB rule (`alb["0"]`), which silently excludes additional rules from traffic shifting
compute/ecs_service/target_groups.tf	Removed `aws_lb_target_group.this`; tg_1/tg_2 now always provisioned when LB is attached — clean and consistent
compute/ecs_service/iam_infrastructure.tf	New ECS infrastructure role using `AmazonECSInfrastructureRolePolicyForLoadBalancers`; correctly gated on `enable_load_balancer`; trust policy and attachment look correct
compute/ecs_service/outputs.tf	`target_group_arn` retained as alias; `target_group_name` dropped without a backward-compatible alias despite being the matching pair to the retained ARN output
compute/ecs_service/variables.tf	New `deployment_strategy_config`, `test_listener_rule_arn` variables and expanded `deployment_type` validation look correct; `optional()` defaults are sane
compute/ecs_service/tests/basic.tftest.hcl	Mock provider defaults fixed; new canary/linear test cases added; all tests run as `plan` only with mocked providers, so no real AWS API validation is exercised

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[terraform apply\necs_service module] --> B{enable_load_balancer?}
    B -- No --> C[No TGs, no\nadvanced_configuration]
    B -- Yes --> D[Create tg-1 production\n+ tg-2 alternate]
    D --> E[Create ECS\ninfrastructure IAM role]
    E --> F[Wire advanced_configuration\nalways — all strategies]
    F --> G{deployment_type seed}
    G -- rolling --> H[No deployment_configuration block\nignore_changes covers it]
    G -- blue_green / linear / canary --> I[deployment_configuration block\nstrategy + bake/canary/linear tuning]
    H --> J[ECS service created\nserves from tg-1 only]
    I --> J
    J --> K[Deploy Manager calls UpdateService\nwith authoritative strategy\n+ pause lifecycle hooks]
    K -- rolling --> L[ECS: task replacement\nvia min/max healthy pct\ntg-1 only]
    K -- blue_green/linear/canary --> M[ECS: rewrites production_listener_rule\nbetween tg-1 and tg-2\nvia infrastructure role]

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
compute/ecs_service/ecs_service.tf:102-106
**Only the first ALB listener rule participates in traffic shifting**

`production_listener_rule` is hardcoded to `alb["0"].arn`. The AWS ECS `advanced_configuration` API only accepts a single production listener rule ARN, so during blue_green/linear/canary deployments ECS rewrites only this one rule. Any additional listener rules (`alb["1"]`, `alb["2"]`, …) continue forwarding to tg_1 for the entire deployment — traffic on those rules never shifts to the new revision. Consider adding a validation block that requires `length(var.load_balancer_attachment.listener_rules) <= 1` when using a native traffic-shift strategy, or document this constraint prominently so callers with multi-rule services aren't caught off guard.

### Issue 2 of 3
compute/ecs_service/ecs_service.tf:93-111
**`advanced_configuration` wired unconditionally for rolling deployments — not validated against real AWS API**

The whole design bet is that AWS silently accepts `advanced_configuration` (alternate TG, production listener rule, infrastructure role) when `deployment_configuration` is absent (the rolling default). All tests run with `command = plan` against mocked providers, so the AWS API's acceptance of this combination has not been exercised. If AWS rejects `advanced_configuration` on a `CreateService` call that carries no `deployment_configuration`, every rolling service with a load balancer (the module default) would fail to provision. A real `tofu apply` against AWS — even in a scratch account — should be added to the test suite to confirm this assumption before merging.

### Issue 3 of 3
compute/ecs_service/outputs.tf:96-104
**`target_group_name` dropped without a backward-compatible alias**

`target_group_arn` was deliberately kept as an alias for `production_target_group_arn`, but `target_group_name` — the natural companion — was removed entirely. Callers that reference `module.xxx.target_group_name` (e.g., in CloudWatch dashboards or external scripts) will get a Terraform error on upgrade. This is not listed as a named breaking change in the PR description. Adding `output "target_group_name" { value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null }` would match the alias pattern used for the ARN.

_{Reviews (1): Last reviewed commit: "feat(ecs_service): make deployment strat..." | Re-trigger Greptile}

Greptile also left 3 inline comments on this PR.

…en/linear/canary) Replace the CodeDeploy/external-controller blue/green wiring with the ECS deployment controller's built-in strategies: - deployment_type now accepts rolling | blue_green | linear | canary; the deployment controller is always ECS (CODE_DEPLOY mapping removed) - new deployment_strategy_config seeds bake time + canary/linear tuning at create time; deployment_configuration is in ignore_changes because the Flightcontrol deploy manager passes the authoritative config (including pause lifecycle hooks) on every UpdateService call - native strategies wire load_balancer.advanced_configuration: alternate target group (tg-2), production listener rule (first ALB rule or the NLB listener), optional test_listener_rule_arn, and a new ECS infrastructure role (AmazonECSInfrastructureRolePolicyForLoadBalancers) - target groups tg-1/tg-2 are now production/alternate and gate on any native traffic-shift strategy, not just blue_green; outputs renamed accordingly (production_/alternate_target_group_*) and ecs_infrastructure_role_arn added - provider floor bumped to aws >= 6.21 (linear/canary configuration) Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Always provision the production/alternate target-group pair, the ECS infrastructure role, and the service's load_balancer advanced_configuration whenever a load balancer is attached — not just for native traffic-shift deployment_types. Rolling deployments serve from the production target group only, so any service can now switch between rolling / blue_green / linear / canary on a single UpdateService call with zero Terraform changes; deployment_type is purely the create-time seed. Also fixes the module test suite: mock the iam_policy_document / partition / region / vpc data sources and computed ARNs so all 11 runs execute (basic_service was failing on main and skipping the rest). Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

greptile-apps · 2026-06-04T16:23:02Z

+        production_listener_rule = (
+          local.enable_nlb_listener
+          ? aws_lb_listener.nlb[0].arn
+          : aws_lb_listener_rule.alb["0"].arn
+        )


Only the first ALB listener rule participates in traffic shifting

production_listener_rule is hardcoded to alb["0"].arn. The AWS ECS advanced_configuration API only accepts a single production listener rule ARN, so during blue_green/linear/canary deployments ECS rewrites only this one rule. Any additional listener rules (alb["1"], alb["2"], …) continue forwarding to tg_1 for the entire deployment — traffic on those rules never shifts to the new revision. Consider adding a validation block that requires length(var.load_balancer_attachment.listener_rules) <= 1 when using a native traffic-shift strategy, or document this constraint prominently so callers with multi-rule services aren't caught off guard.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/ecs_service.tf Line: 102-106 Comment: **Only the first ALB listener rule participates in traffic shifting** `production_listener_rule` is hardcoded to `alb["0"].arn`. The AWS ECS `advanced_configuration` API only accepts a single production listener rule ARN, so during blue_green/linear/canary deployments ECS rewrites only this one rule. Any additional listener rules (`alb["1"]`, `alb["2"]`, …) continue forwarding to tg_1 for the entire deployment — traffic on those rules never shifts to the new revision. Consider adding a validation block that requires `length(var.load_balancer_attachment.listener_rules) <= 1` when using a native traffic-shift strategy, or document this constraint prominently so callers with multi-rule services aren't caught off guard. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-04T16:23:02Z

  dynamic "load_balancer" {
-    for_each = local.enable_load_balancer && var.deployment_type == "blue_green" ? [1] : []
+    for_each = local.enable_load_balancer ? [1] : []
    content {
      target_group_arn = aws_lb_target_group.tg_1[0].arn
      container_name   = local.lb_container_name
      container_port   = local.lb_container_port
+
+      advanced_configuration {
+        alternate_target_group_arn = aws_lb_target_group.tg_2[0].arn
+        production_listener_rule = (
+          local.enable_nlb_listener
+          ? aws_lb_listener.nlb[0].arn
+          : aws_lb_listener_rule.alb["0"].arn
+        )
+        test_listener_rule = var.test_listener_rule_arn
+        role_arn           = aws_iam_role.ecs_infrastructure[0].arn
+      }
    }
  }


advanced_configuration wired unconditionally for rolling deployments — not validated against real AWS API

The whole design bet is that AWS silently accepts advanced_configuration (alternate TG, production listener rule, infrastructure role) when deployment_configuration is absent (the rolling default). All tests run with command = plan against mocked providers, so the AWS API's acceptance of this combination has not been exercised. If AWS rejects advanced_configuration on a CreateService call that carries no deployment_configuration, every rolling service with a load balancer (the module default) would fail to provision. A real tofu apply against AWS — even in a scratch account — should be added to the test suite to confirm this assumption before merging.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/ecs_service.tf Line: 93-111 Comment: **`advanced_configuration` wired unconditionally for rolling deployments — not validated against real AWS API** The whole design bet is that AWS silently accepts `advanced_configuration` (alternate TG, production listener rule, infrastructure role) when `deployment_configuration` is absent (the rolling default). All tests run with `command = plan` against mocked providers, so the AWS API's acceptance of this combination has not been exercised. If AWS rejects `advanced_configuration` on a `CreateService` call that carries no `deployment_configuration`, every rolling service with a load balancer (the module default) would fail to provision. A real `tofu apply` against AWS — even in a scratch account — should be added to the test suite to confirm this assumption before merging. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-04T16:23:04Z

 output "target_group_arn" {
-  description = "The ARN of the target group (null if load balancer disabled or blue/green deployment)."
-  value       = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn : null
+  description = "The ARN of the production target group the service serves from (alias of production_target_group_arn; null if load balancer disabled)."
+  value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn : null
 }

 output "target_group_arn_suffix" {
-  description = "The ARN suffix of the target group for CloudWatch metrics."
-  value       = local.enable_load_balancer && var.deployment_type == "rolling" ? aws_lb_target_group.this[0].arn_suffix : null
+  description = "The ARN suffix of the production target group for CloudWatch metrics."
+  value       = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].arn_suffix : null
 }


target_group_name dropped without a backward-compatible alias

target_group_arn was deliberately kept as an alias for production_target_group_arn, but target_group_name — the natural companion — was removed entirely. Callers that reference module.xxx.target_group_name (e.g., in CloudWatch dashboards or external scripts) will get a Terraform error on upgrade. This is not listed as a named breaking change in the PR description. Adding output "target_group_name" { value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null } would match the alias pattern used for the ARN.

Prompt To Fix With AI

This is a comment left during a code review. Path: compute/ecs_service/outputs.tf Line: 96-104 Comment: **`target_group_name` dropped without a backward-compatible alias** `target_group_arn` was deliberately kept as an alias for `production_target_group_arn`, but `target_group_name` — the natural companion — was removed entirely. Callers that reference `module.xxx.target_group_name` (e.g., in CloudWatch dashboards or external scripts) will get a Terraform error on upgrade. This is not listed as a named breaking change in the PR description. Adding `output "target_group_name" { value = local.enable_load_balancer ? aws_lb_target_group.tg_1[0].name : null }` would match the alias pattern used for the ARN. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

- Reject >1 ALB listener rule for native traffic-shift strategies: advanced_configuration only rewrites a single production listener rule, so extra rules would keep serving the old revision for the whole deployment. Documented the constraint on listener_rules since the strategy is a per-deployment decision the deploy manager can change without Terraform. - Restore the target_group_name output as an alias of production_target_group_name, matching the target_group_arn alias. - Make TestEcsServiceWithAlb assert the deployed rolling service carries advanced_configuration (alternate TG, production listener rule, infrastructure role), validating that the real AWS API accepts it on CreateService without a deployment_configuration. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

mabadir and others added 2 commits June 4, 2026 00:25

greptile-apps Bot reviewed Jun 4, 2026

View reviewed changes

mabadir added the run-terratest Run Terratest integration tests (real AWS apply) on this PR label Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ecs_service): native ECS deployment strategies (rolling/blue_green/linear/canary)#5

feat(ecs_service): native ECS deployment strategies (rolling/blue_green/linear/canary)#5
mabadir wants to merge 3 commits into
mainfrom
native-ecs-deployment-strategies

mabadir commented Jun 4, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

greptile-apps Bot Jun 4, 2026

Uh oh!

greptile-apps Bot Jun 4, 2026

Uh oh!

greptile-apps Bot Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mabadir commented Jun 4, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Native strategies (b4ee71a)

Strategy is per-deployment (3c72a84)

Outputs

Tests

Breaking changes

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mabadir commented Jun 4, 2026 •

edited by greptile-apps Bot

Loading

Native strategies (`b4ee71a`)

Strategy is per-deployment (`3c72a84`)