Infra reorg #27

Merged: kevincodex1 merged 6 commits into main from infra-reorg on Jun 5, 2026

kevincodex1 commented Jun 5, 2026

Move Fly files under the infra directory.

Summary by CodeRabbit

  • Documentation
    • Clarified node prerequisites with examples and added comprehensive deployment guides for Fly.io and AWS, plus platform deployment layout and AWS runbook.
  • New Features
    • One‑click AWS deployment for a single node: EC2-based node with persistent encrypted storage, networking/security defaults, automated encrypted snapshots, SSM-based upgrade/session commands, and Docker Compose–based bootstrap.
  • Chores
    • Added AWS deployment templates, variables, provider declarations/lockfile, infra ignore rules, and updated Fly build configuration.

kevincodex1 and others added 2 commits June 5, 2026 08:32
Deployment configs now live under infra/, organized per target.
Dockerfiles and docker-compose.yml intentionally stay at the repo
root (shared by CI, build scripts, and the macOS app).

Deploy with: fly deploy -c infra/fly/fly.toml

Co-Authored-By: OpenClaude <[email protected]>
infra/aws/ provisions a production-leaning single-node deployment:
- t4g.small (AL2023 arm64) running the published ghcr.io/gitlawb/node
  image + postgres:16 via docker compose
- separate encrypted EBS data volume (prevent_destroy) with daily DLM
  snapshots; survives instance replacement
- postgres password via random_password -> SSM SecureString, fetched at
  boot by instance profile (never in user-data); optional secrets
  (operator key, Pinata JWT, S3 secret) follow the same path
- SSM Session Manager access (no SSH by default), IMDSv2 required,
  metrics port closed unless explicitly opened
- SSM command document for image upgrades (user-data runs once)

Verified: terraform fmt/init/validate clean; rendered compose passes
docker compose config; rendered user-data passes bash -n.

Co-Authored-By: OpenClaude <[email protected]>
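
The secret handling and metadata settings described in this commit message might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual main.tf: the `ami_id` variable, the resource names, and the parameter path `/gitlawb-example/postgres_password` are hypothetical and chosen only for illustration.

```hcl
terraform {
  required_providers {
    aws    = { source = "hashicorp/aws" }
    random = { source = "hashicorp/random" }
  }
}

# Hypothetical input; the PR resolves the AL2023 arm64 AMI through SSM instead.
variable "ami_id" {
  type = string
}

# Generate the Postgres password inside Terraform.
resource "random_password" "postgres" {
  length  = 32
  special = false
}

# Store it as an SSM SecureString so the instance can fetch it at boot
# through its instance profile rather than receiving it in user-data.
resource "aws_ssm_parameter" "postgres_password" {
  name  = "/gitlawb-example/postgres_password" # hypothetical path
  type  = "SecureString"
  value = random_password.postgres.result
}

resource "aws_instance" "node" {
  ami           = var.ami_id
  instance_type = "t4g.small"

  # Require IMDSv2 session tokens for instance metadata requests.
  metadata_options {
    http_tokens = "required"
  }
}
```

With this pattern the generated value is still recorded in Terraform state, so the state backend needs to be access-controlled accordingly.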
coderabbitai Bot commented Jun 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b7edae70-00c6-463f-b128-c1993d14beb6

📥 Commits

Reviewing files that changed from the base of the PR and between a759c01 and 5f06251.

📒 Files selected for processing (1)
  • infra/aws/main.tf
🚧 Files skipped from review as they are similar to previous changes (1)
  • infra/aws/main.tf

📝 Walkthrough

Adds AWS Terraform infra for a single EC2 gitlawb node, Docker Compose/user-data bootstrap, SSM-stored secrets and DLM snapshots, supporting docs, and a Fly.io Dockerfile path fix.

Changes

Deployment documentation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Deployment guides and platform documentation | docs/RUN-A-NODE.md, infra/README.md, infra/aws/README.md, infra/fly/fly.toml | Clarifies the public HTTP URL prerequisite with examples (e.g., Fly.io), adds an infra/ directory overview and per-platform deployment guidance, provides an AWS operations guide, and updates the Fly.io Dockerfile path to ../../Dockerfile. |

AWS Terraform infrastructure

| Layer | File(s) | Summary |
| --- | --- | --- |
| Terraform contracts, variables, and examples | infra/aws/versions.tf, infra/aws/variables.tf, infra/aws/terraform.tfvars.example, infra/aws/.gitignore, infra/aws/.terraform.lock.hcl | Declares required Terraform/provider versions and AWS region wiring, input variables for deployment/runtime/integrations, example tfvars, infra .gitignore, and a provider lockfile. |
| AWS infrastructure provisioning and lifecycle | infra/aws/main.tf | Subnet/AMI lookup, SSM SecureString secrets, IAM role/instance profile for SSM, security group for HTTP/p2p and optional metrics/SSH, encrypted persistent EBS volume with prevent_destroy, Elastic IP, EC2 instance with rendered user-data, DLM daily snapshot lifecycle, and an SSM upgrade document. |
| Runtime bootstrap and outputs | infra/aws/compose.yaml.tftpl, infra/aws/user-data.sh.tftpl, infra/aws/outputs.tf | Adds rendered Docker Compose (Postgres + node), user-data that installs Docker/Compose, discovers/formats/mounts data disk, fetches secrets into /opt/gitlawb/.env, writes compose, starts stack, and exposes Terraform outputs (elastic_ip, api_url, instance_id, data_volume_id, postgres_password_ssm_param, ssm session/upgrade commands). |

Sequence Diagram(s)

sequenceDiagram
  participant EC2 as EC2 instance
  participant UserData as user-data script
  participant Docker as Docker/Compose
  participant SSM as SSM Parameter Store
  participant EBS as EBS volume
  
  EC2->>UserData: cloud-init executes
  UserData->>Docker: install Docker + Compose
  UserData->>EBS: discover unmounted disk
  UserData->>EBS: format ext4 if needed
  UserData->>EBS: mount to /mnt/data
  UserData->>SSM: fetch postgres_password and optional secrets
  UserData->>UserData: write /opt/gitlawb/.env and compose.yaml
  UserData->>Docker: docker compose pull
  UserData->>Docker: docker compose up -d --remove-orphans
  Docker-->>EC2: Postgres + node running

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 A hop through clouds to AWS we go,
Terraform seeds where the volumes grow.
Compose and secrets snug in a file,
Snapshots rest in the night awhile.
SSM hums and the node wakes with a smile.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title "Infra reorg" is vague and does not convey meaningful information about the specific changes made in this pull request. | Use a more descriptive title that clearly indicates the main change, such as "Add AWS Terraform infrastructure configuration for gitlawb node deployment" or "Set up infrastructure with AWS deployment and Fly.io configuration reorganization". |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
infra/aws/main.tf (1)

165-172: ⚡ Quick win

Guard against opening SSH without a key pair.

When ssh_ingress_cidr is set but ssh_key_name is null, SSH gets exposed but isn’t actually usable. Add a plan-time precondition so this misconfiguration fails fast.

🔒 Suggested guardrail
 resource "aws_instance" "node" {
   ami                    = nonsensitive(data.aws_ssm_parameter.al2023_arm64.value)
   instance_type          = var.instance_type
   subnet_id              = local.subnet_id
   vpc_security_group_ids = [aws_security_group.node.id]
   iam_instance_profile   = aws_iam_instance_profile.node.name
   key_name               = var.ssh_key_name
   user_data              = local.user_data
   tags                   = merge(local.common_tags, { Name = var.name_prefix })

+  lifecycle {
+    precondition {
+      condition     = var.ssh_ingress_cidr == null || var.ssh_key_name != null
+      error_message = "ssh_key_name must be set when ssh_ingress_cidr is configured."
+    }
+    ignore_changes = [ami]
+  }
-
-  lifecycle {
-    # New AL2023 AMI releases shouldn't force-replace the instance; replace
-    # deliberately (taint) for OS upgrades. user_data changes also only apply
-    # at first boot — use the upgrade SSM command for image bumps.
-    ignore_changes = [ami]
-  }
 }

Also applies to: 251-252

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infra/aws/main.tf` around lines 165 - 172, Add a plan-time precondition to
the resource that creates the SSH rule (the block containing the for_each using
var.ssh_ingress_cidr) so Terraform fails if ssh_ingress_cidr is set but
ssh_key_name is null; implement a precondition like: condition =
var.ssh_ingress_cidr == null || var.ssh_key_name != null and a clear
error_message, referencing var.ssh_ingress_cidr and var.ssh_key_name to guard
the aws_security_group_rule (or the resource that contains the shown content)
from exposing unusable SSH access.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infra/aws/outputs.tf`:
- Around line 6-9: The output "api_url" currently reconstructs the endpoint from
aws_eip.node.public_ip and var.gitlawb_port and ignores the module override
var.public_url/local.public_url; change the output "api_url" value to return the
module's public_url when provided (i.e., use var.public_url or local.public_url
if non-empty) and only fall back to building
"http://${aws_eip.node.public_ip}:${var.gitlawb_port}" when public_url is unset,
so the output honors the module contract.

In `@infra/aws/README.md`:
- Around line 7-15: Add a language tag to the fenced code block that begins with
"Elastic IP ──► EC2 t4g.small (Amazon Linux 2023, arm64)" by changing the
opening ``` to ```text so the block is explicitly marked as plain text (this
fixes the MD040 markdownlint warning).

In `@infra/aws/user-data.sh.tftpl`:
- Around line 3-4: Remove command tracing around secret handling: change the
global "set -euxo pipefail" to "set -euo pipefail" (remove the -x) and/or
explicitly disable tracing just around the SSM fetch and .env rendering blocks
by inserting "set +x" before those sections and "set -x" after them; target the
existing "set -euxo pipefail" invocation and the secret-handling block (the SSM
fetch/.env rendering region referenced in the diff) so secrets are never written
to /var/log/gitlawb-bootstrap.log.
- Around line 14-16: The curl download of Docker Compose (the command using
COMPOSE_VERSION and uname -m) lacks retry and timeout controls; update that curl
invocation to add robust flags such as --fail, --retry with a nonzero count,
--retry-connrefused, --retry-delay (or --retry-max-time) and a global --max-time
to bound total time, so transient network failures won't break first-boot
provisioning and the download will time out deterministically.

In `@infra/aws/variables.tf`:
- Around line 51-55: The snapshot_retain_count variable allows zero/negative
values; add a Terraform validation block to enforce a lower bound (e.g.,
var.snapshot_retain_count >= 1) so invalid values are rejected at plan time;
update the variable "snapshot_retain_count" to include validation { condition =
var.snapshot_retain_count >= 1 error_message = "The snapshot_retain_count value
must be at least 1." } to prevent failures when aws_dlm_lifecycle_policy is applied.

In `@infra/README.md`:
- Around line 5-10: The fenced code block showing the infra directory tree is
missing a language tag (MD040); update the triple-backtick block that contains
the tree (the infra/ ├── fly/ ... └── aws/ lines) to use a language tag of
"text" (i.e. change ``` to ```text) so the markdownlint rule is satisfied.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: baa07535-3f9e-447a-8a18-3397b0454bad

📥 Commits

Reviewing files that changed from the base of the PR and between 09a3397 and 2cae505.

📒 Files selected for processing (13)
  • docs/RUN-A-NODE.md
  • infra/README.md
  • infra/aws/.gitignore
  • infra/aws/.terraform.lock.hcl
  • infra/aws/README.md
  • infra/aws/compose.yaml.tftpl
  • infra/aws/main.tf
  • infra/aws/outputs.tf
  • infra/aws/terraform.tfvars.example
  • infra/aws/user-data.sh.tftpl
  • infra/aws/variables.tf
  • infra/aws/versions.tf
  • infra/fly/fly.toml

Comment thread infra/aws/outputs.tf
Comment thread infra/aws/README.md Outdated
Comment thread infra/aws/user-data.sh.tftpl
Comment thread infra/aws/user-data.sh.tftpl Outdated
Comment thread infra/aws/variables.tf
Comment thread infra/README.md Outdated
- api_url output now honors the public_url override (local.public_url)
- disable xtrace around SSM secret fetch + .env write so secret values
  never reach the bootstrap log
- add retry/timeout flags to the docker compose download
- validate snapshot_retain_count (1-1000) at plan time
- add language tags to fenced tree blocks (MD040)

Co-Authored-By: OpenClaude <[email protected]>
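
The plan-time check on snapshot_retain_count mentioned in this commit could be written along the following lines. It is a sketch only: the description text and the default of 7 are assumptions for illustration, since the PR's actual variables.tf is not shown in this conversation. The 1-1000 range is taken from the commit message.

```hcl
variable "snapshot_retain_count" {
  description = "Number of daily DLM snapshots to keep."
  type        = number
  default     = 7 # hypothetical default; the PR's value is not shown here

  validation {
    # Reject zero, negative, and oversized values before any AWS call is made.
    condition     = var.snapshot_retain_count >= 1 && var.snapshot_retain_count <= 1000
    error_message = "The snapshot_retain_count value must be between 1 and 1000."
  }
}
```

A validation block takes `condition` and `error_message`; a value outside the range makes `terraform plan` fail with that message instead of failing later when the DLM policy is applied.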
kevincodex1 (Contributor, Author) commented

@coderabbitai please review

coderabbitai Bot commented Jun 5, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== infra/fly/fly.toml ==="
cat infra/fly/fly.toml

echo ""
echo "=== infra/README.md ==="
cat infra/README.md

echo ""
echo "=== infra/aws/README.md ==="
cat infra/aws/README.md

echo ""
echo "=== infra/aws/main.tf ==="
cat infra/aws/main.tf

echo ""
echo "=== infra/aws/variables.tf ==="
cat infra/aws/variables.tf

echo ""
echo "=== infra/aws/outputs.tf ==="
cat infra/aws/outputs.tf

echo ""
echo "=== infra/aws/user-data.sh.tftpl ==="
cat infra/aws/user-data.sh.tftpl

echo ""
echo "=== infra/aws/compose.yaml.tftpl ==="
cat infra/aws/compose.yaml.tftpl

echo ""
echo "=== infra/aws/versions.tf ==="
cat infra/aws/versions.tf

echo ""
echo "=== infra/aws/terraform.tfvars.example ==="
cat infra/aws/terraform.tfvars.example

echo ""
echo "=== infra/aws/.gitignore ==="
cat infra/aws/.gitignore

echo ""
echo "=== infra/aws/.terraform.lock.hcl ==="
cat infra/aws/.terraform.lock.hcl

echo ""
echo "=== docs/RUN-A-NODE.md ==="
cat docs/RUN-A-NODE.md

Length of output: 42487



@kevincodex1 Here is a detailed review of PR #27.


PR Review: Infra Reorg

Overall this is a solid, well-structured Terraform deployment with good security practices (SSM secrets, IMDSv2, encrypted EBS, prevent_destroy). A few issues worth addressing before merging to production:


🔴 Critical / High

1. Security Group VPC mismatch when subnet_id is set (infra/aws/main.tf)

aws_security_group.node is always created in the default VPC, but if a caller passes a custom subnet_id that lives in a non-default VPC, the security group and the instance end up in different VPCs. A security group can only be attached to instances in its own VPC, so AWS rejects the instance launch and terraform apply fails.

# current — always default VPC
resource "aws_security_group" "node" {
  vpc_id = data.aws_vpc.default.id
  ...
}

Fix: derive the VPC from the selected subnet instead:

data "aws_subnet" "selected" {
  id = local.subnet_id
}

resource "aws_security_group" "node" {
  vpc_id = data.aws_subnet.selected.vpc_id
  ...
}

2. s3_access_key_id embedded in user_data instead of SSM (infra/aws/main.tf, infra/aws/user-data.sh.tftpl)

s3_secret_access_key is correctly stored in SSM SecureString and fetched at boot. However s3_access_key_id (also marked sensitive = true in variables.tf) is passed as a Terraform template variable directly into user-data and written to .env without going through SSM:

# user-data.sh.tftpl — s3_access_key_id is a template substitution, not an SSM fetch
S3_ACCESS_KEY_ID=${s3_access_key_id}

This means the access key ID is:

  • Embedded in the EC2 user-data (visible in the AWS console to anyone with ec2:DescribeInstanceAttribute)
  • Stored in Terraform state in plaintext (in the rendered user_data string)

Fix: Create an SSM parameter for s3_access_key_id mirroring the s3_secret pattern and fetch it at boot time, or at minimum document that the IAM user for S3 should be scoped to a single bucket with no other privileges.


3. user_data changes will force-replace the instance (infra/aws/main.tf)

ignore_changes = [ami] is set on aws_instance.node, but user_data is not ignored. Any change to a variable that flows into the rendered user-data (e.g., bootstrap_peers, public_url, chain_rpc_url, or any optional integration key) will trigger a full instance replacement on the next terraform apply. Since user-data only runs at first boot this is likely unintentional and dangerous.

lifecycle {
  ignore_changes = [ami]
  # user_data changes only take effect on first boot;
  # add user_data here or document explicitly that tfvars changes
  # require a deliberate replace, not a routine apply
}

Fix: Either add user_data to ignore_changes (and document how to re-bootstrap if needed), or add a prominent warning in the README that changing any terraform.tfvars variable that feeds into user-data will replace the instance.


🟡 Medium

4. fly.toml app name is hardcoded to gitlawb-node-test (infra/fly/fly.toml)

app = "gitlawb-node-test"

This looks like a test configuration. Anyone running fly deploy -c infra/fly/fly.toml will deploy to gitlawb-node-test. Consider renaming to gitlawb-node or adding a comment that this must be changed before use.


5. DLM snapshot policy tag is too broad (infra/aws/main.tf)

target_tags = { Snapshot = "true" }

This will snapshot any EBS volume in the account with this tag, not just the one belonging to this stack. In a shared AWS account this could cause unexpected snapshots of unrelated volumes.

Fix: Use a more specific tag combination, for example:

target_tags = {
  Snapshot   = "true"
  ManagedBy  = "terraform"
  Project    = "gitlawb-node"
}

🟢 Low / Nit

6. docker compose up -d in user-data is missing --remove-orphans

The upgrade SSM document uses --remove-orphans but the first-boot user-data does not:

# user-data.sh.tftpl
docker compose up -d   # missing --remove-orphans

# upgrade SSM doc (main.tf)
docker compose pull && docker compose up -d --remove-orphans  # consistent

Minor inconsistency, but harmless on first boot. Consider adding for consistency.


7. No healthcheck on the node service (infra/aws/compose.yaml.tftpl)

The postgres service has a proper healthcheck. The node service has none. This means docker compose ps won't show a meaningful health status and the depends_on: condition: service_healthy pattern can't be used for anything depending on node. Consider adding a simple HTTP healthcheck:

healthcheck:
  test: ["CMD-SHELL", "curl -sf http://localhost:${gitlawb_port}/health || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 30s

✅ Positive highlights

  • SSM SecureString for all secrets with least-privilege IAM policy — well done
  • IMDSv2 enforcement (http_tokens = "required")
  • prevent_destroy on the EBS data volume
  • set +x disabling xtrace around the secret fetch and .env write
  • blkid guard before formatting the data volume (preserves data on reattach)
  • EBS volume retry loop handling Nitro NVMe attachment lag
  • Lock file committed for reproducible provider versions
  • api_url output correctly honouring the public_url override
  • Well-structured, readable documentation across all README files
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

- security group now derives its VPC from the selected subnet, so
  subnet_id overrides into non-default VPCs work
- s3_access_key_id moves to SSM SecureString (same path as the secret
  key) instead of being embedded in user-data and state
- ignore user_data drift on the instance: it only runs at first boot,
  so re-rendering caused pointless stop/starts; README documents the
  -replace workflow for config changes
- DLM snapshot targeting is stack-specific (Snapshot + Name tags), not
  any Snapshot=true volume in the account
- comment in fly.toml that it targets the shared test instance
- user-data compose up gains --remove-orphans (consistent with the
  upgrade SSM document)
- node service gets an HTTP /health healthcheck in compose

Co-Authored-By: OpenClaude <[email protected]>
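
The user_data drift handling from this commit might be expressed as in the sketch below. The two input variables are hypothetical stand-ins that keep the fragment self-contained; the PR renders user-data from a template and looks the AMI up through SSM.

```hcl
# Hypothetical inputs used only to keep this fragment self-contained.
variable "ami_id" {
  type = string
}

variable "user_data" {
  type = string
}

resource "aws_instance" "node" {
  ami           = var.ami_id
  instance_type = "t4g.small"
  user_data     = var.user_data

  lifecycle {
    # Neither a new AMI release nor a re-rendered user-data script should
    # touch the running instance on a routine apply.
    ignore_changes = [ami, user_data]
  }
}
```

With both attributes ignored, configuration changes that flow through user-data only reach the instance when it is replaced on purpose. The README's exact wording is not shown in this conversation, but a deliberate replacement is presumably something like `terraform apply -replace=aws_instance.node`.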
coderabbitai Bot left a comment

🧹 Nitpick comments (1)
infra/aws/main.tf (1)

75-81: 💤 Low value

Consider supporting a customer-managed KMS key (optional).

The static analysis hint flags that this SSM parameter uses the AWS-managed aws/ssm key rather than a customer-managed CMK. For production deployments requiring stricter key rotation policies or cross-account audit trails, adding an optional kms_key_id variable (defaulting to null for AWS-managed) would provide flexibility:

key_id = var.ssm_kms_key_id  # null → AWS-managed default

This is acceptable as-is for most use cases, especially given the explicit acknowledgment at lines 94-96.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@infra/aws/main.tf` around lines 75 - 81, The aws_ssm_parameter resource
aws_ssm_parameter.s3_access_key currently stores a SecureString using the
AWS-managed key; add optional support for a customer-managed KMS key by
introducing a new variable (e.g., var.ssm_kms_key_id with default null) and set
the resource's key_id attribute to that variable (key_id = var.ssm_kms_key_id)
so null continues to use AWS-managed aws/ssm while allowing a CMK when provided;
update any docs/variables.tf accordingly and keep the existing
count/name/type/value/tags logic in aws_ssm_parameter.s3_access_key.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 87a7be83-4f60-46cc-94bb-d03978ff2f00

📥 Commits

Reviewing files that changed from the base of the PR and between b7c602d and 81dbc68.

📒 Files selected for processing (6)
  • infra/aws/README.md
  • infra/aws/compose.yaml.tftpl
  • infra/aws/main.tf
  • infra/aws/terraform.tfvars.example
  • infra/aws/user-data.sh.tftpl
  • infra/fly/fly.toml
✅ Files skipped from review due to trivial changes (3)
  • infra/fly/fly.toml
  • infra/aws/terraform.tfvars.example
  • infra/aws/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • infra/aws/compose.yaml.tftpl
  • infra/aws/user-data.sh.tftpl

Add ssm_kms_key_id (default null = AWS-managed aws/ssm key). When set,
all five SecureString parameters are encrypted with the CMK and the
instance role gains kms:Decrypt scoped to that key — without it the
boot-time secret fetch would fail.

Co-Authored-By: OpenClaude <[email protected]>
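
As a rough illustration of the optional key wiring described in this commit, a single parameter could be set up as below. Only the `ssm_kms_key_id` variable name comes from the PR; the `example_secret` variable, the resource name, and the parameter path are hypothetical.

```hcl
variable "ssm_kms_key_id" {
  description = "Optional customer-managed KMS key for SecureString parameters."
  type        = string
  default     = null # null keeps the AWS-managed aws/ssm key
}

# Hypothetical secret input used only for this sketch.
variable "example_secret" {
  type      = string
  sensitive = true
}

resource "aws_ssm_parameter" "example" {
  name   = "/gitlawb-example/secret" # hypothetical path
  type   = "SecureString"
  value  = var.example_secret
  key_id = var.ssm_kms_key_id
}
```

Passing null leaves the `key_id` argument unset, so the default key is used. When a key is supplied, the instance role also needs kms:Decrypt on it, as the commit message notes.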
coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infra/aws/main.tf`:
- Around line 134-145: The KMS Decrypt statement currently grants broad access
whenever var.ssm_kms_key_id != null; update the IAM statement that builds
Statement (the block referencing local.secret_param_arns and
data.aws_kms_key.ssm[0].arn) to add a Condition restricting kms:Decrypt to SSM
usage by including kms:ViaService = "ssm.<region>.amazonaws.com" (use the
module/stack region value) and an EncryptionContext condition using the
PARAMETER_ARN key (kms:EncryptionContext:PARAMETER_ARN) limited to
local.secret_param_arns so decrypt can only be performed for the SSM Parameter
ARNs you manage rather than the entire CMK.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b68064eb-b00c-4b6d-8dc2-86878ea5e6cf

📥 Commits

Reviewing files that changed from the base of the PR and between 81dbc68 and a759c01.

📒 Files selected for processing (4)
  • infra/aws/README.md
  • infra/aws/main.tf
  • infra/aws/terraform.tfvars.example
  • infra/aws/variables.tf
✅ Files skipped from review due to trivial changes (1)
  • infra/aws/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • infra/aws/terraform.tfvars.example
  • infra/aws/variables.tf

Comment thread infra/aws/main.tf
The grant on the customer-managed key was unconditioned, allowing the
instance role to decrypt any ciphertext under the same CMK. Restrict it
with kms:ViaService = ssm.<region>.amazonaws.com and
kms:EncryptionContext:PARAMETER_ARN limited to the stack's parameter
ARNs.

Co-Authored-By: OpenClaude <[email protected]>
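
The restricted grant could be sketched with an aws_iam_policy_document data source as follows. This is an assumption about shape rather than the merged code: the PR may build its policy with jsonencode instead, and the three input variables here are hypothetical stand-ins for the stack's region, key ARN, and local.secret_param_arns.

```hcl
# Hypothetical inputs standing in for values the PR derives internally.
variable "aws_region" {
  type = string
}

variable "kms_key_arn" {
  type = string
}

variable "secret_param_arns" {
  type = list(string)
}

data "aws_iam_policy_document" "kms_decrypt" {
  statement {
    sid       = "DecryptStackParametersOnly"
    effect    = "Allow"
    actions   = ["kms:Decrypt"]
    resources = [var.kms_key_arn]

    # Only allow decryption requests that arrive through SSM in this region.
    condition {
      test     = "StringEquals"
      variable = "kms:ViaService"
      values   = ["ssm.${var.aws_region}.amazonaws.com"]
    }

    # Only allow ciphertext whose encryption context names one of the
    # stack's own parameters.
    condition {
      test     = "StringEquals"
      variable = "kms:EncryptionContext:PARAMETER_ARN"
      values   = var.secret_param_arns
    }
  }
}
```

Both conditions must hold for a request to be allowed, so the role cannot use the key directly or decrypt parameters that belong to another stack sharing the same CMK.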
kevincodex1 merged commit 9e6e832 into main on Jun 5, 2026 (6 checks passed)