2 changes: 2 additions & 0 deletions README.md
@@ -38,6 +38,7 @@ AWS cost tooling is powerful, but day-to-day cost visibility can still feel frag
- AWS account onboarding through a standardized `AssumeRole` flow
- Persisted reporting model backed by PostgreSQL rather than live Cost Explorer requests on every view
- Budget alerts and notification workflows backed by worker processes
- EventBridge-scheduled Lambda execution for recurring verified-account cost sync
- SES-backed auth and alert email delivery
- Terraform-managed infrastructure for DNS, SES, CI/CD bootstrap, ECS, RDS, S3, and CloudFront

@@ -189,6 +190,7 @@ npm test
- the current production pattern uses:
  - `underflow.<domain>` for the web frontend
  - `api.underflow.<domain>` for the API
- scheduled cost sync runs through EventBridge + Lambda, while the ECS worker stays focused on alert evaluation
- Terraform can provision:
  - Route 53 hosted zone, SES identity, DKIM, MAIL FROM, and DMARC records
  - bootstrap CI/CD infrastructure such as Terraform remote state and GitHub OIDC
15 changes: 15 additions & 0 deletions docs/architecture.md
@@ -71,13 +71,25 @@ Infrastructure code provisions the production AWS footprint and supporting integ
- A scheduled Lambda can run cost sync across all verified AWS accounts on a fixed interval
- Reporting endpoints expose summary, by-service, timeseries, and sync history views
- The frontend presents this through workspace-scoped dashboards and detail pages
- Manual syncs and scheduled syncs share the same persistence path and use advisory locks to avoid duplicate per-account work
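
The advisory-lock behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `LockClient` interface, the `COST_SYNC_LOCK_NAMESPACE` value, and the hash used to derive a key are all assumptions. It relies only on PostgreSQL's two-integer `pg_try_advisory_lock` / `pg_advisory_unlock` functions.

```typescript
// Minimal query interface so the sketch does not depend on a specific driver.
// A connected client from a PostgreSQL driver would typically fit this shape.
interface LockClient {
  query(text: string, values: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical namespace for cost-sync locks (first key of the two-integer form).
const COST_SYNC_LOCK_NAMESPACE = 4201;

// Map an account identifier to a signed 32-bit integer using an FNV-1a hash.
function accountLockKey(accountId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < accountId.length; i++) {
    hash ^= accountId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash | 0; // keep the value in the signed int32 range
}

// Run `work` only if no other session holds the lock for this account.
// Resolves to undefined when the lock is already held, so the caller can skip.
async function withAccountLock<T>(
  client: LockClient,
  accountId: string,
  work: () => Promise<T>,
): Promise<T | undefined> {
  const keys = [COST_SYNC_LOCK_NAMESPACE, accountLockKey(accountId)];
  const res = await client.query(
    "SELECT pg_try_advisory_lock($1::int, $2::int) AS locked",
    keys,
  );
  if (res.rows[0]?.locked !== true) return undefined;
  try {
    return await work();
  } finally {
    // Session-level advisory locks must be released on the same connection.
    await client.query("SELECT pg_advisory_unlock($1::int, $2::int)", keys);
  }
}
```

Because the non-blocking `pg_try_advisory_lock` variant returns immediately, a manual sync and a scheduled sync that race on the same account would result in one of them skipping rather than waiting.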

### Alerts and notifications

- Alert rules are attached to a workspace, optionally scoped to a specific AWS account
- The ECS worker evaluates active alerts on a schedule
- Notification delivery and status are persisted and surfaced in the frontend feed

## Runtime Ownership

- ECS API
  - handles browser/app HTTP traffic
  - owns auth, workspace management, AWS account onboarding, reporting APIs, and manual sync triggers
- ECS worker
  - handles scheduled alert evaluation and related background work
- Lambda + EventBridge
  - handles recurring verified-account cost sync every 6 hours
  - writes CloudWatch invocation logs and DB-backed sync history through existing `cost_sync_runs`
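
As an illustration of the Lambda side of this split, a scheduled handler could look roughly like the sketch below. The `SyncDeps` interface, the function names, and the log message shapes are hypothetical; the real service functions are not shown in this change. The sketch assumes that one failing account should be counted and logged rather than aborting the whole run.

```typescript
// Hypothetical dependency shapes, injected so the sketch stays self-contained.
interface SyncDeps {
  listVerifiedAccountIds(): Promise<string[]>;
  syncAccount(accountId: string): Promise<void>;
}

interface SyncSummary {
  attempted: number;
  succeeded: number;
  failed: number;
}

// Build a Lambda-style handler that syncs every verified account in turn.
function createScheduledSyncHandler(deps: SyncDeps) {
  return async function handler(): Promise<SyncSummary> {
    const accountIds = await deps.listVerifiedAccountIds();
    const summary: SyncSummary = { attempted: accountIds.length, succeeded: 0, failed: 0 };
    for (const accountId of accountIds) {
      try {
        await deps.syncAccount(accountId);
        summary.succeeded += 1;
      } catch (error) {
        // Record the failure and continue with the remaining accounts.
        summary.failed += 1;
        console.error(JSON.stringify({ msg: "account sync failed", accountId, error: String(error) }));
      }
    }
    // One structured line per invocation, which ends up in CloudWatch.
    console.log(JSON.stringify({ msg: "scheduled cost sync finished", ...summary }));
    return summary;
  };
}
```

Injecting the account lookup and the per-account sync keeps the handler thin, so the same persistence path can be reused by manual sync triggers in the API.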

## Email / SES Integration Boundary

Email is treated as a real integration boundary rather than a mocked afterthought.
@@ -93,6 +105,9 @@ Email is treated as a real integration boundary rather than a mocked afterthough
- Background processing is intentionally split by responsibility:
  - Lambda handles scheduled cost sync
  - ECS worker handles alert evaluation
- Runtime configuration is intentionally split as well:
  - shared DB/AWS/logging config is used by API, worker, and Lambda
  - auth/cookie-specific config is validated only in the API runtime
- Some cloud integrations are fully wired but still benefit from live-account validation before they should be considered fully hardened

## What A Reviewer Should Notice
52 changes: 52 additions & 0 deletions docs/production-deployment.md
@@ -77,6 +77,8 @@ npm install
npm run build:lambda
```

The Lambda artifact is built from the API codebase and must exist before Terraform can package it locally.

### 4. Run the first production apply

The deploy workflow is designed to own the ongoing rollout, but it is still useful to understand the shape:
@@ -155,6 +157,7 @@ Add these as `production` environment secrets in GitHub:
- applies Terraform with the image URI
- runs migrations through ECS
- waits for API and worker services to stabilize
- updates the scheduled sync Lambda code package and handler configuration when Lambda-related changes are present

### `deploy-web.yml`

@@ -187,6 +190,7 @@ For the split-domain setup, prefer leaving `AUTH_COOKIE_DOMAIN` empty so auth co
- writes visible execution history through existing `cost_sync_runs` rows
- emits invocation-level logs to CloudWatch
- syncs all verified AWS accounts while relying on advisory locks to avoid duplicate per-account work
- validates only the shared runtime env needed for DB/AWS/logging rather than API-only auth/cookie config

### Web

@@ -213,5 +217,53 @@ Run these checks immediately after the first deployment:
## Rollback Guidance

- roll back API/worker by redeploying the previous image tag
- roll back the scheduled sync Lambda by applying the previous Terraform/code revision if the issue is limited to recurring sync
- redeploy the previous web build if the issue is frontend-only
- if a migration introduced the problem, stop and restore from backup rather than improvising production SQL

## Lambda Troubleshooting

If the scheduled sync Lambda fails in production:

1. Inspect the Lambda invocation response with tail logs:

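```bash
# Invoke the function and print the tail of its execution log
# (returned base64-encoded in LogResult, limited to the last 4 KB).
aws lambda invoke \
  --region us-west-2 \
  --function-name underflow-prod-scheduled-cost-sync \
  --log-type Tail \
  response.json \
  --query 'LogResult' \
  --output text | base64 --decode

# Show the function's response payload.
cat response.json
```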

2. Check the configured handler:

```bash
aws lambda get-function-configuration \
  --region us-west-2 \
  --function-name underflow-prod-scheduled-cost-sync \
  --query 'Handler' \
  --output text
```

Expected handler:

```text
dist/jobs/scheduled-cost-sync-handler.handler
```

3. If local `terraform apply` is being used, rebuild the Lambda artifact first:

```powershell
cd apps\api
npm run build:lambda
```

4. If the Lambda still fails before structured app logs appear, look for bootstrap errors such as:

- `Runtime.HandlerNotFound`
- missing module/package errors
- missing required runtime environment variables
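
To make the triage above a little more mechanical, the decoded tail log can be scanned for known markers. The sketch below is only an illustration: `Runtime.HandlerNotFound` comes from the list above, `Runtime.ImportModuleError` and `Cannot find module` are what the Node.js runtime typically reports for missing modules, and the environment-variable pattern is a guess at how the application words that failure.

```typescript
// Markers for failures that happen before structured app logs appear.
// The third pattern is an assumption about the app's error wording.
const BOOTSTRAP_MARKERS: Array<{ pattern: RegExp; hint: string }> = [
  { pattern: /Runtime\.HandlerNotFound/, hint: "handler string does not match an exported function" },
  { pattern: /Runtime\.ImportModuleError|Cannot find module/, hint: "missing module or package in the artifact" },
  { pattern: /environment variable/i, hint: "possible missing runtime environment variable" },
];

// Return a hint for every marker found in the already-decoded tail log text.
function diagnoseBootstrapLog(decodedLog: string): string[] {
  return BOOTSTRAP_MARKERS
    .filter((marker) => marker.pattern.test(decodedLog))
    .map((marker) => marker.hint);
}
```

The input is expected to be the text printed by the `base64 --decode` step in the first command, so no decoding happens here.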
27 changes: 27 additions & 0 deletions docs/production-operations.md
@@ -13,6 +13,7 @@ This document is the minimum viable runbook for operating Underflow with a first
- scheduled cost sync Lambda
  - runs periodic verified-account sync every 6 hours
  - writes invocation logs to CloudWatch and sync history through existing DB tables
  - uses shared runtime DB/AWS/logging config rather than the API-only auth/cookie config
- `apps/web`
  - static frontend served separately from the API
- PostgreSQL
@@ -55,6 +56,13 @@ Production defaults and expectations:
6. Deploy the frontend.
7. Verify health and a basic authenticated page load.

If deploying locally through Terraform instead of GitHub Actions, rebuild the Lambda artifact before `plan` or `apply`:

```powershell
cd apps\api
npm run build:lambda
```

### Roll back

1. Roll back the API and worker to the last known good image/build.
@@ -87,6 +95,8 @@ Recommended counters to track from logs:

- successful sync count
- failed sync count
- scheduled Lambda invocation count
- scheduled Lambda failure count
- successful alert delivery count
- failed alert delivery count
- failed auth email delivery count
@@ -112,6 +122,23 @@ Check:
- the selected account/date filters in the UI are correct
- sync history does not show Cost Explorer permission or data-availability errors

### Scheduled sync Lambda fails before app logs appear

Check:

- the deployed Lambda handler matches:
  - `dist/jobs/scheduled-cost-sync-handler.handler`
- the latest Lambda artifact was rebuilt before Terraform applied it
- the Lambda invoke response includes tail logs from:
  - `aws lambda invoke --log-type Tail ...`
- required shared runtime env is present:
  - `DATABASE_URL`
  - `DATABASE_SSL_ENABLED`
  - `DATABASE_SSL_REJECT_UNAUTHORIZED`
  - `AWS_SES_REGION`
  - `COST_SYNC_LOOKBACK_DAYS`
  - `LOG_LEVEL`
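
The environment check in the list above could be expressed as a small helper along these lines. The variable names are taken from that list; the function name and the rule that blank values count as missing are assumptions, and the project's real validation may be stricter (for example, parsing booleans and numbers).

```typescript
// Shared runtime variables named in the checklist above.
// API-only auth/cookie settings are deliberately not part of this list.
const SHARED_RUNTIME_ENV = [
  "DATABASE_URL",
  "DATABASE_SSL_ENABLED",
  "DATABASE_SSL_REJECT_UNAUTHORIZED",
  "AWS_SES_REGION",
  "COST_SYNC_LOOKBACK_DAYS",
  "LOG_LEVEL",
] as const;

// Return the names of shared variables that are unset or blank.
function findMissingSharedEnv(env: Record<string, string | undefined>): string[] {
  return SHARED_RUNTIME_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling `findMissingSharedEnv(process.env)` at the top of the Lambda entry point and failing with the returned names would make a missing variable visible in the tail logs instead of surfacing later as an opaque bootstrap error.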

### SES / email failures

Check: