Merged
9 changes: 8 additions & 1 deletion deploy/edge/.gitignore
@@ -1,4 +1,11 @@
# Secrets and generated artifacts — never commit
# Secrets and generated artifacts — never commit. Live only on the box under
# deploy/edge/secrets/ (legacy) and /srv/trakrf/secrets/ (runtime).
secrets/.env
secrets/cloudflared.env
secrets/mosquitto/passwd
secrets/traefik/certs/
secrets/traefik/lego/
# legacy paths (pre-TRA-988), keep ignored until the box is migrated + cleaned
.env
cloudflared.env
mosquitto/passwd
123 changes: 94 additions & 29 deletions deploy/edge/README.md
@@ -2,76 +2,139 @@

Rootless Podman quadlets for the offline demo box (`trakrf-demo`). Hosts the
backend + Timescale + Mosquitto + a Traefik TLS edge, all systemd-managed.
Design spec: `docs/superpowers/specs/2026-06-07-deploy-edge-design.md`.
Design specs: `docs/superpowers/specs/2026-06-07-deploy-edge-design.md`,
`docs/superpowers/specs/2026-06-13-srv-trakrf-runtime-layout-design.md` (TRA-988).

Tim drives the demo from his laptop at **`https://app.demo.trakrf.id`** over the
Slate WiFi. Break-glass = a shell over the tailnet (`systemctl --user`,
`journalctl --user -u <svc>`, `podman`).

## Layout
## Runtime layout — `/srv/trakrf` (TRA-988)

The running stack reads **only** from `/srv/trakrf`, never from this git working
tree. The repo is the source of truth; `install.sh` deploys it. This means a
branch switch / pull / `reset` on the checkout can never pull live config out
from under the services.

```
/srv/trakrf/
quadlets/ *.container, trakrf.network → symlinked into ~/.config/containers/systemd/
config/ mosquitto/mosquitto.conf, traefik/{traefik,dynamic}.yaml
scripts/ trakrf-backup.sh
systemd/ trakrf-backup.{service,timer} → symlinked into ~/.config/systemd/user/
secrets/ .env cloudflared.env mosquitto/passwd traefik/certs/ (chmod 600; hand-placed; never in git)
backups/ trakrf-YYYYMMDD-HHMMSS.sql.gz (daily pg_dump)
```

TimescaleDB data stays on the Podman named volume `timescale_data`.
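
One way to confirm that the linked units really resolve into `/srv/trakrf` rather than the checkout is to resolve each symlink and compare prefixes. The sketch below is illustrative only: `links_outside_root` is a hypothetical helper, not something `install.sh` ships, and it assumes a `readlink` that supports `-f` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print every symlink in DIR whose
# resolved target is not under ROOT. Prints nothing when all links are inside ROOT.
links_outside_root() {
  local dir=$1 root link target
  root=$(readlink -f "$2") || return 1
  for link in "$dir"/*; do
    [ -L "$link" ] || continue            # skip regular files and an unmatched glob
    target=$(readlink -f "$link") || target=""
    case "$target" in
      "$root"/*) ;;                       # resolves inside the runtime root
      *) printf '%s -> %s\n' "$link" "${target:-<unresolvable>}" ;;
    esac
  done
}
```

For example, `links_outside_root ~/.config/containers/systemd /srv/trakrf` would be expected to print nothing for the units that `install.sh` links; any line it does print names a unit still pointing somewhere else.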

## Repo layout (source of truth)

| Path | What |
|---|---|
| `quadlets/*.container`, `*.network` | the 5 services + user network (symlinked into `~/.config/containers/systemd/`) |
| `install.sh` | symlink quadlets + `systemctl --user daemon-reload` |
| `quadlets/*.container`, `*.network` | the 5 services + user network (bind-mount from `/srv/trakrf`) |
| `config/mosquitto/mosquitto.conf` | broker config (plain `:1883`, basic auth) |
| `config/traefik/{traefik,dynamic}.yaml` | edge static + dynamic config |
| `scripts/trakrf-backup.sh` | `pg_dump` → `/srv/trakrf/backups` |
| `systemd/trakrf-backup.{service,timer}` | daily backup user timer |
| `install.sh` | deploy `config/`+`quadlets/`+`scripts/`+`systemd/` → `/srv/trakrf`, link + reload units, enable backup timer |
| `db-init.sh` | one-time DB bootstrap (trakrf schema, search_path, obfuscation key) |
| `mosquitto/mosquitto.conf` | broker config (plain `:1883`, basic auth) |
| `traefik/traefik.yaml`, `dynamic.yaml` | edge static + dynamic config |
| `smoke-test.sh` | broker→subscriber→ingest proof |
| `.env.example` | template; copy to `.env` (gitignored) and fill |
| `secrets/*.example` | templates; real secrets live only on the box under `/srv/trakrf/secrets/` |

## First-time bring-up (fresh box)

```bash
# 1. Host prereqs (one time)
sudo apt-get install -y podman mosquitto-clients
sudo apt-get install -y podman mosquitto-clients rsync
loginctl enable-linger "$USER"
echo 'net.ipv4.ip_unprivileged_port_start=443' | sudo tee /etc/sysctl.d/99-trakrf-rootless-ports.conf
sudo sysctl --system # lets rootless Traefik bind :443

# 2. Secrets -> .env
cp deploy/edge/.env.example deploy/edge/.env
# 2. Runtime root (one time)
sudo mkdir -p /srv/trakrf && sudo chown "$(id -un):$(id -gn)" /srv/trakrf
mkdir -p /srv/trakrf/secrets/mosquitto /srv/trakrf/secrets/traefik/certs

# 3. Secrets -> /srv/trakrf/secrets/.env (runtime reads here)
cp deploy/edge/secrets/.env.example /srv/trakrf/secrets/.env
PGPW=$(openssl rand -hex 16); MQPW=$(openssl rand -hex 12)
sed -i "s|POSTGRES_PASSWORD=CHANGEME|POSTGRES_PASSWORD=$PGPW|;s|postgres://postgres:CHANGEME@|postgres://postgres:$PGPW@|" deploy/edge/.env
sed -i "s|mqtt://trakrf-mqtt:CHANGEME@|mqtt://trakrf-mqtt:$MQPW@|" deploy/edge/.env
sed -i "s|JWT_SECRET=CHANGEME|JWT_SECRET=$(openssl rand -hex 32)|" deploy/edge/.env
sed -i "s|OBFUSCATION_KEY=CHANGEME|OBFUSCATION_KEY=$(openssl rand -hex 32)|" deploy/edge/.env
sed -i "s|POSTGRES_PASSWORD=CHANGEME|POSTGRES_PASSWORD=$PGPW|;s|postgres://postgres:CHANGEME@|postgres://postgres:$PGPW@|" /srv/trakrf/secrets/.env
sed -i "s|mqtt://trakrf-mqtt:CHANGEME@|mqtt://trakrf-mqtt:$MQPW@|" /srv/trakrf/secrets/.env
sed -i "s|JWT_SECRET=CHANGEME|JWT_SECRET=$(openssl rand -hex 32)|" /srv/trakrf/secrets/.env
sed -i "s|OBFUSCATION_KEY=CHANGEME|OBFUSCATION_KEY=$(openssl rand -hex 32)|" /srv/trakrf/secrets/.env
# broker passwd (hashed) — same value as MQTT_URL above
touch deploy/edge/mosquitto/passwd
podman run --rm -v "$PWD/deploy/edge/mosquitto/passwd:/passwd:Z" \
touch /srv/trakrf/secrets/mosquitto/passwd
podman run --rm -v "/srv/trakrf/secrets/mosquitto/passwd:/passwd:Z" \
--entrypoint mosquitto_passwd docker.io/library/eclipse-mosquitto:2.0.21 -b /passwd trakrf-mqtt "$MQPW"
chmod 600 deploy/edge/mosquitto/passwd
chmod 600 /srv/trakrf/secrets/mosquitto/passwd
# rootless: hand the file to the container's mosquitto uid (1883) so the broker can read
# it at 0600 (mosquitto runs as 1883, not container-root). Re-run after any passwd change.
podman unshare chown 1883:1883 deploy/edge/mosquitto/passwd
podman unshare chown 1883:1883 /srv/trakrf/secrets/mosquitto/passwd

# 3. Install quadlets + start (Timescale must be up before db-init/migrate)
deploy/edge/install.sh
# 4. Deploy + start (Timescale must be up before db-init/migrate)
deploy/edge/install.sh # config+quadlets+scripts+systemd -> /srv/trakrf; links units; enables backup timer
systemctl --user start timescaledb.service
deploy/edge/db-init.sh # schema + search_path + obfuscation key
systemctl --user start traefik.service # pulls up migrate -> backend via deps
systemctl --user enable --now podman-auto-update.timer

# 4. Verify
# 5. Verify
curl -fsS http://127.0.0.1:8080/health
deploy/edge/smoke-test.sh
```

On a box whose volume is already initialized, a reboot self-starts everything
(linger + `Restart=always` + `[Install] WantedBy=default.target`).
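
Before starting services, it can help to check that the `sed` substitutions in step 3 replaced every placeholder and that the env file is private. The following is a minimal sketch under stated assumptions: `check_env_file` is a hypothetical name, and the check only looks for the literal string `CHANGEME` used by the template.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repo): fail if the env file is
# missing, still contains a CHANGEME placeholder, or is accessible to group/others.
check_env_file() {
  local f=$1 perms
  [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  if grep -q 'CHANGEME' "$f"; then
    echo "placeholder still present in $f (keys listed, values hidden):" >&2
    grep 'CHANGEME' "$f" | cut -d= -f1 >&2
    return 1
  fi
  # characters 5-10 of the ls mode string are the group and other permission bits
  perms=$(ls -ld "$f" | cut -c5-10)
  [ "$perms" = "------" ] || { echo "$f is accessible to group/others; chmod 600 it" >&2; return 1; }
}
```

A possible use is `check_env_file /srv/trakrf/secrets/.env && systemctl --user start timescaledb.service`, so a half-filled template stops the bring-up early.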

## Migrating an existing box (deploy/edge bind-mounts → /srv/trakrf)

Reversible, no hardware needed; the old `deploy/edge` working-tree config stays
in place as instant rollback until verified.

```bash
sudo mkdir -p /srv/trakrf && sudo chown "$(id -un):$(id -gn)" /srv/trakrf
# seed secrets from the live box, by hand (install.sh never touches secrets/)
mkdir -p /srv/trakrf/secrets/mosquitto /srv/trakrf/secrets/traefik
cp deploy/edge/.env /srv/trakrf/secrets/.env
cp deploy/edge/cloudflared.env /srv/trakrf/secrets/cloudflared.env
cp deploy/edge/mosquitto/passwd /srv/trakrf/secrets/mosquitto/passwd
cp -a deploy/edge/traefik/certs /srv/trakrf/secrets/traefik/certs
chmod -R go-rwx /srv/trakrf/secrets

deploy/edge/install.sh # repoints quadlet symlinks at /srv/trakrf
systemctl --user start trakrf-backup.service # smoke-test one dump
# restart one at a time, verifying each:
for u in timescaledb mosquitto backend traefik cloudflared; do
systemctl --user restart "$u".service; sleep 5
podman ps --format '{{.Names}} {{.Status}}' | grep "$u"
done
```
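
The loop above waits a fixed `sleep 5` before checking each container. As an alternative sketch, a small polling helper can wait only as long as needed and fail loudly on timeout. `retry_until` is a hypothetical helper, not part of the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command roughly once per second until it succeeds
# or TIMEOUT retries have been used. Returns 0 on success, 1 on timeout.
retry_until() {
  local timeout=$1; shift
  local waited=0
  until "$@"; do
    [ "$waited" -lt "$timeout" ] || return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```

Inside the loop this could read `retry_until 30 systemctl --user is-active --quiet "$u".service || break`, which stops the rollout at the first unit that does not come back.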

**Rollback** a single service to the old paths: point its symlink back and reload —
```bash
ln -sf "$PWD/deploy/edge/quadlets/<unit>.container" ~/.config/containers/systemd/<unit>.container
systemctl --user daemon-reload && systemctl --user restart <unit>.service
```

## Backups

`trakrf-backup.timer` runs `scripts/trakrf-backup.sh` daily (04:00, `Persistent=true`)
→ `pg_dump | gzip` to `/srv/trakrf/backups/trakrf-<UTC>.sql.gz`, keeping the last
14. Run on demand: `systemctl --user start trakrf-backup.service`. Restore:
`gunzip -c /srv/trakrf/backups/<file>.sql.gz | podman exec -i timescaledb psql -U postgres -d postgres`.
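
The "keep the last 14" retention can be tried safely against a scratch directory. The sketch below wraps the same `ls -1t | tail | xargs rm` pipeline the backup script uses in a hypothetical function, `prune_backups`, with the directory and count as parameters. It assumes file names without whitespace (true for the timestamped names here) and an `xargs` that supports `-r`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the script's retention step: delete all but the
# newest KEEP files (by modification time) matching trakrf-*.sql.gz in DIR.
prune_backups() {
  local dir=$1 keep=$2
  ls -1t "$dir"/trakrf-*.sql.gz 2>/dev/null \
    | tail -n +"$((keep + 1))" \
    | xargs -r rm -f
}
```

Because `ls -t` orders by modification time rather than by name, copying or touching old dumps changes which ones count as newest.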

## TLS cert (`app.demo.trakrf.id`)

Issued out-of-band via Let's Encrypt **Cloudflare DNS-01** (the box is offline at
the venue, so no runtime ACME). Scoped name, **not** the `*.trakrf.id` wildcard.

```bash
export CLOUDFLARE_DNS_API_TOKEN=<cloudflare DNS-edit token for trakrf.id>
podman run --rm -e CLOUDFLARE_DNS_API_TOKEN -v "$PWD/deploy/edge/traefik/lego:/.lego:Z" \
podman run --rm -e CLOUDFLARE_DNS_API_TOKEN -v "/srv/trakrf/secrets/traefik/lego:/.lego:Z" \
docker.io/goacme/lego:latest run --accept-tos --email [email protected] \
--dns cloudflare --domains app.demo.trakrf.id
cp deploy/edge/traefik/lego/certificates/app.demo.trakrf.id.{crt,key} deploy/edge/traefik/certs/
chmod 600 deploy/edge/traefik/certs/app.demo.trakrf.id.key
cp /srv/trakrf/secrets/traefik/lego/certificates/app.demo.trakrf.id.{crt,key} /srv/trakrf/secrets/traefik/certs/
chmod 600 /srv/trakrf/secrets/traefik/certs/app.demo.trakrf.id.key
systemctl --user restart traefik.service
```

@@ -83,7 +146,9 @@ Tracks the floating `ghcr.io/trakrf/backend:preview` tag via `AutoUpdate=registr
+ `podman-auto-update.timer`. Updates pull only when the box has uplink (prep /
between-demos on house WiFi) — **never during a demo** (box is offline). Migrate
runs before serve on every update (backend `Requires=migrate`). Stay hands-off on
`preview` during demo windows. *Next iteration:* a `demo` tag that defaults to
`preview` during demo windows. **Caution:** `preview` is a moving tag — changes
merged while the box is offline (e.g. shipping) land on next uplink. Pinning to a
stable tag is a tracked follow-up. *Next iteration:* a `demo` tag that defaults to
tracking `prod`, with manual `preview → demo` promotion.

## Network (Slate-side, separate from this box)
@@ -101,9 +166,9 @@ tracking `prod`, with manual `preview → demo` promotion.

## Known / follow-ups

- The simulated-MQTT smoke test proves broker→subscriber→ingest. Full
`asset_scans` derivation + geofence **fire** need a registered
scan_device/scan_point + output device — provisioned by **real CS463/Shelly
onboarding** (or a demo-data fixture). Validate that path with hardware.
- Unclean-shutdown resilience (rootless port-forward wedge after a hard power
loss) + clean-shutdown via power button — developed/plug-pull-tested on a home
box, separate tickets.
- Pin off the floating `:preview` tag (see Updates) — separate ticket.
- gnome-kiosk deprioritized (laptop-driven demos); reopen for a trade-show booth.
- Prometheus/Grafana = TRA-908 fast-follow (+2 quadlets).
8 changes: 4 additions & 4 deletions deploy/edge/db-init.sh
@@ -8,10 +8,10 @@
# These settings persist in the Postgres catalog (survive restarts); only a fresh
# volume needs a re-run.
set -euo pipefail
cd "$(dirname "$0")"
[ -f .env ] || { echo "deploy/edge/.env missing (cp .env.example .env and fill it)"; exit 1; }
KEY=$(grep -oP '^OBFUSCATION_KEY=\K.*' .env || true)
[ -n "${KEY:-}" ] && [ "$KEY" != CHANGEME ] || { echo "OBFUSCATION_KEY not set in .env"; exit 1; }
ENV_FILE=/srv/trakrf/secrets/.env
[ -f "$ENV_FILE" ] || { echo "$ENV_FILE missing (see deploy/edge/README.md bring-up)"; exit 1; }
KEY=$(grep -oP '^OBFUSCATION_KEY=\K.*' "$ENV_FILE" || true)
[ -n "${KEY:-}" ] && [ "$KEY" != CHANGEME ] || { echo "OBFUSCATION_KEY not set in $ENV_FILE"; exit 1; }
podman exec -i timescaledb psql -U postgres -d postgres -v ON_ERROR_STOP=1 <<SQL
CREATE SCHEMA IF NOT EXISTS trakrf;
ALTER DATABASE postgres SET search_path = trakrf, public;
43 changes: 36 additions & 7 deletions deploy/edge/install.sh
@@ -1,13 +1,42 @@
#!/usr/bin/env bash
# Symlink deploy/edge quadlets into the rootless systemd user dir and reload.
# Deploy deploy/edge -> /srv/trakrf and (re)link rootless systemd units. Idempotent.
# Secrets in /srv/trakrf/secrets are NEVER touched here; seed them by hand once.
set -euo pipefail
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
SRC="$(cd "$(dirname "$0")/quadlets" && pwd)"
DEST="$HOME/.config/containers/systemd"
mkdir -p "$DEST"
for f in "$SRC"/*.container "$SRC"/*.network; do

ROOT=/srv/trakrf
SRC="$(cd "$(dirname "$0")" && pwd)" # deploy/edge
QUADLET_DIR="$HOME/.config/containers/systemd"
USER_UNIT_DIR="$HOME/.config/systemd/user"

[ -d "$ROOT" ] && [ -w "$ROOT" ] || {
echo "ERROR: $ROOT missing or not writable."
echo "Run once: sudo mkdir -p $ROOT && sudo chown $(id -un):$(id -gn) $ROOT"
exit 1
}

mkdir -p "$ROOT"/{quadlets,config,scripts,systemd,secrets,backups} "$QUADLET_DIR" "$USER_UNIT_DIR"

# 1. Sync repo -> /srv/trakrf (NEVER secrets/). Everything the runtime reads lives here,
# so the running box never depends on the git working tree.
rsync -a --delete "$SRC/config/" "$ROOT/config/"
rsync -a --delete "$SRC/quadlets/" "$ROOT/quadlets/"
rsync -a --delete "$SRC/scripts/" "$ROOT/scripts/"
rsync -a --delete "$SRC/systemd/" "$ROOT/systemd/"
chmod +x "$ROOT"/scripts/*.sh

# 2. Link quadlet units (Podman quadlet generator dir) -> /srv/trakrf
for f in "$ROOT"/quadlets/*.container "$ROOT"/quadlets/*.network; do
[ -e "$f" ] || continue
ln -sf "$f" "$DEST/$(basename "$f")"
ln -sf "$f" "$QUADLET_DIR/$(basename "$f")"
done

# 3. Link the backup timer (plain user units) -> /srv/trakrf, then enable
for u in "$ROOT"/systemd/trakrf-backup.service "$ROOT"/systemd/trakrf-backup.timer; do
ln -sf "$u" "$USER_UNIT_DIR/$(basename "$u")"
done

systemctl --user daemon-reload
echo "Linked quadlets:"; ls -l "$DEST"
systemctl --user enable --now trakrf-backup.timer

echo "Deployed to $ROOT; units linked + reloaded. Secrets (untouched): $ROOT/secrets"
2 changes: 1 addition & 1 deletion deploy/edge/quadlets/backend.container
@@ -7,7 +7,7 @@ Requires=migrate.service
ContainerName=backend
Image=ghcr.io/trakrf/backend:preview
Network=trakrf.network
EnvironmentFile=%h/platform/deploy/edge/.env
EnvironmentFile=/srv/trakrf/secrets/.env
Exec=/server serve
PublishPort=127.0.0.1:8080:8080
AutoUpdate=registry
4 changes: 2 additions & 2 deletions deploy/edge/quadlets/cloudflared.container
@@ -16,8 +16,8 @@ Image=docker.io/cloudflare/cloudflared:latest
# Must share trakrf.network so the dashboard/Terraform ingress can target https://traefik:443 by name.
Network=trakrf.network
# Token only — kept out of the shared .env so DB/JWT secrets aren't exposed to this sidecar.
# Managed by trakrf/infra Terraform (TRA-957); drop the real value in deploy/edge/cloudflared.env.
EnvironmentFile=%h/platform/deploy/edge/cloudflared.env
# Managed by trakrf/infra Terraform (TRA-957); drop the real value in /srv/trakrf/secrets/cloudflared.env.
EnvironmentFile=/srv/trakrf/secrets/cloudflared.env
# Outbound-only: no PublishPort (the whole point — no inbound, NAT/double-NAT agnostic).
# cloudflared reads the tunnel token from $TUNNEL_TOKEN. Public hostname -> service ingress
# (app.demo.trakrf.id -> https://traefik:443) is configured in Cloudflare, not here.
2 changes: 1 addition & 1 deletion deploy/edge/quadlets/migrate.container
@@ -7,7 +7,7 @@ Requires=timescaledb.service
ContainerName=trakrf-migrate
Image=ghcr.io/trakrf/backend:preview
Network=trakrf.network
EnvironmentFile=%h/platform/deploy/edge/.env
EnvironmentFile=/srv/trakrf/secrets/.env
Exec=/server migrate
AutoUpdate=registry

4 changes: 2 additions & 2 deletions deploy/edge/quadlets/mosquitto.container
@@ -5,8 +5,8 @@ Description=Mosquitto broker (demo box)
ContainerName=mosquitto
Image=docker.io/library/eclipse-mosquitto:2.0.21
Network=trakrf.network
Volume=%h/platform/deploy/edge/mosquitto/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro,Z
Volume=%h/platform/deploy/edge/mosquitto/passwd:/mosquitto/config/passwd:ro,Z
Volume=/srv/trakrf/config/mosquitto/mosquitto.conf:/mosquitto/config/mosquitto.conf:ro,Z
Volume=/srv/trakrf/secrets/mosquitto/passwd:/mosquitto/config/passwd:ro,Z
PublishPort=1883:1883

[Service]
2 changes: 1 addition & 1 deletion deploy/edge/quadlets/timescaledb.container
@@ -6,7 +6,7 @@ ContainerName=timescaledb
Image=docker.io/timescale/timescaledb-ha:pg17.9-ts2.26.4
Network=trakrf.network
Volume=timescale_data:/home/postgres/pgdata/data
EnvironmentFile=%h/platform/deploy/edge/.env
EnvironmentFile=/srv/trakrf/secrets/.env
PublishPort=127.0.0.1:5432:5432
HealthCmd=pg_isready -U postgres
HealthInterval=10s
6 changes: 3 additions & 3 deletions deploy/edge/quadlets/traefik.container
@@ -7,9 +7,9 @@ Requires=backend.service
ContainerName=traefik
Image=docker.io/library/traefik:v3.3
Network=trakrf.network
Volume=%h/platform/deploy/edge/traefik/traefik.yaml:/etc/traefik/traefik.yaml:ro,Z
Volume=%h/platform/deploy/edge/traefik/dynamic.yaml:/etc/traefik/dynamic.yaml:ro,Z
Volume=%h/platform/deploy/edge/traefik/certs:/certs:ro,Z
Volume=/srv/trakrf/config/traefik/traefik.yaml:/etc/traefik/traefik.yaml:ro,Z
Volume=/srv/trakrf/config/traefik/dynamic.yaml:/etc/traefik/dynamic.yaml:ro,Z
Volume=/srv/trakrf/secrets/traefik/certs:/certs:ro,Z
PublishPort=443:443

[Service]
17 changes: 17 additions & 0 deletions deploy/edge/scripts/trakrf-backup.sh
@@ -0,0 +1,17 @@
#!/usr/bin/env bash
# Logical pg_dump of the demo DB -> /srv/trakrf/backups, keeping the last $KEEP.
# pg_dump is consistent for a live DB and is independent of where PGDATA lives,
# so the database can stay on its Podman named volume.
set -euo pipefail
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}" # rootless podman socket

OUT=/srv/trakrf/backups
KEEP=14
ts=$(date -u +%Y%m%d-%H%M%S)
mkdir -p "$OUT"

podman exec timescaledb pg_dump -U postgres -d postgres | gzip > "$OUT/trakrf-$ts.sql.gz"

# prune oldest beyond KEEP
ls -1t "$OUT"/trakrf-*.sql.gz 2>/dev/null | tail -n +$((KEEP + 1)) | xargs -r rm -f
echo "backup: $OUT/trakrf-$ts.sql.gz"
File renamed without changes.
2 changes: 1 addition & 1 deletion deploy/edge/smoke-test.sh
@@ -14,7 +14,7 @@ cd "$(dirname "$0")/../.." # repo root
export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
EPC=${EPC:-E2E0000000000000BB000001}
CAP=${CAP:-door-1}
MQPW=$(grep -oP 'trakrf-mqtt:\K[^@]+' deploy/edge/.env)
MQPW=$(grep -oP 'trakrf-mqtt:\K[^@]+' /srv/trakrf/secrets/.env)

echo "1) DB bootstrap + seed (idempotent)"
deploy/edge/db-init.sh
8 changes: 8 additions & 0 deletions deploy/edge/systemd/trakrf-backup.service
@@ -0,0 +1,8 @@
[Unit]
Description=TrakRF demo DB backup (pg_dump -> /srv/trakrf/backups)
After=timescaledb.service
Wants=timescaledb.service

[Service]
Type=oneshot
ExecStart=/srv/trakrf/scripts/trakrf-backup.sh
9 changes: 9 additions & 0 deletions deploy/edge/systemd/trakrf-backup.timer
@@ -0,0 +1,9 @@
[Unit]
Description=Daily TrakRF demo DB backup

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true

[Install]
WantedBy=timers.target