Merged
32 commits
ad4c3d6
Bump EdgeOps console embed version to v1.0.4.
emirhandurmus Jun 30, 2026
b680933
Add unified transaction runner with SQLite priority queue and DB metr…
emirhandurmus Jun 30, 2026
fb6926e
Fix greenfield migrations for cross-database compatibility.
emirhandurmus Jun 30, 2026
8f9212e
Add ReconcileOutbox transactional outbox with background drainer.
emirhandurmus Jun 30, 2026
3993a4d
Add transaction safety settings and derive component labels from dist…
emirhandurmus Jun 30, 2026
e3b113a
Route data managers and OIDC adapter through the transaction runner.
emirhandurmus Jun 30, 2026
9895b1e
Split fog, service, and NATS reconcile into phased transactions.
emirhandurmus Jun 30, 2026
81a2bcd
Fix SQLite deadlocks in auth, WebSocket sessions, and background jobs.
emirhandurmus Jun 30, 2026
1957772
Document unified database transaction model and update release notes.
emirhandurmus Jun 30, 2026
a8f8dfb
Add transaction safety tests, integration gate, and load probe scripts.
emirhandurmus Jun 30, 2026
9dd3ae4
bump console ui version 1.0.5
emirhandurmus Jun 30, 2026
6207c57
fix (text): tx grep test fixed
emirhandurmus Jun 30, 2026
1b5f9ba
Fix semver versionRegex escaping for JSON Schema validation.
emirhandurmus Jul 2, 2026
e96613d
Fix fog platform reconcile when upstreamRouters is omitted from spec.
emirhandurmus Jul 2, 2026
a4a4bdd
Harden exec and log WebSocket sessions for multi-replica HA relay.
emirhandurmus Jul 2, 2026
8d7b706
Document WebSocket HA relay session fixes in the unreleased changelog.
emirhandurmus Jul 2, 2026
153fc6e
Run NATS auth reissue in post-commit background transactions.
emirhandurmus Jul 3, 2026
7abb8e4
Rebuild NATS resolver bundles from fresh auth state after reissue.
emirhandurmus Jul 3, 2026
91f36f8
Propagate fog upstream router and NATS endpoint changes to downstream…
emirhandurmus Jul 3, 2026
415bb2d
Harden multi-replica WebSocket exec and log orphan session cleanup.
emirhandurmus Jul 3, 2026
421d8d3
Recreate Kubernetes services when update races a missing resource.
emirhandurmus Jul 3, 2026
3ab6420
Bump embedded EdgeOps Console default version to v1.0.6.
emirhandurmus Jul 3, 2026
92065ea
Document reconcile correctness and WebSocket orphan fixes in CHANGELOG.
emirhandurmus Jul 3, 2026
49a089d
Raise WebSocket exec and log concurrency quota to five per resource.
emirhandurmus Jul 3, 2026
b993c4a
Send RFC 6455 Ping frames on exec and log WebSocket sessions.
emirhandurmus Jul 3, 2026
5194ebb
Wait for exec session teardown in cross-replica split test.
emirhandurmus Jul 3, 2026
78126f2
Add read-only network topology API for router and NATS graphs.
emirhandurmus Jul 3, 2026
85d0fab
Stop passing false as the cluster controller service transaction argu…
emirhandurmus Jul 3, 2026
cf702af
Bump embedded EdgeOps Console default version to v1.0.7.
emirhandurmus Jul 3, 2026
b8d9955
Filter inactive cluster controllers from list API by default.
emirhandurmus Jul 3, 2026
0ed285e
Bump embedded EdgeOps Console default version to v1.0.8.
emirhandurmus Jul 3, 2026
7ed0b2c
Add skopeo-based script to verify Dockerfile base image digest pins.
emirhandurmus Jul 3, 2026
2 changes: 1 addition & 1 deletion .env.example
@@ -6,7 +6,7 @@ NODE_ENV=development

# EdgeOps Console static embed (npm run build:console → dev/console/build)
EDGEOPS_CONSOLE_PATH=dev/console/build # must be absolute path
EDGEOPS_CONSOLE_VERSION=v1.0.3
EDGEOPS_CONSOLE_VERSION=v1.0.8
# EDGEOPS_CONSOLE_REPO=https://github.com/Datasance/edgeops-console
# EDGEOPS_CONSOLE_FLAVOR=datasance

2 changes: 1 addition & 1 deletion .github/actions/set-build-env/action.yml
@@ -8,7 +8,7 @@ runs:
shell: bash
run: |
VERSION="${{ env.EDGEOPS_CONSOLE_VERSION }}"
if [ -z "$VERSION" ]; then VERSION="1.0.3"; fi
if [ -z "$VERSION" ]; then VERSION="1.0.8"; fi
echo "EDGEOPS_CONSOLE_VERSION=$VERSION" >> "${GITHUB_ENV}"

REPO="${{ env.EDGEOPS_CONSOLE_REPO }}"
44 changes: 40 additions & 4 deletions CHANGELOG.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions Dockerfile
@@ -5,7 +5,7 @@
FROM node:24-bookworm@sha256:fdddfb3e688158251943d52eba361de991548f6814007acba4917ae6b512d6be AS console-builder

ARG EDGEOPS_CONSOLE_REPO=https://github.com/Datasance/edgeops-console
ARG EDGEOPS_CONSOLE_VERSION=v1.0.3
ARG EDGEOPS_CONSOLE_VERSION=v1.0.8
ARG EDGEOPS_CONSOLE_FLAVOR=datasance

RUN apt-get update \
@@ -48,9 +48,9 @@ RUN npm pack


# ubi9/nodejs-24-minimal:latest — pin manifest list digest for reproducible multi-arch builds
FROM registry.access.redhat.com/ubi9/nodejs-24-minimal@sha256:cc7648f8e1c7d628e4334328a712f30ea0820787bb92836cc93e349674c689bf
FROM registry.access.redhat.com/ubi9/nodejs-24-minimal@sha256:5f1ac8eab93c93eb2227f4ee7822668b312ee292d122dddd580bee8f17359c2f

ARG EDGEOPS_CONSOLE_VERSION=v1.0.3
ARG EDGEOPS_CONSOLE_VERSION=v1.0.8
ARG IMAGE_REGISTRY
ARG OCI_SOURCE_REPO
ARG CONTROLLER_DISTRIBUTION=iofog
4 changes: 2 additions & 2 deletions Makefile
@@ -1,5 +1,5 @@
# Local Docker build — mirrors CI/release build-args (see .github/actions/set-build-env).
# Override any variable: make build FLAVOR=iofog EDGEOPS_CONSOLE_VERSION=v1.0.3
# Override any variable: make build FLAVOR=iofog EDGEOPS_CONSOLE_VERSION=v1.0.8

FLAVOR ?= datasance
IMAGE_NAME ?= controller
@@ -25,7 +25,7 @@ else
$(error FLAVOR must be "datasance" or "iofog", got "$(FLAVOR)")
endif

EDGEOPS_CONSOLE_VERSION ?= v1.0.3
EDGEOPS_CONSOLE_VERSION ?= v1.0.8

IMAGE_REF = $(IMAGE_REGISTRY)/$(IMAGE_NAME):$(DOCKER_TAG)

68 changes: 60 additions & 8 deletions docs/architecture.md
@@ -206,7 +206,7 @@ Full spec: [`.cursor/controllerv3.8/docs/15-fog-platform-reconcile.md`](../.curs

## WebSocket exec & log sessions

Interactive **exec** and **log streaming** use paired WebSocket sessions between operators (Bearer JWT), Controller, and Edgelet agents (fog token). Plan 16 hardens log sessions and shared WS infra (HA, drain, OTEL). **Plan 17** redesigns **microservice exec** to log-style multi-session flow (3 concurrent per MS, agent poll + session-scoped WS). **Plan 18** production-hardens cross-replica relay via **`WsRelayTransport`** — AMQP pool + recovery when `nats.enabled=false`, NATS Core when `nats.enabled=true` (R102–R113). **Edgelet agent wire change required** for exec only (see [edgelet-invariants.md §10.1](../.cursor/controllerv3.8/docs/edgelet-invariants.md)).
Interactive **exec** and **log streaming** use paired WebSocket sessions between operators (Bearer JWT), Controller, and Edgelet agents (fog token). Plan 16 hardens log sessions and shared WS infra (HA, drain, OTEL). **Plan 17** redesigns **microservice exec** to log-style multi-session flow (5 concurrent per MS, agent poll + session-scoped WS). **Plan 18** production-hardens cross-replica relay via **`WsRelayTransport`** — AMQP pool + recovery when `nats.enabled=false`, NATS Core when `nats.enabled=true` (R102–R113). **Edgelet agent wire change required** for exec only (see [edgelet-invariants.md §10.1](../.cursor/controllerv3.8/docs/edgelet-invariants.md)).

```mermaid
flowchart TB
@@ -283,14 +283,14 @@ sequenceDiagram
| Topic | Normative value |
|-------|-----------------|
| MS exec entry | **Direct user WS** — no `POST …/microservices/…/exec` (R92, R94) |
| MS exec concurrency | **3** user exec WS per microservice (R93) |
| MS exec concurrency | **5** user exec WS per microservice |
| MS exec lifecycle | **Per-session** — close deletes session row only; **no** `execEnabled=false` (R98) |
| MS exec pending / max | **60s** pending for agent; **8h** max active session (Plan 16 carry-over) |
| Agent exec discovery | `GET /agent/exec/sessions` on `execSessions` change flag (R95, R100) |
| Agent exec WS | `/agent/exec/microservice/:uuid/:sessionId` only — legacy `/agent/exec/:uuid` removed (R96) |
| User session notify | **ACTIVATION** (type 5) with `{ sessionId, microserviceUuid }` (R97) |
| Fog debug provision | `POST/DELETE /iofog/:uuid/exec` unchanged; shell via `WS /microservices/system/exec/:debugMsUuid` (R99) |
| Log concurrency | **3** user log WS per microservice (or per fog for node logs) |
| Log concurrency | **5** user log WS per microservice (or per fog for node logs) |
| Log limits | Tail max **5,000** lines; **120s** pending; **2h** idle |
| Log content | Live relay only — no log line persistence; audit connect/disconnect |
| HA relay | Cross-replica sessions **require** a **relay backend** (R112): **AMQP** router queues when `nats.enabled=false`; **NATS Core** subjects on hub when `nats.enabled=true`. Same-replica may use direct WS; **fail fast** close **1013** when active backend unavailable |
@@ -369,18 +369,69 @@ For the full bilateral contract (including ControlPlane env vars and verificatio

| Topic | v3.8 behavior |
|-------|---------------|
| **Database** | Greenfield v3.8.0 schema — **new install only** (no v3.7 migrator). Supports **sqlite** (single-controller production), **mysql**, and **postgres** (multi-replica / HA). |
| **Database** | Greenfield v3.8.0 schema — **new install only** (no v3.7 migrator). Supports **sqlite** (single-controller production), **mysql**, and **postgres** (multi-replica / HA). All mutating paths use **`runInTransaction()`** (Plan 19, R114–R125). Plan **19-I** stabilization (R126–R135): unified ALS transaction context, phased NATS reconcile, grep gates, first-fog integration SLO. |

### Database profiles (Plan 19 / 19-I)

| Profile | Database | Controller replicas | Typical fleet size | Notes |
|---------|----------|---------------------|-------------------|-------|
| **Edge / PoT** | sqlite | 1 | ≤ **50** fogs (default warning threshold) | Single write queue; embedded OIDC |
| **Small production** | sqlite | 1 | 50–100 fogs | Supported within single-writer physics; soft warning logged above threshold |
| **Enterprise / HA** | mysql or postgres | 1+ | **100+** fogs recommended | Default for large fleets; `FOR UPDATE SKIP LOCKED` task claims; shared OIDC session store |

**Enterprise default:** mysql/postgres for fleets above **100** fogs or any multi-replica deployment. sqlite remains supported for single-node edge deployments within Plan 19 SLOs (200 fogs acceptance profile).

```mermaid
flowchart LR
subgraph callers [Mutating callers]
API[REST / Agent API]
WS[WS session DB ops]
JOBS[Background jobs]
end

subgraph runner [runInTransaction]
Q{provider?}
SQ[SQLite priority queue]
POOL[mysql/postgres pool]
TX[Real Sequelize transaction]
end

subgraph outbox [ReconcileOutbox]
INS[Same-commit insert]
DRAIN[Outbox drainer]
end

API --> runner
WS --> runner
JOBS --> runner
Q -->|sqlite| SQ --> TX
Q -->|mysql/pg| POOL --> TX
TX --> INS --> DRAIN
```
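
The `TX --> INS --> DRAIN` path in the diagram above can be illustrated with a small in-memory sketch. It is not the Controller's implementation: `MemoryDb`, `updateFog`, and `drainOutbox` are hypothetical names, and the staged-copy "transaction" stands in for a real Sequelize transaction. It only shows why writing the mutation and the outbox row in one commit means a rolled-back mutation can never leave a stray reconcile request behind.

```javascript
// Hypothetical in-memory stand-in for the database (assumption, not the real models).
class MemoryDb {
  constructor() {
    this.fogs = new Map();
    this.outbox = [];
    this.reconcileTasks = [];
  }

  // Runs `work` against staged copies and publishes them only if it resolves.
  async transaction(work) {
    const staged = {
      fogs: new Map(this.fogs),
      outbox: [...this.outbox],
      reconcileTasks: [...this.reconcileTasks],
    };
    const result = await work(staged); // a throw here discards `staged` (rollback)
    this.fogs = staged.fogs;
    this.outbox = staged.outbox;
    this.reconcileTasks = staged.reconcileTasks;
    return result;
  }
}

// Mutation and outbox row are written inside the same transaction.
async function updateFog(db, uuid, spec, { fail = false } = {}) {
  return db.transaction(async (tx) => {
    tx.fogs.set(uuid, spec);
    tx.outbox.push({ kind: 'fog', uuid });
    if (fail) throw new Error('simulated failure before commit');
  });
}

// The drainer turns committed outbox rows into reconcile tasks, also atomically.
async function drainOutbox(db) {
  return db.transaction(async (tx) => {
    const rows = tx.outbox.splice(0);
    tx.reconcileTasks.push(...rows);
    return rows.length;
  });
}
```

A failed `updateFog` call leaves both `fogs` and `outbox` untouched, so the drainer only ever sees rows whose mutation actually committed.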

| Priority lane | Callers |
|---------------|---------|
| **interactive** | Agent routes, user RBAC API, WS session DB ops, OIDC/auth |
| **background** | Reconcile workers, outbox drainer, platform sweep, cleanup timers |

Full operator runbook: [operations/database-transactions.md](operations/database-transactions.md).
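
As a rough illustration of the two priority lanes, the sketch below serializes tasks one at a time and always picks a waiting `interactive` task before a `background` one. `PriorityWriteQueue` and its `run` method are hypothetical names for this example; the actual `runInTransaction()` runner is not shown in this diff and may differ.

```javascript
// Minimal single-writer queue with two lanes (hypothetical sketch).
class PriorityWriteQueue {
  constructor() {
    this.lanes = { interactive: [], background: [] };
    this.running = false;
  }

  // Enqueue `task` (a function returning a value or promise) on a lane.
  run(task, { priority = 'interactive' } = {}) {
    const lane = this.lanes[priority];
    if (!lane) {
      return Promise.reject(new Error(`unknown priority: ${priority}`));
    }
    return new Promise((resolve, reject) => {
      lane.push({ task, resolve, reject });
      this._pump();
    });
  }

  // Drain the lanes one task at a time; interactive work goes first.
  async _pump() {
    if (this.running) return;
    this.running = true;
    try {
      for (;;) {
        const next =
          this.lanes.interactive.shift() ?? this.lanes.background.shift();
        if (!next) break;
        try {
          next.resolve(await next.task());
        } catch (err) {
          next.reject(err);
        }
      }
    } finally {
      this.running = false;
    }
  }
}
```

In this sketch a task that has already started is never preempted; priority only decides which waiting task runs next.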

### SQLite single-node production

Small deployments with **one Controller process** may use SQLite as the production database (embedded OIDC requires a single replica in this profile).

| Topic | Behavior |
|-------|----------|
| **When to use** | Single Controller, no DB HA requirement, edge/small-cluster PoT |
| **Concurrency** | WAL journal mode + `busy_timeout` pragmas on connect; connection pool size 1 |
| **Background jobs** | Reconcile-heavy jobs start after a configurable delay (`settings.jobStartupDelaySeconds`, default 3s) and stagger by 500ms to avoid restart lock bursts |
| **Task claims** | Fog/service/NATS reconcile task claims retry on `SQLITE_BUSY` (same retry budget as `TransactionDecorator`) |
| **When to use** | Single Controller, no DB HA requirement, edge/small-cluster PoT (≤ recommended fog count) |
| **Write path** | All mutations via `runInTransaction()` — **real** ACID transactions (no `fakeTransaction`); nested reuse via **`runWithTransactionContext`** ALS (R126–R128) |
| **Concurrency** | Global **priority write queue** (interactive before background); pool `max: 1`; WAL + `busy_timeout` pragmas |
| **First-fog SLO (R133)** | sqlite integration gate: first fog reconcile + concurrent operator login/list **< 2s**; `RUN_INTEGRATION=1 npm run test:integration:first-fog` |
| **Load close gate (R135)** | `node test/load/transaction-safety-load.js --fogs=50 --soak-minutes=5` — agent p99 &lt; 200ms, operator p99 &lt; 1s |
| **Busy retry** | Exponential backoff + jitter on `SQLITE_BUSY` inside queue task (configurable max attempts) |
| **Reconcile enqueue** | **`ReconcileOutbox`** — mutation + outbox row in same commit; drainer creates reconcile tasks |
| **Background jobs** | `priority: 'background'`; startup stagger (`settings.jobStartupDelaySeconds`, default 3s) + 500ms between jobs |
| **Task claims** | Same runner; busy retry on sqlite; mysql/postgres use `FOR UPDATE SKIP LOCKED` |
| **Load SLO** | 200 fogs / 40s poll / 10 operators / 30 min soak: agent poll p99 **< 200ms**; operator REST p99 **< 1s** |
| **Persistence** | Mount a persistent volume for `controller_db.sqlite` and WAL sidecar files (`-wal`, `-shm`) |
| **Backup** | Use SQLite backup API or copy DB + WAL files during a quiet window |
| **HA path** | mysql/postgres + multiple Controller replicas — see [oidc-configuration.md](oidc-configuration.md) |
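
The **Busy retry** row above describes exponential backoff with jitter. A minimal sketch of that pattern follows, under stated assumptions: `withBusyRetry`, its option names, and the default values are hypothetical, and the check assumes the driver exposes the SQLite result code as `err.code` or `err.original.code`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: the busy condition is visible as code 'SQLITE_BUSY' on the
// error itself or on a wrapped `original` error.
function isSqliteBusy(err) {
  return err?.code === 'SQLITE_BUSY' || err?.original?.code === 'SQLITE_BUSY';
}

// Retry `task` on SQLITE_BUSY with exponential backoff and full jitter.
// All option names and defaults are illustrative, not the Controller's settings.
async function withBusyRetry(
  task,
  { maxAttempts = 5, baseDelayMs = 25, maxDelayMs = 1000, random = Math.random } = {}
) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      // Non-busy errors and exhausted budgets propagate to the caller.
      if (!isSqliteBusy(err) || attempt >= maxAttempts) throw err;
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(random() * ceiling); // wait a random time in [0, ceiling)
    }
  }
}
```

Randomizing the delay spreads out writers that hit the lock at the same moment, so they do not all retry in lockstep.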
@@ -415,4 +466,5 @@ Agent routes and WebSocket exec/logs for agents are **outside** OIDC — see [rb
| [pki.md](pki.md) | Central CAs, cert renewal, NATS operator rotation |
| [oidc-configuration.md](oidc-configuration.md) | Embedded/external auth modes and env vars |
| [external-oidc-client-setup.md](external-oidc-client-setup.md) | External IdP client configuration |
| [operations/database-transactions.md](operations/database-transactions.md) | Transaction runner, OTEL metrics, SQLITE_BUSY runbook |
| [CONTRIBUTING](../CONTRIBUTING) | Dual-mirror CI and development |