Hanalyx · remyluslosius · Jun 20, 2026 · Jun 19, 2026 · Jun 19, 2026
diff --git a/docs/engineering/rbac_registry.md b/docs/engineering/rbac_registry.md
@@ -22,7 +22,7 @@ In a string-literal world, all three layers refer to permissions by free-form st
 
 - The spec says `host:read`. The handler checks `hosts:read`. The role grants `host.read`. All three are slightly different. Tests pass because fixtures grant superusers everything. Production fails when a real `auditor` role tries to list hosts and gets `403`.
 - A new dangerous permission gets added to a handler but never to the registry. There is no audit hook that says "this permission was added"; reviewers don't know the surface grew.
-- License-gated permissions (`remediation:execute` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale.
+- License-gated permissions (`audit:export` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale.
 
 A registry collapses all three layers onto one source. The OpenAPI validator, the handler middleware, and the role definitions all read from the same file. Misspell a permission anywhere → build fails. License gating co-locates with permission definition → middleware enforces both in one pass. Custom roles created at runtime validate every permission against the registry → no silent grant of a permission that doesn't exist.
 
@@ -132,7 +132,8 @@ type PermissionMeta struct {
 
 var Permissions = map[Permission]PermissionMeta{
     HostRead: {Category: "host", Description: "View host details...", Dangerous: false, LicenseGated: ""},
-    RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: "remediation_execution"},
+    RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: ""}, // free core (single-rule); bulk/auto gated at the handler
+    AuditExport: {Category: "audit", Description: "Export audit data...", Dangerous: false, LicenseGated: "audit_export"},
     // ...
 }
 
@@ -454,7 +455,7 @@ A custom role with `remediation:execute` is **allowed** even if the license does
 
 ### 8.5 Custom roles and policy cross-validation
 
-The `approvals` policy declares `approver_roles: [security_admin, ops_lead]`. The policy loader validates these against the active role set (built-in + custom). An unknown role → `policy.invalid` audit event; previous policy state retained. Custom roles can therefore appear in approver lists once they exist.
+The policies-as-data framework registers an `approvals` policy *type*, but **no `approvals` policy is currently configured** and no code reads `approver_roles`. The enforced approval gate today is the `remediation:approve` / `exception:approve` permission alone (held by `security_admin` and `admin`; see [remediation_exception_governance.md](remediation_exception_governance.md)). If an `approvals` policy is added later, the loader cross-validates its `approver_roles` against the active role set (built-in + custom; an unknown role → `policy.invalid` audit event, previous policy state retained). Those `approver_roles` MUST be a subset of the roles that hold the matching `*:approve` permission — otherwise the policy names a role that cannot actually approve.
 
 ---
 

diff --git a/docs/engineering/remediation_exception_governance.md b/docs/engineering/remediation_exception_governance.md
@@ -0,0 +1,87 @@
+# Remediation & Exception Governance — Role Matrix
+
+> **Status:** Current as of 2026-06-19.
+> **Authority:** `auth/permissions.yaml` is the source of truth for who can do
+> what (codegen produces `internal/auth/permissions.gen.go` / `roles.gen.go`).
+> This document is a human-readable view of it; if the two disagree, the YAML
+> wins and this doc is stale.
+> **Audience:** Operators deciding how to assign roles, and engineers working on
+> the remediation / exception lifecycles.
+
+This is the answer to "which role can **request**, **approve/reject**, and
+**execute** remediation and exceptions." Two governed lifecycles share the same
+separation-of-duties rule.
+
+---
+
+## Built-in roles (least → most privilege)
+
+`viewer` → `auditor` → `ops_lead` → `security_admin` → `admin`
+
+`admin` holds the `*` wildcard (every permission). Custom roles may be created
+and are validated against the permission registry.
+
+## Remediation
+
+| Action | permission | viewer | auditor | ops_lead | security_admin | admin |
+|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:|
+| View requests/history | `remediation:read` | ✓ | ✓ | ✓ | ✓ | ✓ |
+| **Request** | `remediation:request` | | | ✓ | ✓ | ✓ |
+| **Approve / Reject** | `remediation:approve` | | | | ✓ | ✓ |
+| Execute (Fix) | `remediation:execute` | | | ✓ | ✓ | ✓ |
+| Rollback | `remediation:rollback` | | | ✓ | ✓ | ✓ |
+
+Note the deliberate asymmetry: **`ops_lead` can request and execute remediation
+but cannot approve it** — approval needs `security_admin` or `admin`.
+
+`remediation:execute` and `remediation:rollback` are **free core** (single-rule
+manual). Bulk and automated remediation is the licensed track, gated separately
+at the handler via `license.EnforceFeature(remediation_execution)` — not via a
+permission.
+
+## Exceptions
+
+| Action | permission | viewer | auditor | ops_lead | security_admin | admin |
+|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:|
+| View | `exception:read` | ✓ | ✓ | ✓ | ✓ | ✓ |
+| **Request** | `exception:request` | | ✓ | ✓ | ✓ | ✓ |
+| Comment | `exception:comment` | | ✓ | ✓ | ✓ | ✓ |
+| **Approve** | `exception:approve` | | ✓ | | ✓ | ✓ |
+| Revoke | `exception:revoke` | | | | ✓ | ✓ |
+
+Note the asymmetry mirrors remediation in reverse: **`auditor` can approve
+exceptions but not remediation**, and **`ops_lead` can request exceptions but not
+approve them**.
+
+## Separation of duties (self-review rule)
+
+For **both** lifecycles, the reviewer must differ from the requester. Approving
+or rejecting your own request is refused with **409 `self_review`** — and there
+is **no bypass**: not for `admin`, and there is no config flag.
+
+- Remediation: `internal/remediation/service.go` (`ErrSelfReview`)
+- Exceptions: `internal/exception/service.go`
+
+**One-operator note:** because of this rule, a single-operator workspace cannot
+complete the request → approve flow today. The resolution for the free tier is
+[Remediation Approval Governance (ADR)](remediation_governance_adr.md): free-core
+single-rule remediation will not require a separate approval; the approval gate
+(with self-review) is reserved for the licensed bulk/auto track. Until that lands,
+two distinct users are required to approve any remediation/exception.
+
+## On `approver_roles` policies
+
+The policies-as-data framework registers an `approvals` policy *type*
+(`internal/policy/types.go`), but **no `approvals` policy is currently
+configured**, and no code reads `approver_roles`. The enforced approval gate
+today is purely the `remediation:approve` / `exception:approve` **permission**
+above. If an `approvals` policy is ever added, its `approver_roles` must be a
+subset of the roles that hold the corresponding `*:approve` permission, or the
+policy can name a role that cannot actually approve.
+
+## References
+
+- Source of truth: `auth/permissions.yaml`
+- RBAC registry: [rbac_registry.md](rbac_registry.md)
+- Decision record: [remediation_governance_adr.md](remediation_governance_adr.md)
+- Operator guide: [../guides/HOSTS_AND_REMEDIATION.md](../guides/HOSTS_AND_REMEDIATION.md)
diff --git a/docs/engineering/remediation_governance_adr.md b/docs/engineering/remediation_governance_adr.md
@@ -0,0 +1,112 @@
+# Remediation Approval Governance (ADR)
+
+> **Status:** Accepted 2026-06-19. Implementation pending (the conditional-approval
+> path is not yet built; today every remediation request goes through
+> request → approve → execute).
+> **Authority:** This document is the decision record for *when* a remediation
+> requires human approval. The role/permission matrix that backs it is
+> [remediation_exception_governance.md](remediation_exception_governance.md);
+> the permission source of truth is `auth/permissions.yaml`.
+> **Audience:** Anyone implementing or specing the remediation lifecycle, and
+> anyone scoping the OpenWatch+ licensed remediation track.
+
+---
+
+## Context
+
+Remediation is open-core. The boundary, decided separately, is:
+
+- **Free core:** per-rule **manual** remediation — an operator fixes one finding
+  on one host, and can roll it back.
+- **OpenWatch+ (licensed):** **bulk and automated** remediation — apply many
+  rules / fleet-wide, and policy-driven auto-remediation. Gated at the handler
+  via `license.EnforceFeature(remediation_execution)`.
+
+The shipped lifecycle is a single state machine with a human approval gate:
+
+```
+Request → pending_approval → (Approve) → approved → (MarkExecuting) → executing → executed → rolled_back
+                  │                                                                  (failed, dry_run_complete are side branches)
+                  └── (Reject) → rejected
+```
+
+Approval enforces **separation of duties**: the reviewer must differ from the
+requester. This is hard-coded with no bypass (`internal/remediation/service.go`,
+`if requestedBy == reviewedBy { return ErrSelfReview }`) and the execute handler
+refuses anything not in `approved` state
+(`internal/server/remediation_handlers.go`, 409 `only an approved request can be
+executed`).
+
+**The problem this ADR resolves:** that gate makes the product unusable for a
+single operator. A lone administrator can request but can never approve their
+own request (409 `self_review`, even as `admin`), so they never reach Fix. The
+same applies to compliance exceptions. Requiring approval here also buys *no*
+separation of duties — the requester and the approver would be the same human.
+
+## Decision
+
+**Keep the governance machinery; make the human approval step *conditional* on
+the remediation track ("A-keep").**
+
+- **Free-core, single-rule manual remediation does not require a separate human
+  approval.** A free-core request reaches an executable state directly (auto-approved
+  on creation, or a `ready` state the execute handler also accepts). The operator
+  clicks **Fix**; there is no `pending_approval` interstitial.
+- **The licensed bulk / auto-remediation track keeps the full request → approve →
+  execute flow with the self-review separation-of-duties guard.** This is where an
+  approval gate carries real risk-management value (many rules, fleet-wide, or
+  unattended), and where multiple roles realistically exist.
+
+We do **not** delete the governance code. It is exactly the machinery the
+licensed track needs.
+
+## Consequences
+
+**Stays, unchanged:**
+
+- The `remediation_requests` + `remediation_transactions` tables (migration 0037)
+  — every request and its transactions are still recorded for audit, history, and
+  rollback, approval or not.
+- The execution half of the state machine (`executing → executed → rolled_back →
+  failed`, `dry_run_complete`), `MarkExecuting`, `RecordExecution`, the
+  `RemediationWorker`, the execute/rollback handlers, the
+  `remediation:execute` / `remediation:rollback` permissions, and the frontend
+  Fix/rollback UI.
+
+**Stays, but becomes conditional — reserved for the licensed track:**
+
+- `Request` / `Approve` / `Reject`, the self-review guard, the
+  `pending_approval` / `approved` / `rejected` states, and the
+  `remediation:request` / `remediation:approve` permissions.
+
+**Changes (small, surgical):**
+
+1. A free-core single-rule request reaches an executable state without a human
+   approval transition.
+2. UI: the Fix button is live immediately for free-core (no pending-approval step).
+3. Specs/tests: the `api-remediation` ACs that assert "must be approved before
+   execute" split into free-core (no approval) vs. licensed (approval + self-review).
+   The self-review test stays, retargeted to the licensed path.
+
+**Accepted trade-off:** until the bulk/auto track ships, the approve/reject/
+self-review code is present-but-dormant (exercised only by its tests). We accept
+carrying it rather than deleting working, tested code and rebuilding it later.
+
+## Alternatives considered
+
+- **Single-operator mode (config flag relaxing self-review).** Viable, but adds a
+  config surface and an "I approved my own request" audit nuance. The conditional
+  split achieves the same outcome for the free tier without a flag.
+- **Require a second approver account.** Rejected as the *only* answer: it is poor
+  UX and, since the same human clicks both, delivers no real separation of duties.
+- **A-defer (strip governance now, rebuild for the licensed track).** Rejected:
+  throws away working, tested, just-merged code to rebuild the same machinery later.
+
+## References
+
+- Role/permission matrix + self-review rule:
+  [remediation_exception_governance.md](remediation_exception_governance.md)
+- Permission source of truth: `auth/permissions.yaml`
+- RBAC registry: [rbac_registry.md](rbac_registry.md)
+- Lifecycle code: `internal/remediation/`, `internal/server/remediation_handlers.go`
+- Spec: `specs/api/remediation.spec.yaml`
diff --git a/docs/guides/HOSTS_AND_REMEDIATION.md b/docs/guides/HOSTS_AND_REMEDIATION.md
@@ -218,12 +218,19 @@ on target hosts.
 
 ![Remediation confirmation dialog](../images/hosts/confirm-remediation.png)
 
-For organizations that require approval workflows:
+For organizations that require an approval step:
 
-1. Select findings and click **Request Remediation**.
-2. Enter a justification for the changes.
-3. An admin reviews and approves the request.
-4. Once approved, the remediation executes automatically.
+1. A user with `remediation:request` (`ops_lead`, `security_admin`, or `admin`)
+   selects findings, clicks **Request Remediation**, and enters a justification.
+2. A **different** user with `remediation:approve` (`security_admin` or `admin`)
+   reviews and approves or rejects it. You cannot approve your own request
+   (separation of duties; self-approval returns `409 self_review`).
+3. Once approved, a user with `remediation:execute` clicks **Fix** to apply the
+   change. Execution is operator-initiated, not automatic.
+
+See [Remediation & Exception Governance](../engineering/remediation_exception_governance.md)
+for the full role matrix. Single-operator workspaces cannot self-approve today;
+see the [governance ADR](../engineering/remediation_governance_adr.md).
 
 ---
 
@@ -258,7 +265,8 @@ If a remediation causes problems, you can roll back to the pre-change state.
 
 ![Rollback confirmation](../images/hosts/rollback.png)
 
-Rollback requires SUPER_ADMIN or SECURITY_ADMIN role (scan:rollback permission).
+Rollback requires the `remediation:rollback` permission (`ops_lead`,
+`security_admin`, or `admin`).
 
 ### After Rolling Back
 
@@ -269,16 +277,22 @@ returned to its previous state.
 
 ## Required Permissions
 
-| Operation | Minimum Role |
-|-----------|-------------|
-| View hosts | GUEST |
-| Add / edit / delete hosts | SECURITY_ANALYST |
-| Bulk import / export | SUPER_ADMIN, SECURITY_ADMIN |
-| Start remediation | SECURITY_ADMIN (scan:execute) |
-| Approve remediation | SUPER_ADMIN (scan:approve) |
-| Rollback remediation | SUPER_ADMIN, SECURITY_ADMIN (scan:rollback) |
-| View server intelligence | SECURITY_ANALYST |
-| Manage host groups | SECURITY_ANALYST |
+Built-in roles, least to most privilege: `viewer` → `auditor` → `ops_lead` →
+`security_admin` → `admin` (`admin` holds every permission). The permission
+source of truth is `auth/permissions.yaml`; see
+[Remediation & Exception Governance](../engineering/remediation_exception_governance.md)
+for the complete matrix.
+
+| Operation | Permission | Roles that hold it |
+|-----------|------------|--------------------|
+| View hosts | `host:read` | viewer, auditor, ops_lead, security_admin, admin |
+| Add / edit hosts | `host:write` | ops_lead, security_admin, admin |
+| Delete hosts | `host:delete` | security_admin, admin |
+| Request remediation | `remediation:request` | ops_lead, security_admin, admin |
+| Approve / reject remediation | `remediation:approve` | security_admin, admin |
+| Execute remediation (Fix) | `remediation:execute` | ops_lead, security_admin, admin |
+| Rollback remediation | `remediation:rollback` | ops_lead, security_admin, admin |
+| View server intelligence | `host:read` | viewer, auditor, ops_lead, security_admin, admin |
 
 ---
 

diff --git a/internal/auth/permissions_test.go b/internal/auth/permissions_test.go
@@ -266,6 +266,71 @@ func TestRequirePermission_HotPathLatency(t *testing.T) {
 	})
 }
 
+// @ac AC-17
+// AC-17: built-in role grants match the remediation/exception governance matrix
+// (docs/engineering/remediation_exception_governance.md). This locks separation
+// of duties — a permissions.yaml edit that, e.g., grants ops_lead
+// remediation:approve or removes auditor's exception:approve fails the build.
+func TestGovernanceRoleMatrix(t *testing.T) {
+	t.Run("system-rbac/AC-17", func(t *testing.T) {
+		grants := func(role RoleID, perm Permission) bool {
+			def, ok := BuiltInRoles[role]
+			if !ok {
+				t.Fatalf("unknown built-in role %q", role)
+			}
+			for _, p := range def.Permissions {
+				if p == perm {
+					return true
+				}
+			}
+			return false
+		}
+		cases := []struct {
+			role RoleID
+			perm Permission
+			want bool
+		}{
+			// Remediation: ops_lead requests/executes/rolls back but CANNOT
+			// approve (separation of duties). Approve is security_admin+admin only.
+			{RoleOpsLead, RemediationRequest, true},
+			{RoleOpsLead, RemediationExecute, true},
+			{RoleOpsLead, RemediationRollback, true},
+			{RoleOpsLead, RemediationApprove, false},
+			{RoleSecurityAdmin, RemediationApprove, true},
+			{RoleAdmin, RemediationApprove, true},
+			{RoleViewer, RemediationRequest, false},
+			{RoleAuditor, RemediationRequest, false},
+			{RoleAuditor, RemediationApprove, false},
+			// security_admin holds remediation:* (wildcard expanded at codegen).
+			{RoleSecurityAdmin, RemediationRequest, true},
+			{RoleSecurityAdmin, RemediationExecute, true},
+			{RoleSecurityAdmin, RemediationRollback, true},
+			// Exceptions: auditor approves; ops_lead requests but cannot approve;
+			// revoke is security_admin+admin only.
+			{RoleAuditor, ExceptionRequest, true},
+			{RoleAuditor, ExceptionApprove, true},
+			{RoleOpsLead, ExceptionRequest, true},
+			{RoleOpsLead, ExceptionApprove, false},
+			{RoleOpsLead, ExceptionRevoke, false},
+			{RoleSecurityAdmin, ExceptionApprove, true},
+			{RoleSecurityAdmin, ExceptionRevoke, true},
+			{RoleViewer, ExceptionRequest, false},
+			{RoleViewer, ExceptionApprove, false},
+			// viewer holds only the read of each governed category; admin holds all.
+			{RoleViewer, RemediationRead, true},
+			{RoleViewer, ExceptionRead, true},
+			{RoleAdmin, RemediationRequest, true},
+			{RoleAdmin, ExceptionRevoke, true},
+		}
+		for _, c := range cases {
+			if got := grants(c.role, c.perm); got != c.want {
+				t.Errorf("BuiltInRoles[%s] grants %s = %v, want %v (governance matrix — see docs/engineering/remediation_exception_governance.md)",
+					c.role, c.perm, got, c.want)
+			}
+		}
+	})
+}
+
 // Suppress unused-import warning in cases where filepath/os are vestigial.
 var _ = filepath.Join
 var _ = os.Stat