diff --git a/docs/engineering/rbac_registry.md b/docs/engineering/rbac_registry.md index 5a4a814b..a49818b4 100644 --- a/docs/engineering/rbac_registry.md +++ b/docs/engineering/rbac_registry.md @@ -22,7 +22,7 @@ In a string-literal world, all three layers refer to permissions by free-form st - The spec says `host:read`. The handler checks `hosts:read`. The role grants `host.read`. All three are slightly different. Tests pass because fixtures grant superusers everything. Production fails when a real `auditor` role tries to list hosts and gets `403`. - A new dangerous permission gets added to a handler but never to the registry. There is no audit hook that says "this permission was added"; reviewers don't know the surface grew. -- License-gated permissions (`remediation:execute` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale. +- License-gated permissions (`audit:export` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale. A registry collapses all three layers onto one source. The OpenAPI validator, the handler middleware, and the role definitions all read from the same file. Misspell a permission anywhere → build fails. License gating co-locates with permission definition → middleware enforces both in one pass. Custom roles created at runtime validate every permission against the registry → no silent grant of a permission that doesn't exist. @@ -132,7 +132,8 @@ type PermissionMeta struct { var Permissions = map[Permission]PermissionMeta{ HostRead: {Category: "host", Description: "View host details...", Dangerous: false, LicenseGated: ""}, - RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: "remediation_execution"}, + RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: ""}, // free core (single-rule); bulk/auto gated at the handler + AuditExport: {Category: "audit", Description: "Export audit data...", Dangerous: false, LicenseGated: "audit_export"}, // ... } @@ -454,7 +455,7 @@ A custom role with `remediation:execute` is **allowed** even if the license does ### 8.5 Custom roles and policy cross-validation -The `approvals` policy declares `approver_roles: [security_admin, ops_lead]`. The policy loader validates these against the active role set (built-in + custom). An unknown role → `policy.invalid` audit event; previous policy state retained. Custom roles can therefore appear in approver lists once they exist. +The policies-as-data framework registers an `approvals` policy *type*, but **no `approvals` policy is currently configured** and no code reads `approver_roles`. The enforced approval gate today is the `remediation:approve` / `exception:approve` permission alone (held by `security_admin` and `admin`; see [remediation_exception_governance.md](remediation_exception_governance.md)). If an `approvals` policy is added later, the loader cross-validates its `approver_roles` against the active role set (built-in + custom; an unknown role → `policy.invalid` audit event, previous policy state retained). Those `approver_roles` MUST be a subset of the roles that hold the matching `*:approve` permission — otherwise the policy names a role that cannot actually approve. --- diff --git a/docs/engineering/remediation_exception_governance.md b/docs/engineering/remediation_exception_governance.md new file mode 100644 index 00000000..b5bee7c6 --- /dev/null +++ b/docs/engineering/remediation_exception_governance.md @@ -0,0 +1,87 @@ +# Remediation & Exception Governance — Role Matrix + +> **Status:** Current as of 2026-06-19. +> **Authority:** `auth/permissions.yaml` is the source of truth for who can do +> what (codegen produces `internal/auth/permissions.gen.go` / `roles.gen.go`). +> This document is a human-readable view of it; if the two disagree, the YAML +> wins and this doc is stale. +> **Audience:** Operators deciding how to assign roles, and engineers working on +> the remediation / exception lifecycles. + +This is the answer to "which role can **request**, **approve/reject**, and +**execute** remediation and exceptions." Two governed lifecycles share the same +separation-of-duties rule. + +--- + +## Built-in roles (least → most privilege) + +`viewer` → `auditor` → `ops_lead` → `security_admin` → `admin` + +`admin` holds the `*` wildcard (every permission). Custom roles may be created +and are validated against the permission registry. + +## Remediation + +| Action | permission | viewer | auditor | ops_lead | security_admin | admin | +|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:| +| View requests/history | `remediation:read` | ✓ | ✓ | ✓ | ✓ | ✓ | +| **Request** | `remediation:request` | | | ✓ | ✓ | ✓ | +| **Approve / Reject** | `remediation:approve` | | | | ✓ | ✓ | +| Execute (Fix) | `remediation:execute` | | | ✓ | ✓ | ✓ | +| Rollback | `remediation:rollback` | | | ✓ | ✓ | ✓ | + +Note the deliberate asymmetry: **`ops_lead` can request and execute remediation +but cannot approve it** — approval needs `security_admin` or `admin`. + +`remediation:execute` and `remediation:rollback` are **free core** (single-rule +manual). Bulk and automated remediation is the licensed track, gated separately +at the handler via `license.EnforceFeature(remediation_execution)` — not via a +permission. + +## Exceptions + +| Action | permission | viewer | auditor | ops_lead | security_admin | admin | +|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:| +| View | `exception:read` | ✓ | ✓ | ✓ | ✓ | ✓ | +| **Request** | `exception:request` | | ✓ | ✓ | ✓ | ✓ | +| Comment | `exception:comment` | | ✓ | ✓ | ✓ | ✓ | +| **Approve** | `exception:approve` | | ✓ | | ✓ | ✓ | +| Revoke | `exception:revoke` | | | | ✓ | ✓ | + +Note the asymmetry mirrors remediation in reverse: **`auditor` can approve +exceptions but not remediation**, and **`ops_lead` can request exceptions but not +approve them**. + +## Separation of duties (self-review rule) + +For **both** lifecycles, the reviewer must differ from the requester. Approving +or rejecting your own request is refused with **409 `self_review`** — and there +is **no bypass**: not for `admin`, and there is no config flag. + +- Remediation: `internal/remediation/service.go` (`ErrSelfReview`) +- Exceptions: `internal/exception/service.go` + +**One-operator note:** because of this rule, a single-operator workspace cannot +complete the request → approve flow today. The resolution for the free tier is +[Remediation Approval Governance (ADR)](remediation_governance_adr.md): free-core +single-rule remediation will not require a separate approval; the approval gate +(with self-review) is reserved for the licensed bulk/auto track. Until that lands, +two distinct users are required to approve any remediation/exception. + +## On `approver_roles` policies + +The policies-as-data framework registers an `approvals` policy *type* +(`internal/policy/types.go`), but **no `approvals` policy is currently +configured**, and no code reads `approver_roles`. The enforced approval gate +today is purely the `remediation:approve` / `exception:approve` **permission** +above. If an `approvals` policy is ever added, its `approver_roles` must be a +subset of the roles that hold the corresponding `*:approve` permission, or the +policy can name a role that cannot actually approve. + +## References + +- Source of truth: `auth/permissions.yaml` +- RBAC registry: [rbac_registry.md](rbac_registry.md) +- Decision record: [remediation_governance_adr.md](remediation_governance_adr.md) +- Operator guide: [../guides/HOSTS_AND_REMEDIATION.md](../guides/HOSTS_AND_REMEDIATION.md) diff --git a/docs/engineering/remediation_governance_adr.md b/docs/engineering/remediation_governance_adr.md new file mode 100644 index 00000000..259fe315 --- /dev/null +++ b/docs/engineering/remediation_governance_adr.md @@ -0,0 +1,112 @@ +# Remediation Approval Governance (ADR) + +> **Status:** Accepted 2026-06-19. Implementation pending (the conditional-approval +> path is not yet built; today every remediation request goes through +> request → approve → execute). +> **Authority:** This document is the decision record for *when* a remediation +> requires human approval. The role/permission matrix that backs it is +> [remediation_exception_governance.md](remediation_exception_governance.md); +> the permission source of truth is `auth/permissions.yaml`. +> **Audience:** Anyone implementing or specing the remediation lifecycle, and +> anyone scoping the OpenWatch+ licensed remediation track. + +--- + +## Context + +Remediation is open-core. The boundary, decided separately, is: + +- **Free core:** per-rule **manual** remediation — an operator fixes one finding + on one host, and can roll it back. +- **OpenWatch+ (licensed):** **bulk and automated** remediation — apply many + rules / fleet-wide, and policy-driven auto-remediation. Gated at the handler + via `license.EnforceFeature(remediation_execution)`. + +The shipped lifecycle is a single state machine with a human approval gate: + +``` +Request → pending_approval → (Approve) → approved → (MarkExecuting) → executing → executed → rolled_back + │ (failed, dry_run_complete are side branches) + └── (Reject) → rejected +``` + +Approval enforces **separation of duties**: the reviewer must differ from the +requester. This is hard-coded with no bypass (`internal/remediation/service.go`, +`if requestedBy == reviewedBy { return ErrSelfReview }`) and the execute handler +refuses anything not in `approved` state +(`internal/server/remediation_handlers.go`, 409 `only an approved request can be +executed`). + +**The problem this ADR resolves:** that gate makes the product unusable for a +single operator. A lone administrator can request but can never approve their +own request (409 `self_review`, even as `admin`), so they never reach Fix. The +same applies to compliance exceptions. Requiring approval here also buys *no* +separation of duties — the requester and the approver would be the same human. + +## Decision + +**Keep the governance machinery; make the human approval step *conditional* on +the remediation track ("A-keep").** + +- **Free-core, single-rule manual remediation does not require a separate human + approval.** A free-core request reaches an executable state directly (auto-approved + on creation, or a `ready` state the execute handler also accepts). The operator + clicks **Fix**; there is no `pending_approval` interstitial. +- **The licensed bulk / auto-remediation track keeps the full request → approve → + execute flow with the self-review separation-of-duties guard.** This is where an + approval gate carries real risk-management value (many rules, fleet-wide, or + unattended), and where multiple roles realistically exist. + +We do **not** delete the governance code. It is exactly the machinery the +licensed track needs. + +## Consequences + +**Stays, unchanged:** + +- The `remediation_requests` + `remediation_transactions` tables (migration 0037) + — every request and its transactions are still recorded for audit, history, and + rollback, approval or not. +- The execution half of the state machine (`executing → executed → rolled_back → + failed`, `dry_run_complete`), `MarkExecuting`, `RecordExecution`, the + `RemediationWorker`, the execute/rollback handlers, the + `remediation:execute` / `remediation:rollback` permissions, and the frontend + Fix/rollback UI. + +**Stays, but becomes conditional — reserved for the licensed track:** + +- `Request` / `Approve` / `Reject`, the self-review guard, the + `pending_approval` / `approved` / `rejected` states, and the + `remediation:request` / `remediation:approve` permissions. + +**Changes (small, surgical):** + +1. A free-core single-rule request reaches an executable state without a human + approval transition. +2. UI: the Fix button is live immediately for free-core (no pending-approval step). +3. Specs/tests: the `api-remediation` ACs that assert "must be approved before + execute" split into free-core (no approval) vs. licensed (approval + self-review). + The self-review test stays, retargeted to the licensed path. + +**Accepted trade-off:** until the bulk/auto track ships, the approve/reject/ +self-review code is present-but-dormant (exercised only by its tests). We accept +carrying it rather than deleting working, tested code and rebuilding it later. + +## Alternatives considered + +- **Single-operator mode (config flag relaxing self-review).** Viable, but adds a + config surface and an "I approved my own request" audit nuance. The conditional + split achieves the same outcome for the free tier without a flag. +- **Require a second approver account.** Rejected as the *only* answer: it is poor + UX and, since the same human clicks both, delivers no real separation of duties. +- **A-defer (strip governance now, rebuild for the licensed track).** Rejected: + throws away working, tested, just-merged code to rebuild the same machinery later. + +## References + +- Role/permission matrix + self-review rule: + [remediation_exception_governance.md](remediation_exception_governance.md) +- Permission source of truth: `auth/permissions.yaml` +- RBAC registry: [rbac_registry.md](rbac_registry.md) +- Lifecycle code: `internal/remediation/`, `internal/server/remediation_handlers.go` +- Spec: `specs/api/remediation.spec.yaml` diff --git a/docs/guides/HOSTS_AND_REMEDIATION.md b/docs/guides/HOSTS_AND_REMEDIATION.md index 5eacba3d..abca7caf 100644 --- a/docs/guides/HOSTS_AND_REMEDIATION.md +++ b/docs/guides/HOSTS_AND_REMEDIATION.md @@ -218,12 +218,19 @@ on target hosts. ![Remediation confirmation dialog](../images/hosts/confirm-remediation.png) -For organizations that require approval workflows: +For organizations that require an approval step: -1. Select findings and click **Request Remediation**. -2. Enter a justification for the changes. -3. An admin reviews and approves the request. -4. Once approved, the remediation executes automatically. +1. A user with `remediation:request` (`ops_lead`, `security_admin`, or `admin`) + selects findings, clicks **Request Remediation**, and enters a justification. +2. A **different** user with `remediation:approve` (`security_admin` or `admin`) + reviews and approves or rejects it. You cannot approve your own request + (separation of duties; self-approval returns `409 self_review`). +3. Once approved, a user with `remediation:execute` clicks **Fix** to apply the + change. Execution is operator-initiated, not automatic. + +See [Remediation & Exception Governance](../engineering/remediation_exception_governance.md) +for the full role matrix. Single-operator workspaces cannot self-approve today; +see the [governance ADR](../engineering/remediation_governance_adr.md). --- @@ -258,7 +265,8 @@ If a remediation causes problems, you can roll back to the pre-change state. ![Rollback confirmation](../images/hosts/rollback.png) -Rollback requires SUPER_ADMIN or SECURITY_ADMIN role (scan:rollback permission). +Rollback requires the `remediation:rollback` permission (`ops_lead`, +`security_admin`, or `admin`). ### After Rolling Back @@ -269,16 +277,22 @@ returned to its previous state. ## Required Permissions -| Operation | Minimum Role | -|-----------|-------------| -| View hosts | GUEST | -| Add / edit / delete hosts | SECURITY_ANALYST | -| Bulk import / export | SUPER_ADMIN, SECURITY_ADMIN | -| Start remediation | SECURITY_ADMIN (scan:execute) | -| Approve remediation | SUPER_ADMIN (scan:approve) | -| Rollback remediation | SUPER_ADMIN, SECURITY_ADMIN (scan:rollback) | -| View server intelligence | SECURITY_ANALYST | -| Manage host groups | SECURITY_ANALYST | +Built-in roles, least to most privilege: `viewer` → `auditor` → `ops_lead` → +`security_admin` → `admin` (`admin` holds every permission). The permission +source of truth is `auth/permissions.yaml`; see +[Remediation & Exception Governance](../engineering/remediation_exception_governance.md) +for the complete matrix. + +| Operation | Permission | Roles that hold it | +|-----------|------------|--------------------| +| View hosts | `host:read` | viewer, auditor, ops_lead, security_admin, admin | +| Add / edit hosts | `host:write` | ops_lead, security_admin, admin | +| Delete hosts | `host:delete` | security_admin, admin | +| Request remediation | `remediation:request` | ops_lead, security_admin, admin | +| Approve / reject remediation | `remediation:approve` | security_admin, admin | +| Execute remediation (Fix) | `remediation:execute` | ops_lead, security_admin, admin | +| Rollback remediation | `remediation:rollback` | ops_lead, security_admin, admin | +| View server intelligence | `host:read` | viewer, auditor, ops_lead, security_admin, admin | --- diff --git a/internal/auth/permissions_test.go b/internal/auth/permissions_test.go index 38edfa3e..e5dac7ea 100644 --- a/internal/auth/permissions_test.go +++ b/internal/auth/permissions_test.go @@ -266,6 +266,71 @@ func TestRequirePermission_HotPathLatency(t *testing.T) { }) } +// @ac AC-17 +// AC-17: built-in role grants match the remediation/exception governance matrix +// (docs/engineering/remediation_exception_governance.md). This locks separation +// of duties — a permissions.yaml edit that, e.g., grants ops_lead +// remediation:approve or removes auditor's exception:approve fails the build. +func TestGovernanceRoleMatrix(t *testing.T) { + t.Run("system-rbac/AC-17", func(t *testing.T) { + grants := func(role RoleID, perm Permission) bool { + def, ok := BuiltInRoles[role] + if !ok { + t.Fatalf("unknown built-in role %q", role) + } + for _, p := range def.Permissions { + if p == perm { + return true + } + } + return false + } + cases := []struct { + role RoleID + perm Permission + want bool + }{ + // Remediation: ops_lead requests/executes/rolls back but CANNOT + // approve (separation of duties). Approve is security_admin+admin only. + {RoleOpsLead, RemediationRequest, true}, + {RoleOpsLead, RemediationExecute, true}, + {RoleOpsLead, RemediationRollback, true}, + {RoleOpsLead, RemediationApprove, false}, + {RoleSecurityAdmin, RemediationApprove, true}, + {RoleAdmin, RemediationApprove, true}, + {RoleViewer, RemediationRequest, false}, + {RoleAuditor, RemediationRequest, false}, + {RoleAuditor, RemediationApprove, false}, + // security_admin holds remediation:* (wildcard expanded at codegen). + {RoleSecurityAdmin, RemediationRequest, true}, + {RoleSecurityAdmin, RemediationExecute, true}, + {RoleSecurityAdmin, RemediationRollback, true}, + // Exceptions: auditor approves; ops_lead requests but cannot approve; + // revoke is security_admin+admin only. + {RoleAuditor, ExceptionRequest, true}, + {RoleAuditor, ExceptionApprove, true}, + {RoleOpsLead, ExceptionRequest, true}, + {RoleOpsLead, ExceptionApprove, false}, + {RoleOpsLead, ExceptionRevoke, false}, + {RoleSecurityAdmin, ExceptionApprove, true}, + {RoleSecurityAdmin, ExceptionRevoke, true}, + {RoleViewer, ExceptionRequest, false}, + {RoleViewer, ExceptionApprove, false}, + // viewer holds only the read of each governed category; admin holds all. + {RoleViewer, RemediationRead, true}, + {RoleViewer, ExceptionRead, true}, + {RoleAdmin, RemediationRequest, true}, + {RoleAdmin, ExceptionRevoke, true}, + } + for _, c := range cases { + if got := grants(c.role, c.perm); got != c.want { + t.Errorf("BuiltInRoles[%s] grants %s = %v, want %v (governance matrix — see docs/engineering/remediation_exception_governance.md)", + c.role, c.perm, got, c.want) + } + } + }) +} + // Suppress unused-import warning in cases where filepath/os are vestigial. var _ = filepath.Join var _ = os.Stat diff --git a/specs/system/rbac.spec.yaml b/specs/system/rbac.spec.yaml index 27bc10ed..a2ea12bd 100644 --- a/specs/system/rbac.spec.yaml +++ b/specs/system/rbac.spec.yaml @@ -67,6 +67,10 @@ spec: description: Codegen output MUST be deterministic (re-running with no changes produces no diff) type: technical enforcement: error + - id: C-08 + description: Built-in role grants MUST preserve separation of duties for the governed lifecycles. A built-in role that can request remediation MUST NOT also hold remediation:approve; the same requester-is-not-approver rule applies to exceptions. The full built-in-role to governed-permission matrix (remediation:* and exception:*) is fixed and documented in docs/engineering/remediation_exception_governance.md; a permissions.yaml edit that changes it is a deliberate governance decision and MUST update that matrix and this spec together. + type: security + enforcement: error acceptance_criteria: - id: AC-01 @@ -124,3 +128,7 @@ spec: - id: AC-16 description: RequirePermission p99 latency < 1µs for the hot path (registry lookup is a map read). priority: medium + - id: AC-17 + description: "Built-in role grants match the remediation/exception governance matrix, locking separation of duties against drift. Remediation: ops_lead grants request/execute/rollback but NOT approve; remediation:approve is held only by security_admin and admin; viewer/auditor cannot request remediation. Exceptions: auditor grants request/approve, ops_lead grants request but NOT approve, revoke is security_admin/admin only. viewer holds only remediation:read/exception:read of the governed categories; admin holds all. Verified against BuiltInRoles so a permissions.yaml edit that breaks the matrix fails the build." + priority: critical + references_constraints: [C-08]