Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions docs/engineering/rbac_registry.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ In a string-literal world, all three layers refer to permissions by free-form st

- The spec says `host:read`. The handler checks `hosts:read`. The role grants `host.read`. All three are slightly different. Tests pass because fixtures grant superusers everything. Production fails when a real `auditor` role tries to list hosts and gets `403`.
- A new dangerous permission gets added to a handler but never to the registry. There is no audit hook that says "this permission was added"; reviewers don't know the surface grew.
- License-gated permissions (`remediation:execute` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale.
- License-gated permissions (`audit:export` requires OpenWatch+) are gated in some places and not others — gating is a per-handler decoration that goes stale.

A registry collapses all three layers onto one source. The OpenAPI validator, the handler middleware, and the role definitions all read from the same file. Misspell a permission anywhere → build fails. License gating co-locates with permission definition → middleware enforces both in one pass. Custom roles created at runtime validate every permission against the registry → no silent grant of a permission that doesn't exist.

Expand Down Expand Up @@ -132,7 +132,8 @@ type PermissionMeta struct {

var Permissions = map[Permission]PermissionMeta{
HostRead: {Category: "host", Description: "View host details...", Dangerous: false, LicenseGated: ""},
RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: "remediation_execution"},
RemediationExecute: {Category: "remediation", Description: "Execute...", Dangerous: true, LicenseGated: ""}, // free core (single-rule); bulk/auto gated at the handler
AuditExport: {Category: "audit", Description: "Export audit data...", Dangerous: false, LicenseGated: "audit_export"},
// ...
}

Expand Down Expand Up @@ -454,7 +455,7 @@ A custom role with `remediation:execute` is **allowed** even if the license does

### 8.5 Custom roles and policy cross-validation

The `approvals` policy declares `approver_roles: [security_admin, ops_lead]`. The policy loader validates these against the active role set (built-in + custom). An unknown role → `policy.invalid` audit event; previous policy state retained. Custom roles can therefore appear in approver lists once they exist.
The policies-as-data framework registers an `approvals` policy *type*, but **no `approvals` policy is currently configured** and no code reads `approver_roles`. The enforced approval gate today is the `remediation:approve` / `exception:approve` permission alone (held by `security_admin` and `admin`; see [remediation_exception_governance.md](remediation_exception_governance.md)). If an `approvals` policy is added later, the loader cross-validates its `approver_roles` against the active role set (built-in + custom; an unknown role → `policy.invalid` audit event, previous policy state retained). Those `approver_roles` MUST be a subset of the roles that hold the matching `*:approve` permission — otherwise the policy names a role that cannot actually approve.

---

Expand Down
87 changes: 87 additions & 0 deletions docs/engineering/remediation_exception_governance.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Remediation & Exception Governance — Role Matrix

> **Status:** Current as of 2026-06-19.
> **Authority:** `auth/permissions.yaml` is the source of truth for who can do
> what (codegen produces `internal/auth/permissions.gen.go` / `roles.gen.go`).
> This document is a human-readable view of it; if the two disagree, the YAML
> wins and this doc is stale.
> **Audience:** Operators deciding how to assign roles, and engineers working on
> the remediation / exception lifecycles.

This is the answer to "which role can **request**, **approve/reject**, and
**execute** remediation and exceptions." Two governed lifecycles share the same
separation-of-duties rule.

---

## Built-in roles (least → most privilege)

`viewer` → `auditor` → `ops_lead` → `security_admin` → `admin`

`admin` holds the `*` wildcard (every permission). Custom roles may be created
and are validated against the permission registry.

## Remediation

| Action | permission | viewer | auditor | ops_lead | security_admin | admin |
|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:|
| View requests/history | `remediation:read` | ✓ | ✓ | ✓ | ✓ | ✓ |
| **Request** | `remediation:request` | | | ✓ | ✓ | ✓ |
| **Approve / Reject** | `remediation:approve` | | | | ✓ | ✓ |
| Execute (Fix) | `remediation:execute` | | | ✓ | ✓ | ✓ |
| Rollback | `remediation:rollback` | | | ✓ | ✓ | ✓ |

Note the deliberate asymmetry: **`ops_lead` can request and execute remediation
but cannot approve it** — approval needs `security_admin` or `admin`.

`remediation:execute` and `remediation:rollback` are **free core** (single-rule
manual). Bulk and automated remediation is the licensed track, gated separately
at the handler via `license.EnforceFeature(remediation_execution)` — not via a
permission.

## Exceptions

| Action | permission | viewer | auditor | ops_lead | security_admin | admin |
|--------|------------|:------:|:-------:|:--------:|:--------------:|:-----:|
| View | `exception:read` | ✓ | ✓ | ✓ | ✓ | ✓ |
| **Request** | `exception:request` | | ✓ | ✓ | ✓ | ✓ |
| Comment | `exception:comment` | | ✓ | ✓ | ✓ | ✓ |
| **Approve** | `exception:approve` | | ✓ | | ✓ | ✓ |
| Revoke | `exception:revoke` | | | | ✓ | ✓ |

Note the asymmetry mirrors remediation in reverse: **`auditor` can approve
exceptions but not remediation**, and **`ops_lead` can request exceptions but not
approve them**.

## Separation of duties (self-review rule)

For **both** lifecycles, the reviewer must differ from the requester. Approving
or rejecting your own request is refused with **409 `self_review`** — and there
is **no bypass**: not for `admin`, and there is no config flag.

- Remediation: `internal/remediation/service.go` (`ErrSelfReview`)
- Exceptions: `internal/exception/service.go`

**One-operator note:** because of this rule, a single-operator workspace cannot
complete the request → approve flow today. The resolution for the free tier is
[Remediation Approval Governance (ADR)](remediation_governance_adr.md): free-core
single-rule remediation will not require a separate approval; the approval gate
(with self-review) is reserved for the licensed bulk/auto track. Until that lands,
two distinct users are required to approve any remediation/exception.

## On `approver_roles` policies

The policies-as-data framework registers an `approvals` policy *type*
(`internal/policy/types.go`), but **no `approvals` policy is currently
configured**, and no code reads `approver_roles`. The enforced approval gate
today is purely the `remediation:approve` / `exception:approve` **permission**
above. If an `approvals` policy is ever added, its `approver_roles` must be a
subset of the roles that hold the corresponding `*:approve` permission, or the
policy can name a role that cannot actually approve.

## References

- Source of truth: `auth/permissions.yaml`
- RBAC registry: [rbac_registry.md](rbac_registry.md)
- Decision record: [remediation_governance_adr.md](remediation_governance_adr.md)
- Operator guide: [../guides/HOSTS_AND_REMEDIATION.md](../guides/HOSTS_AND_REMEDIATION.md)
112 changes: 112 additions & 0 deletions docs/engineering/remediation_governance_adr.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# Remediation Approval Governance (ADR)

> **Status:** Accepted 2026-06-19. Implementation pending (the conditional-approval
> path is not yet built; today every remediation request goes through
> request → approve → execute).
> **Authority:** This document is the decision record for *when* a remediation
> requires human approval. The role/permission matrix that backs it is
> [remediation_exception_governance.md](remediation_exception_governance.md);
> the permission source of truth is `auth/permissions.yaml`.
> **Audience:** Anyone implementing or specing the remediation lifecycle, and
> anyone scoping the OpenWatch+ licensed remediation track.

---

## Context

Remediation is open-core. The boundary, decided separately, is:

- **Free core:** per-rule **manual** remediation — an operator fixes one finding
on one host, and can roll it back.
- **OpenWatch+ (licensed):** **bulk and automated** remediation — apply many
rules / fleet-wide, and policy-driven auto-remediation. Gated at the handler
via `license.EnforceFeature(remediation_execution)`.

The shipped lifecycle is a single state machine with a human approval gate:

```
Request → pending_approval → (Approve) → approved → (MarkExecuting) → executing → executed → rolled_back
│ (failed, dry_run_complete are side branches)
└── (Reject) → rejected
```

Approval enforces **separation of duties**: the reviewer must differ from the
requester. This is hard-coded with no bypass (`internal/remediation/service.go`,
`if requestedBy == reviewedBy { return ErrSelfReview }`) and the execute handler
refuses anything not in `approved` state
(`internal/server/remediation_handlers.go`, 409 `only an approved request can be
executed`).

**The problem this ADR resolves:** that gate makes the product unusable for a
single operator. A lone administrator can request but can never approve their
own request (409 `self_review`, even as `admin`), so they never reach Fix. The
same applies to compliance exceptions. Requiring approval here also buys *no*
separation of duties — the requester and the approver would be the same human.

## Decision

**Keep the governance machinery; make the human approval step *conditional* on
the remediation track ("A-keep").**

- **Free-core, single-rule manual remediation does not require a separate human
approval.** A free-core request reaches an executable state directly (auto-approved
on creation, or a `ready` state the execute handler also accepts). The operator
clicks **Fix**; there is no `pending_approval` interstitial.
- **The licensed bulk / auto-remediation track keeps the full request → approve →
execute flow with the self-review separation-of-duties guard.** This is where an
approval gate carries real risk-management value (many rules, fleet-wide, or
unattended), and where multiple roles realistically exist.

We do **not** delete the governance code. It is exactly the machinery the
licensed track needs.

## Consequences

**Stays, unchanged:**

- The `remediation_requests` + `remediation_transactions` tables (migration 0037)
— every request and its transactions are still recorded for audit, history, and
rollback, approval or not.
- The execution half of the state machine (`executing → executed → rolled_back →
failed`, `dry_run_complete`), `MarkExecuting`, `RecordExecution`, the
`RemediationWorker`, the execute/rollback handlers, the
`remediation:execute` / `remediation:rollback` permissions, and the frontend
Fix/rollback UI.

**Stays, but becomes conditional — reserved for the licensed track:**

- `Request` / `Approve` / `Reject`, the self-review guard, the
`pending_approval` / `approved` / `rejected` states, and the
`remediation:request` / `remediation:approve` permissions.

**Changes (small, surgical):**

1. A free-core single-rule request reaches an executable state without a human
approval transition.
2. UI: the Fix button is live immediately for free-core (no pending-approval step).
3. Specs/tests: the `api-remediation` ACs that assert "must be approved before
execute" split into free-core (no approval) vs. licensed (approval + self-review).
The self-review test stays, retargeted to the licensed path.

**Accepted trade-off:** until the bulk/auto track ships, the approve/reject/
self-review code is present-but-dormant (exercised only by its tests). We accept
carrying it rather than deleting working, tested code and rebuilding it later.

## Alternatives considered

- **Single-operator mode (config flag relaxing self-review).** Viable, but adds a
config surface and an "I approved my own request" audit nuance. The conditional
split achieves the same outcome for the free tier without a flag.
- **Require a second approver account.** Rejected as the *only* answer: it is poor
UX and, since the same human clicks both, delivers no real separation of duties.
- **A-defer (strip governance now, rebuild for the licensed track).** Rejected:
throws away working, tested, just-merged code to rebuild the same machinery later.

## References

- Role/permission matrix + self-review rule:
[remediation_exception_governance.md](remediation_exception_governance.md)
- Permission source of truth: `auth/permissions.yaml`
- RBAC registry: [rbac_registry.md](rbac_registry.md)
- Lifecycle code: `internal/remediation/`, `internal/server/remediation_handlers.go`
- Spec: `specs/api/remediation.spec.yaml`
46 changes: 30 additions & 16 deletions docs/guides/HOSTS_AND_REMEDIATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -218,12 +218,19 @@ on target hosts.

![Remediation confirmation dialog](../images/hosts/confirm-remediation.png)

For organizations that require approval workflows:
For organizations that require an approval step:

1. Select findings and click **Request Remediation**.
2. Enter a justification for the changes.
3. An admin reviews and approves the request.
4. Once approved, the remediation executes automatically.
1. A user with `remediation:request` (`ops_lead`, `security_admin`, or `admin`)
selects findings, clicks **Request Remediation**, and enters a justification.
2. A **different** user with `remediation:approve` (`security_admin` or `admin`)
reviews and approves or rejects it. You cannot approve your own request
(separation of duties; self-approval returns `409 self_review`).
3. Once approved, a user with `remediation:execute` clicks **Fix** to apply the
change. Execution is operator-initiated, not automatic.

See [Remediation & Exception Governance](../engineering/remediation_exception_governance.md)
for the full role matrix. Single-operator workspaces cannot self-approve today;
see the [governance ADR](../engineering/remediation_governance_adr.md).

---

Expand Down Expand Up @@ -258,7 +265,8 @@ If a remediation causes problems, you can roll back to the pre-change state.

![Rollback confirmation](../images/hosts/rollback.png)

Rollback requires SUPER_ADMIN or SECURITY_ADMIN role (scan:rollback permission).
Rollback requires the `remediation:rollback` permission (`ops_lead`,
`security_admin`, or `admin`).

### After Rolling Back

Expand All @@ -269,16 +277,22 @@ returned to its previous state.

## Required Permissions

| Operation | Minimum Role |
|-----------|-------------|
| View hosts | GUEST |
| Add / edit / delete hosts | SECURITY_ANALYST |
| Bulk import / export | SUPER_ADMIN, SECURITY_ADMIN |
| Start remediation | SECURITY_ADMIN (scan:execute) |
| Approve remediation | SUPER_ADMIN (scan:approve) |
| Rollback remediation | SUPER_ADMIN, SECURITY_ADMIN (scan:rollback) |
| View server intelligence | SECURITY_ANALYST |
| Manage host groups | SECURITY_ANALYST |
Built-in roles, least to most privilege: `viewer` → `auditor` → `ops_lead` →
`security_admin` → `admin` (`admin` holds every permission). The permission
source of truth is `auth/permissions.yaml`; see
[Remediation & Exception Governance](../engineering/remediation_exception_governance.md)
for the complete matrix.

| Operation | Permission | Roles that hold it |
|-----------|------------|--------------------|
| View hosts | `host:read` | viewer, auditor, ops_lead, security_admin, admin |
| Add / edit hosts | `host:write` | ops_lead, security_admin, admin |
| Delete hosts | `host:delete` | security_admin, admin |
| Request remediation | `remediation:request` | ops_lead, security_admin, admin |
| Approve / reject remediation | `remediation:approve` | security_admin, admin |
| Execute remediation (Fix) | `remediation:execute` | ops_lead, security_admin, admin |
| Rollback remediation | `remediation:rollback` | ops_lead, security_admin, admin |
| View server intelligence | `host:read` | viewer, auditor, ops_lead, security_admin, admin |

---

Expand Down
65 changes: 65 additions & 0 deletions internal/auth/permissions_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,71 @@ func TestRequirePermission_HotPathLatency(t *testing.T) {
})
}

// @ac AC-17
// AC-17: built-in role grants match the remediation/exception governance matrix
// (docs/engineering/remediation_exception_governance.md). This locks separation
// of duties — a permissions.yaml edit that, e.g., grants ops_lead
// remediation:approve or removes auditor's exception:approve fails the build.
func TestGovernanceRoleMatrix(t *testing.T) {
t.Run("system-rbac/AC-17", func(t *testing.T) {
grants := func(role RoleID, perm Permission) bool {
def, ok := BuiltInRoles[role]
if !ok {
t.Fatalf("unknown built-in role %q", role)
}
for _, p := range def.Permissions {
if p == perm {
return true
}
}
return false
}
cases := []struct {
role RoleID
perm Permission
want bool
}{
// Remediation: ops_lead requests/executes/rolls back but CANNOT
// approve (separation of duties). Approve is security_admin+admin only.
{RoleOpsLead, RemediationRequest, true},
{RoleOpsLead, RemediationExecute, true},
{RoleOpsLead, RemediationRollback, true},
{RoleOpsLead, RemediationApprove, false},
{RoleSecurityAdmin, RemediationApprove, true},
{RoleAdmin, RemediationApprove, true},
{RoleViewer, RemediationRequest, false},
{RoleAuditor, RemediationRequest, false},
{RoleAuditor, RemediationApprove, false},
// security_admin holds remediation:* (wildcard expanded at codegen).
{RoleSecurityAdmin, RemediationRequest, true},
{RoleSecurityAdmin, RemediationExecute, true},
{RoleSecurityAdmin, RemediationRollback, true},
// Exceptions: auditor approves; ops_lead requests but cannot approve;
// revoke is security_admin+admin only.
{RoleAuditor, ExceptionRequest, true},
{RoleAuditor, ExceptionApprove, true},
{RoleOpsLead, ExceptionRequest, true},
{RoleOpsLead, ExceptionApprove, false},
{RoleOpsLead, ExceptionRevoke, false},
{RoleSecurityAdmin, ExceptionApprove, true},
{RoleSecurityAdmin, ExceptionRevoke, true},
{RoleViewer, ExceptionRequest, false},
{RoleViewer, ExceptionApprove, false},
// viewer holds only the read of each governed category; admin holds all.
{RoleViewer, RemediationRead, true},
{RoleViewer, ExceptionRead, true},
{RoleAdmin, RemediationRequest, true},
{RoleAdmin, ExceptionRevoke, true},
}
for _, c := range cases {
if got := grants(c.role, c.perm); got != c.want {
t.Errorf("BuiltInRoles[%s] grants %s = %v, want %v (governance matrix — see docs/engineering/remediation_exception_governance.md)",
c.role, c.perm, got, c.want)
}
}
})
}

// Suppress unused-import warning in cases where filepath/os are vestigial.
var _ = filepath.Join
var _ = os.Stat
Loading
Loading