
docs(governance): remediation approval ADR + role matrix + RBAC drift-lock #604

Merged
remyluslosius merged 2 commits into main from docs/remediation-governance on Jun 20, 2026

@remyluslosius

Contributor

Remediation/exception governance: ADR, role matrix, doc fixes, and a drift-lock test

Addresses the governance questions raised against the just-landed remediation feature: how the model works for a one-man shop, who can request versus approve, and whether the role grants are documented and enforced. This matters for the licensed OpenWatch+ track.

What's here

ADR — docs/engineering/remediation_governance_adr.md (A-keep, Accepted)
Keep the governance machinery; make the human approval step conditional: free-core single-rule remediation needs no separate approval, the licensed bulk/auto track keeps request → approve with the self-review separation-of-duties guard. Records the one-man-shop decision (a lone operator can't self-approve today; the free tier drops the gate rather than relaxing self-review).

Role matrix — docs/engineering/remediation_exception_governance.md (new)
The accurate per-role grants for remediation + exceptions (request / approve / execute / rollback / revoke), the no-bypass self-review rule, and the approver_roles clarification.

Stale-doc fixes

  • HOSTS_AND_REMEDIATION.md: replaced Python-era role names (SUPER_ADMIN, scan:approve/rollback) and the false "executes automatically after approval" claim with the real Go RBAC + operator-initiated Fix.
  • rbac_registry.md: remediation:execute is free core now (was shown license-gated); reconciled the approver_roles: [security_admin, ops_lead] fiction — no approvals policy is configured, and the enforced gate is the remediation:approve / exception:approve permission (ops_lead does not hold it).

Drift-lock test — system-rbac C-08 / AC-17
New spec constraint + TestGovernanceRoleMatrix asserting built-in role grants match the matrix against BuiltInRoles. A permissions.yaml edit that breaks separation of duties (e.g. granting ops_lead remediation:approve, or dropping auditor's exception:approve) now fails the build. All 110 specs at 100% structural coverage; the test passes.
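A minimal sketch of the drift-lock idea, assuming the built-in roles can be flattened into a role-to-permission set. The `grants` and `expectation` types and the `checkMatrix` helper are hypothetical and are not the actual TestGovernanceRoleMatrix code; only the two cells named above (ops_lead and remediation:approve, auditor and exception:approve) are pinned.

```go
package main

import (
	"fmt"
	"sort"
)

// grants maps a role name to the set of permissions it holds (hypothetical shape).
type grants map[string]map[string]bool

// expectation pins one cell of the governance matrix.
type expectation struct {
	role, permission string
	want             bool
}

// checkMatrix returns one message per pinned cell where the actual
// grants disagree with the expectation. An empty result means no drift.
func checkMatrix(actual grants, pinned []expectation) []string {
	var drift []string
	for _, e := range pinned {
		// Indexing a missing role or permission yields false, i.e. "not granted".
		got := actual[e.role][e.permission]
		if got != e.want {
			drift = append(drift, fmt.Sprintf("%s / %s: got %v, want %v",
				e.role, e.permission, got, e.want))
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	// Only the two cells called out in the description are pinned here.
	pinned := []expectation{
		{"ops_lead", "remediation:approve", false},
		{"auditor", "exception:approve", true},
	}
	// Hypothetical grants standing in for the built-in roles after a bad edit.
	edited := grants{
		"ops_lead": {"remediation:approve": true},
		"auditor":  {},
	}
	for _, msg := range checkMatrix(edited, pinned) {
		fmt.Println("drift:", msg)
	}
}
```

In a real test the drift messages would be reported through the testing package so that any mismatch fails the build.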

The matrix being locked

Action viewer auditor ops_lead security_admin admin
remediation: request
remediation: approve
remediation: execute/rollback
exception: request
exception: approve
exception: revoke

Docs-and-test only; no behavior change. The conditional-approval implementation in the ADR is a separate follow-up.

…BAC docs

- ADR (A-keep): keep the governance machinery, make human approval conditional
  - free-core single-rule remediation needs none, licensed bulk/auto keeps
  request->approve with self-review. Records the one-man-shop decision.
- New role matrix doc: remediation + exception request/approve/execute grants
  per built-in role, plus the no-bypass self-review rule.
- Fix HOSTS_AND_REMEDIATION.md: Python-era role names (SUPER_ADMIN, scan:*) and
  the wrong 'executes automatically' claim -> real Go RBAC + manual Fix.
- Fix rbac_registry.md: remediation:execute is free core now (not license-gated);
  reconcile the approver_roles fiction (no approvals policy is configured; the
  enforced gate is the remediation:approve/exception:approve permission).
Add a spec constraint + test asserting built-in role grants match the
remediation/exception governance matrix, so a permissions.yaml edit that breaks
separation of duties (e.g. granting ops_lead remediation:approve, or dropping
auditor's exception:approve) fails the build. Verified against BuiltInRoles.
@github-actions (bot) added the size/L, documentation, and security labels on Jun 19, 2026
@remyluslosius merged commit 6ef583b into main on Jun 20, 2026
13 checks passed
@remyluslosius deleted the docs/remediation-governance branch on June 20, 2026 03:48
remyluslosius added a commit that referenced this pull request Jun 20, 2026
… + auth fix) (#609)

* fix(auth): return 401 for anonymous callers on protected endpoints

An anonymous request (no credentials, or a session cookie that expired in the
browser and is no longer sent) to a protected endpoint now returns 401
auth.required instead of 403. The SPA redirects to login on a 401, so an
expired session surfaces as a clean re-login prompt rather than a dead-end
'failed to load'. An authenticated caller whose role lacks the permission still
gets 403 authz.permission_denied; the audit event is unchanged for both.
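
The status split described in this commit message can be sketched as a small decision function. This is an illustration only: the `decision` type and `decide` function are hypothetical names, and the real middleware presumably also emits the audit event mentioned above.

```go
package main

import (
	"fmt"
	"net/http"
)

// decision is the outcome of an access check on a protected endpoint.
type decision struct {
	Status int
	Code   string // empty when the request is allowed
}

// decide separates "who are you?" (401) from "you may not do this" (403).
func decide(authenticated, hasPermission bool) decision {
	switch {
	case !authenticated:
		// No credentials, or an expired session cookie that was not sent.
		return decision{http.StatusUnauthorized, "auth.required"}
	case !hasPermission:
		// Known caller whose role lacks the permission.
		return decision{http.StatusForbidden, "authz.permission_denied"}
	default:
		return decision{http.StatusOK, ""}
	}
}

func main() {
	fmt.Println(decide(false, false)) // anonymous caller
	fmt.Println(decide(true, false))  // authenticated but unauthorized
	fmt.Println(decide(true, true))   // allowed
}
```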

* test+spec: update anonymous-denial contract to 401 across specs/tests

The 12 specs/tests that strictly asserted anonymous -> 403 now assert 401
auth.required (alerts, audit-events-query, fleet-observability, host-system-info,
os-intelligence, system-rbac AC-09/AC-15, system/fleet connectivity, discovery/
intelligence config). Authenticated-but-unauthorized -> 403 language preserved.
Specs that already said '401/403' are unchanged.

* feat(remediation): conditional approval (A-keep) — free-core auto-approves

Implements the A-keep ADR: free-core single-rule remediation no longer requires
a separate human approval, so a single operator can request and Fix a finding
directly (removing the self-review deadlock). The approve/reject flow with
separation of duties is retained for the licensed bulk/auto track.

- Request(...requiresApproval bool): false (free core) inserts an 'approved'
  row directly (reviewed_at set, reviewed_by NULL, auto-approved review_note)
  and emits remediation.requested + remediation.approved; true (licensed
  bulk/auto) inserts 'pending_approval' and goes through Approve/Reject.
- The single-rule request handler passes false.
- Tests: AC-01 covers auto-approve + the approval-required path; the HTTP
  AC-05/AC-06 approve and pending-execute paths seed a pending_approval request
  (the free-core POST auto-approves). Frontend unchanged (the hook already
  renders approved -> Fix and keeps the pending_approval/approve UI for the
  licensed track).

Note: the ADR + governance docs land in #604; their status flips to
'implemented' once both merge.

* fix(remediation): serialize concurrent fixes on a host instead of failing

Clicking Fix on several findings on the same host enqueued multiple jobs that
ran concurrently; the second collided on the per-host SSH guard (ErrHostBusy)
and the remediation worker marked it failed. Now the worker treats a busy host
as transient: it backs off and requeues (queue.EnqueueAfter) until the host is
free, so the fixes apply one at a time.

- queue: add a delayed-visibility column (migration 0039 available_at) +
  EnqueueAfter(delay); Dequeue skips not-yet-available rows so the requeue does
  not busy-loop the drain (job-queue AC-13).
- remediation: HostHasExecuting + RevertToApproved primitives (api-remediation
  AC-08); worker processExecute/processRollback pre-check the host and revert+
  requeue on an ErrHostBusy race instead of failing the request.

* feat(frontend): live remediation status via remediation.completed SSE

The Remediation tab required a manual refresh to see a fix finish. The worker
already publishes remediation.completed on the event bus; useLiveEvents now
subscribes to it and invalidates ['host', id, 'remediations'] + ['host', id],
so the tab and the compliance score update automatically when a queued fix or
rollback reaches its terminal state. frontend-live-events AC-09 + AC-01 (topic
set grows to 6).

* chore(release): bump Kensa to v0.5.2 and prepare 0.2.0-rc.11

Kensa v0.5.2 is a PATCH release with a frozen api/ surface, so OpenWatch's
library integration is unchanged. Its notable fix corrects a config_value
matching bug ('" "' delimiter now matches any whitespace incl. TAB), which
removes a class of false FAILs on TAB-delimited rules (RHEL login.defs) —
affected hosts may see their compliance score improve. The jsonl skipped-vs-
error fix (kensa#104) is confirmed no-impact for the library path (issue #603).

- go.mod kensa v0.5.1 -> v0.5.2; KensaModuleVersion + kensa-executor spec pin
  updated to match (version-pin tests pass; corpus stays at 539 rules, the
  variable-catalog AC still sees exactly 3 placeholders).
- version.env -> 0.2.0-rc.11; README + operator guides + CHANGELOG cut a
  0.2.0-rc.11 section.
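
To illustrate the class of false FAILs the config_value fix removes, the sketch below contrasts splitting on a literal space with splitting on any whitespace. It is not Kensa's code; the function names and the sample login.defs line are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLiteralSpace treats only the space character as the delimiter.
func splitLiteralSpace(line string) []string {
	return strings.Split(line, " ")
}

// splitAnyWhitespace treats any run of whitespace (including TAB) as the delimiter.
func splitAnyWhitespace(line string) []string {
	return strings.Fields(line)
}

func main() {
	// A TAB-delimited key/value line of the kind found in login.defs.
	line := "PASS_MAX_DAYS\t90"

	// With a literal-space delimiter the key and value never separate,
	// so a value comparison against "90" cannot succeed.
	fmt.Println(len(splitLiteralSpace(line)))

	// With whitespace-agnostic splitting the value is recovered.
	fields := splitAnyWhitespace(line)
	fmt.Println(len(fields), fields[1])
}
```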

* docs(changelog): reconcile rc.11 section (bundle #604-#608)
