
feat(remediation): free-core execution engine (Fix + rollback, kensa v0.5.1) #601

Merged
remyluslosius merged 11 commits into main from feat/remediation-core on Jun 19, 2026

@remyluslosius

Contributor

Remediation execution (free core): the Fix button applies an approved fix

Completes Phase 7 Tier A. An approved single-rule remediation is applied to the host via a queued worker (Kensa Capture/Apply/Validate/Commit over SSH), with rollback. The requester gets a Fix button on their approved request; an executed one gets Roll back.

Free/paid line moved: per-rule manual execute + rollback are now free core (the requester applies their own approved fix); bulk and auto remediation are the OpenWatch+ boundary (remediation_execution repurposed).

  • internal/kensa: apply-enabled SSH transport (ControlChannelSensitive gate — scans stay read-only), kensa.Remediate/Rollback over our credential-resolved transport + SQLite rollback log; executor methods share the per-host inFlight guard.
  • internal/worker: RemediationWorker mirrors ScanWorker (a remediation job type, one dispatcher by JobType); on a committed execute it flips that rule to pass in host_rule_state so the score moves.
  • Real :execute/:rollback handlers (free, enqueue → 202/409); openapi corrected to the free-core contract.
  • Frontend: lifecycle-aware Fix / Applying… / Fixed + Roll back / Failed(reason); the upsell now advertises bulk/auto as OpenWatch+.
  • kensa v0.5.1: bumped — the public pkg/kensa/handlers bundle (auto-registered by DefaultWithTransportFactory) resolves kensa #94.

Live-verified on a real host: an approved cron-d-permissions fix changed /etc/cron.d from 755 to 700 (committed, rule fail→pass, score moved), then rollback restored 755.

Stacked PRs: this is 3 of 3 (base: feat/admin-user-mgmt). Retarget to main as the predecessors merge.

Scope the Phase 7 remediation feature as an open-core split: the free
AGPL core is the see-and-govern loop (view findings + projected lift,
request, approve, view signed transaction history); the OpenWatch+
licensed side is the act of mutating a host (remediation_execution) and
the fleet auto-remediation engine (remediation_auto).

Ratified: auto-remediation is a licensed feature.

OpenWatch Core (free, AGPLv3) half of Phase 7 remediation: a see-and-govern
loop over failing rules. View what is remediable + the projected per-framework
compliance lift, request a fix, route it through approval. The free path never
contacts a host or mutates host_rule_state/transactions; the act of mutating a
host stays behind the OpenWatch+ remediation_execution license.

Backend:
- Migration 0037: remediation_requests + remediation_transactions (per-step
  Kensa journal, written only by the licensed execute path).
- internal/remediation: Request/Approve/Reject (forward-only state machine,
  FOR UPDATE transitions, separation of duties, one-open-per-host+rule) and a
  read-only ProjectLift (per-framework delta from host_rule_state.framework_refs).
- Free endpoints (list/get/request/approve/reject/steps). Act verbs
  (:dry-run/:execute/:rollback) enforce the dangerous license-gated permission:
  403 (RBAC first), 402 free tier, 501 when licensed (body is OpenWatch+).
- Spec api-remediation (C-01..C-07, AC-01..AC-06).

Frontend:
- useHostRemediations hook; per-failing-rule Request-remediation affordance on
  the Compliance tab (gated on remediation:request, open-state suppressed);
  Remediation tab read panel (request list, projected lift, approve/reject
  gated on remediation:approve, transaction-model explainer) with the host-
  mutating Execute control rendered as a disabled OpenWatch+ upsell.
- Spec frontend-remediation-tab (C-01..C-03, AC-01..AC-03).

Plan: docs/engineering/remediation_core_plan.md + remediation_licensed_plan.md
…copy

- auth-bootstrap dev shim: grant the admin role its remediation:* and
  exception:* permissions so the per-rule Request/Approve affordances render
  in dev (the shim faked a stale admin permission set; production derives the
  full set from the backend). Also adds compliance:read.
- Remediation tab: a 409 on approve/reject now surfaces the backend's specific
  reason. A self-review block (you requested it) reads 'You cannot approve or
  reject your own request' instead of the misleading 'already changed state'.

Admin user-management on the existing /api/v1/users surface, gated on
admin:user_manage:

- POST /users/{id}:reset-password - set any user's (or one's own) password on
  admin authority, no current password required; runs the role-aware policy +
  breach screen and revokes the target's sessions.
- POST /users/{id}:disable / :enable - disable sets users.disabled_at
  (migration 0038) and revokes the target's sessions; a disabled user cannot
  authenticate (login rejected, generic invalid-credentials, audit reason
  account_disabled). An admin cannot disable their own account (409).
- Audit: admin.user.{password_reset,disabled,enabled}.
- Frontend: Manage-user modal gains Reset password + Disable/Enable (gated on
  admin:user_manage || isAdmin); a Disabled chip on the roster.
- Spec api-users v1.2.0 (C-06..C-08, AC-14..AC-19), frontend-settings v1.10.0.

Note: cookie sessions are cut off immediately via revocation + the login
block; full Bearer-JWT / API-token disable enforcement is a documented
hardening follow-up.
…core

Per the product decision: per-rule MANUAL remediation (execute + rollback) is
free core (Tier A); bulk and auto-remediation are the OpenWatch+ boundary.

- Ungate remediation:execute / remediation:rollback (no longer
  license_gated:remediation_execution) - the requester can apply their own
  approved single-rule fix for free.
- Grant remediation:execute + remediation:rollback to ops_lead (the requester
  role) so the Fix button works for the user who raised the request.
- Repurpose the remediation_execution feature to gate bulk + auto-remediation
  only.
- Update remediation_core_plan.md boundary table.

The execution engine (kensa.Remediate/Rollback wiring + queued worker + Fix
button) lands next; the act endpoints' 402 stub + AC-06 test get rewritten
with it.
…bled transport)

The execution foundation for free-core Tier A:
- transport.go: TransportFactory.Apply makes the SSH transport report
  ControlChannelSensitive()=true so the Kensa engine permits APPLY (host
  mutation). Scans leave it false (read-only) - the load-bearing gate that
  keeps a scan connection from ever changing a host.
- remediatefunc.go: NewProductionRemediateFunc composes the FULL Kensa via
  pkg/kensa.DefaultWithTransportFactory (engine + SQLite txn store for rollback
  pre-state + signer) over our apply-enabled, credential-resolved transport;
  returns Remediate + Rollback closures that map kensa TransactionResult /
  RollbackResult into OpenWatch journal shapes.
- executor.go: Remediate(host, rule) + Rollback(host, txnID) share Run's
  per-host inFlight guard (a host is never scanned + remediated at once).
…cked, graceful)

Completes free-core Tier A: an approved single-rule remediation is applied via
a queued worker, with a Fix button on the approved request and Rollback on an
executed one.

- internal/worker: RemediationWorker mirrors ScanWorker — a 'remediation' job
  type (HMAC-signed), one dispatcher routing by JobType. Drives approved ->
  executing -> executed|failed (and executed -> rolled_back), writes the
  remediation_transactions journal, and on a committed execute flips that one
  rule to pass in host_rule_state so the score moves.
- internal/remediation/execution.go: MarkExecuting / RecordExecution /
  rollback transitions; surfaces the failure reason in review_note.
- cmd/openwatch: wire NewProductionRemediateFunc + executor.WithRemediateFunc
  (serve + worker), OPENWATCH_KENSA_STORE_PATH for the SQLite rollback log.
- Real :execute/:rollback handlers (free, enqueue, 202/409); openapi act
  endpoints corrected to the free-core contract (no 402); the Stage-0 RBAC
  demo re-pointed to premium_diagnostics (system-rbac AC-10).
- Frontend: lifecycle-aware Fix / Applying… / Fixed + Roll back / Failed(reason);
  the upsell now advertises bulk/auto as OpenWatch+.
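
The execution lifecycle the worker drives (approved -> executing -> executed|failed, and executed -> rolled_back) can be sketched as a small transition table. MarkExecuting and RecordExecution are names from this commit, but their signatures here, the Request struct, MarkRolledBack, and the ruleState map standing in for host_rule_state are all assumptions.

```go
package main

import "fmt"

type Status string

const (
	Approved   Status = "approved"
	Executing  Status = "executing"
	Executed   Status = "executed"
	Failed     Status = "failed"
	RolledBack Status = "rolled_back"
)

// next lists the only legal forward moves.
var next = map[Status][]Status{
	Approved:  {Executing},
	Executing: {Executed, Failed},
	Executed:  {RolledBack},
}

type Request struct {
	Rule       string
	Status     Status
	ReviewNote string // carries the failure reason, if any
}

func advance(r *Request, to Status) error {
	for _, s := range next[r.Status] {
		if s == to {
			r.Status = to
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", r.Status, to)
}

func MarkExecuting(r *Request) error { return advance(r, Executing) }

// RecordExecution closes an executing request. On a committed apply it
// flips the rule to pass in ruleState; otherwise it records the reason.
func RecordExecution(r *Request, committed bool, reason string, ruleState map[string]string) error {
	if !committed {
		if err := advance(r, Failed); err != nil {
			return err
		}
		r.ReviewNote = reason
		return nil
	}
	if err := advance(r, Executed); err != nil {
		return err
	}
	ruleState[r.Rule] = "pass"
	return nil
}

func MarkRolledBack(r *Request) error { return advance(r, RolledBack) }

func main() {
	state := map[string]string{"cron-d-permissions": "fail"}
	r := &Request{Rule: "cron-d-permissions", Status: Approved}
	fmt.Println(MarkExecuting(r), RecordExecution(r, true, "", state))
	fmt.Println(r.Status, state[r.Rule])
	fmt.Println(MarkRolledBack(r), r.Status)
}
```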

KNOWN LIMITATION (Kensa-side): kensa v0.5.0 keeps its apply-mechanism handlers
in internal/handlers/* (blank-imported only inside the kensa module), and
exposes no public handler-registration package, so an external consumer cannot
register them and Kensa.Remediate fails preflight ("mechanism X is not
registered"). A live test confirmed the request flows end to end into Kensa's
engine; the failure is BEFORE any apply (no host changed). Surfaced gracefully
as "Remediation engine unavailable in this build…". Lifts with a kensa release
that exposes a public handler bundle; then OpenWatch needs only a version bump.
See docs/engineering/remediation_core_plan.md.
…eal host

kensa v0.5.1 ships the public pkg/kensa/handlers bundle (auto-registered by
DefaultWithTransportFactory), fixing kensa #94. Single-rule remediation now
runs end to end: a live test applied cron-d-permissions on a test host
(/etc/cron.d 755->700, committed, rule flipped to pass), then rolled back to
755. OpenWatch needed only the version bump.

- go.mod: kensa v0.5.0 -> v0.5.1; KensaModuleVersion + kensa-executor spec
  context synced (system-kensa-executor AC-10).
- friendlyTxnErr kept as defense-in-depth (de-pinned from v0.5.0).
- remediation_core_plan.md execution-status note updated to live/working.

Pick up GA-readiness (#602): Specter 100% gate + structural gate, hardened
CI/release, SPA test fixture, packaging-test opt-in. Brings the full
remediation stack (governance + admin + execution) current with main for a
single consolidated landing.
@remyluslosius remyluslosius changed the base branch from feat/admin-user-mgmt to main June 19, 2026 16:48
@remyluslosius remyluslosius reopened this Jun 19, 2026
…tion:execute

remediation:execute was moved to free core (single-rule manual execute; only
bulk/auto is licensed via license.EnforceFeature(remediation_execution)), so
LicenseGate(remediation:execute) correctly returns "". The system-rbac AC-07
test still expected the old gating — update it to assert free-core and keep
audit:export as the gated example. (Slipped through because #601 targeted
feat/admin-user-mgmt, so go-ci — which only runs on PRs to main — never ran.)
TestAPI_AdminDisableEnable carried // @ac AC-16/17/18 (structural coverage
passed) but ran one t.Run("api-users/AC-16") covering all three — so the
outcome gate, which credits per-AC t.Run tokens, only saw AC-16 and left
api-users at 89% under the 100% gate. Hoist the shared setup and split into
ordered AC-16/AC-17/AC-18 subtests (no behavior change; subtests run in order
so enable still observes disable). Same slip-through cause as the AC-07 fix:
#600 targeted feat/remediation-governance, so go-ci never ran on it.
@remyluslosius remyluslosius merged commit df024ee into main Jun 19, 2026
13 checks passed
@remyluslosius remyluslosius deleted the feat/remediation-core branch June 19, 2026 18:06
remyluslosius added a commit that referenced this pull request Jun 20, 2026
… 110) (#610)

- CLAUDE.md: Last Updated 2026-06-20; Remediation row -> Complete (#601/#606/#607);
  scanning-status note -> v0.2.0-rc.11 incl. free-core remediation; spec count 108 -> 110
- BACKLOG.md: drop done rows (Remediation tab, specter 100%-all-tiers, -p 1 -> -p 4)
- scan_remaining_work.md: Phase 7 first-slice shipped banner; remaining = licensed track
- SESSION_LOG.md: 2026-06-20 entry (rc.11 cut, bundle mechanics, gotchas)

Successfully merging this pull request may close these issues.

pkg/kensa.Remediate unusable by external consumers: apply handlers register only via internal blank imports
