feat(remediation): free-core execution engine (Fix + rollback, kensa v0.5.1) #601
Merged
Scope the Phase 7 remediation feature as an open-core split: the free AGPL core is the see-and-govern loop (view findings + projected lift, request, approve, view signed transaction history); the OpenWatch+ licensed side is the act of mutating a host (remediation_execution) and the fleet auto-remediation engine (remediation_auto). Ratified: auto-remediation is a licensed feature.
OpenWatch Core (free, AGPLv3) half of Phase 7 remediation: a see-and-govern loop over failing rules. View what is remediable and the projected per-framework compliance lift, request a fix, and route it through approval. The free path never contacts a host or mutates host_rule_state/transactions; the act of mutating a host stays behind the OpenWatch+ remediation_execution license.

Backend:
- Migration 0037: remediation_requests + remediation_transactions (per-step Kensa journal, written only by the licensed execute path).
- internal/remediation: Request/Approve/Reject (forward-only state machine, FOR UPDATE transitions, separation of duties, one open request per host+rule) and a read-only ProjectLift (per-framework delta from host_rule_state.framework_refs).
- Free endpoints (list/get/request/approve/reject/steps). The act verbs (:dry-run/:execute/:rollback) enforce the dangerous, license-gated permission: 403 (RBAC checked first), 402 on the free tier, 501 when licensed (body is OpenWatch+).
- Spec api-remediation (C-01..C-07, AC-01..AC-06).

Frontend:
- useHostRemediations hook; per-failing-rule Request-remediation affordance on the Compliance tab (gated on remediation:request, suppressed while a request is open); Remediation tab read panel (request list, projected lift, approve/reject gated on remediation:approve, transaction-model explainer) with the host-mutating Execute control rendered as a disabled OpenWatch+ upsell.
- Spec frontend-remediation-tab (C-01..C-03, AC-01..AC-03).

Plan: docs/engineering/remediation_core_plan.md + remediation_licensed_plan.md
…copy
- auth-bootstrap dev shim: grant the admin role its remediation:* and exception:* permissions so the per-rule Request/Approve affordances render in dev (the shim faked a stale admin permission set; production derives the full set from the backend). Also adds compliance:read.
- Remediation tab: a 409 on approve/reject now surfaces the backend's specific reason. A self-review block (you requested it) reads 'You cannot approve or reject your own request' instead of the misleading 'already changed state'.
Admin user-management on the existing /api/v1/users surface, gated on
admin:user_manage:
- POST /users/{id}:reset-password - set any user's (or one's own) password on
admin authority, no current password required; runs the role-aware policy +
breach screen and revokes the target's sessions.
- POST /users/{id}:disable / :enable - disable sets users.disabled_at
(migration 0038) and revokes the target's sessions; a disabled user cannot
authenticate (login rejected, generic invalid-credentials, audit reason
account_disabled). An admin cannot disable their own account (409).
- Audit: admin.user.{password_reset,disabled,enabled}.
- Frontend: Manage-user modal gains Reset password + Disable/Enable (gated on
admin:user_manage || isAdmin); a Disabled chip on the roster.
- Spec api-users v1.2.0 (C-06..C-08, AC-14..AC-19), frontend-settings v1.10.0.
Note: cookie sessions are cut off immediately via revocation + the login
block; full Bearer-JWT / API-token disable enforcement is a documented
hardening follow-up.
…core
Per the product decision: per-rule MANUAL remediation (execute + rollback) is free core (Tier A); bulk and auto-remediation are the OpenWatch+ boundary.
- Ungate remediation:execute / remediation:rollback (no longer license_gated:remediation_execution): the requester can apply their own approved single-rule fix for free.
- Grant remediation:execute + remediation:rollback to ops_lead (the requester role) so the Fix button works for the user who raised the request.
- Repurpose the remediation_execution feature to gate bulk + auto-remediation only.
- Update the remediation_core_plan.md boundary table.
The execution engine (kensa.Remediate/Rollback wiring + queued worker + Fix button) lands next; the act endpoints' 402 stub and the AC-06 test get rewritten with it.
…bled transport)
The execution foundation for free-core Tier A:
- transport.go: TransportFactory.Apply makes the SSH transport report ControlChannelSensitive()=true so the Kensa engine permits APPLY (host mutation). Scans leave it false (read-only); this is the load-bearing gate that keeps a scan connection from ever changing a host.
- remediatefunc.go: NewProductionRemediateFunc composes the FULL Kensa via pkg/kensa.DefaultWithTransportFactory (engine + SQLite txn store for rollback pre-state + signer) over our apply-enabled, credential-resolved transport; it returns Remediate + Rollback closures that map kensa TransactionResult / RollbackResult into OpenWatch journal shapes.
- executor.go: Remediate(host, rule) + Rollback(host, txnID) share Run's per-host inFlight guard (a host is never scanned + remediated at once).
…cked, graceful)
Completes free-core Tier A: an approved single-rule remediation is applied via
a queued worker, with a Fix button on the approved request and Rollback on an
executed one.
- internal/worker: RemediationWorker mirrors ScanWorker — a 'remediation' job
type (HMAC-signed), one dispatcher routing by JobType. Drives approved ->
executing -> executed|failed (and executed -> rolled_back), writes the
remediation_transactions journal, and on a committed execute flips that one
rule to pass in host_rule_state so the score moves.
- internal/remediation/execution.go: MarkExecuting / RecordExecution /
rollback transitions; surfaces the failure reason in review_note.
- cmd/openwatch: wire NewProductionRemediateFunc + executor.WithRemediateFunc
(serve + worker), OPENWATCH_KENSA_STORE_PATH for the SQLite rollback log.
- Real :execute/:rollback handlers (free, enqueue, 202/409); openapi act
endpoints corrected to the free-core contract (no 402); the Stage-0 RBAC
demo re-pointed to premium_diagnostics (system-rbac AC-10).
- Frontend: lifecycle-aware Fix / Applying... / Fixed + Rollback / Failed (reason);
the upsell now advertises bulk/auto as OpenWatch+.
KNOWN LIMITATION (Kensa-side): kensa v0.5.0 keeps its apply-mechanism handlers
in internal/handlers/* (blank-imported only inside the kensa module), and
exposes no public handler-registration package, so an external consumer cannot
register them and Kensa.Remediate fails preflight ("mechanism X is not
registered"). A live test confirmed the request flows end to end into Kensa's
engine; the failure is BEFORE any apply (no host changed). Surfaced gracefully
as "Remediation engine unavailable in this build…". Lifts with a kensa release
that exposes a public handler bundle; then OpenWatch needs only a version bump.
See docs/engineering/remediation_core_plan.md.
…eal host
kensa v0.5.1 ships the public pkg/kensa/handlers bundle (auto-registered by DefaultWithTransportFactory), fixing kensa #94. Single-rule remediation now runs end to end: a live test applied cron-d-permissions on a test host (/etc/cron.d 755->700, committed, rule flipped to pass), then rolled it back to 755. OpenWatch needed only the version bump.
- go.mod: kensa v0.5.0 -> v0.5.1; KensaModuleVersion + kensa-executor spec context synced (system-kensa-executor AC-10).
- friendlyTxnErr kept as defense-in-depth (de-pinned from v0.5.0).
- remediation_core_plan.md execution-status note updated to live/working.
Pick up GA-readiness (#602): Specter 100% gate + structural gate, hardened CI/release, SPA test fixture, packaging-test opt-in. Brings the full remediation stack (governance + admin + execution) current with main for a single consolidated landing.
…tion:execute
remediation:execute was moved to free core (single-rule manual execute; only bulk/auto is licensed via license.EnforceFeature(remediation_execution)), so LicenseGate(remediation:execute) correctly returns "". The system-rbac AC-07 test still expected the old gating; update it to assert free core and keep audit:export as the gated example. (This slipped through because #601 targeted feat/admin-user-mgmt, so go-ci, which only runs on PRs to main, never ran.)
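A minimal sketch of the boundary the AC-07 fix asserts: LicenseGate returns "" for the single-rule act verbs, audit:export stays gated, and bulk/auto modes map to the remediation_execution feature. LicenseGate and remediation_execution are named in the source, but their real signatures are not shown; the map, the "audit_export" feature name, the Mode type and featureFor are hypothetical.

```go
package main

import "fmt"

// gatedPermissions maps a permission to the licensed feature it requires.
// audit:export is the gated example from the AC-07 fix; its feature name
// here is hypothetical. remediation:execute and remediation:rollback are
// deliberately absent because they are free core.
var gatedPermissions = map[string]string{
	"audit:export": "audit_export",
}

// LicenseGate returns the feature a permission requires, or "" if free.
func LicenseGate(permission string) string { return gatedPermissions[permission] }

// Mode distinguishes single-rule manual remediation from the licensed modes.
type Mode int

const (
	SingleRule Mode = iota
	Bulk
	Auto
)

// featureFor returns the feature to enforce for a remediation mode
// (the role the source gives license.EnforceFeature), or "" when free.
func featureFor(m Mode) string {
	if m == SingleRule {
		return ""
	}
	return "remediation_execution"
}

func main() {
	for _, p := range []string{"remediation:execute", "remediation:rollback", "audit:export"} {
		fmt.Printf("%s gated on %q\n", p, LicenseGate(p))
	}
	fmt.Printf("bulk requires %q\n", featureFor(Bulk))
}
```

Splitting permission gating from mode gating is what lets the same remediation:execute permission serve a free single-rule fix while bulk and auto runs still hit a license check.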
TestAPI_AdminDisableEnable carried // @ac AC-16/17/18 (structural coverage passed) but ran one t.Run("api-users/AC-16") covering all three — so the outcome gate, which credits per-AC t.Run tokens, only saw AC-16 and left api-users at 89% under the 100% gate. Hoist the shared setup and split into ordered AC-16/AC-17/AC-18 subtests (no behavior change; subtests run in order so enable still observes disable). Same slip-through cause as the AC-07 fix: #600 targeted feat/remediation-governance, so go-ci never ran on it.
This was referenced Jun 19, 2026
remyluslosius added a commit that referenced this pull request on Jun 20, 2026
… 110) (#610)
- CLAUDE.md: Last Updated 2026-06-20; Remediation row -> Complete (#601/#606/#607); scanning-status note -> v0.2.0-rc.11 incl. free-core remediation; spec count 108 -> 110
- BACKLOG.md: drop done rows (Remediation tab, specter 100%-all-tiers, -p 1 -> -p 4)
- scan_remaining_work.md: Phase 7 first-slice shipped banner; remaining = licensed track
- SESSION_LOG.md: 2026-06-20 entry (rc.11 cut, bundle mechanics, gotchas)
Remediation execution (free core): the Fix button applies an approved fix
Completes Phase 7 Tier A. An approved single-rule remediation is applied to the host via a queued worker (Kensa Capture/Apply/Validate/Commit over SSH), with rollback. The requester gets a Fix button on their approved request; an executed one gets Roll back.
Free/paid line moved: per-rule manual execute + rollback are now free core (the requester applies their own approved fix); bulk and auto remediation are the OpenWatch+ boundary (remediation_execution repurposed).
- internal/kensa: apply-enabled SSH transport (ControlChannelSensitive gate; scans stay read-only), kensa.Remediate/Rollback over our credential-resolved transport + SQLite rollback log; executor methods share the per-host inFlight guard.
- internal/worker: RemediationWorker mirrors ScanWorker (a remediation job type, one dispatcher by JobType); on a committed execute it flips that rule to pass in host_rule_state so the score moves.
- :execute/:rollback handlers (free, enqueue → 202/409); openapi corrected to the free-core contract.
- kensa v0.5.1's public pkg/kensa/handlers bundle (auto-registered by DefaultWithTransportFactory) resolves kensa #94.
Live-verified on a real host: an approved cron-d-permissions fix applied /etc/cron.d 755→700 (committed, rule fail→pass, score moved), then rollback restored 755.
Stacked PRs: this is 3 of 3 (base: feat/admin-user-mgmt). Retarget to main as the predecessors merge.