Container auto-fix: red container → configurable agent session, with safeguards#35
Open
falkoro wants to merge 2 commits into
Open
Container auto-fix: red container → configurable agent session, with safeguards#35falkoro wants to merge 2 commits into
falkoro wants to merge 2 commits into
Conversation
A red (unhealthy/restarting/crashed) container gets a Fix button that spawns a tmux session running a configurable agent, seeded with the container's read-only diagnostics, to attempt a safe remediation while the operator watches. - Configurable "fix agents" registry (label + command), seeded with Claude/Grok/Codex, editable in-app via /api/fix-agents. - Per-host `protected` flag → propose-only mode (agent inspects + prints recommended commands, runs nothing). Advisory (prompt-level). - Danger modal: agent picker + required acknowledgement gate before spawn. - Read-only diagnostic (ps/inspect/logs, no mutating verbs); prompt written to a file and passed quoted; destructive-verb ban baked into the agent prompt. - Fix sessions are dynamic sessions: appear as cards, real close via /api/sessions/remove (kill + drop from dashboard). - All write endpoints login+unlock+CSRF gated; GET /api/fix-agents unlock-gated. Security-reviewed: no injection (charset-validated name/engine/target + shell_word escaping holds through tmux→sh -c→zsh -lic). 40 tests pass; UI + spawn/close browser-verified against a live unhealthy container. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
…cord Commit security review caught a protected-host safety-gate bypass: the SSH target was taken from the client request while `protected` was recomputed by re-matching that target — so addressing a protected host via an alternate target string (IP vs hostname, different user@) failed the string match, propose became false, and the agent would run in full mutating mode against a host the operator marked protected. resolve_host() now derives target + protected + label from ONE record: a named host (id/label) is authoritative and the client-sent target is ignored; a free-form target to an unconfigured host fails safe to propose-only; a matching record honours its own flag. Covered by unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR adds a guarded “Fix” workflow for unhealthy/restarting/crashed containers that spawns a tmux-backed “fix agent” session seeded with read-only diagnostics, plus new runtime configuration for fix agents, protected remote hosts (propose-only mode), and persisted dynamic sessions.
Changes:
- Add backend support for
/api/fixand/api/fix-agents(load/save), including prompt-file generation and tmux session spawning. - Introduce persisted “dynamic sessions” and merge them into the session list; add a new “remove” endpoint for sessions.
- Update the UI to show a Fix (🔧) button on “red” containers, a danger-gated modal, a “Fix agents” editor, and a
protectedflag in the remote-host editor.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tmux.rs | Adds dynamic sessions into the session model and introduces create_dynamic_session. |
| src/routes.rs | Adds /api/fix, /api/fix-agents, and /api/sessions/remove; adjusts stop behavior to remove dynamic entries. |
| src/remote_hosts.rs | Persists/normalizes new protected flag for remote hosts and updates tests. |
| src/main.rs | Wires new modules (dynamic_sessions, fix, fix_agents). |
| src/fix.rs | Implements fix-session creation: diagnostics, prompt generation, agent command construction, and dynamic session spawn. |
| src/fix_agents.rs | Implements fix-agent registry load/save with normalization and seeding. |
| src/dynamic_sessions.rs | Implements persistence layer for dynamic sessions (load/save/remove). |
| src/config.rs | Adds config paths for fix agents and dynamic sessions, plus protected on remote hosts. |
| public/metrics.js | Adds Fix button + danger-gated modal + /api/fix call in served JS. |
| public/events.js | Adds “Fix agents” button to the Configure area. |
| public/actions.js | Reworks remote-host editor into row UI with protected; adds fix-agents editor. |
| frontend/metrics.ts | TypeScript source for Fix button + modal + /api/fix call. |
| frontend/events.ts | TypeScript source for “Fix agents” button insertion. |
| frontend/actions.ts | TypeScript source for remote-host editor rows + protected, plus fix-agents editor. |
| .gitignore | Ignores fix-agents.json, dynamic-sessions.json, fix-prompts/, and local Playwright scratch. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+58
to
+63
|
|
||
| // Resolve the SSH target, propose-only flag, and display label from ONE authoritative source. | ||
| // The `protected` (propose-only) gate must be decided by the same record that supplies the | ||
| // target, so a client can't dodge a protected host by sending an alternate target string that | ||
| // fails to string-match the configured record. Returns (target, propose_only, host_label). | ||
| fn resolve_host( |
Comment on lines
+396
to
+400
| pub async fn create_dynamic_session(config: Arc<Config>, spec: KnownSession) -> Result<String, String> { | ||
| dynamic_sessions::save_entry(config.clone(), &spec).await?; | ||
| match launch_spec(&spec, "created").await { | ||
| Ok(message) => Ok(message), | ||
| Err(error) => { |
Comment on lines
+730
to
743
| async fn api_session_remove( | ||
| State(state): State<AppState>, | ||
| headers: HeaderMap, | ||
| connect: ConnectInfo<SocketAddr>, | ||
| axum::Json(body): axum::Json<NameBody>, | ||
| ) -> Response { | ||
| if let Some(response) = guard(&state, &headers, &connect) { | ||
| return response; | ||
| } | ||
| if let Err(response) = require_unlock(&state, &headers).and_then(|_| require_action(&headers)) { | ||
| return response; | ||
| } | ||
| stop_session_result(state.config.clone(), &body.name).await | ||
| } |
Comment on lines
+1030
to
+1044
| async fn stop_session_result(config: std::sync::Arc<config::Config>, name: &str) -> Response { | ||
| match tmux::stop_session(name).await { | ||
| Ok(message) => { | ||
| let _ = dynamic_sessions::remove(config, name).await; | ||
| webutil::json_response( | ||
| StatusCode::OK, | ||
| &serde_json::json!({ "ok": true, "message": message }), | ||
| ) | ||
| } | ||
| Err(error) => webutil::json_response( | ||
| StatusCode::BAD_REQUEST, | ||
| &serde_json::json!({ "error": error }), | ||
| ), | ||
| } | ||
| } |
Comment on lines
+509
to
523
| function readRemoteHostRows(root: HTMLElement): RemoteHostEntry[] { | ||
| return Array.from(root.querySelectorAll<HTMLElement>('[data-remote-host-row]')) | ||
| .map((row) => { | ||
| const value = (field: string): string => row.querySelector<HTMLInputElement>(`[data-host-field="${field}"]`)?.value.trim() || ''; | ||
| const label = value('label'); | ||
| const id = value('id') || idFromLabel(label); | ||
| return { | ||
| id, | ||
| label, | ||
| target: value('target'), | ||
| protected: Boolean(row.querySelector<HTMLInputElement>('[data-host-field="protected"]')?.checked), | ||
| }; | ||
| }) | ||
| .filter((host) => host.id && host.label && host.target); | ||
| } |
Comment on lines
+498
to
512
| function readRemoteHostRows(root) { | ||
| return Array.from(root.querySelectorAll('[data-remote-host-row]')) | ||
| .map((row) => { | ||
| const value = (field) => row.querySelector(`[data-host-field="${field}"]`)?.value.trim() || ''; | ||
| const label = value('label'); | ||
| const id = value('id') || idFromLabel(label); | ||
| return { | ||
| id, | ||
| label, | ||
| target: value('target'), | ||
| protected: Boolean(row.querySelector('[data-host-field="protected"]')?.checked), | ||
| }; | ||
| }) | ||
| .filter((host) => host.id && host.label && host.target); | ||
| } |
Comment on lines
+570
to
+571
| await Promise.all([loadFixAgents(), loadRemoteHostConfig()]); | ||
| const remote = remoteHostConfig.find((item) => item.id === host || item.label === host); |
Comment on lines
+542
to
+543
| await Promise.all([loadFixAgents(), loadRemoteHostConfig()]); | ||
| const remote = remoteHostConfig.find((item) => item.id === host || item.label === host); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
A red container (unhealthy / restarting / crashed) gets a Fix button (🔧). Clicking it opens a danger-gated modal; on confirm, ShellDeck spawns a tmux session running a configurable agent, seeded with the container's read-only diagnostics, to attempt a safe remediation while you watch it live.
{label, command}, seeded with Claude/Grok/Codex, managed in-app (a "Fix agents" editor;GET/POST /api/fix-agents).{prompt}/{promptfile}placeholders are substituted, else the prompt is appended.protectedflag → propose-only. On a protected host the agent is told to inspect read-only and print recommended commands, run nothing. Unprotected hosts attempt a safe fix. (Checkbox in the remote-hosts editor.)ps -a/inspect/logs --tail 200, zero mutating verbs). Prompt is written to a file and passed quoted. A destructive-verb ban (rm/rmi/compose down -v/volume prune/system prune/kill/reboot…) is baked into the agent's instructions, scoped to only the target container.POST /api/sessions/remove(kill + drop).Safety model (read this)
ShellDeck cannot hard-sandbox an interactive agent session on a host — so protection is layered, not absolute:
Two residual risks, both LOW/by-design (from the security review):
Security review
Independent security-reviewer pass: no CRITICAL/HIGH/MED findings. No command-injection path —
name/engine/targetare charset-validated to a metacharacter-free set, the attacker-influenced prompt/diagnostics areshell_wordsingle-quote-escaped (or never inlined on the default branch), and the quoting composes correctly throughtmux → sh -c → zsh -lic. All write endpoints are login+unlock+CSRF-gated;GET /api/fix-agentsis unlock-gated;write_promptcan't escape its dir.This branch adds its own
src/dynamic_sessions.rs(master lacks it). ItsDynamicSessionis{name,label,family,alias,badge,start}, which diverges from #34's{name,label,start,cwd}. Whichever lands second needs a small struct-unification merge. Recommend merging #34 first, then I'll reconcile this on top (onedynamic_sessionsmodule).Verification
fix.rsguards: quoting can't be broken by attacker prompt, diagnostic is read-only, session names stay in charset).(unhealthy)container: Fix button only on red cards; modal agent picker + ack-gates-Confirm + danger banner; fix-agents editor; protected checkbox; end-to-end spawn captured live diagnostics + correct SAFE-FIX instructions, and close killed the tmux session + cleared dynamic-sessions.cargo fmtchurn; compact style preserved.Note for the productization track
The spawn shell is hardcoded
zsh(fine for this box). For the open-source/enterprise build it should honor$SHELL/a configured default-command — follow-up, not a blocker.🤖 Generated with Claude Code