fix: detached docker session stays executing while container is running #137
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #136
A detached `--isolated docker` session was recorded with a terminal status (`executed`) and `exitCode -1` while its container was still running (or not yet visible on a slow Docker-in-Docker host).

Root cause: `isDetachedSessionAlive`/`is_detached_session_alive` treated a failed `docker inspect` as "not alive" (false), so `enrichDetachedStatus`/`enrich_detached_status` marked the session `executed` with the `-1` sentinel.

Fix (JS + Rust parity):
- Distinguish "unknown" (inspect failed -> null/None) from "stopped" (inspect succeeded, Running=false). Unknown liveness keeps the session `executing` instead of fabricating a terminal result.
- When a container has genuinely stopped, resolve the real exit code from the log footer, then `docker inspect .State.ExitCode`, only falling back to `-1` as a last resort.

Adds JS + Rust tests guarded on docker availability, plus changelog fragments.

Closes #136
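The "unknown" versus "stopped" distinction from this commit message can be sketched in a few lines of JavaScript. This is only an illustration, not the repository's code: the helper names `livenessFromInspect` and `statusFromLiveness` and the `{ status, stdout }` input shape are assumptions made for the example.

```javascript
// Hypothetical helper (not the project's actual function): maps the result of
// `docker inspect -f '{{.State.Running}}' <name>` to a tri-state liveness value.
// `inspect` is assumed to look like { status: <process exit status>, stdout: <string> }.
function livenessFromInspect(inspect) {
  const out = (inspect.stdout || '').trim();
  // Inspect failed or printed nothing: the container may just not be visible yet.
  if (inspect.status !== 0 || out === '') return null;
  if (out === 'true') return true;
  if (out === 'false') return false;
  // Unexpected output is treated as unknown rather than as stopped.
  return null;
}

// Only a definite `false` may move the session to a terminal status;
// `true` and `null` both keep it `executing`.
function statusFromLiveness(liveness) {
  return liveness === false ? 'executed' : 'executing';
}
```

Before the fix, the failed-inspect case collapsed into `false`, which is what allowed a session to be marked `executed` while its container was still starting.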
… CI)

The stopped-container test fabricated a container with `docker run alpine`, but on Windows runners in Windows-containers mode the Linux image never materializes (`no matching manifest for windows`). `docker inspect` then fails and liveness is null (unknown), not false, so the assertion broke.

- Probe with the repo's `canRunLinuxDockerImages()` instead of `docker info`, so the suite skips when Linux images can't run.
- Skip the stopped-container test unless the container actually exists per `docker inspect`.
Working session summary

The working tree is clean (ci-logs is gitignored or already absent). Everything is complete. Let me give a final summary.

The work on issue #136 is complete and verified.

Summary

Issue #136: A detached

Root cause:

Fix (JS + Rust parity):

Tests: 4 new JS tests + 4 new Rust tests, each guarded on docker/Linux-image availability. A follow-up commit fixed a Windows-CI flake by probing with the repo's

Verification:

PR #137 is updated (title de-WIP'd, full description), marked ready for review, and

This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

Total: (6.2K new + 153.6K cache writes + 7.0M cache reads) input tokens, 51.7K output tokens, $5.802376 cost

🤖 Models used:

📎 Log file uploaded as Gist (2980KB)

Now working session is ended, feel free to review and add any feedback on the solution draft.
✅ Ready to merge

This pull request is now ready to be merged:

Monitored by hive-mind with `--auto-restart-until-mergeable` flag
This reverts commit 14a4a48.
Summary

Fixes #136. A detached `--isolated docker` session was recorded with a terminal status (`executed`) and `exitCode -1` while its container was still running, or not even created yet on a slow Docker-in-Docker host. This made `$ --list` / `$ --status` report the session as finished/failed when it was actually still in progress, breaking orchestrators that poll status.

Root cause

`isDetachedSessionAlive` (JS) / `is_detached_session_alive` (Rust) ran `docker inspect -f '{{.State.Running}}' <name>` and treated any failure (non-zero exit / empty output) as "not alive" (`false`). Right after `docker run -d` returns, the container can be transiently invisible to `docker inspect` on a slow DinD host. A `false` liveness result drove `enrichDetachedStatus` / `enrich_detached_status` to flip the record to `executed` and write the `-1` sentinel exit code.

Fix (JS + Rust parity)

- Distinguish "unknown" (inspect failed -> `null`/`None`) from "stopped" (inspect succeeded, `Running=false`). When liveness is unknown, the detached session stays `executing` instead of fabricating a terminal `-1` result.
- `docker inspect` reads `{{.State.Running}}` and `{{.State.ExitCode}}` in one call. When a container has genuinely stopped, the real exit code is resolved from the log footer first, then from `docker inspect .State.ExitCode`, only falling back to `-1` as a last resort when no real code is available.
- A `--rm` container that already wrote an `Exit Code:` log footer still resolves to its real terminal status (footer is authoritative when liveness is unknown).

How to reproduce

1. Run `$ --isolated docker --detached -- sleep 30`.
2. Run `$ --status <session>` (or `$ --list`) while the container is still spinning up / running.
3. Before: status `executed`, exit code `-1`. After: status `executing` until the container actually exits, then the real exit code.

Tests

- JS (`js/test/session-name-status.js`): added `Issue #136: detached docker session liveness`: unknown container → `null`; not-visible container keeps `executing`; running container stays `executing`; stopped container resolves the real exit code. Guarded on `docker` availability.
- Rust (`rust/tests/session_name_status.rs`): added 4 parity tests, guarded on docker availability.
- `eslint`, `prettier --check`, `cargo fmt --check`, and `cargo clippy` are clean.

Changelog

- `js/.changeset/detached-docker-running-status.md` (start-command: patch)
- `rust/changelog.d/136.md` (bump: patch)
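The exit-code resolution order described in the fix (log footer first, then the exit code from `docker inspect`, then `-1` as a last resort) might look roughly like the sketch below. It is a hedged illustration rather than the merged implementation: the names `exitCodeFromFooter` and `resolveDetached`, the input object, and the exact footer pattern (`Exit Code: <n>` on its own line) are all assumptions.

```javascript
// Assumed footer format: a log line such as "Exit Code: 3".
// The real footer layout may differ; this pattern is a guess for illustration.
const FOOTER_RE = /^Exit Code:\s*(-?\d+)\s*$/m;

// Returns the exit code recorded in the log footer, or null if none is found.
function exitCodeFromFooter(logText) {
  const match = FOOTER_RE.exec(logText || '');
  return match ? Number(match[1]) : null;
}

// Hypothetical resolver. `liveness` is true (running), false (stopped),
// or null (inspect failed, so the state is unknown).
function resolveDetached({ liveness, logText, inspectExitCode }) {
  const footer = exitCodeFromFooter(logText);
  if (liveness === true) {
    return { status: 'executing', exitCode: null };
  }
  if (liveness === null) {
    // Unknown liveness: only a footer already written to the log (for example
    // by a `--rm` container that has since vanished) proves the run finished.
    return footer !== null
      ? { status: 'executed', exitCode: footer }
      : { status: 'executing', exitCode: null };
  }
  // Definitely stopped: footer first, then inspect's exit code, then -1.
  const fromInspect = Number.isInteger(inspectExitCode) ? inspectExitCode : -1;
  return { status: 'executed', exitCode: footer ?? fromInspect };
}
```

Keeping the `-1` sentinel confined to the last branch means it can only appear for a container that is known to have stopped and left no exit code anywhere.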