Describe the bug
When the runner receives a new job while a previous worker process is still running, it cancels the old worker and immediately spawns a new one. Both worker processes share the same _temp directory (orgs_<org>_work/_temp). The cancelled worker's TempDirectoryManager cleanup runs after the new worker has already created its _runner_file_commands pipes in that shared directory, deleting them out from under the active job. This causes the new job to fail with:
Missing file at path: .../_temp/_runner_file_commands/set_output_<uuid>
The root cause is that JobDispatcher spawns the new Worker process immediately upon receiving the new job request — it does not wait for the previous Worker process to fully exit and complete its TempDirectoryManager cleanup. This creates a window (17 seconds in our case) where two Worker PIDs are alive and operating on the same _temp directory.
To Reproduce
This is a race condition that requires two jobs to be dispatched to the same non-ephemeral self-hosted runner in quick succession. The exact sequence:
- Runner is executing Job A (a long-running job, e.g. integration tests)
- GitHub dispatches Job B to the same runner while Job A is still actively running and being renewed
- Runner acknowledges Job B, logs "We are not yet checking the state of jobrequest <Job A ID>... Cancel running worker right away."
- Runner sends cancellation to Job A's Worker and immediately spawns Job B's Worker — both PIDs are now alive
- Job B's Worker initializes, creates _runner_file_commands/set_output_<uuid> and step_summary_<uuid> files in the shared _temp directory
- Job B begins executing its first action step (e.g. actions/checkout@v6)
- Job A's Worker finishes its cancellation teardown and calls TempDirectoryManager: Cleaning runner temp folder: <shared _temp path>; this deletes the entire _temp directory contents, including Job B's active file command pipes
- Job B's action step fails because its set_output and step_summary files no longer exist
- Job B exits with code 102 (job result: Failed)
In our case, the gap between Job B starting (11:55:19Z) and Job A's cleanup running (11:55:36Z) was 17 seconds — plenty of time for Job B to have created and started using the file command pipes.
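The sequence above can be reduced to a small simulation. This is only a sketch of the ordering problem, not the runner's code: init_file_commands, clean_temp, and simulate_overlap are hypothetical stand-ins, and we assume the cleanup removes everything under _temp, as the TempDirectoryManager log line suggests.

```python
import shutil
import tempfile
from pathlib import Path


def init_file_commands(temp_dir: Path, job_uuid: str) -> Path:
    """Stand-in for a Worker creating its file-command file in the shared _temp."""
    commands_dir = temp_dir / "_runner_file_commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    set_output = commands_dir / f"set_output_{job_uuid}"
    set_output.touch()
    return set_output


def clean_temp(temp_dir: Path) -> None:
    """Stand-in for a cleanup that removes everything under _temp (assumed behavior)."""
    for entry in temp_dir.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def simulate_overlap() -> bool:
    """Return True if Job B's file survives Job A's late cleanup."""
    with tempfile.TemporaryDirectory() as work:
        shared_temp = Path(work) / "_temp"
        shared_temp.mkdir()
        # Job A has been cancelled but has not reached its cleanup yet.
        # Job B's Worker starts and creates its file-command file.
        job_b_file = init_file_commands(shared_temp, "job-b")
        # Job A's teardown finishes and wipes the shared directory.
        clean_temp(shared_temp)
        return job_b_file.exists()


if __name__ == "__main__":
    print("Job B file survived:", simulate_overlap())  # prints: Job B file survived: False
```

The only thing that matters here is the order of the two calls: as long as the old Worker's cleanup can run after the new Worker's initialization, the new job loses its files.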
Expected behavior
The runner should ensure the previous Worker process has fully exited (including TempDirectoryManager cleanup) before spawning a new Worker process that uses the same _temp directory. Alternatively, each Worker should use an isolated temp directory scoped to its job ID rather than sharing a single _temp path.
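As a rough illustration of the second option, here is a sketch of a job-scoped temp layout. The _temp/<job-id> layout and the helpers job_temp_dir, clean_job_temp, and demo are assumptions made for this example; the runner does not necessarily expose anything like them.

```python
import shutil
import tempfile
from pathlib import Path


def job_temp_dir(work_root: Path, job_id: str) -> Path:
    """Create and return a temp directory private to one job (hypothetical layout)."""
    path = work_root / "_temp" / job_id
    (path / "_runner_file_commands").mkdir(parents=True, exist_ok=True)
    return path


def clean_job_temp(work_root: Path, job_id: str) -> None:
    """Remove only the given job's temp directory, leaving other jobs untouched."""
    shutil.rmtree(work_root / "_temp" / job_id, ignore_errors=True)


def demo() -> bool:
    """Return True if Job B's file survives Job A's late cleanup."""
    with tempfile.TemporaryDirectory() as work:
        root = Path(work)
        job_temp_dir(root, "job-a")
        b_dir = job_temp_dir(root, "job-b")
        b_file = b_dir / "_runner_file_commands" / "set_output_example"
        b_file.touch()
        # Job A's cleanup runs late, after Job B has already started.
        clean_job_temp(root, "job-a")
        return b_file.exists()


if __name__ == "__main__":
    print("Job B file survived:", demo())  # prints: Job B file survived: True
```

With this layout a late cleanup from the cancelled Worker can only touch its own subdirectory, so the overlap window becomes harmless even if the dispatch logic stays as it is.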
Runner Version and Platform
- Runner version: 2.333.1 (latest as of 2026-04-20)
- OS: Ubuntu 22.04 LTS (running as an LXC VM on a self-hosted node)
- Architecture: x86_64
- Runner mode: Non-ephemeral, organization-level self-hosted runner
What's not working?
When two Worker processes overlap on the same runner, the exiting Worker's TempDirectoryManager cleanup deletes the _runner_file_commands directory that the new Worker is actively using, causing the new job to fail with:
Error: Missing file at path: /home/ubuntu/actions-runner/orgs_miniohq_work/_temp/_runner_file_commands/set_output_b0988204-d5c8-4571-861f-7028374312ee
The new job (Job B) exits with code 102. The previous job (Job A) also fails to report completion, receiving HTTP 404 / TaskOrchestrationJobNotFoundException from the run service.
Job Log Output
Job B (the victim job) log output during the checkout step:
2026-04-20T11:55:22.8993364Z ##[group]Run actions/checkout@v6
2026-04-20T11:55:23.0693218Z Syncing repository: miniohq/eos
2026-04-20T11:55:23.0696859Z ##[group]Getting Git version info
2026-04-20T11:55:23.0698918Z Working directory is '/home/ubuntu/actions-runner/orgs_miniohq_work/eos/eos'
2026-04-20T11:55:23.0701302Z [command]/usr/bin/git version
2026-04-20T11:55:23.0702235Z git version 2.43.0
...
(checkout proceeds normally for ~22 seconds, then fails)
...
Error: Missing file at path: /home/ubuntu/actions-runner/orgs_miniohq_work/_temp/_runner_file_commands/set_output_b0988204-d5c8-4571-861f-7028374312ee
Runner and Worker's Diagnostic Logs
Runner Log — Job dispatch overlap (Runner_20260410-225245-utc.log)
Shows Job A (42b445e6) actively renewing, then Job B (b665e3b4) arriving and the runner immediately spawning a new Worker without waiting for the old one to exit:
[2026-04-20 11:55:17Z INFO JobDispatcher] Successfully renew job 42b445e6-82dd-5f8b-a498-a9859d5322d2, job is valid till 4/20/2026 12:04:36 PM
[2026-04-20 11:55:17Z INFO BrokerMessageListener] Acknowledging runner request 'b665e3b4-2377-5230-9563-f043505754b8'.
[2026-04-20 11:55:19Z INFO JobDispatcher] Job request 0 for plan 93b2502b-9a4c-460a-8f49-4ae31685f3a7 job b665e3b4-2377-5230-9563-f043505754b8 received.
[2026-04-20 11:55:19Z ERR JobDispatcher] We are not yet checking the state of jobrequest 42b445e6-82dd-5f8b-a498-a9859d5322d2 status. Cancel running worker right away.
[2026-04-20 11:55:19Z INFO JobDispatcher] Send job cancellation message to worker for job 42b445e6-82dd-5f8b-a498-a9859d5322d2.
[2026-04-20 11:55:19Z INFO ProcessInvokerWrapper] Starting process:
[2026-04-20 11:55:19Z INFO ProcessInvokerWrapper] File name: '/home/ubuntu/actions-runner/bin.2.333.1/Runner.Worker'
[2026-04-20 11:55:19Z INFO ProcessInvokerWrapper] Arguments: 'spawnclient 160 164'
[2026-04-20 11:55:19Z INFO ProcessInvokerWrapper] Process started with process id 1449281, waiting for process exit.
[2026-04-20 11:55:19Z INFO JobDispatcher] Send job request message to worker for job b665e3b4-2377-5230-9563-f043505754b8.
At this point, PID 1431550 (Job A) and PID 1449281 (Job B) are both running simultaneously.
Worker Log — Job A's cleanup wipes shared _temp (Worker_20260420-114836-utc.log)
Job A receives cancellation, tears down, then runs TempDirectoryManager at 11:55:36Z — 17 seconds after Job B's Worker started:
[2026-04-20 11:55:19Z INFO Worker] Cancellation/Shutdown message received.
[2026-04-20 11:55:19Z INFO ProcessInvokerWrapper] Waiting for process exit or 7.5 seconds after SIGINT signal fired.
[2026-04-20 11:55:26Z INFO ProcessInvokerWrapper] Waiting for process exit or 2.5 seconds after SIGTERM signal fired.
[2026-04-20 11:55:31Z INFO ProcessInvokerWrapper] Process Cancellation finished.
[2026-04-20 11:55:36Z INFO TempDirectoryManager] Cleaning runner temp folder: /home/ubuntu/actions-runner/orgs_miniohq_work/_temp
[2026-04-20 11:55:36Z INFO JobRunner] Raising job completed against run service
[2026-04-20 11:55:36Z ERR GitHubActionsService] POST request to https://run-actions-1-azure-eastus.actions.githubusercontent.com/176/completejob failed. HTTP Status: NotFound
[2026-04-20 11:55:36Z ERR JobRunner] GitHub.DistributedTask.WebApi.TaskOrchestrationJobNotFoundException: Job not found: 42b445e6-82dd-5f8b-a498-a9859d5322d2. workflow instance not found
Worker Log — Job B fails because its files were deleted (Worker_20260420-115519-utc.log)
Job B initialized _temp at 11:55:20Z, started checkout at 11:55:22Z, but its file command pipes were wiped at 11:55:36Z by Job A's cleanup:
[2026-04-20 11:55:20Z INFO HostContext] Well known directory 'Temp': '/home/ubuntu/actions-runner/orgs_miniohq_work/_temp'
[2026-04-20 11:55:22Z INFO ProcessInvokerWrapper] Starting process:
[2026-04-20 11:55:22Z INFO ProcessInvokerWrapper] File name: '/home/ubuntu/actions-runner/externals/node24/bin/node'
[2026-04-20 11:55:22Z INFO ProcessInvokerWrapper] Arguments: '"/home/ubuntu/actions-runner/orgs_miniohq_work/_actions/actions/checkout/v6/dist/index.js"'
[2026-04-20 11:55:22Z INFO ProcessInvokerWrapper] Process started with process id 1449368, waiting for process exit.
[2026-04-20 11:55:45Z INFO ProcessInvokerWrapper] Finished process 1449368 with exit code 1, and elapsed time 00:00:22.9692284.
[2026-04-20 11:55:45Z INFO CreateStepSummaryCommand] Step Summary file (/home/ubuntu/actions-runner/orgs_miniohq_work/_temp/_runner_file_commands/step_summary_b0988204-d5c8-4571-861f-7028374312ee) does not exist; skipping attachment upload
[2026-04-20 11:55:45Z INFO ExecutionContext] errorMessages: ["Missing file at path: /home/ubuntu/actions-runner/orgs_miniohq_work/_temp/_runner_file_commands/set_output_b0988204-d5c8-4571-861f-7028374312ee"]
[2026-04-20 11:55:47Z INFO JobRunner] Job result after all job steps finish: Failed
[2026-04-20 11:55:49Z INFO TempDirectoryManager] Cleaning runner temp folder: /home/ubuntu/actions-runner/orgs_miniohq_work/_temp
[2026-04-20 11:55:49Z INFO Worker] Job completed.
Runner reports: Worker finished for job b665e3b4... Code: 102
Timeline Summary
| Time (UTC) | Event |
|---|---|
| 11:48:36 | Job A (PID 1431550) starts: run-tables-tests (spark) |
| 11:55:17 | Job A still renewing successfully (valid till 12:04:36) |
| 11:55:17 | Job B acknowledged by runner while Job A is active |
| 11:55:19 | Runner: "Cancel running worker right away"; sends cancel to Job A |
| 11:55:19 | Job B (PID 1449281) spawned immediately; two PIDs now alive |
| 11:55:20 | Job B initializes, uses shared _temp directory |
| 11:55:22 | Job B creates set_output_b0988204... and starts checkout |
| 11:55:36 | Job A runs TempDirectoryManager; wipes shared _temp including Job B's files |
| 11:55:45 | Job B checkout fails: Missing file at path: .../set_output_b0988204... |
| 11:55:49 | Job B exits code 102 (Failed) |
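For reference, the intervals quoted in this report can be recomputed from the log timestamps. The helper seconds_between is just a convenience written for this example; the timestamps are copied from the logs above.

```python
from datetime import datetime

# Timestamp layout used in the Runner and Worker diagnostic logs.
FMT = "%Y-%m-%d %H:%M:%SZ"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


job_b_spawned = "2026-04-20 11:55:19Z"    # Runner log: Job B's Worker process started
job_a_cleanup = "2026-04-20 11:55:36Z"    # Job A Worker log: TempDirectoryManager cleanup
job_b_step_done = "2026-04-20 11:55:45Z"  # Job B Worker log: checkout process exits with code 1

if __name__ == "__main__":
    print(seconds_between(job_b_spawned, job_a_cleanup))    # prints 17
    print(seconds_between(job_a_cleanup, job_b_step_done))  # prints 9
```

So both Workers were alive for 17 seconds before the wipe, and the checkout step kept running for another 9 seconds before it ended and the missing file was reported.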
Suggested Fix
Either:
- JobDispatcher should await the previous Worker process exit before spawning the new Worker, OR
- Each Worker should use a job-scoped temp directory (e.g. _temp/<job-id>/) instead of sharing a single _temp path, OR
- TempDirectoryManager should check whether another Worker is active before cleaning _temp
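To make the first option concrete, here is a minimal sketch of a dispatcher that waits for the previous worker to exit before starting the next one. SerialDispatcher and its exit_timeout parameter are hypothetical and only illustrate the ordering; the real JobDispatcher is C# and signals its Worker differently.

```python
import subprocess
import sys
from typing import Optional, Sequence


class SerialDispatcher:
    """Hypothetical dispatcher that never lets two workers overlap."""

    def __init__(self, exit_timeout: float = 30.0) -> None:
        self._worker: Optional[subprocess.Popen] = None
        self._exit_timeout = exit_timeout

    def dispatch(self, worker_cmd: Sequence[str]) -> subprocess.Popen:
        previous = self._worker
        if previous is not None and previous.poll() is None:
            # Ask the old worker to stop, then block until it has really exited,
            # so any teardown it performs finishes before the new worker starts.
            previous.terminate()
            try:
                previous.wait(timeout=self._exit_timeout)
            except subprocess.TimeoutExpired:
                # Last resort: force the old worker down rather than overlap.
                previous.kill()
                previous.wait()
        self._worker = subprocess.Popen(list(worker_cmd))
        return self._worker


if __name__ == "__main__":
    dispatcher = SerialDispatcher(exit_timeout=5.0)
    job_a = dispatcher.dispatch([sys.executable, "-c", "import time; time.sleep(60)"])
    job_b = dispatcher.dispatch([sys.executable, "-c", "pass"])
    # Job A has already exited by the time Job B was spawned.
    print("Job A exited before Job B started:", job_a.poll() is not None)
    job_b.wait()
```

The trade-off is that the new job's start is delayed by however long the old worker's teardown takes (about 17 seconds in our logs), which seems acceptable compared to failing the new job outright.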