
Commit 4972708

Ensure read write many first
1 parent 6ecda1d commit 4972708

11 files changed

Lines changed: 891 additions & 650 deletions
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# ADR 0135: RWX volume strategy and RWO affinity fallback

**Date:** 22 April 2026

**Status:** Accepted

## Context

The Kubernetes hook implementation for GitHub Actions runners requires access to the runner's working directory (`_work`) within the dynamically created job pods. This shared access is typically managed via Persistent Volume Claims (PVCs).

Regardless of the storage strategy, job pods are always constrained to run on the same node as the runner pod to ensure consistent access to the local environment and state. The choice of volume access mode therefore determines operational flexibility and multi-pod access capability rather than pod placement.

Depending on the storage provider and cluster configuration, operators may choose between `ReadWriteMany` (RWX) or `ReadWriteOnce` (RWO) access modes. RWX is preferred because it allows multiple pods to access the volume simultaneously, providing greater operational flexibility for future scaling or monitoring scenarios. RWO restricts the volume to being mounted by a single node at a time, effectively locking access to the pods on that node.

## Decision

We have decided to establish `ReadWriteMany` (RWX) as the preferred storage strategy for the Kubernetes hook. While job pods remain pinned to the runner's node, RWX provides superior operational flexibility by allowing multiple pods (such as sidecars or auxiliary tools) to access the same volume without storage-imposed locking constraints.

For environments where RWX is unavailable or undesirable, we support a `ReadWriteOnce` (RWO) fallback strategy. This fallback is implemented using node affinity to ensure that job pods are scheduled onto the same node as the runner pod that holds the RWO volume.
### Operational Guidance

1. **Preferred Model (RWX):** Operators should configure the runner with a PVC supporting `ReadWriteMany`.
2. **Fallback Model (RWO):** If using `ReadWriteOnce`, operators must enable the Kubernetes scheduler integration by setting `ACTIONS_RUNNER_USE_KUBE_SCHEDULER=true`.
3. **Node Selection:** When scheduler integration is enabled, the hook applies a `requiredDuringSchedulingIgnoredDuringExecution` node affinity targeting the runner's current node (`kubernetes.io/hostname`); see the sketch after this list.
4. **Implementation Details:**
   - The hook determines the node name via `getCurrentNodeName()` and applies affinity in `packages/k8s/src/k8s/index.ts` (lines 101, 165).
   - The scheduler behavior is toggled by the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` environment variable, as defined in `packages/k8s/src/k8s/utils.ts` (line 16).
   - The PVC claim name defaults to `${ACTIONS_RUNNER_POD_NAME}-work` unless overridden by `ACTIONS_RUNNER_CLAIM_NAME` (`packages/k8s/src/hooks/constants.ts`, lines 27-33).
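To make the affinity shape concrete, the following TypeScript sketch builds the structure described in item 3 using the `@kubernetes/client-node` types the hook already depends on. The helper name `buildRunnerNodeAffinity` and the placeholder node name are illustrative; the actual wiring lives in `packages/k8s/src/k8s/index.ts` and is not reproduced here.

```typescript
import * as k8s from '@kubernetes/client-node'

// Illustrative helper (not the hook's actual code): pin a job pod to the
// runner's node with a required node affinity on kubernetes.io/hostname.
function buildRunnerNodeAffinity(runnerNodeName: string): k8s.V1Affinity {
  return {
    nodeAffinity: {
      requiredDuringSchedulingIgnoredDuringExecution: {
        nodeSelectorTerms: [
          {
            matchExpressions: [
              {
                key: 'kubernetes.io/hostname',
                operator: 'In',
                values: [runnerNodeName]
              }
            ]
          }
        ]
      }
    }
  }
}

// Usage sketch: attach the affinity to a job pod spec before creating the pod.
export const exampleJobPodSpec: k8s.V1PodSpec = {
  containers: [],
  affinity: buildRunnerNodeAffinity('worker-node-1') // placeholder node name
}
```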
### Non-Recommendations

We explicitly do **not** recommend the use of `spec.nodeName` for operator-driven scheduling. While the hook uses `nodeName` as a legacy fallback when `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is not set to `true` (`packages/k8s/src/k8s/index.ts`, lines 103, 167), this bypasses the Kubernetes scheduler and can lead to scheduling failures or resource imbalances. Operators should always prefer the affinity-based approach for RWO volumes.

## Alternatives

- **nodeName Bypass:** Directly setting `nodeName` bypasses the scheduler entirely. This was rejected as a recommendation because it prevents the scheduler from accounting for taints, tolerations, and resource pressure.
- **Local Volumes:** Using local volumes tied to specific nodes. This is a subset of the RWO fallback and is supported via the affinity mechanism.
## Consequences

- **Flexibility:** RWX users benefit from the ability to have multiple pods access the volume simultaneously, simplifying future operational extensions.
- **Node Coupling:** All users are coupled to the node where the runner pod is running. The hook ensures job pods are scheduled on the same node to maintain workspace integrity.
- **Configuration:** Operators must be aware of the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` toggle when moving from RWX to RWO. This toggle controls whether the hook uses `nodeName` (bypassing the scheduler) or node affinity (using the scheduler) to pin the pod to the runner's node.
## Migration Guidance

Operators migrating from an RWO setup that relied on the default `nodeName` behavior to the more robust affinity-based setup should:

1. Ensure the runner pod has the `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` environment variable set to `true`.
2. Verify that the runner's ServiceAccount has the necessary permissions to list pods, so the hook can determine its own node (see the sketch after this list).
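For illustration, here is a minimal sketch of how a hook can discover its own node, which is why the pod-read permission matters. The real `getCurrentNodeName()` in `packages/k8s/src/k8s/index.ts` may differ in detail; the function name `getRunnerNodeNameSketch`, the namespace lookup via the conventional ServiceAccount file, and the pre-1.0 `@kubernetes/client-node` call style are assumptions for this sketch.

```typescript
import * as fs from 'fs'
import * as k8s from '@kubernetes/client-node'

// Illustrative sketch only: read the runner's own pod and return spec.nodeName.
// This is why the runner's ServiceAccount needs permission to read/list pods.
async function getRunnerNodeNameSketch(): Promise<string> {
  const kc = new k8s.KubeConfig()
  kc.loadFromDefault() // picks up the in-cluster config inside the runner pod

  const api = kc.makeApiClient(k8s.CoreV1Api)
  const podName = process.env.ACTIONS_RUNNER_POD_NAME as string
  // Conventional in-cluster namespace file; adjust if your setup differs.
  const namespace = fs
    .readFileSync('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'utf8')
    .trim()

  const { body } = await api.readNamespacedPod(podName, namespace)
  if (!body.spec?.nodeName) {
    throw new Error(`pod ${podName} has not been scheduled to a node yet`)
  }
  return body.spec.nodeName
}

export default getRunnerNodeNameSketch
```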
## Non-Goals

- This ADR does not recommend `nodeName` as a primary or secondary configuration path for operators.
- This ADR does not dictate specific storage providers (e.g., EBS vs. EFS vs. Azure Files), but rather the access mode strategy.

packages/k8s/README.md

Lines changed: 21 additions & 0 deletions
@@ -30,6 +30,27 @@ rules:
- The `ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER` env should be set to true to prevent the runner from running any jobs outside of a container
- The runner pod should map a persistent volume claim into the `_work` directory
- The `ACTIONS_RUNNER_CLAIM_NAME` env should be set to the persistent volume claim that contains the runner's working directory, otherwise it defaults to `${ACTIONS_RUNNER_POD_NAME}-work`
- The `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` env can be set to `true` to enable the Kubernetes scheduler for job pods. When set to `true`, the hook uses `nodeAffinity` to ensure job pods are scheduled correctly (essential for `ReadWriteOnce` volumes). If not set, the hook defaults to a legacy mode where job pods are pinned to the same node as the runner pod using `nodeName`. See the sketch after this list.
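To illustrate how the two environment variables above interact, here is a hedged TypeScript sketch of the claim-name defaulting and the placement toggle. The names `applyNodePinning` and `workVolume`, and the volume name `'work'`, are illustrative; the real logic lives in `packages/k8s/src/hooks/constants.ts`, `packages/k8s/src/k8s/utils.ts`, and `packages/k8s/src/k8s/index.ts`.

```typescript
import * as k8s from '@kubernetes/client-node'

// Claim name resolution as described above: explicit override, otherwise
// "<runner pod name>-work".
const claimName =
  process.env.ACTIONS_RUNNER_CLAIM_NAME ||
  `${process.env.ACTIONS_RUNNER_POD_NAME}-work`

// The workspace volume that runner and job pods mount, backed by the PVC.
// The volume name 'work' is a placeholder for this sketch.
export const workVolume: k8s.V1Volume = {
  name: 'work',
  persistentVolumeClaim: { claimName }
}

// Placement toggle sketch (not the hook's exact code): with the scheduler
// enabled the job pod gets a required node affinity; otherwise the legacy
// behavior pins it directly via nodeName.
export function applyNodePinning(
  podSpec: k8s.V1PodSpec,
  runnerNodeName: string
): void {
  if (process.env.ACTIONS_RUNNER_USE_KUBE_SCHEDULER === 'true') {
    podSpec.affinity = {
      nodeAffinity: {
        requiredDuringSchedulingIgnoredDuringExecution: {
          nodeSelectorTerms: [
            {
              matchExpressions: [
                {
                  key: 'kubernetes.io/hostname',
                  operator: 'In',
                  values: [runnerNodeName]
                }
              ]
            }
          ]
        }
      }
    }
  } else {
    // Legacy pinning: bypasses the Kubernetes scheduler entirely.
    podSpec.nodeName = runnerNodeName
  }
}
```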
## Storage Guidance

The K8s hooks require a volume shared between the runner pod and the job pods to expose the workspace and other internal directories.

### RWX (Recommended)

The preferred way to configure storage is using a `ReadWriteMany` (RWX) Persistent Volume Claim. While job pods are always pinned to the runner's node, RWX provides better operational flexibility by allowing multiple pods to access the same workspace simultaneously.

To migrate from RWO to RWX:

1. Provision a new `ReadWriteMany` StorageClass if one is not available.
2. Update your PVC definition to use `accessModes: [ReadWriteMany]` (see the sketch after this list).
3. Set `ACTIONS_RUNNER_USE_KUBE_SCHEDULER=true` to enable the scheduler-based node pinning (via affinity) instead of the default `nodeName` pinning.
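As a rough illustration of step 2, this is what an RWX-capable claim looks like when expressed with the `@kubernetes/client-node` types; the StorageClass name and size are placeholders you would replace with values from your cluster, and the claim name simply follows the default described above. Most operators will apply an equivalent YAML manifest with `kubectl` instead of creating the object programmatically.

```typescript
import * as k8s from '@kubernetes/client-node'

// Sketch of an RWX workspace claim. 'my-rwx-storageclass' and '10Gi' are
// placeholders, not values required by the hooks.
export const workClaim: k8s.V1PersistentVolumeClaim = {
  metadata: {
    name: `${process.env.ACTIONS_RUNNER_POD_NAME}-work` // default claim name
  },
  spec: {
    accessModes: ['ReadWriteMany'],
    storageClassName: 'my-rwx-storageclass',
    resources: {
      requests: { storage: '10Gi' }
    }
  }
}
```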
### RWO Fallback (Affinity-based)

If `ReadWriteMany` storage is not available, you can use `ReadWriteOnce` (RWO) storage. In this mode, all job pods must be scheduled on the same node as the runner pod that owns the PVC.

To enable this safely:

1. Ensure `ACTIONS_RUNNER_USE_KUBE_SCHEDULER` is set to `true`.
2. The hooks will automatically add a `nodeAffinity` to the job pods, ensuring they are scheduled on the same node as the runner pod (`kubernetes.io/hostname` match).

> **Note:** We do not recommend manually setting `nodeName` in the pod template, as the hooks handle node placement automatically via affinity when the scheduler is enabled.
- Some actions runner env's are expected to be set. These are set automatically by the runner.
- `RUNNER_WORKSPACE` is expected to be set to the workspace of the runner
- `GITHUB_WORKSPACE` is expected to be set to the workspace of the job

packages/k8s/src/hooks/prepare-job.ts

Lines changed: 29 additions & 48 deletions
@@ -1,4 +1,5 @@
 import * as core from '@actions/core'
+import * as io from '@actions/io'
 import * as k8s from '@kubernetes/client-node'
 import {
   JobContainerInfo,
@@ -7,33 +8,26 @@ import {
   writeToResponseFile,
   ServiceContainerInfo
 } from 'hooklib'
+import path from 'path'
 import {
   containerPorts,
-  createJobPod,
+  createPod,
   isPodContainerAlpine,
   prunePods,
   waitForPodPhases,
-  getPrepareJobTimeoutSeconds,
-  execCpToPod,
-  execPodStep
+  getPrepareJobTimeoutSeconds
 } from '../k8s'
 import {
-  CONTAINER_VOLUMES,
+  containerVolumes,
   DEFAULT_CONTAINER_ENTRY_POINT,
   DEFAULT_CONTAINER_ENTRY_POINT_ARGS,
   generateContainerName,
   mergeContainerWithOptions,
   readExtensionFromFile,
   PodPhase,
-  fixArgs,
-  prepareJobScript
+  fixArgs
 } from '../k8s/utils'
-import {
-  CONTAINER_EXTENSION_PREFIX,
-  getJobPodName,
-  JOB_CONTAINER_NAME
-} from './constants'
-import { dirname } from 'path'
+import { CONTAINER_EXTENSION_PREFIX, JOB_CONTAINER_NAME } from './constants'

 export async function prepareJob(
   args: PrepareJobArgs,
@@ -46,6 +40,7 @@ export async function prepareJob(
   await prunePods()

   const extension = readExtensionFromFile()
+  await copyExternalsToRoot()

   let container: k8s.V1Container | undefined = undefined
   if (args.container?.image) {
@@ -75,8 +70,7 @@ export async function prepareJob(

   let createdPod: k8s.V1Pod | undefined = undefined
   try {
-    createdPod = await createJobPod(
-      getJobPodName(),
+    createdPod = await createPod(
       container,
       services,
       args.container.registry,
@@ -96,13 +90,6 @@
     `Job pod created, waiting for it to come online ${createdPod?.metadata?.name}`
   )

-  const runnerWorkspace = dirname(process.env.RUNNER_WORKSPACE as string)
-
-  let prepareScript: { containerPath: string; runnerPath: string } | undefined
-  if (args.container?.userMountVolumes?.length) {
-    prepareScript = prepareJobScript(args.container.userMountVolumes || [])
-  }
-
   try {
     await waitForPodPhases(
       createdPod.metadata.name,
@@ -115,28 +102,6 @@
     throw new Error(`pod failed to come online with error: ${err}`)
   }

-  await execCpToPod(createdPod.metadata.name, runnerWorkspace, '/__w')
-
-  if (prepareScript) {
-    await execPodStep(
-      ['sh', '-e', prepareScript.containerPath],
-      createdPod.metadata.name,
-      JOB_CONTAINER_NAME
-    )
-
-    const promises: Promise<void>[] = []
-    for (const vol of args?.container?.userMountVolumes || []) {
-      promises.push(
-        execCpToPod(
-          createdPod.metadata.name,
-          vol.sourceVolumePath,
-          vol.targetVolumePath
-        )
-      )
-    }
-    await Promise.all(promises)
-  }
-
   core.debug('Job pod is ready for traffic')

   let isAlpine = false
@@ -180,8 +145,10 @@ function generateResponseFile(
   const mainContainerContextPorts: ContextPorts = {}
   if (mainContainer?.ports) {
     for (const port of mainContainer.ports) {
-      mainContainerContextPorts[port.containerPort] =
-        mainContainerContextPorts.hostPort
+      if (port.containerPort && port.hostPort) {
+        mainContainerContextPorts[port.containerPort.toString()] =
+          port.hostPort.toString()
+      }
     }
   }

@@ -217,6 +184,17 @@ function generateResponseFile(
   writeToResponseFile(responseFile, JSON.stringify(response))
 }

+async function copyExternalsToRoot(): Promise<void> {
+  const workspace = process.env['RUNNER_WORKSPACE']
+  if (workspace) {
+    await io.cp(
+      path.join(workspace, '../../externals'),
+      path.join(workspace, '../externals'),
+      { force: true, recursive: true, copySourceDirectory: false }
+    )
+  }
+}
+
 export function createContainerSpec(
   container: JobContainerInfo | ServiceContainerInfo,
   name: string,
@@ -250,7 +228,7 @@ export function createContainerSpec(
     container['environmentVariables'] || {}
   )) {
     if (value && key !== 'HOME') {
-      podContainer.env.push({ name: key, value })
+      podContainer.env.push({ name: key, value: value as string })
     }
   }

@@ -266,7 +244,10 @@ export function createContainerSpec(
     })
   }

-  podContainer.volumeMounts = CONTAINER_VOLUMES
+  podContainer.volumeMounts = containerVolumes(
+    container['userMountVolumes'],
+    jobContainer
+  )

   if (!extension) {
     return podContainer

0 commit comments
