feat(vm): plumb dind.allow_privileged from host config into the VM #88

Merged: luthermonson merged 1 commit into main from feat/dind-allow-priv-vm on Jun 10, 2026

@luthermonson

Contributor

Summary

The host's [dind] allow_privileged = true setting never crossed the VM boundary. The in-VM ephemerd reads its own (default) config file inside /var/lib/ephemerd and falls back to the Linux default of false, so privileged sibling containers were rejected even when the operator explicitly opted in. ephpm-style workloads that need KIND (privileged) couldn't run regardless of host config.

Fix

Plumb the host's resolved value across the boundary via the kernel cmdline:

  1. cmd/ephemerd/main.go reads cfg.Dind.ResolvedAllowPrivileged() and threads it through startContainerRuntime into vm.LinuxVMConfig.DindAllowPrivileged.
  2. pkg/vm/linuxvm_windows.go appends ephemerd.dind_allow_privileged=1 to the kernel cmdline when set.
  3. The in-initrd init script (templated in mage/download/download.go) parses the new param and adds --dind-allow-privileged to the in-VM ephemerd-linux serve invocation.
  4. A new --dind-allow-privileged CLI flag on ephemerd serve forces cfg.Dind.AllowPrivileged = true, overriding the in-VM config file.
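Steps 2 and 3 amount to a one-token handshake over the kernel command line. A minimal Go sketch of both ends follows. The function names and the base cmdline in `main` are hypothetical, and the real guest-side parsing lives in the shell init script rather than in Go; only the parameter name `ephemerd.dind_allow_privileged=1` comes from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Parameter name taken from the PR description.
const dindAllowPrivilegedParam = "ephemerd.dind_allow_privileged"

// appendAllowPrivileged is a hypothetical host-side helper: it adds the
// parameter only when the host's resolved setting is true.
func appendAllowPrivileged(cmdline string, allow bool) string {
	if !allow {
		return cmdline
	}
	return strings.TrimSpace(cmdline + " " + dindAllowPrivilegedParam + "=1")
}

// cmdlineAllowsPrivileged is a hypothetical guest-side check: it scans the
// whitespace-separated tokens for an exact "<param>=1" match.
func cmdlineAllowsPrivileged(cmdline string) bool {
	for _, field := range strings.Fields(cmdline) {
		key, value, ok := strings.Cut(field, "=")
		if ok && key == dindAllowPrivilegedParam && value == "1" {
			return true
		}
	}
	return false
}

func main() {
	// "console=ttyS0" is an illustrative base cmdline, not the project's.
	cmdline := appendAllowPrivileged("console=ttyS0", true)
	fmt.Println(cmdline)
	fmt.Println(cmdlineAllowsPrivileged(cmdline))
}
```

Leaving the parameter off entirely when the setting is false keeps the guest on its own default, which matches the regression check in the test plan.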

Cache-invalidation bug fixed in the same PR

Initrdx86's outOfDate input list only watched the rootfs tarball — edits to the embedded init script body in download.go itself were silently skipped by mage build:windows, embedding a stale init script in a fresh binary. Adding mage/download/download.go as an input fixes this.
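To illustrate why the missing input mattered, here is a standard-library sketch of an mtime-based staleness check. It assumes the project's `outOfDate` helper compares modification times; its real signature is not shown in this PR, so the function below is a hypothetical stand-in, and the file names in `demo` are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// outOfDate reports whether target is missing or older than any input.
// Hypothetical stand-in for the helper used by the mage build.
func outOfDate(target string, inputs ...string) (bool, error) {
	t, err := os.Stat(target)
	if os.IsNotExist(err) {
		return true, nil
	}
	if err != nil {
		return false, err
	}
	for _, in := range inputs {
		s, err := os.Stat(in)
		if err != nil {
			return false, err
		}
		if s.ModTime().After(t.ModTime()) {
			return true, nil
		}
	}
	return false, nil
}

// demo creates a cached initrd that is newer than the rootfs tarball but
// older than the generator source, then runs the check with and without
// the source file in the input list.
func demo() (rootfsOnly, withSource bool, err error) {
	dir, err := os.MkdirTemp("", "initrd-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	rootfs := filepath.Join(dir, "rootfs.tar.gz")
	initrd := filepath.Join(dir, "initrd.img")
	source := filepath.Join(dir, "download.go")
	now := time.Now()
	files := []struct {
		path string
		mod  time.Time
	}{
		{rootfs, now.Add(-2 * time.Hour)},
		{initrd, now.Add(-1 * time.Hour)},
		{source, now},
	}
	for _, f := range files {
		if err := os.WriteFile(f.path, nil, 0o644); err != nil {
			return false, false, err
		}
		if err := os.Chtimes(f.path, f.mod, f.mod); err != nil {
			return false, false, err
		}
	}

	rootfsOnly, err = outOfDate(initrd, rootfs)
	if err != nil {
		return false, false, err
	}
	withSource, err = outOfDate(initrd, rootfs, source)
	return rootfsOnly, withSource, err
}

func main() {
	rootfsOnly, withSource, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("stale with rootfs only:", rootfsOnly)
	fmt.Println("stale with download.go listed:", withSource)
}
```

With only the tarball listed, the edited source never triggers a rebuild; listing the source file makes the same edit invalidate the cache.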

(The Darwin Initrd() function uses fileExists instead of outOfDate, so it has the same class of bug in a different form. Out of scope here; flagging for a follow-up.)

Verified

End-to-end on the live rig:

  • Kernel cmdline now carries ephemerd.dind_allow_privileged=1 when host config has the flag.
  • Init banner: ephemerd-init: containerd_port=10000 root_disk=/dev/sda dind=1 dind_allow_privileged=1.
  • Serve invocation log: launching ephemerd-linux (dind=1 allow_privileged=1).
  • The "rejecting elevated container request" warnings stopped firing.

Test plan

  • mage ci (lint + tests) passes
  • Set [dind] allow_privileged = true on the host, restart ephemerd
  • Run a workflow that does docker run --privileged or similar from inside a job
  • Confirm the sibling container starts (no rejecting elevated container request warning in vm/linux/console.log)
  • Leave allow_privileged unset on a fresh host, confirm Linux default of false still rejects (regression check)

Future work

This is the second ad-hoc kernel-cmdline plumbing for an in-VM setting (after dind=1). When a third setting needs plumbing, we should switch to a real config-share mechanism: Hyper-V's Plan9 share infrastructure already exists in hcs_windows.go but isn't wired up. Branch feat/plan9-config-share exists for that follow-up.

The host's `[dind] allow_privileged = true` setting was silently dropped
on the way into the embedded Linux VM. The in-VM ephemerd reads its own
(default) config inside /var/lib/ephemerd and falls back to the Linux
default of false, rejecting `docker run --privileged` siblings even when
the host operator explicitly opted in. ephpm-style workloads that need
KIND (privileged containers) couldn't run.

Plumb the host's `cfg.Dind.ResolvedAllowPrivileged()` through:

1. main.go → startContainerRuntime → LinuxVMConfig.DindAllowPrivileged
2. linuxvm_windows.go appends `ephemerd.dind_allow_privileged=1` to the
   kernel cmdline when set
3. The in-initrd init script parses the new param and adds
   `--dind-allow-privileged` to the in-VM `ephemerd-linux serve` call
4. A new `--dind-allow-privileged` CLI flag on `ephemerd serve` forces
   `cfg.Dind.AllowPrivileged = true`, overriding the in-VM config file
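
The precedence described in step 4 (CLI flag over in-VM config file over the Linux default of false) can be sketched with the standard `flag` package. This is an illustration under assumptions: `resolveAllowPrivileged` and the `*bool` representation of the config-file value are hypothetical, and the real `ephemerd serve` command may use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveAllowPrivileged is a hypothetical helper. fileValue is nil when
// the config file does not set allow_privileged.
func resolveAllowPrivileged(args []string, fileValue *bool) (bool, error) {
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	force := fs.Bool("dind-allow-privileged", false,
		"force [dind] allow_privileged = true, overriding the config file")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	if *force {
		return true, nil // the flag wins over whatever the file says
	}
	if fileValue != nil {
		return *fileValue, nil
	}
	return false, nil // Linux default, per the PR description
}

func main() {
	fileSaysNo := false
	got, err := resolveAllowPrivileged([]string{"--dind-allow-privileged"}, &fileSaysNo)
	if err != nil {
		panic(err)
	}
	fmt.Println("flag set, file false:", got)

	got, err = resolveAllowPrivileged(nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("nothing set:", got)
}
```

The flag can only force the value to true, so omitting it leaves the in-VM config file and the platform default in charge.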

Also fixes a latent bug in mage/download/download.go: Initrdx86's
`outOfDate` input list didn't include download.go itself, so edits to
the embedded init script body were silently skipped by `mage
build:windows` (we burned ~30 minutes on this today). Adding the file as
an input makes init-script edits invalidate the cached initrd correctly.

Verified end-to-end on the live rig: kernel cmdline carries
ephemerd.dind_allow_privileged=1, init banner shows dind_allow_privileged=1,
serve invocation logs `(dind=1 allow_privileged=1)`, and the dind
rejection warnings ("rejecting elevated container request") stop firing.

Note: the Darwin Initrd() function has a similar cache pattern using
fileExists rather than outOfDate — same class of bug, deferred to a
follow-up.
luthermonson merged commit 83dafe9 into main on Jun 10, 2026
3 of 4 checks passed
luthermonson added a commit that referenced this pull request Jun 11, 2026
…kes effect on next boot)

The host's data dir (where config.toml lives) is now exposed read-only
to the Linux VM as a Hyper-V Plan9 share named "ephemerd-host-config".
The init script mounts it at /mnt/host-config and points the in-VM
`ephemerd serve` at the host's config.toml via --config. Adding a new
in-VM-relevant setting (dind, runtime.rlimits, future knobs) now costs
zero plumbing: write to config.toml on the host, restart ephemerd, the
VM reboots and reads the same TOML.

Why Plan9: the kernel surface (CONFIG_9P_FS, CONFIG_NET_9P_VIRTIO) was
already compiled into our virt kernel and the modules were already
listed in initrdKernelModulesX86. Someone wired the guest side but
never the host. This connects the dots.

Security boundary: the share is read-only — a compromised in-VM
ephemerd cannot mutate the host. Job containers never see the share
(they get only the runtime's explicit bind mounts).

Fallback: when the share fails to mount (stripped kernel without 9p,
share not exported, etc.) the init script logs a warning and falls
back to today's behavior. The kernel-cmdline ephemerd.dind* params
introduced in #88 are deliberately retained as that fallback path —
they're redundant when the share is healthy.

Doc: docs/arch/plan9-config-share.md.

Not in scope: macOS Vz (different mechanism — virtio-fs; symmetric
work, separate PR), Linux host (no VM to share with).
luthermonson added a commit that referenced this pull request Jun 11, 2026
…Plan9)

Reworks the host-config delivery away from the Hyper-V Plan9 share,
which failed twice over: HCS rejected the Plan9 device JSON at VM start
(HcsStartComputeSystem: 0xc0370110 — took down Linux CI on the dev rig
until rollback), and even with a valid document the guest could never
mount it — Hyper-V serves Plan9 over hvsock, not virtio, and mainline
mount -t 9p has no hvsock transport (LCOW's GCS does an AF_VSOCK +
trans=fd dance in userspace to make it work).

A live share buys continuous file visibility; we need a boot-time
snapshot of one file. So: ride config.toml in via the runtime-generated
initrd tail, exactly like ephemerd-linux already does. buildBootInitrd
appends /assets/config.toml (mode 0600) when the host file exists; the
init script stages it to /etc/ephemerd/config.toml and passes --config.
The tail regenerates on every VM boot, so "edit config.toml + restart
the service" is the complete update procedure — same semantics the
Plan9 share would have given, zero new kernel or transport surface.
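
As a rough illustration of what appending a file to an initrd tail involves, the Go sketch below writes one regular-file entry plus a trailer in the cpio "newc" format that Linux initramfs images use. It assumes the tail is a newc segment and that the `assets` directory entry already exists elsewhere in the image; `appendConfig` and `writeEntry` are hypothetical names, and the real `buildBootInitrd` is not shown in this commit message.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	newcMagic   = "070701"
	modeRegular = 0o100000 // S_IFREG file-type bits
)

// pad4 appends NUL bytes until the buffer length is a multiple of 4,
// as newc requires after each name and after each data block.
func pad4(buf *bytes.Buffer) {
	for buf.Len()%4 != 0 {
		buf.WriteByte(0)
	}
}

// writeEntry emits one newc record: a 110-byte ASCII header (magic plus
// 13 eight-digit hex fields), the NUL-terminated name, then the data.
func writeEntry(buf *bytes.Buffer, name string, mode uint32, data []byte) {
	// Field order: ino, mode, uid, gid, nlink, mtime, filesize,
	// devmajor, devminor, rdevmajor, rdevminor, namesize, check.
	fmt.Fprintf(buf, "%s%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
		newcMagic, 0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name)+1, 0)
	buf.WriteString(name)
	buf.WriteByte(0)
	pad4(buf)
	buf.Write(data)
	pad4(buf)
}

// appendConfig returns base followed by a tail segment. A nil config
// (host file missing) is non-fatal: only the trailer is written.
func appendConfig(base, config []byte) []byte {
	var buf bytes.Buffer
	buf.Write(base)
	pad4(&buf)
	if config != nil {
		writeEntry(&buf, "assets/config.toml", modeRegular|0o600, config)
	}
	writeEntry(&buf, "TRAILER!!!", 0, nil)
	return buf.Bytes()
}

func main() {
	tail := appendConfig(nil, []byte("[dind]\nallow_privileged = true\n"))
	fmt.Printf("tail is %d bytes, starts with %q\n", len(tail), tail[:6])
}
```

Because the segment is rebuilt from the host file each time, a snapshot approach like this needs no extra kernel modules or transport in the guest.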

Missing config.toml is non-fatal (fresh installs run on defaults +
the ephemerd.dind* cmdline flags from #88, retained for that case).

The arch doc (docs/arch/host-config-initrd.md) keeps a post-mortem of
the Plan9 attempt, including two follow-ups: a louder signal when
Linux-labeled jobs are queued but the VM failed to boot (the outage's
only symptom was a DEBUG skip log), and a smoke test that actually
starts a minimal HCS VM (0xc0370110 only appears at start time;
nothing in mage ci exercises it).

Verified on the live rig: VM boots, init logs "host config staged at
/etc/ephemerd/config.toml", launch banner shows host_config=yes, in-VM
worker reads the host's [dind] section.
luthermonson added a commit that referenced this pull request Jun 11, 2026
…tail (#89)

* feat(vm): share host data dir into Linux VM via Plan9 (host config takes effect on next boot)

* feat(vm): deliver host config.toml via boot-initrd tail (rework from Plan9)