
feat(vm): deliver host config.toml into the Linux VM via boot-initrd tail#89

Merged: luthermonson merged 2 commits into main from feat/plan9-config-share on Jun 11, 2026

@luthermonson commented Jun 10, 2026

Summary

The host's config.toml now rides into the Linux VM through the runtime-generated boot-initrd tail — the same mechanism that already delivers ephemerd-linux on every VM start. The init script stages it at /etc/ephemerd/config.toml and passes --config, so the in-VM daemon reads the same TOML the host reads. Adding a new in-VM-relevant config knob now costs zero plumbing: edit config.toml, restart ephemerd, done.

This supersedes the per-setting kernel-cmdline pattern that #87/#88 each had to walk (field → cmdline param → init-script parser → CLI flag → rebuild).

Rework history: why not the Plan9 share

The first draft of this PR (still in the branch history) exposed the host data dir as a Hyper-V Plan9 share. Deployed to the dev rig, it failed twice over:

  1. HCS rejected the document at VM start (HcsStartComputeSystem: HRESULT 0xc0370110). The VM never booted, and Linux CI on the rig was silently down (~100 min) until rollback; the only symptom was a DEBUG-level "OS labels don't match" skip log.
  2. The guest could never have mounted it anyway — Hyper-V serves Plan9 over hvsock, not virtio. Mainline mount -t 9p has no hvsock transport; LCOW's GCS makes it work by opening an AF_VSOCK socket in userspace and passing the fd via trans=fd. That's real machinery for a file we read exactly once at boot.

A live share buys continuous visibility; we need a boot-time snapshot of one file. The initrd tail already exists, regenerates on every VM boot, and adds zero kernel/transport surface. Full post-mortem in docs/arch/host-config-initrd.md, including two follow-ups it surfaced (louder unbootable-VM signal; HCS boot smoke test — 0xc0370110 only appears at start time, nothing in mage ci exercises it).

Mechanism

  • pkg/vm/initrd_windows.go: buildBootInitrd takes a hostConfigPath; when that file is readable, it appends /assets/config.toml (mode 0600) to the cpio tail next to ephemerd-linux. Missing file → skipped, non-fatal.
  • pkg/vm/linuxvm_windows.go: resolves <HostDataDir>/config.toml at every VM start (the tail is rebuilt per boot, which is what makes restart-to-apply work).
  • Init script (mage/download/download.go): stages /assets/config.toml → /etc/ephemerd/config.toml (0600) and adds --config to the in-VM serve invocation. Logs host_config=yes|<empty> in the launch banner.
  • The ephemerd.dind* cmdline flags from #88 (feat(vm): plumb dind.allow_privileged from host config into the VM) are retained as the no-config fallback (fresh installs before config.toml exists).

Security

  • The TOML can carry webhook secrets → 0600 in the cpio and in the VM; root-only; job containers never see the VM host rootfs.
  • The GitHub App private key does not cross the boundary — private_key_path references a file outside the data dir, and only the TOML text is embedded.
  • Worker mode (--containerd-only) returns before the metrics/provider/scheduler blocks, so a host config with [metrics] enabled = true does not start a second listener inside the VM. Arch doc records this as an invariant to preserve.
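The worker-mode invariant can be pictured as an early return in the serve path. This is only an illustration: Config, serve, and the subsystem labels are hypothetical, and the real ephemerd serve code is not part of this PR.

```go
package main

// Config is a hypothetical, trimmed-down view of the parsed config.toml;
// only the field relevant to the invariant is modeled.
type Config struct {
	MetricsEnabled bool // [metrics] enabled
}

// serve returns the (hypothetical) subsystems it would start, in order.
// Worker mode (--containerd-only) returns before any host-only block, so
// a host config with [metrics] enabled = true opens no listener in the VM.
func serve(cfg Config, containerdOnly bool) []string {
	started := []string{"worker"}
	if containerdOnly {
		return started // in-VM worker: nothing below this line runs
	}
	if cfg.MetricsEnabled {
		started = append(started, "metrics")
	}
	return append(started, "provider", "scheduler")
}
```

A regression test along these lines, fed a config with metrics enabled, is one way to keep the invariant from silently breaking when new blocks are added above the early return.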

Verified on the live rig

  • VM boots cleanly (no 0xc0370110)
  • ephemerd-init: host config staged at /etc/ephemerd/config.toml
  • Launch banner host_config=yes
  • Dispatch client connects; jobs flow; in-VM worker reads the host's [dind] section

Test plan

  • pkg/vm unit tests: config-present tail contains assets/config.toml + body; missing config is non-fatal and leaves no stray entry
  • Lint (golangci-lint, Linux flavor) — 0 issues
  • Live deploy: boot + staging + banner verified
  • CI green
  • Flip a [dind] value on the host, restart, confirm the in-VM daemon honors it with the cmdline fallback removed (manual spot-check)

Out of scope

  • macOS: Vz exposes virtio-fs directly, so the Darwin equivalent can be a real share — different wiring, tracked separately.
  • Removing the ephemerd.dind* cmdline flags — they're the fresh-install fallback; revisit after this soaks.

…kes effect on next boot)

The host's data dir (where config.toml lives) is now exposed read-only
to the Linux VM as a Hyper-V Plan9 share named "ephemerd-host-config".
The init script mounts it at /mnt/host-config and points the in-VM
`ephemerd serve` at the host's config.toml via --config. Adding a new
in-VM-relevant setting (dind, runtime.rlimits, future knobs) now costs
zero plumbing: write to config.toml on the host, restart ephemerd, the
VM reboots and reads the same TOML.

Why Plan9: the kernel surface (CONFIG_9P_FS, CONFIG_NET_9P_VIRTIO) was
already compiled into our virt kernel and the modules were already
listed in initrdKernelModulesX86. Someone wired the guest side but
never the host. This connects the dots.

Security boundary: the share is read-only — a compromised in-VM
ephemerd cannot mutate the host. Job containers never see the share
(they get only the runtime's explicit bind mounts).

Fallback: when the share fails to mount (stripped kernel without 9p,
share not exported, etc.) the init script logs a warning and falls
back to today's behavior. The kernel-cmdline ephemerd.dind* params
introduced in #88 are deliberately retained as that fallback path —
they're redundant when the share is healthy.

Doc: docs/arch/plan9-config-share.md.

Not in scope: macOS Vz (different mechanism — virtio-fs; symmetric
work, separate PR), Linux host (no VM to share with).
…Plan9)

Reworks the host-config delivery away from the Hyper-V Plan9 share,
which failed twice over: HCS rejected the Plan9 device JSON at VM start
(HcsStartComputeSystem: 0xc0370110 — took down Linux CI on the dev rig
until rollback), and even with a valid document the guest could never
mount it — Hyper-V serves Plan9 over hvsock, not virtio, and mainline
mount -t 9p has no hvsock transport (LCOW's GCS does an AF_VSOCK +
trans=fd dance in userspace to make it work).

A live share buys continuous file visibility; we need a boot-time
snapshot of one file. So: ride config.toml in via the runtime-generated
initrd tail, exactly like ephemerd-linux already does. buildBootInitrd
appends /assets/config.toml (mode 0600) when the host file exists; the
init script stages it to /etc/ephemerd/config.toml and passes --config.
The tail regenerates on every VM boot, so "edit config.toml + restart
the service" is the complete update procedure — same semantics the
Plan9 share would have given, zero new kernel or transport surface.

Missing config.toml is non-fatal (fresh installs run on defaults +
the ephemerd.dind* cmdline flags from #88, retained for that case).
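For the no-config case, the init-script side of the fallback amounts to picking ephemerd.dind* keys out of the kernel command line. A minimal Go sketch, assuming space-separated key=value tokens as in /proc/cmdline; the function name and the example key ephemerd.dind.allow_privileged are illustrative, since the PR only gives the ephemerd.dind* prefix.

```go
package main

import "strings"

// dindCmdlineParams (hypothetical) collects kernel command-line tokens
// whose key starts with "ephemerd.dind" into a key -> value map.
func dindCmdlineParams(cmdline string) map[string]string {
	out := map[string]string{}
	for _, tok := range strings.Fields(cmdline) {
		key, val, hasVal := strings.Cut(tok, "=")
		if !strings.HasPrefix(key, "ephemerd.dind") {
			continue
		}
		if !hasVal {
			val = "true" // bare flag: presence means true (assumption)
		}
		out[key] = val
	}
	return out
}
```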

The arch doc (docs/arch/host-config-initrd.md) keeps a post-mortem of
the Plan9 attempt, including two follow-ups: a louder signal when
Linux-labeled jobs are queued but the VM failed to boot (the outage's
only symptom was a DEBUG skip log), and a smoke test that actually
starts a minimal HCS VM (0xc0370110 only appears at start time;
nothing in mage ci exercises it).

Verified on the live rig: VM boots, init logs "host config staged at
/etc/ephemerd/config.toml", launch banner shows host_config=yes, in-VM
worker reads the host's [dind] section.

@luthermonson force-pushed the feat/plan9-config-share branch from 43b6ee7 to f9c6a03 on June 11, 2026 00:15
@luthermonson changed the title from "feat(vm): share host config into Linux VM via Plan9" to "feat(vm): deliver host config.toml into the Linux VM via boot-initrd tail" on Jun 11, 2026
@luthermonson marked this pull request as ready for review June 11, 2026 00:41
@luthermonson merged commit f094055 into main Jun 11, 2026
4 checks passed
@luthermonson deleted the feat/plan9-config-share branch June 11, 2026 00:46