feat(vm): deliver host config.toml into the Linux VM via boot-initrd tail #89
Merged
…kes effect on next boot)

The host's data dir (where config.toml lives) is now exposed read-only to the Linux VM as a Hyper-V Plan9 share named "ephemerd-host-config". The init script mounts it at /mnt/host-config and points the in-VM `ephemerd serve` at the host's config.toml via --config. Adding a new in-VM-relevant setting (dind, runtime.rlimits, future knobs) now costs zero plumbing: write to config.toml on the host, restart ephemerd, the VM reboots and reads the same TOML.

Why Plan9: the kernel surface (CONFIG_9P_FS, CONFIG_NET_9P_VIRTIO) was already compiled into our virt kernel and the modules were already listed in initrdKernelModulesX86. Someone wired the guest side but never the host. This connects the dots.

Security boundary: the share is read-only, so a compromised in-VM ephemerd cannot mutate the host. Job containers never see the share (they get only the runtime's explicit bind mounts).

Fallback: when the share fails to mount (stripped kernel without 9p, share not exported, etc.) the init script logs a warning and falls back to today's behavior. The kernel-cmdline ephemerd.dind* params introduced in #88 are deliberately retained as that fallback path; they're redundant when the share is healthy.

Doc: docs/arch/plan9-config-share.md.

Not in scope: macOS Vz (different mechanism, virtio-fs; symmetric work, separate PR), Linux host (no VM to share with).
…Plan9)

Reworks the host-config delivery away from the Hyper-V Plan9 share, which failed twice over: HCS rejected the Plan9 device JSON at VM start (HcsStartComputeSystem: 0xc0370110, which took down Linux CI on the dev rig until rollback), and even with a valid document the guest could never mount it: Hyper-V serves Plan9 over hvsock, not virtio, and mainline mount -t 9p has no hvsock transport (LCOW's GCS does an AF_VSOCK + trans=fd dance in userspace to make it work).

A live share buys continuous file visibility; we need a boot-time snapshot of one file. So: ride config.toml in via the runtime-generated initrd tail, exactly like ephemerd-linux already does. buildBootInitrd appends /assets/config.toml (mode 0600) when the host file exists; the init script stages it to /etc/ephemerd/config.toml and passes --config. The tail regenerates on every VM boot, so "edit config.toml + restart the service" is the complete update procedure: same semantics the Plan9 share would have given, zero new kernel or transport surface. Missing config.toml is non-fatal (fresh installs run on defaults + the ephemerd.dind* cmdline flags from #88, retained for that case).

The arch doc (docs/arch/host-config-initrd.md) keeps a post-mortem of the Plan9 attempt, including two follow-ups: a louder signal when Linux-labeled jobs are queued but the VM failed to boot (the outage's only symptom was a DEBUG skip log), and a smoke test that actually starts a minimal HCS VM (0xc0370110 only appears at start time; nothing in mage ci exercises it).

Verified on the live rig: VM boots, init logs "host config staged at /etc/ephemerd/config.toml", launch banner shows host_config=yes, in-VM worker reads the host's [dind] section.
43b6ee7 to
f9c6a03
Summary
The host's `config.toml` now rides into the Linux VM through the runtime-generated boot-initrd tail, the same mechanism that already delivers `ephemerd-linux` on every VM start. The init script stages it at `/etc/ephemerd/config.toml` and passes `--config`, so the in-VM daemon reads the same TOML the host reads. Adding a new in-VM-relevant config knob now costs zero plumbing: edit `config.toml`, restart ephemerd, done.

This supersedes the per-setting kernel-cmdline pattern that #87/#88 each had to walk (field → cmdline param → init-script parser → CLI flag → rebuild).
Rework history: why not the Plan9 share
The first draft of this PR (still in the branch history) exposed the host data dir as a Hyper-V Plan9 share. Deployed to the dev rig, it failed twice over:

- HCS rejected the Plan9 device JSON at VM start (`HcsStartComputeSystem`: HRESULT `0xc0370110`). The VM never booted, and Linux CI on the rig was silently down (~100 min) until rollback; the only symptom was a DEBUG-level "OS labels don't match" skip log.
- Even with a valid document, the guest could never mount it: Hyper-V serves Plan9 over hvsock, not virtio, and mainline `mount -t 9p` has no hvsock transport. LCOW's GCS makes it work by opening an `AF_VSOCK` socket in userspace and passing the fd via `trans=fd`. That's real machinery for a file we read exactly once at boot.

A live share buys continuous visibility; we need a boot-time snapshot of one file. The initrd tail already exists, regenerates on every VM boot, and adds zero kernel/transport surface. Full post-mortem in `docs/arch/host-config-initrd.md`, including two follow-ups it surfaced (louder unbootable-VM signal; HCS boot smoke test, since `0xc0370110` only appears at start time and nothing in `mage ci` exercises it).

Mechanism
- `pkg/vm/initrd_windows.go`: `buildBootInitrd` takes a `hostConfigPath`; when readable, appends `/assets/config.toml` (mode 0600) to the cpio tail next to `ephemerd-linux`. Missing file → skipped, non-fatal.
- `pkg/vm/linuxvm_windows.go`: resolves `<HostDataDir>/config.toml` at every VM start (tail is rebuilt per boot, which is what makes restart-to-apply work).
- Init script (`mage/download/download.go`): stages `/assets/config.toml` → `/etc/ephemerd/config.toml` (0600) and adds `--config` to the in-VM serve invocation. Logs `host_config=yes|<empty>` in the launch banner.
- `ephemerd.dind*` cmdline flags from #88 (feat(vm): plumb dind.allow_privileged from host config into the VM) are retained as the no-config fallback (fresh installs before `config.toml` exists).

Security
- … `private_key_path` references a file outside the data dir, and only the TOML text is embedded.
- … `--containerd-only`) returns before the metrics/provider/scheduler blocks, so a host config with `[metrics] enabled = true` does not start a second listener inside the VM. Arch doc records this as an invariant to preserve.

Verified on the live rig
- VM boots (… `0xc0370110`)
- Init logs `ephemerd-init: host config staged at /etc/ephemerd/config.toml`
- Launch banner shows `host_config=yes`
- In-VM worker reads the host's `[dind]` section

Test plan
- `pkg/vm` unit tests: config-present tail contains `assets/config.toml` + body; missing config is non-fatal and leaves no stray entry
- … `golangci-lint`, Linux flavor): 0 issues
- … `[dind]` value on the host, restart, confirm the in-VM daemon honors it with the cmdline fallback removed (manual spot-check)

Out of scope
- … `ephemerd.dind*` cmdline flags: they're the fresh-install fallback; revisit after this soaks.