fix(dind): scope built image tags per job (latent concurrent-build race) #90
Merged
All jobs share one BuildKit worker writing into one containerd
namespace ("buildkit"), keyed by tag. Two concurrent jobs that both
`docker build -t ephpm:dev` raced on the same image record — last
build wins, and the losing job pushed/loaded the other job's binary.
Observed in the wild as an E2E matrix job asserting on PHP 8.4 and
getting the 8.5 build (ephpm/ephpm#67, #68, reproduced on main).
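
The last-writer-wins behaviour described above can be sketched with a toy model. This is a minimal illustration, not ephemerd's code: `imageStore`, `build`, and `push` are hypothetical names, and a plain map stands in for the shared containerd namespace.

```go
package main

import "fmt"

// imageStore is a toy stand-in for the shared namespace: image records are
// a flat name -> digest map, so a second write to a name replaces the first.
type imageStore map[string]string

// build records the digest a job produced under the raw user tag.
// The job ID is not part of the key, which is the bug being illustrated.
func (s imageStore) build(jobID, tag, digest string) {
	s[tag] = digest
}

// push returns the digest a job would ship for a tag.
func (s imageStore) push(jobID, tag string) string {
	return s[tag]
}

func main() {
	store := imageStore{}
	store.build("job-a", "ephpm:dev", "sha256:php84")
	store.build("job-b", "ephpm:dev", "sha256:php85") // overwrites job-a's record

	// job-a now ships job-b's image.
	fmt.Println(store.push("job-a", "ephpm:dev")) // prints "sha256:php85"
}
```

The digests are placeholder strings chosen to mirror the PHP 8.4 / 8.5 mix-up; real records point at content digests.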
Fix: built tags are now stored under job-scoped names inside the
shared namespace:
ephpm:dev -> build.ephemerd.local/<job-id>/ephpm:dev
The transform is invisible to workflows — each job's docker CLI keeps
its own tag; only the storage name carries the scope. docker push
applies the same transform on lookup (scoped candidates first, then
unscoped fallbacks for images staged by tests/tooling). Cross-job
resolution is impossible by construction: job A's candidates are
A-scoped + unscoped, which can never match job B's B-scoped records.
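
A rough sketch of the transform and the push lookup order, under stated assumptions: `scopedName`, `pushCandidates`, and `resolve` are hypothetical helpers (the real function names are not shown here), scoping is modelled as plain prefixing, and job IDs are assumed to contain no `/`.

```go
package main

import "fmt"

// scopeRegistry is the synthetic hostname named in the description.
const scopeRegistry = "build.ephemerd.local"

// scopedName maps a user tag to its job-scoped storage name.
// Assumption: simple prefixing; any sanitising the real code does is omitted.
func scopedName(jobID, tag string) string {
	return scopeRegistry + "/" + jobID + "/" + tag
}

// pushCandidates lists the names a push would try, in order:
// the job-scoped name first, then the raw tag as the unscoped fallback.
func pushCandidates(jobID, tag string) []string {
	return []string{scopedName(jobID, tag), tag}
}

// resolve returns the digest of the first candidate present in the store.
func resolve(store map[string]string, jobID, tag string) (string, bool) {
	for _, name := range pushCandidates(jobID, tag) {
		if digest, ok := store[name]; ok {
			return digest, true
		}
	}
	return "", false
}

func main() {
	store := map[string]string{
		scopedName("job-a", "ephpm:dev"): "sha256:php84",
		scopedName("job-b", "ephpm:dev"): "sha256:php85",
	}
	a, _ := resolve(store, "job-a", "ephpm:dev")
	b, _ := resolve(store, "job-b", "ephpm:dev")
	fmt.Println(a, b) // prints "sha256:php84 sha256:php85"
}
```

Because job A's candidate list never contains a B-scoped name, a lookup for job A cannot land on job B's record even when both jobs used the same tag.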
The synthetic build.ephemerd.local registry hostname never resolves on
any network — it exists to keep scoped names valid under the Docker
reference grammar (BuildKit's exporter validates refs).
Tests: scoping table (registry-qualified refs, case-sensitive tags,
underscored job IDs), reference-grammar validation of scoped names,
export-attr scoping, push candidate ordering, and the race condition
expressed as a property (same tag + different jobs = distinct names).
The existing registry e2e exercises the unscoped fallback unchanged.
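
The race property (same tag + different jobs = distinct names) could be checked along these lines. This is an illustrative sketch rather than the PR's test code: `scopedName` and `distinctAcrossJobs` are hypothetical, and the job IDs and tags are made-up samples.

```go
package main

import "fmt"

// scopedName is a hypothetical stand-in for the scoping transform,
// modelled here as plain prefixing.
func scopedName(jobID, tag string) string {
	return "build.ephemerd.local/" + jobID + "/" + tag
}

// distinctAcrossJobs reports whether, for every tag, different job IDs
// always yield different storage names.
func distinctAcrossJobs(jobIDs, tags []string) bool {
	for _, tag := range tags {
		seen := map[string]string{} // storage name -> job ID that produced it
		for _, id := range jobIDs {
			name := scopedName(id, tag)
			if prev, ok := seen[name]; ok && prev != id {
				return false // two different jobs collided on one name
			}
			seen[name] = id
		}
	}
	return true
}

func main() {
	jobs := []string{"job-1", "job-2", "job_3"}
	tags := []string{"ephpm:dev", "ghcr.io/org/img:v1", "app:Latest"}
	fmt.Println(distinctAcrossJobs(jobs, tags)) // prints "true"
}
```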
Summary
Fixes a latent concurrent-build race in the dind layer: all jobs share one BuildKit worker writing into one flat `"buildkit"` containerd namespace, keyed by raw user tag. Two concurrent jobs that both `docker build -t same:tag` overwrite each other's image record: last build wins, and the losing job's subsequent `docker push` ships the other job's bytes.

The race (from code, not from an incident)
- `cmd/ephemerd/main.go` constructs one `buildkit.Server` with `ContainerdNamespace: "buildkit"` and hands it to every per-job dind server.
- `pkg/dind/buildkit_build.go` exported builds under the raw user tag: `docker build -t foo:dev` from any job writes image record `foo:dev` in the shared namespace.
- `pkg/dind/registry.go` (`docker push`) reads tags back from the same flat namespace.

containerd image records are name → digest. Concurrent same-tag builds = last writer wins = cross-job image substitution for any workflow that builds and pushes through the dind socket. ephemerd's per-job isolation (container namespaces `ephemerd-dind-<job-id>`) covers containers and execs but never covered built tags.

Fix
Built tags are stored under job-scoped names inside the shared namespace:

ephpm:dev -> build.ephemerd.local/<job-id>/ephpm:dev

`docker push` applies the same transform on lookup: scoped candidates first, then unscoped fallbacks (covers images staged into the namespace by tests/tooling).

`build.ephemerd.local` is a synthetic registry hostname that never resolves. It exists to keep scoped names valid under the Docker reference grammar (BuildKit's exporter validates refs).

dind topology answers (for the record)
The investigation asked four questions worth answering somewhere durable:

- … `dind.Server`, socket, and containerd namespace (`ephemerd-dind-<job-id>`). The backing containerd and the BuildKit worker are VM-global.
- … `(provider, repo)` image cache namespace is intentionally shared across jobs of the same repo: pulls only, read-through.
- `dind.allow_privileged`: only opens the privilege gate on the per-job daemon; no isolation topology change either way.

Tests
- … `distribution/reference` (BuildKit rejects invalid refs).
- … `-t a -t b`).

Test plan

- `docker build` + `docker push` to a real registry from a single job still round-trips on a deployed build