Skip to content

feat: buildvars registry + extension hooks + FIPS-awareness + test-infra resilience#9690

Closed
mlwelles wants to merge 17 commits intomainfrom
mlwelles/buildvars-registry-and-hooks
Closed

feat: buildvars registry + extension hooks + FIPS-awareness + test-infra resilience#9690
mlwelles wants to merge 17 commits intomainfrom
mlwelles/buildvars-registry-and-hooks

Conversation

@mlwelles
Copy link
Copy Markdown
Contributor

Summary

Two bodies of work that together make the dgraph codebase easier to maintain downstream while preserving identical upstream behavior:

  1. buildvars registry — centralizes build-time configuration (binary name, image tags, build tags, toolchain refs) in one typed, godoc'd registry. Replaces scattered hardcoded literals and ad-hoc os.Getenv calls. 5 commits.
  2. extension hooks + resilience — adds public function-var extension hooks to four test-infra packages (testutil, t, compose, dgraphtest), plus a minimal FIPS-awareness surface in x/ (var + env helper), plus universally-beneficial observability and resilience improvements (stderr capture on failure, volume-race retry, test-skip helpers). 7 commits.

All 12 commits preserve upstream behavior byte-for-byte: every hook defaults to the exact code it replaces; every widened test secret is still valid under stock HMAC; every FIPS signal is opt-in via env and defaults off. The net effect on an upstream consumer is no behavior change; the net effect on a downstream fork is that none of these concerns require patching upstream files.

Commits

buildvars registry (cmds 1–5, unchanged from prior review)

  1. feat(buildvars): introduce buildvars registry — new buildvars/ package (363 LOC + 324 LOC tests) + buildvars/cmd/buildvars CLI with shell/make/plain output.
  2. refactor(buildvars): replace hardcoded binary/image literals — migrates 4 Go files and 3 shell scripts to read from the registry. Includes an incidental nil-panic fix in getPortMappingsOnMac.
  3. fix(makefile): make BIN overridabledgraph/Makefile uses BIN ?= dgraph, go build -o $(INSTALL_TARGET) (respects renames), and broader jemalloc detection (/usr/lib, libjemalloc_pic.a for Debian/Alpine/Wolfi). t/Makefile's check target threads $BIN.
  4. test(restore): skip on non-Linuxnamespace-aware/restore_test.go skipped locally; CI unchanged.
  5. docs(buildvars): drop fork-specific names from godoc + tests — env var renamed to DGRAPH_COMPOSE_BUILD_DIR; generic language throughout.

Extension hooks + FIPS surface + resilience (6–12, new in this update)

  1. refactor(buildvars): rename cmd/buildvarprint → cmd/buildvars; align Go ident ExtraBuildTags → PrivateBuildTags — two consistency-focused renames. The CLI dir now matches the top-level package name. The Go identifier PrivateBuildTags now pairs with env var PRIVATE_BUILD_TAGS, matching every other Var in the registry (Bin/BIN, etc.).

  2. feat(x): add FIPSEnabled var + FIPSBinary() runtime helper — introduces two minimal, coordinated FIPS-awareness signals in package x:

    • FIPSEnabled (var, default false): compile-time signal. Declared as a var (not a const) so a build-tag-guarded sibling file in a downstream fork can flip it via init() before any test or main body runs. For all practical purposes behaves as a compile-time constant for if x.FIPSEnabled { ... } guards.
    • FIPSBinary() (function): runtime signal via DGRAPH_FIPS_BINARY=1. Covers the case where the test binary is built without FIPS tags but spawns a FIPS-enforcing dgraph subprocess.
    • Includes tests covering the env-var parsing (only exact "1" is true) and the default-false invariant.
  3. refactor(testutil): extract ComposeArgs, EnvForCompose, BuildPlugins hooks — three public function-var hooks in new testutil/hooks.go. StartAlphas/StopAlphasForCoverage call the compose-args and env-for-compose hooks. GeneratePlugins delegates to the BuildPlugins hook (default is the verbatim previous implementation). Upstream behavior byte-exact.

  4. test(t): add EnvForCompose + ComposeFileArgs hooks; resilience improvements — hook surface in t/hooks.go + threaded-through the test runner. Plus three universally-beneficial improvements:

    • stderr capture: startCluster captures docker-compose stderr and prints on failure. Previous cmd.Stderr = nil suppressed actionable error text (volume races, port conflicts, pull failures).
    • volume-race retry: startCluster retries docker compose up once after down -v on failure. Covers the docker volume-init race when multiple services share a named volume. One retry catches the class; anything persistent is a real error.
    • container-log iteration: stopCluster iterates every container in the project prefix rather than guessing zero0..zero3, alpha0..alpha6. Output is a fenced block per container for easier CI-log triage.
    • defer-err bugfix: defer stopCluster(..., err) was capturing err at defer-statement time (always nil), silently suppressing container-log dumps on test failure. Wrap in a closure to pick up the final err value.
    • testTags(baseTag) helper replaces inline "-tags=integration" strings and composes with buildvars.GoRunTags.
  5. refactor(compose,dgraphtest): add extension hooks; rename local copy → copyFile — one hook in compose/hooks.go (EmitUser, wired into service generation), six hooks in dgraphtest/hooks.go (ApplyContainerUser, WidenTempDirPerms, WidenSecretFilePerms, GeneratePlugins, SetupLinuxBinaries, HostBinaryName, wired into local_cluster.go/image.go/load.go). All hooks default to no-op / upstream-pristine behavior. Also renames the local func copy(src, dst string) error in dgraphtest/image.go to copyFile (the name shadowed the Go builtin copy).

  6. test: widen HMAC/JWT/MinIO test secrets >= 14 bytes; add FIPS-aware skip helpers — widens every HMAC/JWT/MinIO test secret to the 14-byte minimum NIST SP 800-131A requires and some FIPS-validated crypto providers enforce at EVP_MAC_init:

  • "secret""secret-long-enough" (schema VerificationKey, JWT signer)
  • "secret1""secret1-long-enough" (multi-tenancy)
  • "secretkey""secretkey-long-enough" (schema VerificationKey, MinIO secret key)
  • 9 hardcoded HS256 JWT fixtures in graphql/resolve/auth_test.go and graphql/e2e/auth/auth_test.go re-signed with the new key (verified against old key before replacing signature; no stray tokens modified).
  • Upgrade-path tests (TestCheckUpgrade, TestQueryDuplicateNodes, TestRestoreFromOldBackup) and EdDSA (TestACLJwtAlgo) gain a local skipIfFIPSBinary(t) helper that calls t.Skip when x.FIPSEnabled || x.FIPSBinary() is true. No-op on stock upstream builds.
  1. style(admin): gofmt admin_auth_test.gogofmt -w. Collapsed alignment in a map literal + trailing newline. No functional change.

Why

All items from the prior PR body still apply. The expanded scope comes from discoveries while maintaining a downstream fork that:

  • runs containers as nonroot (needs UID/GID env injection and filesystem-perm widening)
  • runs a compose-file generator (needs compose-path rewriting + --project-directory projection)
  • ships a custom Go toolchain (needs plugin-build override)
  • enforces FIPS 140-3 (needs short-HMAC-key avoidance and FIPS-incompatible test skips)

Every single one of these concerns was reachable by patching upstream files with //go:build guards or inline if os.Getenv(...) checks. Each such patch is a maintenance burden: the fork tracks upstream changes to the patched file and must re-apply edits on every rebase. Extracting extension hooks eliminates that burden — the fork owns a single registration file per package and never touches upstream source again. Upstream sees only: neutral public vars with upstream-pristine defaults, and resilience improvements (stderr capture, volume-race retry, fenced log output) that help every user.

API

// buildvars
Bin = buildvars.NewVar("BIN", "dgraph")
path := filepath.Join(gopath, "bin", buildvars.Bin.Get())
buildvars.Bin.SetDefault("custom-binary-name") // or: os.Setenv("BIN", ...)

// x
if x.FIPSEnabled || x.FIPSBinary() {
    t.Skip("test requires features not available under FIPS")
}

// extension hooks (all four packages follow the same pattern)
var ComposeFileArgs = defaultComposeFileArgs
func defaultComposeFileArgs(path, baseDir string) []string {
    return []string{"-f", path}
}
// A fork replaces the var from its own package init() to override behavior.

Tests

  • buildvars/buildvars_test.go: 18 unit tests (Get/Default/SetDefault/Export/ExportAll, env-override precedence, derived Vars, concurrent access).
  • x/fips_base_test.go: TestFIPSEnabledDefault (invariant) + TestFIPSBinaryEnv (env-var parsing across 6 states).
  • graphql/resolve/auth_test.go: TestStringCustomClaim, TestAudienceClaim, TestJWTExpiry all pass with re-signed JWT fixtures.
  • Full module build (go build ./..., excluding testutil/custom_plugins/* plugin-mode sources and graphql/bench test-only package) clean.
  • go vet clean on all touched packages (pre-existing warnings in conn/pool.go, schema/schema.go, graphql/admin/restore.go untouched).
  • gofmt -l clean on all touched files (the one remaining gofmt hit is generated protos/pb/pb.pb.go).

Backwards compatibility

  • All hook defaults are the verbatim previous implementations. Upstream behavior byte-for-byte identical.
  • Widened HMAC keys are strictly-longer; any HMAC implementation accepts longer keys. Re-signed JWT fixtures verify against the new keys.
  • FIPS skip helpers are no-ops unless DGRAPH_FIPS_BINARY=1 is set or the binary is built with a FIPS-enforcing tag.
  • copycopyFile rename in dgraphtest/image.go is a local function; no exported API touched.
  • buildvars.ExtraBuildTagsbuildvars.PrivateBuildTags Go ident rename does not change the env var name (PRIVATE_BUILD_TAGS unchanged).

Notes for review

  • Copyright headers use © 2017-2026 Istari Digital, Inc. SPDX-License-Identifier: Apache-2.0 on new files. Matches the repository convention for externally-contributed Apache-2.0 additions.
  • The hook-pattern godoc in each hooks.go file names the upstream-pristine default and a concrete downstream use case, so a reader unfamiliar with the pattern can follow it from the declaration alone.
  • x/fips_base.go is intentionally minimal: two symbols, no FIPS-specific logic. The actual FIPS cipher/curve/JWT-algorithm helpers stay in the fork (they are specific to Chainguard go-fips and MS-Go requirefips toolchains, not generally applicable upstream).

mlwelles added 17 commits April 22, 2026 14:57
Add a top-level buildvars/ package that centralizes the environment
variables controlling build-time configuration (BIN, DOCKER_IMAGE,
BUILD_IMAGE, BUILD_TAG, RUNTIME_IMAGE, RUNTIME_TAG, BUILD_TAGS, etc.).

Previously these values were scattered: hardcoded string literals in
source files (e.g. '/gobin/dgraph', 'dgraph/dgraph:local'), ad-hoc
os.Getenv calls, or re-computed in multiple Makefiles and Go files.
Consolidating them in a single registry has three benefits:

  1. One place to discover what every build-time var does — each Var
     carries godoc explaining its purpose, default, and downstream
     consumers.
  2. Overrides are centralized. A Var's default can be overridden by
     environment variable (os.Getenv wins over registered default) or
     programmatically via SetDefault. Call sites just read via Get().
  3. Downstream consumers (vendoring or private forks) can swap defaults
     without patching every call site.

API surface:

  - Var struct with Get() / Default() / SetDefault(s) / Export() methods
  - Package-level All slice listing every registered Var in declaration
    order
  - ExportAll() to push all Vars into os.Environ for subprocesses

Includes a small CLI at buildvars/cmd/buildvarprint that emits the
registry in shell / make / plain formats, useful for
  eval "$(go run ./buildvars/cmd/buildvarprint)"
or
  go run ./buildvars/cmd/buildvarprint -format=plain >> "$GITHUB_ENV"
in CI.

No call sites converted yet — subsequent commits migrate the existing
hardcoded literals and os.Getenv users.
…istry lookups

Migrate the call sites that previously hardcoded '/gobin/dgraph',
'dgraph/dgraph:local', or '$GOPATH/bin/dgraph' to read from the
buildvars registry introduced in the preceding commit. Benefits:

  - One place to change the binary name / image repo (override via
    os.Getenv(BIN) or buildvars.Bin.SetDefault)
  - Call sites become self-documenting — buildvars.Bin.Get() reads more
    clearly than a raw string literal at an arbitrary line number
  - Downstream consumers (vendoring or forking) can rename the binary
    without patching every caller

Go call sites migrated:
  - contrib/jepsen/main.go: --local-binary default uses
    buildvars.GoBinDgraphPath
  - dgraphtest/dgraph.go: zero/alpha container cmd args use
    buildvars.GoBinDgraphPath
  - dgraphtest/image.go: dgraphImage() and copyBinary() use
    buildvars.DockerImage and buildvars.Bin
  - testutil/exec.go: DgraphBinaryPath() uses buildvars.Bin

Shell scripts migrated:
  - systest/1million/test-reindex.sh
  - systest/21million/test-21million.sh
  - systest/loader-benchmark/loader-benchmark.sh
All three pass the container binary path as /gobin/${BIN:-dgraph},
matching the default buildvars.Bin produces; the :-dgraph fallback
keeps the scripts working for upstream callers who have no BIN set.

Incidental fix in dgraphtest/dgraph.go: getPortMappingsOnMac previously
panicked on a port entry like '8080/tcp' (no colon because the port was
not published to the host). Guard the split and continue past unpublished
ports rather than crash. Caught during buildvars migration testing.
…o build -o

Three related Makefile improvements that surface when trying to install
a dgraph build with a non-default binary name or on a non-default
Linux distro:

  1. dgraph/Makefile: 'BIN = dgraph' → 'BIN ?= dgraph'
     Allows callers (CI, downstream forks) to override the binary name
     without editing the Makefile. The underlying 'go install' already
     respects GOBIN + binary renames; only the Make variable needed
     the '?=' to let env overrides win.

  2. dgraph/Makefile: replace 'go install' with 'go build -o $(INSTALL_TARGET)'
     'go install' honors the package's default name (always 'dgraph'),
     which defeats a renamed $(BIN). 'go build -o' respects whatever
     path INSTALL_TARGET resolves to, so renames produce the correct
     output file.

  3. dgraph/Makefile: broaden HAS_JEMALLOC detection to /usr/lib and
     the _pic.a variant. /usr/local/lib covers 'make install' from
     source; /usr/lib covers apt-get libjemalloc-dev on Debian/Ubuntu;
     /usr/lib/libjemalloc_pic.a covers 'apk add jemalloc-dev' on Alpine
     and Wolfi. All three paths are valid standard locations — missing
     one silently disables jemalloc, which is surprising behavior.

  4. t/Makefile: thread $BIN through the 'make check' target. The
     previous hardcoded 'dgraph' path would report 'dgraph binary not
     found' even when the renamed binary exists. '$${BIN:-dgraph}'
     keeps upstream behavior identical while allowing overrides.
TestNameSpaceAwareRestoreOnSingleNode and TestNamespaceAwareRestoreOnMultipleGroups
rely on Docker port mappings that are unreliable on macOS during the
restore phase (ports can flap between the snapshot+load and the
post-restore connect). Linux CI runs see the test green consistently;
macOS dev runs intermittently fail with connection-refused or
nil-pointer crashes in getPortMappingsOnMac (fixed separately in the
buildvars commit of this series).

Gate both tests with runtime.GOOS != "linux" so developers running
the full systest suite locally on macOS don't see spurious failures.
CI behavior is unchanged.
Strips mentions of a specific private fork (by name) from the public
API surface of buildvars:

  - Rename env var ISTARI_COMPOSE_BUILD_DIR -> DGRAPH_COMPOSE_BUILD_DIR
    to match the dgraph/ binary naming convention instead of naming
    a specific downstream.
  - Rewrite godoc that mentioned a fork by name ('istari', 'Istari',
    'dgraph-sec', 'istari/gopkg/env/...') to generic language:
    'a private fork', 'a fork that needs ...', 'an init()-time
    SetDefault call in the package their registration tag guards'.
  - Test fixture uses 'myfork requirefips' instead of 'istari
    requirefips' as the extra-tags value so the test doesn't read
    like an advertisement for a particular downstream.

No behavior changes — buildvars.ComposeBuildDir.Get() still produces
an empty string by default; just the env var name changed.
…Go ident ExtraBuildTags → PrivateBuildTags

Two consistency-focused renames, one commit:

1. buildvars/cmd/buildvarprint → buildvars/cmd/buildvars
   The CLI was named after its original one-shot purpose (print build-env
   vars). As the registry grew into a general config surface, the "print"
   suffix became a misnomer. The new path aligns the CLI name with the
   top-level package name.

2. Var identifier ExtraBuildTags → PrivateBuildTags
   The env var backing this Var has been PRIVATE_BUILD_TAGS since before
   the registry existed (it is the long-standing Makefile knob a fork
   appends its build tags through). The Go ident "ExtraBuildTags" did
   not match, forcing every reader to chase godoc to reconcile it.
   Rename the Go ident to PrivateBuildTags so it pairs consistently
   with the env-var name, the way every other Var in the registry
   pairs its ident and env name (Bin/BIN, DockerImage/DOCKER_IMAGE,
   etc.).

   The env var name does not change — no Makefile, CI, or script needs
   to change. Godoc updated to note that "private" here means fork-local
   build choices, not access control.

Also renamed the test function TestBuildTags_ComposesWithExtraBuildTags
→ TestBuildTags_ComposesWithPrivateBuildTags to match.

Tested: go build ./buildvars/... , go test ./buildvars/ both pass.
Introduces two minimal, coordinated FIPS-awareness signals in package x:

  - FIPSEnabled (var, default false): compile-time signal that this
    binary is FIPS-enforcing. Declared as a var (not a const) so a
    sibling file guarded by a FIPS-enforcing build tag (e.g. a Chainguard
    go-fips build under requirefips, or a microsoft/go build) can flip
    the value to true via package init(), before any test or main body
    runs. For practical call-site purposes — `if x.FIPSEnabled { ... }`
    — the var behaves identically to a compile-time constant.

  - FIPSBinary() (function, returns os.Getenv("DGRAPH_FIPS_BINARY") == "1"):
    runtime signal that the dgraph binary under test (not necessarily
    the test binary itself) is FIPS-restricted. Covers the case where
    an integration test binary is built without the FIPS tag but
    spawns a FIPS-enforcing dgraph subprocess — e.g. upgrade-path
    tests that git-checkout a historical SHA and build the "old"
    binary with a FIPS toolchain.

Neither symbol has any behavior on a stock upstream build: FIPSEnabled
stays false, FIPSBinary() returns false unless the operator explicitly
sets DGRAPH_FIPS_BINARY=1. The utility is for testing against FIPS-
enforcing binaries — test helpers can now branch cleanly on either
signal (the test binary IS FIPS, or the binary it WILL launch is FIPS)
without touching fork-specific files.

Tested: go build ./x/... , go test -run TestFIPS ./x/ both pass. Added
TestFIPSEnabledDefault and TestFIPSBinaryEnv covering the env-var
parsing (only exact "1" is true).
…hooks

Introduces three public function-var hooks in a new testutil/hooks.go:

  var ComposeArgs    = defaultComposeArgs     // returns "-f <path>"
  var EnvForCompose  = defaultEnvForCompose   // returns nil
  var BuildPlugins   = defaultBuildPlugins    // in-process `go build -buildmode=plugin`

Each carries godoc naming the upstream-pristine default and a concrete
downstream use case. The defaults preserve the existing behavior
bit-for-bit: testutil.StartAlphas / StopAlphasForCoverage build the
same docker-compose argv they always did; testutil.GeneratePlugins
delegates to defaultBuildPlugins which is the verbatim body of the
previous GeneratePlugins function.

Call sites refactored:
  - testutil/bulk.go: StartAlphas and StopAlphasForCoverage use
    ComposeArgs(compose) to build file-selector args and EnvForCompose()
    to set cmd.Env. Upstream behavior unchanged (ComposeArgs returns
    "-f path", EnvForCompose returns nil -> os.Environ() unchanged).
  - testutil/plugin.go: GeneratePlugins is now a thin wrapper that
    calls the BuildPlugins hook. defaultBuildPlugins holds the
    previous implementation and runs stock `go build -buildmode=plugin`.

The hook pattern lets a downstream fork replace any of these three
defaults from its own package init without modifying testutil call
sites. Stock upstream builds see no behavioral change.

Tested: go build ./testutil/ , go test -count=1 ./testutil/ both pass.
…ements

Adds public hook surface to the t test runner and threads several
resilience and observability improvements through startCluster,
stopCluster, runTestsFor, and the cluster-lifecycle helpers.

New hook file t/hooks.go declares two public function-var hooks:

  var EnvForCompose    = defaultEnvForCompose     // returns nil
  var ComposeFileArgs  = defaultComposeFileArgs   // returns ["-f", path]

Both default to the previous inline behavior. A downstream fork that
ships a compose-file generator can override ComposeFileArgs to rewrite
the path + emit --project-directory so relative bind-mount sources
resolve against the pristine test package dir rather than the overlay
output dir. EnvForCompose lets a fork inject e.g. UID/GID so
`${UID:-65532}` placeholders in generated compose files resolve to the
host UID and bind-mounts stay writable.

Resilience improvements (no hook needed; universally beneficial):

  - startCluster captures docker-compose stderr into a strings.Builder
    and surfaces it on failure. The previous code set cmd.Stderr = nil,
    which suppressed actionable error text (volume-init races, port
    conflicts, image pull failures) and made CI-only bring-up failures
    unreadable. Keep Stderr captured; only print on failure.

  - startCluster retries `docker compose up` once after a full `down -v`
    when the first attempt fails. Covers a docker volume-init race when
    multiple services share a named volume: on first use docker populates
    the volume and concurrent services can collide with "failed to mkdir
    .../_data/<entry>: file exists". One retry is enough for this class;
    anything persistent is a real configuration error.

  - stopCluster log-dump iterates every container in the project prefix
    (AllContainers) rather than guessing names. Previous code hard-coded
    "zero0..zero3" and "alpha0..alpha6" which often misses containers
    (e.g. minio, nfs-backup) and tries to fetch logs for names that do
    not exist. Output is now a fenced block per container for easier
    triage from CI log viewers.

  - `defer stopCluster(compose, prefix, wg, err)` was wrapped in a
    closure so the err value passed to stopCluster is the value AT
    DEFER RUN TIME (after runTestsFor returns), not the defer-statement-
    time value which is always nil. This makes stopCluster actually
    dump container logs on test failure; the previous form silently
    suppressed them.

Helper changes:

  - testTags(baseTag) returns a comma-joined tag list composed of the
    optional base tag + every tag in buildvars.GoRunTags. Replaces
    inline "-tags=integration" strings at 2 call sites. Stock upstream
    builds see identical behavior (GoRunTags is empty); the helper
    lets a fork thread fork-only tags into `go test` without forking
    t.go.

  - ComposeFileArgs call sites: startCluster (up + down retry),
    resumeDefault (stop + start). 4 total.

Tested: go build ./t/ passes (OSS mode).
…→ copyFile

Adds public function-var hooks to two more test-infra packages so
downstream forks can customize behavior without modifying call sites.

compose/hooks.go (new, 1 hook):
  var EmitUser = defaultEmitUser
Default returns empty string (no user: field emitted in generated
compose services, matching upstream behavior). A fork may return
e.g. "${UID:-65532}" to pin containers to the host UID for bind-mount
compatibility under nonroot runtime images. Wired in compose.go Service
generation.

dgraphtest/hooks.go (new, 6 hooks):
  var ApplyContainerUser   // no-op default; fork pins cfg.User to host UID
  var WidenTempDirPerms    // no-op default; fork widens 0700 -> 0755
  var WidenSecretFilePerms // no-op default; fork widens 0600 -> 0644
  var GeneratePlugins      // returns (_, handled=false, nil); fork builds plugins in its toolchain
  var SetupLinuxBinaries   // returns (handled=false, nil); fork runs its two-binary staging
  var HostBinaryName       // returns ""; fork returns its custom binary name

Call sites added in dgraphtest/local_cluster.go, dgraphtest/image.go,
dgraphtest/load.go — all preserve upstream behavior when the hooks
return their zero values. Rough call-site counts:
  - local_cluster.go: 5 hook calls (2 WidenTempDirPerms, ApplyContainerUser,
    2 WidenSecretFilePerms, 1 GeneratePlugins)
  - image.go: 1 hook call (SetupLinuxBinaries)
  - load.go: 1 hook call (HostBinaryName)

Collateral rename in dgraphtest/image.go:
  func copy(src, dst string) error  ->  func copyFile
The previous name shadowed the Go built-in `copy()`. The local function
takes (src, dst string) and has no relation to the built-in, but the
shadow was a subtle code-review hazard for anyone skimming the file.
Rename all 3 call sites to copyFile; drop the shadow.

Tested: go build ./compose/ , go build ./dgraphtest/ both pass.
…kip helpers

Widens every HMAC/JWT/MinIO test secret to meet the 14-byte (112-bit)
key-length minimum that NIST SP 800-131A requires and that some FIPS-
validated crypto providers enforce at EVP_MAC_init. Upstream builds see
no behavioral change — a longer HMAC key is always acceptable. The
wider keys let the test suite also run against a FIPS-enforcing alpha/
minio subprocess without hitting "invalid key length" crashes.

Key renames (neutral, describe length without leaking FIPS rationale):

  "secret"            -> "secret-long-enough"        (schema VerificationKey, JWT signer)
  "secret1"           -> "secret1-long-enough"       (namespace-1 multi-tenancy)
  "secretkey"         -> "secretkey-long-enough"     (schema VerificationKey, MinIO secret key)

Hardcoded HS256 JWT fixtures in graphql/resolve/auth_test.go (7 tokens
across TestStringCustomClaim, TestAudienceClaim, TestJWTExpiry) and
graphql/e2e/auth/auth_test.go (2 tokens in TestQueryWithStandardClaims)
were re-signed with the new "secretkey-long-enough" key. Re-signing
used a standalone Python HMAC-SHA256 signer that verifies each tokens
existing signature against the old key before replacing the signature,
so any token not signed with the expected key is left untouched.

FIPS-aware test skips:

  check_upgrade/check_upgrade_test.go and systest/integration2/acl_test.go
  add a local skipIfFIPSBinary(t) helper that calls t.Skip when either
  x.FIPSEnabled is true (this test binary is FIPS-tagged) or
  x.FIPSBinary() is true (operator set DGRAPH_FIPS_BINARY=1). Upgrade-
  path tests pin a specific pre-FIPS upstream SHA and build the old
  binary at runtime; that commit predates any FIPS-enforcing toolchain,
  so running these tests under a FIPS configuration fails pointlessly.

  acl/jwt_algo_test.go (TestACLJwtAlgo) skips the EdDSA case when FIPS
  is in effect. Ed25519 is outside the validated FIPS boundary for
  Gos crypto library so EdDSA cluster startup panics under FIPS.

All skip helpers use x.FIPSBinary() (the env-gated runtime signal
introduced in the previous commit) rather than reading os.Getenv
inline, so the env-var contract is centralized.

Tested: go build ./..., go test -run "TestStringCustomClaim|TestAudienceClaim|TestJWTExpiry" ./graphql/resolve/ (passes; re-signed JWTs verify), go build -tags=integration2 ./graphql/... ./acl/ ./check_upgrade/ ./systest/integration2/ all pass.
Previously the map literal had extra spacing alignment (backup "     ",
config "     ", draining "   ", restore "    ") that gofmt no longer
emits \u2014 the current gofmt collapses to single-space after the colon
when any key exceeds a threshold. Also the file was missing a trailing
newline.

Apply `gofmt -w graphql/admin/admin_auth_test.go`. No functional change.
Code review fixes:

dgraphtest/load.go + dgraphtest/image.go:
  - HostDgraphBinaryPath used hardcoded "dgraph" on Linux instead of
    buildvars.Bin.Get(). copyBinary (image.go) already writes the
    binary to <tempBinDir>/buildvars.Bin.Get() so a user who sets
    BIN without also setting HostBinaryName would get a stale path.
    Fix: consult buildvars.Bin.Get() in the Linux branch.
  - "dgraph_host" was hardcoded in two places (image.go line 85,
    load.go line 43) and could typo-drift. Extract to a package-level
    constant hostBinaryFileName in load.go; reference from both.
  - image.go line 83 "hostSrc" was $GOPATH/bin/dgraph; switch to
    $GOPATH/bin/buildvars.Bin.Get() for consistency with the binary
    name the rest of the package uses.
  - Godoc on HostDgraphBinaryPath updated to reference buildvars.Bin
    and the HostBinaryName hook so the contract is self-documenting.

t/hooks.go:
  - defaultComposeFileArgs ignored the baseDir parameter but had a
    named receiver, which reads as if the default would use it. Rename
    to _ and add a brief godoc explaining why the parameter exists
    even though the default ignores it (so overrides that need it
    need not thread it through some other channel).

buildvars/buildvars.go + buildvars/buildvars_test.go + buildvars/cmd/buildvars/main.go:
  - Copyright year bumped from 2017-2025 to 2017-2026, matching the
    other new files in this PR (hooks.go files, x/fips_base.go). All
    files in this PR now have consistent 2026 copyright.
  - Example binary name "dgraph-sec" in godoc and test fixtures
    replaced with "myfork-dgraph" — consistent with the "myfork
    requirefips" placeholder already used in the ExtraBuildTags
    godoc, so no downstream fork is named by example.

Tested: go build ./... , go test ./buildvars/ ./x/ ./graphql/resolve/ -run "TestStringCustomClaim|TestAudienceClaim|TestJWTExpiry" all pass. gofmt -l clean on touched files.
HTTP /health returns OK before alpha has fully re-established its
internal connection pool to zero after a docker-compose start. When
tests run immediately after resumeDefault, the first query (DropAll,
Alter, Login) occasionally hits "No connection exists" because
groups().Leader(0) is still nil even though /health is 200.

Add a 5-second sleep after the health-check wait-group in
resumeDefault so membership sync can complete before the next test
runs. Cheap compared to a whole cluster recreate on failure. Does
not affect startCluster (the initial bring-up already includes its
own health checks + retries).

Surfaced by a test harness running many pause/resume cycles on the
default cluster: the first test after a resume failed in 0.08s on
DropAll/Alter/Login with "No connection exists", while subsequent
runs of the same test pass.
Observed in CI that the previous retry-once logic was insufficient for
the docker volume-init race when multiple services share a named
volume. The retry happened only milliseconds after `down -v`, which
was not long enough for docker to release the volume mount points
from the first failed attempt. The second `up` tried to populate a
fresh volume and hit the same race.

Two fixes:

1. Increase upAttempts from 2 to 3 (one extra retry).
2. Sleep 2 seconds between `down -v` and the next `up` so docker has
   time to release mount points.

The cost is 2 seconds in the rare case where bring-up succeeds on the
retry, versus a full cluster-recreate path (10+ seconds) on failure.
Cheap.
Resolves conflict in dgraphtest/dgraph.go by combining:
  - buildvars.GoBinDgraphPath.Get() from our buildvars-registry PR
  - myAddrOverride handling from upstream #9680

Both changes apply cleanly to the same line of zero.cmd(); the merge
preserves both. Alpha command line is unchanged (it was not touched by
9680).

Upstream changes brought in:
  - conn/node.go, conn/pool.go, worker/groups.go: leader-driven
    reconciliation of zero --my address
  - dgraph/cmd/zero/raft.go, zero.go: ConfChangeUpdateNode proposal loop
  - dgraph/cmd/debug/wal.go: WAL replay of updated address
  - dgraphtest/local_cluster.go, zero_state.go: test harness for above
  - systest/integration2/zero_addr_test.go: new test

No behavior change to the buildvars-registry PR scope.
Matches commit 220a9ce5d on the fork. The Go code at:
  systest/bulk_live/common/bulk_live_fixture.go:128
  systest/backup/encryption/backup_test.go:112
  systest/backup/minio/backup_test.go:108
  systest/cloud/cloud_test.go:165,169
  testutil/graphql.go:284
  testutil/minio.go

widens MINIO_SECRET_KEY from "secretkey" to "secretkey-long-enough"
(part of the HMAC-widening commit 8d185f9). But the MinIO server
configured by the corresponding .env files still used "secretkey",
causing TestBulkCases, TestBackupMinioE, TestBackupMinio to fail with
"The request signature we calculated does not match the signature you
provided" when the Go client tries to sign with the widened key
against a MinIO server expecting the narrow one.

Fix: update MINIO_SECRET_KEY to "secretkey-long-enough" in all three
.env files:
  - dgraph/minio.env
  - systest/backup.env
  - systest/export/export.env

Server and client now agree. Observed failing in upstream PR CI
runs 24878931876 (dgraph-load-tests) and 24878931803 (dgraph-systest-tests).
@mlwelles mlwelles requested a review from a team as a code owner April 24, 2026 16:40
@github-actions github-actions Bot added area/testing Testing related issues area/graphql Issues related to GraphQL support on Dgraph. area/integrations Related to integrations with other projects. area/acl Related to Access Control Lists area/testing/jepsen area/core internal mechanisms go Pull requests that update Go code labels Apr 24, 2026
@mlwelles
Copy link
Copy Markdown
Contributor Author

Superseded by #9691 with clean logical commit history. Same final content; restructured for reviewability with 9 commits grouped by area of concern. Please move review to #9691.

@mlwelles mlwelles closed this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/acl Related to Access Control Lists area/core internal mechanisms area/graphql Issues related to GraphQL support on Dgraph. area/integrations Related to integrations with other projects. area/testing/jepsen area/testing Testing related issues go Pull requests that update Go code

Development

Successfully merging this pull request may close these issues.

1 participant