fix(test): macOS local image, plugin tests, and test suite restructuring #9610
Merged
When running `make test` (--suite=all), the load and ldbc download
blocks in t/t.go shared the same *tmp directory. The ldbc block's
MakeDirEmpty call wiped files downloaded by the load block, causing
systest/1million to fail with missing schema files.
Hoist directory initialization above both download blocks so
MakeDirEmpty runs exactly once. Both datasets coexist in the same
directory since their filenames don't overlap. Also use a dedicated
subdirectory (dgraph-test-data) instead of bare os.TempDir() to
avoid wiping the system temp directory.
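The hoisting described above can be sketched as follows. This is a minimal illustration rather than the code in t/t.go: `makeDirEmpty` is a hypothetical stand-in for the runner's MakeDirEmpty, and the two file names are invented to represent the load and ldbc downloads.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// makeDirEmpty deletes dir with its contents and recreates it empty.
// It is a hypothetical stand-in for the runner's MakeDirEmpty.
func makeDirEmpty(dir string) error {
	if err := os.RemoveAll(dir); err != nil {
		return err
	}
	return os.MkdirAll(dir, 0o755)
}

// simulateDownloads clears a dedicated subdirectory of the system temp
// directory once, lets two "download" steps write into it, and returns
// how many files are present afterwards. File names are made up.
func simulateDownloads() (int, error) {
	dir := filepath.Join(os.TempDir(), "dgraph-test-data")
	// Runs exactly once, before both download steps.
	if err := makeDirEmpty(dir); err != nil {
		return 0, err
	}
	for _, name := range []string{"load.schema", "ldbc.schema"} {
		// Neither step empties dir again, so earlier files are kept.
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte("x"), 0o644); err != nil {
			return 0, err
		}
	}
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

func main() {
	n, err := simulateDownloads()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("files present after both downloads:", n)
}
```

Because only the dedicated subdirectory is removed, other files in the system temp directory are left alone.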
Add testSuiteContainsAny() helper to replace repeated
testSuiteContains("x") || testSuiteContains("y") patterns.
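A helper of this shape might look like the sketch below. The signatures are assumptions: the real functions in t/t.go presumably read the parsed --suite flag, whereas this version takes the comma-separated value as a parameter so it runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// testSuiteContains reports whether name is one of the comma-separated
// suites in suiteFlag (a stand-in for the --suite flag value).
func testSuiteContains(suiteFlag, name string) bool {
	for _, s := range strings.Split(suiteFlag, ",") {
		if strings.TrimSpace(s) == name {
			return true
		}
	}
	return false
}

// testSuiteContainsAny replaces chains of
// testSuiteContains("x") || testSuiteContains("y").
func testSuiteContainsAny(suiteFlag string, names ...string) bool {
	for _, n := range names {
		if testSuiteContains(suiteFlag, n) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(testSuiteContainsAny("unit,ldbc", "load", "ldbc")) // true
	fmt.Println(testSuiteContainsAny("unit", "load", "ldbc"))      // false
}
```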
30 Docker Compose files hardcoded $GOPATH/bin as the binary mount
source. On macOS, this mounts the native macOS binary into Linux
containers, causing them to fail on startup.
Replace all 78 occurrences with ${LINUX_GOBIN:-$GOPATH/bin} to match
the pattern already used in dgraph/docker-compose.yml. On Linux,
LINUX_GOBIN defaults to $GOPATH/bin (no change). On macOS, it points
to the cross-compiled Linux binary directory.
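For readers unfamiliar with the `${VAR:-default}` form, the fallback rule can be modeled in Go as below. This only illustrates the substitution semantics (the default applies when the variable is unset or empty); the function name and the example paths are hypothetical.

```go
package main

import "fmt"

// linuxGobin mirrors ${LINUX_GOBIN:-$GOPATH/bin}: the ":-" form falls back
// to the default when the variable is unset or empty.
func linuxGobin(getenv func(string) string) string {
	if v := getenv("LINUX_GOBIN"); v != "" {
		return v
	}
	return getenv("GOPATH") + "/bin"
}

func main() {
	// Example environments; the paths are invented for illustration.
	linuxHost := map[string]string{"GOPATH": "/home/dev/go"}
	macHost := map[string]string{
		"GOPATH":      "/Users/dev/go",
		"LINUX_GOBIN": "/Users/dev/go/linux_bin",
	}
	fmt.Println(linuxGobin(func(k string) string { return linuxHost[k] }))
	fmt.Println(linuxGobin(func(k string) string { return macHost[k] }))
}
```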
Add a configurable per-package test timeout flag to the t/ runner. Previously the timeout was hardcoded to 30m (or 180m with --race), which caused the 21million/live test to time out on slower machines.
Usage:
  make test TIMEOUT=90m
  cd t && ./t --suite=all --timeout=60m
Defaults remain unchanged: 30m normal, 180m with --race. An explicit --timeout overrides both.
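The precedence rule (explicit value first, then the race default, then the normal default) could be expressed roughly like this. `resolveTimeout` is a hypothetical name and the string parameter is an assumption; the runner's actual flag handling is not shown in this PR text.

```go
package main

import (
	"fmt"
	"time"
)

// resolveTimeout picks the per-package timeout: an explicit flag value wins,
// otherwise 180m with --race and 30m without.
func resolveTimeout(flagValue string, race bool) (time.Duration, error) {
	if flagValue != "" {
		return time.ParseDuration(flagValue)
	}
	if race {
		return 180 * time.Minute, nil
	}
	return 30 * time.Minute, nil
}

func main() {
	cases := []struct {
		flag string
		race bool
	}{{"", false}, {"", true}, {"60m", true}}
	for _, c := range cases {
		d, err := resolveTimeout(c.flag, c.race)
		fmt.Printf("flag=%q race=%v -> %v (err=%v)\n", c.flag, c.race, d, err)
	}
}
```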
Remove empty if-branch flagged by staticcheck SA9003 in t/t.go and fix markdown table alignment in TESTING.md for prettier compliance.
The default `make test` (no args) previously ran --suite=all (~60+ min). Now it runs unit,systest,core suites plus integration2 tests (~30 min) for a faster local feedback loop.
Changes:
- Split the else branch in `test` target: SUITE set → explicit suite; nothing set → default (unit,systest,core + integration2)
- Add $(origin) guards on all test-* targets to prevent confusing variable conflicts (e.g. `make test-unit SUITE=ldbc` now errors)
- Add `test-suites` target (runs all t/ runner suites via SUITE=all)
- Add `test-everything` target (all suites + integration + integration2 + upgrade + fuzz)
- Update TESTING.md, CONTRIBUTING.md, AGENTS.md to reflect new defaults
- Update `make help` output with new default description
Resolve conflicts in Makefile and CONTRIBUTING.md, keeping the PR's dual-command default (unit,systest,core + integration2) and the SUITE=all example line.
Shorter, clearer name for the target that runs every test in the repo (all suites + integration + integration2 + upgrade + fuzz).
test-suite now accepts an optional SUITE= argument (defaults to all), making it a flexible entry point for running any t/ runner suite.
AGENTS.md is a local Claude Code config file that should not be tracked in version control.
…s-compilation
- Remove broken test-integration target: TAGS=integration bypassed the t/ runner, skipping Docker Compose orchestration and plugin compilation that integration tests require. Use SUITE= to route through the t/ runner instead.
- Fix gotestsum PATH resolution in t/t.go: add gotestsumBin() that resolves to $GOPATH/bin/gotestsum instead of relying on PATH lookup, which fails on machines where $GOPATH/bin is not in PATH.
- Add cross-compilation support for Go plugins on macOS: detect non-Linux hosts and set CGO_ENABLED=1, CC to the appropriate cross-compiler, and use the BFD linker (both testutil/plugin.go and dgraphtest/local_cluster.go).
- Add check-cross-compiler.sh dependency check script.
- Add top-level make deps and make setup targets for dependency management.
- Update TESTING.md and CONTRIBUTING.md to document new targets and recommend make setup for first-time onboarding.
- Remove test-integration from test-full (redundant: SUITE=all covers it).
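As an illustration of the gotestsum fix, a resolver along these lines avoids depending on PATH. It is a sketch, not the PR's gotestsumBin(): the name `gotestsumPath`, the handling of multi-entry GOPATH values, and the fallback to a bare command name are all assumptions.

```go
package main

import (
	"fmt"
	"go/build"
	"path/filepath"
)

// gotestsumPath returns the expected location of gotestsum under the first
// GOPATH entry, or the bare name (left to PATH lookup) when gopath is empty.
func gotestsumPath(gopath string) string {
	entries := filepath.SplitList(gopath)
	if len(entries) == 0 {
		return "gotestsum"
	}
	return filepath.Join(entries[0], "bin", "gotestsum")
}

func main() {
	// build.Default.GOPATH honors $GOPATH and otherwise uses Go's default.
	fmt.Println(gotestsumPath(build.Default.GOPATH))
}
```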
The local-image and install targets cross-compiled the dgraph binary without CGO, producing a statically linked binary that cannot dlopen() Go plugin .so files. This caused all plugin tests to fail on macOS with "Invalid tokenizer anagram".
Changes:
- Add LINUX_CC variable for architecture-aware cross-compiler selection
- Enable CGO_ENABLED=1 with cross-compiler in install and local-image
- Use BFD linker (-fuse-ld=bfd) since gold is not in cross-toolchains
- Skip jemalloc (BUILD_TAGS=) for cross-compilation (headers unavailable)
- Add EXTLDFLAGS support to dgraph/Makefile for external linker flags
- Split t/Makefile check into deps (tools) + check (tools + binary)
- Top-level setup now calls t/deps so it no longer requires the binary
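The cross-compilation settings listed above might be assembled as in the following sketch. The compiler names in `linuxCC` are assumptions (common GNU target triplets), not values confirmed by the Makefile, and `crossBuildEnv` is a hypothetical helper; only CGO_ENABLED=1, a cross-compiler in CC, and -fuse-ld=bfd come from the commit message.

```go
package main

import (
	"fmt"
	"runtime"
)

// linuxCC maps a target architecture to a cross-compiler name. The names are
// assumptions (common GNU triplets), not values taken from the Makefile.
var linuxCC = map[string]string{
	"amd64": "x86_64-linux-gnu-gcc",
	"arm64": "aarch64-linux-gnu-gcc",
}

// crossBuildEnv returns extra environment entries and linker flags for
// building a Linux cgo binary or plugin. A Linux host needs nothing extra.
func crossBuildEnv(hostOS, targetArch string) ([]string, string, error) {
	if hostOS == "linux" {
		return nil, "", nil
	}
	cc, ok := linuxCC[targetArch]
	if !ok {
		return nil, "", fmt.Errorf("no cross-compiler known for %q", targetArch)
	}
	env := []string{
		"GOOS=linux",
		"GOARCH=" + targetArch,
		"CGO_ENABLED=1", // required so the binary can dlopen() plugin .so files
		"CC=" + cc,
	}
	// Passed via -ldflags; asks the external linker to use BFD.
	return env, "-extldflags=-fuse-ld=bfd", nil
}

func main() {
	env, ldflags, err := crossBuildEnv(runtime.GOOS, runtime.GOARCH)
	fmt.Println(env, ldflags, err)
}
```

The returned entries would be appended to the environment of the `go build` subprocess, with the second value supplied through `-ldflags`.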
…ent restore waits
- Add deploy memory limits to all docker-compose services (zeros: 512M, alphas: 2-4GB, minio: 512M) to prevent OOM kills on macOS Docker Desktop
- Add --cache "size-mb=1024" to alpha commands for explicit cache sizing
- Implement pause/resume of the default cluster during custom-cluster tests so the full Docker memory budget is available for custom clusters
- Make WaitForRestore resilient to transient errors (connection reset, unavailable, transport errors) with a 10-minute deadline instead of infinite loop
- Simplify dgraph-installed Make target to always rebuild
- Ensure $GOPATH/bin is in PATH for subprocess tool discovery
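The WaitForRestore change can be pictured as a polling loop that tolerates transient errors and gives up at a deadline. The sketch below is not the actual testutil code: `waitForDone`, `isTransient`, the substring matching, and the simulated status function are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// isTransient treats connection resets, "unavailable", and transport errors
// as retryable. Matching on substrings is an assumption of this sketch.
func isTransient(err error) bool {
	msg := strings.ToLower(err.Error())
	for _, marker := range []string{"connection reset", "unavailable", "transport"} {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

// waitForDone polls check until it reports completion, a non-transient
// error occurs, or the time limit passes.
func waitForDone(check func() (bool, error), limit, interval time.Duration) error {
	deadline := time.Now().Add(limit)
	for {
		done, err := check()
		if err != nil && !isTransient(err) {
			return err
		}
		if err == nil && done {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("restore not finished after %s", limit)
		}
		time.Sleep(interval)
	}
}

// demoRestore simulates two transient failures, one "in progress" reply,
// and then completion. It returns the number of polls made.
func demoRestore() (int, error) {
	calls := 0
	check := func() (bool, error) {
		calls++
		switch calls {
		case 1:
			return false, errors.New("read tcp: connection reset by peer")
		case 2:
			return false, errors.New("rpc error: code = Unavailable")
		case 3:
			return false, nil
		}
		return true, nil
	}
	err := waitForDone(check, 10*time.Minute, time.Millisecond)
	return calls, err
}

func main() {
	calls, err := demoRestore()
	fmt.Println("polls:", calls, "err:", err)
}
```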
…compose
Add deploy memory limits (zeros: 512M, alphas: 2048M, minio: 512M) and --cache "size-mb=1024" to all alpha commands in the main dgraph test cluster docker-compose file, matching the changes in the online-restore compose file. Prevents OOM kills on memory-constrained Docker Desktop VMs.
Capture get_os() output in a variable before using it in conditionals, preventing the return value from being masked by [[ ]]. Reuse the variable in the later case statement.
Contributor
This just shifts OOM kills from the OS to docker, not sure if this really accomplishes much. Setting a memory limit on docker containers does not make applications use less memory.
…rue unit mode
Restructure test suites to separate lightweight from resource-intensive tests:
- Add `integration` suite as the new default (replaces old `unit,systest,core`), excluding ldbc, load, and systest-heavy packages
- Add `systest-baseline` and `systest-heavy` sub-suites; `systest` runs both
- Add `heavyPackages` list for resource-intensive tests (minio, encryption, tracing, online-restore) that can OOM on macOS Docker Desktop
- Make `unit` suite truly unit-only: no Docker cluster, no `--tags=integration`, skips custom-cluster packages entirely
- Add `make test-integration` and `make test-integration-heavy` targets
- Rename `make test-full` to `make test-all`; remove `test-suite`, `test-ldbc`, `test-load` targets (use `make test SUITE=...` instead)
- Update CONTRIBUTING.md, TESTING.md, and Makefile help text to match
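One way to picture the sub-suite split is the sketch below. It assumes heavy packages are identified by keyword match on the package path, which may differ from how the `heavyPackages` list in t/t.go is actually keyed; `expandSuites` and `selectSystest` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// heavyKeywords lists the resource-intensive areas named in the PR.
// Matching them as substrings of the package path is an assumption.
var heavyKeywords = []string{"minio", "encryption", "tracing", "online-restore"}

// isHeavy reports whether pkg belongs to the systest-heavy sub-suite.
func isHeavy(pkg string) bool {
	for _, k := range heavyKeywords {
		if strings.Contains(pkg, k) {
			return true
		}
	}
	return false
}

// expandSuites rewrites "systest" into its two sub-suites so that the old
// suite name keeps running everything it used to.
func expandSuites(requested string) []string {
	var out []string
	for _, s := range strings.Split(requested, ",") {
		if s == "systest" {
			out = append(out, "systest-baseline", "systest-heavy")
		} else {
			out = append(out, s)
		}
	}
	return out
}

// selectSystest returns the packages that belong to the given sub-suite.
func selectSystest(subSuite string, pkgs []string) []string {
	var out []string
	for _, p := range pkgs {
		if isHeavy(p) == (subSuite == "systest-heavy") {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pkgs := []string{"systest/1million", "systest/online-restore"}
	for _, s := range expandSuites("systest") {
		fmt.Println(s, selectSystest(s, pkgs))
	}
}
```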
matthewmcneely
approved these changes
Feb 25, 2026
The TestImportApis/SingleGroupShutOneAlpha integration2 test was flaking because initiateSnapshotStream would fail immediately when the Dgraph server returned "overloaded with pending proposals" during Raft membership changes. Add exponential backoff retry (1s→10s cap, 60s max) for transient "overloaded" errors while preserving fast failure for non-retryable errors like connectivity loss.
Remove deploy memory limits and --cache "size-mb=1024" flags from dgraph/docker-compose.yml and systest/online-restore/docker-compose.yml. These just shift OOM kills from the OS to Docker without actually solving the underlying resource problem, and they add noise to the compose files.
Contributor
Author
@xqqp: Good point. While they did seem to improve stability, it was a purely subjective seeming, and you're right -- they really just shift where the OOM kill comes from, and the amount of noise they add to the compose files is icky. Removed.
Summary
This PR fixes macOS compatibility for local image builds and plugin tests, restructures the test suite system for better ergonomics and resource management, and fixes a flaky integration2 test.
macOS / Local Build Fixes
- `gotestsum` PATH resolution, and cross-compilation issues
- `check-cross-compiler.sh`
Test Suite Restructuring
- `integration` suite (new default): runs everything except `ldbc`, `load`, and `systest-heavy`; replaces the old `unit,systest,core` default
- `unit` suite behavior: true unit tests only (no Docker cluster, no `--tags=integration`, skips custom-cluster packages)
- `systest-baseline` / `systest-heavy` sub-suites: separate lightweight systests from resource-intensive ones (minio, encryption, tracing, online-restore) that can OOM on Docker
- `systest` still runs both sub-suites for backward compatibility
- `test-integration`, `test-integration-heavy`
- `test-full` → `test-all`
- `test-suite`, `test-ldbc`, `test-load` (use `make test SUITE=...` instead)
Import Client Retry Fix
- `TestImportApis/SingleGroupShutOneAlpha` in the `dgraph-integration2-tests` CI job
- `initiateSnapshotStream` made a single attempt to start the snapshot stream, but the Dgraph server can return "overloaded with pending proposals" during Raft membership changes (e.g., when an alpha is shut down and the cluster is rebalancing)
Quick Reference
- `make test`: `integration` suite + `integration2` (~30 min)
- `make test-unit`
- `make test-integration`
- `make test-integration-heavy`
- `make test-all`
Test plan
- `make test-unit` runs without starting Docker cluster and discovers only non-integration tests
- `make test` (integration suite) correctly excludes heavy/ldbc/load packages
- `make test-integration-heavy` (systest-heavy) correctly runs only heavy packages (11 packages)
- `make test SUITE=systest` runs both systest-baseline and systest-heavy (30 packages)
- `make test SUITE=systest-baseline` runs only lightweight systests (18 packages)
- `TestImportApis/SingleGroupShutOneAlpha` passes reliably with retry logic (CI)