Make functional tests workspace-aware and stabilize local debug stack #11975
sylvainsf wants to merge 3 commits
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #11975 +/- ##
==========================================
- Coverage 51.83% 51.81% -0.02%
==========================================
Files 728 728
Lines 45960 45971 +11
==========================================
- Hits 23824 23822 -2
- Misses 19868 19876 +8
- Partials 2268 2273 +5
Adds a workflow for running the corerp/cloud Azure functional tests
against a local OS-process Radius stack (`make debug-start`) using the
host's `az login` credentials, with no service-principal/workload-identity
registration required.
Highlights
- New `build/scripts/azure-local-testenv.sh` orchestrator with
`setup`, `run`, `teardown`, `all` sub-commands. `run` and `all` accept
passthrough `go test` flags (e.g. `-run`, `-v`).
- Auto-recovery: `run` rebuilds state from the newest
`radlocal-${USER}-*` resource group when the state file is missing
(e.g. after `make debug-stop`), and re-applies the Azure scope on the
default rad environment that `debug-start` wipes.
- Orphan GC: `teardown --all-orphans` deletes every
`radlocal-${USER}-*` RG and stops the `tf-module-server` port-forward.
- `tf-module-server` bootstrap: deploys the in-cluster nginx test module
server and port-forwards it to `localhost:8999` automatically when not
already reachable.
- Terraform Azure provider falls back to `use_cli = true` when no Azure
credential is registered with UCP (404), letting the host RP's
`az login` session authenticate. CI workload-identity path is
unchanged.
- `start-radius.sh` exports `TERRAFORM_TEST_GLOBAL_DIR` so the RP no
longer tries to write to read-only `/terraform`.
- AWS-required tests skip cleanly via `t.Skip` when AWS env vars are
unset; private-git redis test skips when `GH_TOKEN` is unset.
- `recipe_terraform_test.go` now derives the resource ID from the
active workspace scope so it works against any RG (CI's `kind-radius`
and local debug's `default`).
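The clean-skip behavior described for the AWS and `GH_TOKEN` cases can be sketched as below. This is a minimal illustration, not the PR's code: `missingFrom` and `skipIfEnvMissing` are hypothetical helper names, and the `EXAMPLE_*` variable names are placeholders rather than the variables the tests actually check.

```go
package main

import (
	"fmt"
	"os"
	"testing"
)

// missingFrom returns the names in required for which lookup yields an empty
// value. Taking lookup as a parameter keeps the check testable without
// touching the real process environment.
func missingFrom(lookup func(string) string, required ...string) []string {
	var missing []string
	for _, name := range required {
		if lookup(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// skipIfEnvMissing skips the calling test (instead of failing it) when any
// required environment variable is unset or empty.
func skipIfEnvMissing(t testing.TB, required ...string) {
	t.Helper()
	if missing := missingFrom(os.Getenv, required...); len(missing) > 0 {
		t.Skipf("skipping: required environment variables not set: %v", missing)
	}
}

func main() {
	// Placeholder environment: only one of the two example variables is set.
	env := map[string]string{"EXAMPLE_REGION": "us-west-2"}
	lookup := func(name string) string { return env[name] }
	fmt.Println(missingFrom(lookup, "EXAMPLE_REGION", "EXAMPLE_ACCOUNT_ID"))
}
```

A test would call `skipIfEnvMissing(t, ...)` as its first statement, so a missing variable is reported as SKIP rather than FAIL.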
Tested
Full `corerp/cloud/...` suite green locally:
- PASS: `Test_AzureConnections`, `Test_ACI`, `Test_TerraformRecipe_AzureResourceGroup`
- SKIP: AWS-only tests, `Test_TerraformPrivateGitModule_KubernetesRedis`,
`Test_Storage`/`Test_PersistentVolume` (issue #7853, pre-existing)
Documentation in
`docs/contributing/contributing-code/contributing-code-debugging/radius-os-processes-debugging.md`.
Signed-off-by: Sylvain Niles <[email protected]>
- GetSecretSuffix derives the resource group from the active workspace instead of hardcoding kind-radius, so tests pass on local debug stack.
- Test_ApplicationGraph rewrites fixture resource group at runtime to match the active workspace.
- debug.mk: install Contour and tf-module-server with explicit rollout checks (Helm --wait does not work on k3d for LoadBalancer services).
- Misc test/CLI cleanups for running corerp-noncloud against an OS-process Radius stack.
This broke pkg/recipes/driver/bicep Test_Bicep_GetRecipeMetadata_*, which runs a fake HTTPS registry on 127.0.0.1. With the loopback heuristic the driver issued http:// requests to an HTTPS server and got '400 Bad Request' instead of the expected 'not found'.
Force-pushed from be17e1e to 8be9a40.
Radius functional test overview
Test status: ⌛ Building Radius and pushing container images for functional tests...
Closing in favor of #11904, which contains every commit from this branch plus the postgres TIMESTAMPTZ pagination fix, the noncloud-test-learnings doc, additional debug-stack stabilization, and the CI workflow unblock. The PR description over there has been updated to a synthesis covering all of it.
Pull request overview
This PR makes Radius functional tests and the local OS-process debug stack more portable by removing assumptions about a hard-coded kind-radius setup and improving the reliability of the make debug-* workflow. It also adds a local Azure test orchestrator and adjusts Terraform/Azure credential handling to support running against a locally started stack.
Changes:
- Make functional test helpers workspace-aware (derive scope/resource group from the active `rad` workspace; adjust fixtures/tests accordingly).
- Add/extend local-dev tooling for Azure cloud tests and OS-process debug runs (new orchestrator scripts, Make targets, debug-stack stabilization).
- Improve recipe engine/provider behavior and add guardrails/tests/docs discovered during end-to-end runs.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| test/validation/shared.go | Avoid list-based read-after-write races; add local-dev bypass for cloud credential checks in tests. |
| test/ucp/ucptest.go | Prefer active workspace connection (supports UCP override for local OS-process stacks). |
| test/rp/rptest.go | Delete RP resources in reverse order to avoid environment deletion racing child cleanup. |
| test/functional-portable/corerp/util.go | Derive RG from workspace scope; rewrite resource IDs to match workspace RG for secret-suffix calculation. |
| test/functional-portable/corerp/noncloud/resources/testdata/corerp-resources-simulatedenv.bicep | Stop mutating shared default environment by using a uniquely named env/namespace. |
| test/functional-portable/corerp/noncloud/resources/application_test.go | Rewrite fixture RG on read based on active RootScope before asserting ApplicationGraph. |
| test/functional-portable/corerp/cloud/resources/recipe_terraform_test.go | Use workspace scope for resource IDs; skip private-Git terraform-module test locally when GH_TOKEN is unset. |
| test/functional-portable/corerp/cloud/resources/extender_test.go | Skip AWS log-group test when required AWS env vars aren’t present (instead of failing). |
| test/functional-portable/cli/noncloud/cli_test.go | Add a watchdog to prevent rad run streaming-log test from hanging indefinitely. |
| test/createAzureTestResources.bicep | Parameterize Cosmos account name (keep default) to support unique naming in local orchestration. |
| pkg/recipes/terraform/config/providers/azure.go | Fall back to Azure CLI auth (use_cli = true) when Radius-managed credentials are missing. |
| pkg/recipes/engine/engine.go | Treat “environment not found” (404) during recipe delete config-load as a successful no-op. |
| pkg/corerp/frontend/controller/applications/updatefilter.go | Improve bad-request error messaging for invalid app-scoped namespaces (include lengths). |
| pkg/corerp/frontend/controller/applications/testbicep_scan_test.go | Add scan test to prevent test Bicep from mutating shared default environment. |
| pkg/azure/clientv2/unfold.go | Preserve response body for repeated unfolding by resetting resp.Body after reading. |
| docs/contributing/contributing-code/contributing-code-debugging/radius-os-processes-debugging.md | Document local DE usage and local Azure functional test flow against OS-process stack. |
| build/test.mk | Auto-detect local debug registry/CLI and git-http backend; add local Azure functional test make targets; improve gotestsum jsonfile support. |
| build/scripts/start-radius.sh | Export a writable TERRAFORM_TEST_GLOBAL_DIR for OS-process runs on host filesystems. |
| build/scripts/mirror-test-images.sh | New helper to mirror multi-arch images to ghcr.io/radius-project/mirror/*. |
| build/scripts/ensure-encryption-key.sh | New helper to create a stable encryption-key secret for the OS-process debug stack. |
| build/scripts/azure-local-testenv.sh | New Azure local test orchestrator (setup/run/teardown/all) with state recovery and orphan GC. |
| build/recipes.mk | Support plain-http publishing for localhost recipe registry via PLAIN_HTTP. |
| build/debug.mk | Add debug registry/git backend/flux/contour/tf-module server/bicep types automation; improve debug-start and DE handling. |
| .gitignore | Ignore local debug artifacts (including local-only bicepconfig.json override). |
| .github/scripts/publish-recipes.sh | Add optional --plain-http when publishing recipes (for localhost registry). |
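The `pkg/azure/clientv2/unfold.go` row describes resetting `resp.Body` after reading so the response can be unfolded more than once. A common Go idiom for that is sketched below, assuming nothing about the actual Radius implementation; `readAndRestoreBody` and `newResponse` are hypothetical names used only for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// readAndRestoreBody reads resp.Body to the end, closes it, and then replaces
// it with a fresh reader over the same bytes so a later caller can read the
// body again.
func readAndRestoreBody(resp *http.Response) ([]byte, error) {
	if resp == nil || resp.Body == nil {
		return nil, nil
	}
	data, err := io.ReadAll(resp.Body)
	closeErr := resp.Body.Close()
	if err != nil {
		return nil, err
	}
	if closeErr != nil {
		return nil, closeErr
	}
	resp.Body = io.NopCloser(bytes.NewReader(data))
	return data, nil
}

// newResponse builds an in-memory response for the demonstration.
func newResponse(body string) *http.Response {
	return &http.Response{
		StatusCode: http.StatusNotFound,
		Body:       io.NopCloser(bytes.NewReader([]byte(body))),
	}
}

func main() {
	resp := newResponse(`{"error":{"code":"NotFound"}}`)
	first, _ := readAndRestoreBody(resp)
	second, _ := readAndRestoreBody(resp)
	fmt.Println(string(first) == string(second))
}
```

Without the reset, the second read would return an empty body because an `http.Response` body is a one-shot stream.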
        current := parsed.FindScope(resources_radius.ScopeResourceGroups)
        if current == "" || current == rg {
            return resourceID
        }
        // Case-insensitive replacement of the resourcegroups segment value while preserving the
        // rest of the ID exactly.
        return strings.ReplaceAll(resourceID, "/"+current+"/", "/"+rg+"/")
    }
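For comparison with the excerpt above: `strings.ReplaceAll` matches case-sensitively and rewrites every `/<name>/` occurrence, including any later segment that happens to equal the resource group name. A segment-aware variant could be sketched as follows. This is an illustrative alternative, not code from the PR; `replaceResourceGroup` is a hypothetical name and the resource IDs in `main` are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceResourceGroup returns resourceID with the value that follows the
// first "resourcegroups" segment (matched case-insensitively) replaced by rg.
// All other segments are preserved exactly. IDs without a resource group
// segment are returned unchanged.
func replaceResourceGroup(resourceID, rg string) string {
	segments := strings.Split(resourceID, "/")
	for i := 0; i+1 < len(segments); i++ {
		if strings.EqualFold(segments[i], "resourcegroups") {
			segments[i+1] = rg
			return strings.Join(segments, "/")
		}
	}
	return resourceID
}

func main() {
	// Example ID whose application name equals the old resource group name.
	id := "/planes/radius/local/resourceGroups/kind-radius/providers/Applications.Core/applications/kind-radius"
	fmt.Println(replaceResourceGroup(id, "default"))
}
```

Only the segment after `resourceGroups` changes here; the trailing application name `kind-radius` is left alone.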
        config, err := cli.LoadConfig("")
        if err != nil {
            return "kind-radius"
        }

    GOTESTSUM_JSONFILE_DIR ?=
    # Recursive '=' so $@ resolves in each recipe's context.
    # We need the double dash here to separate the 'gotestsum' options from the 'go test' options.
    GOTEST_TOOL = gotestsum $(GOTESTSUM_OPTS)$(if $(GOTESTSUM_JSONFILE_DIR), --jsonfile=$(GOTESTSUM_JSONFILE_DIR)/[email protected]) --

        @listener_cmd=""; \
        if command -v lsof >/dev/null 2>&1; then \
            listener_cmd=$$(lsof -nP -iTCP:5017 -sTCP:LISTEN 2>/dev/null | awk 'NR==2 {print $$1}'); \
        fi; \
        if [ -n "$$listener_cmd" ] && [ "$$listener_cmd" != "kubectl" ] && curl -s "http://localhost:5017/metrics" > /dev/null 2>&1; then \

    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is required" >&2; exit 1
    fi
    if ! docker buildx version >/dev/null 2>&1; then
        echo "docker buildx is required (included with Docker Desktop)" >&2; exit 1
    fi

        require_cmd kubectl make
        if curl -sf -o /dev/null -m 2 http://localhost:8999/azure-rg.zip; then
            log "tf-module-server already reachable at http://localhost:8999"
            return 0
        fi
        if ! kubectl get ns "${TF_MODULE_SERVER_NS}" >/dev/null 2>&1 \
            || ! kubectl -n "${TF_MODULE_SERVER_NS}" get deploy tf-module-server >/dev/null 2>&1; then
            log "Deploying tf-module-server into the debug cluster (publish-test-terraform-recipes)..."
            (cd "${REPO_ROOT}" && make publish-test-terraform-recipes >/dev/null) \
                || { err "make publish-test-terraform-recipes failed"; exit 1; }
        fi
        log "Waiting for tf-module-server rollout..."
        kubectl -n "${TF_MODULE_SERVER_NS}" rollout status deploy/tf-module-server --timeout=120s >/dev/null \
            || { err "tf-module-server rollout did not become ready"; exit 1; }
        # Stop any stale port-forward before starting a new one.
        if [[ -f "${TF_MODULE_SERVER_PORT_FORWARD_PID_FILE}" ]]; then
            local old_pid
            old_pid="$(cat "${TF_MODULE_SERVER_PORT_FORWARD_PID_FILE}" 2>/dev/null || true)"
            if [[ -n "${old_pid}" ]] && kill -0 "${old_pid}" 2>/dev/null; then
                kill "${old_pid}" 2>/dev/null || true
            fi
            rm -f "${TF_MODULE_SERVER_PORT_FORWARD_PID_FILE}"
        fi
        log "Starting kubectl port-forward svc/tf-module-server 8999:80 -n ${TF_MODULE_SERVER_NS}"
        ( kubectl -n "${TF_MODULE_SERVER_NS}" port-forward svc/tf-module-server 8999:80 \

        if clientv2.Is404Error(err) {
            logger.Info("environment not found while loading recipe configuration for delete; treating as no-op")
            return nil, nil
        }
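The last excerpt treats a missing environment as a successful no-op during delete. The general shape of that pattern, mapping one specific error to `(nil, nil)` while still surfacing every other failure, can be sketched as below. The sentinel `errNotFound`, the `recipeConfig` type, and `loadConfigForDelete` are all hypothetical stand-ins; the real code uses `clientv2.Is404Error` as shown above.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a 404 from the environment lookup (assumption).
var errNotFound = errors.New("environment not found")

// errTransient stands in for any other failure (assumption).
var errTransient = errors.New("connection reset")

// recipeConfig is a placeholder for whatever configuration the loader returns.
type recipeConfig struct {
	Environment string
}

// loadConfigForDelete calls load and treats "not found" as a no-op: there is
// nothing left to clean up, so the caller receives (nil, nil). Any other
// error is returned unchanged so real failures are not hidden.
func loadConfigForDelete(load func() (*recipeConfig, error)) (*recipeConfig, error) {
	cfg, err := load()
	if errors.Is(err, errNotFound) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	// A wrapped not-found error is still recognized through errors.Is.
	cfg, err := loadConfigForDelete(func() (*recipeConfig, error) {
		return nil, fmt.Errorf("loading configuration: %w", errNotFound)
	})
	fmt.Println(cfg == nil, err == nil)
}
```

Callers of such a function need to handle a nil configuration with a nil error, which is the trade-off of signaling "nothing to do" this way.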
Summary
Lets the Radius functional test suites, both corerp-noncloud and
corerp-cloud (Azure), run against an OS-process Radius debug stack
(`make debug-start`) on an arbitrary k3d/kind cluster, instead of being
hard-wired to a `kind-radius` cluster and workspace. Also stabilizes a
few `make debug-*` targets that didn't reliably bring up their
dependencies on k3d.
Changes
Test infrastructure — workspace-awareness
- `test/functional-portable/corerp/util.go`
  - `GetSecretSuffix` derives the resource group from the active workspace
    (`cli.LoadConfig` → `GetWorkspace` → `ParseScope` → `FindScope("resourcegroups")`)
    instead of hardcoding `kind-radius`. Falls back to `default` when nothing is configured.
  - `backends.NewKubernetesBackend` is constructed from the active workspace
    rather than an assumed context/namespace.
- `test/functional-portable/corerp/noncloud/resources/application_test.go`
  - `Test_ApplicationGraph` PostStepVerify substitutes the fixture's `kind-radius`
    resource group with the active workspace's resource group before unmarshalling.
- `test/functional-portable/corerp/cloud/resources/recipe_terraform_test.go`
  - Derives the resource ID from the active workspace scope so it works
    against any RG (CI's `kind-radius` and local debug's `default`).
- `test/functional-portable/corerp/cloud/resources/extender_test.go`,
  `test/rp/rptest.go`, `test/ucp/ucptest.go`, `test/validation/shared.go`,
  `test/functional-portable/cli/noncloud/cli_test.go`,
  `test/functional-portable/corerp/noncloud/resources/testdata/corerp-resources-simulatedenv.bicep`:
  incidental cleanups required to run the suites against a non-`kind-radius`
  workspace; skip AWS-only tests cleanly when AWS env vars are unset; skip
  private-git redis test when `GH_TOKEN` is unset.
Azure-cloud functional tests against a local OS-process stack
- `build/scripts/azure-local-testenv.sh`: new orchestrator with
  `setup` / `run` / `teardown` / `all` sub-commands.
  - `run` and `all` accept passthrough `go test` flags (e.g. `-run`, `-v`).
  - `run` rebuilds state from the newest `radlocal-${USER}-*` resource group
    when the state file is missing (e.g. after `make debug-stop`), and
    re-applies the Azure scope on the default rad environment that
    `debug-start` wipes.
  - `teardown --all-orphans` deletes every `radlocal-${USER}-*` RG and stops
    the `tf-module-server` port-forward.
- `pkg/recipes/terraform/config/providers/azure.go`: Terraform Azure
  provider falls back to `use_cli = true` when no Azure credential is
  registered with UCP (404), so the host RP's `az login` session
  authenticates. CI workload-identity path is unchanged.
- `build/scripts/start-radius.sh`: exports `TERRAFORM_TEST_GLOBAL_DIR`
  so the RP no longer tries to write to a read-only `/terraform`.
- `build/scripts/ensure-encryption-key.sh` (new): generates a stable
  encryption key for the local stack.
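The credential fallback described for `azure.go` can be illustrated with a small sketch. It assumes a lookup function that reports a missing credential through a sentinel error; `errCredentialNotFound`, `azureCredential`, and `buildAzureProviderConfig` are hypothetical names, and the keys written for the registered-credential branch are simplified placeholders rather than the fields Radius actually sets. Only `use_cli`, `subscription_id`, and `features` are taken from the `azurerm` provider's documented arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// errCredentialNotFound stands in for the 404 returned when no Azure
// credential is registered (assumption for this sketch).
var errCredentialNotFound = errors.New("azure credential not found")

// azureCredential is a simplified placeholder for a registered credential.
type azureCredential struct {
	ClientID string
	TenantID string
}

// buildAzureProviderConfig returns a provider configuration map. When no
// credential is registered it sets use_cli so Terraform authenticates with
// the host's `az login` session; otherwise it uses the registered credential.
func buildAzureProviderConfig(subscriptionID string, lookup func() (*azureCredential, error)) (map[string]any, error) {
	config := map[string]any{
		"features":        map[string]any{},
		"subscription_id": subscriptionID,
	}
	cred, err := lookup()
	switch {
	case errors.Is(err, errCredentialNotFound):
		config["use_cli"] = true
	case err != nil:
		return nil, err
	case cred == nil:
		return nil, errors.New("credential lookup returned no credential and no error")
	default:
		config["client_id"] = cred.ClientID
		config["tenant_id"] = cred.TenantID
	}
	return config, nil
}

func main() {
	cfg, err := buildAzureProviderConfig("00000000-0000-0000-0000-000000000000",
		func() (*azureCredential, error) { return nil, errCredentialNotFound })
	fmt.Println(cfg["use_cli"], err)
}
```

Keeping the fallback limited to the not-found case means other lookup failures still fail the recipe instead of silently switching authentication modes.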
`make debug-*` reliability
- `build/debug.mk`
  - `debug-install-contour`: drop Helm `--wait` (it doesn't behave for
    LoadBalancer Services on k3d) and instead do explicit
    `kubectl wait --for=condition=Available` + `kubectl rollout status`,
    so the target only returns once Contour is actually serving.
  - `debug-install-tf-module-server`: deploy the in-cluster nginx test
    module server and port-forward it to `localhost:8999`; add a `curl`
    readiness probe so subsequent recipe pulls don't race the pod
    becoming Ready.
- `build/test.mk`, `build/recipes.mk`, `.github/scripts/publish-recipes.sh`,
  `build/scripts/mirror-test-images.sh` (new): companion glue for
  running the suite locally with mirrored images and locally published
  recipes (the publish script learns `PLAIN_HTTP` for `localhost:5000`
  pushes).
Misc
- `pkg/azure/clientv2/unfold.go`,
  `pkg/corerp/frontend/controller/applications/updatefilter.go`,
  `pkg/recipes/engine/engine.go`: small adjustments surfaced while
  running the suites end-to-end.
- `pkg/corerp/frontend/controller/applications/testbicep_scan_test.go`
  (new): small scan test added during investigation.
- `.gitignore`: ignore local debug artifacts and the local-only
  `bicepconfig.json` override that `make debug-publish-bicep-types`
  writes.
Documentation
`docs/contributing/contributing-code/contributing-code-debugging/radius-os-processes-debugging.md`
documents running the Azure cloud suite against the local OS-process
stack via `make debug-start` + `azure-local-testenv.sh`.
How to use locally