Add Repo Radius deploy workflow #12243

Closed
sylvainsf wants to merge 6 commits into main from repo-radius-deploy-workflow

sylvainsf (Contributor) commented Jun 24, 2026

Description

Lands the Repo Radius deploy workflow in the repository and adapts it to the
building blocks that have since merged, so the multi-cluster + state-storage work
has a stable, in-repo consumer to validate against and frontends can drive Repo
Radius without relying on any external project.

The deploy workflow previously existed only as a generated string produced
outside Radius, where the contract it depends on had no reviewed home. This PR
lands it in-tree under .github/extension/ and rewires it onto the merged seams
instead of patching the RP/DE deployments by hand.

What's included

  • .github/extension/radius-deploy.yml — the deploy workflow. It creates an
    ephemeral k3d control plane, restores durable state with rad startup, runs the
    dispatched rad commands against the user's external AKS/EKS cluster, persists
    state again with rad shutdown, and tears the control plane down.
    • Honors the workflow_dispatch contract: environment + radius_commands
      (a single command string or a JSON array, rad prefix omitted, run in
      order, stop on first failure). Each command's output is uploaded as the
      radius-output artifact for incremental polling.
    • Uses the merged multi-cluster seam (--set global.targetCluster.enabled=true)
      rather than patching the RP/DE deployments by hand. RADIUS_TARGET_KUBECONFIG
      now drives both Bicep and Terraform, so the separate KUBE_CONFIG_PATH
      variable is no longer needed.
    • Credentials are provider-native, not registered: AWS OIDC session creds are
      injected as pod env vars; Azure uses Workload Identity, with the client/tenant
      IDs registered and the GitHub Actions OIDC JWT projected into the pods as the
      federated token file. rad env update sets scope only.
  • .github/extension/radius-verify-credentials.yml — ports the companion
    verify workflow so the contract is testable without a full deploy.
  • .github/extension/README.md — documents both workflows and the
    RADIUS_TARGET_KUBECONFIG / credential / state-persistence contract.
  • eng/design-notes/environments/2026-06-repo-radius-deploy-workflow.md — the
    deploy-workflow technical design (Investment 3 of the Repo Radius feature spec).
  • test/functional-portable/statestore/.../statestore_lifecycle_test.go — the
    rad startup / rad shutdown lifecycle test, re-pointed at this workflow's install
    model and un-gated to run in CI on its own dedicated, isolated KinD cluster (new
    statestore-noncloud matrix leg + make target). Hardened against the upgrade-test
    flakes (#12245): polls for control-plane readiness (tolerating 503s) and for
    aggregated APIService deregistration instead of sleeping.

In scope / out of scope

  • In scope: adapting the deploy workflow to the rad startup / rad shutdown
    lifecycle while preserving the workflow_dispatch contract; porting the verify
    workflow so it can be exercised without a deploy.
  • Out of scope (tracked separately): the cloud-side OIDC / permission
    provisioning; mid-run cloud-token refresh (an accepted limitation — a long Azure
    run may outlive the one-time token exchange).

Status

  • #12214 (rad startup / rad shutdown + database.enabled=true chart wiring) has
    merged; this branch is rebased onto it, and the statestore lifecycle test is
    wired into CI and un-gated.
  • Still a draft pending final review; the cloud-side OIDC provisioning and mid-run
    token refresh remain out of scope (tracked separately).

Type of change

  • This pull request adds or changes features of Radius and has an approved issue (issue link required).

Related: #12118

Contributor checklist

Please verify that the PR meets the following requirements, where applicable:

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Yes
    • Not applicable
  • A design document is added or updated under eng/design-notes/ in this repository, if new APIs are being introduced.
    • Yes
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Yes
    • Not applicable
  • A PR for resource-types-contrib is created, if resource types or recipes are affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for dashboard is created, if the Radius Dashboard is affected by the changes in this PR.
    • Yes
    • Not applicable
  • A PR for the documentation repository is created, if the changes in this PR affect the documentation or any user facing updates are made.
    • Yes
    • Not applicable

github-actions Bot commented

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.88%. Comparing base (8412ca0) to head (15d7e4d).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12243      +/-   ##
==========================================
- Coverage   52.89%   52.88%   -0.01%     
==========================================
  Files         751      751              
  Lines       48353    48353              
==========================================
- Hits        25574    25572       -2     
- Misses      20383    20384       +1     
- Partials     2396     2397       +1     

github-actions Bot commented Jun 25, 2026

Unit Tests

2 files (±0), 450 suites (±0), duration 7m 39s (+18s)
5,591 tests (±0): 5,589 passed, 2 skipped, 0 failed (all ±0)
6,788 runs (±0): 6,786 passed, 2 skipped, 0 failed (all ±0)

Results for commit 15d7e4d. ± Comparison against base commit 8412ca0.

♻️ This comment has been updated with latest results.

Port the Repo Radius deploy workflow from the github-extension prototype into
.github/extension/ and adapt it to the merged building blocks: multi-cluster v1
(global.targetCluster seam) and externalized state (rad startup / rad shutdown).

- radius-deploy.yml: ephemeral k3d control plane, restore state with rad startup,
  run the dispatched radius_commands, persist state with rad shutdown, tear down.
  Honors the workflow_dispatch contract (environment + radius_commands as a single
  string or JSON array; rad prefix omitted; stop-on-first-failure) and uploads
  per-command output as the radius-output artifact.
- Credentials are provider-native, not registered: AWS OIDC session creds are
  injected as pod env vars; Azure uses Workload Identity with the GitHub Actions
  OIDC JWT projected as the federated token file. rad env update sets scope only.
- radius-verify-credentials.yml: port the verify workflow so the contract is
  testable without a deploy.
- README.md: document both workflows and the RADIUS_TARGET_KUBECONFIG / credential
  / state-persistence contract.
- eng/design-notes: add the deploy-workflow technical design (Investment 3).

Related: #12118
Signed-off-by: Sylvain Niles <[email protected]>
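The provider-native credential model can be sketched as a function that builds the environment injected into the pods. The variable names are the standard ones read by the AWS SDKs and by Azure Workload Identity clients; whether the workflow sets exactly these, the input types, the one-provider-per-run rule, and the token file path in main are all assumptions for illustration. Region settings and rad env update scoping are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical input types; the workflow obtains these values from GitHub
// Actions OIDC exchanges, not from Go structs.
type awsSession struct {
	AccessKeyID, SecretAccessKey, SessionToken string
}

type azureWorkloadIdentity struct {
	ClientID, TenantID, TokenFilePath string
}

// podEnv returns the environment variables to inject into the Radius pods.
// Assumption: exactly one of aws or azure is configured per run.
func podEnv(aws *awsSession, azure *azureWorkloadIdentity) (map[string]string, error) {
	switch {
	case aws != nil && azure == nil:
		// Short-lived session credentials from the AWS OIDC role exchange.
		return map[string]string{
			"AWS_ACCESS_KEY_ID":     aws.AccessKeyID,
			"AWS_SECRET_ACCESS_KEY": aws.SecretAccessKey,
			"AWS_SESSION_TOKEN":     aws.SessionToken,
		}, nil
	case azure != nil && aws == nil:
		// Workload Identity: the client reads the projected OIDC JWT from
		// AZURE_FEDERATED_TOKEN_FILE and exchanges it for an Azure token.
		return map[string]string{
			"AZURE_CLIENT_ID":            azure.ClientID,
			"AZURE_TENANT_ID":            azure.TenantID,
			"AZURE_FEDERATED_TOKEN_FILE": azure.TokenFilePath,
		}, nil
	default:
		return nil, fmt.Errorf("exactly one cloud provider must be configured")
	}
}

func main() {
	env, err := podEnv(nil, &azureWorkloadIdentity{
		ClientID:      "00000000-0000-0000-0000-000000000000",
		TenantID:      "11111111-1111-1111-1111-111111111111",
		TokenFilePath: "/var/run/secrets/tokens/github-oidc-token", // hypothetical path
	})
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // prints: [AZURE_CLIENT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_TENANT_ID]
}
```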
Remove references to the external prototype project and speak in generic terms
(an earlier proof of concept, a separate project) so the design does not rely on
naming an external repository.

Related: #12118
Signed-off-by: Sylvain Niles <[email protected]>
…ated

Re-point the `rad startup` / `rad shutdown` state-storage lifecycle test at this
PR's deploy workflow model and remove the `RADIUS_STATE_E2E` gate so it runs in
CI on its own dedicated, isolated cluster.

- Add a `statestore-noncloud` leg to the non-cloud functional matrix. Each matrix
  leg runs on its own runner with its own KinD cluster, so the test's destructive
  install/uninstall/reinstall cycle never affects other legs. The shared "Install
  Radius" step is skipped for this leg because the test drives its own install.
- Drive install with the build-under-test images (chart + per-RP image flags from
  testutil.SetDefault, DE_IMAGE/DE_TAG, and the secure local registry CA), mirroring
  the shared Install Radius step, plus `database.enabled=true` for the state backend.
- Harden the lifecycle against the flakes seen in the upgrade test (#12245): replace
  fixed sleeps with polling — wait for the control plane treating 503 from the UCP
  aggregated APIService as retryable, and poll discovery until `api.ucp.dev/v1alpha3`
  deregisters before reinstalling so the next install doesn't race the teardown.
- Add the `test-functional-statestore-noncloud` make target and a 40m timeout for
  the leg.

Related: #12118
Signed-off-by: Sylvain Niles <[email protected]>
sylvainsf force-pushed the repo-radius-deploy-workflow branch from 1590df2 to 122d84e on June 25, 2026 01:33
github-actions Bot commented Jun 25, 2026

Functional Tests - statestore-noncloud

1 test: 1 passed, 0 skipped, 0 failed
1 suite, 1 file, duration 3m 1s

Results for commit 15d7e4d.

sylvainsf changed the title from "Add Repo Radius deploy workflow (rad startup/shutdown)" to "Add Repo Radius deploy workflow" on Jun 25, 2026
…kage

The statestore test lives one directory deeper than the upgrade test it was
modeled on (statestore/noncloud vs upgrade), so the chart path needs four ../
segments to reach the repo root, not three. CI failed with 'stat
../../../deploy/Chart: no such file or directory'.

Signed-off-by: Sylvain Niles <[email protected]>
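The fix amounts to one ".." per directory segment between the test package and the repo root, because go test runs each package with its own directory as the working directory. A minimal sketch, with a hypothetical chartPathFrom helper and hypothetical directory names:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// chartPathFrom returns the relative path from a test package directory
// (given relative to the repo root) to deploy/Chart: one ".." per segment.
// Hypothetical helper; pkgDir must be a relative path below the repo root.
func chartPathFrom(pkgDir string) string {
	clean := path.Clean(pkgDir)
	if clean == "." {
		return "deploy/Chart" // the package is the repo root itself
	}
	depth := len(strings.Split(clean, "/"))
	parts := make([]string, 0, depth+2)
	for i := 0; i < depth; i++ {
		parts = append(parts, "..")
	}
	parts = append(parts, "deploy", "Chart")
	return path.Join(parts...)
}

func main() {
	// Hypothetical package directories, three and four segments deep.
	fmt.Println(chartPathFrom("test/suite/upgrade"))             // prints: ../../../deploy/Chart
	fmt.Println(chartPathFrom("test/suite/statestore/noncloud")) // prints: ../../../../deploy/Chart
}
```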
…dent

waitForControlPlane called rp.NewRPTestOptions, which requires a configured rad
workspace. It ran inside installRadius (before the test creates the workspace),
so it panicked with 'default workspace is not set' every iteration and looped
until the timeout. Poll the radius-system Deployments for the Available condition
via the Kubernetes client instead — the real, workspace-independent readiness
signal, matching the workflow's 'kubectl wait --for=condition=Available'.

Signed-off-by: Sylvain Niles <[email protected]>
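The workspace-independent readiness check reduces to "every radius-system Deployment reports condition Available with status True". The sketch uses minimal stand-in types instead of k8s.io/api/apps/v1 so it runs with the standard library alone; in the test, the Deployments would come from the Kubernetes client (for example, client-go's AppsV1().Deployments("radius-system").List). The type names and the deployment names in main are hypothetical.

```go
package main

import "fmt"

// Minimal stand-ins for the apps/v1 Deployment status fields used here.
type deploymentCondition struct {
	Type   string // e.g. "Available", "Progressing"
	Status string // "True", "False", or "Unknown"
}

type deployment struct {
	Name       string
	Conditions []deploymentCondition
}

// notAvailable returns the names of deployments whose Available condition is
// missing or not "True". An empty result means every listed deployment is
// ready, the same signal `kubectl wait --for=condition=Available` checks.
// Callers should separately confirm the expected deployments exist at all.
func notAvailable(deps []deployment) []string {
	var pending []string
	for _, d := range deps {
		ready := false
		for _, c := range d.Conditions {
			if c.Type == "Available" && c.Status == "True" {
				ready = true
				break
			}
		}
		if !ready {
			pending = append(pending, d.Name)
		}
	}
	return pending
}

func main() {
	deps := []deployment{
		{Name: "ucp", Conditions: []deploymentCondition{{Type: "Available", Status: "True"}}},
		{Name: "applications-rp", Conditions: []deploymentCondition{{Type: "Available", Status: "False"}}},
		{Name: "bicep-de"}, // no conditions reported yet
	}
	fmt.Println(notAvailable(deps)) // prints: [applications-rp bicep-de]
}
```

Wrapped in a poll loop, an empty notAvailable result is the exit condition.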
Two CI failures after install/deploy succeeded:

- 'rad uninstall' prompted for confirmation and failed opening /dev/tty in CI.
  Pass --yes for non-interactive teardown.
- 'rad shutdown' tried to push the radius-state branch to the checkout's GitHub
  origin, which has no push credentials in CI. Run shutdown/startup from a
  dedicated throwaway git repo with no remote (via a second CLI with
  WorkingDirectory set) so gitstate commits state locally only — the design's
  supported local/test case. Both commands share the same repo so the state
  committed by shutdown survives into startup.

Signed-off-by: Sylvain Niles <[email protected]>
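The throwaway-repo setup can be sketched with os/exec: create a temp directory, git init it without adding a remote, and run both rad shutdown and rad startup with that directory as the working directory. Setting exec.Cmd.Dir stands in for the test CLI's WorkingDirectory; newStateRepo and radInDir are hypothetical helpers, and the sketch assumes git is on PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newStateRepo creates a temporary directory and initializes a git repo in
// it with no remote, so state committed by `rad shutdown` stays local.
func newStateRepo() (string, error) {
	dir, err := os.MkdirTemp("", "radius-state-")
	if err != nil {
		return "", err
	}
	cmd := exec.Command("git", "init")
	cmd.Dir = dir
	if out, err := cmd.CombinedOutput(); err != nil {
		os.RemoveAll(dir)
		return "", fmt.Errorf("git init: %v: %s", err, out)
	}
	return dir, nil
}

// radInDir builds a rad invocation that runs in dir, so shutdown and startup
// operate on the same repo and startup sees the state shutdown committed.
func radInDir(dir string, args ...string) *exec.Cmd {
	cmd := exec.Command("rad", args...)
	cmd.Dir = dir
	return cmd
}

func main() {
	dir, err := newStateRepo()
	if err != nil {
		fmt.Println("skipping:", err)
		return
	}
	defer os.RemoveAll(dir)

	shutdown := radInDir(dir, "shutdown")
	startup := radInDir(dir, "startup")
	fmt.Println(shutdown.Dir == startup.Dir, shutdown.Args, startup.Args)
}
```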
radius-functional-tests Bot commented Jun 25, 2026

Radius functional test overview

Repository: radius-project/radius
Commit ref: 15d7e4d
Unique ID: func9526378753
Image tag: pr-func9526378753
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func9526378753
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func9526378753
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func9526378753
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func9526378753
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func9526378753
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting corerp-cloud functional tests...
⌛ Starting ucp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

sylvainsf (Contributor, Author) commented

Closing as superseded.

This draft's content has been carried forward (and further evolved) into sk593's #12250 (add-deploy-workflow) via the merged #12264:

  • the rad_commands dispatch contract,
  • the env-var renames (AWS_ROLE_ARN, AWS_EKS_CLUSTER_NAME / AZURE_AKS_CLUSTER_NAME, KUBERNETES_NAMESPACE),
  • the rename to radius-run-rad-commands.yml (two-action model),
  • the deploy-workflow design note, the statestore lifecycle test, the build/test.mk target, and the functional-test-noncloud matrix leg.

git diff add-deploy-workflow radius-commands-and-design-note -- .github/extension/ .github/skills/ eng/design-notes/.../repo-radius-deploy-workflow.md is empty, so nothing here is lost by closing. The work now lands via #12250.

sylvainsf closed this Jun 30, 2026