Skip to content

fix(portable-resources): stop masking recipe errors when status save fails#12315

Open
sylvainsf wants to merge 2 commits into
mainfrom
fix-recipe-error-masking
Open

fix(portable-resources): stop masking recipe errors when status save fails#12315
sylvainsf wants to merge 2 commits into
mainfrom
fix-recipe-error-masking

Conversation

@sylvainsf

Copy link
Copy Markdown
Contributor

Description

When a recipe fails with a RecipeError, handleRecipeError in createorupdateresource.go persists the failure status as best-effort bookkeeping. Previously, if that persistence save itself failed, the controller returned the save error instead of the real recipe error.

On the Postgres backend the sensitive-resource redaction double-save could produce a spurious ErrConcurrency at exactly that persistence step, so a deploy that actually failed with an actionable recipe error — for example secrets "dbsecret" already exists — surfaced to the user as the operation failed due to a concurrency conflict, hiding the real cause.

This was observed deploying a Radius.Security/secrets resource: the Terraform recipe genuinely failed (leftover Kubernetes secret in the target namespace), but the user only saw a concurrency conflict.

Fix

A RecipeError is terminal either way, so always return the recipe's error details and log the persistence failure rather than overwriting the real cause with the save error.

Tests

Adds TestCreateOrUpdateResource_Run_RecipeErrorSurfacedWhenStatusSaveFails, which drives a RecipeError through the engine with a failing status-persistence save and asserts the surfaced error is the recipe error (not the save conflict), that the result is terminal, and that it is not requeued. The test fails against the previous behavior (with the exact the operation failed due to a concurrency conflict symptom) and passes with this change.

Relationship to #12312

The underlying Postgres ETag bug that made the persistence save fail in the first place is fixed separately in #12312 (refresh the stored ETag on Postgres updates). That fix removes the spurious ErrConcurrency; this PR is the complementary hardening so that any failure of the best-effort status save can never again mask the real recipe error. The two are independent and can merge in either order.

Type of change

  • This pull request fixes a bug in Radius and has an approved issue (issue link required).

Contributor checklist

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Not applicable
  • A design document is added or updated under eng/design-notes/, if new APIs are being introduced.
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Not applicable
  • A PR for resource-types-contrib is created, if resource types or recipes are affected.
    • Not applicable
  • A PR for dashboard is created, if the Radius Dashboard is affected.
    • Not applicable
  • A PR for the documentation repository is created, if the changes affect documentation or user-facing behavior.
    • Not applicable

…fails

When a recipe fails with a RecipeError, handleRecipeError persists the
failure status as best-effort bookkeeping. Previously, if that persistence
save itself failed, the controller returned the save error instead of the
real recipe error. On the Postgres backend the redaction double-save could
produce a spurious ErrConcurrency there, so a deploy that actually failed
with an actionable recipe error (for example `secrets "dbsecret" already
exists`) surfaced to the user as "the operation failed due to a concurrency
conflict".

A RecipeError is terminal either way, so always return the recipe's error
details and log the persistence failure rather than overwriting the real
cause.

Add a regression test that drives a RecipeError with a failing status save
and asserts the surfaced error is the recipe error, not the save conflict.
The test fails against the previous behavior and passes with this change.

Signed-off-by: Sylvain Niles <[email protected]>
Copilot AI review requested due to automatic review settings July 2, 2026 18:38
@sylvainsf sylvainsf requested review from a team as code owners July 2, 2026 18:38
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the portable-resources async controller’s recipe-error handling so that a terminal RecipeError is always surfaced to the caller, even if the controller’s best-effort status persistence fails (e.g., due to a transient/spurious DB concurrency conflict). This prevents user-facing failures from being misleadingly reported as database concurrency errors when the real root cause is an actionable recipe failure.

Changes:

  • Update handleRecipeError to log (but not return) a failure from the best-effort status Save, ensuring the original RecipeError remains the surfaced/terminal error.
  • Add a regression test that forces the status persistence Save to fail and asserts the controller still returns a terminal failed result containing the recipe error details (and does not requeue).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
pkg/portableresources/backend/controller/createorupdateresource.go Ensures RecipeError is returned even when best-effort status persistence fails (prevents masking the real recipe failure).
pkg/portableresources/backend/controller/createorupdateresource_test.go Adds a regression test verifying the surfaced error is the recipe error (not the persistence/save error) and the operation is terminal/non-requeued.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Unit Tests

    2 files  ±0    452 suites  ±0   7m 31s ⏱️ -4s
5 659 tests +3  5 657 ✅ +3  2 💤 ±0  0 ❌ ±0 
6 856 runs  +3  6 854 ✅ +3  2 💤 ±0  0 ❌ ±0 

Results for commit 48d3c36. ± Comparison against base commit 659a78c.

♻️ This comment has been updated with latest results.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.99%. Comparing base (659a78c) to head (48d3c36).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12315      +/-   ##
==========================================
+ Coverage   52.96%   52.99%   +0.03%     
==========================================
  Files         754      754              
  Lines       48686    48683       -3     
==========================================
+ Hits        25785    25799      +14     
+ Misses      20472    20459      -13     
+ Partials     2429     2425       -4     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Functional Tests - upgrade-noncloud

3 tests  ±0   3 ✅ ±0   3m 41s ⏱️ ±0s
1 suites ±0   0 💤 ±0 
1 files   ±0   0 ❌ ±0 

Results for commit 48d3c36. ± Comparison against base commit 659a78c.

♻️ This comment has been updated with latest results.

Signed-off-by: Sylvain Niles <[email protected]>
@radius-functional-tests

radius-functional-tests Bot commented Jul 3, 2026

Copy link
Copy Markdown

Radius functional test overview

🔍 Go to test action run

Click here to see the test run details
Name Value
Repository radius-project/radius
Commit ref 48d3c36
Unique ID func091e1ccfff
Image tag pr-func091e1ccfff
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func091e1ccfff
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func091e1ccfff
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func091e1ccfff
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func091e1ccfff
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func091e1ccfff
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
❌ corerp-cloud functional test failed. Please check the logs for more details

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Functional Tests - corerp-cloud

26 tests  ±0   12 ✅  - 13   31m 17s ⏱️ + 19m 50s
 2 suites ±0    2 💤 + 1 
 1 files   ±0   12 ❌ +12 

For more details on these failures, see this check.

Results for commit 48d3c36. ± Comparison against base commit 659a78c.

This pull request skips 1 test.
github.com/radius-project/radius/test/functional-portable/corerp/cloud/resources ‑ Test_AWS_LogsLogGroup_Existing/deploy_testdata/aws-logs-loggroup-existing.bicep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants