fix(portable-resources): stop masking recipe errors when status save fails#12315
sylvainsf wants to merge 2 commits into
…fails

When a recipe fails with a `RecipeError`, `handleRecipeError` persists the failure status as best-effort bookkeeping. Previously, if that persistence save itself failed, the controller returned the save error instead of the real recipe error. On the Postgres backend the redaction double-save could produce a spurious `ErrConcurrency` there, so a deploy that actually failed with an actionable recipe error (for example `secrets "dbsecret" already exists`) surfaced to the user as "the operation failed due to a concurrency conflict".

A `RecipeError` is terminal either way, so always return the recipe's error details and log the persistence failure rather than overwriting the real cause.

Add a regression test that drives a `RecipeError` with a failing status save and asserts the surfaced error is the recipe error, not the save conflict. The test fails against the previous behavior and passes with this change.

Signed-off-by: Sylvain Niles <[email protected]>
Dependency Review ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Pull request overview
This PR hardens the portable-resources async controller’s recipe-error handling so that a terminal RecipeError is always surfaced to the caller, even if the controller’s best-effort status persistence fails (e.g., due to a transient/spurious DB concurrency conflict). This prevents user-facing failures from being misleadingly reported as database concurrency errors when the real root cause is an actionable recipe failure.
Changes:
- Update `handleRecipeError` to log (but not return) a failure from the best-effort status `Save`, ensuring the original `RecipeError` remains the surfaced, terminal error.
- Add a regression test that forces the status persistence `Save` to fail and asserts the controller still returns a terminal failed result containing the recipe error details (and does not requeue).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/portableresources/backend/controller/createorupdateresource.go | Ensures RecipeError is returned even when best-effort status persistence fails (prevents masking the real recipe failure). |
| pkg/portableresources/backend/controller/createorupdateresource_test.go | Adds a regression test verifying the surfaced error is the recipe error (not the persistence/save error) and the operation is terminal/non-requeued. |
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage diff | main | #12315 | +/- |
|---|---|---|---|
| Coverage | 52.96% | 52.99% | +0.03% |
| Files | 754 | 754 | |
| Lines | 48686 | 48683 | -3 |
| Hits | 25785 | 25799 | +14 |
| Misses | 20472 | 20459 | -13 |
| Partials | 2429 | 2425 | -4 |

☔ View full report in Codecov by Harness.
Signed-off-by: Sylvain Niles <[email protected]>
Radius functional test overview
Test status: ⌛ Building Radius and pushing container images for functional tests...
Functional Tests - corerp-cloud: 26 tests ±0, 12 ✅ -13, 31m 17s ⏱️ +19m 50s. For more details on these failures, see this check. Results for commit 48d3c36. ± Comparison against base commit 659a78c. This pull request skips 1 test.
Description
When a recipe fails with a `RecipeError`, `handleRecipeError` in `createorupdateresource.go` persists the failure status as best-effort bookkeeping. Previously, if that persistence save itself failed, the controller returned the save error instead of the real recipe error.

On the Postgres backend the sensitive-resource redaction double-save could produce a spurious `ErrConcurrency` at exactly that persistence step, so a deploy that actually failed with an actionable recipe error (for example `secrets "dbsecret" already exists`) surfaced to the user as `the operation failed due to a concurrency conflict`, hiding the real cause.

This was observed deploying a `Radius.Security/secrets` resource: the Terraform recipe genuinely failed (a leftover Kubernetes secret in the target namespace), but the user only saw a concurrency conflict.

Fix
A `RecipeError` is terminal either way, so always return the recipe's error details and log the persistence failure rather than overwriting the real cause with the save error.

Tests
Adds `TestCreateOrUpdateResource_Run_RecipeErrorSurfacedWhenStatusSaveFails`, which drives a `RecipeError` through the engine with a failing status-persistence save and asserts that the surfaced error is the recipe error (not the save conflict), that the result is terminal, and that it is not requeued. The test fails against the previous behavior (with the exact `the operation failed due to a concurrency conflict` symptom) and passes with this change.

Relationship to #12312
The underlying Postgres ETag bug that made the persistence save fail in the first place is fixed separately in #12312 (refresh the stored ETag on Postgres updates). That fix removes the spurious `ErrConcurrency`; this PR is the complementary hardening, ensuring that a failure of the best-effort status save can never again mask the real recipe error. The two are independent and can merge in either order.

Type of change
Contributor checklist
eng/design-notes/, if new APIs are being introduced.