fix(infra): retry transient Graph owners failure in geneva-identities (AROSLSRE-1193)#5677
… (AROSLSRE-1193) The geneva-actions-entra-app Entra app/service-principal writes owners.relationships through Microsoft Graph, which intermittently fails with "Failed to update one or more relationships in 'owners'. Please try again later." and fails the whole ARM deployment. Add an automatedRetry block to the geneva-identities step matching that transient error (3 retries, 1m apart), following the existing pipeline retry convention.
⚠️ Not ready to approve
The retry matching includes an overly generic substring that can unintentionally retry unrelated failures, increasing rollout latency and obscuring distinct errors.
Pull request overview
Adds an EV2 automated step retry to the geneva-identities ARM step in the global rollout pipeline to mitigate intermittent Microsoft Graph failures when updating the owners relationship for the Entra app/service principal.
Changes:
- Add an `automatedRetry` policy to the `geneva-identities` step to retry transient Graph owners-relationship failures (3 attempts, 1m apart).
File summaries
| File | Description |
|---|---|
| dev-infrastructure/global-pipeline.yaml | Adds an automatedRetry block to the geneva-identities ARM step to make global rollouts resilient to transient Graph owners relationship commit failures. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 1
Drop the generic "Please try again later" match so the automatedRetry only fires on the Graph owners-relationship failure, avoiding retries of unrelated transient errors in the step.
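To illustrate the concern, here is a minimal Python sketch, assuming the retry policy matches error text by plain substring containment (the review calls the match a "substring", but the exact EV2 matching semantics are not confirmed here). The `should_retry` helper and the unrelated quota error are hypothetical and exist only for this example.

```python
# Hypothetical sketch of substring-based retry matching; this is not the EV2
# implementation, only an illustration of the over-matching concern.
OWNERS_ERROR = "Failed to update one or more relationships in 'owners'"
GENERIC_ERROR = "Please try again later"


def should_retry(message: str, patterns: list[str]) -> bool:
    """Return True if any pattern occurs as a substring of the error message."""
    return any(pattern in message for pattern in patterns)


# The transient Graph failure quoted in the PR description.
graph_failure = (
    "Failed to update one or more relationships in 'owners'. "
    "Please try again later."
)
# Made-up unrelated failure that happens to end with the same generic phrase.
unrelated_failure = "Quota exceeded for the subscription. Please try again later."

broad = [OWNERS_ERROR, GENERIC_ERROR]  # both strings matched
narrow = [OWNERS_ERROR]                # only the owners-specific string

for name, patterns in (("broad", broad), ("narrow", narrow)):
    print(name, should_retry(graph_failure, patterns),
          should_retry(unrelated_failure, patterns))
```

Under that assumption, the broad list retries both failures, while the narrow list still catches the Graph owners error and lets the unrelated one fail fast.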
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: inbharajmani, raelga
What
Add an `automatedRetry` block to the `geneva-identities` ARM step in `dev-infrastructure/global-pipeline.yaml` so the global rollout retries on the transient Microsoft Graph `owners` relationship error (up to 3 times, 1 minute apart).
Why
The `geneva-actions-entra-app` Entra app/service-principal is created via `Microsoft.Graph/applications@beta` in `dev-infrastructure/modules/entra/app.bicep`, which writes `owners.relationships`. Graph intermittently fails to commit that relationship and asks the caller to retry, failing the whole ARM deployment and the rollout step with "Failed to update one or more relationships in 'owners'. Please try again later."
This is a known eventually-consistent Graph behaviour, so an automated retry is the right fix and matches the existing pipeline retry convention used elsewhere in the repo (e.g. `kube-applier`, `admin`, `acm`).
Testing
`make validate-config-pipelines` passes (schema validation of all pipelines including `global-pipeline.yaml`).
Special notes for your reviewer
`automatedRetry` is part of `stepMeta` in the `pipeline.schema.v1` schema and is valid on ARM steps; it maps to EV2 automated step retry. The matched error strings are taken verbatim from the failing deployment message.
PR Checklist
make validate-config-pipelines)