fix(infra): retry transient Graph owners failure in geneva-identities (AROSLSRE-1193) #5677

Merged: openshift-merge-bot[bot] merged 2 commits into Azure:main from raelga:raelga/aroslsre-1193-geneva-owners-retry on Jun 17, 2026

@raelga (Collaborator) commented Jun 16, 2026

What

Add an automatedRetry block to the geneva-identities ARM step in dev-infrastructure/global-pipeline.yaml so the global rollout retries on the transient Microsoft Graph owners relationship error (up to 3 times, 1 minute apart).
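
The retry behaviour described here can be modelled roughly as follows. This is a minimal Python sketch, not the pipeline YAML or EV2's actual implementation: `run_with_retry`, `StepFailed`, and all parameter names are hypothetical. It assumes that a retry fires only when the failure message contains one of the configured substrings, and that "up to 3 times" means three retries after the initial attempt.

```python
import time
from typing import Callable, Sequence


class StepFailed(Exception):
    """Hypothetical error raised when a step's deployment fails."""


def run_with_retry(
    step: Callable[[], str],
    error_contains_any: Sequence[str],
    max_retries: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `step`, retrying only when the failure message matches a configured substring."""
    retries_used = 0
    while True:
        try:
            return step()
        except StepFailed as err:
            retryable = any(fragment in str(err) for fragment in error_contains_any)
            # Give up on non-matching errors, or once the retry budget is spent.
            if not retryable or retries_used >= max_retries:
                raise
            retries_used += 1
            sleep(delay_seconds)  # wait between attempts (1 minute by default)
```

Under these assumptions, a step that keeps failing with the owners error runs four times in total (one initial attempt plus three retries) before the failure surfaces, while any non-matching error fails immediately.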

Why

The geneva-actions-entra-app Entra app/service-principal is created via Microsoft.Graph/applications@beta in dev-infrastructure/modules/entra/app.bicep, which writes owners.relationships. Graph intermittently fails to commit that relationship and asks the caller to retry, failing the whole ARM deployment and the rollout step:

Code: BadRequest
Message: Failed to update one or more relationships in 'owners'. Please try again later.
Target: /resources/entraApp; inside Global.singleton.geneva-identities-uksouth-1 step

This is a known eventually-consistent Graph behaviour, so an automated retry is the right fix and matches the existing pipeline retry convention used elsewhere in the repo (e.g. kube-applier, admin, acm).

Testing

make validate-config-pipelines passes (schema validation of all pipelines including global-pipeline.yaml).

Special notes for your reviewer

automatedRetry is part of stepMeta in the pipeline.schema.v1 schema and is valid on ARM steps; it maps to EV2 automated step retry. The matched error strings are taken verbatim from the failing deployment message.

PR Checklist

  • Changes validated locally (make validate-config-pipelines)
  • Linked to Jira issue AROSLSRE-1193

… (AROSLSRE-1193)

The geneva-actions-entra-app Entra app/service-principal writes owners.relationships through Microsoft Graph, which intermittently fails with "Failed to update one or more relationships in 'owners'. Please try again later." and fails the whole ARM deployment. Add an automatedRetry block to the geneva-identities step matching that transient error (3 retries, 1m apart), following the existing pipeline retry convention.
Copilot AI review requested due to automatic review settings June 16, 2026 16:01

Copilot AI (Contributor) left a comment

⚠️ Not ready to approve

The retry matching includes an overly generic substring that can unintentionally retry unrelated failures, increasing rollout latency and obscuring distinct errors.

Pull request overview

Adds an EV2 automated step retry to the geneva-identities ARM step in the global rollout pipeline to mitigate intermittent Microsoft Graph failures when updating the owners relationship for the Entra app/service principal.

Changes:

  • Add an automatedRetry policy to the geneva-identities step to retry transient Graph owners-relationship failures (up to 3 retries, 1m apart).

File summaries

| File | Description |
| --- | --- |
| dev-infrastructure/global-pipeline.yaml | Adds an automatedRetry block to the geneva-identities ARM step to make global rollouts resilient to transient Graph owners relationship commit failures. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread dev-infrastructure/global-pipeline.yaml Outdated
Drop the generic "Please try again later" match so the automatedRetry
only fires on the Graph owners-relationship failure, avoiding retries of
unrelated transient errors in the step.
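
To illustrate why the generic match was dropped, here is a small Python sketch of substring-based matching. It assumes the retry fires when any configured substring appears in the error message. `should_retry`, the pattern lists, and the unrelated error text are hypothetical and chosen only for illustration; the exact strings in the pipeline may differ.

```python
from typing import Sequence


def should_retry(message: str, error_contains_any: Sequence[str]) -> bool:
    """Return True if any configured substring occurs in the error message."""
    return any(fragment in message for fragment in error_contains_any)


# The Graph failure quoted in the PR description.
OWNERS_ERROR = (
    "Failed to update one or more relationships in 'owners'. "
    "Please try again later."
)
# Hypothetical unrelated failure, invented only for this example.
UNRELATED_ERROR = "Resource provider is busy. Please try again later."

# Assumed shape of the match list before and after the follow-up commit.
BROAD = [
    "Failed to update one or more relationships in 'owners'",
    "Please try again later",
]
NARROW = ["Failed to update one or more relationships in 'owners'"]

for name, patterns in (("broad", BROAD), ("narrow", NARROW)):
    print(name, should_retry(OWNERS_ERROR, patterns), should_retry(UNRELATED_ERROR, patterns))
```

With the broad list both messages trigger a retry; with the narrow list only the owners-relationship failure does, so unrelated errors surface on the first attempt.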
@inbharajmani (Collaborator)

/lgtm

openshift-ci Bot commented Jun 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inbharajmani, raelga


@openshift-merge-bot openshift-merge-bot Bot merged commit 4c8c609 into Azure:main Jun 17, 2026
17 checks passed