CNTRLPLANE-3277: Add Azure OAuth LoadBalancer private topology e2e test by Nirshal · Pull Request #8584 · openshift/hypershift

Nirshal · 2026-05-25T15:56:17Z

What this PR does / why we need it:

Adds the AzureOAuthLoadBalancerPrivateTest to the v2 Ginkgo e2e suite, validating
that the OAuth server works correctly when published via LoadBalancer on an Azure
Private topology cluster.

The test verifies:

The oauth-openshift Service is created as type LoadBalancer with an allocated endpoint
The Service carries the Azure internal LoadBalancer annotation (confirming it's an ILB, not public)
The OAuth token flow (kubeadmin login + htpasswd IDP setup + token request) works through the private endpoint

Changes to lifecycle/azure.go:

Adds --oauth-publishing-strategy=LoadBalancer to the existing private cluster variant, reusing it instead of spinning up a separate cluster (per reviewer feedback to reduce Azure CI cluster count)
Adds self-managed-azure-oauth-lb-private to the private TestGroup's LabelFilter so the new test runs on the same cluster

Note: The OAuth token flow test requires the test runner to have network connectivity
to the Azure VNet (e.g., running in the management cluster or via VPN/peering).

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3277

Special notes for your reviewer:

This builds on top of CNTRLPLANE-3222: Port v1 lifecycle tests to v2 Ginkgo framework #8527 (CNTRLPLANE-3222) which ported the v1 lifecycle tests to v2 and established the Azure self-managed test infrastructure.
The private cluster variant now always creates with OAuth via LoadBalancer. This means private topology with default OAuth (Route) is no longer tested in CI. This tradeoff was agreed upon to avoid an additional Azure cluster.
AzureOAuthLoadBalancerPrivateTest mutates the HostedCluster (creates htpasswd IDP, triggers kubeadmin secret deletion). It is registered last for the private cluster in RegisterHostedClusterAzureTests to avoid affecting other private topology tests.
No companion PR in openshift/release is needed since the private cluster already exists in CI.

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests. No unit tests needed — changes are e2e test code and CI cluster configuration only, with no new business logic.

Summary by CodeRabbit

Tests
- Added end-to-end test coverage for Azure private-topology clusters running OAuth behind a LoadBalancer, verifying service type, internal LB annotation, and token flow through the LoadBalancer endpoint.
- Introduced an ordered validation suite for private Azure deployments to ensure OAuth LoadBalancer behavior.
- Broadened the Azure test selection to include an additional private OAuth LoadBalancer variant.

openshift-merge-bot · 2026-05-25T15:56:20Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci-robot · 2026-05-25T15:56:21Z

@Nirshal: This pull request references CNTRLPLANE-3277 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Adds the AzureOAuthLoadBalancerPrivateTest to the v2 Ginkgo e2e suite, validating
that the OAuth server works correctly when published via LoadBalancer on an Azure
Private topology cluster.

The test verifies:

The oauth-openshift Service is created as type LoadBalancer with an allocated endpoint

The Service carries the Azure internal LoadBalancer annotation (confirming it's an ILB, not public)

The OAuth token flow (kubeadmin login + htpasswd IDP setup + token request) works through the private endpoint

Also adds:

oauth-lb-private cluster spec in lifecycle/azure.go combining --endpoint-access=Private with --oauth-publishing-strategy=LoadBalancer

Corresponding test matrix entry with label self-managed-azure-oauth-lb-private

Note: The OAuth token flow test requires the test runner to have network connectivity
to the Azure VNet (e.g., running in the management cluster or via VPN/peering).

Companion PR needed: openshift/release changes to create/destroy/dump the
oauth-lb-private cluster and wire the label filter in the CI job.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3277

Special notes for your reviewer:

This builds on top of CNTRLPLANE-3222: Port v1 lifecycle tests to v2 Ginkgo framework #8527 (CNTRLPLANE-3222) which ported the v1 lifecycle tests to v2 and established the Azure self-managed test infrastructure.

The naming oauth-lb-private follows the existing oauth-lb convention rather than private-oauth-lb to stay consistent with Bryan's naming. Open to renaming both to public-oauth-lb / private-oauth-lb for clarity if preferred.

The ValidateOAuthWithIdentityProviderViaLoadBalancer helper in test/e2e/util/oauth.go does not clean up the htpasswd Secret it creates — this is pre-existing and safe since each test variant uses a separate cluster, but worth a follow-up.

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests. No unit tests needed — changes are e2e test code and CI cluster configuration only, with no new business logic.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-05-25T15:56:27Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-05-25T15:56:33Z

📝 Walkthrough

Walkthrough

Adds --oauth-publishing-strategy=LoadBalancer and the private NAT subnet arg to the Azure private cluster ExtraArgs, expands the Azure private test label selector to include self-managed-azure-oauth-lb-private, and adds an ordered E2E test (AzureOAuthLoadBalancerPrivateTest) that waits for oauth-openshift to be a LoadBalancer, asserts the Azure internal LoadBalancer annotation, and validates the OAuth token flow via the LoadBalancer endpoint.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner
  participant HostedCluster
  participant AzureLoadBalancer
  participant ManagementClient
  TestRunner->>HostedCluster: wait for oauth-openshift Service type=LoadBalancer
  HostedCluster->>AzureLoadBalancer: allocate LB and ingress endpoint
  AzureLoadBalancer->>HostedCluster: annotate Service with InternalLoadBalancerAnnotation
  TestRunner->>ManagementClient: perform OAuth token flow via LB endpoint
  ManagementClient->>HostedCluster: validate token exchange through LB ingress

Suggested reviewers

enxebre
cblecker

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Test Structure And Quality	⚠️ Warning	ValidateOAuthWithIdentityProviderViaLoadBalancer creates htpasswd Secret and mutates HostedCluster OAuth config without cleanup, causing test order-dependency on shared clusters.	Add AfterAll block to restore original OAuth config and delete htpasswd Secret with apierrors.IsNotFound checks during cleanup.
Microshift Test Compatibility	⚠️ Warning	Test uses config.openshift.io/v1, user.openshift.io/v1, and osin.openshift.io/v1 APIs unavailable on MicroShift without protection mechanisms like [apigroup:config.openshift.io] label.	Add [apigroup:config.openshift.io] tag to test name, or add [Skipped:MicroShift] label, or guard with exutil.IsMicroShiftCluster() check.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding an Azure OAuth LoadBalancer private topology e2e test. It is concise, specific, and directly matches the pull request's primary objective.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names are stable and deterministic with no dynamic content; dynamic values correctly appear only in test bodies, not titles.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	The new AzureOAuthLoadBalancerPrivateTest does not make multi-node assumptions; it tests Service creation, annotations, and OAuth token flow which all work on single-node clusters.
Topology-Aware Scheduling Compatibility	✅ Passed	This PR only modifies e2e test files. It does not introduce deployment manifests, operator code, or controllers with scheduling constraints.
Ote Binary Stdout Contract	✅ Passed	PR introduces no OTE stdout contract violations: adds only configuration strings and Ginkgo test functions (inside Context/It blocks), with no process-level stdout writes.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Test uses DNS hostnames validated by regex for URL construction, not raw IP addresses. No hardcoded IPv4/IPv6 addresses or public internet connectivity detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

test/e2e/v2/tests/hosted_cluster_azure_test.go (1)

304-320: ⚡ Quick win

Move platform/topology guard skips to BeforeEach with standardized skip text.

In Line 304-320, the new suite uses BeforeAll and custom skip messages for platform checks. Please switch to BeforeEach and the repository’s required message format for platform-specific skips.

♻️ Suggested adjustment

-		BeforeAll(func() {
+		BeforeEach(func() {
 			testCtx = getTestCtx()
 			hc = testCtx.GetHostedCluster()
 			if hc == nil || hc.Spec.Platform.Type != hyperv1.AzurePlatform {
-				Skip("Azure OAuth LB Private tests are only for Azure platform")
+				Skip("[Azure OAuth LB Private] test is only for Azure platform")
 			}
 			if hc.Spec.Platform.Azure == nil || hc.Spec.Platform.Azure.Topology != hyperv1.AzureTopologyPrivate {
 				Skip("Azure OAuth LB Private tests require Private topology")
 			}

As per coding guidelines, "test/e2e/v2/tests/**/*_test.go: Use BeforeEach with Skip() for platform-specific tests with message format: [Feature] test is only for [Platform] platform".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 304 - 320,
Replace the platform/topology guard block currently inside BeforeAll with a
BeforeEach that performs the same checks (use
getTestCtx()/testCtx.GetHostedCluster(), validate hc.Spec.Platform.Type,
hc.Spec.Platform.Azure.Topology, and
netutil.ServicePublishingStrategyByTypeByHC(hc, hyperv1.OAuthServer)), move
controlPlaneNamespace assignment into this BeforeEach, and call Skip() with the
repository-standard messages: for platform use "Azure OAuth LB Private tests is
only for Azure platform" and for topology use "Azure OAuth LB Private tests
requires Private topology" (also keep the existing Skip for publishing strategy
if present); update references to BeforeAll -> BeforeEach and keep the same
symbol names (getTestCtx, testCtx.GetHostedCluster, controlPlaneNamespace,
netutil.ServicePublishingStrategyByTypeByHC, hyperv1.OAuthServer).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 304-320: Replace the platform/topology guard block currently
inside BeforeAll with a BeforeEach that performs the same checks (use
getTestCtx()/testCtx.GetHostedCluster(), validate hc.Spec.Platform.Type,
hc.Spec.Platform.Azure.Topology, and
netutil.ServicePublishingStrategyByTypeByHC(hc, hyperv1.OAuthServer)), move
controlPlaneNamespace assignment into this BeforeEach, and call Skip() with the
repository-standard messages: for platform use "Azure OAuth LB Private tests is
only for Azure platform" and for topology use "Azure OAuth LB Private tests
requires Private topology" (also keep the existing Skip for publishing strategy
if present); update references to BeforeAll -> BeforeEach and keep the same
symbol names (getTestCtx, testCtx.GetHostedCluster, controlPlaneNamespace,
netutil.ServicePublishingStrategyByTypeByHC, hyperv1.OAuthServer).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4942b228-db2d-42ed-83bf-2dd3738e5e94

📥 Commits

Reviewing files that changed from the base of the PR and between 7625684 and b8a5e30.

📒 Files selected for processing (2)

test/e2e/v2/lifecycle/azure.go
test/e2e/v2/tests/hosted_cluster_azure_test.go

codecov · 2026-05-25T16:03:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.79%. Comparing base (7507291) to head (c274044).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #8584   +/-   ##
=======================================
  Coverage   41.79%   41.79%           
=======================================
  Files         759      759           
  Lines       94037    94037           
=======================================
  Hits        39304    39304           
  Misses      51983    51983           
  Partials     2750     2750

Flag	Coverage Δ
cmd-support	`35.11% <ø> (ø)`
cpo-hostedcontrolplane	`44.10% <ø> (ø)`
cpo-other	`43.45% <ø> (ø)`
hypershift-operator	`51.87% <ø> (ø)`
other	`31.56% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Nirshal · 2026-05-26T13:54:51Z

Re: CodeRabbit suggestion to change BeforeAll to BeforeEach — intentionally keeping BeforeAll here.

This Context uses the Ordered decorator with shared state (testCtx, hc, controlPlaneNamespace) across three sequential It blocks. BeforeAll runs once and initializes the shared variables; BeforeEach would redundantly re-fetch the test context and re-run the guards before every It.

This matches the existing pattern in AzurePrivateTopologyTest (same file, line 91) which also uses Ordered + BeforeAll for the same reason.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/v2/tests/hosted_cluster_azure_test.go`:
- Around line 379-382: The test calls
ValidateOAuthWithIdentityProviderViaLoadBalancer which mutates the HostedCluster
OAuth config and creates an htpasswd Secret without restoring them; capture the
original HostedCluster.Spec.OAuth (and any other mutated fields) and the
existence/data of the htpasswd Secret before calling
ValidateOAuthWithIdentityProviderViaLoadBalancer, then defer a restoration
function that re-applies the original OAuth config and deletes or restores the
htpasswd Secret, and in that deferred cleanup handle apierrors.IsNotFound(error)
when attempting to restore/delete resources so cleanup tolerates resources
already removed; use the HostedCluster object and the Secret name returned/used
by ValidateOAuthWithIdentityProviderViaLoadBalancer to identify what to restore.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: bcc13421-364c-476a-94f1-73cf7c97a172

📥 Commits

Reviewing files that changed from the base of the PR and between b8a5e30 and 5d52eb0.

📒 Files selected for processing (2)

test/e2e/v2/lifecycle/azure.go
test/e2e/v2/tests/hosted_cluster_azure_test.go

coderabbitai · 2026-05-26T14:00:21Z

+		It("should complete OAuth token flow through LoadBalancer endpoint", func() {
+			ctx := testCtx.Context
+			e2eutil.ValidateOAuthWithIdentityProviderViaLoadBalancer(GinkgoTB(), ctx, testCtx.MgmtClient, hc)
+		})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore cluster auth state after OAuth flow mutation.

Line 381 calls ValidateOAuthWithIdentityProviderViaLoadBalancer, which mutates HostedCluster OAuth config and creates an htpasswd Secret without restoring/deleting them. This makes outcomes order-dependent on shared clusters. Please capture original state and defer restoration, with apierrors.IsNotFound() handling during cleanup.

As per coding guidelines: "When a test mutates cluster state, capture original state before mutation and defer restoration; check apierrors.IsNotFound() in cleanup to handle already-deleted resources".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/v2/tests/hosted_cluster_azure_test.go` around lines 379 - 382, The test calls ValidateOAuthWithIdentityProviderViaLoadBalancer which mutates the HostedCluster OAuth config and creates an htpasswd Secret without restoring them; capture the original HostedCluster.Spec.OAuth (and any other mutated fields) and the existence/data of the htpasswd Secret before calling ValidateOAuthWithIdentityProviderViaLoadBalancer, then defer a restoration function that re-applies the original OAuth config and deletes or restores the htpasswd Secret, and in that deferred cleanup handle apierrors.IsNotFound(error) when attempting to restore/delete resources so cleanup tolerates resources already removed; use the HostedCluster object and the Secret name returned/used by ValidateOAuthWithIdentityProviderViaLoadBalancer to identify what to restore.

The mutation happens inside the shared helper ValidateOAuthWithIdentityProviderViaLoadBalancer (in test/e2e/util/oauth.go), which is also used by the existing AzureOAuthLoadBalancerTest (line 284, same file) without cleanup. Adding cleanup in the caller would mean duplicating restore logic that belongs in the helper itself.

This is a pre-existing concern, not introduced by this PR. Noted in the PR description as a known tradeoff. If we address it, it should be done in the shared helper as a follow-up.

@Nirshal, that's a fair point — the concern is pre-existing and your reasoning is correct. Duplicating restore logic in each caller would be the wrong approach; the cleanup belongs inside ValidateOAuthWithIdentityProviderViaLoadBalancer itself so all callers benefit automatically.

Would you like me to open a follow-up GitHub issue to track adding htpasswd Secret cleanup (and OAuth config restoration) to the shared helper?

🧠 Learnings used

Learnt from: cblecker Repo: openshift/hypershift PR: 8511 File: test/e2e/v2/tests/hosted_cluster_metrics_test.go:203-245 Timestamp: 2026-05-15T03:34:00.446Z Learning: When reviewing HyperShift v2 e2e tests under test/e2e/v2/, do not require MicroShift “guard” logic or markers. HyperShift v2 e2e tests in this directory run only against hosted clusters (no MicroShift targets), so missing exutil.IsMicroShiftCluster() checks and/or [Skipped:MicroShift] labels should not be flagged.

Learnt from: bryan-cox Repo: openshift/hypershift PR: 8527 File: test/e2e/v2/tests/nodepool_autoscaling_test.go:236-236 Timestamp: 2026-05-15T14:34:25.062Z Learning: In openshift/hypershift v2 e2e lifecycle CI tests under test/e2e/v2/tests/, it is OK to reference external image registries because the CI environment is not air-gapped. Image references like registry.access.redhat.com/ubi9/ubi-minimal:latest in lifecycle DaemonSet definitions should not be flagged for requiring internal registries. For the autoscaling workload created by newAutoscalingWorkload (nodepool_autoscaling_test.go), ensure it uses registry.k8s.io/pause:3.9 (not ubi-minimal).

Learnt from: bryan-cox Repo: openshift/hypershift PR: 8469 File: test/e2e/util/util_test.go:17-131 Timestamp: 2026-05-17T22:37:02.057Z Learning: In Go e2e tests under test/e2e, do not call t.Parallel() (neither at the top-level test nor in any subtest) when the test (or any helper it calls) uses t.Setenv—for example, when calling azureutil.SetAsAroHCPTest(t). The Go test runner panics with "testing: test using t.Setenv can not use t.Parallel" because environment variables are process-global. Keep such tests serial to ensure the environment setup is safe and deterministic.

Nirshal · 2026-05-26T14:18:42Z

/test ?

Nirshal · 2026-05-26T14:19:01Z

/test ?

Nirshal · 2026-05-26T14:20:23Z

/test e2e-azure-v2-self-managed

cblecker

The test structure itself is solid — good use of Ordered + BeforeAll, thorough skip guards, and the internal LB annotation check is a meaningful addition for private topology coverage.

Two things to address before this can merge:

1. Healthcheck unreachable from CI (blocking): WaitForOAuthLoadBalancerReady dials the OAuth hostname directly via TLS, which resolves to the internal LB IP for private topology — unreachable from the build cluster. The CI run confirmed this (job 2059278516129632256, oauth.go:375, 300s timeout). The token flow step worked; it's only the raw healthcheck that fails. Fix needs to happen in the shared helper.

2. Test isolation on the shared cluster: The public AzureOAuthLoadBalancerTest works because it has a dedicated cluster. This private variant shares a cluster with read-only tests and relies on declaration order for isolation. Either a dedicated cluster, or a sequential matrix step, would give the same guarantee the public variant has.

Minor nit: the Service type + endpoint predicates in the first It block are nearly identical to the public variant (~30 lines). Not blocking, but worth a shared helper if more LB variants get added.

cblecker · 2026-05-29T21:11:09Z

+		It("should complete OAuth token flow through LoadBalancer endpoint", func() {
+			ctx := testCtx.Context
+			e2eutil.ValidateOAuthWithIdentityProviderViaLoadBalancer(GinkgoTB(), ctx, testCtx.MgmtClient, hc)
+		})


The CI run showed this test can't pass as-is — WaitForOAuthLoadBalancerReady (in test/e2e/util/oauth.go:350-375) does a direct TLS dial to https://<oauth-hostname>/healthz. For private topology that hostname resolves to the Azure internal LB IP (e.g., 10.0.0.22), which is unreachable from the CI build cluster.

Interestingly the OAuth token flow itself succeeds because it routes through the guest kubeconfig transport — it's specifically the raw healthcheck that fails. So the underlying validation logic works; it's just the connectivity check that's incompatible with private topology.

The fix needs to happen in the shared helper. A few approaches:

Skip WaitForOAuthLoadBalancerReady entirely for private topology and rely on the token flow validation alone — it already exercises the full OAuth stack end-to-end.

Use the guest kubeconfig transport for the healthcheck too, the same way the token flow step does.

Route it through a pod on the management cluster that has VNet connectivity.

This needs to be addressed in the helper before this can land.

Addressed in c274044. Extracted WaitForOAuthLoadBalancerEndpoint from WaitForOAuthLoadBalancerReady — it waits for the LB endpoint without the direct /healthz TLS dial. The private test now calls WaitForOAuthLoadBalancerEndpoint + ValidateOAuthIdentityProviderFlow (also extracted), skipping the health check entirely while still validating the full OAuth IDP token flow end-to-end.

WaitForOAuthLoadBalancerReady is unchanged in behavior — it now composes WaitForOAuthLoadBalancerEndpoint + the health check, so the public test path is unaffected.

cblecker · 2026-05-29T21:11:19Z

+	AzureOAuthLoadBalancerPrivateTest(getTestCtx)
 }

 var _ = Describe("Hosted Cluster Azure", Label("hosted-cluster-azure"), func() {


The public variant (AzureOAuthLoadBalancerTest) works cleanly because it runs on a dedicated cluster (cluster-name-oauth-lb) created specifically for it — mutation is safe there since nothing else runs on that cluster. This private variant is trying to replicate that pattern but without the dedicated cluster, sharing cluster-name-private with the read-only AzurePrivateTopologyTest instead.

Registering this last is a reasonable mitigation, but declaration order between Context blocks isn't enforced by Ginkgo's Ordered decorator — that only applies within a single Context. If the registration order changes, or a new test gets inserted between the two, the read-only private topology tests could run after the HC has been mutated (htpasswd IDP created, kubeadmin secret deleted).

The cleanest fix that matches how the public variant is structured would be a dedicated cluster-name-oauth-lb-private cluster. I understand the reviewer feedback to reduce cluster count, so if a dedicated cluster is off the table, the other option is to make this a sequential step in the test matrix (after the read-only private tests), the way etcd-chaos runs sequentially after upgrade on the upgrade cluster. That would give the same isolation guarantee without an extra cluster.

Addressed in c274044. Moved the private test group from Parallel to Sequential in the test matrix, following the same pattern as upgrade-and-chaos. The new private-and-oauth sequential group runs read-only private topology tests first, then the mutating OAuth LB test second — guaranteeing isolation regardless of registration order.

Add e2e validation for OAuth LoadBalancer publishing on Azure Private topology. Reuse the existing private cluster variant by adding --oauth-publishing-strategy=LoadBalancer to its creation args, avoiding an additional Azure cluster in CI. Signed-off-by: Alessandro Rossi <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)

Extract WaitForOAuthLoadBalancerEndpoint and ValidateOAuthIdentityProviderFlow helpers from the monolithic WaitForOAuthLoadBalancerReady and ValidateOAuthWithIdentityProviderViaLoadBalancer functions. The private topology test now skips the direct /healthz TLS dial (unreachable on internal LB) while still validating the full OAuth IDP token flow. Move the private test group from Parallel to Sequential in the test matrix, running read-only tests before the mutating OAuth flow to guarantee isolation. Signed-off-by: Alessandro Rossi <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)

openshift-ci · 2026-06-17T12:57:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Nirshal
Once this PR has been reviewed and has the lgtm label, please ask for approval from cblecker. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Nirshal · 2026-06-17T12:58:55Z

/test e2e-azure-v2-self-managed

openshift-ci · 2026-06-17T13:51:48Z

@Nirshal: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-azure-v2-self-managed	`c274044`	link	true	`/test e2e-azure-v2-self-managed`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

hypershift-jira-solve-ci · 2026-06-17T14:06:24Z

Now I have the full root cause. Here's the analysis:

Test Failure Analysis Complete

Job Information

Prow Job: N/A — these are Codecov status checks, not Prow CI jobs
Build ID: N/A
PR: CNTRLPLANE-3277: Add Azure OAuth LoadBalancer private topology e2e test #8584
Commit: c274044
Failed Checks: codecov/project (state: error), codecov/patch (state: error)

Test Failure Analysis

Error

codecov/project: "CI failed." (state: error, 2026-06-17T13:52:43Z)
codecov/patch:   "CI failed." (state: error, 2026-06-17T13:52:44Z)

Summary

The Codecov checks initially passed successfully at 13:05 UTC (CheckRun conclusion: SUCCESS, coverage: "41.79% (+0.00%) compared to 7507291", patch: "Coverage not affected"). Approximately 47 minutes later, at 13:52 UTC, Codecov posted a second round of commit status updates with state error and description "CI failed." — this was triggered exactly ~84 seconds after the ci/prow/e2e-azure-v2-self-managed Prow job posted its failure status at 13:51:19 UTC. This is a Codecov reactive behavior, not a coverage failure. The coverage itself passed; Codecov simply mirrors CI failure state back as an error on its own status checks.

Root Cause

The root cause is not a Codecov or coverage issue — it is a cascading status effect from the ci/prow/e2e-azure-v2-self-managed Prow job failure.

Here is the exact sequence of events:

13:05:07–13:05:08 UTC — Codecov posted CheckRun results: codecov/project = SUCCESS (41.79% coverage, +0.00%), codecov/patch = SUCCESS ("Coverage not affected"). These are the real coverage results.
13:07:52–13:07:53 UTC — Codecov also posted commit StatusContext results confirming success.
13:51:19 UTC — The ci/prow/e2e-azure-v2-self-managed Prow job reported FAILURE.
13:52:43–13:52:44 UTC — Codecov reacted by posting new commit StatusContext entries with state: error and description: "CI failed.". This overwrote the earlier successful StatusContext entries.

The Codecov configuration in .codecov.yml has wait_for_ci: false, which means Codecov reports coverage results before CI completes. However, Codecov still monitors CI status afterward and updates its own commit status to reflect CI failures — this is standard Codecov behavior where it signals "CI failed" as an error on its status checks even though coverage analysis itself succeeded.

GitHub's status API uses a "last status wins" model per context — the later error status at 13:52 overwrites the earlier success status at 13:07 for the same codecov/project and codecov/patch context names. The CheckRun API entries (which show SUCCESS) remain unaffected, creating the apparent contradiction where CheckRuns show success but StatusContexts show error.

These Codecov errors are purely informational and do not indicate any problem with the PR's code or test coverage. The actual failure to investigate is ci/prow/e2e-azure-v2-self-managed.

Recommendations

Ignore the Codecov errors — they are a mirror of the ci/prow/e2e-azure-v2-self-managed failure and carry no independent signal. Coverage is fine at 41.79% with no regression.
Investigate the real failure — the ci/prow/e2e-azure-v2-self-managed job that failed at 13:51:19 UTC. That job's artifacts are at: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_hypershift/8584/pull-ci-openshift-hypershift-main-e2e-azure-v2-self-managed/2067230554738135040
No action needed on Codecov configuration — the wait_for_ci: false setting is correct for fast feedback. The "CI failed" error status is expected Codecov behavior and does not block merge (Prow/tide does not gate on codecov status checks).

Evidence

Evidence	Detail
codecov/project CheckRun	SUCCESS at 13:05:07 UTC, conclusion: success
codecov/patch CheckRun	SUCCESS at 13:05:08 UTC, conclusion: success
codecov/project StatusContext (initial)	`success` at 13:07:52 UTC — "41.79% (+0.00%) compared to `7507291`"
codecov/patch StatusContext (initial)	`success` at 13:07:53 UTC — "Coverage not affected when comparing `7507291...c274044`"
ci/prow/e2e-azure-v2-self-managed	`failure` at 13:51:19 UTC — "Job failed."
codecov/project StatusContext (overwrite)	`error` at 13:52:43 UTC — "CI failed." (84s after Prow failure)
codecov/patch StatusContext (overwrite)	`error` at 13:52:44 UTC — "CI failed." (85s after Prow failure)
.codecov.yml config	`wait_for_ci: false` — reports coverage before CI finishes, but still reacts to CI failures
Coverage impact	Zero — 41.79% project coverage, no change on patch

Nirshal · 2026-06-17T14:59:43Z

CI failure on e2e-azure-v2-self-managed was an infrastructure issue — the hypershift-install step timed out after ~5m due to API server rate limiting (client rate limiter Wait returned an error: context deadline exceeded). The HyperShift operator never deployed, so no test clusters were created and none of our test code ran. Retriggering.

/test e2e-azure-v2-self-managed

cblecker

Both previous blocking items are addressed — thank you for the updates! One minor annotation fix needed for CI tooling.

cblecker

nit: missing [Feature:AzureOAuth] annotation for Sippy mapping

cblecker · 2026-06-17T17:17:00Z

+// The OAuth token flow test requires the test runner to have network connectivity
+// to the Azure VNet (e.g., running in the management cluster or via VPN/peering).
+func AzureOAuthLoadBalancerPrivateTest(getTestCtx internal.TestContextGetter) {
+	Context("Azure OAuth LoadBalancer in Private Topology", Label("Azure", "self-managed-azure-oauth-lb-private"), Ordered, func() {


The Context block is missing the [Feature:AzureOAuth] annotation that the other tests in this file have (e.g., line 230). Per the v2 AGENTS.md section 19, this is needed for Sippy Component Readiness mapping. Should be:

Context("[Feature:AzureOAuth] Azure OAuth LoadBalancer in Private Topology", Label("Azure", "self-managed-azure-oauth-lb-private"), Ordered, func() {

I totally missed that, thanks a lot for finding it!

…uth test The Context block for AzureOAuthLoadBalancerPrivateTest was missing the [Feature:AzureOAuth] Sippy/CR annotation required by v2 AGENTS.md section 19 for test-to-Jira component mapping. Signed-off-by: Alessandro Rossi <[email protected]> Commit-Message-Assisted-by: Claude (via Claude Code)

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 25, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 25, 2026

openshift-ci Bot added the do-not-merge/needs-area label May 25, 2026

openshift-ci Bot added area/platform/azure PR/issue for Azure (AzurePlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels May 25, 2026

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

This was referenced May 26, 2026

OCPBUGS-84577: clear stale EtcdRecoveryActive failure condition when etcd is healthy #8406

Merged

OCPBUGS-84303: fix(api): add IPv6 OVN join subnet config to prevent dual-stack routing collision #8421

Merged

Nirshal force-pushed the CNTRLPLANE-3277 branch from b8a5e30 to 5d52eb0 Compare May 26, 2026 13:51

Nirshal marked this pull request as ready for review May 26, 2026 13:56

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026

openshift-ci Bot requested review from cblecker and sjenning May 26, 2026 13:58

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

cblecker suggested changes May 29, 2026

View reviewed changes

openshift-ci Bot assigned cblecker May 29, 2026

Nirshal added 2 commits June 17, 2026 14:14

Nirshal force-pushed the CNTRLPLANE-3277 branch from 5d52eb0 to c274044 Compare June 17, 2026 12:56

cblecker reviewed Jun 17, 2026

View reviewed changes

Conversation

Nirshal commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented May 25, 2026

Uh oh!

openshift-ci-robot commented May 25, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Uh oh!

openshift-ci Bot commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram(s)

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Nirshal commented May 26, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Nirshal May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Nirshal commented May 26, 2026

Uh oh!

Nirshal commented May 26, 2026

Uh oh!

Nirshal commented May 26, 2026

Uh oh!

cblecker left a comment

Choose a reason for hiding this comment

Uh oh!

cblecker May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Nirshal Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

cblecker May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Nirshal Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

Nirshal commented Jun 17, 2026

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

hypershift-jira-solve-ci Bot commented Jun 17, 2026

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Nirshal commented May 25, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented May 25, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading

codecov Bot commented May 25, 2026 •

edited

Loading

coderabbitai Bot May 26, 2026 •

edited

Loading