Skip to content

CNTRLPLANE-2589: Migrate to OTE support,fix pod security,Pathological Event monitor failures#859

Open
ropatil010 wants to merge 1 commit into
openshift:masterfrom
ropatil010:migrate-e2e-encryption-final
Open

CNTRLPLANE-2589: Migrate to OTE support,fix pod security,Pathological Event monitor failures#859
ropatil010 wants to merge 1 commit into
openshift:masterfrom
ropatil010:migrate-e2e-encryption-final

Conversation

@ropatil010

@ropatil010 ropatil010 commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

Hi Team,

This PR migrates authentication operator e2e tests from library-go to the OpenShift Tests Extension (OTE) framework and fixes multiple CI monitor test failures discovered during migration.

Part 1: OTE Framework Migration

1. Test Structure Refactoring
- **Extracted test logic from test files to separate implementation files** for dual-mode compatibility:
  - `test/e2e-oidc/external_oidc.go` - 1133 lines of test implementation
  - `test/e2e-encryption/encryption.go` - 79 lines of encryption test helpers  
  - `test/e2e-encryption-perf/encryption_perf.go` - 117 lines of performance test helpers
  - `test/e2e-encryption-rotation/encryption_rotation.go` - 85 lines of rotation test helpers
- **Updated test files to use OTE registration**:
  - Tests now accept `testing.TB` instead of `*testing.T` for broader compatibility
  - Test functions can run both standalone (`go test`) and via Ginkgo/OTE framework

2. New Test Library Infrastructure
Created reusable test libraries for better code organization:
- **`test/library/encryption/scenarios.go`** (247 lines) - Common encryption test scenarios
- **`test/library/encryption/perf_helpers.go`** (186 lines) - Performance measurement utilities  
- **`test/library/encryption_wrappers.go`** (39 lines) - OTE-compatible encryption test wrappers
- **`test/library/featuregates.go`** (42 lines) - Feature gate management helpers

3. Code Quality Improvements
- **Fixed golint issues**:
  - Context parameters now first in function signatures
  - Renamed `clientID` → `clientId` for consistent casing
  - Added godoc comments for exported types and functions
- **Improved TestNamespaceBuilder**:
  - Changed from private `testNamespaceBuilder` to public `TestNamespaceBuilder`
  - Added comprehensive documentation
  - **Fixed PSA label handling** (see Fix #1 below)

4. OIDC Test Compatibility Fixes
- **Fixed context handling**: Use `context.WithCancel` instead of `context.TODO()` for proper cleanup
- **Replaced Ginkgo-specific calls**: `g.By()` → `t.Logf()` for dual-mode compatibility

Part 2: CI Monitor Test Failures (Critical Fixes)
After migration, the new e2e-aws-operator-serial-ote job exposed previously hidden monitor test failures.
Fix #1: PodSecurityViolation Monitor Failures ✅ (50% Reduction)

**Problem:**
CI failing with PodSecurityViolation errors:

[Monitor:audit-log-analyzer] [bz-apiserver-auth][invariant] audit analysis PodSecurityViolation
[Monitor:legacy-test-framework-invariants-alerts] alert/PodSecurityViolation should not be at or above info

**Root Cause:**
- Only set `pod-security.kubernetes.io/enforce` label on test namespaces
- Missing `pod-security.kubernetes.io/audit` and `pod-security.kubernetes.io/warn` labels
- OpenShift cluster defaults: audit=restricted, warn=restricted
- Result: Audit logs captured violations even though pods were allowed to run

**Investigation (Live Cluster Testing on 2026-06-11):**
| IDP | Security Mode | Result |
|-----|---------------|--------|
| Keycloak | RESTRICTED | ✅ 1/1 Ready, fully functional |
| Keycloak | PRIVILEGED | ✅ Works but unnecessary |
| GitLab | RESTRICTED | ❌ CrashLoopBackOff - "Permission denied" errors |
| GitLab | PRIVILEGED | ✅ 1/1 Running, fully functional |

**Solution Implemented:**
1. **Fixed PSA label handling** (`test/library/client.go`):
   ```go
   func (b *TestNamespaceBuilder) WithPSaEnforcement(level psapi.Level) *TestNamespaceBuilder {
       b.ns.Labels[psapi.EnforceLevelLabel] = string(level)   // enforce
       b.ns.Labels[psapi.AuditLevelLabel] = string(level)     // audit ← ADDED
       b.ns.Labels[psapi.WarnLevelLabel] = string(level)      // warn ← ADDED
       return b
   }

Now all three PSA labels are set consistently, eliminating audit log violations.

  1. Made IDP security context configurable (test/library/idpdeployment.go):
    • Added createContainerSecurityContext() helper
    • Keycloak uses RESTRICTED mode (violations eliminated ✅)
    • GitLab uses PRIVILEGED mode (required for system services)
      Impact:
  • Before: Both Keycloak and GitLab triggering violations
  • After: Only GitLab triggering violations (unavoidable architectural limitation)
  • Net Result: 50% reduction in PodSecurityViolation failures ✅

Fix #2: Pathological Event Monitor Failures 

Problem:
Encryption tests failing with pathological event errors:

[Monitor:legacy-test-framework-invariants-pathological][sig-arch] 
events should not repeat pathologically for ns/openshift-oauth-apiserver
events should not repeat pathologically for ns/openshift-apiserver

Root Cause:

  • Encryption tests update apiServer.Spec.Encryption → triggers API server rollouts
  • API server rollouts cause temporary disruption events:
    • FailedScheduling (pods can't schedule during rollout)
    • ReadinessProbeFailed (new pods not ready yet)
  • Tests were NOT marked [Disruptive]
  • Monitor framework assumed cluster was "Stable" and flagged disruption events

Solution:
Added [Disruptive] label to all encryption test suites:

- ClusterStability: "",
+ ClusterStability: ote.ClusterStabilityDisruptive,

Applied to:

  • test/e2e-encryption/encryption.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/e2e-encryption-rotation/encryption_rotation.go
  • test/e2e-encryption-kms/encryption_kms.go

Why This Works:

  • OTE framework checks test metadata to determine cluster stability
  • ClusterStabilityDisruptive → framework runs newDisruptiveMonitorTests()
  • Disruptive monitor tests skip pathological event checks
  • Disruption events from API server rollouts are now expected and ignored

Impact: Pathological event failures ELIMINATED


Documented Limitations (Cannot Fix in This PR)
  1. GitLab PodSecurityViolation (Unavoidable)
    Why it remains:
  • GitLab container is an all-in-one package running multiple system services
  • Requires root access for PostgreSQL, Redis, Nginx, Sidekiq, SSH management
  • Restricted mode → CrashLoopBackOff with "Permission denied" errors
  • Conclusion: Fundamental architectural limitation of the GitLab container

Justification:

  • Provides valuable OIDC test coverage for real-world IDP
  • No alternative GitLab image supports restricted mode
  • Net benefit: 50% reduction still achieved (Keycloak fixed)
  1. Unknown Keycloak Image (Separate Issue)
    Failure:
[Monitor:known-image-checker][sig-arch] Only known images used by tests
Error: quay.io/keycloak/keycloak:25.0 not in mirrored registry

Why not fixed here:

  • Requires changes to openshift/origin repository
  • See: test/extended/util/image/README.md in openshift/origin
  • Out of scope for authentication-operator PR

**Future work:**
- Submit PR to openshift/origin to add Keycloak 25.0 to known images list

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 27, 2026
@openshift-ci-robot

openshift-ci-robot commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

@ropatil010: This pull request references CNTRLPLANE-2589 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Hi Team,

This PR restores the Ginkgo v2 migration from PR #839 with a critical fix for the stdout corruption issue that caused it to be reverted in PR #857.

Changes from PR #839:

  • Migrate e2e-encryption, e2e-encryption-kms, e2e-encryption-perf, e2e-encryption-rotation, and e2e-oidc test suites to Ginkgo v2
  • Add testing.TB compatibility layer for library-go integration
  • Create new test implementation files with Ginkgo v2 specs
  • Configure appropriate test suite parallelism and qualifiers
  • Remove obsolete main_test.go files

Critical fix (new):

  • Redirect klog output to stderr to prevent warnings from corrupting JSON output during test listing. Previously, klog warnings were written to stdout, causing 'invalid character W' JSON parsing errors that blocked nightly CI jobs.

The fix adds klog.SetOutput(os.Stderr) in main() to ensure:

  • stdout contains only clean JSON (parseable by openshift-tests)
  • klog warnings still logged to stderr (visible for debugging)

Fixes the issue from PR #857.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from ibihim and liouk March 27, 2026 18:13
@coderabbitai

coderabbitai Bot commented Mar 28, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR refactors and significantly expands encryption and OIDC e2e test suites, extracts shared test implementations into a new helper layer, makes the Keycloak client context-aware across all operations, updates test library signatures for consistent context-first parameter ordering, and hardens pod security for identity provider test deployments.

Changes

Encryption E2E Test Suite

Layer / File(s) Summary
Suite registration and disruptive exclusion
cmd/cluster-authentication-operator-tests-ext/main.go
Blank-imports new encryption and OIDC e2e Ginkgo suites, registers three new serial encryption test suites (basic, rotation, perf) with tag-based filtering, and excludes disruptive tests from the operator serial suite.
Basic encryption test suite
test/e2e-encryption/encryption.go, test/e2e-encryption/encryption_test.go, test/e2e-encryption/main_test.go
Implements Ginkgo specs for identity, unset, and on/off encryption tests delegating to shared helpers; refactors package from e2eencryption to e2e_encryption.
Encryption rotation test suite
test/e2e-encryption-rotation/encryption_rotation.go, test/e2e-encryption-rotation/e2e-encryption-rotation_test.go
Implements Ginkgo serial test validating encryption key rotation using a RotationScenario with custom rotation forcing via unsupported config overrides and token-of-life assertions.
Encryption perf test suite
test/e2e-encryption-perf/encryption_perf.go, test/e2e-encryption-perf/encryption_perf_test.go
Implements AES-CBC performance test measuring migration duration with token creation workload (3 workers), stat collection (token count threshold 14,000), and 10-minute migration time cap.
Shared encryption scenarios and helpers
test/library/encryption/scenarios.go, test/library/encryption/perf_helpers.go, test/library/encryption/helpers.go
Provides TestPerfEncryption, TestEncryptionType dispatcher, provider-specific wrappers (Identity/Unset/AESCBC/AESGCM/KMS), TestEncryptionTurnOnAndOff, SetAndWaitForEncryptionType, async migration-progress watching, and parallel database population with worker management.
Encryption wrapper compatibility layer
test/library/encryption_wrappers.go
Provides testing.TB-accepting wrappers for library-go encryption test functions to support Ginkgo v2 without unsafe type conversion.

OIDC E2E Test Suite

Layer / File(s) Summary
OIDC test implementation with Keycloak integration
test/e2e-oidc/external_oidc.go
Implements comprehensive OIDC e2e flow: feature-gate checking, Keycloak setup, ConfigMap management, admission-time validation for invalid CEL claim mappings, operator degradation scenarios for invalid issuer config, successful OIDC rollout with multiple username/prefix-policy variants, auth-config JSON verification with feature-gated UID mapping, manual edit overwrite behavior, and JWT/JWKS validation helpers for token authentication scenarios.
OIDC test entry point wrapper
test/e2e-oidc/external_oidc_test.go
Converts TestExternalOIDCWithKeycloak from in-file implementation to thin wrapper delegating to shared helper.
Keycloak test context threading
test/e2e/keycloak.go
Updates Keycloak client API calls to thread testContext through client lookup, direct-access-grants enablement, group/user creation, and group removal operations.

Test Library Refactoring and Enhancements

Layer / File(s) Summary
Wait helper signature and parameter reordering
test/library/waits.go
Updates WaitForClusterOperatorStatusStable, WaitForNewKASRollout, WaitForClusterOperatorStatusAlwaysAvailable, and CheckClusterOperatorStatus to accept (ctx context.Context, t testing.TB, ...) instead of (t ..., ctx, ...) for consistent parameter order; refactors WaitForRouteAdmitted error handling.
Keycloak client context-aware refactoring
test/library/keycloakidp.go
Refactors KeycloakClient to be context-aware: adds kc.do(ctx, ...) HTTP helper using http.NewRequestWithContext, updates all admin methods to accept context.Context, implements pagination for ListGroups and ListUsers with first/max, adds DeleteGroup and DeleteUser with ID resolution and success-code flexibility (204/404), and threads context through AddKeycloakIDP.
New test library helpers and utilities
test/library/featuregates.go, test/library/client.go
Adds CheckFeatureGatesOrSkip to skip tests when required feature gates are disabled, updates NewClientConfigForTest error handling with require.NoError, and exports TestNamespaceBuilder fluent builder for namespace creation with pod security and label configuration.

IDP Deployment Security and Cleanup

Layer / File(s) Summary
Pod security hardening and IDP cleanup
test/library/idpdeployment.go
Changes IDP deployment from privileged to restricted pod security admission, removes privileged SCC RoleBinding, hardens container security context (non-root, no privilege escalation, runtime-default seccomp, drop all capabilities), and adds CleanIDPConfigByName helper to remove identity providers from oauth/cluster config.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • openshift/cluster-authentication-operator#839: Both PRs modify cmd/cluster-authentication-operator-tests-ext/main.go to register and import encryption-related OTE/Ginkgo suites (including serial encryption, rotation, and perf), so the changes overlap at the suite wiring level.
  • openshift/cluster-authentication-operator#901: The main PR updates the OIDC e2e test to expect the well-known readiness controller condition type change from WellKnownReadyControllerProgressing to WellKnownReadyControllerDegradationObserved, which directly matches the retrieved PR's readiness-condition refactor.
  • openshift/cluster-authentication-operator#857: Both PRs touch the encryption E2E suite wiring: the main PR adds/blank-imports and registers multiple encryption suites (including new serial basic/rotation/perf qualifiers) in cmd/cluster-authentication-operator-tests-ext/main.go, while the retrieved PR reverts/removes those same blank-import and encryption suite registrations.

Suggested labels

lgtm, verified

Suggested reviewers

  • ibihim

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings, 1 inconclusive)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error test/e2e-oidc/external_oidc.go:134 logs internal infrastructure: t.Logf("keycloak Admin URL: %s", kcClient.AdminURL()) exposes hostname, port, and network topology. Remove line 134: t.Logf("keycloak Admin URL: %s", kcClient.AdminURL()) to prevent exposure of internal infrastructure details.
Docstring Coverage ⚠️ Warning Docstring coverage is 65.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test quality issues: Keycloak group/user created but not cleaned up; fetchIssuerJWKS lacks context; assertions in encryption_perf.go create errors then assert NoError(). Add t.Cleanup() for Keycloak resources. Add context to fetchIssuerJWKS. Fix assertion logic that creates errors then requires them to be nil.
Microshift Test Compatibility ⚠️ Warning New Ginkgo e2e tests use unavailable OpenShift APIs (ClusterOperator, FeatureGates, Authentication, APIServerEncryption) without MicroShift protection tags ([apigroup:...] or [Skipped:MicroShift]). Add [apigroup:config.openshift.io] and [apigroup:operator.openshift.io] tags to test names, or use [Skipped:MicroShift] labels. MicroShift only supports Route and SecurityContextConstraints APIs.
Title check ❓ Inconclusive The PR title is vague and uses non-specific language (e.g. 'fix pod security', 'Pathological Event monitor failures') that obscures the primary objective of migrating e2e tests to Ginkgo v2 and OTE support. Revise the title to clearly highlight the main change: 'Migrate e2e encryption tests to Ginkgo v2 and OTE support' or 'Migrate e2e-encryption suites to Ginkgo v2 with OTE framework integration'.
✅ Passed checks (10 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test titles in the PR are stable and deterministic with no dynamic content (pod names, timestamps, UUIDs, namespaces, IPs) and are descriptive of the test functionality.
Single Node Openshift (Sno) Test Compatibility ✅ Passed All new Ginkgo e2e tests lack multi-node assumptions; they test APIServer encryption and OIDC auth which work on SNO single-node clusters without node scheduling or HA requirements.
Topology-Aware Scheduling Compatibility ✅ Passed PR is test-only refactoring. No deployment manifests, operator code, or controllers were modified; topology-aware scheduling check does not apply.
Ote Binary Stdout Contract ✅ Passed klog properly redirected to stderr via klog.LogToStderr(true) in main(); no process-level fmt.Print* or log.Print* calls; test packages use safe Ginkgo v2 patterns.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed New Ginkgo e2e tests contain no hardcoded IPv4 addresses, IPv4-only parsing, or external internet connectivity. OIDC test deploys Keycloak in-cluster and accesses only cluster-internal endpoints.
No-Weak-Crypto ✅ Passed PR uses only standard Go crypto libraries (crypto/rsa, crypto/tls, golang-jwt/v5) with no weak algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto, or unsafe token comparisons.
Container-Privileges ✅ Passed PR removes privileged mode and enforces restricted security contexts (AllowPrivilegeEscalation: false, RunAsNonRoot: true, RuntimeDefault seccomp) in all modified test files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from debcb9a to 9541074 Compare March 30, 2026 02:49
@ropatil010

Copy link
Copy Markdown
Contributor Author

Before fix in CI logs we were hitting the same error which was mentioned in the Reverting PR: #857

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6/2037594585412145152

After fix the CI logs got pass

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6/2038524461480677376

/assign @liouk @stbenjam
Discussion: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1774601343401799

@ropatil010 ropatil010 changed the title CNTRLPLANE-2589: [WIP] Migrate e2e-encryption tests to Ginkgo v2 with stdout fix CNTRLPLANE-2589: Migrate e2e-encryption tests to Ginkgo v2 with stdout fix Mar 30, 2026
Comment thread pkg/controllers/readiness/wellknown_ready_controller.go Outdated
Comment thread test/library/client.go Outdated
Comment thread test/library/encryption/helpers.go Outdated
Comment thread pkg/controllers/readiness/wellknown_ready_controller.go Outdated
@liouk

liouk commented Apr 2, 2026

Copy link
Copy Markdown
Member

@ropatil010 please correct me if I'm wrong, but this issue seems that it's caused by OTE itself, is there a plan on fixing that on the OTE side?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
test/e2e-oidc/external_oidc.go (1)

441-470: Consider extending algorithm support for future-proofing.

The extractRSAPubKeyFunc only handles RS256. While this matches current Keycloak defaults, if the IDP configuration changes to use RS384/RS512, tests would fail with an unhelpful error. This is acceptable for now but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-oidc/external_oidc.go` around lines 441 - 470, extractRSAPubKeyFunc
currently only accepts "RS256" and returns an error for other RSA algorithms;
update it to accept other RSA signing algorithms (e.g., "RS384", "RS512") or any
algorithm that starts with "RS" so the JWKS parsing is future-proof: modify the
switch on key.Alg in extractRSAPubKeyFunc (and the related issuerJWKS/Key usage)
to handle "RS384" and "RS512" (or use a strings.HasPrefix(key.Alg, "RS") check)
and keep constructing the rsa.PublicKey the same way, while keeping the
unexpected-algorithm error for non-RS algorithms. Ensure the error messages
still include key.KID and key.Alg for clarity.
test/e2e-encryption-perf/encryption_perf.go (1)

55-66: Simplify error handling with direct t.Fatal calls.

The pattern of creating an error and immediately passing it to require.NoError is confusing. Since the error is always non-nil when created, require.NoError will always fail, making this equivalent to t.Fatal but harder to read.

♻️ Proposed simplification
 		AssertDBPopulatedFunc: func(t testing.TB, errorStore map[string]int, statStore map[string]int) {
 			tokenCount, ok := statStore[tokenStatsKey]
 			if !ok {
-				err := errors.New("missing oauth access tokens count stats, can't continue the test")
-				require.NoError(t, err)
+				t.Fatal("missing oauth access tokens count stats, can't continue the test")
 			}
 			if tokenCount < 14000 {
-				err := fmt.Errorf("expected to create at least 14000 tokens but %d were created", tokenCount)
-				require.NoError(t, err)
+				t.Fatalf("expected to create at least 14000 tokens but %d were created", tokenCount)
 			}
 			t.Logf("Created %d access tokens", tokenCount)
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-perf/encryption_perf.go` around lines 55 - 66, In
AssertDBPopulatedFunc simplify error handling by replacing the pattern that
constructs an error and calls require.NoError with direct test failures: when
statStore[tokenStatsKey] is missing call t.Fatalf (or require.FailNow) with a
clear message about the missing oauth access tokens count, and when tokenCount <
14000 call t.Fatalf (or require.FailNow) with a message showing the expected vs
actual tokenCount; update references around AssertDBPopulatedFunc and
tokenStatsKey accordingly.
test/e2e-encryption-rotation/encryption_rotation.go (1)

47-75: Consider adding retry logic for the operator update.

The UnsupportedConfigFunc performs a read-modify-write operation without handling potential conflict errors. If the operator object is modified between the Get (line 49) and Update (line 73), the update will fail with a conflict. For a 3-hour test, this could cause intermittent failures.

If conflicts are expected to be rare in this test context, the current implementation is acceptable, but retry-on-conflict would improve reliability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-rotation/encryption_rotation.go` around lines 47 - 75,
Wrap the read-modify-write in UnsupportedConfigFunc with retry-on-conflict logic
(e.g., use k8s.io/client-go/util/retry.RetryOnConflict) so that if
cs.OperatorClient.Update returns a conflict you re-get the Operator
(cs.OperatorClient.Get), reapply the unsupportedConfig merge steps and retry the
Update; ensure the loop returns the final error if retries are exhausted and
keep references to UnsupportedConfigFunc, cs.OperatorClient.Get, and
cs.OperatorClient.Update to locate the change.
test/e2e-encryption/encryption.go (1)

54-74: Consider reusing the provided ClientSet parameter.

The CreateResourceFunc callback receives a library.ClientSet parameter but ignores it, creating a new client via operatorencryption.GetClients(t). If this is intentional (e.g., needing operator-specific client configuration), adding a brief comment would clarify the design choice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption/encryption.go` around lines 54 - 74, The
CreateResourceFunc inside testEncryptionTurnOnAndOff ignores the provided
library.ClientSet parameter and calls operatorencryption.GetClients(t); either
update CreateResourceFunc to use the passed-in client (the second parameter)
when calling operatorencryption.CreateAndStoreTokenOfLife (or adapt/convert the
library.ClientSet to the expected client type), or if using GetClients(t) is
intentional, add a brief comment above CreateResourceFunc explaining why
operatorencryption.GetClients(t) is required instead of the supplied client;
reference CreateResourceFunc, operatorencryption.CreateAndStoreTokenOfLife, and
operatorencryption.GetClients for locating the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Around line 136-137: The go.mod replace pointing to the forked module
(github.com/ropatil010/openshift-tests-extension
v0.0.0-20260406095036-3a490698dab7) introduces supply-chain and reproducibility
risk; update the repository by adding a clear TODO comment in go.mod and create
a tracking issue linking upstream PR `#60` ("feat: configure klog to write to
stderr") so we revert this replace as soon as that PR is merged, then remove the
replace and restore github.com/openshift-eng/openshift-tests-extension; also add
a CI/release gate (or note in the release checklist) to block merging release
candidates until the follow-up removal is completed.

In `@test/e2e-encryption-rotation/encryption_rotation.go`:
- Around line 27-28: Fix the typo in the comment above testEncryptionRotation:
change "encyrption.Reason" to "encryption.Reason" in the comment that describes
forcing a key rotation so the term is spelled correctly and matches the
configuration field name.

In `@test/library/encryption/perf_helpers.go`:
- Around line 35-46: The predicate passed to wait.PollUntilContextTimeout
currently sends any EncryptionMigrationControllerProgressing=True timestamp even
if it predates the test and it never checks the cond.Reason; update the
predicate used by wait.PollUntilContextTimeout in perf_helpers.go (the anonymous
func that calls getOperatorConditionsFn and writes to migrationStartedCh) to
also require cond.Reason == "Migrating" and that cond.LastTransitionTime is
strictly after a test start time captured before the poll (e.g., record start :=
time.Now() or start := time.Now().UTC() before calling
wait.PollUntilContextTimeout, then only send cond.LastTransitionTime.Time if
cond.Type == "EncryptionMigrationControllerProgressing" && cond.Status ==
operatorv1.ConditionTrue && cond.Reason == "Migrating" &&
cond.LastTransitionTime.Time.After(start)). Ensure the log message remains
meaningful when the newer transition is observed.

In `@test/library/encryption/scenarios.go`:
- Around line 189-210: The test uses a non-blocking select to read
migrationStartedCh which can fall through before the watcher goroutine sends;
change this to wait for the signal with a bounded timeout instead of using the
default case: replace the select block reading migrationStartedCh with a
blocking receive combined with a timeout (e.g., select { case migrationStarted
:= <-migrationStartedCh: ... case <-time.After(timeout): tb.Error(...) }) or use
a context with deadline that you cancel when runTestEncryptionPerf returns;
ensure you reference the watcher by keeping migrationStartedCh passed to
watchForMigrationControllerProgressingConditionAsync and use that same channel
for the bounded wait so the watcher is not left running indefinitely.
- Around line 72-83: The switch in TestEncryptionType currently routes the
empty-string provider ("") to TestEncryptionTypeIdentity which causes
APIServer.Spec.Encryption.Type to be set to "identity"; change the dispatch so
that case configv1.EncryptionTypeIdentity handles the explicit identity value
and the empty-string case ("") dispatches to TestEncryptionTypeUnset instead.
Update the switch in the TestEncryptionType function (the cases for
configv1.EncryptionTypeIdentity and "") so that "" calls
TestEncryptionTypeUnset(tb, scenario) while leaving the explicit Identity case
calling TestEncryptionTypeIdentity(tb, scenario).
- Around line 340-349: The APIServer Get+Update can fail with resourceVersion
conflicts; wrap the read-modify-update of apiServer (performed via
libClientSet.ApiServerConfig.Get and Update) inside a retry.RetryOnConflict
loop: import "k8s.io/client-go/util/retry", use retry.RetryOnConflict to
repeatedly Get the APIServer, set apiServer.Spec.Encryption.Type =
encryptionType when changed, and call Update until it succeeds or the retry
gives up; ensure errors from Update are returned/checked with
require.NoError(tb, err) as before.

---

Nitpick comments:
In `@test/e2e-encryption-perf/encryption_perf.go`:
- Around line 55-66: In AssertDBPopulatedFunc simplify error handling by
replacing the pattern that constructs an error and calls require.NoError with
direct test failures: when statStore[tokenStatsKey] is missing call t.Fatalf (or
require.FailNow) with a clear message about the missing oauth access tokens
count, and when tokenCount < 14000 call t.Fatalf (or require.FailNow) with a
message showing the expected vs actual tokenCount; update references around
AssertDBPopulatedFunc and tokenStatsKey accordingly.

In `@test/e2e-encryption-rotation/encryption_rotation.go`:
- Around line 47-75: Wrap the read-modify-write in UnsupportedConfigFunc with
retry-on-conflict logic (e.g., use k8s.io/client-go/util/retry.RetryOnConflict)
so that if cs.OperatorClient.Update returns a conflict you re-get the Operator
(cs.OperatorClient.Get), reapply the unsupportedConfig merge steps and retry the
Update; ensure the loop returns the final error if retries are exhausted and
keep references to UnsupportedConfigFunc, cs.OperatorClient.Get, and
cs.OperatorClient.Update to locate the change.

In `@test/e2e-encryption/encryption.go`:
- Around line 54-74: The CreateResourceFunc inside testEncryptionTurnOnAndOff
ignores the provided library.ClientSet parameter and calls
operatorencryption.GetClients(t); either update CreateResourceFunc to use the
passed-in client (the second parameter) when calling
operatorencryption.CreateAndStoreTokenOfLife (or adapt/convert the
library.ClientSet to the expected client type), or if using GetClients(t) is
intentional, add a brief comment above CreateResourceFunc explaining why
operatorencryption.GetClients(t) is required instead of the supplied client;
reference CreateResourceFunc, operatorencryption.CreateAndStoreTokenOfLife, and
operatorencryption.GetClients for locating the change.

In `@test/e2e-oidc/external_oidc.go`:
- Around line 441-470: extractRSAPubKeyFunc currently only accepts "RS256" and
returns an error for other RSA algorithms; update it to accept other RSA signing
algorithms (e.g., "RS384", "RS512") or any algorithm that starts with "RS" so
the JWKS parsing is future-proof: modify the switch on key.Alg in
extractRSAPubKeyFunc (and the related issuerJWKS/Key usage) to handle "RS384"
and "RS512" (or use a strings.HasPrefix(key.Alg, "RS") check) and keep
constructing the rsa.PublicKey the same way, while keeping the
unexpected-algorithm error for non-RS algorithms. Ensure the error messages
still include key.KID and key.Alg for clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a512169-a204-4df2-92e6-760ec43345c5

📥 Commits

Reviewing files that changed from the base of the PR and between 9d60ca2 and 626f7d7.

⛔ Files ignored due to path filters (17)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/cmd/cmd.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/cmd/cmdrun/runsuite.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/extension/extensiontests/result.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/extension/extensiontests/result_writer.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/extension/extensiontests/viewer.html is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift-eng/openshift-tests-extension/pkg/ginkgo/logging.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/README.md is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_no_slog.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_slog.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/klog.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/klogr.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/klogr_slog.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/textlogger/options.go is excluded by !vendor/**, !**/vendor/**
  • vendor/k8s.io/klog/v2/textlogger/textlogger.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (22)
  • cmd/cluster-authentication-operator-tests-ext/main.go
  • go.mod
  • pkg/controllers/readiness/wellknown_ready_controller.go
  • test/e2e-encryption-kms/encryption_kms.go
  • test/e2e-encryption-kms/encryption_kms_test.go
  • test/e2e-encryption-kms/main_test.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/e2e-encryption-perf/encryption_perf_test.go
  • test/e2e-encryption-rotation/e2e-encryption-rotation_test.go
  • test/e2e-encryption-rotation/encryption_rotation.go
  • test/e2e-encryption/encryption.go
  • test/e2e-encryption/encryption_test.go
  • test/e2e-encryption/main_test.go
  • test/e2e-oidc/external_oidc.go
  • test/e2e-oidc/external_oidc_test.go
  • test/library/client.go
  • test/library/encryption/helpers.go
  • test/library/encryption/perf_helpers.go
  • test/library/encryption/scenarios.go
  • test/library/encryption_wrappers.go
  • test/library/featuregates.go
  • test/library/waits.go
💤 Files with no reviewable changes (2)
  • test/e2e-encryption/main_test.go
  • test/e2e-encryption-kms/main_test.go

Comment thread go.mod Outdated
Comment thread test/e2e-encryption-rotation/encryption_rotation.go Outdated
Comment thread test/library/encryption/perf_helpers.go Outdated
Comment thread test/library/encryption/scenarios.go Outdated
Comment thread test/library/encryption/scenarios.go Outdated
Comment thread test/library/encryption/scenarios.go Outdated
@ropatil010 ropatil010 changed the title CNTRLPLANE-2589: Migrate e2e-encryption tests to Ginkgo v2 with stdout fix CNTRLPLANE-2589: [WIP] Migrate e2e-encryption tests to Ginkgo v2 with stdout fix Apr 6, 2026
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from 4342519 to 8f115d7 Compare April 7, 2026 14:45
ropatil010 added a commit to ropatil010/cluster-authentication-operator that referenced this pull request Apr 7, 2026
This commit migrates five test suites (e2e-encryption, e2e-encryption-kms,
e2e-encryption-perf, e2e-encryption-rotation, e2e-oidc) from standalone
test binaries to the OpenShift Tests Extension (OTE) framework using Ginkgo v2.

Key changes:
- Migrate tests to Ginkgo v2 with OTE integration
- Extract test logic into dedicated helper files for better organization
- Remove obsolete main_test.go files in favor of Ginkgo suite setup
- Configure appropriate test suite parallelism and qualifiers
- Add klog stderr redirection to fix JSON output corruption

Critical fix for klog output corruption:
- Add klog.SetOutput(os.Stderr) in test binary's main() to prevent klog
  warnings from corrupting JSON output during 'openshift-tests list' commands
- This fixes "invalid character 'W'" parsing errors in CI that caused PR openshift#839
  to be reverted in PR openshift#857
- Ensures stdout contains only clean JSON for CI consumption while klog
  warnings go to stderr

Test organization:
- operator/parallel: Non-serial operator, templates, and tokens tests (parallelism: 4)
- operator/serial: Serial operator, OIDC, templates, and tokens tests
- operator/disruptive: Tests marked as [Disruptive]
- operator-encryption/serial: Basic encryption tests
- operator-encryption-rotation/serial: Encryption key rotation tests
- operator-encryption-perf/serial: Encryption performance tests
- operator-encryption-kms/serial: KMS encryption tests

Dependencies:
- Use standard OTE version v0.0.0-20251205182537-ff5553e56f33
- Updated klog dependency to v2.140.0
- No fork dependencies

See: openshift#857 (revert)
     openshift#859 (this PR)

Co-Authored-By: Rohit Patil <[email protected]>
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from 8f115d7 to 6c90e52 Compare April 7, 2026 15:19
ropatil010 added a commit to ropatil010/cluster-authentication-operator that referenced this pull request Apr 8, 2026
Restore the init package approach to fix 'invalid character W looking for
beginning of value' errors in CI when listing tests.

The previous fix using klog.SetOutput(os.Stderr) in main() was too late -
klog warnings were already generated during Go's init phase, corrupting the
JSON output before main() could run.

This commit implements a two-phase approach:

1. Init phase (pkg/test/init): Suppress all klog output using io.Discard
   - Runs during Go initialization, before any dependency can trigger warnings
   - Prevents warnings from appearing in stdout during 'openshift-tests list'

2. Main phase: Reconfigure klog to stderr for normal test execution
   - Enables proper logging after initialization completes
   - Keeps stdout clean for JSON output

This is the same approach that made commit 9541074 pass in CI, but was
inadvertently removed in commit 6c90e52, causing test failures.

Fixes the root cause of CI failures in PR openshift#859
See: openshift#857

Co-Authored-By: Rohit Patil <[email protected]>
Comment thread cmd/cluster-authentication-operator-tests-ext/main.go Outdated
@ropatil010

Copy link
Copy Markdown
Contributor Author

Waiting for PR: openshift/origin#31009 to gets merged which fixes the issue "error: encountered errors while listing tests: invalid character 'W' looking for beginning of value".

The above mentioned issue has been fixed in this PR using init things but looks like it is not acceptable.

@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from 753ca4a to aa529ed Compare April 28, 2026 05:41
@openshift-ci-robot

openshift-ci-robot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

@ropatil010: This pull request references CNTRLPLANE-2589 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Hi Team,

This PR restores the Ginkgo v2 migration from PR #839 with a critical fix for the stdout corruption issue that caused it to be reverted in PR #857.

Changes from PR #839:

  • Migrate e2e-encryption, e2e-encryption-kms, e2e-encryption-perf, e2e-encryption-rotation, and e2e-oidc test suites to Ginkgo v2
  • Add testing.TB compatibility layer for library-go integration
  • Create new test implementation files with Ginkgo v2 specs
  • Configure appropriate test suite parallelism and qualifiers
  • Remove obsolete main_test.go files

Critical fix (new):

  • Redirect klog output to stderr to prevent warnings from corrupting JSON output during test listing. Previously, klog warnings were written to stdout, causing 'invalid character W' JSON parsing errors that blocked nightly CI jobs.

The fix adds klog.SetOutput(os.Stderr) in main() to ensure:

  • stdout contains only clean JSON (parseable by openshift-tests)
  • klog warnings still logged to stderr (visible for debugging)

Fixes the issue from PR #857.

Summary by CodeRabbit

  • Tests

  • Added comprehensive E2E test suites for KMS encryption scenarios, external OIDC (Keycloak) authentication, encryption rotation, and encryption performance testing.

  • Refactored encryption tests to use shared helper functions for improved maintainability.

  • Enhanced test infrastructure with feature gate validation and migration monitoring utilities.

  • Removed custom test execution order randomization.

  • Chores

  • Updated logging-related dependency k8s.io/klog/v2 to v2.140.0.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
test/library/encryption/scenarios.go (3)

340-349: ⚠️ Potential issue | 🟠 Major

Retry the APIServer update on conflict.

This is still a single read-modify-write against apiserver/cluster. If anything updates the object between Get and Update, the test fails on a transient resourceVersion conflict even though retrying would succeed. Wrap the mutation in retry.RetryOnConflict here.

Suggested fix
+	"k8s.io/client-go/util/retry"
-	apiServer, err := libClientSet.ApiServerConfig.Get(context.TODO(), "cluster", metav1.GetOptions{})
-	require.NoError(tb, err)
-
-	// Update encryption type if needed
-	needsUpdate := apiServer.Spec.Encryption.Type != encryptionType
-	if needsUpdate {
-		tb.Logf("Updating encryption type in the config file for APIServer to %q", encryptionType)
-		apiServer.Spec.Encryption.Type = encryptionType
-		_, err = libClientSet.ApiServerConfig.Update(context.TODO(), apiServer, metav1.UpdateOptions{})
-		require.NoError(tb, err)
+	apiServer, err := libClientSet.ApiServerConfig.Get(context.TODO(), "cluster", metav1.GetOptions{})
+	require.NoError(tb, err)
+
+	needsUpdate := apiServer.Spec.Encryption.Type != encryptionType
+	if needsUpdate {
+		tb.Logf("Updating encryption type in the config file for APIServer to %q", encryptionType)
+		err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
+			apiServer, err = libClientSet.ApiServerConfig.Get(context.TODO(), "cluster", metav1.GetOptions{})
+			if err != nil {
+				return err
+			}
+			apiServer.Spec.Encryption.Type = encryptionType
+			_, err = libClientSet.ApiServerConfig.Update(context.TODO(), apiServer, metav1.UpdateOptions{})
+			return err
+		})
+		require.NoError(tb, err)
 	} else {
 		tb.Logf("APIServer is already configured to use %q mode", encryptionType)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/library/encryption/scenarios.go` around lines 340 - 349, The update to
the APIServer object does a single read-modify-write and can fail on
resourceVersion conflicts; wrap the mutation+Update call in a
retry.RetryOnConflict loop so it re-reads and retries on conflict. Locate the
block that calls libClientSet.ApiServerConfig.Get and then sets
apiServer.Spec.Encryption.Type and calling libClientSet.ApiServerConfig.Update,
and replace that update path with a retry.RetryOnConflict that fetches the
latest ApiServer, applies the same encryptionType mutation to
apiServer.Spec.Encryption.Type, and retries the Update until it succeeds or a
non-conflict error occurs.

204-210: ⚠️ Potential issue | 🟠 Major

Don't use a non-blocking receive for migrationStartedCh.

If the watcher goroutine sends a moment later, this default path reports a false failure and never computes the migration duration. Wait on the same channel with a bounded timeout or cancelable context instead of falling through immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/library/encryption/scenarios.go` around lines 204 - 210, The
non-blocking select on migrationStartedCh can cause false failures; replace the
select/default with a blocking receive that waits with a bounded timeout or a
cancelable context so the test actually observes the watcher send. Specifically,
wait for migrationStartedCh (instead of the default path) and compute
migrationTime = endTimeStamp.Sub(migrationStarted) before calling tb.Logf and
scenario.AssertMigrationTime; if the timeout/context expires, call tb.Error with
a clear timeout message. Use the existing symbols migrationStartedCh,
endTimeStamp, tb.Logf, and scenario.AssertMigrationTime to locate and update the
code.

73-81: ⚠️ Potential issue | 🟠 Major

Dispatch "" to TestEncryptionTypeUnset.

The empty-string provider still falls through to TestEncryptionTypeIdentity, which writes "identity" into APIServer.Spec.Encryption.Type instead of exercising the unset/defaulting path that this helper already models separately.

Suggested fix
 	switch provider {
 	case configv1.EncryptionTypeAESCBC:
 		TestEncryptionTypeAESCBC(tb, scenario)
 	case configv1.EncryptionTypeAESGCM:
 		TestEncryptionTypeAESGCM(tb, scenario)
 	case configv1.EncryptionTypeKMS:
 		TestEncryptionTypeKMS(tb, scenario)
-	case configv1.EncryptionTypeIdentity, "":
+	case configv1.EncryptionTypeIdentity:
 		TestEncryptionTypeIdentity(tb, scenario)
+	case "":
+		TestEncryptionTypeUnset(tb, scenario)
 	default:
 		tb.Fatalf("Unknown encryption type: %s", provider)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/library/encryption/scenarios.go` around lines 73 - 81, The switch in the
dispatcher incorrectly routes the empty-string provider to
TestEncryptionTypeIdentity; update the case handling so that
configv1.EncryptionTypeIdentity maps only to TestEncryptionTypeIdentity and the
empty string ("") maps to TestEncryptionTypeUnset. Locate the switch in
TestEncryption scenarios (the switch on provider in scenarios.go) and replace
the combined case `case configv1.EncryptionTypeIdentity, "":
TestEncryptionTypeIdentity(tb, scenario)` with two separate cases: one calling
TestEncryptionTypeIdentity(tb, scenario) for configv1.EncryptionTypeIdentity and
one calling TestEncryptionTypeUnset(tb, scenario) for the "" provider; leave the
other cases (TestEncryptionTypeAESCBC, TestEncryptionTypeAESGCM,
TestEncryptionTypeKMS) unchanged.
test/library/encryption/perf_helpers.go (1)

35-46: ⚠️ Potential issue | 🟠 Major

Only capture a fresh "Migrating" transition.

This still records any EncryptionMigrationControllerProgressing=True condition, even if it predates the current test or has a different reason. That can inflate the reported migration duration or use a stale start time. Capture a start := time.Now() before polling and only send LastTransitionTime when the condition is True, Reason == "Migrating", and the transition is newer than start.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/library/encryption/perf_helpers.go` around lines 35 - 46, The poll
currently sends any EncryptionMigrationControllerProgressing condition with
Status True, possibly using a stale LastTransitionTime; fix by recording a start
:= time.Now() immediately before calling wait.PollUntilContextTimeout and inside
the poll only send cond.LastTransitionTime to migrationStartedCh when cond.Type
== "EncryptionMigrationControllerProgressing", cond.Status ==
operatorv1.ConditionTrue, cond.Reason == "Migrating" and
cond.LastTransitionTime.Time.After(start); use getOperatorConditionsFn to fetch
conditions and ensure you return true,nil only when the transition is newer than
start.
🧹 Nitpick comments (3)
pkg/controllers/readiness/wellknown_ready_controller.go (1)

355-358: Optional: cache KAS service port once per function for readability.

Line 357 is correct, but pulling getKASServicePort() out of the loop makes intent clearer and avoids repeated calls in the condition.

♻️ Proposed refactor
 func getKASTargetPortFromService(service *corev1.Service) (int, bool) {
+	kasPort := getKASServicePort()
 	for _, port := range service.Spec.Ports {
-		if targetPort := port.TargetPort.IntValue(); targetPort != 0 && port.Protocol == corev1.ProtocolTCP && int(port.Port) == getKASServicePort() {
+		if targetPort := port.TargetPort.IntValue(); targetPort != 0 && port.Protocol == corev1.ProtocolTCP && int(port.Port) == kasPort {
 			return targetPort, true
 		}
 	}
 	return 0, false
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/readiness/wellknown_ready_controller.go` around lines 355 -
358, The loop in getKASTargetPortFromService repeatedly calls
getKASServicePort() inside the condition which obscures intent; cache the result
once before iterating by assigning kasPort := getKASServicePort(), then use
kasPort in the loop condition (referencing getKASTargetPortFromService and
getKASServicePort) so the code is clearer and avoids repeated calls.
test/e2e-encryption-perf/encryption_perf.go (2)

55-65: Consider using t.Fatalf directly instead of require.NoError with fabricated errors.

The current pattern creates an error just to pass it to require.NoError, which is indirect. Using t.Fatalf would be more idiomatic:

♻️ Suggested simplification
 AssertDBPopulatedFunc: func(t testing.TB, errorStore map[string]int, statStore map[string]int) {
 	tokenCount, ok := statStore[tokenStatsKey]
 	if !ok {
-		err := errors.New("missing oauth access tokens count stats, can't continue the test")
-		require.NoError(t, err)
+		t.Fatalf("missing oauth access tokens count stats, can't continue the test")
 	}
 	if tokenCount < 14000 {
-		err := fmt.Errorf("expected to create at least 14000 tokens but %d were created", tokenCount)
-		require.NoError(t, err)
+		t.Fatalf("expected to create at least 14000 tokens but %d were created", tokenCount)
 	}
 	t.Logf("Created %d access tokens", tokenCount)
 },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-perf/encryption_perf.go` around lines 55 - 65, The
AssertDBPopulatedFunc currently fabricates errors and calls require.NoError;
change those to direct test failures using t.Fatalf: when tokenStatsKey is
missing, call t.Fatalf("missing oauth access tokens count stats, can't continue
the test"); when tokenCount < 14000, call t.Fatalf("expected to create at least
14000 tokens but %d were created", tokenCount). Update the block in
AssertDBPopulatedFunc (which reads tokenCount, ok := statStore[tokenStatsKey])
to remove the temporary err variables and use t.Fatalf with the appropriate
messages.

81-97: Unused namespace parameter in the returned function.

The namespace parameter on line 82 is unused. If this is intentional (OAuth access tokens are cluster-scoped), consider using _ to make it explicit that it's intentionally ignored for clarity, consistent with the other unused parameters.

♻️ Suggested clarification
 func createAccessTokenWrapper(ctx context.Context, tokenClient oauthclient.OAuthAccessTokensGetter) library.DBLoaderFuncType {
-	return func(_ kubernetes.Interface, namespace string, errorCollector func(error), statsCollector func(string)) error {
+	return func(_ kubernetes.Interface, _ string, errorCollector func(error), statsCollector func(string)) error {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-perf/encryption_perf.go` around lines 81 - 97, The
returned anonymous function in createAccessTokenWrapper currently declares a
parameter named namespace that is never used; update the signature of the
returned func (in createAccessTokenWrapper) to replace the unused namespace
parameter with _ (e.g., func(_ kubernetes.Interface, _ string, errorCollector
func(error), statsCollector func(string)) error) so it's explicit the namespace
is intentionally ignored while keeping the rest of the behavior the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e-oidc/external_oidc.go`:
- Line 75: The suite is being gated incorrectly: change the
CheckFeatureGatesOrSkip call so it only requires FeatureGateExternalOIDC (i.e.,
call CheckFeatureGatesOrSkip(t, testCtx, testClient.configClient,
features.FeatureGateExternalOIDC) instead of passing both gates), and remove
FeatureGateExternalOIDCWithAdditionalClaimMappings from the top-level gate; for
tests that need the additional-claim behavior, add local calls to
CheckFeatureGatesOrSkip (or explicit checks) inside those specific subtests
before exercising code paths that depend on
FeatureGateExternalOIDCWithAdditionalClaimMappings; update references around
CheckFeatureGatesOrSkip and any subtest bodies that currently assume the extra
gate is globally enabled.
- Around line 111-118: The cleanup currently always calls
test.WaitForNewKASRollout after testClient.authResourceRollback even when
authResourceRollback was a no-op; change the cleanup closure (around
kasOriginalRevision, testClient.authResourceRollback, and
test.WaitForNewKASRollout) to detect whether the rollback actually persisted a
change and only call WaitForNewKASRollout when it did. Implement this by either:
(A) updating authResourceRollback to return an indicator (e.g., bool changed) in
addition to error and branch on that, or (B) after authResourceRollback, fetch
the current authentication spec (same shape as origAuthSpec) and compare it to
origAuthSpec and only invoke test.WaitForNewKASRollout if they differ; ensure
require.NoError checks still run for the rollback error but skip the rollout
wait when no spec change occurred.
- Around line 554-568: The informers are started with a non-cancelable context,
leaving goroutines running; change the Start/WaitForCacheSync usage to use a
cancelable context derived via context.WithCancel (derive cancelCtx, cancel :=
context.WithCancel(ctx) or similar) before calling
dynamicInformers.Start(cancelCtx.Done()) and
dynamicInformers.WaitForCacheSync(cancelCtx.Done()), and register cleanup to
stop them by either calling cancel() during test teardown or storing
dynamicInformers on the testClient and calling
dynamicInformers.Shutdown()/Shutdown in the teardown; update the code paths
around NewClusterScopedOperatorClient, dynamicInformers, Start,
WaitForCacheSync, and testClient to ensure the cancel function or Shutdown is
invoked in cleanup.

---

Duplicate comments:
In `@test/library/encryption/perf_helpers.go`:
- Around line 35-46: The poll currently sends any
EncryptionMigrationControllerProgressing condition with Status True, possibly
using a stale LastTransitionTime; fix by recording a start := time.Now()
immediately before calling wait.PollUntilContextTimeout and inside the poll only
send cond.LastTransitionTime to migrationStartedCh when cond.Type ==
"EncryptionMigrationControllerProgressing", cond.Status ==
operatorv1.ConditionTrue, cond.Reason == "Migrating" and
cond.LastTransitionTime.Time.After(start); use getOperatorConditionsFn to fetch
conditions and ensure you return true,nil only when the transition is newer than
start.

In `@test/library/encryption/scenarios.go`:
- Around line 340-349: The update to the APIServer object does a single
read-modify-write and can fail on resourceVersion conflicts; wrap the
mutation+Update call in a retry.RetryOnConflict loop so it re-reads and retries
on conflict. Locate the block that calls libClientSet.ApiServerConfig.Get and
then sets apiServer.Spec.Encryption.Type and calling
libClientSet.ApiServerConfig.Update, and replace that update path with a
retry.RetryOnConflict that fetches the latest ApiServer, applies the same
encryptionType mutation to apiServer.Spec.Encryption.Type, and retries the
Update until it succeeds or a non-conflict error occurs.
- Around line 204-210: The non-blocking select on migrationStartedCh can cause
false failures; replace the select/default with a blocking receive that waits
with a bounded timeout or a cancelable context so the test actually observes the
watcher send. Specifically, wait for migrationStartedCh (instead of the default
path) and compute migrationTime = endTimeStamp.Sub(migrationStarted) before
calling tb.Logf and scenario.AssertMigrationTime; if the timeout/context
expires, call tb.Error with a clear timeout message. Use the existing symbols
migrationStartedCh, endTimeStamp, tb.Logf, and scenario.AssertMigrationTime to
locate and update the code.
- Around line 73-81: The switch in the dispatcher incorrectly routes the
empty-string provider to TestEncryptionTypeIdentity; update the case handling so
that configv1.EncryptionTypeIdentity maps only to TestEncryptionTypeIdentity and
the empty string ("") maps to TestEncryptionTypeUnset. Locate the switch in
TestEncryption scenarios (the switch on provider in scenarios.go) and replace
the combined case `case configv1.EncryptionTypeIdentity, "":
TestEncryptionTypeIdentity(tb, scenario)` with two separate cases: one calling
TestEncryptionTypeIdentity(tb, scenario) for configv1.EncryptionTypeIdentity and
one calling TestEncryptionTypeUnset(tb, scenario) for the "" provider; leave the
other cases (TestEncryptionTypeAESCBC, TestEncryptionTypeAESGCM,
TestEncryptionTypeKMS) unchanged.

---

Nitpick comments:
In `@pkg/controllers/readiness/wellknown_ready_controller.go`:
- Around line 355-358: The loop in getKASTargetPortFromService repeatedly calls
getKASServicePort() inside the condition which obscures intent; cache the result
once before iterating by assigning kasPort := getKASServicePort(), then use
kasPort in the loop condition (referencing getKASTargetPortFromService and
getKASServicePort) so the code is clearer and avoids repeated calls.

In `@test/e2e-encryption-perf/encryption_perf.go`:
- Around line 55-65: The AssertDBPopulatedFunc currently fabricates errors and
calls require.NoError; change those to direct test failures using t.Fatalf: when
tokenStatsKey is missing, call t.Fatalf("missing oauth access tokens count
stats, can't continue the test"); when tokenCount < 14000, call
t.Fatalf("expected to create at least 14000 tokens but %d were created",
tokenCount). Update the block in AssertDBPopulatedFunc (which reads tokenCount,
ok := statStore[tokenStatsKey]) to remove the temporary err variables and use
t.Fatalf with the appropriate messages.
- Around line 81-97: The returned anonymous function in createAccessTokenWrapper
currently declares a parameter named namespace that is never used; update the
signature of the returned func (in createAccessTokenWrapper) to replace the
unused namespace parameter with _ (e.g., func(_ kubernetes.Interface, _ string,
errorCollector func(error), statsCollector func(string)) error) so it's explicit
the namespace is intentionally ignored while keeping the rest of the behavior
the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: be01b227-071d-425e-b47e-7ae44ef825ce

📥 Commits

Reviewing files that changed from the base of the PR and between 626f7d7 and aa529ed.

⛔ Files ignored due to path filters (11)
  • go.sum is excluded by !**/*.sum
  • vendor/k8s.io/klog/v2/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_no_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klogr.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klogr_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/textlogger/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/textlogger/textlogger.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (22)
  • cmd/cluster-authentication-operator-tests-ext/main.go
  • go.mod
  • pkg/controllers/readiness/wellknown_ready_controller.go
  • test/e2e-encryption-kms/encryption_kms.go
  • test/e2e-encryption-kms/encryption_kms_test.go
  • test/e2e-encryption-kms/main_test.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/e2e-encryption-perf/encryption_perf_test.go
  • test/e2e-encryption-rotation/e2e-encryption-rotation_test.go
  • test/e2e-encryption-rotation/encryption_rotation.go
  • test/e2e-encryption/encryption.go
  • test/e2e-encryption/encryption_test.go
  • test/e2e-encryption/main_test.go
  • test/e2e-oidc/external_oidc.go
  • test/e2e-oidc/external_oidc_test.go
  • test/library/client.go
  • test/library/encryption/helpers.go
  • test/library/encryption/perf_helpers.go
  • test/library/encryption/scenarios.go
  • test/library/encryption_wrappers.go
  • test/library/featuregates.go
  • test/library/waits.go
💤 Files with no reviewable changes (2)
  • test/e2e-encryption-kms/main_test.go
  • test/e2e-encryption/main_test.go
✅ Files skipped from review due to trivial changes (1)
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/library/encryption_wrappers.go
  • test/library/featuregates.go
  • cmd/cluster-authentication-operator-tests-ext/main.go
  • test/e2e-encryption-kms/encryption_kms_test.go
  • test/e2e-encryption/encryption.go
  • test/library/waits.go

Comment thread test/e2e-oidc/external_oidc.go Outdated
Comment thread test/e2e-oidc/external_oidc.go Outdated
Comment thread test/e2e-oidc/external_oidc.go
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from aa529ed to 805bcb9 Compare April 28, 2026 05:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (7)
test/e2e-encryption-kms/encryption_kms.go (2)

48-53: Typo: "techincal" should be "technical".

Line 52 contains a spelling error in the comment.

📝 Fix typo
 	// TODO: Remove this skip once the authentication operator fully supports KMS encryption.
 	// Currently, while the API accepts encryption.type: "KMS" and the operator mounts the KMS
 	// plugin socket, it does not generate the EncryptionConfiguration with KMS provider stanza.
 	// This causes tests to timeout waiting for encryption keys to be created and migration to complete.
-	// See: https://redhat.atlassian.net/browse/OCPSTRAT-108 it is still under techincal preview
+	// See: https://redhat.atlassian.net/browse/OCPSTRAT-108 it is still under technical preview
 	t.Skip("Skipping KMS encryption test: operator implementation is incomplete")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-kms/encryption_kms.go` around lines 48 - 53, Fix the
spelling mistake in the comment above the t.Skip call: change "techincal" to
"technical" in the comment that references the operator being under technical
preview (the comment block immediately preceding the t.Skip("Skipping KMS
encryption test: operator implementation is incomplete") line).

95-100: Same typo: "techincal" should be "technical".

📝 Fix typo
 	// TODO: Remove this skip once the authentication operator fully supports KMS encryption.
 	// Currently, while the API accepts encryption.type: "KMS" and the operator mounts the KMS
 	// plugin socket, it does not generate the EncryptionConfiguration with KMS provider stanza.
 	// This causes tests to timeout waiting for encryption keys to be created and migration to complete.
-	// See: https://redhat.atlassian.net/browse/OCPSTRAT-108 it is still under techincal preview
+	// See: https://redhat.atlassian.net/browse/OCPSTRAT-108 it is still under technical preview
 	t.Skip("Skipping KMS encryption test: operator implementation is incomplete")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-kms/encryption_kms.go` around lines 95 - 100, Fix the
typo in the block comment above the t.Skip call: change "techincal" to
"technical" in the sentence referencing the technical preview and the Jira
ticket so the comment reads correctly; the change is in the comment surrounding
the t.Skip("Skipping KMS encryption test: operator implementation is
incomplete") statement in encryption_kms.go.
test/library/encryption/perf_helpers.go (1)

118-120: Misleading log message when workers encounter errors.

The log message "All workers completed successfully" is printed regardless of whether workers encountered errors. Since errors are collected in r.errorStore rather than causing the function to fail, this message can be misleading during debugging.

📝 Suggested improvement
 	r.wg.Wait()
-	t.Log("All workers completed successfully")
+	if len(r.errorStore) > 0 {
+		t.Logf("All workers completed with %d unique error types", len(r.errorStore))
+	} else {
+		t.Log("All workers completed successfully")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/library/encryption/perf_helpers.go` around lines 118 - 120, The current
code always logs "All workers completed successfully" after r.wg.Wait() which is
misleading because worker errors are stored in r.errorStore; update the
post-wait logic to inspect r.errorStore (e.g., check its error count or
equivalent method on the runner) and only emit the success message when it is
empty, otherwise log or fail the test with the collected errors (use
t.Logf/t.Fatalf and include the contents of r.errorStore) so the message
accurately reflects worker outcomes.
test/e2e-encryption-rotation/encryption_rotation.go (1)

76-76: Use the predefined constant configv1.EncryptionTypeAESCBC instead of string cast.

Line 76 uses configv1.EncryptionType("aescbc"), which works but should use the constant configv1.EncryptionTypeAESCBC defined in the API types. This is consistent with how the constant is used throughout the encryption test library and aligns with the codebase pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-encryption-rotation/encryption_rotation.go` at line 76, Replace the
string-cast usage configv1.EncryptionType("aescbc") with the predefined constant
configv1.EncryptionTypeAESCBC in the EncryptionProvider field; locate the
assignment in encryption_rotation.go (the EncryptionProvider assignment) and
swap the value to configv1.EncryptionTypeAESCBC so the test uses the API
constant consistently with other tests.
test/e2e-oidc/external_oidc.go (3)

950-969: Consider cleaning up test-created Keycloak resources.

The test creates a group and user in Keycloak but doesn't explicitly clean them up. If the Keycloak instance persists across test runs, these resources accumulate. If the IDP cleanup (test.AddKeycloakIDP cleanup handlers at line 127) removes the entire Keycloak deployment, this is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-oidc/external_oidc.go` around lines 950 - 969, The test creates
Keycloak resources (group and user via kcClient.CreateGroup and
kcClient.CreateUser assigned to variables group and user) but never explicitly
deletes them; add cleanup logic that calls the appropriate
kcClient.DeleteGroup(group) and kcClient.DeleteUser(user) (or equivalent
deletion methods) in a deferred or test cleanup handler after creation, or
ensure these deletions are registered with the existing test.AddKeycloakIDP
cleanup handlers so the group and user are removed when the Keycloak IDP is torn
down.

390-439: Use context-aware HTTP requests for cancellation support.

The client.Get() calls don't support context cancellation. Consider accepting a context parameter and using http.NewRequestWithContext with client.Do() instead. This allows the requests to be cancelled if the test context times out.

♻️ Proposed fix
-func fetchIssuerJWKS(issuerURL string) (*jwks, error) {
+func fetchIssuerJWKS(ctx context.Context, issuerURL string) (*jwks, error) {
 	client := &http.Client{
 		Timeout: 30 * time.Second,
 		Transport: &http.Transport{
 			TLSClientConfig: &tls.Config{
 				InsecureSkipVerify: true,
 				MinVersion:         tls.VersionTLS12,
 			},
 		},
 	}

 	// grab openid-configuration JSON which contains the URL of the provider's JWKS
-	resp, err := client.Get(issuerURL + "/.well-known/openid-configuration")
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, issuerURL+"/.well-known/openid-configuration", nil)
+	if err != nil {
+		return nil, fmt.Errorf("could not create request for well-known config: %v", err)
+	}
+	resp, err := client.Do(req)
 	if err != nil {
 		return nil, fmt.Errorf("could not get issuer OpenID well-known configuration: %v", err)
 	}
 	defer resp.Body.Close()
 	// ... similar change for the second request at line 422
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-oidc/external_oidc.go` around lines 390 - 439, fetchIssuerJWKS
currently uses client.Get which doesn't support cancellation; change its
signature to accept ctx context.Context (e.g., fetchIssuerJWKS(ctx
context.Context, issuerURL string) (*jwks, error)) and replace both client.Get
calls with context-aware requests by creating req :=
http.NewRequestWithContext(ctx, "GET",
issuerURL+"/.well-known/openid-configuration", nil) and req =
http.NewRequestWithContext(ctx, "GET", oidcConfig.JwksURL, nil) and use
client.Do(req) for each, preserving existing error handling and defer
resp.Body.Close().

690-716: String-based JSON comparison is fragile.

The exact string comparison of auth-config.json will break if field ordering changes in the operator's JSON marshaling (though Go's json.Marshal is currently deterministic by struct field order). Consider unmarshaling both and using semantic comparison via equality.Semantic.DeepEqual for more robust assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-oidc/external_oidc.go` around lines 690 - 716, The test currently
compares auth-config.json via exact string equality (expectedAuthConfigJSON vs
actualCM.Data["auth-config.json"]) which is fragile; instead unmarshal both the
expectedAuthConfigJSON and actualCM.Data["auth-config.json"] into generic JSON
objects (e.g., map[string]interface{} or interface{}) and use
k8s.io/apimachinery/pkg/api/equality.Semantic.DeepEqual to assert semantic
equality; update the assertion after fetching actualCM (in the loop that calls
tc.kubeClient.CoreV1().ConfigMaps(...).Get) to unmarshal both JSON strings,
check errors with require.NoError, and require.Truef(t,
equality.Semantic.DeepEqual(expectedObj, actualObj), "unexpected
auth-config.json contents in %s/%s: diff=%v", actualCM.Namespace, actualCM.Name,
cmp.Diff(expectedObj, actualObj)) so field ordering changes won’t break the
test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e-encryption-kms/encryption_kms.go`:
- Around line 48-53: Fix the spelling mistake in the comment above the t.Skip
call: change "techincal" to "technical" in the comment that references the
operator being under technical preview (the comment block immediately preceding
the t.Skip("Skipping KMS encryption test: operator implementation is
incomplete") line).
- Around line 95-100: Fix the typo in the block comment above the t.Skip call:
change "techincal" to "technical" in the sentence referencing the technical
preview and the Jira ticket so the comment reads correctly; the change is in the
comment surrounding the t.Skip("Skipping KMS encryption test: operator
implementation is incomplete") statement in encryption_kms.go.

In `@test/e2e-encryption-rotation/encryption_rotation.go`:
- Line 76: Replace the string-cast usage configv1.EncryptionType("aescbc") with
the predefined constant configv1.EncryptionTypeAESCBC in the EncryptionProvider
field; locate the assignment in encryption_rotation.go (the EncryptionProvider
assignment) and swap the value to configv1.EncryptionTypeAESCBC so the test uses
the API constant consistently with other tests.

In `@test/e2e-oidc/external_oidc.go`:
- Around line 950-969: The test creates Keycloak resources (group and user via
kcClient.CreateGroup and kcClient.CreateUser assigned to variables group and
user) but never explicitly deletes them; add cleanup logic that calls the
appropriate kcClient.DeleteGroup(group) and kcClient.DeleteUser(user) (or
equivalent deletion methods) in a deferred or test cleanup handler after
creation, or ensure these deletions are registered with the existing
test.AddKeycloakIDP cleanup handlers so the group and user are removed when the
Keycloak IDP is torn down.
- Around line 390-439: fetchIssuerJWKS currently uses client.Get which doesn't
support cancellation; change its signature to accept ctx context.Context (e.g.,
fetchIssuerJWKS(ctx context.Context, issuerURL string) (*jwks, error)) and
replace both client.Get calls with context-aware requests by creating req :=
http.NewRequestWithContext(ctx, "GET",
issuerURL+"/.well-known/openid-configuration", nil) and req =
http.NewRequestWithContext(ctx, "GET", oidcConfig.JwksURL, nil) and use
client.Do(req) for each, preserving existing error handling and defer
resp.Body.Close().
- Around line 690-716: The test currently compares auth-config.json via exact
string equality (expectedAuthConfigJSON vs actualCM.Data["auth-config.json"])
which is fragile; instead unmarshal both the expectedAuthConfigJSON and
actualCM.Data["auth-config.json"] into generic JSON objects (e.g.,
map[string]interface{} or interface{}) and use
k8s.io/apimachinery/pkg/api/equality.Semantic.DeepEqual to assert semantic
equality; update the assertion after fetching actualCM (in the loop that calls
tc.kubeClient.CoreV1().ConfigMaps(...).Get) to unmarshal both JSON strings,
check errors with require.NoError, and require.Truef(t,
equality.Semantic.DeepEqual(expectedObj, actualObj), "unexpected
auth-config.json contents in %s/%s: diff=%v", actualCM.Namespace, actualCM.Name,
cmp.Diff(expectedObj, actualObj)) so field ordering changes won’t break the
test.

In `@test/library/encryption/perf_helpers.go`:
- Around line 118-120: The current code always logs "All workers completed
successfully" after r.wg.Wait() which is misleading because worker errors are
stored in r.errorStore; update the post-wait logic to inspect r.errorStore
(e.g., check its error count or equivalent method on the runner) and only emit
the success message when it is empty, otherwise log or fail the test with the
collected errors (use t.Logf/t.Fatalf and include the contents of r.errorStore)
so the message accurately reflects worker outcomes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b29dbfbd-1513-45ba-a96f-a0e9944a0f1f

📥 Commits

Reviewing files that changed from the base of the PR and between aa529ed and 805bcb9.

⛔ Files ignored due to path filters (11)
  • go.sum is excluded by !**/*.sum
  • vendor/k8s.io/klog/v2/README.md is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_no_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/internal/serialize/keyvalues_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klogr.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/klogr_slog.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/textlogger/options.go is excluded by !**/vendor/**, !vendor/**
  • vendor/k8s.io/klog/v2/textlogger/textlogger.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (22)
  • cmd/cluster-authentication-operator-tests-ext/main.go
  • go.mod
  • pkg/controllers/readiness/wellknown_ready_controller.go
  • test/e2e-encryption-kms/encryption_kms.go
  • test/e2e-encryption-kms/encryption_kms_test.go
  • test/e2e-encryption-kms/main_test.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/e2e-encryption-perf/encryption_perf_test.go
  • test/e2e-encryption-rotation/e2e-encryption-rotation_test.go
  • test/e2e-encryption-rotation/encryption_rotation.go
  • test/e2e-encryption/encryption.go
  • test/e2e-encryption/encryption_test.go
  • test/e2e-encryption/main_test.go
  • test/e2e-oidc/external_oidc.go
  • test/e2e-oidc/external_oidc_test.go
  • test/library/client.go
  • test/library/encryption/helpers.go
  • test/library/encryption/perf_helpers.go
  • test/library/encryption/scenarios.go
  • test/library/encryption_wrappers.go
  • test/library/featuregates.go
  • test/library/waits.go
💤 Files with no reviewable changes (2)
  • test/e2e-encryption/main_test.go
  • test/e2e-encryption-kms/main_test.go
✅ Files skipped from review due to trivial changes (2)
  • go.mod
  • test/library/encryption/helpers.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • test/library/featuregates.go
  • test/library/client.go
  • test/e2e-encryption/encryption.go
  • pkg/controllers/readiness/wellknown_ready_controller.go
  • test/e2e-encryption/encryption_test.go
  • test/library/waits.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/library/encryption_wrappers.go
  • test/library/encryption/scenarios.go

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the current CI status for PR `#859`
gh pr view 859 --repo openshift/cluster-authentication-operator --json statusCheckRollup,checks 2>/dev/null || \
gh pr checks 859 --repo openshift/cluster-authentication-operator 2>/dev/null

Unrecoverable error: Failed to clone repository


🏁 Script executed:

#!/bin/bash
# Get the current CI status for PR `#859` - no repo clone needed
gh pr checks 859 --repo openshift/cluster-authentication-operator 2>&1 | head -80

Length of output: 10050


Let me fetch the logs from the failing jobs to understand the root cause.


🏁 Script executed:

#!/bin/bash
# Try to fetch finished.json and junit/build logs from GCS for failing jobs
BASE="https://storage.googleapis.com/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859"

JOBS=(
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade/2049005550200426496"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-perf-serial-ote-1of2/2049005556902924288"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-serial-ote-2of2/2049005571230666752"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-parallel-ote/2049005573680140288"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-serial-ote/2049005575357861888"
)

for job in "${JOBS[@]}"; do
  echo "=============================="
  echo "JOB: $job"
  echo "=============================="
  url="$BASE/$job/finished.json"
  curl -sf "$url" 2>/dev/null || echo "Could not fetch finished.json"
  echo ""
done

Length of output: 3332


🏁 Script executed:

#!/bin/bash
BASE="https://storage.googleapis.com/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859"

JOBS=(
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade/2049005550200426496"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-perf-serial-ote-1of2/2049005556902924288"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-serial-ote-2of2/2049005571230666752"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-parallel-ote/2049005573680140288"
  "pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-serial-ote/2049005575357861888"
)

for job in "${JOBS[@]}"; do
  echo "=============================="
  echo "JOB: $job"
  echo "=============================="
  # Try to get the last 100 lines of build-log.txt
  curl -sf "$BASE/$job/build-log.txt" 2>/dev/null | tail -100
  echo ""
done

Length of output: 50284


@ropatil010: I analyzed the logs for all 5 failing CI jobs. None of the failures appear to be caused by the code changes in this PR. Here's the breakdown:


1. e2e-agnostic-upgrade — ❌ MonitorTest failure

  • Failed with "failed due to a MonitorTest failure" (only 3 failures, threshold: 10).
  • This is a broad platform upgrade test unrelated to the encryption/auth changes in this PR.
  • The failure is at the infrastructure monitoring level, not test code.

2. e2e-aws-operator-encryption-perf-serial-ote-1of2 — ❌ Job timeout (not test failure)

  • The actual test step passed: "Step openshift-e2e-test succeeded after 28m40s"
  • The job failed because the gather-must-gather post-collection step hit the 2-hour overall job timeout: "Process did not finish before 2h0m0s timeout" → reason: executing_graph:interrupted.
  • This is a CI infrastructure timeout, completely unrelated to your PR changes. Note that the counterpart 2of2 job passed.

3. e2e-aws-operator-encryption-serial-ote-2of2 — ❌ MonitorTest failure

  • Failed with "failed due to a MonitorTest failure" (only 3 failures, threshold: 10).
  • Infrastructure-level monitoring failure (availability/invariant monitors), not test code.
  • Note that the counterpart 1of2 job passed.

4. e2e-aws-operator-parallel-ote — ❌ MonitorTest failure

  • Failed with "failed due to a MonitorTest failure" (only 2 failures, threshold: 10).
  • Same pattern — infrastructure monitors, not test code.

5. e2e-aws-operator-serial-ote — ❌ MonitorTest mass failure

  • Failed because 12 Monitor test failures exceeded the threshold of 10: "Mass test failures detected: 12 failures (threshold: 10)".
  • The failures are all infrastructure monitors (pod-lifecycle, image-registry-availability, legacy-storage-invariants, legacy-test-framework-invariants-pathological).
  • None of the 3 actual operator tests failed (3 pass, 0 flaky, 0 skip).

Summary

Job Root Cause Related to PR?
e2e-agnostic-upgrade MonitorTest failure (3/10) ❌ No
e2e-aws-operator-encryption-perf-serial-ote-1of2 Post-phase job timeout (test itself passed!) ❌ No
e2e-aws-operator-encryption-serial-ote-2of2 MonitorTest failure (3/10) ❌ No
e2e-aws-operator-parallel-ote MonitorTest failure (2/10) ❌ No
e2e-aws-operator-serial-ote Mass MonitorTest failure (12/10) ❌ No

All failures are infrastructure-level MonitorTest violations or a CI job timeout during artifact collection — not caused by the code changes in this PR. Notably, all GCP-based encryption jobs passed, the actual test steps in failing jobs either passed or weren't reached, and the original "invalid character 'W'" JSON parsing issue fixed by this PR is not present in any of the failures.

You can safely request a re-run of the failing jobs with /retest.

@ropatil010

Copy link
Copy Markdown
Contributor Author

Failure logs are not wrt the PR. Here the tc execution list

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-serial-ote/2049005575357861888/artifacts/e2e-aws-operator-serial-ote/openshift-e2e-test/artifacts/e2e.log
         
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-parallel-ote/2049005573680140288/artifacts/e2e-aws-operator-parallel-ote/openshift-e2e-test/artifacts/e2e.log
         
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-serial-ote-2of2/2049005571230666752/artifacts/e2e-aws-operator-encryption-serial-ote/openshift-e2e-test/artifacts/e2e.log
         
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-operator-encryption-perf-serial-ote-1of2/2049005556902924288/artifacts/e2e-aws-operator-encryption-perf-serial-ote/openshift-e2e-test/artifacts/e2e.log

PASS logs for profile: e2e-agnostic-ipv6

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-authentication-operator/859/pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6/2049008063356080128

The dependent PR is merged: openshift/origin#31009

/verified by CI results
@liouk Can you PTAL on this PR which is waiting from long time. Rebased this PR more than 3 times.

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Jun 9, 2026
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2026
@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci Bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2026
@ropatil010 ropatil010 reopened this Jun 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@ropatil010: This pull request references CNTRLPLANE-2589 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Hi Team,

This PR adds security and context management improvements to the recently migrated e2e test suites, addressing code review feedback.

Changes from PR #839:

  • Migrate e2e-encryption, e2e-encryption-kms, e2e-encryption-perf, e2e-encryption-rotation, and e2e-oidc test suites to Ginkgo v2
  • Add testing.TB compatibility layer for library-go integration
  • Create new test implementation files with Ginkgo v2 specs
  • Configure appropriate test suite parallelism and qualifiers
  • Remove obsolete main_test.go files

Security Fixes

  1. TLS Certificate Validation (test/e2e-oidc/external_oidc.go)
  • Fixed MITM vulnerability by removing InsecureSkipVerify=true
  • Added proper CA certificate validation using x509.CertPool with trusted CA bundle
  1. Information Disclosure (2 files)
  • Redacted internal Kubernetes API hostnames from test logs
  • Fixed in test/library/client.go and test/library/encryption/helpers.go

Context Management Improvements

  1. Timeout Protection (7 locations)
  • Replaced context.TODO() with proper time-bounded contexts:
    • e2e-encryption-perf: 30-minute timeout
    • e2e-encryption-rotation: 3-hour timeout
    • e2e-encryption: 2-minute timeout for token creation
  1. Goroutine Cleanup (test/library/encryption/perf_helpers.go)
  • Fixed migration watcher goroutine to stop with test completion
  • Added cancelable context with proper cleanup
  1. Keycloak Client Refactor (18 methods in test/library/keycloakidp.go)
  • Updated all methods to accept context.Context for cancellation/timeout support
  • Modified do() method to use http.NewRequestWithContext()
  • Updated all callers in test files

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 9, 2026
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch 3 times, most recently from 744b2c3 to c0f4b27 Compare June 9, 2026 15:06
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/library/waits.go (1)

171-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat an empty NodeStatuses list as a completed rollout.

If kas.Status.NodeStatuses is empty, the loop never runs and this returns success as soon as LatestAvailableRevision != origRevision. That lets callers proceed before any apiserver instance has actually reported the new revision.

Suggested fix
 	err := wait.PollUntilContextTimeout(ctx, 10*time.Second, 30*time.Minute, true, func(ctx context.Context) (bool, error) {
 		kas, err := kasClient.Get(ctx, "cluster", metav1.GetOptions{})
 		if err != nil {
 			t.Logf("kubeapiserver/cluster error: %v", err)
 			return false, nil
 		}
+
+		if kas.Status.LatestAvailableRevision == origRevision {
+			return false, nil
+		}
+		if len(kas.Status.NodeStatuses) == 0 {
+			return false, nil
+		}
 
 		for _, nodeStatus := range kas.Status.NodeStatuses {
-			if kas.Status.LatestAvailableRevision == origRevision {
-				return false, nil
-			}
-
 			if nodeStatus.CurrentRevision != kas.Status.LatestAvailableRevision {
 				return false, nil
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/waits.go` around lines 171 - 190, The callback passed to
wait.PollUntilContextTimeout (the anonymous func calling kasClient.Get)
currently treats an empty kas.Status.NodeStatuses as success when
LatestAvailableRevision != origRevision; change the callback to first check if
len(kas.Status.NodeStatuses) == 0 and if so return false, nil (not success) so
we wait until at least one apiserver reports; keep the existing checks for
LatestAvailableRevision vs origRevision and per-node CurrentRevision vs
LatestAvailableRevision and still set latestRevision =
nodeStatus.CurrentRevision when iterating.
♻️ Duplicate comments (2)
test/library/keycloakidp.go (1)

246-285: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Authentication requests can still outlive the test context.

KeycloakClientFor still creates an http.Client without a timeout, and AuthenticatePassword still uses http.NewRequest instead of a request-scoped context. The retry loops in AddKeycloakIDP depend on this path, so a stalled Keycloak connection can ignore poll cancellation and hang until the job timeout. Thread ctx into auth and give the client a timeout.

As per coding guidelines, **/*.go: Go security (prodsec-skills): context.Context for cancellation and timeouts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/keycloakidp.go` around lines 246 - 285, KeycloakClientFor
creates an http.Client with no timeout and AuthenticatePassword builds requests
without a context, allowing auth calls to outlive/circumvent caller cancellation
(e.g., the AddKeycloakIDP retry/poll). Fix by threading a context.Context into
AuthenticatePassword (signature change: AuthenticatePassword(ctx
context.Context, ...)), use http.NewRequestWithContext(ctx, ...) and ensure
callers (including AddKeycloakIDP and its retry loop) pass their ctx through;
also set a sensible timeout on the client created in KeycloakClientFor (or
accept a client/timeout param) so in-flight connections are bounded by
caller/cancelation. Ensure TokenURL() and KeycloakClientFor references remain
unchanged except for wiring the ctx and timeout.

Source: Coding guidelines

test/e2e-encryption-rotation/encryption_rotation.go (1)

53-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle Authentication operator update conflicts with retry.RetryOnConflict.

This Get → mutate → Update path can fail on resourceVersion conflicts and make this serial rotation test flaky when the operator updates the same object concurrently.

Suggested patch
 import (
 	"context"
 	"encoding/json"
 	"testing"
 	"time"
@@
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
 	"k8s.io/apimachinery/pkg/runtime"
+	"k8s.io/client-go/util/retry"
@@
 		ForceRotationFunc: library.StaticEncryptionForceRotation(func(rawUnsupportedEncryptionCfg []byte) error {
 			cs := operatorencryption.GetClients(t)
-			authOperator, err := cs.OperatorClient.Get(ctx, "cluster", metav1.GetOptions{})
-			if err != nil {
-				return err
-			}
-
-			unsupportedConfigAsMap := map[string]interface{}{}
-			if len(authOperator.Spec.UnsupportedConfigOverrides.Raw) > 0 {
-				if err := json.Unmarshal(authOperator.Spec.UnsupportedConfigOverrides.Raw, &unsupportedConfigAsMap); err != nil {
-					return err
-				}
-			}
-			unsupportedEncryptionConfigAsMap := map[string]interface{}{}
-			if err := json.Unmarshal(rawUnsupportedEncryptionCfg, &unsupportedEncryptionConfigAsMap); err != nil {
-				return err
-			}
-			if err := unstructured.SetNestedMap(unsupportedConfigAsMap, unsupportedEncryptionConfigAsMap, oauthapiconfigobservercontroller.OAuthAPIServerConfigPrefix); err != nil {
-				return err
-			}
-			rawUnsupportedCfg, err := json.Marshal(unsupportedConfigAsMap)
-			if err != nil {
-				return err
-			}
-			authOperator.Spec.UnsupportedConfigOverrides.Raw = rawUnsupportedCfg
-
-			_, err = cs.OperatorClient.Update(ctx, authOperator, metav1.UpdateOptions{})
-			return err
+			return retry.RetryOnConflict(retry.DefaultRetry, func() error {
+				authOperator, err := cs.OperatorClient.Get(ctx, "cluster", metav1.GetOptions{})
+				if err != nil {
+					return err
+				}
+
+				unsupportedConfigAsMap := map[string]interface{}{}
+				if len(authOperator.Spec.UnsupportedConfigOverrides.Raw) > 0 {
+					if err := json.Unmarshal(authOperator.Spec.UnsupportedConfigOverrides.Raw, &unsupportedConfigAsMap); err != nil {
+						return err
+					}
+				}
+				unsupportedEncryptionConfigAsMap := map[string]interface{}{}
+				if err := json.Unmarshal(rawUnsupportedEncryptionCfg, &unsupportedEncryptionConfigAsMap); err != nil {
+					return err
+				}
+				if err := unstructured.SetNestedMap(unsupportedConfigAsMap, unsupportedEncryptionConfigAsMap, oauthapiconfigobservercontroller.OAuthAPIServerConfigPrefix); err != nil {
+					return err
+				}
+				rawUnsupportedCfg, err := json.Marshal(unsupportedConfigAsMap)
+				if err != nil {
+					return err
+				}
+				authOperator.Spec.UnsupportedConfigOverrides.Raw = rawUnsupportedCfg
+
+				_, err = cs.OperatorClient.Update(ctx, authOperator, metav1.UpdateOptions{})
+				return err
+			})
 		}),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-encryption-rotation/encryption_rotation.go` around lines 53 - 78,
The Get→mutate→Update sequence on the Authentication operator
(cs.OperatorClient.Get / authOperator / cs.OperatorClient.Update) can fail with
resourceVersion conflicts; wrap the whole read-modify-update in a
retry.RetryOnConflict(retry.DefaultRetry, func() error { ... }) loop that
re-reads the authOperator inside the closure, applies the same JSON
Unmarshal/SetNestedMap/Marshal mutation to
authOperator.Spec.UnsupportedConfigOverrides.Raw, and then calls Update; return
the Update error from the closure and return the retry wrapper result from the
calling function so conflicts are retried safely.
🧹 Nitpick comments (5)
test/library/waits.go (1)

61-70: 🏗️ Heavy lift

Pass a caller context through this wait chain.

WaitForClusterOperatorStatus still hardcodes context.TODO(), so WaitForOperatorToPickUpChanges and the other wrappers can keep polling for the full 10 minutes after the enclosing test has already been canceled. Thread ctx through this helper family and use it in both PollUntilContextTimeout and CheckClusterOperatorStatus.

As per coding guidelines, **/*.go: Go security (prodsec-skills): context.Context for cancellation and timeouts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/waits.go` around lines 61 - 70, WaitForClusterOperatorStatus
currently uses context.TODO() causing callers like
WaitForOperatorToPickUpChanges to be unable to cancel the wait; change the
signature of WaitForClusterOperatorStatus to accept a context.Context parameter,
thread that ctx through the call to wait.PollUntilContextTimeout instead of
context.TODO(), and pass the same ctx into CheckClusterOperatorStatus (and
update any callers such as WaitForOperatorToPickUpChanges to forward their ctx);
ensure all references to PollUntilContextTimeout and CheckClusterOperatorStatus
use the passed ctx so the wait honors caller cancellation.

Source: Coding guidelines

test/library/client.go (1)

168-171: ⚡ Quick win

Accept a caller context for namespace creation.

This helper hardcodes context.Background(), even though callers already track test contexts (test/library/idpdeployment.go:50-59). If the API call stalls, namespace creation ignores suite cancellation. Adding ctx context.Context here keeps the new exported builder consistent with the rest of the context-aware helpers.

As per coding guidelines, **/*.go: Go security (prodsec-skills): context.Context for cancellation and timeouts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/client.go` around lines 168 - 171, Change
TestNamespaceBuilder.Create to accept a caller context (add ctx context.Context
to the signature of Create) and pass that ctx into kubeClient.Create instead of
using context.Background(); update the callsite(s) that use Create (e.g., in
idpdeployment helpers) to forward their test context, and ensure the file
imports context if not already present and the function name
TestNamespaceBuilder.Create and the kubeClient.Create(...) invocation are
updated accordingly.

Source: Coding guidelines

test/e2e-encryption-perf/encryption_perf.go (1)

79-79: ⚡ Quick win

Prefer configv1.EncryptionTypeAESCBC over string casting.

Line 79 should use the typed constant for stronger API compatibility and typo safety.

Proposed change
-			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionType("aescbc")},
+			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionTypeAESCBC},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-encryption-perf/encryption_perf.go` at line 79, Replace the
string-cast encryption type with the typed constant: in the APIServerEncryption
struct (symbol APIServerEncryption / type configv1.APIServerEncryption) change
usage of configv1.EncryptionType("aescbc") to the exported constant
configv1.EncryptionTypeAESCBC to ensure type safety and avoid typos.
test/e2e-encryption/encryption.go (1)

76-76: ⚡ Quick win

Use the typed encryption constant instead of a raw string cast.

Line 76 should prefer the API constant to avoid typo drift and keep enum usage explicit.

Proposed change
-			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionType("aescbc")},
+			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionTypeAESCBC},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-encryption/encryption.go` at line 76, Replace the raw string cast
used for APIServerEncryption (configv1.APIServerEncryption{Type:
configv1.EncryptionType("aescbc")}) with the typed enum constant (e.g.
configv1.AESCBC) to avoid typo drift; update the value to
configv1.APIServerEncryption{Type: configv1.AESCBC} so the code uses the
declared constant instead of a literal string cast.
test/e2e-encryption-rotation/encryption_rotation.go (1)

49-49: ⚡ Quick win

Use the typed AES-CBC constant instead of a string cast.

configv1.EncryptionTypeAESCBC is clearer and avoids literal drift.

Suggested patch
-			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionType("aescbc")},
+			APIServerEncryption: configv1.APIServerEncryption{Type: configv1.EncryptionTypeAESCBC},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-encryption-rotation/encryption_rotation.go` at line 49, Replace the
string-cast encryption type with the typed constant: in the APIServerEncryption
initialization (configv1.APIServerEncryption) use configv1.EncryptionTypeAESCBC
instead of configv1.EncryptionType("aescbc") so the field reads
APIServerEncryption: configv1.APIServerEncryption{Type:
configv1.EncryptionTypeAESCBC}.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-oidc/external_oidc.go`:
- Around line 396-431: Change fetchIssuerJWKS to accept a context.Context (e.g.,
func fetchIssuerJWKS(ctx context.Context, issuerURL string) ...) and replace the
two client.Get calls with context-aware requests: create
http.NewRequestWithContext(ctx, "GET",
issuerURL+"/.well-known/openid-configuration", nil) and use client.Do(req) (and
similarly for oidcConfig.JwksURL). Ensure error handling and resp.Body.Close
remain unchanged and propagate ctx from callers like testOIDCAuthentication.

In `@test/library/keycloakidp.go`:
- Around line 703-716: The loop in DeleteUserFromGroups keeps each DELETE
response body open by deferring resp.Body.Close() inside the loop; change this
to close resp.Body explicitly at the end of each iteration (and before any early
return) so each resp from kc.do is closed immediately after checking StatusCode
and error handling; locate the DeleteUserFromGroups function, remove or replace
the in-loop defer resp.Body.Close() and call resp.Body.Close() just after
handling non-nil err and again immediately before returning an error for non-204
responses (and after successful iterations) when using kc.do and userGroupsURL.

---

Outside diff comments:
In `@test/library/waits.go`:
- Around line 171-190: The callback passed to wait.PollUntilContextTimeout (the
anonymous func calling kasClient.Get) currently treats an empty
kas.Status.NodeStatuses as success when LatestAvailableRevision != origRevision;
change the callback to first check if len(kas.Status.NodeStatuses) == 0 and if
so return false, nil (not success) so we wait until at least one apiserver
reports; keep the existing checks for LatestAvailableRevision vs origRevision
and per-node CurrentRevision vs LatestAvailableRevision and still set
latestRevision = nodeStatus.CurrentRevision when iterating.

---

Duplicate comments:
In `@test/e2e-encryption-rotation/encryption_rotation.go`:
- Around line 53-78: The Get→mutate→Update sequence on the Authentication
operator (cs.OperatorClient.Get / authOperator / cs.OperatorClient.Update) can
fail with resourceVersion conflicts; wrap the whole read-modify-update in a
retry.RetryOnConflict(retry.DefaultRetry, func() error { ... }) loop that
re-reads the authOperator inside the closure, applies the same JSON
Unmarshal/SetNestedMap/Marshal mutation to
authOperator.Spec.UnsupportedConfigOverrides.Raw, and then calls Update; return
the Update error from the closure and return the retry wrapper result from the
calling function so conflicts are retried safely.

In `@test/library/keycloakidp.go`:
- Around line 246-285: KeycloakClientFor creates an http.Client with no timeout
and AuthenticatePassword builds requests without a context, allowing auth calls
to outlive/circumvent caller cancellation (e.g., the AddKeycloakIDP retry/poll).
Fix by threading a context.Context into AuthenticatePassword (signature change:
AuthenticatePassword(ctx context.Context, ...)), use
http.NewRequestWithContext(ctx, ...) and ensure callers (including
AddKeycloakIDP and its retry loop) pass their ctx through; also set a sensible
timeout on the client created in KeycloakClientFor (or accept a client/timeout
param) so in-flight connections are bounded by caller/cancelation. Ensure
TokenURL() and KeycloakClientFor references remain unchanged except for wiring
the ctx and timeout.

---

Nitpick comments:
In `@test/e2e-encryption-perf/encryption_perf.go`:
- Line 79: Replace the string-cast encryption type with the typed constant: in
the APIServerEncryption struct (symbol APIServerEncryption / type
configv1.APIServerEncryption) change usage of configv1.EncryptionType("aescbc")
to the exported constant configv1.EncryptionTypeAESCBC to ensure type safety and
avoid typos.

In `@test/e2e-encryption-rotation/encryption_rotation.go`:
- Line 49: Replace the string-cast encryption type with the typed constant: in
the APIServerEncryption initialization (configv1.APIServerEncryption) use
configv1.EncryptionTypeAESCBC instead of configv1.EncryptionType("aescbc") so
the field reads APIServerEncryption: configv1.APIServerEncryption{Type:
configv1.EncryptionTypeAESCBC}.

In `@test/e2e-encryption/encryption.go`:
- Line 76: Replace the raw string cast used for APIServerEncryption
(configv1.APIServerEncryption{Type: configv1.EncryptionType("aescbc")}) with the
typed enum constant (e.g. configv1.AESCBC) to avoid typo drift; update the value
to configv1.APIServerEncryption{Type: configv1.AESCBC} so the code uses the
declared constant instead of a literal string cast.

In `@test/library/client.go`:
- Around line 168-171: Change TestNamespaceBuilder.Create to accept a caller
context (add ctx context.Context to the signature of Create) and pass that ctx
into kubeClient.Create instead of using context.Background(); update the
callsite(s) that use Create (e.g., in idpdeployment helpers) to forward their
test context, and ensure the file imports context if not already present and the
function name TestNamespaceBuilder.Create and the kubeClient.Create(...)
invocation are updated accordingly.

In `@test/library/waits.go`:
- Around line 61-70: WaitForClusterOperatorStatus currently uses context.TODO()
causing callers like WaitForOperatorToPickUpChanges to be unable to cancel the
wait; change the signature of WaitForClusterOperatorStatus to accept a
context.Context parameter, thread that ctx through the call to
wait.PollUntilContextTimeout instead of context.TODO(), and pass the same ctx
into CheckClusterOperatorStatus (and update any callers such as
WaitForOperatorToPickUpChanges to forward their ctx); ensure all references to
PollUntilContextTimeout and CheckClusterOperatorStatus use the passed ctx so the
wait honors caller cancellation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aeeac077-120e-410e-9f38-78cfdf7bdc30

📥 Commits

Reviewing files that changed from the base of the PR and between 4c8a2c0 and c0f4b27.

📒 Files selected for processing (20)
  • cmd/cluster-authentication-operator-tests-ext/main.go
  • test/e2e-encryption-perf/encryption_perf.go
  • test/e2e-encryption-perf/encryption_perf_test.go
  • test/e2e-encryption-rotation/e2e-encryption-rotation_test.go
  • test/e2e-encryption-rotation/encryption_rotation.go
  • test/e2e-encryption/encryption.go
  • test/e2e-encryption/encryption_test.go
  • test/e2e-encryption/main_test.go
  • test/e2e-oidc/external_oidc.go
  • test/e2e-oidc/external_oidc_test.go
  • test/e2e/keycloak.go
  • test/library/client.go
  • test/library/encryption/helpers.go
  • test/library/encryption/perf_helpers.go
  • test/library/encryption/scenarios.go
  • test/library/encryption_wrappers.go
  • test/library/featuregates.go
  • test/library/idpdeployment.go
  • test/library/keycloakidp.go
  • test/library/waits.go

Comment thread test/e2e-oidc/external_oidc.go Outdated
Comment thread test/library/keycloakidp.go
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch 4 times, most recently from 54487ef to 42f050f Compare June 10, 2026 12:32
@ropatil010

Copy link
Copy Markdown
Contributor Author

Below Failure profiles are related to monitor cases.

  1. e2e-aws-operator-encryption-perf-serial-ote-2of2
0 pass, 0 flaky, 0 skip (7m41s)
  1. e2e-aws-operator-encryption-serial-ote-2of2
passed: (30m34s) 2026-06-10T15:20:57 "[sig-auth] authentication operator [Encryption][Serial] TestEncryptionTurnOnAndOff [Timeout:3h]"
passed: (5m1s) 2026-06-10T15:25:58 "[sig-auth] authentication operator [Encryption][Serial] TestEncryptionTypeIdentity"
  1. e2e-aws-operator-serial-ote
passed: (2m57s) 2026-06-10T14:41:03 "[sig-auth] authentication operator [OIDC][Serial] TestKeycloakAsOIDCPasswordGrantCheckAndGroupSync"
passed: (3m50s) 2026-06-10T14:44:52 "[sig-auth] authentication operator [OIDC][Serial] TestGitLabAsOIDCPasswordGrantCheck"
  1. e2e-gcp-operator-disruptive
passed: (45.5s) 2026-06-10T13:45:09 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should enforce cross-namespace ingress traffic"
passed: (13.7s) 2026-06-10T13:45:23 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should enforce NetworkPolicy allow/deny basics in a test namespace"
passed: (13.3s) 2026-06-10T13:45:37 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should enforce authentication-operator NetworkPolicies"
passed: (1m29s) 2026-06-10T13:47:06 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should block unauthorized namespace traffic"
passed: (15m45s) 2026-06-10T14:02:52 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should restore auth NetworkPolicies after delete or mutation[Timeout:30m]"
passed: (1m16s) 2026-06-10T14:04:08 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should enforce oauth-apiserver NetworkPolicies"
passed: (10.1s) 2026-06-10T14:04:19 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should ensure auth NetworkPolicies are defined"
passed: (13.3s) 2026-06-10T14:04:33 "[sig-auth] authentication operator [NetworkPolicy][Disruptive] should enforce auth NetworkPolicies"
passed: (2h27m16s) 2026-06-10T16:31:50 "[sig-auth] authentication operator [OIDC][Serial][Disruptive] TestExternalOIDCWithKeycloak [Timeout:3h]"

Hi @liouk PTAL on this PR. Any suggestion would you like to give?

/test e2e-aws-operator-encryption-perf-serial-ote-2of2

@ropatil010 ropatil010 changed the title [WIP] CNTRLPLANE-2589: Migrate e2e-* tests to Ginkgo v2 CNTRLPLANE-2589: Migrate e2e-* tests to Ginkgo v2 Jun 10, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2026
@ropatil010 ropatil010 changed the title CNTRLPLANE-2589: Migrate e2e-* tests to Ginkgo v2 CNTRLPLANE-2589: Migrate to OTE support, Fix pod security, Pathological Event monitor failures Jun 11, 2026
@ropatil010 ropatil010 changed the title CNTRLPLANE-2589: Migrate to OTE support, Fix pod security, Pathological Event monitor failures CNTRLPLANE-2589: Migrate to OTE support,fix pod security,Pathological Event monitor failures Jun 11, 2026

@liouk liouk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, the direction of the most recent changes isn't the right one:

  1. Adding [Disruptive] to encryption tests breaks the dedicated encryption OTE suites. The encryption suite qualifiers in main.go filter on [Encryption] and [Serial] but explicitly exclude [Disruptive] (via !name.contains("[Disruptive]")). By adding [Disruptive] to these tests, they'll be
    excluded from their own suites and instead fall into the operator/disruptive suite. The encryption-specific OTE jobs (operator-encryption/serial, operator-encryption-rotation/serial,
    operator-encryption-perf/serial) will have zero tests to run. I don't think we need to add [Disruptive] to the tests, we should instead set cluster stability to disruptive; these are two different things (one is a tag used for test filtering, the other is a suite property that affects what monitor tests are executed on the suite).

  2. The e2e-gcp-operator-disruptive job fails on [Monitor:known-image-checker] Only known images used by tests -- the Keycloak/GitLab images aren't in the allowed list. Making the security context restricted
    doesn't fix this. Even with this commit, the OIDC test still runs in the disruptive suite and the image check will still fail. Setting cluster stability to disruptive for the OIDC tests is the solution here as well, as it will not check for known images, but at the same time the OIDC suite is indeed disruptive as it currently rolls out the kube-apiserver.

  3. This change contains excessive comments, where in most cases we could do with 1-2 lines. The code is self explanatory and in most cases the comments don't add any information. Additionally, there seem to be a few misplaced comments in keycloakidp.go.

Comment thread test/library/idpdeployment.go Outdated
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from liouk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch 2 times, most recently from 750d010 to 853094c Compare June 12, 2026 07:22
Migrated authentication operator e2e tests from library-go to OTE framework.
- Converted test functions to accept testing.TB for better compatibility
- Fixed golint issues: context parameter ordering, ID naming, documentation
- Updated Keycloak security context to use restricted PSA instead of privileged SCC

Fixes for e2e-oidc test compatibility:
- Fixed context handling: Use context.WithCancel instead of context.TODO()
  to ensure proper cancellation when test completes or times out
- Replaced g.By() with t.Logf() for dual-mode compatibility (works with
  both standalone go test and Ginkgo/OTE framework)

These fixes resolve CI failures in e2e-oidc and e2e-oidc-techpreview jobs.

Co-Authored-By: Rohit Patil <[email protected]>

Fix PodSecurityViolation and pathological event monitor test failures

Changes:
- Make IDP security context configurable (privileged vs restricted)
- GitLab requires privileged mode (manages system services as root)
- Keycloak works with restricted mode (runs as non-root)
- Set ClusterStability to Disruptive for encryption test suites
- Encryption tests trigger API server rollouts causing temporary disruption events

This fixes monitor test failures:
- [Monitor:audit-log-analyzer] PodSecurityViolation (50% reduction)
- [Monitor:legacy-test-framework-invariants-pathological] events should not repeat pathologically

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@ropatil010 ropatil010 force-pushed the migrate-e2e-encryption-final branch from 853094c to 057237e Compare June 15, 2026 12:14
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

@ropatil010: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-operator-serial-ote 057237e link false /test e2e-aws-operator-serial-ote
ci/prow/e2e-gcp-operator-disruptive 057237e link true /test e2e-gcp-operator-disruptive

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants