CNTRLPLANE-3509: Deep-verify release component digests in nodepool resolve step#80576
Conversation
|
@jparrill: This pull request references CNTRLPLANE-3509 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@jparrill: GitHub didn't allow me to request PR reviews from the following users: openshift/hypershift-team. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
WalkthroughThe ChangesAuth Argument Threading
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ci-operator/step-registry/hypershift/resolve-nodepool-releases/hypershift-resolve-nodepool-releases-commands.sh (1)
38-60: ⚡ Quick winUse an auth argument array to avoid unquoted expansion and duplicate branching.
auth_argsis currently a string that is intentionally split at call sites; switching to an array is safer and lets you collapse the repeatedif [[ -n "${auth_args}" ]]branches.Suggested refactor
verify_image_pullable() { local pullspec=$1 - local auth_args="" + local -a auth_args=() if [[ -f "${REGISTRY_AUTH}" ]]; then - auth_args="-a ${REGISTRY_AUTH}" + auth_args=(-a "${REGISTRY_AUTH}") fi # Check 1: release tag manifest exists and is the right architecture - if [[ -n "${auth_args}" ]]; then - oc image info --filter-by-os linux/amd64 ${auth_args} "${pullspec}" &>/dev/null || return 1 - else - oc image info --filter-by-os linux/amd64 "${pullspec}" &>/dev/null || return 1 - fi + oc image info --filter-by-os linux/amd64 "${auth_args[@]}" "${pullspec}" &>/dev/null || return 1 # Check 2: internal component digests are still alive in the registry. # The release tag can outlive its component digests when the CI registry # garbage-collects old images. Extract the MCO digest from the release # metadata and verify it is actually pullable. local mco_digest - mco_digest=$(oc adm release info ${auth_args} "${pullspec}" --image-for=machine-config-operator 2>/dev/null) || return 1 - if [[ -n "${auth_args}" ]]; then - oc image info --filter-by-os linux/amd64 ${auth_args} "${mco_digest}" &>/dev/null || return 1 - else - oc image info --filter-by-os linux/amd64 "${mco_digest}" &>/dev/null || return 1 - fi + mco_digest=$(oc adm release info "${auth_args[@]}" "${pullspec}" --image-for=machine-config-operator 2>/dev/null) || return 1 + oc image info --filter-by-os linux/amd64 "${auth_args[@]}" "${mco_digest}" &>/dev/null || return 1 }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/hypershift/resolve-nodepool-releases/hypershift-resolve-nodepool-releases-commands.sh` around lines 38 - 60, The auth_args variable is currently a string that gets intentionally split at call sites, causing duplicated if-else branches throughout the code. Convert auth_args to an array instead by initializing it as an empty array and conditionally adding the authentication arguments as array elements. Then replace all the duplicated if-else branches checking for non-empty auth_args with single command lines using array expansion syntax "${auth_args[@]}" to unpack the array arguments. Apply this refactoring to both the oc image info calls checking pullspec and mco_digest, as well as the oc adm release info call.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@ci-operator/step-registry/hypershift/resolve-nodepool-releases/hypershift-resolve-nodepool-releases-commands.sh`:
- Around line 38-60: The auth_args variable is currently a string that gets
intentionally split at call sites, causing duplicated if-else branches
throughout the code. Convert auth_args to an array instead by initializing it as
an empty array and conditionally adding the authentication arguments as array
elements. Then replace all the duplicated if-else branches checking for
non-empty auth_args with single command lines using array expansion syntax
"${auth_args[@]}" to unpack the array arguments. Apply this refactoring to both
the oc image info calls checking pullspec and mco_digest, as well as the oc adm
release info call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a10b5dae-c199-4fc4-8483-fb23bdc474b8
📒 Files selected for processing (1)
ci-operator/step-registry/hypershift/resolve-nodepool-releases/hypershift-resolve-nodepool-releases-commands.sh
csrwng
left a comment
There was a problem hiding this comment.
a couple of nits, otherwise lgtm
| fi | ||
|
|
||
| # Check 1: release tag manifest exists and is the right architecture | ||
| if [[ -n "${auth_args}" ]]; then |
There was a problem hiding this comment.
nit: if specifying ${auth_args} in the command line without quotes, I don't think you need this if
| # metadata and verify it is actually pullable. | ||
| local mco_digest | ||
| mco_digest=$(oc adm release info ${auth_args} "${pullspec}" --image-for=machine-config-operator 2>/dev/null) || return 1 | ||
| if [[ -n "${auth_args}" ]]; then |
The CI registry can garbage-collect individual component image digests while the release tag itself remains accessible. This caused NodePool tests to fail with "manifest unknown" when the ignition server tried to pull component images (e.g. machine-config-operator) by digest. The existing verify_image_pullable only checked the release tag via `oc image info`, which passed even for stale releases. Add a second check that extracts the MCO digest from the release metadata and verifies it is actually pullable in the registry. If the digest has been GC'd, the fallback to the nightly stream is triggered as intended. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: Juan Manuel Parrilla Madrid <[email protected]>
fec02f3 to
5dfe2e3
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng, jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws-4-22 |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aks-4-22 |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aks |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws |
|
[REHEARSALNOTIFIER]
A total of 567 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jparrill: your |
4 similar comments
|
@jparrill: your |
|
@jparrill: your |
|
@jparrill: your |
|
@jparrill: your |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws-4-22 |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aks-4-22 |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aks |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
1 similar comment
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
1 similar comment
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/retest |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aks-4-22 |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws-4-22 |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-kubernetes-release-4.22-e2e-aws-ovn-hypershift |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
1 similar comment
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jparrill: your |
|
/pj-rehearse pull-ci-openshift-hypershift-main-e2e-aws |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse pull-ci-openshift-kubernetes-release-4.22-e2e-aws-ovn-hypershift |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse ack |
|
@jparrill: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@jparrill: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
d46e312
into
openshift:main
Summary
TestNodePoolPrevReleaseN2/N3fail withmanifest unknownbecause the CI registry garbage-collects component image digests while the release tag itself remains accessibleverify_image_pullableonly checked the release tag viaoc image info, which passed even for stale releases with GC'd internal digestsmachine-config-operatordigest from the release metadata and verify it is actually pullable in the registrynightlystream is triggered as originally intendedRoot cause analysis
In PR openshift/hypershift#8669, the
e2e-aksjob failed because:4.21.0-0.ci-2026-06-08(7 days old) and4.20.0-0.ci-2026-06-07(8 days old) as the latest accepted CI releasesverify_image_pullableranoc image infoon the release tag → passed (tag still exists)machine-config-operatorby digest from the release →manifest unknown(digest GC'd)The fallback to nightly was never triggered because
oc image infoon the release tag passed — it doesn't check internal component digests.Changes
verify_image_pullable()now performs 3 checks:oc image info --filter-by-os linux/amd64on the release tag (existing check)oc adm release info --image-for=machine-config-operatorto extract the MCO digest from release metadataoc image info --filter-by-os linux/amd64on the MCO digest to verify it's still pullableTest plan
oc image info+oc adm release info --image-for+oc image infochain works for valid releasesresolve_release_imageruns the sameverify_image_pullableon the nightly release/cc @openshift/hypershift-team
🤖 Generated with Claude Code
Summary by CodeRabbit
This PR fixes CI flakiness in OpenShift’s HyperShift NodePool “previous release” tests (notably
TestNodePoolPrevReleaseN2/N3) by hardening how the resolve-nodepool-releases step validates candidate release images in the CI registry.What changed (practical impact)
The CI registry can garbage-collect individual component image digests (while the release tag itself still resolves). Previously, the step only confirmed that the release tag existed, which could still pass even when HyperShift later tried to pull components (e.g., machine-config-operator) by digest, causing “manifest unknown” failures at runtime.
To prevent stale releases from being used,
verify_image_pullable()in:ci-operator/step-registry/hypershift/resolve-nodepool-releases/hypershift-resolve-nodepool-releases-commands.shnow performs deeper verification:
linux/amd64(oc image info --filter-by-os linux/amd64).oc adm release info --image-for=machine-config-operator).linux/amd64(oc image info --filter-by-os linux/amd64against the digest).If any digest is missing (e.g., garbage-collected), the function fails verification so the job falls back to the nightly stream as intended.
Auth handling improvement
The change also centralizes registry authentication handling by introducing an
auth_argsvalue (derived only when/etc/ci-pull-credentials/.dockerconfigjsonexists) and applying it consistently to both the release-tag and digest pullability checks.