Use oc adm release extract --tools to download OCP binaries instead of release-controller file-cache #15
LGTM. Once E2E testing is done in a live environment and passes, this will be good to merge.
imatza-rh
left a comment
The main concern is build type coverage - openshift_release_pull_spec is only set for the nightly/4-stable path today, but we need it for all paths (candidate, fast, stable, eus, specific builds). The release stream API returns pullSpec for nightly and 4-stable. For channel builds, release.txt on mirror.openshift.com has a Pull From: field with a digest-pinned pull spec that could be parsed directly. See the inline comment for details.
Also worth removing openshift_download_url from defaults/main.yml since it's now unused, and the task name fix from "installer" to "client" at line 72 is a nice catch.
- name: Set openshift_release_pull_spec from release stream API response
  ansible.builtin.set_fact:
    openshift_release_pull_spec: "{{ latest_build_info.pullSpec }}"
This set_fact is inside the when: build_name == '' block, so it only fires for nightly and 4-stable. The channel path (candidate/fast/stable/eus, lines 55-74) and the specific-build path (lines 46-51) don't set openshift_release_pull_spec, which means get_openshift_release_binaries.yml would fail for every job using openshift_build_name: "candidate" - that's 6 of 13 job definitions.
For channel builds, release.txt already has a Pull From: field (quay.io/openshift-release-dev/ocp-release@sha256:...) that could be parsed here. For specific builds, a fallback constructing the pull spec from openshift_release_build_name would cover the rest.
Yes, this was indeed a gap. Fixed in the latest push:
Channel builds (candidate/fast/stable/eus): The release.txt file already has a Pull From: field with a digest-pinned pull spec (e.g. quay.io/openshift-release-dev/ocp-release@sha256:...). Added a task to parse it with grep '^Pull From:' | awk '{print $3}' and set openshift_release_pull_spec from it.
Specific builds: Wrapped the existing set_fact in a block and added pull spec construction — nightly builds use registry.ci.openshift.org/ocp/release:<build_name>, GA builds use quay.io/openshift-release-dev/ocp-release:<build_name>-x86_64.
All three code paths now set openshift_release_pull_spec before get_openshift_release_binaries.yml runs.
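The two fallbacks described in this reply can be sketched in shell as follows. This is only an illustration, not the code from the PR: the function names are hypothetical, the sample release.txt content and its digest are made up, and detecting nightlies by the substring "nightly" in the build name is an assumption.

```shell
#!/usr/bin/env bash
set -o pipefail

# Channel builds: print the third whitespace-separated field of the
# "Pull From:" line, i.e. the digest-pinned pull spec.
parse_pull_from() {
  grep '^Pull From:' "$1" | awk '{print $3}'
}

# Specific builds: construct a tag-based pull spec from the build name.
# Assumption: nightly build names contain the substring "nightly".
pull_spec_from_build_name() {
  local build_name="$1"
  case "$build_name" in
    *nightly*) printf '%s\n' "registry.ci.openshift.org/ocp/release:${build_name}" ;;
    *)         printf '%s\n' "quay.io/openshift-release-dev/ocp-release:${build_name}-x86_64" ;;
  esac
}

# Made-up sample standing in for release.txt; the digest is a placeholder.
sample="$(mktemp)"
printf 'Name:      4.21.0\nPull From: quay.io/openshift-release-dev/ocp-release@sha256:0123abcd\n' > "$sample"

parse_pull_from "$sample"
pull_spec_from_build_name "4.21.0-0.nightly-2026-06-18-005110"
pull_spec_from_build_name "4.21.0"
rm -f "$sample"
```

With pipefail set, a release.txt that has no Pull From: line makes parse_pull_from return non-zero instead of silently yielding an empty pull spec.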
# file-cache (openshift-release-artifacts), which has no SLA and can get stuck
# indefinitely during tool extraction.
#
# Ref: https://redhat-internal.slack.com/archives/C04UEUYPN69/p1778758938890989
Take into account that this is a public repo.
Removed — the Ref: line with the Slack URL is gone.
when: openshift_release_pull_spec is not defined or openshift_release_pull_spec == ''

- name: Extract pull secret from host cluster
  ansible.builtin.shell: >-
This pipeline is missing set -o pipefail - if oc get secret fails, base64 -d still exits 0 and creates an empty file, then oc adm release extract fails later with a confusing auth error. The same role's get_openshift_release_build_name.yml:68 already uses pipefail.
Also worth adding no_log: true here since it handles the decoded pull secret - same pattern as tools_install_custom_mce_catalog/tasks/main.yml.
Both addressed:
Added set -o pipefail && at the start of the pipeline so a failure in oc get secret properly propagates instead of being masked by base64 -d.
Added no_log: true to suppress the decoded pull secret from appearing in logs, consistent with tools_install_custom_mce_catalog/tasks/main.yml.
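The masking behaviour that pipefail fixes can be reproduced with a small stand-alone sketch. Here false stands in for a failing oc get secret and cat for base64 -d; the helper name pipeline_status is hypothetical.

```shell
#!/usr/bin/env bash

# Print the exit status of `false | cat` under the given pipefail mode.
# $1 is "-o" (enable pipefail) or "+o" (disable it).
pipeline_status() {
  local mode="$1" status=0
  # Run in a subshell so the option change does not leak to the caller.
  ( set "$mode" pipefail; false | cat >/dev/null ) || status=$?
  echo "$status"
}

echo "without pipefail: $(pipeline_status +o)"   # prints 0: the failure is masked
echo "with pipefail:    $(pipeline_status -o)"   # prints 1: the failure propagates
```

Without pipefail the pipeline reports the status of its last command only, which is why the empty file was created and the error surfaced later as an auth failure.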
  retries: 3
  delay: 30

- name: Remove pull secret file
This cleanup is inside the block: but there's no always: section. If oc adm release extract exhausts its retries, the block exits and the decoded pull secret stays on disk. Moving this to an always: block would guarantee cleanup on all paths.
Moved the Remove pull secret file task to an always: section on the outer block, so it runs regardless of whether oc adm release extract succeeds or exhausts retries. The pull secret file is now guaranteed to be cleaned up on all exit paths.
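For readers more used to shell than to Ansible blocks, the guarantee that always: provides is comparable to an EXIT trap. The sketch below is an analogue only, not the playbook code; with_secret_file and the placeholder secret content are hypothetical.

```shell
#!/usr/bin/env bash

# Write a secret to $1, run the remaining arguments as a command, and
# remove the file afterwards whether or not the command succeeded.
with_secret_file() {
  local secret_file="$1"
  shift
  (
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'rm -f "$secret_file"' EXIT
    printf '%s' 'placeholder-secret' > "$secret_file"
    "$@"
  )
}

secret="$(mktemp)"
# `false` stands in for an extract step that has exhausted its retries.
with_secret_file "$secret" false || echo "command failed with status $?"
[ -e "$secret" ] || echo "secret file removed"
```

The failing command's status is still returned to the caller, so cleanup does not hide the failure.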
…or OCP binaries Co-authored-by: Cursor <[email protected]>
853342b to bda356e
Summary
Replace the release-controller file-cache (openshift-release-artifacts) with oc adm release extract --tools for downloading OCP installer and client binaries. The file-cache has no SLA or guaranteed support and can get stuck indefinitely during tool extraction, causing job failures.
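A rough shell sketch of the replacement flow is shown below. It is an illustration under assumptions, not the role's actual task: the helper names retry and extract_release_tools, the file names, and the output directory are hypothetical. The flags --tools, --to and --registry-config belong to oc adm release extract, and the OC variable exists only so the sketch can be dry-run without a cluster.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if all attempts fail.
retry() {
  local attempts="$1" delay="$2" n
  shift 2
  for (( n = 1; n <= attempts; n++ )); do
    "$@" && return 0
    (( n < attempts )) && sleep "$delay"
  done
  return 1
}

# Extract the installer and client tarballs for a release image.
# $1: pull spec, $2: registry auth file (pull secret), $3: output directory.
# Set OC=echo to print the command instead of running it.
extract_release_tools() {
  "${OC:-oc}" adm release extract --tools \
    --registry-config="$2" --to="$3" "$1"
}

# Dry run: prints the arguments that would be passed to oc.
OC=echo extract_release_tools \
  "registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2026-06-18-005110" \
  pull-secret.json ./bin
```

A real invocation might look like retry 3 30 extract_release_tools "$pull_spec" "$secret_file" "$out_dir", with the caller responsible for creating and removing the pull secret file.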
This was triggered by a 4.21 nightly job failure (OSPRH-31439) on 17 June 2026, where the artifacts server got permanently stuck extracting tools for build 4.21.0-0.nightly-2026-06-16-000931; the URL is still stuck 2 days later.
What changed
get_openshift_release_build_name.yml: Save the pullSpec from the release stream API response (already returned but previously discarded).
get_openshift_release_binaries.yml: Replace HTTP polling of the file-cache with oc adm release extract --tools, which pulls binaries directly from the container registry. The pull secret is extracted at runtime from the host cluster via rhoso_kubeconfig.
Why
Per release-controller maintainer Brad Williams:
Manual validation on serval70 (18 June 2026)
All steps tested inside the shiftstackclient-shiftstack pod on a live RHOSO 18.0 deployment:
/usr/local/bin/oc (4.22.1) present in the pod
rhoso_kubeconfig: registry.ci.openshift.org confirmed present in auths
oc adm release extract --tools succeeded: Downloaded all tarballs for 4.21.0-0.nightly-2026-06-18-005110
openshift-install version → 4.21.0-0.nightly-2026-06-18-005110
oc version --client → 4.21.0-0.nightly-2026-06-18-005110
oc in pod → guest cluster (api.ostest.shiftstack.local)
--kubeconfig=rhoso_kubeconfig → host cluster (api.ocp.openstack.lab)
Scope / limitations
[…] build_name: "") and 4-stable code paths: both go through the release stream API, which returns pullSpec.
[…] stream API and are unaffected by this change.