Use oc adm release extract --tools to download OCP binaries instead of release-controller file-cache #15

Open
tusharjadhav3302 wants to merge 1 commit into main from use-oc-adm-release-extract-for-binaries

@tusharjadhav3302

Summary

Replace the release-controller file-cache (openshift-release-artifacts) with
oc adm release extract --tools for downloading OCP installer and client
binaries. The file-cache has no SLA or guaranteed support and can get stuck
indefinitely during tool extraction, causing job failures.

This was triggered by a 4.21 nightly job failure (OSPRH-31439) on 17 June 2026,
where the artifacts server got stuck extracting tools for build
4.21.0-0.nightly-2026-06-16-000931. The URL is still stuck 2 days later.

What changed

  • get_openshift_release_build_name.yml: Save the pullSpec from the
    release stream API response (already returned but previously discarded).

  • get_openshift_release_binaries.yml: Replace HTTP polling of the
    file-cache with oc adm release extract --tools, which pulls binaries
    directly from the container registry. The pull secret is extracted at
    runtime from the host cluster via rhoso_kubeconfig.
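
The flow described in the second bullet can be sketched as a dry run that only prints the two commands, so it works without a cluster. This is a rough illustration, not the role's actual task: the function name, the auth file location, and the assumption that the pull secret is read from the pull-secret secret in the openshift-config namespace are all hypothetical choices made for this sketch.

```shell
# Dry-run sketch: print the commands instead of running them.
# Arguments: host-cluster kubeconfig, release pull spec, destination directory.
print_extract_steps() {
    kubeconfig=$1
    pull_spec=$2
    dest=$3
    auth_file="$dest/pull-secret.json"   # hypothetical location for the decoded secret

    # Assumption: the cluster-wide pull secret is openshift-config/pull-secret.
    printf '%s\n' "oc --kubeconfig=$kubeconfig get secret pull-secret -n openshift-config -o jsonpath='{.data.\\.dockerconfigjson}' | base64 -d > $auth_file"

    # Pull the tool tarballs straight from the registry, authenticating with the secret.
    printf '%s\n' "oc adm release extract --tools --registry-config=$auth_file --to=$dest $pull_spec"
}

print_extract_steps /tmp/rhoso_kubeconfig \
    registry.ci.openshift.org/ocp/release:4.21.0-0.nightly-2026-06-18-005110 \
    /tmp/ocp-tools
```

The paths passed in the usage line are placeholders. Printing rather than executing keeps the sketch safe to run anywhere and makes the argument order easy to review.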

Why

Per release-controller maintainer Brad Williams:

"The release-controller's file-cache is not bound by any SLA or guaranteed
support and therefore probably shouldn't be used for automation."

"Much more reliable to call the command directly and work locally:
oc adm release extract --tools --to=<PATH> <pullSpec>"

Manual validation on serval70 (18 June 2026)

All steps tested inside the shiftstackclient-shiftstack pod on a live
RHOSO 18.0 deployment:

  1. oc available: /usr/local/bin/oc (4.22.1) present in the pod
  2. Host cluster pull secret accessible via rhoso_kubeconfig:
    registry.ci.openshift.org confirmed present in auths
  3. oc adm release extract --tools succeeded:
    Downloaded all tarballs for 4.21.0-0.nightly-2026-06-18-005110
  4. Binaries verified:
    • openshift-install version → 4.21.0-0.nightly-2026-06-18-005110
    • oc version --client → 4.21.0-0.nightly-2026-06-18-005110
  5. Two kubeconfigs confirmed:
    • Default oc in pod → guest cluster (api.ostest.shiftstack.local)
    • --kubeconfig=rhoso_kubeconfig → host cluster (api.ocp.openstack.lab)
    • Pull secret correctly extracted from host cluster at all times

Scope / limitations

  • Covers nightly (build_name: "") and 4-stable code paths — both go
    through the release stream API which returns pullSpec.
  • Channel-based paths (candidate/fast/stable/eus) do not use the release
    stream API and are unaffected by this change.

@tusharjadhav3302 added the ready-for-review label (PR is ready for code review) on Jun 18, 2026
@dlaw4608

LGTM. Once E2E testing has been run in a live environment and passes, this will be good to merge.

@imatza-rh imatza-rh left a comment

Contributor

The main concern is build type coverage - openshift_release_pull_spec is only set for the nightly/4-stable path today, but we need it for all paths (candidate, fast, stable, eus, specific builds). The release stream API returns pullSpec for nightly and 4-stable. For channel builds, release.txt on mirror.openshift.com has a Pull From: field with a digest-pinned pull spec that could be parsed directly. See the inline comment for details.

Also worth removing openshift_download_url from defaults/main.yml since it's now unused, and the task name fix from "installer" to "client" at line 72 is a nice catch.


- name: Set openshift_release_pull_spec from release stream API response
  ansible.builtin.set_fact:
    openshift_release_pull_spec: "{{ latest_build_info.pullSpec }}"

Contributor

This set_fact is inside the when: build_name == '' block, so it only fires for nightly and 4-stable. The channel path (candidate/fast/stable/eus, lines 55-74) and the specific-build path (lines 46-51) don't set openshift_release_pull_spec, which means get_openshift_release_binaries.yml would fail for every job using openshift_build_name: "candidate" - that's 6 of 13 job definitions.

For channel builds, release.txt already has a Pull From: field (quay.io/openshift-release-dev/ocp-release@sha256:...) that could be parsed here. For specific builds, a fallback constructing the pull spec from openshift_release_build_name would cover the rest.
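
As a rough sketch of the parsing suggested here, the Pull From: line can be picked out with grep and its third whitespace-separated field printed with awk. The function name and the sample file are hypothetical, the sample is trimmed to two lines, and the digest is a made-up placeholder rather than a real release.

```shell
# Print the pull spec recorded in a release.txt file.
# On the "Pull From:" line, fields 1 and 2 are "Pull" and "From:"; field 3 is the spec.
parse_pull_from() {
    grep '^Pull From:' "$1" | awk '{print $3}'
}

# Sample input for illustration only; the digest is a placeholder.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Name:      4.18.1
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:0123456789abcdef
EOF

pull_spec=$(parse_pull_from "$sample")
rm -f "$sample"
echo "$pull_spec"
```

If the file has no Pull From: line, this pipeline prints nothing and still exits 0 because awk is the last command, so a caller would need to treat an empty result as an error.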

@tusharjadhav3302 Jun 18, 2026

Author

Yes, this was indeed a gap. Fixed in the latest push:

Channel builds (candidate/fast/stable/eus): The release.txt file already has a Pull From: field with a digest-pinned pull spec (e.g. quay.io/openshift-release-dev/ocp-release@sha256:...). Added a task to parse it with grep '^Pull From:' | awk '{print $3}' and set openshift_release_pull_spec from it.

Specific builds: Wrapped the existing set_fact in a block and added pull spec construction — nightly builds use registry.ci.openshift.org/ocp/release:<build_name>, GA builds use quay.io/openshift-release-dev/ocp-release:<build_name>-x86_64.

All three code paths now set openshift_release_pull_spec before get_openshift_release_binaries.yml runs.
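
The specific-build fallback could look roughly like the following. It is a sketch only: the function name is hypothetical, and treating any build name that contains "nightly" as a nightly build is an assumption made here, not something the PR states. The two registry paths are the ones named above.

```shell
# Build a pull spec from a specific build name.
# Assumption: any name containing "nightly" is a CI nightly; everything else is GA.
pull_spec_for_build() {
    build_name=$1
    case "$build_name" in
        *nightly*)
            # Nightly payloads are tagged by build name in the CI registry.
            printf '%s\n' "registry.ci.openshift.org/ocp/release:${build_name}" ;;
        *)
            # GA payloads carry an architecture suffix on quay.io.
            printf '%s\n' "quay.io/openshift-release-dev/ocp-release:${build_name}-x86_64" ;;
    esac
}

pull_spec_for_build 4.21.0-0.nightly-2026-06-18-005110
pull_spec_for_build 4.18.1   # hypothetical GA version, for illustration
```

Hardcoding the x86_64 suffix mirrors the description above; other architectures would need a different suffix.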

# file-cache (openshift-release-artifacts), which has no SLA and can get stuck
# indefinitely during tool extraction.
#
# Ref: https://redhat-internal.slack.com/archives/C04UEUYPN69/p1778758938890989

Contributor

Take into account that this is a public repo.

Author

Removed — the Ref: line with the Slack URL is gone.

when: openshift_release_pull_spec is not defined or openshift_release_pull_spec == ''

- name: Extract pull secret from host cluster
  ansible.builtin.shell: >-

Contributor

This pipeline is missing set -o pipefail - if oc get secret fails, base64 -d still exits 0 and creates an empty file, then oc adm release extract fails later with a confusing auth error. The same role's get_openshift_release_build_name.yml:68 already uses pipefail.

Also worth adding no_log: true here since it handles the decoded pull secret - same pattern as tools_install_custom_mce_catalog/tasks/main.yml.
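
The masking effect is easy to reproduce in isolation. In this small sketch, false stands in for a failing oc get secret and cat stands in for base64 -d; both stand-ins are illustrative, and bash is invoked explicitly because pipefail is not available in every sh.

```shell
# Without pipefail, a pipeline's exit status is that of its last command,
# so the failure of the first command is hidden.
without=0
bash -c 'false | cat > /dev/null' || without=$?

# With pipefail, the pipeline fails if any command in it fails.
with=0
bash -c 'set -o pipefail; false | cat > /dev/null' || with=$?

echo "without pipefail: $without, with pipefail: $with"
# prints "without pipefail: 0, with pipefail: 1"
```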

@tusharjadhav3302 Jun 18, 2026

Author

Both addressed:

  • Added set -o pipefail && at the start of the pipeline so a failure in oc get secret propagates instead of being masked by base64 -d.
  • Added no_log: true to keep the decoded pull secret out of the logs, consistent with tools_install_custom_mce_catalog/tasks/main.yml.

retries: 3
delay: 30

- name: Remove pull secret file

Contributor

This cleanup is inside the block: but there's no always: section. If oc adm release extract exhausts its retries, the block exits and the decoded pull secret stays on disk. Moving this to an always: block would guarantee cleanup on all paths.

Author

Moved the Remove pull secret file task to an always: section on the outer block, so it runs regardless of whether oc adm release extract succeeds or exhausts retries. The pull secret file is now guaranteed to be cleaned up on all exit paths.
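
The same guarantee can be illustrated outside Ansible with a shell EXIT trap, which is a different mechanism from always: but gives the same cleanup on every exit path. This is an analogy only; extract_tools is a hypothetical stand-in that always fails, and the file content is a placeholder, not a real secret.

```shell
# Hypothetical stand-in for `oc adm release extract`; it always fails here
# so that the failure path is exercised.
extract_tools() {
    return 1
}

SECRET_FILE=$(mktemp)
status=0
(
    # The EXIT trap runs when the subshell ends, whether the last command
    # succeeded or failed, much like an always: section on a block.
    trap 'rm -f "$SECRET_FILE"' EXIT
    printf '%s\n' '{"auths":{}}' > "$SECRET_FILE"   # placeholder content
    extract_tools "$SECRET_FILE"
) || status=$?

echo "extract exit status: $status"
if [ -e "$SECRET_FILE" ]; then
    echo "secret file still on disk"
else
    echo "secret file removed"
fi
```

The failure status is still reported to the caller, so the cleanup does not hide the original error.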

@tusharjadhav3302 force-pushed the use-oc-adm-release-extract-for-binaries branch from 853342b to bda356e on June 18, 2026 11:10
Labels

ready-for-review PR is ready for code review

3 participants