CNTRLPLANE-3054: auto-run migration scripts on Konflux task version bumps#8054
CNTRLPLANE-3054: auto-run migration scripts on Konflux task version bumps#8054celebdor wants to merge 2 commits into
Conversation
The update_trusted_task_bundles.py script now automatically detects and runs migration scripts from konflux-ci/build-definitions when upgrading tasks across versions. This prevents pipeline breakage caused by version bumps that remove/rename parameters or results (as happened with the init task 0.2→0.4 upgrade in CNTRLPLANE-3054). Migrations run by default with --upgrade-versions and can be skipped with --skip-migrations. In dry-run mode, pending migrations are listed without execution. Ref: CNTRLPLANE-3054 Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
@celebdor: This pull request references CNTRLPLANE-3054 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@celebdor: This pull request references CNTRLPLANE-3054 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Excluded labels (none allowed) (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe changes introduce automated migration script detection and execution into the trusted task bundles update workflow. A new Sequence Diagram(s)sequenceDiagram
actor User
participant Script as update_trusted_task_bundles.py
participant HTTP as Migration Scripts Endpoint
participant FileSystem as Local Pipeline Files
participant Shell as Migration Scripts
User->>Script: Run with version bump
Script->>Script: Detect version bump
Script->>Script: Compute intermediate versions
Script->>HTTP: Probe for migration scripts (curl)
HTTP-->>Script: Return available migration script URLs
Script->>Script: Generate list of migrations to run
alt Dry-run mode
Script-->>User: Print pending migrations
else Normal mode
Script->>Shell: Execute migration scripts in sequence
Shell->>FileSystem: Update pipeline YAML files
Shell-->>Script: Return execution results
Script-->>User: Report completed migrations
end
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@celebdor: This pull request references CNTRLPLANE-3054 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hack/tools/scripts/update_trusted_task_bundles.py`:
- Around line 564-573: The code collapses distinct version bumps by using
update.task_name as the dict key, causing overwrites and duplicate file entries;
change the keys for version_bumps and bump_files to a tuple of
(update.task_name, update.current_version, update.latest_version) when
collecting from all_results and ensure you deduplicate file paths (e.g., use a
set or check existence before append) so each unique migration tuple maps to a
unique list of files; apply the same tuple-key + dedupe fix to the other
collection block that mirrors this logic (the block using version_bumps and
bump_files later in the file).
- Around line 539-546: The current curl/http-check block (the subprocess.run
call with ["curl", "-fsSL", "-o", "/dev/null", "-w", "%{http_code}", url]
wrapped in try/except for subprocess.CalledProcessError and
subprocess.TimeoutExpired) treats transport/timeouts like "not found" and
swallows execution errors later; change it to treat non-200 non-404 responses
and any exceptions as fatal: explicitly check result.stdout for "200"
(available) and for "404" (treat as not available) but if result.stdout is any
other code or if subprocess.CalledProcessError/TimeoutExpired occurs, raise/exit
with an error so the update aborts; apply the same policy to the download
subprocess.run (the block that actually fetches the migration script) and the
migration execution subprocess.run (the block that runs the script which
currently only logs failures around lines 613-619 and 790-798), and ensure you
don’t write/commit bundle refs until all lookups, downloads, and executions
succeed (or roll back the write on failure) so partial migrations cannot leave
half-updated YAML.
- Around line 495-498: The MIGRATION_SCRIPT_URL currently points at the mutable
"main" branch and the fetched script is executed directly with bash; change
MIGRATION_SCRIPT_URL to reference an immutable ref (commit hash or release tag)
via a new constant or environment variable (e.g., MIGRATION_REF) and update
fetch logic to first download the script to a file, obtain a trusted
checksum/signature (from a pinned metadata URL or bundled allowlist), verify the
downloaded file against that checksum/signature, and only then execute it; also
stop piping remote content straight into bash—use the verified local file path
in the code paths that call the executor (look for MIGRATION_SCRIPT_URL and the
logic that invokes bash) and abort with a clear error if verification fails.
- Around line 530-531: The code uses task_name.removeprefix("task-") which
requires Python 3.9; replace it with a Python 3.8-compatible slice: set
short_name = task_name[len("task-"): ] if task_name.startswith("task-") else
task_name so the "task-" prefix is stripped only when present (locate the
short_name assignment that uses task_name.removeprefix).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 4fe7df0e-a110-44b4-aa73-9f9e62c69ad7
📒 Files selected for processing (2)
.claude/commands/update-konflux-tasks.mdhack/tools/scripts/update_trusted_task_bundles.py
| MIGRATION_SCRIPT_URL = ( | ||
| "https://raw.githubusercontent.com/konflux-ci/build-definitions/main" | ||
| "/task/{task_name}/{version}/migrations/{version}.sh" | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'raw\.githubusercontent\.com/konflux-ci/build-definitions/main|curl", "-fsSL", "-o", script_path|bash", script_path' hack/tools/scripts/update_trusted_task_bundles.pyRepository: openshift/hypershift
Length of output: 274
Pin migration scripts to an immutable ref and verify before execution.
The migration source is resolved from the mutable main branch (line 496), and the downloaded script is executed directly with bash (line 610) without verification. This creates a supply-chain risk: any upstream change or compromise becomes arbitrary code execution in the caller's environment. Additionally, fetching from a mutable branch makes the update non-reproducible.
Use an immutable commit hash or release tag instead of /main/, and verify the downloaded content (e.g., checksum or signature) before invoking bash to execute it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hack/tools/scripts/update_trusted_task_bundles.py` around lines 495 - 498,
The MIGRATION_SCRIPT_URL currently points at the mutable "main" branch and the
fetched script is executed directly with bash; change MIGRATION_SCRIPT_URL to
reference an immutable ref (commit hash or release tag) via a new constant or
environment variable (e.g., MIGRATION_REF) and update fetch logic to first
download the script to a file, obtain a trusted checksum/signature (from a
pinned metadata URL or bundled allowlist), verify the downloaded file against
that checksum/signature, and only then execute it; also stop piping remote
content straight into bash—use the verified local file path in the code paths
that call the executor (look for MIGRATION_SCRIPT_URL and the logic that invokes
bash) and abort with a clear error if verification fails.
| # Strip "task-" prefix for the build-definitions repo path | ||
| short_name = task_name.removeprefix("task-") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'Python 3\.8\+|removeprefix\(' hack/tools/scripts/update_trusted_task_bundles.pyRepository: openshift/hypershift
Length of output: 137
removeprefix() requires Python 3.9, contradicting documented Python 3.8+ support.
Line 531 uses str.removeprefix(), which was added in Python 3.9. This breaks compatibility with Python 3.8, which is documented as supported at line 28. Either bump the minimum requirement to Python 3.9+ or use a 3.8-compatible alternative.
Python 3.8-compatible fix
- short_name = task_name.removeprefix("task-")
+ short_name = task_name[5:] if task_name.startswith("task-") else task_name🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hack/tools/scripts/update_trusted_task_bundles.py` around lines 530 - 531,
The code uses task_name.removeprefix("task-") which requires Python 3.9; replace
it with a Python 3.8-compatible slice: set short_name = task_name[len("task-"):
] if task_name.startswith("task-") else task_name so the "task-" prefix is
stripped only when present (locate the short_name assignment that uses
task_name.removeprefix).
| result = subprocess.run( | ||
| ["curl", "-fsSL", "-o", "/dev/null", "-w", "%{http_code}", url], | ||
| capture_output=True, text=True, timeout=10 | ||
| ) | ||
| if result.stdout.strip() == "200": | ||
| available.append((version, url)) | ||
| except (subprocess.CalledProcessError, subprocess.TimeoutExpired): | ||
| continue |
There was a problem hiding this comment.
Treat migration lookup and execution failures as fatal.
Lines 539-546 currently treat transport problems the same as “no script exists”, and Lines 613-619 only log execution failures. By the time Lines 790-798 are reached, the bundle refs have already been written, so the command can still exit successfully with half-migrated YAML. Distinguish 404 from lookup failures, and abort or roll back the update on any lookup, download, or execution error.
Also applies to: 613-619, 790-798
🧰 Tools
🪛 Ruff (0.15.6)
[error] 539-539: subprocess call: check for execution of untrusted input
(S603)
[error] 540-540: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hack/tools/scripts/update_trusted_task_bundles.py` around lines 539 - 546,
The current curl/http-check block (the subprocess.run call with ["curl",
"-fsSL", "-o", "/dev/null", "-w", "%{http_code}", url] wrapped in try/except for
subprocess.CalledProcessError and subprocess.TimeoutExpired) treats
transport/timeouts like "not found" and swallows execution errors later; change
it to treat non-200 non-404 responses and any exceptions as fatal: explicitly
check result.stdout for "200" (available) and for "404" (treat as not available)
but if result.stdout is any other code or if
subprocess.CalledProcessError/TimeoutExpired occurs, raise/exit with an error so
the update aborts; apply the same policy to the download subprocess.run (the
block that actually fetches the migration script) and the migration execution
subprocess.run (the block that runs the script which currently only logs
failures around lines 613-619 and 790-798), and ensure you don’t write/commit
bundle refs until all lookups, downloads, and executions succeed (or roll back
the write on failure) so partial migrations cannot leave half-updated YAML.
| # Collect unique version bumps (task_name -> (current, target)) | ||
| version_bumps: Dict[str, Tuple[str, str]] = {} | ||
| bump_files: Dict[str, List[Path]] = {} | ||
|
|
||
| for filepath, result in all_results.items(): | ||
| for update in result.updates: | ||
| if update.is_version_bump: | ||
| key = update.task_name | ||
| version_bumps[key] = (update.current_version, update.latest_version) | ||
| bump_files.setdefault(key, []).append(filepath) |
There was a problem hiding this comment.
Don't collapse different bumps under task_name.
Lines 565-573 and 624-632 key migrations only by task name. If two files bump the same task from different starting versions, the later entry overwrites the earlier one; and if the same task appears multiple times in one file, that file is added repeatedly. The result is skipped intermediate migrations for some files or duplicate runs for others. Key this by (task_name, current_version, latest_version) and dedupe the file list.
Also applies to: 624-632
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hack/tools/scripts/update_trusted_task_bundles.py` around lines 564 - 573,
The code collapses distinct version bumps by using update.task_name as the dict
key, causing overwrites and duplicate file entries; change the keys for
version_bumps and bump_files to a tuple of (update.task_name,
update.current_version, update.latest_version) when collecting from all_results
and ensure you deduplicate file paths (e.g., use a set or check existence before
append) so each unique migration tuple maps to a unique list of files; apply the
same tuple-key + dedupe fix to the other collection block that mirrors this
logic (the block using version_bumps and bump_files later in the file).
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, celebdor The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Replace the separate "Check Migration Notes" and "Update Pipeline Files" steps with a combined "Apply Updates and Handle Migrations" step that: - Runs update_trusted_task_bundles.py --upgrade-versions to apply all updates (digest and version bumps) with automatic migrations in one shot - Then has the agent check for manual migration instructions in MIGRATION.md that the script can't handle automatically This eliminates the redundant manual file editing step since the script already handles bundle reference updates and migration scripts. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
/retest-required Hey @celebdor the PR LGTM, looks like Konflux pipeline is failing, I will tag it after it’s passing. |
|
@celebdor: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Stale PRs are closed after 21d of inactivity. If this PR is still relevant, comment to refresh it or remove the stale label. If this PR is safe to close now please do so with /lifecycle stale |
|
I now have the complete picture. Here is the analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Konflux pipeline failed at validation time before any task executed. The PR branch (last pushed 2026-03-25) contains a stale Root CauseThe root cause is a stale branch that predates a breaking Tekton task migration. Here is the timeline:
The PR itself only modifies Recommendations
Evidence
|
Summary
update_trusted_task_bundles.pywhen upgrading Tekton tasks across versions--upgrade-versionsand can be skipped with--skip-migrations; in dry-run mode, pending migrations are listed without execution/update-konflux-tasksClaude command docs to reflect the new behavior andpmtrequirementHow it works
When a task version bump is detected (e.g., init 0.2 → 0.4), the script:
https://raw.githubusercontent.com/konflux-ci/build-definitions/main/task/{name}/{version}/migrations/{version}.shpmt(pipeline-migration-tool)Test plan
update_trusted_task_bundles.py .tekton/pipelines/*.yaml --dry-run --upgrade-versionsand verify migration scripts are listed--dry-runon a test branch and verify migrations are applied correctly--skip-migrationsflag prevents migration executionRef: CNTRLPLANE-3054
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
--skip-migrationsCLI flag to optionally bypass migration execution.Documentation