CNTRLPLANE-3633: docs: add presubmit e2e triage guide #8741
Pipeline controller notification: This repository is configured in LGTM mode.
@bryan-cox: This pull request references CNTRLPLANE-3633 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.
Skipping CI for Draft Pull Request.
📝 Walkthrough
A new CI triage documentation section is added for HyperShift.
🚥 Pre-merge checks: ✅ 11 passed
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bryan-cox
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/mkdocs.yml`:
- Around line 88-91: The 'Triage' subsection within the CI section of the
mkdocs.yml navigation is not in alphabetical order. Move the entire 'Triage'
entry (including its nested 'Presubmit Failures' and 'Daily CI Health'
subsections) to its correct alphabetical position within the CI section, which
should be after 'Sync Community Fork' and before 'V2 E2E Testing'. This will
ensure the navigation structure passes the verify-docs-nav-order.py validation
check.
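The ordering rule in this comment can be illustrated with a small sketch. This is not the repository's `verify-docs-nav-order.py`; the function name `first_out_of_order` and the case-insensitive comparison are assumptions made for illustration, and only the three entry titles come from the comment above.

```python
def first_out_of_order(titles):
    """Return the first adjacent pair that breaks alphabetical order, or None.

    Assumption: titles are compared case-insensitively. The real
    verify-docs-nav-order.py may apply different rules.
    """
    for prev, cur in zip(titles, titles[1:]):
        if prev.casefold() > cur.casefold():
            return (prev, cur)
    return None


# Hypothetical excerpt of the CI nav section with 'Triage' placed too early.
nav_before = ["Triage", "Sync Community Fork", "V2 E2E Testing"]
# The same excerpt with 'Triage' moved to the position the comment asks for.
nav_after = ["Sync Community Fork", "Triage", "V2 E2E Testing"]

print(first_out_of_order(nav_before))
print(first_out_of_order(nav_after))
```

Running a check like this locally before pushing can catch the misplacement without waiting for the CI verification job.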
📒 Files selected for processing (4)
docs/content/how-to/ci/triage/daily-health.md
docs/content/how-to/ci/triage/index.md
docs/content/how-to/ci/triage/presubmit-failures.md
docs/mkdocs.yml
d6bb8df to e9da9d3
I now have the complete root cause. Here is my analysis:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Summary
The Verify job runs

Root Cause
The PR modifies three documentation files under

The repository has an auto-generated file

The update: api-deps workspace-sync deps api api-docs clients docs-aggregate

When CI ran

The fix is straightforward: the PR author needs to run

Recommendations

Evidence
3124b35 to ea9bc65
mgencur left a comment
Looks great! This is very useful. Thanks for putting this together.
Left a couple of comments.
| If you use Claude Code, these skills can automate most of the investigation below: |
| - `/e2e-analyze <prow-job-url> <artifacts-dir>` — Downloads build logs and artifacts, analyzes the failure, and outputs a structured error/summary/evidence report. This is a repo-local skill available to anyone with the hypershift repo. |
The `e2e-analyze` command was the initial one, and we later created a more advanced variant, `ci:analyze-prow-job-test-failure`. At the beginning they were the same, but then `ci:analyze-prow-job-test-failure` evolved further and added support for HyperShift hosted clusters. I would probably use only that one and not `e2e-analyze`.
With that said, we could probably remove `e2e-analyze` from the HyperShift repo and, in this new guide, mention the openshift-eng/ai-helpers repository, where the recommended commands live.
Good call — removed the e2e-analyze mention and pointed to openshift-eng/ai-helpers as the recommended source for the CI plugin skills.
| prow_e2e -->|"Cluster creation<br/>failed"| create["Check Artifacts for JUnit XML<br/>or search log for the error"] |
| prow_e2e -->|"Test failed"| tests["Find failed test name in<br/>JUnit XML or Ginkgo output"] |
| prow_e2e -->|"Teardown failed"| destroy["/retest — rarely your code"] |
Do you think this box should have "Same failure again" / "Passes" edges leading to the "Escalate" / "Done - was a flake" boxes (similar to the edges going from "/retest once")?
Good idea — added "Same failure again" → Escalate and "Passes" → Done edges from the teardown/retest box.
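The branch of the flowchart discussed in this thread can be sketched as a small decision function. This is only an illustration of the edges quoted above, not code from the guide; the function name `next_step` and the keys `"cluster_creation"`, `"test"`, `"teardown"`, `"same_failure"`, and `"passes"` are hypothetical labels chosen for the sketch.

```python
def next_step(phase, retest_outcome=None):
    """Map a failed e2e phase to the next triage action.

    phase: hypothetical label for where the Prow e2e job failed.
    retest_outcome: result of a /retest, only used for teardown failures.
    """
    if phase == "cluster_creation":
        return "Check Artifacts for JUnit XML or search log for the error"
    if phase == "test":
        return "Find failed test name in JUnit XML or Ginkgo output"
    if phase == "teardown":
        if retest_outcome is None:
            # Teardown failures are rarely caused by the PR's own code.
            return "/retest"
        if retest_outcome == "same_failure":
            return "Escalate"
        if retest_outcome == "passes":
            return "Done - was a flake"
    raise ValueError(f"unknown input: {phase!r}, {retest_outcome!r}")


print(next_step("teardown"))
print(next_step("teardown", "same_failure"))
print(next_step("teardown", "passes"))
```

Modelling the retest outcome as a separate argument mirrors the two new edges added from the teardown box: the first call yields the retest action, and the follow-up call resolves to either escalation or a flake.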
/lgtm
Pipeline controller notification: No second-stage tests were triggered for this PR.
ea9bc65 to 67acfba
Create a new Triage subsection under CI docs with a step-by-step runbook for diagnosing PR presubmit e2e failures. Move the daily CI health check page into the triage section.
67acfba to abcb838
/lgtm
/lgtm
/verified by @bryan-cox
@bryan-cox: This PR has been marked as verified.
What this PR does / why we need it:
Adds a step-by-step triage runbook for PR presubmit e2e failures. Developers see a red X on their PR and often don't know where to start — this guide walks them through it from top to bottom.
Creates a new Triage subsection under the CI docs (`docs/content/how-to/ci/triage/`) with:
- `checking-ci.md` moved into the triage section (content unchanged)
- Claude Code skills (`/e2e-analyze`, `ci:analyze-prow-job-test-failure`, `ci:analyze-prow-job-install-failure`) for automated analysis

Which issue(s) this PR fixes:
Fixes CNTRLPLANE-3633
Special notes for your reviewer:
- `checking-ci.md` file is moved (not copied) to `triage/daily-health.md`; content is unchanged, only the path changed
- `mkdocs build`: no errors

Checklist: