USHIFT-6971: patch imagePullPolicy before clock fast-forward in cert rotation test#6666
USHIFT-6971: patch imagePullPolicy before clock fast-forward in cert rotation test#6666copejon wants to merge 2 commits into
Conversation
…rotation test Deployments with imagePullPolicy: Always fail to pull images when the fast-forwarded clock exceeds CDN TLS cert expiry, causing greenboot timeout. Patch all such deployments to IfNotPresent before the time change since images are already cached and the VM is ephemeral.
|
@copejon: This pull request references USHIFT-6971 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds Robot Framework keywords that find deployments with containers set to ChangesCertificate Rotation — ImagePullPolicy patching and orchestration
Sequence DiagramsequenceDiagram
participant Test as Test Suite
participant Kube as Kubernetes API (oc)
participant Deploy as Deployment
Test->>Kube: oc get deploy -A -o json (list deployments)
Kube-->>Test: Deployment manifests
Test->>Test: Filter containers where imagePullPolicy == "Always"
Test->>Kube: oc patch (set container[index].imagePullPolicy = "IfNotPresent")
Kube->>Deploy: Apply patch, trigger rollout
Deploy-->>Kube: Rollout in progress
Test->>Kube: oc rollout status (wait)
Kube-->>Test: Rollout complete
Test->>Test: Stop chronyd and change system date
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: copejon The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/suites/standard2/validate-certificate-rotation.robot (1)
176-177: ⚡ Quick winAdd defensive check for split parts count.
If the jq output is malformed or contains unexpected whitespace, indexing
${parts}[0],${parts}[1],${parts}[2]could fail with an index out of range error.🛡️ Suggested defensive check
FOR ${line} IN @{lines} IF '${line.strip()}' == '${EMPTY}' CONTINUE @{parts}= Split String ${line} + ${parts_len}= Get Length ${parts} + IF ${parts_len} != 3 + Log WARNING: Unexpected line format: ${line} + CONTINUE + END Patch Deploy ImagePullPolicy To IfNotPresent ${parts}[0] ${parts}[1] ${parts}[2] END🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/suites/standard2/validate-certificate-rotation.robot` around lines 176 - 177, After splitting `${line}` into `${parts}` before calling `Patch Deploy ImagePullPolicy To IfNotPresent`, add a defensive check using `Get Length` on `${parts}` and if the length is less than 3 use `Run Keyword If` to log a warning and `Continue For Loop` (or otherwise skip the current iteration) to avoid indexing `${parts}[0]`, `${parts}[1]`, `${parts}[2]`; ensure the check sits between the `Split String ${line}` and `Patch Deploy ImagePullPolicy To IfNotPresent` steps so malformed or whitespace-only output won't cause an index out of range.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/suites/standard2/validate-certificate-rotation.robot`:
- Around line 176-177: After splitting `${line}` into `${parts}` before calling
`Patch Deploy ImagePullPolicy To IfNotPresent`, add a defensive check using `Get
Length` on `${parts}` and if the length is less than 3 use `Run Keyword If` to
log a warning and `Continue For Loop` (or otherwise skip the current iteration)
to avoid indexing `${parts}[0]`, `${parts}[1]`, `${parts}[2]`; ensure the check
sits between the `Split String ${line}` and `Patch Deploy ImagePullPolicy To
IfNotPresent` steps so malformed or whitespace-only output won't cause an index
out of range.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e5261f99-a148-4e36-920d-9a99c17c331c
📒 Files selected for processing (1)
test/suites/standard2/validate-certificate-rotation.robot
|
CI failure in
Retriggering. |
|
/retest-required |
|
@copejon: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| ... Prevents image pull failures when the system clock is fast-forwarded past CDN | ||
| ... TLS certificate expiry dates. | ||
| VAR ${jq_filter}= | ||
| ... .items[] | .metadata.namespace as $ns | .metadata.name as $d | .spec.template.spec.containers | to_entries[] | select(.value.imagePullPolicy == "Always") | "\\($ns) \\($d) \\(.key)" |
There was a problem hiding this comment.
Do we need this check? What will happen if we patch ALL image policies?
| Run With Kubeconfig | ||
| ... oc patch deploy -n ${ns} ${deploy} --type=json -p '[{"op":"replace","path":"/spec/template/spec/containers/${idx}/imagePullPolicy","value":"IfNotPresent"}]' | ||
| Run With Kubeconfig | ||
| ... oc rollout status deploy/${deploy} -n ${ns} --timeout=120s |
There was a problem hiding this comment.
How much time is this taking? Standard suites are already close to the 30m runtime limit, so this may delay those significantly.
| Change System Date To | ||
| [Documentation] Move the system to a future date. | ||
| [Arguments] ${future_date} | ||
| Patch ImagePullPolicy For Date Change |
There was a problem hiding this comment.
I have a more high-level question. If we have certain images with pull policy of Always, this would break offline configurations, wouldn't it?
Because of that reason, should we not be patching those images as part of rebase?
|
Closing in favor of a more targeted solution #6667 /close |
|
@copejon: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary
imagePullPolicy: Alwaysfail image pulls when the cert rotation test fast-forwards the clock ~150 days past CDN TLS cert expiry, causingImagePullBackOffand greenboot timeout.IfNotPresent. Images are already cached from initial startup, so no pull is needed.Restore System Dateruns.Test plan
make verify-rfpasses clean (robocop check + format)Manual Rotation Of Etcd Signer Certs— PASS (37.7s)Certificate Rotation— PASS (98.8s)Summary by CodeRabbit