OCPBUGS-86506: Fix lso_lvset_provisioned_PV_count metric#629
Conversation
Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
@tsmetana: This pull request references Jira Issue OCPBUGS-86506, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThe reconciler refactors how it counts already-provisioned symlinks and identifies stale entries. Instead of accumulating counts during the ChangesSymlink Count Consolidation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tsmetana The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@tsmetana: This pull request references Jira Issue OCPBUGS-86506, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@tsmetana: This pull request references Jira Issue OCPBUGS-86506, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/diskmaker/controllers/lvset/reconcile.go`:
- Around line 303-311: The call to getAlreadySymlinked can fail and currently we
log the error but still publish totalProvisionedPVs (which may be an invalid
zero); change the flow so that after calling getAlreadySymlinked you check err
and, if non-nil, log the error and skip updating the metric (do not call
localmetrics.SetLVSProvisionedPVMetric) or return/propagate the error
instead—update the block around getAlreadySymlinked/totalProvisionedPVs to only
call localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName,
totalProvisionedPVs) when err == nil.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6215edd6-8842-41fc-860e-971dd1bd77ba
📒 Files selected for processing (1)
pkg/diskmaker/controllers/lvset/reconcile.go
| totalProvisionedPVs, noMatch, err := getAlreadySymlinked(symLinkDir, blockDevices) | ||
| if err != nil { | ||
| klog.ErrorS(err, "error counting provisioned symlinks") | ||
| } | ||
|
|
||
| klog.InfoS("total devices provisioned", "storageClass", storageClassName, "count", totalProvisionedPVs) | ||
|
|
||
| // update metrics for total persistent volumes provisioned | ||
| localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName, totalProvisionedPVs) |
There was a problem hiding this comment.
Handle symlink-count failures before updating metrics.
On Line 303, getAlreadySymlinked errors are only logged, but Line 311 still publishes totalProvisionedPVs (which can be zero from the failure path). This can reintroduce false 0 values for lso_lvset_provisioned_PV_count.
Suggested fix
totalProvisionedPVs, noMatch, err := getAlreadySymlinked(symLinkDir, blockDevices)
if err != nil {
- klog.ErrorS(err, "error counting provisioned symlinks")
+ return ctrl.Result{}, fmt.Errorf("error counting provisioned symlinks: %w", err)
}
klog.InfoS("total devices provisioned", "storageClass", storageClassName, "count", totalProvisionedPVs)
// update metrics for total persistent volumes provisioned
localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName, totalProvisionedPVs)As per coding guidelines "Never ignore error returns".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| totalProvisionedPVs, noMatch, err := getAlreadySymlinked(symLinkDir, blockDevices) | |
| if err != nil { | |
| klog.ErrorS(err, "error counting provisioned symlinks") | |
| } | |
| klog.InfoS("total devices provisioned", "storageClass", storageClassName, "count", totalProvisionedPVs) | |
| // update metrics for total persistent volumes provisioned | |
| localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName, totalProvisionedPVs) | |
| totalProvisionedPVs, noMatch, err := getAlreadySymlinked(symLinkDir, blockDevices) | |
| if err != nil { | |
| return ctrl.Result{}, fmt.Errorf("error counting provisioned symlinks: %w", err) | |
| } | |
| klog.InfoS("total devices provisioned", "storageClass", storageClassName, "count", totalProvisionedPVs) | |
| // update metrics for total persistent volumes provisioned | |
| localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName, totalProvisionedPVs) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/diskmaker/controllers/lvset/reconcile.go` around lines 303 - 311, The
call to getAlreadySymlinked can fail and currently we log the error but still
publish totalProvisionedPVs (which may be an invalid zero); change the flow so
that after calling getAlreadySymlinked you check err and, if non-nil, log the
error and skip updating the metric (do not call
localmetrics.SetLVSProvisionedPVMetric) or return/propagate the error
instead—update the block around getAlreadySymlinked/totalProvisionedPVs to only
call localmetrics.SetLVSProvisionedPVMetric(nodeName, storageClassName,
totalProvisionedPVs) when err == nil.
There was a problem hiding this comment.
In case the the getAlreadySymlinked function returns an error, the value of the totalProvisionedPVs cannot be trusted anyway, just as the metric itself. In fact, setting it to zero would make any problem more visible and increase chance somebody would take a look into the problem.
There was a problem hiding this comment.
@tsmetana That's a fair point — a metric that drops to 0 is arguably a better observable signal than one that silently retains a stale value, and it's consistent with an "alert on anomaly" philosophy. The error is still surfaced via klog.ErrorS, so it isn't truly ignored. I accept the reasoning and withdraw the suggestion.
|
@tsmetana: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/lgtm |
The
lso_lvset_provisioned_PV_countPrometheus metric always reports 0 after the initial provisioning reconcile. The root cause is that the provisioned device count was only computed insideprocessNewSymlink(), which is never called once all devices already have symlinks.The fix is to just count all the symlinked devices.
Summary by CodeRabbit