feat(metrics): Add CLF Ready condition alert#3291

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from vparfonov:log7719
Jun 9, 2026

Conversation

@vparfonov vparfonov commented May 21, 2026

Contributor

Description

Add ClusterLogForwarderNotReady alert that fires when a CLF has been in a not ready state for more than 1 minute.

Depends on LOG-7718 which adds the log_forwarder_ready metric.

/cc
/assign

Links

Summary by CodeRabbit

  • New Features

    • Added a monitoring alert "ClusterLogForwarderNotReady" that fires after 1 minute when the cluster log forwarder is not ready; it raises an error-level notification and tags the event with service=clusterlogforwarder.
  • Documentation

    • Updated Collector alerts docs with the new alert, guidance on likely causes, and links to the runbook for troubleshooting.
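The summary states that the alert is tagged with service=clusterlogforwarder and carries an error severity. As a hedged illustration that is not part of this PR, an Alertmanager route could match on those two labels to send the alert to a dedicated receiver. The receiver names `default` and `logging-team` are hypothetical placeholders, and the `matchers` syntax assumes Alertmanager 0.22 or later.

```yaml
# Hypothetical Alertmanager routing fragment; not part of this PR.
# Receiver names are placeholders chosen for illustration.
route:
  receiver: default            # catch-all for every other alert
  routes:
    # Matches the labels that ClusterLogForwarderNotReady attaches.
    - receiver: logging-team
      matchers:
        - service="clusterlogforwarder"
        - severity="error"
receivers:
  # Receivers with no integrations discard notifications; a real
  # setup would add e.g. email_configs or webhook_configs here.
  - name: default
  - name: logging-team
```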

coderabbitai Bot commented May 21, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e07d0a7-22e6-4323-813e-9b679295d8b4

📥 Commits

Reviewing files that changed from the base of the PR and between 5d6fecb and d4ef9f6.

📒 Files selected for processing (3)
  • bundle/manifests/collector_monitoring.coreos.com_v1_prometheusrule.yaml
  • config/prometheus/collector_alerts.yaml
  • docs/administration/collector-metrics-and-alerts.adoc
✅ Files skipped from review due to trivial changes (1)
  • docs/administration/collector-metrics-and-alerts.adoc

📝 Walkthrough

Adds a new Prometheus alert rule ClusterLogForwarderNotReady (fires when log_forwarder_ready{status="False"} == 1 for 1 minute) to both the bundled PrometheusRule manifest and the source Prometheus config, and documents the alert in the collector metrics documentation.

Changes

Cluster Log Forwarder Alert Rule

| Layer / File(s) | Summary |
| --- | --- |
| ClusterLogForwarderNotReady alert definition and docs: bundle/manifests/collector_monitoring.coreos.com_v1_prometheusrule.yaml, config/prometheus/collector_alerts.yaml, docs/administration/collector-metrics-and-alerts.adoc | Adds the ClusterLogForwarderNotReady alert (expression: log_forwarder_ready{status="False"} == 1, for: 1m), annotations referencing resource_namespace/resource_name and a runbook_url, and labels service: clusterlogforwarder, severity: error; documentation describes the firing condition and guidance to check status conditions. |
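The walkthrough and the changes table describe the rule only in prose. The fragment below is a hedged reconstruction of what such a PrometheusRule could look like, assembled from the expression, `for` duration, labels, and runbook URL quoted on this page. The `metadata.name`, the group name, and the exact `summary` wording are assumptions, not the merged content of config/prometheus/collector_alerts.yaml.

```yaml
# Hedged reconstruction from the PR walkthrough, not the merged file.
# metadata.name, the group name, and the summary wording are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: collector                               # hypothetical
spec:
  groups:
    - name: logging_clusterlogforwarder.alerts  # hypothetical group name
      rules:
        - alert: ClusterLogForwarderNotReady
          # log_forwarder_ready is added by LOG-7718; this selects the
          # series reporting the CLF Ready condition as "False".
          expr: log_forwarder_ready{status="False"} == 1
          # The expression must hold continuously for 1m before firing.
          for: 1m
          labels:
            service: clusterlogforwarder
            severity: error
          annotations:
            runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-logging-operator/ClusterLogForwarderNotReady.md
            # Assumed wording, adapted from the snippet quoted in review.
            summary: |-
              The ClusterLogForwarder {{ $labels.resource_namespace }}/{{ $labels.resource_name }} has been in a not ready state
              for more than 1 minute.
```

Because `for: 1m` is set, Prometheus first marks the alert as pending when the expression becomes true and moves it to firing only if the expression stays true on every evaluation for at least one minute; a CLF that becomes Ready again within that window never fires.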

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble logs beneath the moonlit tree,
A tiny rule to keep the cluster free,
"Not ready" I sniff for sixty seconds long,
Then trumpet softly a red-alert song,
Hop, check the status — the forwarder’s plea. 🐇📣

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description addresses the mandatory sections: it explains the intent (adding ClusterLogForwarderNotReady alert when CLF is not ready for >1 minute), references the dependent JIRA issue (LOG-7718), and provides links to dependent PRs and the related JIRA ticket (LOG-7719). However, the /cc and /assign placeholders remain unfilled despite being marked mandatory. | Complete the mandatory /cc and /assign fields by specifying actual reviewers and approvers from the OWNERS file, or clarify if placeholder comments are acceptable in your workflow. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(metrics): Add CLF Ready condition alert' directly and clearly summarizes the main change: adding a new alert for the ClusterLogForwarder Ready condition across configuration, manifests, and documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@openshift-ci openshift-ci Bot requested review from cahartma and jcantrill May 21, 2026 11:06
@vparfonov

Contributor Author

/assign @jcantrill

@jcantrill

Contributor

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 21, 2026

@jcantrill jcantrill left a comment

Contributor

Consider doing a quick search of our documentation here to see if there is an update to be made.

runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-logging-operator/ClusterLogForwarderNotReady.md
summary: |-
The ClusterLogForwarder {{ $labels.resource_namespace }}/{{ $labels.resource_name }} has been in a not ready state
for more than 5 minutes. This could indicate a validation error in the ClusterLogForwarder spec or a deployment

Contributor

Can we really state that there is a deployment error of the collector? We don't check the actual deployment of the pods, so we cannot surface this information.

@r2d2rnd I was actually wondering whether we may be giving a false sense of security with regard to "Ready". This feature was created mainly to identify CLF validation errors that occur after admission of the CLF, where an admin does not get immediate feedback. Do we need to reframe or rename this alert to clarify what it will actually expose?

Contributor Author

Dropped "deployment error"

Comment thread bundle/manifests/collector_monitoring.coreos.com_v1_prometheusrule.yaml Outdated
Add ClusterLogForwarderNotReady alert that fires when a CLF has been
in a not ready state for more than 1 minute. This typically indicates
a validation error in the ClusterLogForwarder spec. Severity is error
and includes a runbook URL for mitigation guidance.

Depends on LOG-7718 which adds the log_forwarder_ready metric.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@vparfonov

Contributor Author

/test e2e-target

@jcantrill

Contributor

/approve

openshift-ci Bot commented Jun 8, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, vparfonov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2026
@jcantrill

Contributor

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 8, 2026
@jcantrill

Contributor

/label verified by @jcantrill

@jcantrill

Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2026
@jcantrill

Contributor

/verified by @jcantrill

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 8, 2026
@openshift-ci-robot


@jcantrill: This PR has been marked as verified by @jcantrill.

Details

In response to this:

/verified by @jcantrill

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD 5abb018 and 2 for PR HEAD d4ef9f6 in total

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD f4e6274 and 1 for PR HEAD d4ef9f6 in total

openshift-ci Bot commented Jun 9, 2026

Contributor

@vparfonov: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit b4d7692 into openshift:master Jun 9, 2026
8 checks passed
@vparfonov vparfonov deleted the log7719 branch June 9, 2026 08:50

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release/6.6 verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants