OCPBUGS-88757: Align HCCO CatalogSource registryPoll interval with OCP defaults#8745

Draft
shijadha wants to merge 1 commit into
openshift:mainfrom
shijadha:fix-catalogsource-poll-interval

@shijadha

@shijadha shijadha commented Jun 16, 2026

Summary

Updates the HCCO-managed CatalogSource registryPoll.interval from 10m to 240m to align with standard OCP 4.18+ defaults established by operator-marketplace.

Context

In standard OCP clusters (4.18+), the operator-marketplace component sets the default registryPoll interval to 240 minutes (4 hours) to address performance issues including unbounded etcd growth and high I/O on control plane nodes (see operator-marketplace PR #695 and OCPBUGS-69441).

In HyperShift/HCP hosted clusters, the Hosted Cluster Config Operator (HCCO) manages CatalogSources independently, with its own reconciliation logic in catalogs.go, which still hardcodes the interval to 10 minutes. This creates an inconsistency between standard OCP and HCP behavior.

Changes

  • Updated RawInterval from "10m" to "240m" in catalogs.go
  • Updated Interval duration from 10 * time.Minute to 240 * time.Minute
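
Because the interval is written twice (once as a string, once as a duration), the two values have to agree. The following is a minimal sketch of that consistency check, not the code in catalogs.go: `registryPoll`, `newPoll`, and `consistent` are hypothetical stand-ins that use only the Go standard library, with a plain `time.Duration` in place of the `metav1.Duration` pointer.

```go
package main

import (
	"fmt"
	"time"
)

// registryPoll is a hypothetical stand-in for the OLM RegistryPoll type,
// reduced to the two fields this PR changes.
type registryPoll struct {
	RawInterval string        // string form, e.g. "240m"
	Interval    time.Duration // parsed form
}

// newPoll builds a registryPoll from a raw string and a number of minutes.
func newPoll(raw string, minutes int) registryPoll {
	return registryPoll{RawInterval: raw, Interval: time.Duration(minutes) * time.Minute}
}

// consistent reports whether RawInterval parses to the same duration as Interval.
func consistent(p registryPoll) (bool, error) {
	d, err := time.ParseDuration(p.RawInterval)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", p.RawInterval, err)
	}
	return d == p.Interval, nil
}

func main() {
	p := newPoll("240m", 240)
	ok, err := consistent(p)
	fmt.Println(ok, err, p.Interval) // prints: true <nil> 4h0m0s
}
```

Updating only one of the two values, for example leaving the string at "10m", would make `consistent` return false.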

Benefits

  • Reduced control plane resource consumption: Less CPU/memory/network for catalog polling
  • Lower etcd churn: Fewer updates to CatalogSource status means less etcd I/O and storage growth
  • Better scalability: Management clusters hosting many HCP clusters see 24x reduction in aggregate polling load (4 catalogs × 6 polls/hour → 4 catalogs × 0.25 polls/hour per hosted cluster)
  • Consistency with standard OCP: HCP clusters behave the same as OCP 4.18+ clusters
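
The 24x figure can be reproduced with a few lines of arithmetic. This sketch assumes one poll per catalog per interval and the four catalogs per hosted cluster cited above; `pollsPerHour` is a hypothetical helper, not part of HyperShift.

```go
package main

import "fmt"

// pollsPerHour returns the number of registry polls per hour for a hosted
// cluster with the given number of catalogs and polling interval in minutes.
// Assumes exactly one poll per catalog per interval.
func pollsPerHour(catalogs, intervalMinutes int) float64 {
	return float64(catalogs) * 60.0 / float64(intervalMinutes)
}

func main() {
	before := pollsPerHour(4, 10) // 4 catalogs x 6 polls/hour
	after := pollsPerHour(4, 240) // 4 catalogs x 0.25 polls/hour
	fmt.Printf("before=%.0f after=%.0f reduction=%.0fx\n", before, after, before/after)
	// prints: before=24 after=1 reduction=24x
}
```

The ratio depends only on the two intervals (240 / 10), so it stays the same for any number of hosted clusters on a management cluster.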

Trade-offs

  • Catalog update discovery latency: When a new operator catalog image is pushed to the registry, discovery time increases from up to 10 minutes to up to 4 hours
  • Mitigation: Users can manually trigger refresh by deleting the CatalogSource pod or updating the spec.image field
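
To put the latency trade-off in numbers, the sketch below computes the worst-case and mean wait before a newly pushed catalog image is noticed. It assumes the image is pushed at a uniformly random point within a polling interval and ignores the time the poll itself takes; both are simplifying assumptions, and `discoveryDelay` is a hypothetical helper.

```go
package main

import "fmt"

// discoveryDelay returns the worst-case and mean wait, in minutes, before the
// next poll notices a new image. Assumes the push time is uniformly
// distributed within the polling interval.
func discoveryDelay(intervalMinutes int) (worst, mean float64) {
	worst = float64(intervalMinutes)
	mean = worst / 2
	return worst, mean
}

func main() {
	for _, m := range []int{10, 240} {
		worst, mean := discoveryDelay(m)
		fmt.Printf("interval=%dm worst=%.0fm mean=%.0fm\n", m, worst, mean)
	}
	// prints:
	// interval=10m worst=10m mean=5m
	// interval=240m worst=240m mean=120m
}
```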

Related Issues

  • OCPBUGS-88757 - HyperShift HCCO hardcodes CatalogSource registryPoll interval to 10m instead of 240m
  • OCPBUGS-88758 - Audit HyperShift HCCO for divergences from standard OCP component defaults
  • OCPBUGS-69441 - Original issue for standard OCP (fixed in operator-marketplace)

Testing

  • Code change is straightforward and low-risk (2 lines)
  • No functional behavior change, only timing adjustment
  • Follows established pattern from operator-marketplace

Summary by CodeRabbit

  • Chores
    • Updated OLM catalog source polling interval from 10 minutes to 240 minutes (4 hours), reducing the frequency of automatic catalog checks.

Update HCCO-managed CatalogSource registryPoll interval from 10m to
240m to match standard OCP 4.18+ defaults set by operator-marketplace.

This change:
- Reduces control plane resource consumption from catalog polling
- Decreases etcd churn and storage growth
- Improves scalability for management clusters hosting many HCPs
- Aligns HyperShift behavior with upstream OCP (changed in OCPBUGS-69441)

Trade-off: Catalog update discovery latency increases from 10 minutes
to 4 hours. Users can manually trigger refresh by deleting the
CatalogSource pod if needed.

Signed-off-by: Shital Jadhav <[email protected]>
Commit-Message-Assisted-by: Claude (via Claude Code)
@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 16, 2026
@openshift-ci-robot

@shijadha: This pull request references Jira Issue OCPBUGS-88757, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Contributor
📝 Walkthrough

Walkthrough

The OLM catalog source registry polling interval in reconcileCatalogSource has been increased from 10 minutes to 240 minutes. Both the RawInterval string field and the corresponding metav1.Duration Interval field have been updated to reflect this new value.

Suggested Reviewers

  • sjenning
  • sdminonne
  • jparrill
🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically summarizes the main change: updating the HCCO CatalogSource registryPoll interval to align with OCP 4.18+ defaults.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | No Ginkgo test files were added or modified by this PR. The change only updates configuration values in catalogs.go, so the stable test names check is not applicable.
Test Structure And Quality | ✅ Passed | No Ginkgo test code was added in this PR. The custom check applies only when test code is present. The PR only modified source code (catalogs.go) with no accompanying test files.
Topology-Aware Scheduling Compatibility | ✅ Passed | Change updates only the CatalogSource registry polling interval from 10m to 240m with no scheduling constraints, affinity rules, node selectors, topology assumptions, or replica logic introduced.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No Ginkgo e2e tests are being added in this PR. The change is a configuration update (polling interval) in catalogs.go, not a test addition. Check is not applicable.
No-Weak-Crypto | ✅ Passed | No weak crypto, custom crypto implementations, or secret comparisons found. PR only changes polling interval configuration from 10m to 240m.
Container-Privileges | ✅ Passed | The PR changes a Go source file (catalogs.go) updating registry poll interval from 10m to 240m, with no changes to container privilege settings. SecurityContextConfig is set to "Restricted" (a secu...
No-Sensitive-Data-In-Logs | ✅ Passed | No logging statements present in the changed file. PR only updates CatalogSource poll interval constants (10m→240m), containing no sensitive data, credentials, or logging calls.

@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Jun 16, 2026
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shijadha
Once this PR has been reviewed and has the lgtm label, please assign jparrill for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 16, 2026
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Contributor

Hi @shijadha. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go`:
- Around line 44-45: Add a unit test to verify the 240-minute polling interval
configuration change for the CatalogSource. Create a test function that
instantiates a CatalogSource, calls the function that modifies it (likely
ReconcileRedHatOperatorsCatalogSource based on the context), and then asserts
that both the RawInterval field equals "240m" and the Interval.Duration field
equals 240*time.Minute as shown in the provided example test structure. This
test should be placed in the appropriate test file for the catalogs.go module.
- Around line 44-45: The polling interval changes for the four default catalog
sources (certified, community, marketplace, red-hat-operators) in the
catalogs.go file lack accompanying unit tests. Add unit tests that verify the
RawInterval and Interval fields are correctly configured for each of the four
default catalog sources with the 240-minute polling interval. The tests should
validate that the interval values are properly set and match across both the
string representation and the metav1.Duration object for each catalog source.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7d0e7fc3-d98e-44a5-ab91-4db377db2305

📥 Commits

Reviewing files that changed from the base of the PR and between 392fd5a and d83ffee.

📒 Files selected for processing (1)
  • control-plane-operator/hostedclusterconfigoperator/controllers/resources/olm/catalogs.go

Comment on lines +44 to +45
RawInterval: "240m",
Interval: &metav1.Duration{Duration: 240 * time.Minute},

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add unit tests for the interval change.

The coding guidelines require unit tests for code changes in control-plane-operator/**/*.go. While this is a configuration change, it significantly alters the polling behavior (from 10 minutes to 240 minutes). Consider adding a unit test that verifies both RawInterval and Interval fields are set to the expected 240-minute values in the reconciled CatalogSource.

🧪 Example test structure
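func TestReconcileCatalogSourceRegistryPollInterval(t *testing.T) {
	// Start from an empty CatalogSource so every asserted field is set by the reconcile call.
	cs := &operatorsv1alpha1.CatalogSource{}
	params := &OperatorLifecycleManagerParams{
		RedHatOperatorsImage: "test-image",
		OLMCatalogPlacement:  hyperv1.GuestOLMCatalogPlacement,
	}

	ReconcileRedHatOperatorsCatalogSource(cs, params)

	// Guard each pointer before dereferencing it, so a missing field fails the test instead of panicking.
	require.NotNil(t, cs.Spec.UpdateStrategy)
	require.NotNil(t, cs.Spec.UpdateStrategy.RegistryPoll)
	require.NotNil(t, cs.Spec.UpdateStrategy.RegistryPoll.Interval)

	// The string form and the parsed form must both reflect the 240-minute interval.
	assert.Equal(t, "240m", cs.Spec.UpdateStrategy.RegistryPoll.RawInterval)
	assert.Equal(t, 240*time.Minute, cs.Spec.UpdateStrategy.RegistryPoll.Interval.Duration)
}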

Source: Coding guidelines


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

operator-framework api RegistryPoll RawInterval field format requirements version 0.37.0

💡 Result:

In the Operator Lifecycle Manager (OLM) API, the registry poll interval for a CatalogSource is configured via the interval field within the registryPoll updateStrategy [1][2]. This field expects a duration string format [3], such as "45m" or "10m" [1][2].

Regarding the RawInterval field:

  1. Technical Role: The RawInterval field was introduced as an internal mechanism to handle the parsing of this interval [4][5]. It serves as a raw string representation of the duration before it is unmarshalled into the structured interval used by the OLM controller [4].
  2. Usage: This field is generally not intended for manual configuration by end-users in their YAML manifests [1][2]. Users should continue to use the interval field, which accepts standard duration strings [1][2].
  3. Version 0.37.0 context: The version 0.37.0 release of the operator-framework/api module refers to updates within the OLM API and associated ecosystem [6][7]. While this version includes updates for new fields (such as a new release field for CSVs) [6][7], the fundamental format requirements for catalog polling intervals remain based on standard duration parsing (e.g., Go's time.ParseDuration).

In summary, users should specify the polling frequency using the interval field in their CatalogSource YAML using standard duration notation (e.g., 15m, 1h) [1][2]. The RawInterval field is an internal implementation detail and is not required or recommended for end-user manifests [4][5].

🏁 Script executed:

#!/bin/bash
# Description: Search for RegistryPoll usage patterns and validation in the codebase

# Look for existing RegistryPoll configurations to confirm string format patterns
rg -n -C2 'RawInterval.*".*m"' --type=go

# Look for any validation or parsing of RawInterval
rg -n -C3 'RawInterval' --type=go -g '!vendor/' | head -50

Repository: openshift/hypershift

Length of output: 1592


Add unit tests for the catalog source polling interval changes.

The RawInterval string format "240m" is valid for Go duration parsing and is correct. However, this change modifies the polling behavior across all four default catalog sources (certified, community, marketplace, red-hat-operators) without accompanying unit tests. Control-plane-operator changes require unit tests per coding guidelines.
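
One way to cover all four catalog sources is a single loop over their names. The sketch below shows only the shape of such a check using the standard library: `registryPoll`, `reconcilePoll`, and `verifyCatalogs` are hypothetical stand-ins, and a real test would call the per-catalog reconcile functions in catalogs.go instead of `reconcilePoll`.

```go
package main

import (
	"fmt"
	"time"
)

// registryPoll is a hypothetical stand-in for the OLM RegistryPoll type.
type registryPoll struct {
	RawInterval string
	Interval    time.Duration
}

// reconcilePoll is a hypothetical stand-in for the per-catalog reconcile
// functions; it returns the interval settings applied to the named catalog.
func reconcilePoll(catalog string) registryPoll {
	return registryPoll{RawInterval: "240m", Interval: 240 * time.Minute}
}

// verifyCatalogs returns the catalogs whose string and parsed intervals
// are not both equal to 240 minutes.
func verifyCatalogs(catalogs []string) []string {
	var failed []string
	for _, name := range catalogs {
		p := reconcilePoll(name)
		parsed, err := time.ParseDuration(p.RawInterval)
		if err != nil || parsed != p.Interval || p.Interval != 240*time.Minute {
			failed = append(failed, name)
		}
	}
	return failed
}

func main() {
	catalogs := []string{"certified", "community", "marketplace", "red-hat-operators"}
	fmt.Println(len(verifyCatalogs(catalogs))) // prints: 0
}
```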


Labels

  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
