feat(controllers): skip reconciliation for terminals on hibernated shoots #512

Draft
petersutter wants to merge 3 commits into master from feat/skip-hibernated-shoots

Conversation

@petersutter petersutter commented Jun 26, 2026

Member

/area robustness
/kind enhancement

What this PR does / why we need it:

When a Shoot referenced by a Terminal's host or target credentials is hibernated, the terminal controller now skips reconciliation and emits a Hibernated event. A Shoot watch with a wake-up predicate (IsHibernated: true → false) automatically re-enqueues affected Terminals when the Shoot comes back.
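
The wake-up condition described above can be sketched in isolation. This is a minimal illustration, not the PR's code: `Shoot` is a hypothetical, pared-down stand-in for `gardencorev1beta1.Shoot`, and `isWakeUp` is an assumed helper name. In the controller, the same comparison would presumably sit inside the update handler of a controller-runtime predicate.

```go
package main

// Shoot is a hypothetical stand-in for gardencorev1beta1.Shoot.
// Only the hibernation flag matters for this sketch.
type Shoot struct {
	Name         string
	IsHibernated bool
}

// isWakeUp reports whether an update moved a Shoot from hibernated to awake.
// Every other combination (still asleep, still awake, going to sleep) is
// ignored, so only a genuine wake-up re-enqueues Terminals.
func isWakeUp(oldShoot, newShoot *Shoot) bool {
	if oldShoot == nil || newShoot == nil {
		return false
	}
	return oldShoot.IsHibernated && !newShoot.IsHibernated
}
```

Filtering on the transition rather than on the current state keeps ordinary Shoot updates, which are frequent, from waking the Terminal queue.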

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Terminal reconciliation is now skipped while a referenced Shoot is hibernated, reducing unnecessary errors and retries. Terminals are automatically re-reconciled when the Shoot wakes up.

Summary by CodeRabbit

  • New Features

    • Terminal reconciliation now recognizes hibernated shoots and pauses with a clear “hibernated” state until the shoot wakes up.
    • Terminals are automatically re-queued when a referenced shoot becomes active again.
  • Bug Fixes

    • Improved status updates during terminal processing, including clearer success, error, and waiting states.
    • Fixed cleanup handling so terminal deletion behaves correctly when a referenced shoot is hibernated.
    • Improved controller efficiency and startup reliability.

Replaces #511 (renamed branch due to cc-utils 50-char ref-length limit blocking image push).

@gardener-prow gardener-prow Bot added area/robustness Robustness, reliability, resilience related kind/enhancement Enhancement, improvement, extension labels Jun 26, 2026
gardener-prow Bot commented Jun 26, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign petersutter for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow Bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label Jun 26, 2026
coderabbitai Bot commented Jun 26, 2026


Walkthrough

Adds hibernation-aware Terminal reconciliation. The controller now indexes Terminals by referenced Shoots, watches Shoot hibernation wake-up transitions, applies Shoot cache transforms and RBAC updates, and changes reconcile and delete status handling to wait for hibernated Shoots.

Changes

Terminal hibernation reconciliation

| Layer / File(s) | Summary |
|---|---|
| Event and shoot index: api/v1alpha1/terminal_types.go, controllers/terminal_controller.go, controllers/terminal_hibernation_test.go | Adds EventHibernated, the TerminalShootRef index key/indexer, and test helpers plus unit tests for indexing terminals by referenced Shoots. |
| Manager and test wiring: main.go, controllers/controller_suite_test.go, test/common.go, charts/terminal/charts/application/templates/controller-manager/clusterrole.yaml | Configures Shoot cache transforms, registers the Terminal shoot-reference index at startup, adds the Shoot RBAC rule, and mirrors the index setup in controller and envtest setup. |
| Shoot wake-up watch: controllers/terminal_controller.go, controllers/terminal_hibernation_test.go | Adds the Shoot update watch that filters hibernated-to-awake transitions and maps matching Shoots to Terminal reconcile requests. |
| Hibernation-aware reconcile: controllers/terminal_controller.go, controllers/terminal_hibernation_test.go | Checks referenced Shoots while building client sets, patches Terminal status for reconcile success and error paths, and adds tests for hibernated client construction, reconcile blocking, and wake-up progression. |
| Hibernated delete flow: controllers/terminal_controller.go, controllers/terminal_hibernation_test.go | Handles hibernated referenced Shoots during deletion by recording EventHibernated, keeping delete operations processing, requeueing after the hibernation delay, and fixing external-dependency error aggregation, with deletion tests. |

Sequence Diagram(s)

Wake-up routing

sequenceDiagram
  participant Shoot as gardencorev1beta1.Shoot
  participant Predicate as shootWakeUpPredicate
  participant Mapper as mapShootToTerminals
  participant Index as TerminalShootRef
  participant Controller as TerminalController
  Shoot->>Predicate: update event with IsHibernated true→false
  Predicate-->>Mapper: accept update
  Mapper->>Index: list Terminal objects by Shoot reference
  Index-->>Mapper: matching Terminal keys
  Mapper-->>Controller: enqueue reconcile requests
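
The routing in this diagram amounts to an inverted index from Shoot keys to the Terminals that reference them. The sketch below shows that idea with plain maps. `Terminal`, `shootIndex`, and `buildShootIndex` are hypothetical stand-ins, and `mapShootToTerminals` here only borrows the name from the diagram. The actual controller presumably relies on the manager's field indexer instead of a hand-built map.

```go
package main

import "sort"

// Terminal is a hypothetical stand-in for the Terminal resource.
// ShootRefs holds "namespace/name" keys of the Shoots referenced by the
// host and target credentials.
type Terminal struct {
	Namespace string
	Name      string
	ShootRefs []string
}

// shootIndex maps a Shoot key to the keys of the Terminals referencing it.
type shootIndex map[string][]string

// buildShootIndex indexes every Terminal under each distinct Shoot it
// references, so a Terminal whose host and target point at the same Shoot
// is listed only once.
func buildShootIndex(terminals []Terminal) shootIndex {
	idx := shootIndex{}
	for _, t := range terminals {
		seen := map[string]bool{}
		for _, ref := range t.ShootRefs {
			if ref == "" || seen[ref] {
				continue
			}
			seen[ref] = true
			idx[ref] = append(idx[ref], t.Namespace+"/"+t.Name)
		}
	}
	return idx
}

// mapShootToTerminals returns the sorted keys of all Terminals that should
// be re-enqueued when the given Shoot wakes up.
func mapShootToTerminals(idx shootIndex, shootKey string) []string {
	keys := append([]string(nil), idx[shootKey]...)
	sort.Strings(keys)
	return keys
}
```

With an index like this, a wake-up event costs one lookup instead of a scan over every Terminal.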

Hibernated delete flow

sequenceDiagram
  participant DeleteTerminal as deleteTerminal
  participant ClientSet as newClientSetFromClusterCredentials
  participant HibernationCheck as isShootRefHibernated
  DeleteTerminal->>ClientSet: create host and target clients
  ClientSet->>HibernationCheck: read referenced Shoot status
  HibernationCheck-->>ClientSet: errShootHibernated when IsHibernated=true
  ClientSet-->>DeleteTerminal: hibernation error
  DeleteTerminal->>DeleteTerminal: patchTerminalOperationProcessing
  DeleteTerminal->>DeleteTerminal: requeue after shootHibernationRequeueAfter
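
One way to read this delete flow is as a decision over the two client-set errors. The sketch below is an assumption-laden illustration, not the PR's implementation. `planDeleteRequeue` and `errUnreachable` are hypothetical names. `errShootHibernated` and `shootHibernationRequeueAfter` are named in the diagram, and the one-hour value follows the commit message. Giving genuine errors priority over the hibernation wait is a choice made for this sketch only.

```go
package main

import (
	"errors"
	"time"
)

// errShootHibernated mirrors the sentinel named in the diagram; the message
// text is an assumption.
var errShootHibernated = errors.New("referenced shoot is hibernated")

// errUnreachable is a hypothetical non-hibernation failure used for illustration.
var errUnreachable = errors.New("cluster unreachable")

// shootHibernationRequeueAfter uses the 1h delay stated in the commit message.
const shootHibernationRequeueAfter = time.Hour

// planDeleteRequeue decides how a deletion attempt should continue, given
// the errors from building the host and target client sets.
//   - any genuine error: return it (aggregated) so the caller can retry
//   - only hibernation errors: no error, requeue after the hibernation delay
//   - no errors: nothing to wait for
func planDeleteRequeue(hostErr, targetErr error) (time.Duration, error) {
	hibernated := false
	var errs []error
	for _, e := range []error{hostErr, targetErr} {
		switch {
		case e == nil:
			// this cluster is reachable
		case errors.Is(e, errShootHibernated):
			hibernated = true
		default:
			errs = append(errs, e)
		}
	}
	if len(errs) > 0 {
		return 0, errors.Join(errs...)
	}
	if hibernated {
		return shootHibernationRequeueAfter, nil
	}
	return 0, nil
}
```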

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through Shoots both snoozy and bright,
Then woke them with a controller delight.
Indexes jingled, and caches went clear,
The Terminal thumped: “I’m hibernating here!”
Now wakes and deletes do a softer ballet,
With carrot crumbs dancing the dawn-hop way.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly matches the main change: terminals now skip reconciliation when a referenced Shoot is hibernated. |
| Description check | ✅ Passed | The description follows the template and covers categorization, purpose, issue, notes, and release note, though the issue reference is left blank. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gardener-prow gardener-prow Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
controllers/controller_suite_test.go (1)

112-114: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Mirror the production Shoot cache transform in envtest.

This updates the field index, but the test manager still uses the default Shoot cache while main.go now strips cached Shoots down to metadata plus status.isHibernated. Without the same transform in tests, accidental reads of stripped Shoot fields will pass here and only fail in production.
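
To make the risk concrete, here is a rough sketch of what such a stripping transform could look like. All types are hypothetical stand-ins for the Gardener API types, and `stripShoot` is an assumed name. The `func(any) (any, error)` shape matches client-go's cache transform signature, but the fields kept by the real transform in main.go may differ.

```go
package main

// ObjectMeta, ShootSpec, ShootStatus and Shoot are hypothetical stand-ins
// for the real API types, reduced to a few fields for illustration.
type ObjectMeta struct {
	Namespace string
	Name      string
}

type ShootSpec struct {
	Region string
}

type ShootStatus struct {
	IsHibernated bool
	TechnicalID  string
}

type Shoot struct {
	ObjectMeta
	Spec   ShootSpec
	Status ShootStatus
}

// stripShoot keeps only metadata and status.isHibernated before an object
// is stored in the cache. Objects of any other type pass through unchanged.
func stripShoot(obj any) (any, error) {
	shoot, ok := obj.(*Shoot)
	if !ok {
		return obj, nil
	}
	return &Shoot{
		ObjectMeta: shoot.ObjectMeta,
		Status:     ShootStatus{IsHibernated: shoot.Status.IsHibernated},
	}, nil
}
```

A test that reads `Spec.Region` from a cached Shoot would see the real value under the default cache and an empty string once the transform is applied, which is the mismatch this comment warns about.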

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/controller_suite_test.go` around lines 112 - 114, The envtest
setup in controller_suite_test.go only registers the Terminal field index and
still relies on the default Shoot cache behavior, so it should mirror the
production cache transform used in main.go. Update the test manager setup to
apply the same Shoot cache transform that strips Shoots down to metadata plus
status.isHibernated, using the relevant cache/config setup alongside
GetFieldIndexer().IndexField so tests catch accidental reads of removed Shoot
fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/terminal_controller.go`:
- Around line 303-313: The hibernation wait path in the terminal reconciliation
logic is being treated as a successful reconcile, which causes the status update
to be marked succeeded even when work was skipped. Update the flow in the
Terminal controller’s reconcile path around isAnyReferencedShootHibernated so
that the hibernated case returns a non-successful/skip result or otherwise
bypasses the success patch, while still recording the event and log. Ensure the
unconditional success status update does not run for the hibernated wait branch.
- Around line 116-120: The new Shoot watch in TerminalController needs RBAC for
reading shoots, but the ClusterRole currently only covers projects and
shoots/adminkubeconfig. Update the controller-manager ClusterRole to grant list
and watch on core.gardener.cloud/shoots so the watch in Watches(...,
handler.EnqueueRequestsFromMapFunc(r.mapShootToTerminals),
shootWakeUpPredicate()) can start successfully; also verify whether the shoot
lookup used by TerminalController methods such as mapShootToTerminals or any
direct Shoot fetches needs get permission and add it if required.

In `@controllers/terminal_hibernation_test.go`:
- Around line 833-850: The hibernation test is bypassing the actual Shoot
wake-up path and can pass even if the Shoot watch wiring is broken. Update the
test around the Terminal reconciling logic to trigger wake-up by changing the
Shoot state only, then assert the Terminal is reconciled via the
Shoot-to-Terminal mapping path instead of mutating a Terminal annotation. Use
the existing `IsHibernated` patch flow and the relevant
`mapShootToTerminals`/watch-based reconciliation behavior to verify the wake-up
contract directly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df3fa6a1-acab-4193-8243-9710f5acf92c

📥 Commits

Reviewing files that changed from the base of the PR and between 3c76f00 and 67f2054.

📒 Files selected for processing (6)
  • api/v1alpha1/terminal_types.go
  • controllers/controller_suite_test.go
  • controllers/terminal_controller.go
  • controllers/terminal_hibernation_test.go
  • main.go
  • test/common.go

@petersutter petersutter marked this pull request as draft June 26, 2026 08:32
@gardener-prow gardener-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 26, 2026
@petersutter petersutter force-pushed the feat/skip-hibernated-shoots branch from e2f223d to 52d5907 Compare June 26, 2026 16:35
@gardener-prow gardener-prow Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 26, 2026
…oots

Terminals referencing a hibernated shoot previously failed with
credential errors since the shoot API server is down. The controller
now checks shoot hibernation status before building cluster clients
and short-circuits with a dedicated error.

- Watch Shoots via a wake-up predicate (IsHibernated true→false) and
  re-enqueue referencing Terminals through a field index
- During deletion, clean up available clusters and defer hibernated
  cluster cleanup with a 1h requeue
- Strip cached Shoots to metadata + hibernation status via TransformFunc
@petersutter petersutter force-pushed the feat/skip-hibernated-shoots branch from 52d5907 to 8fc1a62 Compare June 26, 2026 16:42
@petersutter

Member Author

@coderabbitai review

coderabbitai Bot commented Jun 26, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/terminal_controller.go`:
- Around line 304-305: Treat errShootHibernated as a wait-state outcome in
handleRequest and the reconcile path instead of letting it fall through the
generic error handler. Update the TerminalController flow around
newClientSetFromClusterCredentials, handleRequest, and the Hibernated event
handling so hibernated Shoots set the Terminal status appropriately, emit the
promised Hibernated event, and return without surfacing a reconcile error or
triggering controller-runtime retries. Keep the existing errShootHibernated
signal from the host/target client set helpers, but intercept it before the
normal error path.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 221973b0-0ef4-4e12-aa6d-4c03c18ad4a1

📥 Commits

Reviewing files that changed from the base of the PR and between 67f2054 and 8fc1a62.

📒 Files selected for processing (7)
  • api/v1alpha1/terminal_types.go
  • charts/terminal/charts/application/templates/controller-manager/clusterrole.yaml
  • controllers/controller_suite_test.go
  • controllers/terminal_controller.go
  • controllers/terminal_hibernation_test.go
  • main.go
  • test/common.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/common.go
  • controllers/controller_suite_test.go
  • api/v1alpha1/terminal_types.go
  • main.go

Comment on lines +304 to +305
hostClientSet, hostClientSetErr := r.newClientSetFromClusterCredentials(ctx, gardenClientSet, t.Spec.Host.Credentials, cfg.HonourServiceAccountRefHostCluster, cfg.Controllers.Terminal.TokenRequestExpirationSeconds)
targetClientSet, targetClientSetErr := r.newClientSetFromClusterCredentials(ctx, gardenClientSet, t.Spec.Target.Credentials, cfg.HonourServiceAccountRefTargetCluster, cfg.Controllers.Terminal.TokenRequestExpirationSeconds)

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Treat hibernated Shoots as a wait state instead of a reconcile error.

These new helpers return errShootHibernated, but the normal reconcile path still bubbles that up through handleRequest's generic error handling. In practice that marks the Terminal as errored, skips the promised Hibernated event on reconcile, and keeps controller-runtime retrying while the Shoot sleeps—exactly the retry pattern this PR is trying to avoid.

Suggested direction
 hostClientSet, hostClientSetErr := r.newClientSetFromClusterCredentials(...)
 targetClientSet, targetClientSetErr := r.newClientSetFromClusterCredentials(...)

+hostHibernated := isShootHibernatedError(hostClientSetErr)
+targetHibernated := isShootHibernatedError(targetClientSetErr)
+
+if hostHibernated || targetHibernated {
+	r.recordEventAndLog(ctx, t, corev1.EventTypeNormal, extensionsv1alpha1.EventHibernated,
+		"Referenced shoot is hibernated, waiting for wake-up before reconciling")
+	return ctrl.Result{}, r.patchTerminalOperationProcessing(
+		ctx,
+		t,
+		extensionsv1alpha1.LastOperationTypeReconcile,
+		"Terminal reconciliation is waiting for a referenced shoot to wake from hibernation.",
+	)
+}
+
 if hostClientSetErr != nil {
   ...
 }

Also applies to: 501-516
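
The suggestion relies on an `isShootHibernatedError` helper that is not shown. A minimal sketch, assuming the client-set helpers wrap a sentinel error with `%w`, could look like the following. `checkShootRef` and `reconcileOutcome` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errShootHibernated is the sentinel named in the review comment; the
// message text is an assumption.
var errShootHibernated = errors.New("referenced shoot is hibernated")

// isShootHibernatedError reports whether err is, or wraps, errShootHibernated.
func isShootHibernatedError(err error) bool {
	return errors.Is(err, errShootHibernated)
}

// checkShootRef is a hypothetical stand-in for the hibernation check done
// while building a client set. It wraps the sentinel with context so that
// callers can still detect it via errors.Is.
func checkShootRef(shootKey string, hibernated bool) error {
	if hibernated {
		return fmt.Errorf("shoot %s: %w", shootKey, errShootHibernated)
	}
	return nil
}

// reconcileOutcome intercepts hibernation before the generic error path:
// wait is true when either cluster's Shoot is hibernated, in which case no
// error is returned and no retry is triggered. Otherwise any remaining
// errors are joined and returned (nil when both are nil).
func reconcileOutcome(hostErr, targetErr error) (wait bool, err error) {
	if isShootHibernatedError(hostErr) || isShootHibernatedError(targetErr) {
		return true, nil
	}
	return false, errors.Join(hostErr, targetErr)
}
```

Matching with `errors.Is` instead of `==` means the check keeps working when the helpers add context to the error.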

Labels

area/robustness Robustness, reliability, resilience related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/enhancement Enhancement, improvement, extension size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
