fix(backend): drain Maestro REST client connection pool on teardown#5673
fix(backend): drain Maestro REST client connection pool on teardown#5673rhamitarora wants to merge 2 commits into
Conversation
newRESTClient clones http.DefaultTransport, creating a per-client connection pool. The orphan/cleanup controllers rebuild Maestro clients every sync cycle and tore them down via context cancellation, which only stopped the gRPC client and never drained the REST transport's idle connections. This leaked TCP connections and file descriptors over time. Return the underlying *http.Client from newRESTClient and register CloseIdleConnections via context.AfterFunc so the existing cancellation-based teardown also drains the pool.
|
Hi @rhamitarora. Thanks for your PR. I'm waiting for a Azure member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
SubscriptionCollector.Run created a time.Ticker but never stopped it. The loop exits on ctx.Done(), leaving the ticker dangling. Add defer t.Stop() to release it on shutdown, matching the pattern used elsewhere in the codebase.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rhamitarora The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR focuses on preventing resource leaks by ensuring long-lived timers and HTTP connection pools are properly cleaned up when no longer needed.
Changes:
- Stop the
time.TickerinSubscriptionCollector.Runto avoid ticker leaks. - Update Maestro REST client construction to expose the underlying
*http.Client. - Drain the REST client’s idle HTTP connections on context cancellation to avoid leaking TCP connections/FDs across repeated client creation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/pkg/metrics/metrics.go | Stops the ticker to prevent resource leakage in a long-running loop. |
| backend/pkg/maestro/maestro_client.go | Returns underlying HTTP client and closes idle connections on ctx cancellation to prevent connection/FD leaks. |
| restClient, restHTTPClient := newRESTClient(maestroRESTAPIEndpoint) | ||
| // The REST client uses a cloned http.Transport, which owns its own | ||
| // connection pool. Callers tear the client down by cancelling ctx, so we | ||
| // drain the pool's idle connections on cancellation to avoid leaking TCP | ||
| // connections and file descriptors across repeated client creations. | ||
| context.AfterFunc(ctx, restHTTPClient.CloseIdleConnections) |
| // connection pool. Callers tear the client down by cancelling ctx, so we | ||
| // drain the pool's idle connections on cancellation to avoid leaking TCP | ||
| // connections and file descriptors across repeated client creations. | ||
| context.AfterFunc(ctx, restHTTPClient.CloseIdleConnections) |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
backend/pkg/maestro/maestro_client.go:84
- On the error path, the REST
http.Clienthas already been created (and theAfterFuncregistered), but the function returns immediately without actively draining idle connections. If callers don’t cancelctxonNewClientfailure, this can keep idle conns/Fds around longer than intended. Consider callingrestHTTPClient.CloseIdleConnections()before returning the error (in addition to the ctx-cancellation hook).
grpcClient, err := newGRPCSourceWorkClient(ctx, maestroGRPCAPIEndpoint, restClient, maestroSourceID)
if err != nil {
return nil, utils.TrackError(fmt.Errorf("failed to create maestro grpc source work client: %w", err))
}
| // connection pool. Callers tear the client down by cancelling ctx, so we | ||
| // drain the pool's idle connections on cancellation to avoid leaking TCP | ||
| // connections and file descriptors across repeated client creations. | ||
| context.AfterFunc(ctx, restHTTPClient.CloseIdleConnections) |
There was a problem hiding this comment.
Thanks for flagging this. context.AfterFunc was added in Go 1.21, and this module targets Go 1.25.7 (see go 1.25.7 in backend/go.mod), so it's well within the supported toolchain and compiles fine — no compile-time break. The repo doesn't support Go < 1.21, so the goroutine fallback isn't needed here. Keeping context.AfterFunc as the idiomatic approach.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
backend/pkg/maestro/maestro_client.go:84
- If
newGRPCSourceWorkClient(...)returns an error,NewClientreturns immediately, but therestHTTPClientidle pool will only be drained whenctxis canceled. If any REST calls were made during GRPC client setup (or if this pattern is copied elsewhere), this can leave idle connections open longer than necessary. Consider capturing the stop func returned bycontext.AfterFuncand, on the error path, stopping the after-func and explicitly callingrestHTTPClient.CloseIdleConnections()before returning.
restClient, restHTTPClient := newRESTClient(maestroRESTAPIEndpoint)
// The REST client uses a cloned http.Transport, which owns its own
// connection pool. Callers tear the client down by cancelling ctx, so we
// drain the pool's idle connections on cancellation to avoid leaking TCP
// connections and file descriptors across repeated client creations.
context.AfterFunc(ctx, restHTTPClient.CloseIdleConnections)
grpcClient, err := newGRPCSourceWorkClient(ctx, maestroGRPCAPIEndpoint, restClient, maestroSourceID)
if err != nil {
return nil, utils.TrackError(fmt.Errorf("failed to create maestro grpc source work client: %w", err))
}
newRESTClient clones http.DefaultTransport, creating a per-client connection pool. The orphan/cleanup controllers rebuild Maestro clients every sync cycle and tore them down via context cancellation, which only stopped the gRPC client and never drained the REST transport's idle connections. This leaked TCP connections and file descriptors over time.
Return the underlying *http.Client from newRESTClient and register CloseIdleConnections via context.AfterFunc so the existing cancellation-based teardown also drains the pool.
What
Why
Testing
Special notes for your reviewer
PR Checklist