fix(mcp): keep strong reference to episode queue worker task to prevent silent GC #1575
Open
rneissl wants to merge 1 commit into
asyncio.create_task() results must be referenced or the event loop's weak reference allows the GC to collect the worker mid-execution. Under streamable-http transport this manifests as add_memory queueing episodes that are never processed, with no error logged. Store the task in _worker_tasks and drop the reference when the worker exits.
PR: Fix queue worker garbage-collection under streamable-http transport
What
Stores a strong reference to the asyncio.Task created by QueueService.add_episode_task so the garbage collector cannot reclaim the worker mid-execution.
Why
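Per the Python asyncio docs for asyncio.create_task(), the event loop keeps only weak references to tasks, so a task that is not referenced elsewhere may be garbage-collected at any time, even before it is done. The docs therefore advise saving a reference to the result.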
Without anchoring, under high-GC-pressure conditions (streamable-http request handling) the worker task can be collected before its first await self._episode_queues[group_id].get() suspends it. Result: add_memory queues episodes but no worker ever processes them. The failure is silent, with no error logs.
Reported in #1574. Fixes #1574.
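The documented weak-reference behavior can be illustrated in isolation. The following is a minimal sketch, not a reproduction of the failure reported in #1574: wait_forever and is_collected are hypothetical names, and the sketch assumes CPython, where gc.collect() reclaims an unreferenced pending task whose only links are the reference cycle between the task and the future it awaits.

```python
import asyncio
import gc
import weakref


async def wait_forever() -> None:
    # Suspend on a future that nothing outside this coroutine references.
    await asyncio.get_running_loop().create_future()


async def is_collected(keep_reference: bool) -> bool:
    """Return True if the pending task was garbage-collected (hypothetical helper)."""
    anchors: set[asyncio.Task] = set()
    task = asyncio.create_task(wait_forever())
    probe = weakref.ref(task)  # lets us observe the task without keeping it alive
    if keep_reference:
        anchors.add(task)  # the strong reference that the docs recommend
    await asyncio.sleep(0)  # let the task start and suspend on its future
    del task  # drop our local strong reference
    gc.collect()  # CPython typically reports "Task was destroyed but it is pending!"
    collected = probe() is None
    for anchored in anchors:
        anchored.cancel()  # tidy up the surviving task before returning
    return collected


if __name__ == "__main__":
    print("unanchored task collected:", asyncio.run(is_collected(keep_reference=False)))
    print("anchored task collected:", asyncio.run(is_collected(keep_reference=True)))
```

Holding the task in any long-lived container is enough to keep it alive. The patch uses a dict keyed by group_id rather than a set so the reference can be dropped when that group's worker exits.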
How
Three small changes in mcp_server/src/services/queue_service.py:
1. __init__: add self._worker_tasks: dict[str, asyncio.Task] = {} as strong-reference storage.
2. add_episode_task: store the task with self._worker_tasks[group_id] = asyncio.create_task(...).
3. _process_episode_queue finally-block: call self._worker_tasks.pop(group_id, None) to clean up the reference when the worker exits.
Verification
Tested on zepai/knowledge-graph-mcp:1.0.2-standalone with FalkorDB backend, streamable-http transport, in a Kubernetes Deployment.
Before patch: add_memory returns "queued" but there is no log activity, no LLM calls, no new graph nodes, and no error.
After patch:
Graph: Episodic node count went from 569 → 570, new entities and edges extracted from the test episode.
Backwards compatibility
None broken. The public API of QueueService is unchanged. The patch only adds an internal _worker_tasks dict and stores/cleans up references inside existing methods. Behavior under stdio transport is identical to before (the GC race didn't manifest there because handler completion didn't trigger the same GC pattern).
Tests
The MCP server doesn't ship asyncio-stress tests for the queue worker today. Adding a regression test would require simulating GC pressure, which is non-trivial. Manual reproduction steps are in the linked issue. Happy to add a test if maintainers point at a similar testing pattern in the repo.
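One lighter-weight option, short of simulating GC pressure, would be to check only the reference bookkeeping: that the task is held while the worker runs and dropped when it exits. The sketch below does this against QueueServiceSketch, a hypothetical stripped-down stand-in that applies the three changes from this PR. It is not the real QueueService, whose signatures and internals may differ, and check_bookkeeping is likewise an illustrative name.

```python
import asyncio
from typing import Awaitable, Callable


class QueueServiceSketch:
    """Hypothetical stand-in for QueueService; only the task bookkeeping is modeled."""

    def __init__(self) -> None:
        self._episode_queues: dict[str, asyncio.Queue] = {}
        # Change 1: strong-reference storage for worker tasks, keyed by group_id.
        self._worker_tasks: dict[str, asyncio.Task] = {}

    async def add_episode_task(
        self, group_id: str, process_func: Callable[[], Awaitable[None]]
    ) -> None:
        queue = self._episode_queues.setdefault(group_id, asyncio.Queue())
        await queue.put(process_func)
        if group_id not in self._worker_tasks:
            # Change 2: keep the task returned by create_task instead of discarding it.
            self._worker_tasks[group_id] = asyncio.create_task(
                self._process_episode_queue(group_id)
            )

    async def _process_episode_queue(self, group_id: str) -> None:
        try:
            while True:
                process_func = await self._episode_queues[group_id].get()
                try:
                    await process_func()
                finally:
                    self._episode_queues[group_id].task_done()
        finally:
            # Change 3: drop the reference when the worker exits for any reason.
            self._worker_tasks.pop(group_id, None)


async def check_bookkeeping() -> tuple[bool, bool, list[str]]:
    service = QueueServiceSketch()
    processed: list[str] = []

    async def episode() -> None:
        processed.append("episode-1")

    await service.add_episode_task("group-a", episode)
    held_while_running = "group-a" in service._worker_tasks
    worker = service._worker_tasks["group-a"]
    await service._episode_queues["group-a"].join()  # wait until the episode is processed
    worker.cancel()  # force the worker to exit so its finally-block runs
    await asyncio.gather(worker, return_exceptions=True)
    dropped_on_exit = "group-a" not in service._worker_tasks
    return held_while_running, dropped_on_exit, processed


if __name__ == "__main__":
    print(asyncio.run(check_bookkeeping()))
```

A check like this would not prove that the original race is gone, but it would catch a regression in which the task reference is no longer stored or is never released.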
Patch developed and verified by Roland Neissl (BAB IKT, Austria) while migrating the graphiti-mcp service from the ToolHive operator to a plain GitOps deployment. Originally verified on zepai/knowledge-graph-mcp:1.0.2-standalone (2026-05-10), re-based and re-verified against main on 2026-06-11.