buffer task-started events that arrive before task-sent by tulioz · Pull Request #31 · aviator-co/celerymon

tulioz · 2026-04-29T04:27:51Z

celery's task-sent and task-started ride independent broker connections with no ordering guarantee, so a fast pickup can flip their order. those tasks sat in the in-flight cache until its TTL expired, and their queue_wait samples were silently dropped.

aviator-app · 2026-04-29T04:27:54Z

Current Aviator status

This PR was merged using Aviator.

gemini-code-assist

Code Review

This pull request introduces a buffer to handle out-of-order Celery events where a 'task-started' event arrives before its corresponding 'task-sent' event. By storing these 'orphan' started events in an OrderedDict, the system can correctly calculate queue wait times when the 'task-sent' event eventually arrives, preventing potential memory leaks in the in-flight cache. Feedback suggests increasing the TTL for these orphan events to 60 seconds to better accommodate clock skew between workers and the monitor. Additionally, it is recommended to implement thread synchronization (locks) when accessing shared state like the orphan buffer and in-flight cache to avoid race conditions between the event receiver and the pruning threads.
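The buffering described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the class name `QueueWaitTracker`, its method names, and the `queue_wait_samples` list are hypothetical, and only `_orphan_started`, `_in_flight`, and the two constants echo names that appear in the diff and review comments.

```python
from collections import OrderedDict

# Values taken from the review discussion; everything else is an assumption.
_ORPHAN_STARTED_TTL_SEC = 10
_ORPHAN_STARTED_CACHE_SIZE = 10_000


class QueueWaitTracker:
    """Hypothetical tracker that pairs task-sent and task-started in either order."""

    def __init__(self):
        self._in_flight = {}                  # task_id -> task-sent timestamp
        self._orphan_started = OrderedDict()  # task_id -> task-started timestamp
        self.queue_wait_samples = []          # stand-in for a metrics histogram

    def record_task_sent(self, task_id, sent_ts):
        started_ts = self._orphan_started.pop(task_id, None)
        if started_ts is not None:
            # task-started arrived first; emit the sample now.
            self.queue_wait_samples.append(max(0.0, started_ts - sent_ts))
            return
        self._in_flight[task_id] = sent_ts

    def record_task_started(self, task_id, started_ts):
        sent_ts = self._in_flight.pop(task_id, None)
        if sent_ts is not None:
            # Normal order: task-sent was already seen.
            self.queue_wait_samples.append(max(0.0, started_ts - sent_ts))
            return
        # Out of order: park the event until task-sent shows up.
        self._orphan_started[task_id] = started_ts
        while len(self._orphan_started) > _ORPHAN_STARTED_CACHE_SIZE:
            self._orphan_started.popitem(last=False)  # evict the oldest entry

    def prune(self, now):
        # Drop orphans whose task-sent never arrived within the TTL.
        stale = [task_id for task_id, ts in self._orphan_started.items()
                 if now - ts > _ORPHAN_STARTED_TTL_SEC]
        for task_id in stale:
            del self._orphan_started[task_id]
```

With this shape, a task whose events arrive flipped still yields one queue_wait sample and leaves nothing behind in either cache.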

gemini-code-assist · 2026-04-29T04:30:14Z


 _WORKER_HEARTBEAT_TTL_SEC = 120
 _PRUNE_INTERVAL_SEC = 30
+_ORPHAN_STARTED_TTL_SEC = 10


A 10-second TTL might be too aggressive given that the pruning logic in _prune compares the worker-generated event timestamp with the monitor's local time. If a worker's clock is behind the monitor's clock by more than 10 seconds, its task-started events will be pruned immediately upon the next _prune cycle, defeating the purpose of the buffer. Since the cache size is already capped at 10,000 items (which has a negligible memory footprint), consider increasing this TTL to 60 or 120 seconds to better account for clock skew and broker latency.

Suggested change

-_ORPHAN_STARTED_TTL_SEC = 10
+_ORPHAN_STARTED_TTL_SEC = 60
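To make the clock-skew concern concrete, here is a small sketch. It assumes the prune rule is "monitor time minus event timestamp exceeds the TTL", as the comment above describes; the helper `is_stale` and the 15-second skew are hypothetical and not taken from the repository.

```python
def is_stale(event_ts, monitor_now, ttl_sec):
    """Assumed prune rule: age is the monitor clock minus the worker's timestamp."""
    return monitor_now - event_ts > ttl_sec


monitor_now = 1_000.0
skew_sec = 15.0                      # hypothetical: worker clock runs 15 s behind
event_ts = monitor_now - skew_sec    # event emitted just now, stamped by the worker

pruned_with_10s_ttl = is_stale(event_ts, monitor_now, 10)  # True: looks 15 s old
pruned_with_60s_ttl = is_stale(event_ts, monitor_now, 60)  # False: survives the skew
```

Under these assumptions, a freshly emitted event is dropped on the next prune cycle with the 10-second TTL and kept with the 60-second one.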

gemini-code-assist · 2026-04-29T04:30:15Z

+            while len(self._orphan_started) > _ORPHAN_STARTED_CACHE_SIZE:
+                self._orphan_started.popitem(last=False)


While OrderedDict is generally thread-safe for single operations in CPython, this compound check-and-pop logic is not atomic. Since _record_task_started is called from the event receiver thread and _prune runs in a separate timer thread, there is a small race condition where the size could change between the len() check and popitem(). Given the existing patterns in this class, this might be acceptable, but for better robustness, consider wrapping accesses to shared state like _orphan_started and _in_flight with a threading.Lock.
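One way the suggested locking could look is sketched below. The `OrphanBuffer` class and its method names are hypothetical; the actual class in this PR keeps `_orphan_started` and `_in_flight` as attributes, and whether a lock is worth adding is left to the author.

```python
import threading
from collections import OrderedDict

_ORPHAN_STARTED_CACHE_SIZE = 10_000  # cap mentioned in the review


class OrphanBuffer:
    """Hypothetical wrapper that serializes all access to the orphan cache."""

    def __init__(self, max_size=_ORPHAN_STARTED_CACHE_SIZE):
        self._lock = threading.Lock()
        self._orphan_started = OrderedDict()
        self._max_size = max_size

    def add(self, task_id, started_ts):
        with self._lock:
            self._orphan_started[task_id] = started_ts
            # The size check and the eviction now happen under one lock,
            # so the prune thread cannot change the dict in between.
            while len(self._orphan_started) > self._max_size:
                self._orphan_started.popitem(last=False)

    def pop(self, task_id):
        with self._lock:
            return self._orphan_started.pop(task_id, None)

    def prune(self, now, ttl_sec):
        with self._lock:
            stale = [task_id for task_id, ts in self._orphan_started.items()
                     if now - ts > ttl_sec]
            for task_id in stale:
                del self._orphan_started[task_id]
            return len(stale)

    def __len__(self):
        with self._lock:
            return len(self._orphan_started)
```

The receiver thread would call `add` and `pop`, and the timer thread would call `prune`; a single non-reentrant lock is enough here because no method calls another while holding it.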

buffer task-started events that arrive before task-sent

9e35ae6

gemini-code-assist Bot reviewed Apr 29, 2026

jainankit approved these changes Apr 29, 2026

aviator-app Bot merged commit 7c1f185 into main Apr 29, 2026
5 checks passed

aviator-app[bot] merged 1 commit into main from orphan-started-buffer
