fix(cursor): judge cursor staleness by write-time, not message-time by brandwe · Pull Request #96 · microsoft/entrabot

brandwe · 2026-06-27T01:08:22Z

Problem

The background Teams poll re-delivered weeks-old messages as if they were new — a flood on every MCP restart. Diagnosis showed ~50 watched-chat cursors "frozen" at late-May timestamps while a freshly-sent message still arrived correctly.

Root cause

chat_cursors.is_stale() measured cursor age from last_ts (the newest-message watermark) instead of last_written_at (when the cursor was actually persisted).

Any chat idle longer than the 24h cap therefore had a "stale" cursor even though it had just been written. _register_watched_chat discarded it and re-ran _bootstrap_chat on every restart — and _bootstrap_chat deliberately leaves the newest message unseen so it fires once. So each re-bootstrap re-pushed that chat's weeks-old newest message as if it were live. With ~50 idle chats and frequent restarts (amplified by the open MCP-disconnect issue) this produced a continuous flood of stale replays.

This is a follow-up to the original issue #17 cursor-persistence work — persistence was working fine; the staleness check keyed off the wrong field.

Fix

is_stale() now takes last_written_at. Both call sites pass the write timestamp:

mcp_server._register_watched_chat — the rehydrate-vs-bootstrap decision.
body_bootstrap._cursor_freshness — the cursors_stale telemetry, which was miscounting for the same reason (this caller was not in the original investigation; found by grep).

The 24h cap still does its real job: if the server was down longer than the cap, messages may have been missed and the seen-set is untrustworthy, so we re-baseline via a fresh bootstrap.

Tests (TDD)

New regression test test_idle_chat_recent_write_rehydrates_despite_old_last_ts — red before the fix, green after.
Two existing tests that encoded the buggy behavior corrected: they saved cursors with an old last_ts but save_cursor stamps last_written_at=now, so under the correct logic they were actually fresh. Rewritten to craft genuinely-stale cursors by writing an old last_written_at directly.
Full suite: 1527 passed, 8 skipped. ruff check . clean.

Deployment note

Takes effect on the next entrabot MCP restart. Idle chats will then rehydrate (judged by write-time) instead of re-bootstrapping, so the perpetual every-restart flood stops. Worst case, a chat whose cursor was saved mid-bootstrap fires its single newest message one last time, then goes quiet — correct-by-design, not the bug.

🤖 Generated with Claude Code

The background Teams poll re-delivered weeks-old messages on every MCP restart. Root cause: chat_cursors.is_stale() measured age from last_ts (the newest-MESSAGE watermark) instead of last_written_at (when the cursor was persisted). Any chat idle longer than the 24h cap therefore had a "stale" cursor even when it had just been written, so _register_watched_chat discarded it and re-ran _bootstrap_chat on every restart. _bootstrap_chat deliberately leaves the newest message unseen so it fires once -- so each re-bootstrap re-pushed that chat's weeks-old newest message as if it were live. With ~50 idle chats and frequent restarts (amplified by the open MCP-disconnect issue) this produced a flood of stale replays. Fix: is_stale() now takes last_written_at. Both call sites pass the write timestamp: - mcp_server._register_watched_chat (rehydrate-vs-bootstrap decision) - body_bootstrap._cursor_freshness (cursors_stale telemetry, which was itself miscounting for the same reason) The 24h cap still does its real job: if the server was actually down longer than the cap, messages may have been missed and the seen-set is untrustworthy, so we re-baseline via a fresh bootstrap. TDD: added test_idle_chat_recent_write_rehydrates_despite_old_last_ts (red before the fix, green after). Corrected two existing tests that encoded the buggy behavior by crafting cursors with an old last_written_at directly. Full suite: 1527 passed, ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cursor): judge cursor staleness by write-time, not message-time#96

fix(cursor): judge cursor staleness by write-time, not message-time#96
brandwe wants to merge 1 commit into
mainfrom
fix/cursor-staleness-uses-write-time

brandwe commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

brandwe commented Jun 27, 2026

Problem

Root cause

Fix

Tests (TDD)

Deployment note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant