Skip to content

feat(check-inbox): add watermark to prevent stale message re-delivery#171

Closed
tayhiga-prog wants to merge 2 commits into
fujibee:mainfrom
tayhiga-prog:fix/check-inbox-watermark
Closed

feat(check-inbox): add watermark to prevent stale message re-delivery#171
tayhiga-prog wants to merge 2 commits into
fujibee:mainfrom
tayhiga-prog:fix/check-inbox-watermark

Conversation

@tayhiga-prog

@tayhiga-prog tayhiga-prog commented Jun 20, 2026

Copy link
Copy Markdown

Summary

  • Introduces a per-agent watermark file to track the highest delivered message id in check-inbox.sh
  • Prevents unread messages that accumulated during agent downtime from being replayed on reconnect
  • Aligns turn-mode delivery (check-inbox.sh) with monitor-mode delivery (watch.sh), which already uses a watermark

Problem

check-inbox.sh filters unread messages with WHERE read_at IS NULL and no lower-bound on id. For agents using turn delivery mode (Antigravity / Gemini PostToolUse, Codex Stop hook), the hook fires only when the agent is active. Any messages sent while the agent was offline accumulate in the database with read_at IS NULL. When the agent next becomes active and the hook fires, all accumulated messages are delivered at once, no matter how old they are.

This means an agent can receive and act on instructions that were relevant to a session that ended hours or days ago, leading to unintended behaviour.

watch.sh (used in monitor mode) already avoids this by setting a watermark to MAX(id) at startup and only streaming messages with id > watermark. check-inbox.sh had no equivalent protection.

Solution

Add a per-agent watermark persisted at SKILL_DIR/run/check-inbox.agent.watermark:

  • Initialisation: on first invocation (no watermark file), seed the watermark from MAX(id) WHERE to_agent='$AGENT' AND read_at IS NOT NULL — the latest already-read message for this agent. This ensures that current unread messages are delivered on the first check, unlike watch.sh which seeds at MAX(id) (monitor mode only cares about post-attach messages). This is an intentional divergence: turn-mode's first check legitimately needs to drain pending unread.
  • Query filter: append AND id > LOOP_WM to both the SELECT and UPDATE statements, where LOOP_WM is the watermark snapshotted before the team loop begins.
  • Watermark advance: after all teams are processed, query MAX(id) WHERE to_agent = AGENT AND id > LOOP_WM across all teams and persist the result. Advancing once after the loop (rather than per-team) ensures agents belonging to multiple teams never skip messages whose ids fall between two teams maxima.
  • Safety valve: if the stored watermark exceeds the DB's current MAX(id) WHERE to_agent='$AGENT' (DB wiped and recreated), reset to that agent-scoped max to prevent permanent message silence. Both the seed and the safety valve are scoped to to_agent to avoid being influenced by other agents' message ids.
  • Windows CR strip: agmsg_sqlite() (the on-disk DB wrapper) now pipes through tr -d '\r', mirroring agmsg_sqlite_mem(). On Windows, sqlite3.exe emits \r\n; without stripping, the watermark integer validation rejects the value and silently resets to 0, making the watermark inert.

No other files are modified. Existing cooldown, actas-lock, and monitor-deferral logic is unchanged.

Behaviour change

The first turn-mode check now delivers current unread messages (seeded from the latest read id, not the global max). This differs from the original implementation which seeded at MAX(id) and delivered nothing on the first check. The new behaviour matches what users and the existing CI tests expect: send a message, start the agent, first check delivers it.

Testing

Basic replay prevention

  1. Join an agent to a team with turn delivery mode.
  2. While the agent is not running, send several messages to it via send.sh.
  3. Start the agent and trigger check-inbox.sh manually.
  4. Verify: the watermark file is created and current unread messages are delivered on the first check.
  5. Send a new message while the agent is active.
  6. Trigger check-inbox.sh again; verify the new message is delivered and the watermark advances.

Multi-team correctness

  1. Register the same agent in two teams.
  2. Send messages to both teams (both above the current watermark).
  3. Invoke check-inbox.sh; verify both messages are delivered in the same run.

Safety valve (DB recreation)

  1. Write a watermark value larger than the current DB MAX(id).
  2. Invoke check-inbox.sh; verify the watermark is reset and delivery resumes normally.

Per-agent isolation

  1. Register two agents (A and B) in the same team.
  2. Send a message to A only.
  3. Verify B's watermark is not influenced by A's message id.

@fujibee fujibee left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tayhiga-prog

Nice change — aligning turn-mode check-inbox.sh with the watermark model watch.sh already uses is the right call, and the implementation is careful: integer validation on every read, the DB-shrink clamp, and the single LOOP_WM snapshot reused for the SELECT, the mark-read UPDATE, and the post-loop advance. One thing worth changing before merge, plus two notes.

1. Seed the watermark per-agent, not globally (can drop this agent's mail).
The initializer uses the global max:

WATERMARK="$(agmsg_sqlite "$DB" "SELECT COALESCE(MAX(id), 0) FROM messages;" ...)"

but watch.sh scopes its watermark to the subscription (WHERE $WHERE_PAIRS), and the post-loop advance here already scopes to WHERE to_agent='$AGENT'. With the global MAX(id), if another agent has posted newer messages, this agent's first check seeds its watermark above its own unread, and those messages are then skipped forever (the mark-read UPDATE is also gated on id > LOOP_WM, so they never clear either). Scoping the seed to WHERE to_agent='$AGENT' matches both watch.sh and your own advance step.

2. Windows: the watermark silently no-ops under Git Bash.
agmsg_sqlite (the on-disk store wrapper) doesn't strip the trailing CR that sqlite3.exe emits on Windows, so MAX(id) comes back as "<n>\r", the *[!0-9]* guard rejects it, and the watermark resets to 0 → every check delivers everything. Not a crash, just inert on Windows. A tr -d '\r' on the integer reads (or routing them through a CR-safe read) would let it engage there too. Context: #180 added CR-stripping for the in-memory path but not this store wrapper.

3. Behaviour note (doc-only).
After this lands, the first turn-mode check seeds the watermark and delivers nothing — intended, and consistent with monitor mode, but it's a change from "first check drains the backlog." Worth a line in the PR body / docs so a user who sends a message and then starts an agent isn't surprised by the empty first turn.

Happy to re-review once (1) is in. Thanks for tightening this up! 🙏

@fujibee

fujibee commented Jun 21, 2026

Copy link
Copy Markdown
Owner

@tayhiga-prog — CI's in now, and it surfaces exactly the seeding concern from point (1): the required bats (ubuntu/macos) legs fail on three check-inbox tests —

  • check-inbox (copilot): emits decision=block JSON when new messages arrive
  • check-inbox: still delivers when the lock is owned by this session
  • storage: stop-hook delivery works when the default db dir is absent but the override is populated

All three send a message and then expect the next check-inbox to deliver it. With the watermark seeded at MAX(id) on the first run, that just-sent message is id <= watermark and gets skipped — so the deliver-on-check contract those tests (and users) rely on is broken on the very first check.

This is the crux: watch.sh can seed at MAX because monitor mode is real-time and only cares about messages after it attached, but turn mode's first check legitimately needs to drain the currently-pending unread. A couple of directions:

  • Seed from already-read state rather than all messages — e.g. MAX(id) of rows where read_at IS NOT NULL for this agent (deliver current unread once, then watermark forward), or simply seed at 0 and let the first check drain + advance.
  • If the goal is specifically to drop stale mail, an age-based cutoff (created_at older than N) expresses that intent more directly than an id watermark.

Either way, scoping to to_agent='$AGENT' (point 1) still applies. Once the seeding is reworked the three tests should go green again. Glad to look at whichever direction you pick.

tayhiga-prog and others added 2 commits June 22, 2026 17:54
Address review feedback from fujibee on PR fujibee#171:

1. Seed the watermark from the latest *read* message for this agent
   (WHERE to_agent='$AGENT' AND read_at IS NOT NULL) instead of the
   global MAX(id). This ensures the first turn-mode check drains
   current unread messages, fixing the three CI test failures where
   a just-sent message was skipped. The safety-valve clamp is also
   scoped to to_agent.

2. Add `| tr -d '\r'` to agmsg_sqlite() (on-disk wrapper), mirroring
   agmsg_sqlite_mem(). On Windows, sqlite3.exe emits \r\n; without
   stripping, the watermark integer validation rejects the value and
   the watermark silently resets to 0 on every check.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@fujibee

fujibee commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Superseded by #199, which rebases this onto current main with the commits and authorship preserved — the watermark review is happening there. Closing in favor of #199. Thanks @tayhiga-prog.

@fujibee fujibee closed this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants