review: drip-334 batch 1 — opencode+codex (3 PRs)

Bojun-Vvibe · Bojun-Vvibe · commit 06e623924bbe · 2026-05-04T16:21:04.000+08:00
anomalyco/opencode#25683 (acp drain end_turn), #25680 (hashline schema+args), openai/codex#20912 (synchronize agent control tools).
diff --git a/reviews/drip-334/openai-codex-20912.md b/reviews/drip-334/openai-codex-20912.md
@@ -0,0 +1,29 @@
+# openai/codex#20912 — synchronize agent control tools
+
+- **Head SHA:** `84a34f29926d77408859f1fab92c824dba3a5ad1`
+- **Author:** internal contributor
+- **Size:** +1615 / −100, 19 files (codex-rs/core agent + tools)
+
+## Summary
+
+Keeps the agent-control tool surface aligned across root agents, forked agents, and watchdog helper forks. Motivation: stable model-visible tool list preserves prompt-cache sharing across agent forks. Restores watchdog helper control tools as eager tools, preserves parent-messaging support needed by watchdogs, all without changing the semantics of existing tools.
+
+## Specific citations
+
+- `core/src/agent/control.rs:91-101` — new `WatchdogParentCompactionResult` enum with three variants: `NotWatchdogHelper`, `ParentBusy { parent_thread_id }`, and `Submitted { parent_thread_id, submission_id }`. Clean tagged result type, much better than a multi-bool return.
+- `core/src/agent/control.rs:61-62` — two new constants `WATCHDOG_BOOT_TOOL_SEARCH_CALL_ID` and `WATCHDOG_BOOT_LIST_AGENTS_CALL_ID` with `synthetic_*` prefixes. Worth verifying nothing in the rollout-replay path treats `synthetic_*` as a reserved namespace; if not, document it as such.
+- `core/src/agent/control.rs:174-209` — `synthetic_watchdog_tool_search_items()` constructs synthetic `RolloutItem::ResponseItem(ToolSearchCall)` + `ToolSearchOutput` pairs from `create_compact_parent_context_tool` + `create_watchdog_close_self_tool` + `create_watchdog_snooze_tool`. Bails with empty `Vec` if `serde_json::to_value(namespace)` fails (`:179-181`) — silent failure mode; a `tracing::warn!` would help if this ever trips in production.
+- `core/src/agent/control.rs:167-172` — `unix_timestamp_seconds()` returns `0` on `duration_since(UNIX_EPOCH)` error via `.unwrap_or_default()`. System clock pre-1970 is exotic but the silent zero could be misleading in logs; consider `tracing::warn!` on the err arm.
+- `core/src/agent/control_tests.rs`, `core/src/tools/handlers/multi_agents_tests.rs` — both updated. Couldn't audit the full 1500+ diff in this slice. The validation command `CODEX_SKIP_VENDORED_BWRAP=1 cargo test -p codex-core thread_rollout_truncation` covers the rollout-truncation interaction, which is the highest-risk path given the synthetic ResponseItems.
+- `core/src/tools/spec.rs`, `tools/src/agent_tool.rs` — modify the tool catalog. These are the contact surface with prompt-cache sharing; changes here are the load-bearing part of the PR's stated motivation.
+
+## Verdict: `needs-discussion`
+
+The mechanism (synthetic tool-search rollout items to ensure consistent tool visibility for prompt-cache stability) is reasonable but introduces non-trivial new state. Concerns to discuss before merge:
+
+1. **Synthetic rollout items semantics.** Inserting `ToolSearchCall` / `ToolSearchOutput` rollout items that the model never actually executed creates a divergence between the rollout-as-recording and the rollout-as-replay. Replaying these synthetics on a fresh thread will *also* synthesize them, but if the synthesis logic ever changes, prompt-cache invalidation could cascade unexpectedly. Worth documenting that the synthetic prefix is a stable contract.
+2. **1615 added LOC across 19 files in one PR is a lot for "synchronize agent control tools"**. Specifically, `multi_agents_v2/{followup_task,message_tool,send_message,spawn}.rs` and the `watchdog_self_close.rs` / `compact_parent_context.rs` handlers are all touched. The PR description doesn't enumerate which behaviors changed in each. Recommend either (a) splitting into a synthetic-rollout-items PR, a tools-spec sync PR, and a watchdog helper PR, or (b) expanding the description to cover what each file's change is doing.
+3. **`unwrap_or_default()` on `duration_since`** returning 0 silently — minor, but consider logging.
+4. **No assertion that the synthetic items are skipped from analytics / token-counting.** If they're counted toward token budgets they'll inflate numbers; verify in `multi_agents_tests.rs`.
+
+The change appears intentional and the tests look thorough at the file-name level; it's the scope and the synthetic-replay contract that need explicit sign-off.
diff --git a/reviews/drip-334/sst-opencode-25680.md b/reviews/drip-334/sst-opencode-25680.md
@@ -0,0 +1,27 @@
+# sst/opencode#25680 — fix: propagate hashline tool.execute.before args + ref params
+
+- **Head SHA:** `8cc7db039417c31a92699d8732f65c5ceffd1560`
+- **Author:** community
+- **Size:** +20 / −2, 1 file (`packages/opencode/src/session/prompt.ts`)
+- **Closes:** #25674
+
+## Summary
+
+Two coupled fixes in `prompt.ts`:
+
+1. Inject hashline ref-based edit parameters (`startRef`, `endRef`, `ref`, `operation`, `content`, `fileRev`, `expectedFileHash`, `safeReapply`, `operations`) into the `edit` tool's JSON Schema so models with strict schema validation (e.g. DeepSeek) can use them.
+2. Make `tool.execute.before` plugin hook output (`{ args }`) actually flow into the subsequent `item.execute(args, ctx)` call, so plugin mutations of `args` take effect.
+
+## Specific citations
+
+- `prompt.ts:421-438` — schema-injection block guarded by `if (item.id === "edit" && schema?.properties)`. Adds 9 properties plus optional `oldString`/`newString` removal from `schema.required` (`:436-438`). Mutation is in place on the schema returned from `ProviderTransform.schema(...)` — relies on that returning a fresh object, which a quick check is warranted on (see nit below).
+- `prompt.ts:445-449` — introduces `const hookOutput = { args }` and passes it as the third arg to `plugin.trigger("tool.execute.before", ..., hookOutput)`, then `item.execute(hookOutput.args, ctx)`. This is the actual behavior fix: previously a plugin's mutation of `hookOutput.args` was discarded because `item.execute(args, ctx)` re-read the original `args`.
+
+## Verdict: `request-changes`
+
+The intent is right, but two issues:
+
+1. **Hardcoded schema-injection bypasses extensibility.** Hashline-specific properties are inlined into a generic `if (item.id === "edit")` branch in core `prompt.ts`. Any future tool with similar needs will require the same surgical edit. A cleaner shape: have the `edit` tool itself (or a `schemaExtensions` registry on the tool definition) own these properties so the prompt builder doesn't need to know about hashline. This also fixes the silent coupling where `prompt.ts` and the actual hashline edit implementation must agree on parameter names.
+2. **Schema mutation may corrupt cache.** `schema.properties.content = {...}` and `schema.required = schema.required.filter(...)` mutate the object returned from `ProviderTransform.schema(input.model, EffectZod.toJsonSchema(item.parameters))`. If `EffectZod.toJsonSchema` or `ProviderTransform.schema` ever memoizes, subsequent calls will see the augmented schema. Recommend `const schema = structuredClone(...)` before the `if` block, or build a new `properties`/`required` rather than mutating in place.
+
+Plugin-args propagation fix (`:445-449`) is correct on its own and could be split into its own PR.
diff --git a/reviews/drip-334/sst-opencode-25683.md b/reviews/drip-334/sst-opencode-25683.md
@@ -0,0 +1,29 @@
+# sst/opencode#25683 — fix(acp): drain message events before returning end_turn
+
+- **Head SHA:** `edae14d0eb6655d45bef212e7d4b48439b5188ad`
+- **Author:** community
+- **Size:** +53 / −0, 1 file (`packages/opencode/src/acp/agent.ts`)
+- **Closes:** #25421
+
+## Summary
+
+`Agent.prompt()` was returning `stopReason: "end_turn"` as soon as `sdk.session.prompt()` resolved, but trailing `message.part.delta` events were still queued in the SDK event stream. Those deltas were forwarded as `agent_message_chunk` after the RPC reply landed — a wire-level ACP protocol violation where text appeared after `end_turn`. The fix tracks per-message completion via `message.updated` and awaits it (with a 5s timeout) before returning.
+
+## Specific citations
+
+- `agent.ts:150-151` — adds `messageCompletionResolvers: Map<string, () => void>` and `completedAssistantMessageIds: Set<string>`. State on the Agent instance, no per-call cleanup besides the `delete()` calls in resolver/finish — see risk below.
+- `agent.ts:275-285` — new `message.updated` case that, when `info.role === "assistant"` and `info.time.completed !== undefined`, marks the id complete and fires any pending resolver. Correct event to gate on; matches SDK semantics.
+- `agent.ts:549-575` — `waitForMessageCompletion(messageId, timeoutMs)` returns immediately if already completed, else races a one-shot resolver against `setTimeout(finish, timeoutMs)`. The internal `settled` guard prevents double-resolve.
+- `agent.ts:1527-1530` and `agent.ts:1556-1559` — two prompt-return sites both call `await this.waitForMessageCompletion(msg.id, 5000)` before `sendUsageUpdate` / returning. Good, both branches are covered.
+
+## Verdict: `merge-after-nits`
+
+The core mechanism is sound and the comment in `agent.ts:546-558` is unusually clear about *why* the drain is necessary (sequential `for await` in `runEventSubscription`).
+
+Nits worth addressing before merge:
+
+1. **Unbounded growth of `completedAssistantMessageIds`.** The set is added to on every assistant `message.updated` with `time.completed` (`agent.ts:278`) and never cleared. Long-lived agent processes will accumulate one entry per turn forever. Either prune after `waitForMessageCompletion` consumes it, or bound it (e.g., LRU/recent ids only) since once `prompt()` has drained for a given `msg.id` the entry is dead weight.
+2. **5000 ms timeout is silent on miss.** If the SDK never emits a completion event (regression, server bug), `prompt()` returns successfully with `end_turn` after a 5s stall and zero log signal. A `log.warn("waitForMessageCompletion timeout", { messageId })` inside `finish()` when triggered by the timer would make this debuggable in the field.
+3. **No test.** The PR description references manual repro but no unit/integration test guards regression. A fake event stream that emits `message.part.delta` then `message.updated` could assert ordering against a recording `connection`.
+
+None of these block correctness; they're hardening. Mechanism itself is the right fix.