diff --git a/README.md b/README.md
index 740f159..d25e4b2 100644
--- a/README.md
+++ b/README.md
@@ -307,10 +307,50 @@ system prompts; unknown versions error with the available list.
 
 **Long context.** `AgentOptions.Compactor` + `CompactionThreshold`.
 `glue.KeepRecentMessages(n)` is the zero-dependency default;
-`SummarizingCompactor` is token-aware
+`SummarizingCompactor` is token-aware and produces a **structured
+state snapshot** (goal / constraints / progress / next steps, exact
+paths and errors preserved) with a prompt-injection firewall, a
+cumulative read/modified **file ledger** that survives repeated
+compactions, splits that never sever a tool-call/result pair, and an
+inflation guard; `KeepRecentTokens` keeps the recent tail by token
+budget instead of message count
 ([ADR-0002](docs/adr/0002-context-compaction.md),
 [ADR-0007](docs/adr/0007-memory-layer.md)).
 
+## Harness reliability
+
+The loop absorbs the failure shapes that waste agent turns — on by
+default, each with an opt-out
+([docs/coding-harness-roadmap.md](docs/coding-harness-roadmap.md)
+records the analysis behind them):
+
+- **History hardening.** Every run repairs the transcript first:
+  dangling tool calls from an interrupted turn get synthesized error
+  results, orphaned results and empty turns are dropped, and turns
+  from a different model lose their thinking signatures — the things
+  providers reject with opaque 400s (`loop.HardenHistory`).
+- **Classified retries.** Transient provider failures (429/5xx,
+  dropped streams) retry with backoff, honoring `Retry-After` /
+  Gemini `RetryInfo` hints; auth and invalid-request errors fail
+  fast. Context overflow surfaces as a typed `*loop.OverflowError`
+  that sessions answer by compacting once and retrying once. Opt out
+  with `RunRequest.Retry.Disabled`
+  ([ADR-0017](docs/adr/0017-loop-retry-overflow-recovery.md)).
+- **Guardrails.** Repeating the same tool call with identical
+  arguments, or burning consecutive all-error tool rounds, first
+  draws a corrective message and then halts the run with a typed
+  error (`RunRequest.Guardrails`).
+- **Stall recovery.** `AgentOptions.AutoContinue` nudges a model that
+  narrates "I will now…" and stops without acting — bounded to twice
+  per run; the `glue` binary enables it for providers that declare
+  the stall in the capability registry.
+
+**Per-model capabilities.** Providers declare harness-relevant facts
+at registration — context window, parallel-tool safety, prompt
+variant, auto-continue proneness — queried via
+`providers.CapabilitiesFor(name)` instead of if-provider-name
+switches.
+
 ## Coding tools
 
 `tools/coding.Tools(...)` assembles a permission-gated local coding
@@ -324,6 +364,26 @@ go run ./cmd/glue run --provider codex --coding --work . \
   --prompt "Run the tests and fix the first failure."
 ```
 
+The bundle is built to tolerate model sloppiness instead of bouncing
+it back:
+
+- **`edit_file` repairs near-miss matches** — a deterministic ladder
+  (whitespace → indentation, with the replacement re-indented to the
+  file's real indentation → smart-quote/dash folding → block-anchor)
+  plus over-escape repair; non-exact matches are named in the result,
+  and success echoes the updated lines so the model doesn't re-read
+  the file. CRLF and BOMs are preserved.
+- **`shell_exec` keeps head *and* tail** of long output with an
+  omitted-bytes marker and the complete stream spooled to a named
+  temp file; timeouts keep the partial output. `read_file` pages by
+  line offset and says exactly how to continue.
+- **The system prompt is assembled from the active toolset**
+  (`coding.SystemPrompt`): one line per registered tool plus their
+  usage guidelines, in a terse variant for frontier models and an
+  explicit variant for open-weight ones — it cannot drift from the
+  tools actually available. Tools contribute their own text via
+  `ToolSpec.PromptSnippet` / `PromptGuidelines`.
+
 Side-effecting tools (`write_file`, `edit_file`, `shell_exec`) are
 permission-gated; reads and navigation are not. Execution defaults to
 the local process via `glue.Executor` — not a sandbox. Implement your
diff --git a/docs/building-agents.md b/docs/building-agents.md
index d84eb2d..c7b4f1f 100644
--- a/docs/building-agents.md
+++ b/docs/building-agents.md
@@ -387,6 +387,15 @@ and asserting on the `ToolResult` (including `IsError` for the recovery
 path). Keep live provider tests gated behind an env-var check and out of
 CI.
 
+Two loop defaults to know when scripting *failures* in tests: transient-
+looking provider errors ("rate limit", "503", dropped streams) are
+retried with backoff, and pathological tool patterns (the same call
+repeated, all-error rounds) trigger guardrails. When a fake provider
+should fail *fast*, use error text that classifies as fatal (e.g.
+"invalid request"); callers driving `loop.Run` directly can also pass
+`Retry: loop.RetryPolicy{Disabled: true}` or
+`Guardrails: loop.GuardrailPolicy{Disabled: true}`.
+
 ## Going further
 
 You now have the full shape of a Glue agent. The advanced surfaces, each
diff --git a/docs/coding-harness-roadmap.md b/docs/coding-harness-roadmap.md
index 09d94dc..9fc2dcc 100644
--- a/docs/coding-harness-roadmap.md
+++ b/docs/coding-harness-roadmap.md
@@ -2,6 +2,11 @@
 
 Date: 2026-06-09. Tracker: [#110](https://github.com/erain/glue/issues/110).
 
+> **Status: shipped.** All eight queue items below landed the same day
+> as [v1.13.0](https://github.com/erain/glue/releases/tag/v1.13.0)
+> (PRs #346–#353). This document remains the record of the analysis
+> and of what was deliberately deferred (the P3 notes).
+
 This is a source-verified analysis of four reference coding-agent
 harnesses — [pi](https://github.com/earendil-works/pi),
 [Cline](https://github.com/cline/cline),
@@ -251,16 +256,17 @@ fallback (v1.8.0). What Gemini CLI additionally does that we don't:
 
 ## Implementation order
 
-Filed as one-issue-one-PR items under tracker #110:
-
-1. P0.1 edit_file repair ladder + instructive errors (+ escape repair) — [#338](https://github.com/erain/glue/issues/338).
-2. P0.2 structured truncation for shell_exec / read_file — [#339](https://github.com/erain/glue/issues/339).
-3. P0.3 history hardening before send/resume — [#340](https://github.com/erain/glue/issues/340).
-4. P1.4 retry/overflow state machine — [#341](https://github.com/erain/glue/issues/341).
-5. P1.6 compaction upgrade — [#342](https://github.com/erain/glue/issues/342).
-6. P2.7 Gemini next-speaker check + stall recovery — [#343](https://github.com/erain/glue/issues/343).
-7. P2.8 loop & mistake guardrails — [#344](https://github.com/erain/glue/issues/344).
-8. P1.5 per-model capability registry + tool-owned prompt snippets — [#345](https://github.com/erain/glue/issues/345).
+Filed as one-issue-one-PR items under tracker #110 — **all shipped in
+v1.13.0**:
+
+1. ✅ P0.1 edit_file repair ladder + instructive errors (+ escape repair) — [#338](https://github.com/erain/glue/issues/338), PR #346.
+2. ✅ P0.2 structured truncation for shell_exec / read_file — [#339](https://github.com/erain/glue/issues/339), PR #347.
+3. ✅ P0.3 history hardening before send/resume — [#340](https://github.com/erain/glue/issues/340), PR #348.
+4. ✅ P1.4 retry/overflow state machine — [#341](https://github.com/erain/glue/issues/341), PR #349, [ADR-0017](adr/0017-loop-retry-overflow-recovery.md).
+5. ✅ P1.6 compaction upgrade — [#342](https://github.com/erain/glue/issues/342), PR #350.
+6. ✅ P2.7 Gemini next-speaker check + stall recovery — [#343](https://github.com/erain/glue/issues/343), PR #351 (surrogate sub-item verified moot per #313).
+7. ✅ P2.8 loop & mistake guardrails — [#344](https://github.com/erain/glue/issues/344), PR #352.
+8. ✅ P1.5 per-model capability registry + tool-owned prompt snippets — [#345](https://github.com/erain/glue/issues/345), PR #353.
 
 Items 1–3 are pure-Go, dependency-free, and benefit every provider;
 they go first. Item 8 touches public API shape (registry), so it goes
diff --git a/docs/design.md b/docs/design.md
index b4cd7d0..5546679 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -179,12 +179,24 @@ until the provider stops or the context is canceled.
    tool-call IDs are normalized, and turns from a different model lose
    their thinking blocks and provider signatures — providers reject all of
    these with opaque 400s otherwise.
-2. Ask the provider to stream an assistant response.
+2. Ask the provider to stream an assistant response. Transient
+   failures (429/5xx, dropped streams) retry with classified backoff
+   under `RunRequest.Retry`; context overflow surfaces as a typed
+   `*loop.OverflowError` that the session layer answers by compacting
+   once and retrying once
+   ([ADR-0017](adr/0017-loop-retry-overflow-recovery.md)).
 3. Emit text/tool/lifecycle events as provider events arrive.
-4. Append the final assistant message to the transcript.
+4. Append the final assistant message to the transcript. A turn that
+   narrates a future action without calling a tool gets a bounded
+   "Please continue." nudge when `RunRequest.AutoContinue` is set
+   (the Gemini narrate-then-stop stall).
 5. If the assistant requested tools, execute the requested tools.
 6. Append tool result messages in deterministic order.
-7. Repeat from step 2 until no tool calls remain.
+7. Guardrails inspect the round: repeated identical calls or
+   consecutive all-error rounds first draw a corrective injected
+   message, then halt the run with a typed error
+   (`RunRequest.Guardrails`).
+8. Repeat from step 2 until no tool calls remain.
 
 The concrete P0 entry point is `loop.Run(ctx, loop.RunRequest)`.
 It returns a `loop.RunResult` containing both the full transcript and the messages produced by
diff --git a/docs/project-plan.md b/docs/project-plan.md
index 44b6428..0cc6aba 100644
--- a/docs/project-plan.md
+++ b/docs/project-plan.md
@@ -165,14 +165,19 @@ milestone).
   agent ships as its own product face with a homepage
   (, repo
   [glue-coding-agent-site](https://github.com/erain/glue-coding-agent-site)).
-  Next: **harness quality** — a source-verified analysis of pi, Cline,
-  Codex CLI, and Gemini CLI distilled into
-  [`coding-harness-roadmap.md`](coding-harness-roadmap.md) (edit-repair
-  ladder, structured truncation, history hardening, retry/overflow
-  recovery, compaction upgrades, Gemini loop polish), prioritized for
-  Gemini 3.x first and open-weight OpenRouter/NVIDIA models second.
-  Still planned beyond that: daemon goal endpoints, a sandboxed
-  `Executor` backend (container/VM), and TUI-on-`glue connect`.
+  The **harness-quality phase shipped as `v1.13.0`**: a source-verified
+  analysis of pi, Cline, Codex CLI, and Gemini CLI
+  ([`coding-harness-roadmap.md`](coding-harness-roadmap.md)) landed as
+  eight PRs — edit-repair ladder, structured truncation, history
+  hardening, retry/overflow recovery
+  ([ADR 0017](adr/0017-loop-retry-overflow-recovery.md)), compaction
+  upgrades, next-speaker stall recovery, loop guardrails, and the
+  per-model capability registry with tool-owned prompt assembly —
+  prioritized for Gemini 3.x first and open-weight OpenRouter/NVIDIA
+  models second. Still planned: daemon goal endpoints, a sandboxed
+  `Executor` backend (container/VM), TUI-on-`glue connect`, and the
+  roadmap's deferred P3 notes (XML tool-calling fallback, parallel-tool
+  read/write locking, goal-loop budget wind-down).
 - **Track B — Peggy.** Peggy v0.1–v0.5 plus dogfood hardening (M1–M6)
   shipped: single-prompt CLI, Telegram channel, durable sqlite+FTS5
   memory with curated recall, opt-in coding tools, MCP servers, the
diff --git a/docs/provider-guide.md b/docs/provider-guide.md
index 45f51db..c91b31a 100644
--- a/docs/provider-guide.md
+++ b/docs/provider-guide.md
@@ -175,6 +175,36 @@ for the `glue.Provider` implementation. Reference upstream open-source CLIs
 as the protocol spec rather than copying code, and quarantine all
 vendor-specific headers and base URLs in the package.
 
+## Registering with the driver registry
+
+Shipped providers register themselves in `init()` so callers can
+construct them by name through `providers.New("")` (this is how
+the `glue` binary's `--provider` flag works). Registration also
+declares **capabilities** — harness-relevant facts the loop and CLIs
+query through `providers.CapabilitiesFor(name)` instead of switching
+on provider names:
+
+```go
+func init() {
+	providers.Register("acme", providers.Factory{
+		New:          func() loop.Provider { return New(Options{}) },
+		DefaultModel: DefaultModel,
+		EnvKey:       "ACME_API_KEY",
+		Capabilities: providers.Capabilities{
+			ContextWindow: 131_072, // default model's window; 0 = unknown
+			ParallelTools: false,   // safe to run tool calls concurrently?
+			PromptVariant: "",      // "" explicit (open-weight), "terse" frontier
+			AutoContinue:  false,   // prone to the narrate-then-stop stall?
+		},
+	})
+}
+```
+
+Declare conservatively: the zero value means "assume nothing", and
+consumers treat unknown capabilities as the safe default. Out-of-tree
+providers do not have to register at all — construct them directly and
+pass them to `glue.NewAgent`.
+
 ## Common mistakes
 
 - **Aliasing the same `Message` across `Start` and `Done`.** The loop