Merged
62 changes: 61 additions & 1 deletion README.md
@@ -307,10 +307,50 @@ system prompts; unknown versions error with the available list.

**Long context.** `AgentOptions.Compactor` + `CompactionThreshold`.
`glue.KeepRecentMessages(n)` is the zero-dependency default;
`SummarizingCompactor` is token-aware
`SummarizingCompactor` is token-aware and produces a **structured
state snapshot** (goal / constraints / progress / next steps, exact
paths and errors preserved) with a prompt-injection firewall, a
cumulative read/modified **file ledger** that survives repeated
compactions, splits that never sever a tool-call/result pair, and an
inflation guard; `KeepRecentTokens` keeps the recent tail by token
budget instead of message count
([ADR-0002](docs/adr/0002-context-compaction.md),
[ADR-0007](docs/adr/0007-memory-layer.md)).
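
The "splits that never sever a tool-call/result pair" rule combined with a token-budget tail can be sketched as follows. This is a minimal illustration, not Glue's implementation: `Message`, `estimateTokens`, and `splitByTokenBudget` are hypothetical names, the four-bytes-per-token estimate is a placeholder, and the sketch assumes tool results directly follow the assistant turn that requested them.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical, simplified transcript entry.
type Message struct {
	Role string // "user", "assistant", or "tool"
	Text string
}

// estimateTokens is a crude stand-in for a tokenizer: about four bytes per token.
func estimateTokens(m Message) int { return len(m.Text)/4 + 1 }

// splitByTokenBudget returns the index where the kept tail starts.
// Messages before the index would be summarized; the rest are kept verbatim.
func splitByTokenBudget(msgs []Message, budget int) int {
	cut := len(msgs)
	used := 0
	// Walk backwards, extending the tail while it still fits the budget.
	for i := len(msgs) - 1; i >= 0; i-- {
		used += estimateTokens(msgs[i])
		if used > budget {
			break
		}
		cut = i
	}
	// Never start the tail on a tool result: move the cut earlier until the
	// assistant turn that issued the call is inside the tail as well.
	for cut > 0 && cut < len(msgs) && msgs[cut].Role == "tool" {
		cut--
	}
	return cut
}

// sampleTranscript builds a four-message transcript with one tool round.
func sampleTranscript() []Message {
	return []Message{
		{Role: "user", Text: strings.Repeat("u", 40)},
		{Role: "assistant", Text: ""}, // the turn that requested the tool call
		{Role: "tool", Text: strings.Repeat("r", 40)},
		{Role: "assistant", Text: "done"},
	}
}

func main() {
	msgs := sampleTranscript()
	cut := splitByTokenBudget(msgs, 13)
	fmt.Printf("summarize %d message(s), keep %d\n", cut, len(msgs)-cut)
}
```

In this sketch the pair rule wins over the budget: the tail may run slightly over budget rather than begin with a result whose call was summarized away.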

## Harness reliability

The loop absorbs the failure shapes that waste agent turns — on by
default, each with an opt-out
([docs/coding-harness-roadmap.md](docs/coding-harness-roadmap.md)
records the analysis behind them):

- **History hardening.** Every run repairs the transcript first:
dangling tool calls from an interrupted turn get synthesized error
results, orphaned results and empty turns are dropped, and turns
from a different model lose their thinking signatures — the things
providers reject with opaque 400s (`loop.HardenHistory`).
- **Classified retries.** Transient provider failures (429/5xx,
dropped streams) retry with backoff, honoring `Retry-After` /
Gemini `RetryInfo` hints; auth and invalid-request errors fail
fast. Context overflow surfaces as a typed `*loop.OverflowError`
that sessions answer by compacting once and retrying once. Opt out
with `RunRequest.Retry.Disabled`
([ADR-0017](docs/adr/0017-loop-retry-overflow-recovery.md)).
- **Guardrails.** Repeating the same tool call with identical
arguments, or burning consecutive all-error tool rounds, first
draws a corrective message and then halts the run with a typed
error (`RunRequest.Guardrails`).
- **Stall recovery.** `AgentOptions.AutoContinue` nudges a model that
narrates "I will now…" and stops without acting — bounded to twice
per run; the `glue` binary enables it for providers that declare
the stall in the capability registry.
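
To make the history-hardening bullet concrete, here is a minimal sketch of the repair pass. The `Msg` type, the `harden` function, and the synthesized error text are all hypothetical; `loop.HardenHistory`'s real signature is not shown in this document, and the thinking-signature stripping is left out.

```go
package main

import "fmt"

// Msg is a hypothetical transcript entry used only for this sketch.
type Msg struct {
	Role    string   // "user", "assistant", or "tool"
	Text    string
	CallIDs []string // tool calls requested by an assistant turn
	ForCall string   // the call a tool result answers
	IsError bool
}

// harden repairs a transcript before it is sent to a provider.
func harden(in []Msg) []Msg {
	requested := map[string]bool{}
	answered := map[string]bool{}
	for _, m := range in {
		for _, id := range m.CallIDs {
			requested[id] = true
		}
		if m.Role == "tool" {
			answered[m.ForCall] = true
		}
	}

	var out []Msg
	for _, m := range in {
		switch {
		case m.Role == "tool" && !requested[m.ForCall]:
			continue // orphaned result: no turn ever asked for it
		case m.Role != "tool" && m.Text == "" && len(m.CallIDs) == 0:
			continue // empty turn
		}
		out = append(out, m)
		// Dangling calls (for example from an interrupted turn) get a
		// synthesized error result so every call has a matching result.
		for _, id := range m.CallIDs {
			if !answered[id] {
				out = append(out, Msg{
					Role:    "tool",
					ForCall: id,
					Text:    "tool call interrupted before a result was recorded",
					IsError: true,
				})
			}
		}
	}
	return out
}

func main() {
	repaired := harden([]Msg{
		{Role: "user", Text: "run the tests"},
		{Role: "assistant", CallIDs: []string{"call-1"}}, // interrupted: no result
	})
	for _, m := range repaired {
		fmt.Printf("%-9s error=%v %q\n", m.Role, m.IsError, m.Text)
	}
}
```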

**Per-model capabilities.** Providers declare harness-relevant facts
at registration — context window, parallel-tool safety, prompt
variant, auto-continue proneness — queried via
`providers.CapabilitiesFor(name)` instead of if-provider-name
switches.

## Coding tools

`tools/coding.Tools(...)` assembles a permission-gated local coding
@@ -324,6 +364,26 @@ go run ./cmd/glue run --provider codex --coding --work . \
--prompt "Run the tests and fix the first failure."
```

The bundle is built to tolerate model sloppiness instead of bouncing
it back:

- **`edit_file` repairs near-miss matches** — a deterministic ladder
(whitespace → indentation, with the replacement re-indented to the
file's real indentation → smart-quote/dash folding → block-anchor)
plus over-escape repair; non-exact matches are named in the result,
and success echoes the updated lines so the model doesn't re-read
the file. CRLF and BOMs are preserved.
- **`shell_exec` keeps head *and* tail** of long output with an
omitted-bytes marker and the complete stream spooled to a named
temp file; timeouts keep the partial output. `read_file` pages by
line offset and says exactly how to continue.
- **The system prompt is assembled from the active toolset**
(`coding.SystemPrompt`): one line per registered tool plus their
usage guidelines, in a terse variant for frontier models and an
explicit variant for open-weight ones — it cannot drift from the
tools actually available. Tools contribute their own text via
`ToolSpec.PromptSnippet` / `PromptGuidelines`.
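
The head-and-tail behaviour described for `shell_exec` can be illustrated with a short sketch. The function name `headTail` and the marker format are assumptions for illustration only; spooling the full stream to a temp file is not shown.

```go
package main

import "fmt"

// headTail keeps the first head bytes and last tail bytes of out and replaces
// the middle with a marker stating how many bytes were dropped.
// It slices by byte, so a real implementation would also need to avoid
// cutting a multi-byte UTF-8 sequence in half.
func headTail(out string, head, tail int) string {
	if head < 0 {
		head = 0
	}
	if tail < 0 {
		tail = 0
	}
	if len(out) <= head+tail {
		return out // short output passes through untouched
	}
	omitted := len(out) - head - tail
	marker := fmt.Sprintf("\n[... %d bytes omitted ...]\n", omitted)
	return out[:head] + marker + out[len(out)-tail:]
}

func main() {
	fmt.Println(headTail("abcdefghij", 3, 2))
}
```

Keeping the tail matters because build and test tools usually print the decisive error last, while the head carries the command's initial context.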

Side-effecting tools (`write_file`, `edit_file`, `shell_exec`) are
permission-gated; reads and navigation are not. Execution defaults to
the local process via `glue.Executor` — not a sandbox. Implement your
9 changes: 9 additions & 0 deletions docs/building-agents.md
@@ -387,6 +387,15 @@ and asserting on the `ToolResult` (including `IsError` for the recovery
path). Keep live provider tests gated behind an env-var check and out of
CI.

Two loop defaults to know when scripting *failures* in tests: transient-
looking provider errors ("rate limit", "503", dropped streams) are
retried with backoff, and pathological tool patterns (the same call
repeated, all-error rounds) trigger guardrails. When a fake provider
should fail *fast*, use error text that classifies as fatal (e.g.
"invalid request"); callers driving `loop.Run` directly can also pass
`Retry: loop.RetryPolicy{Disabled: true}` or
`Guardrails: loop.GuardrailPolicy{Disabled: true}`.
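
Why the error text of a fake provider matters can be seen in a small sketch of text-based classification. Everything here is hypothetical: the marker list, the `Fatal` default, and the `runWithRetry` helper are assumptions made for illustration and are not Glue's actual classifier.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Class is a hypothetical error classification.
type Class int

const (
	Fatal Class = iota
	Transient
)

// classifyText treats a few transient-looking markers as retryable and
// everything else as fatal. The marker list is an assumption.
func classifyText(text string) Class {
	text = strings.ToLower(text)
	for _, marker := range []string{"rate limit", "429", "503", "stream closed"} {
		if strings.Contains(text, marker) {
			return Transient
		}
	}
	return Fatal
}

// runWithRetry calls fn until it succeeds, fails fatally, or runs out of
// attempts, and reports how many attempts were made. A real loop would sleep
// with backoff between attempts.
func runWithRetry(maxAttempts int, fn func() error) (attempts int, err error) {
	for attempts = 1; ; attempts++ {
		err = fn()
		if err == nil || classifyText(err.Error()) == Fatal || attempts >= maxAttempts {
			return attempts, err
		}
	}
}

var (
	errUnavailable = errors.New("provider: 503 service unavailable")
	errInvalid     = errors.New("invalid request: bad tool schema")
)

func main() {
	n, _ := runWithRetry(3, func() error { return errUnavailable })
	fmt.Println("transient-looking error, attempts:", n)
	n, _ = runWithRetry(3, func() error { return errInvalid })
	fmt.Println("fatal error, attempts:", n)
}
```

A test that expects exactly one provider call will see several if its scripted error happens to look transient, which is the situation the paragraph above warns about.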

## Going further

You now have the full shape of a Glue agent. The advanced surfaces, each
26 changes: 16 additions & 10 deletions docs/coding-harness-roadmap.md
@@ -2,6 +2,11 @@

Date: 2026-06-09. Tracker: [#110](https://github.com/erain/glue/issues/110).

> **Status: shipped.** All eight queue items below landed the same day
> as [v1.13.0](https://github.com/erain/glue/releases/tag/v1.13.0)
> (PRs #346–#353). This document remains the record of the analysis
> and of what was deliberately deferred (the P3 notes).

This is a source-verified analysis of four reference coding-agent
harnesses — [pi](https://github.com/earendil-works/pi),
[Cline](https://github.com/cline/cline),
@@ -251,16 +256,17 @@ fallback (v1.8.0). What Gemini CLI additionally does that we don't:

## Implementation order

Filed as one-issue-one-PR items under tracker #110:

1. P0.1 edit_file repair ladder + instructive errors (+ escape repair) — [#338](https://github.com/erain/glue/issues/338).
2. P0.2 structured truncation for shell_exec / read_file — [#339](https://github.com/erain/glue/issues/339).
3. P0.3 history hardening before send/resume — [#340](https://github.com/erain/glue/issues/340).
4. P1.4 retry/overflow state machine — [#341](https://github.com/erain/glue/issues/341).
5. P1.6 compaction upgrade — [#342](https://github.com/erain/glue/issues/342).
6. P2.7 Gemini next-speaker check + stall recovery — [#343](https://github.com/erain/glue/issues/343).
7. P2.8 loop & mistake guardrails — [#344](https://github.com/erain/glue/issues/344).
8. P1.5 per-model capability registry + tool-owned prompt snippets — [#345](https://github.com/erain/glue/issues/345).
Filed as one-issue-one-PR items under tracker #110 — **all shipped in
v1.13.0**:

1. ✅ P0.1 edit_file repair ladder + instructive errors (+ escape repair) — [#338](https://github.com/erain/glue/issues/338), PR #346.
2. ✅ P0.2 structured truncation for shell_exec / read_file — [#339](https://github.com/erain/glue/issues/339), PR #347.
3. ✅ P0.3 history hardening before send/resume — [#340](https://github.com/erain/glue/issues/340), PR #348.
4. ✅ P1.4 retry/overflow state machine — [#341](https://github.com/erain/glue/issues/341), PR #349, [ADR-0017](adr/0017-loop-retry-overflow-recovery.md).
5. ✅ P1.6 compaction upgrade — [#342](https://github.com/erain/glue/issues/342), PR #350.
6. ✅ P2.7 Gemini next-speaker check + stall recovery — [#343](https://github.com/erain/glue/issues/343), PR #351 (surrogate sub-item verified moot per #313).
7. ✅ P2.8 loop & mistake guardrails — [#344](https://github.com/erain/glue/issues/344), PR #352.
8. ✅ P1.5 per-model capability registry + tool-owned prompt snippets — [#345](https://github.com/erain/glue/issues/345), PR #353.

Items 1–3 are pure-Go, dependency-free, and benefit every provider;
they go first. Item 8 touches public API shape (registry), so it goes
18 changes: 15 additions & 3 deletions docs/design.md
@@ -179,12 +179,24 @@ until the provider stops or the context is canceled.
tool-call IDs are normalized, and turns from a different model lose
their thinking blocks and provider signatures — providers reject all
of these with opaque 400s otherwise.
2. Ask the provider to stream an assistant response.
2. Ask the provider to stream an assistant response. Transient
failures (429/5xx, dropped streams) retry with classified backoff
under `RunRequest.Retry`; context overflow surfaces as a typed
`*loop.OverflowError` that the session layer answers by compacting
once and retrying once
([ADR-0017](adr/0017-loop-retry-overflow-recovery.md)).
3. Emit text/tool/lifecycle events as provider events arrive.
4. Append the final assistant message to the transcript.
4. Append the final assistant message to the transcript. A turn that
narrates a future action without calling a tool gets a bounded
"Please continue." nudge when `RunRequest.AutoContinue` is set
(the Gemini narrate-then-stop stall).
5. If the assistant requested tools, execute the requested tools.
6. Append tool result messages in deterministic order.
7. Repeat from step 2 until no tool calls remain.
7. Guardrails inspect the round: repeated identical calls or
consecutive all-error rounds first draw a corrective injected
message, then halt the run with a typed error
(`RunRequest.Guardrails`).
8. Repeat from step 2 until no tool calls remain.
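
The guardrail step can be sketched as a small state machine that escalates from a corrective message to a halt. The type names and the thresholds (warn at 3, halt at 5) are hypothetical; the document does not state the values `RunRequest.Guardrails` actually uses, and the all-error-round check is omitted.

```go
package main

import "fmt"

// Verdict is what the guard asks the loop to do after a tool call.
type Verdict int

const (
	OK      Verdict = iota // keep going
	Correct                // inject a corrective message
	Halt                   // stop the run with a typed error
)

// RepeatGuard counts consecutive identical tool calls.
type RepeatGuard struct {
	WarnAt int // repeats that trigger a corrective message (assumed value)
	HaltAt int // repeats that halt the run (assumed value)

	last  string
	count int
}

// Observe records one tool call and returns the guard's verdict.
func (g *RepeatGuard) Observe(tool, args string) Verdict {
	key := tool + "\x00" + args
	if key == g.last {
		g.count++
	} else {
		g.last, g.count = key, 1
	}
	switch {
	case g.count >= g.HaltAt:
		return Halt
	case g.count >= g.WarnAt:
		return Correct
	default:
		return OK
	}
}

func main() {
	g := &RepeatGuard{WarnAt: 3, HaltAt: 5}
	for i := 1; i <= 5; i++ {
		v := g.Observe("read_file", `{"path":"main.go"}`)
		fmt.Printf("call %d: verdict %d\n", i, v)
	}
}
```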

The concrete P0 entry point is `loop.Run(ctx, loop.RunRequest)`. It returns a
`loop.RunResult` containing both the full transcript and the messages produced by
21 changes: 13 additions & 8 deletions docs/project-plan.md
@@ -165,14 +165,19 @@ milestone).
agent ships as its own product face with a homepage
(<https://glue-coding-agent-site.vercel.app>, repo
[glue-coding-agent-site](https://github.com/erain/glue-coding-agent-site)).
Next: **harness quality** — a source-verified analysis of pi, Cline,
Codex CLI, and Gemini CLI distilled into
[`coding-harness-roadmap.md`](coding-harness-roadmap.md) (edit-repair
ladder, structured truncation, history hardening, retry/overflow
recovery, compaction upgrades, Gemini loop polish), prioritized for
Gemini 3.x first and open-weight OpenRouter/NVIDIA models second.
Still planned beyond that: daemon goal endpoints, a sandboxed
`Executor` backend (container/VM), and TUI-on-`glue connect`.
The **harness-quality phase shipped as `v1.13.0`**: a source-verified
analysis of pi, Cline, Codex CLI, and Gemini CLI
([`coding-harness-roadmap.md`](coding-harness-roadmap.md)) landed as
eight PRs — edit-repair ladder, structured truncation, history
hardening, retry/overflow recovery
([ADR-0017](adr/0017-loop-retry-overflow-recovery.md)), compaction
upgrades, next-speaker stall recovery, loop guardrails, and the
per-model capability registry with tool-owned prompt assembly —
prioritized for Gemini 3.x first and open-weight OpenRouter/NVIDIA
models second. Still planned: daemon goal endpoints, a sandboxed
`Executor` backend (container/VM), TUI-on-`glue connect`, and the
roadmap's deferred P3 notes (XML tool-calling fallback, parallel-tool
read/write locking, goal-loop budget wind-down).
- **Track B — Peggy.** Peggy v0.1–v0.5 plus dogfood hardening (M1–M6)
shipped: single-prompt CLI, Telegram channel, durable sqlite+FTS5
memory with curated recall, opt-in coding tools, MCP servers, the
30 changes: 30 additions & 0 deletions docs/provider-guide.md
@@ -175,6 +175,36 @@ for the `glue.Provider` implementation. Reference upstream
open-source CLIs as the protocol spec rather than copying code, and
quarantine all vendor-specific headers and base URLs in the package.

## Registering with the driver registry

Shipped providers register themselves in `init()` so callers can
construct them by name through `providers.New("<name>")` (this is how
the `glue` binary's `--provider` flag works). Registration also
declares **capabilities** — harness-relevant facts the loop and CLIs
query through `providers.CapabilitiesFor(name)` instead of switching
on provider names:

```go
func init() {
	providers.Register("acme", providers.Factory{
		New:          func() loop.Provider { return New(Options{}) },
		DefaultModel: DefaultModel,
		EnvKey:       "ACME_API_KEY",
		Capabilities: providers.Capabilities{
			ContextWindow: 131_072, // default model's window; 0 = unknown
			ParallelTools: false,   // safe to run tool calls concurrently?
			PromptVariant: "",      // "" = explicit (open-weight), "terse" = frontier
			AutoContinue:  false,   // prone to the narrate-then-stop stall?
		},
	})
}
```

Declare conservatively: the zero value means "assume nothing", and
consumers treat unknown capabilities as the safe default. Out-of-tree
providers do not have to register at all — construct them directly and
pass them to `glue.NewAgent`.
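
On the consumer side, "unknown means the safe default" might look like the sketch below. It uses a toy in-file registry rather than the real `providers` package, whose `CapabilitiesFor` signature is not shown here; `compactionThreshold`, the 80% ratio, and the fallback value are hypothetical.

```go
package main

import "fmt"

// Capabilities mirrors the fields shown above, redeclared so the sketch
// compiles on its own.
type Capabilities struct {
	ContextWindow int
	ParallelTools bool
	PromptVariant string
	AutoContinue  bool
}

// registry is a toy stand-in for the driver registry.
var registry = map[string]Capabilities{
	"acme": {ContextWindow: 131_072},
}

// capabilitiesFor returns the zero value for unregistered names, which
// callers read as "assume nothing".
func capabilitiesFor(name string) Capabilities {
	return registry[name]
}

// compactionThreshold picks a token threshold from the declared window and
// falls back to a conservative value when the window is unknown.
func compactionThreshold(c Capabilities, fallback int) int {
	if c.ContextWindow == 0 {
		return fallback
	}
	return c.ContextWindow * 8 / 10 // 80% is an illustrative ratio only
}

func main() {
	for _, name := range []string{"acme", "out-of-tree"} {
		c := capabilitiesFor(name)
		fmt.Printf("%s: threshold=%d parallel=%v\n",
			name, compactionThreshold(c, 32_000), c.ParallelTools)
	}
}
```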

## Common mistakes

- **Aliasing the same `Message` across `Start` and `Done`.** The loop