fix(cli): recover from context-limit errors by compacting + retrying#86

Merged
juacker merged 1 commit into main from fix/cli-context-limit-recovery
Jun 30, 2026

@juacker juacker commented Jun 30, 2026

Owner

The bug

A Codex CLI run could fail outright with:

provider error: provider request failed: Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.

The conversation then surfaced the error to the user. Re-sending the message sometimes appeared to recover, but only incidentally: a later run might cross the proactive-compaction threshold, or the failed run's empty placeholder got discarded. There was no deliberate recovery.

Root cause

Context-window recovery existed only on the HTTP provider path (engine.rs:318): on a context-limit provider error, it compacts with CompactionTrigger::ErrorRecovery and retries once. The CLI path (local_agent.rs, shared by Codex / Claude Code / OpenCode) had no equivalent branch. Its only retry was for lost CLI sessions (no rollout found, thread/resume failed), and that check does not match a context-window error.

Claude Code rarely surfaces this because it auto-compacts internally before CLAI sees an error; Codex returns the failure to us instead. But the gap was provider-agnostic: any CLI provider returning a context-limit error would have failed the same way.

The fix (provider-agnostic, local_agent.rs only)

Inside the existing bounded CLI retry loop, add a second recovery branch: on a LocalAgentRunError::Failed whose message matches compaction::is_context_limit_error, do exactly what the HTTP path does:

  1. compact local history with CompactionTrigger::ErrorRecovery (force),
  2. reset the CLI session for rotation (so Codex/OpenCode start a fresh thread and Claude mints a new UUID, with the compaction summary baked into the new prompt),
  3. emit the existing SessionCompacted event,
  4. retry once, reusing the same assistant message slot (no duplicate/blank bubble).

Bounded: like the session-lost retry, the context-recovery retry fires at most once, guarded by its own boolean flag, for a maximum of 2 attempts. Usage/rate-limit errors and ordinary CLI failures are unaffected and still surface.
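
For readers without the diff open, the bounded recovery described above can be sketched roughly as follows. This is a minimal illustration, not the code in local_agent.rs: the `RecoveryHooks` trait, its method names, `looks_like_context_limit`, and `run_with_context_recovery` are all hypothetical stand-ins, the error enum is reduced to the one variant named in this PR, and the session-lost branch is omitted for brevity.

```rust
/// Hypothetical stand-in for the error type named in the PR; only the
/// `Failed` variant matters for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum LocalAgentRunError {
    Failed(String),
}

/// Hypothetical hooks standing in for the real CLI runner, compactor,
/// session store, and event emitter.
pub trait RecoveryHooks {
    /// Runs the CLI provider once and returns its output or failure.
    fn run_cli(&mut self) -> Result<String, LocalAgentRunError>;
    /// Step 1: force-compact local history (ErrorRecovery trigger).
    fn compact_for_error_recovery(&mut self);
    /// Step 2: reset the CLI session so the next run starts fresh.
    fn reset_cli_session(&mut self);
    /// Step 3: emit the existing SessionCompacted event.
    fn emit_session_compacted(&mut self);
}

/// Simplified stand-in for `compaction::is_context_limit_error`.
fn looks_like_context_limit(message: &str) -> bool {
    message.to_lowercase().contains("context window")
}

/// Runs the CLI and recovers from a context-limit failure at most once.
pub fn run_with_context_recovery<H: RecoveryHooks>(
    hooks: &mut H,
) -> Result<String, LocalAgentRunError> {
    // The flag guarantees the recovery branch fires at most once,
    // so the loop makes at most two attempts.
    let mut context_retry_used = false;
    loop {
        match hooks.run_cli() {
            Ok(output) => return Ok(output),
            Err(LocalAgentRunError::Failed(message)) => {
                if !context_retry_used && looks_like_context_limit(&message) {
                    context_retry_used = true;
                    hooks.compact_for_error_recovery();
                    hooks.reset_cli_session();
                    hooks.emit_session_compacted();
                    // Step 4: retry once with the compacted history.
                    continue;
                }
                // Any other failure, or a second context-limit failure,
                // surfaces to the caller unchanged.
                return Err(LocalAgentRunError::Failed(message));
            }
        }
    }
}
```

In this sketch a second context-limit failure is returned as-is rather than triggering another compaction, which is what keeps the loop bounded.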

Tests

Classifier unit tests cover all three CLI providers (Codex / Claude Code / OpenCode), with positive cases (context-limit phrasings recover) and negative cases (usage-limit, session-lost, and generic exit-code errors do not trigger context recovery).
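
As a rough illustration of the shape of such a classifier and its cases, the sketch below uses a substring match over lowercase "needles". Everything here is an assumption made for illustration: the `ProviderRuntime` enum, the needle list, the function signature, and every sample message except the Codex error quoted at the top of this PR. The real compaction::is_context_limit_error may match differently.

```rust
/// Hypothetical enum naming the three CLI providers from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderRuntime {
    Codex,
    ClaudeCode,
    OpenCode,
}

/// Illustrative needles only; the real list is not shown in this PR.
const CONTEXT_LIMIT_NEEDLES: &[&str] = &["context window", "context length exceeded"];

/// Sketch of a provider-agnostic classifier. The provider argument is
/// unused, mirroring the "dead param kept for symmetry" note in the review.
pub fn is_context_limit_error(_provider_runtime: ProviderRuntime, message: &str) -> bool {
    let lower = message.to_lowercase();
    CONTEXT_LIMIT_NEEDLES.iter().any(|needle| lower.contains(needle))
}

/// Table of (provider, message, expected) cases. Only the first message is
/// taken from the PR; the rest are made-up examples of each category.
pub const CASES: &[(ProviderRuntime, &str, bool)] = &[
    (
        ProviderRuntime::Codex,
        "Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.",
        true,
    ),
    (ProviderRuntime::ClaudeCode, "prompt too long: context length exceeded", true),
    (ProviderRuntime::OpenCode, "request exceeds the model's context window", true),
    (ProviderRuntime::Codex, "usage limit reached, try again later", false),
    (ProviderRuntime::Codex, "no rollout found", false),
    (ProviderRuntime::OpenCode, "thread/resume failed", false),
    (ProviderRuntime::ClaudeCode, "process exited with exit code 1", false),
];
```

A table like `CASES` keeps positive and negative phrasings side by side, so adding a per-provider needle later means adding a row rather than writing a new test.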

Verification

  • cargo fmt --check, cargo clippy -- -D warnings clean.
  • Full assistant suite: 516 tests pass (incl. the new classifier tests).

Independent review

Static review verdict: production_quality, with no blocker or major findings. Non-blocking minors noted: the dead _provider_runtime param is kept for symmetry with is_session_lost_error and for future per-provider needles; the recovery pattern is now mirrored in the HTTP and CLI paths and is flagged for a possible future shared helper.

Note

Needs a Codex run that actually overflows the context window to confirm end-to-end; the classifier + retry wiring is unit-tested and mirrors the proven HTTP path.

@juacker juacker marked this pull request as ready for review June 30, 2026 16:34
@juacker juacker merged commit dde6a22 into main Jun 30, 2026
2 checks passed