feat: persist conversations, isolate sessions, and fix request-pipeline reliability by RvVeen · Pull Request #109 · tickernelz/opencode-kiro-auth

RvVeen · 2026-07-01T18:13:13Z

Summary

This PR bundles request/response pipeline changes that all touch the same core
files (request.ts, request-handler.ts, sdk-client.ts). It persists Kiro
conversations across turns, isolates parallel sessions, routes Pro accounts to
the runtime endpoint, and fixes several reliability issues around streaming,
retries, and history trimming.

Problem

- Context loss: every request minted a fresh conversationId via
  crypto.randomUUID(), so each agentic turn started a brand-new Kiro session
  and the model lost prior context.
- History trimmed far too aggressively: the payload was trimmed at 600KB,
  less than a sixth of what Kiro accepts, discarding older turns prematurely.
- Parallel sessions collided: multiple OpenCode sessions in one workspace
  shared conversation state.
- Pro-only models unavailable: glm-5, minimax, etc. are only served by
  runtime.kiro.dev, but all requests went to q.amazonaws.com.
- Streaming/tool-call edge cases: tool calls could be double-emitted, long
  tool names exceeded Kiro's limit, and thinking output was not surfaced.
- Retry backoff counted against the request timeout budget, causing
  premature failures on rate-limited accounts.

Solution

- Conversation persistence: mint a stable conversationId +
  agentContinuationId per (workspace, fingerprint) and persist them in
  SQLite so agentic turns continue the same Kiro conversation. Reset a stale
  conversationId on a 400 ValidationException.
- Session isolation: key conversation state on the OpenCode x-session-id
  header so parallel sessions in one workspace stay distinct.
- Configurable history trim: raise the trim threshold to a configurable
  max_payload_bytes (default 4MB). Kiro's hard limit is structure-dependent
  (verified against the live API): a single message survives up to ~7.6MB, but
  many-entry histories are rejected as low as ~5.9MB, so the default stays
  safely below the lowest observed failure. Adds [PAYLOAD]/[TRIM] debug
  logging.
- Pro endpoint routing: accounts with a profileArn use runtime.kiro.dev
  (which serves the additional models); free Builder ID accounts keep using
  q.amazonaws.com, since the runtime endpoint 400s without a profileArn.
- Streaming/tool-call hardening: robust inline + bracket tool-call parsing
  without double-emission, per-tool stop tracking, tool-name shortening within
  Kiro's limit, and thinking passthrough.
- Retry budget fix: exclude backoff sleep from the request timeout budget.
- Add a cross-process reauth lock (pid + TTL) so concurrent instances don't
  re-authenticate simultaneously, with a bounded OAuth timeout.
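
The pid + TTL reauth lock can be sketched as a pure acquire check. This is a minimal illustration, not the code in this PR: the ReauthLock shape, the tryAcquireReauthLock name, and the 30-second TTL are assumptions, and a real cross-process version would have to run the check-and-set atomically inside the SQLite store.

```typescript
// Hypothetical lock record; the PR only states that the lock carries a pid and a TTL.
interface ReauthLock {
  pid: number;
  expiresAt: number; // epoch milliseconds
}

// Returns the new lock when acquisition succeeds, or undefined when another
// process still holds an unexpired lock.
function tryAcquireReauthLock(
  current: ReauthLock | undefined,
  pid: number,
  now: number,
  ttlMs: number,
): ReauthLock | undefined {
  const heldByOther =
    current !== undefined && current.pid !== pid && current.expiresAt > now;
  return heldByOther ? undefined : { pid, expiresAt: now + ttlMs };
}

// Process 100 takes the lock at t=0 with an assumed 30s TTL.
const firstLock = tryAcquireReauthLock(undefined, 100, 0, 30000);
// Process 200 is refused while the lock is live, then succeeds once it expires.
const refusedLock = tryAcquireReauthLock(firstLock, 200, 10000, 30000);
const takenOverLock = tryAcquireReauthLock(firstLock, 200, 30000, 30000);
```

The TTL is what keeps a crashed process from blocking reauthentication forever: once expiresAt has passed, any other pid may take the lock over.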

Configuration

// kiro.json
{
  "max_payload_bytes": 4000000
}

Testing

- bun test: all tests pass (adds coverage for the request pipeline,
  streaming/tool-call parsing, image cache, and the conversation/reauth-lock
  parts of the SQLite store)
- bun run typecheck: clean
- bun run build: clean
- Live end-to-end verification against runtime.eu-central-1.kiro.dev,
  including multi-turn context retention and payload-trim behavior on oversized
  histories.

Accounts with a profileArn (Kiro Pro / Q Developer Pro) use the runtime.kiro.dev endpoint, which serves additional models (glm-5, minimax, etc.) not available on q.amazonaws.com. Free Builder ID accounts (no profileArn) keep using q.amazonaws.com, since runtime.kiro.dev returns 400 without a profileArn. Adds resolveKiroEndpoint() and tears down stale cached SDK clients on token rotation or endpoint change. (cherry picked from commit 0c0320c)
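
As a rough sketch of the routing rule described above: the commit names resolveKiroEndpoint(), but its signature is not shown here, so the parameter shape, the CachedClient type, and isStaleClient are assumptions made for illustration. The host strings are the shorthand used in this description, not full regional URLs.

```typescript
// Hypothetical account shape; only profileArn comes from the PR description.
interface KiroAccount {
  profileArn?: string;
}

// Host names as written in the PR description; real regional URLs are not shown there.
const RUNTIME_HOST = "runtime.kiro.dev";
const LEGACY_HOST = "q.amazonaws.com";

// Assumed signature: the PR names resolveKiroEndpoint() but does not show it.
function resolveKiroEndpoint(account: KiroAccount): string {
  // The runtime endpoint returns 400 without a profileArn, so only Pro accounts use it.
  return account.profileArn ? RUNTIME_HOST : LEGACY_HOST;
}

// Hypothetical record of what a cached SDK client was built with.
interface CachedClient {
  endpoint: string;
  accessToken: string;
}

// A cached client is stale when the token rotated or the endpoint changed.
function isStaleClient(
  cached: CachedClient,
  endpoint: string,
  accessToken: string,
): boolean {
  return cached.endpoint !== endpoint || cached.accessToken !== accessToken;
}
```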

Handle inline content_block_start tool calls and bracket-format tool calls without double-emitting, track per-tool stop state, and map shortened Kiro tool names back to their original names on the way out. Shorten long tool names when building history to stay within Kiro's limits, and thread thinkingRequested through the response handler so thinking output is surfaced when requested. (cherry picked from commit da71f35)
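
One way to shorten tool names reversibly is to truncate and append a short hash, remembering the original for the way out. The sketch below is an assumption-heavy illustration: the 64-character limit, the ToolNameMap class, and the FNV-1a suffix are all hypothetical, since the commit message states neither Kiro's actual limit nor the shortening scheme.

```typescript
// Assumed limit for illustration; Kiro's real tool-name limit is not stated in this PR.
const MAX_TOOL_NAME_LENGTH = 64;

// 32-bit FNV-1a hash rendered as 8 hex characters, used to keep shortened names distinct.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: shortens names on the way in, restores them on the way out.
class ToolNameMap {
  private readonly originals = new Map<string, string>();

  shorten(toolName: string): string {
    if (toolName.length <= MAX_TOOL_NAME_LENGTH) return toolName;
    const suffix = "_" + fnv1a(toolName);
    const shortened =
      toolName.slice(0, MAX_TOOL_NAME_LENGTH - suffix.length) + suffix;
    this.originals.set(shortened, toolName);
    return shortened;
  }

  // Names that were never shortened pass through unchanged.
  restore(toolName: string): string {
    return this.originals.get(toolName) ?? toolName;
  }
}
```

The hash suffix matters because two long names sharing a prefix would otherwise collapse to the same shortened name and could not be mapped back unambiguously.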

Track slept time (excludedMs) so long backoff waits don't count against the overall retry deadline, and thread it through the error handler. Harden locked-operations against contention and tidy account selection. (cherry picked from commit 9b5e23b)
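
The excludedMs idea can be illustrated with a small budget tracker. Only the excludedMs name comes from the commit message; the RetryBudget class and its methods are hypothetical, and the clock is passed in explicitly so the arithmetic is easy to follow.

```typescript
// Hypothetical deadline tracker: time spent sleeping in backoff is recorded
// separately and subtracted before comparing against the timeout.
class RetryBudget {
  private readonly startedAt: number;
  private readonly timeoutMs: number;
  private excludedMs = 0;

  constructor(startedAt: number, timeoutMs: number) {
    this.startedAt = startedAt;
    this.timeoutMs = timeoutMs;
  }

  // Call after each backoff sleep with the time actually slept.
  recordSleep(sleptMs: number): void {
    this.excludedMs += sleptMs;
  }

  // Wall-clock time minus backoff sleeps, i.e. time spent doing real work.
  activeElapsedMs(now: number): number {
    return now - this.startedAt - this.excludedMs;
  }

  isExpired(now: number): boolean {
    return this.activeElapsedMs(now) >= this.timeoutMs;
  }
}

// A request with a 1s budget sleeps 5s in backoff; only 500ms of real work has elapsed.
const retryBudget = new RetryBudget(0, 1000);
retryBudget.recordSleep(5000);
const stillAlive = !retryBudget.isExpired(5500);
```

Without the subtraction, the same request would have been declared expired at 1000ms of wall-clock time, which is the premature failure on rate-limited accounts described above.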

OpenCode strips image parts from conversation state on later turns. Cache converted Kiro images per (workspace, fingerprint) and re-attach them to currentMessage when a turn arrives without fresh images, so the model keeps seeing them. Never restores onto tool-result turns (Kiro 400s on images+toolResults). Gated by the image_carry_forward config flag (default on); Kiro bills per request, not per token, so re-sending has no billing impact. (cherry picked from commit 0416dcc)
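
A minimal sketch of the carry-forward rule, assuming simplified message and image shapes (KiroImage, CurrentMessage, and ImageCarryForward are hypothetical names, and the image_carry_forward flag is not modelled here):

```typescript
// Hypothetical, simplified shapes; the real Kiro payload types are not shown in this PR.
interface KiroImage {
  format: string;
  data: string;
}

interface CurrentMessage {
  content: string;
  images?: KiroImage[];
  toolResults?: unknown[];
}

class ImageCarryForward {
  private readonly cached = new Map<string, KiroImage[]>();

  apply(workspace: string, fingerprint: string, msg: CurrentMessage): CurrentMessage {
    const key = JSON.stringify([workspace, fingerprint]);

    // Fresh images replace whatever was cached for this conversation.
    if (msg.images && msg.images.length > 0) {
      this.cached.set(key, msg.images);
      return msg;
    }

    // Never restore onto tool-result turns: Kiro rejects images + toolResults.
    if (msg.toolResults && msg.toolResults.length > 0) return msg;

    const previous = this.cached.get(key);
    return previous ? { ...msg, images: previous } : msg;
  }
}
```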

Add THINKING_BUDGETS mapping (low/medium/high/default) capped at Kiro's 200k max_thinking_length, used to translate OpenCode's reasoningEffort selection into an explicit thinking budget. (cherry picked from commit aad98f5)
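
The mapping might look roughly like the following. The THINKING_BUDGETS name, the four keys, and the 200k cap come from the commit message; the individual budget values and the resolveThinkingBudget helper are placeholders invented for this sketch, not the values used in the PR.

```typescript
// Cap stated in the commit message (Kiro's max_thinking_length).
const MAX_THINKING_LENGTH = 200000;

type Effort = "low" | "medium" | "high" | "default";

// Placeholder values for illustration only; the PR's actual budgets are not shown here.
const THINKING_BUDGETS: Record<Effort, number> = {
  low: 4000,
  medium: 16000,
  high: 64000,
  default: 16000,
};

function isEffort(value: string): value is Effort {
  return Object.prototype.hasOwnProperty.call(THINKING_BUDGETS, value);
}

// Hypothetical helper: unknown or missing effort falls back to "default",
// and the result never exceeds the cap.
function resolveThinkingBudget(reasoningEffort?: string): number {
  const effort: Effort =
    reasoningEffort !== undefined && isEffort(reasoningEffort)
      ? reasoningEffort
      : "default";
  return Math.min(THINKING_BUDGETS[effort], MAX_THINKING_LENGTH);
}
```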

Mint a stable conversationId + agentContinuationId per (workspace, fingerprint) and persist them in SQLite so agentic turns continue the same Kiro conversation instead of starting fresh each request. Isolate parallel sessions in one workspace via the x-session-id header, and reset a stale conversationId on 400 ValidationException. Add a cross-process reauth lock (pid + TTL) so concurrent instances don't re-authenticate simultaneously, with a bounded OAuth timeout. Dispose SDK clients, image cache, and DB on plugin shutdown. (cherry picked from commit b25be3a)
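
The keying scheme can be sketched with an in-memory Map standing in for the SQLite table. ConversationStore and its method names are hypothetical, and the ID generator is injected here purely so the example is deterministic; the PR itself mentions crypto.randomUUID().

```typescript
interface ConversationIds {
  conversationId: string;
  agentContinuationId: string;
}

// Hypothetical store; a Map stands in for the SQLite persistence described above.
class ConversationStore {
  private readonly rows = new Map<string, ConversationIds>();
  private readonly newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  // sessionId is the value of the x-session-id header, when present.
  private keyFor(workspace: string, fingerprint: string, sessionId?: string): string {
    return JSON.stringify([workspace, fingerprint, sessionId ?? ""]);
  }

  // Returns the existing IDs for this key, minting them only on first use.
  getOrCreate(workspace: string, fingerprint: string, sessionId?: string): ConversationIds {
    const key = this.keyFor(workspace, fingerprint, sessionId);
    let ids = this.rows.get(key);
    if (!ids) {
      ids = { conversationId: this.newId(), agentContinuationId: this.newId() };
      this.rows.set(key, ids);
    }
    return ids;
  }

  // Called when Kiro answers 400 ValidationException for a stale conversationId.
  reset(workspace: string, fingerprint: string, sessionId?: string): void {
    this.rows.delete(this.keyFor(workspace, fingerprint, sessionId));
  }
}

// Deterministic ID source for the example.
let idCounter = 0;
const conversations = new ConversationStore(() => "id-" + ++idCounter);
```

Including the session ID in the key is what keeps two OpenCode sessions in the same workspace from sharing one Kiro conversation.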

Add unit tests for the SDK client endpoint routing, stream/tool-call transformer, response handler, error/retry handling, event-stream parser, image cache, and the conversation/reauth-lock parts of the SQLite store.

Conversation history was trimmed at 600KB, a threshold more than 6x too low, which discarded older turns prematurely and caused context loss in longer agentic sessions. Kiro rejects oversized payloads with CONTENT_LENGTH_EXCEEDS_THRESHOLD; the hard limit is structure-dependent (verified against the live API): a single message survives up to ~7.6MB, but many-entry histories are rejected as low as ~5.9MB. Raise the trim threshold to a configurable max_payload_bytes (default 4MB), which stays safely below the lowest observed failure regardless of history shape while allowing far more context. Add [PAYLOAD] and [TRIM] debug logging so payload growth and trim events are visible. (cherry picked from commit ef53269)
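
A minimal sketch of size-based trimming, assuming the oldest history entries are dropped first until the serialized payload fits. The Payload shape, trimHistory, and utf8ByteLength are hypothetical; the PR's real trimming may respect turn boundaries or other structure that this sketch ignores.

```typescript
// Hypothetical, simplified payload shape.
interface Payload {
  history: unknown[];
  currentMessage: unknown;
}

// UTF-8 size of a string, counted per code point.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function payloadBytes(payload: Payload): number {
  return utf8ByteLength(JSON.stringify(payload));
}

// Drops the oldest entries until the payload fits; the input is not mutated.
// The default mirrors the max_payload_bytes default from the Configuration section.
function trimHistory(
  payload: Payload,
  maxPayloadBytes: number = 4000000,
): { payload: Payload; dropped: number } {
  const kept = payload.history.slice();
  let dropped = 0;
  while (kept.length > 0 && payloadBytes({ ...payload, history: kept }) > maxPayloadBytes) {
    kept.shift();
    dropped++;
  }
  return { payload: { ...payload, history: kept }, dropped };
}
```

Re-serializing on every iteration is quadratic in the worst case; it keeps the sketch short, but a production version would more likely track per-entry sizes.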

RvVeen added 8 commits July 1, 2026 09:48

feat(effort): map reasoning effort to thinking token budgets

79e20ea

test: cover request pipeline, streaming, images, conversation store

f3bf705

RvVeen wants to merge 8 commits into tickernelz:master from Servoy:pr/request-and-conversation
