feat: persist conversations, isolate sessions, and fix request-pipeline reliability#109
Open
RvVeen wants to merge 8 commits into
Accounts with a profileArn (Kiro Pro / Q Developer Pro) use the runtime.kiro.dev endpoint, which serves additional models (glm-5, minimax, etc.) not available on q.amazonaws.com. Free Builder ID accounts (no profileArn) keep using q.amazonaws.com, since runtime.kiro.dev returns 400 without a profileArn. Adds resolveKiroEndpoint() and tears down stale cached SDK clients on token rotation or endpoint change. (cherry picked from commit 0c0320c)
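The routing rule above can be sketched as follows. This is a minimal illustration, not the PR's code: the `KiroAccount` and `CachedClient` shapes and the `getClient` helper are hypothetical, the `https://` scheme is assumed, and the hostnames are copied as written in the commit message (the real endpoints may be regional).

```typescript
// Hypothetical account shape; only the presence of profileArn matters here.
interface KiroAccount {
  profileArn?: string;
}

// Hostnames as given in the commit message; the https scheme is an assumption.
const RUNTIME_ENDPOINT = "https://runtime.kiro.dev";
const BUILDER_ID_ENDPOINT = "https://q.amazonaws.com";

function resolveKiroEndpoint(account: KiroAccount): string {
  // Pro accounts carry a profileArn; everything else stays on the Builder ID endpoint.
  return account.profileArn ? RUNTIME_ENDPOINT : BUILDER_ID_ENDPOINT;
}

// Hypothetical stand-in for a cached SDK client.
interface CachedClient {
  endpoint: string;
  token: string;
}

const clients = new Map<string, CachedClient>();

function getClient(accountId: string, account: KiroAccount, token: string): CachedClient {
  const endpoint = resolveKiroEndpoint(account);
  const cached = clients.get(accountId);
  // Reuse the client only while both the token and the endpoint are unchanged.
  if (cached && cached.endpoint === endpoint && cached.token === token) {
    return cached;
  }
  // Token rotation or endpoint change: replace the stale client.
  const fresh: CachedClient = { endpoint, token };
  clients.set(accountId, fresh);
  return fresh;
}
```

A real implementation would also dispose the stale client's resources before replacing it; the sketch only shows the cache-invalidation condition.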
Handle inline content_block_start tool calls and bracket-format tool calls without double-emitting, track per-tool stop state, and map shortened Kiro tool names back to their original names on the way out. Shorten long tool names when building history to stay within Kiro's limits, and thread thinkingRequested through the response handler so thinking output is surfaced when requested. (cherry picked from commit da71f35)
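One way to shorten tool names and map them back on the way out is sketched below. The 64-character limit, the `ToolNameMap` class, and the hash-suffix scheme are assumptions made for illustration; the commit message does not state Kiro's actual limit or how the PR derives the short names.

```typescript
// Assumed limit for illustration; the PR does not state Kiro's actual maximum.
const MAX_TOOL_NAME_LENGTH = 64;

// Small non-cryptographic hash (FNV-1a style, 32-bit) rendered as 8 hex characters.
function hashName(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

class ToolNameMap {
  private shortToOriginal = new Map<string, string>();

  // Returns a name within the limit and remembers how to undo the change.
  shorten(original: string): string {
    if (original.length <= MAX_TOOL_NAME_LENGTH) return original;
    // Keep a readable prefix and append a hash so distinct long names stay distinct.
    const suffix = "_" + hashName(original);
    const shortened = original.slice(0, MAX_TOOL_NAME_LENGTH - suffix.length) + suffix;
    this.shortToOriginal.set(shortened, original);
    return shortened;
  }

  // Maps a name seen in the response back to what the caller originally sent.
  restore(name: string): string {
    return this.shortToOriginal.get(name) ?? name;
  }
}
```

Names that already fit pass through unchanged in both directions, so the map only grows for tools that actually exceed the limit.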
Track slept time (excludedMs) so long backoff waits don't count against the overall retry deadline, and thread it through the error handler. Harden locked-operations against contention and tidy account selection. (cherry picked from commit 9b5e23b)
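The deadline accounting described above might look roughly like this. Only the `excludedMs` name comes from the commit message; the `RetryBudget` class, the `retry` helper, and its parameters are hypothetical, and the clock and sleep functions are injected so the sketch has no runtime dependencies.

```typescript
// Tracks time spent sleeping so it can be excluded from the retry deadline.
class RetryBudget {
  private readonly startMs: number;
  private readonly deadlineMs: number;
  private excludedMs = 0;

  constructor(startMs: number, deadlineMs: number) {
    this.startMs = startMs;
    this.deadlineMs = deadlineMs;
  }

  recordSleep(ms: number): void {
    this.excludedMs += ms;
  }

  // Elapsed time that counts against the deadline: wall clock minus sleeps.
  activeMs(nowMs: number): number {
    return nowMs - this.startMs - this.excludedMs;
  }

  expired(nowMs: number): boolean {
    return this.activeMs(nowMs) >= this.deadlineMs;
  }
}

async function retry<T>(
  op: () => Promise<T>,
  deadlineMs: number,
  backoffMs: (attempt: number) => number,
  sleep: (ms: number) => Promise<void>,
  now: () => number,
): Promise<T> {
  const budget = new RetryBudget(now(), deadlineMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up only when active (non-sleeping) time has used the budget.
      if (budget.expired(now())) throw err;
      const before = now();
      await sleep(backoffMs(attempt));
      budget.recordSleep(now() - before);
    }
  }
}
```

With a 1-second budget, a 5-second backoff wait followed by 500ms of work leaves 500ms of budget, whereas charging the sleep would have expired the deadline during the first wait.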
OpenCode strips image parts from conversation state on later turns. Cache converted Kiro images per (workspace, fingerprint) and re-attach them to currentMessage when a turn arrives without fresh images, so the model keeps seeing them. Never restores onto tool-result turns (Kiro 400s on images+toolResults). Gated by the image_carry_forward config flag (default on); Kiro bills per request, not per token, so re-sending has no billing impact. (cherry picked from commit 0416dcc)
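A minimal sketch of the carry-forward rule, under assumed types: `KiroImage`, `KiroMessage`, and the `ImageCache.apply` method are hypothetical shapes, and the `enabled` parameter stands in for the `image_carry_forward` flag.

```typescript
// Hypothetical shapes; the real Kiro message types are not shown in the PR text.
interface KiroImage {
  format: string;
  bytes: string;
}

interface KiroMessage {
  content: string;
  images?: KiroImage[];
  toolResults?: unknown[];
}

class ImageCache {
  private cache = new Map<string, KiroImage[]>();

  private key(workspace: string, fingerprint: string): string {
    return JSON.stringify([workspace, fingerprint]);
  }

  // Caches fresh images, or re-attaches cached ones when the turn has none.
  apply(workspace: string, fingerprint: string, msg: KiroMessage, enabled = true): KiroMessage {
    const k = this.key(workspace, fingerprint);
    if (msg.images && msg.images.length > 0) {
      this.cache.set(k, msg.images);
      return msg;
    }
    if (!enabled) return msg;
    // Never restore onto tool-result turns: Kiro rejects images plus toolResults.
    if (msg.toolResults && msg.toolResults.length > 0) return msg;
    const cached = this.cache.get(k);
    return cached ? { ...msg, images: cached } : msg;
  }
}
```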
Add THINKING_BUDGETS mapping (low/medium/high/default) capped at Kiro's 200k max_thinking_length, used to translate OpenCode's reasoningEffort selection into an explicit thinking budget. (cherry picked from commit aad98f5)
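For illustration, the mapping could take the following shape. The budget values are placeholders rather than the PR's numbers, the cap is assumed to mean 200,000, and `thinkingBudget` is a hypothetical helper name.

```typescript
// Kiro's max_thinking_length per the commit message, read here as 200,000.
const MAX_THINKING_LENGTH = 200000;

type Effort = "low" | "medium" | "high" | "default";

// Placeholder values chosen for illustration; the PR's actual budgets are not shown.
const THINKING_BUDGETS: Record<Effort, number> = {
  low: 4000,
  medium: 16000,
  high: 64000,
  default: 16000,
};

function isEffort(value: string): value is Effort {
  return Object.prototype.hasOwnProperty.call(THINKING_BUDGETS, value);
}

// Translates a reasoningEffort selection into an explicit budget, never above the cap.
function thinkingBudget(reasoningEffort?: string): number {
  const key: Effort =
    reasoningEffort !== undefined && isEffort(reasoningEffort) ? reasoningEffort : "default";
  return Math.min(THINKING_BUDGETS[key], MAX_THINKING_LENGTH);
}
```

Unknown or missing effort values fall back to the `default` entry, and the `Math.min` keeps any future table edit from exceeding the cap.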
Mint a stable conversationId + agentContinuationId per (workspace, fingerprint) and persist them in SQLite so agentic turns continue the same Kiro conversation instead of starting fresh each request. Isolate parallel sessions in one workspace via the x-session-id header, and reset a stale conversationId on 400 ValidationException. Add a cross-process reauth lock (pid + TTL) so concurrent instances don't re-authenticate simultaneously, with a bounded oauth timeout. Dispose SDK clients, image cache, and DB on plugin shutdown. (cherry picked from commit b25be3a)
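The ID lifecycle described above can be sketched with an in-memory map standing in for the SQLite table. The `ConversationStore` class and its method names are hypothetical, and the ID generator is injected (in practice something like `crypto.randomUUID()`) so the example stays dependency-free.

```typescript
interface ConversationIds {
  conversationId: string;
  agentContinuationId: string;
}

// In-memory stand-in for the SQLite-backed store; names are illustrative only.
class ConversationStore {
  private rows = new Map<string, ConversationIds>();
  private mintId: () => string;

  constructor(mintId: () => string) {
    this.mintId = mintId;
  }

  // The optional session id (from x-session-id) keeps parallel sessions apart.
  private key(workspace: string, fingerprint: string, sessionId?: string): string {
    return JSON.stringify([workspace, fingerprint, sessionId ?? ""]);
  }

  // Returns the stable ids for this conversation, minting them on first use.
  getOrCreate(workspace: string, fingerprint: string, sessionId?: string): ConversationIds {
    const k = this.key(workspace, fingerprint, sessionId);
    const existing = this.rows.get(k);
    if (existing) return existing;
    const fresh: ConversationIds = {
      conversationId: this.mintId(),
      agentContinuationId: this.mintId(),
    };
    this.rows.set(k, fresh);
    return fresh;
  }

  // Called after a 400 ValidationException so the next turn mints new ids.
  reset(workspace: string, fingerprint: string, sessionId?: string): void {
    this.rows.delete(this.key(workspace, fingerprint, sessionId));
  }
}
```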
Add unit tests for the SDK client endpoint routing, stream/tool-call transformer, response handler, error/retry handling, event-stream parser, image cache, and the conversation/reauth-lock parts of the SQLite store.
We trimmed conversation history at 600KB, over 6x too aggressive, which discarded older turns prematurely and caused context loss in longer agentic sessions. Kiro rejects oversized payloads with CONTENT_LENGTH_EXCEEDS_THRESHOLD; the hard limit is structure-dependent (verified against the live API): a single message survives up to ~7.6MB, but many-entry histories are rejected as low as ~5.9MB. Raise the trim threshold to a configurable max_payload_bytes (default 4MB), which stays safely below the lowest observed failure regardless of history shape while allowing far more context. Add [PAYLOAD] and [TRIM] debug logging so payload growth and trim events are visible. (cherry picked from commit ef53269)
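A rough sketch of threshold-based trimming follows. It assumes the payload size is the UTF-8 length of the JSON-serialized history, that 4MB means 4 × 1024 × 1024 bytes, and that the oldest entries are dropped first; `HistoryEntry`, `payloadBytes`, and `trimHistory` are hypothetical names, and the PR's actual trimming strategy may differ (for example, by keeping user/assistant pairs together).

```typescript
// Default from the commit message; the MiB interpretation is an assumption.
const DEFAULT_MAX_PAYLOAD_BYTES = 4 * 1024 * 1024;

interface HistoryEntry {
  role: "user" | "assistant";
  content: string;
}

// UTF-8 byte length of a string, computed without runtime-specific APIs.
function utf8Length(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function payloadBytes(history: HistoryEntry[]): number {
  return utf8Length(JSON.stringify(history));
}

// Drops the oldest entries until the serialized history fits under maxBytes.
// The most recent entry is always kept, even if it alone exceeds the limit.
function trimHistory(
  history: HistoryEntry[],
  maxBytes: number = DEFAULT_MAX_PAYLOAD_BYTES,
): HistoryEntry[] {
  const trimmed = [...history];
  while (trimmed.length > 1 && payloadBytes(trimmed) > maxBytes) {
    trimmed.shift();
  }
  return trimmed;
}
```

Re-serializing on every iteration is quadratic in the worst case; a production version would more likely track per-entry sizes, but the loop keeps the trimming rule easy to see.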
Summary
This PR bundles the request/response pipeline changes, all of which touch the same core files (request.ts, request-handler.ts, sdk-client.ts). It persists Kiro conversations across turns, isolates parallel sessions, routes Pro accounts to the runtime endpoint, and fixes several reliability issues around streaming, retries, and history trimming.
Problem
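- Each request minted a new conversationId via crypto.randomUUID(), so each agentic turn started a brand-new Kiro session and the model lost prior context.
- Conversation history was trimmed at 600KB, over 6x below Kiro's real limit, discarding older turns prematurely.
- Parallel sessions in one workspace shared conversation state.
- Additional models (glm-5, minimax, etc.) are only served by runtime.kiro.dev, but all requests went to q.amazonaws.com.
- Tool calls could be emitted twice, long tool names exceeded Kiro's limit, and thinking output was not surfaced.
- Backoff waits counted against the overall retry deadline, causing premature failures on rate-limited accounts.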
Solution
- Mint a stable conversationId + agentContinuationId per (workspace, fingerprint) and persist them in SQLite so agentic turns continue the same Kiro conversation. Reset a stale conversationId on a 400 ValidationException.
- Isolate sessions via the x-session-id header so parallel sessions in one workspace stay distinct.
- Raise the trim threshold to a configurable max_payload_bytes (default 4MB). Kiro's hard limit is structure-dependent (verified against the live API): a single message survives up to ~7.6MB, but many-entry histories are rejected as low as ~5.9MB, so the default stays safely below the lowest observed failure. Adds [PAYLOAD]/[TRIM] debug logging.
- Accounts with a profileArn use runtime.kiro.dev (which serves the additional models); free Builder ID accounts keep using q.amazonaws.com, since the runtime endpoint 400s without a profileArn.
- Streaming: handle inline and bracket-format tool calls without double-emission, plus per-tool stop tracking, tool-name shortening within Kiro's limit, and thinking passthrough.
- Add a cross-process reauth lock so concurrent instances don't re-authenticate simultaneously, with a bounded OAuth timeout.
Configuration
Testing
- bun test: all tests pass (adds coverage for the request pipeline, streaming/tool-call parsing, image cache, and the conversation/reauth-lock parts of the SQLite store)
- bun run typecheck: clean
- bun run build: clean
- Verified against the live API at runtime.eu-central-1.kiro.dev, including multi-turn context retention and payload-trim behavior on oversized histories.