feat: persist conversations, isolate sessions, and fix request-pipeline reliability #109

Open
RvVeen wants to merge 8 commits into tickernelz:master from Servoy:pr/request-and-conversation

RvVeen commented Jul 1, 2026

Contributor

Summary

This PR bundles the request/response pipeline changes, which all touch the same
core files (request.ts, request-handler.ts, sdk-client.ts). It persists Kiro
conversations across turns, isolates parallel sessions, routes Pro accounts to
the runtime endpoint, and fixes several reliability issues around streaming,
retries, and history trimming.

Problem

  • Context loss: every request minted a fresh conversationId via
    crypto.randomUUID(), so each agentic turn started a brand-new Kiro session
    and the model lost prior context.
  • History trimmed far too aggressively: the payload was trimmed at 600KB,
    less than one sixth of Kiro's real limit, discarding older turns prematurely.
  • Parallel sessions collided: multiple OpenCode sessions in one workspace
    shared conversation state.
  • Pro-only models unavailable: glm-5, minimax, etc. are only served by
    runtime.kiro.dev, but all requests went to q.amazonaws.com.
  • Streaming/tool-call edge cases: tool calls could be double-emitted, long
    tool names exceeded Kiro's limit, and thinking output was not surfaced.
  • Retry backoff counted against the request timeout budget, causing
    premature failures on rate-limited accounts.

Solution

  • Conversation persistence: mint a stable conversationId +
    agentContinuationId per (workspace, fingerprint) and persist them in
    SQLite so agentic turns continue the same Kiro conversation. Reset a stale
    conversationId on a 400 ValidationException.
  • Session isolation: key conversation state on the OpenCode x-session-id
    header so parallel sessions in one workspace stay distinct.
  • Configurable history trim: raise the trim threshold to a configurable
    max_payload_bytes (default 4MB). Kiro's hard limit is structure-dependent
    (verified against the live API): a single message survives up to ~7.6MB, but
    many-entry histories are rejected as low as ~5.9MB, so the default stays
    safely below the lowest observed failure. Adds [PAYLOAD]/[TRIM] debug
    logging.
  • Pro endpoint routing: accounts with a profileArn use runtime.kiro.dev
    (which serves the additional models); free Builder ID accounts keep using
    q.amazonaws.com, since the runtime endpoint 400s without a profileArn.
  • Streaming/tool-call hardening: robust inline + bracket tool-call parsing
    without double-emission, per-tool stop tracking, tool-name shortening within
    Kiro's limit, and thinking passthrough.
  • Retry budget fix: exclude backoff sleep from the request timeout budget.
  • Reauth lock: add a cross-process reauth lock (pid + TTL) so concurrent
    instances don't re-authenticate simultaneously, with a bounded OAuth timeout.
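
As a rough illustration of the persistence and session-isolation bullets above, the sketch below keys conversation IDs on (workspace, fingerprint, session). The class name, field names, and the in-memory Map (standing in for the SQLite table) are assumptions made for this example, not the PR's actual code. It also assumes a reset drops both IDs, which the PR does not specify.

```typescript
// Hypothetical shapes: the PR's real schema and function names are not shown.
interface ConversationIds {
  conversationId: string;
  agentContinuationId: string;
}

interface ConversationKey {
  workspace: string;
  fingerprint: string;
  sessionId?: string; // value of the x-session-id header, when present
}

class ConversationStore {
  // An in-memory Map stands in for the SQLite table described in the PR.
  private readonly rows = new Map<string, ConversationIds>();
  private readonly mintId: () => string;

  constructor(mintId: () => string) {
    this.mintId = mintId;
  }

  private keyOf(k: ConversationKey): string {
    // JSON-encode the tuple so separators inside values cannot collide.
    return JSON.stringify([k.workspace, k.fingerprint, k.sessionId ?? ""]);
  }

  // Return the stored IDs for this key, minting them on first use.
  getOrCreate(k: ConversationKey): ConversationIds {
    const key = this.keyOf(k);
    let ids = this.rows.get(key);
    if (!ids) {
      ids = { conversationId: this.mintId(), agentContinuationId: this.mintId() };
      this.rows.set(key, ids);
    }
    return ids;
  }

  // Assumed reaction to a 400 ValidationException: forget the stale IDs.
  reset(k: ConversationKey): void {
    this.rows.delete(this.keyOf(k));
  }
}
```

In a real plugin the constructor argument would presumably be something like `() => crypto.randomUUID()`; it is injected here only so the behavior is deterministic under test.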

Configuration

// kiro.json
{
  "max_payload_bytes": 4000000
}
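
A minimal sketch of how such a setting might be read defensively. The function name and the exact default constant are assumptions: the PR states a 4MB default but does not show the literal value or the loader.

```typescript
// Assumed to correspond to the "default 4MB" mentioned in the PR.
const DEFAULT_MAX_PAYLOAD_BYTES = 4_000_000;

// Hypothetical helper: accept only a positive integer, else fall back.
function resolveMaxPayloadBytes(rawConfig: unknown): number {
  if (typeof rawConfig !== "object" || rawConfig === null) {
    return DEFAULT_MAX_PAYLOAD_BYTES;
  }
  const value = (rawConfig as Record<string, unknown>)["max_payload_bytes"];
  return typeof value === "number" && Number.isInteger(value) && value > 0
    ? value
    : DEFAULT_MAX_PAYLOAD_BYTES;
}
```

Falling back instead of throwing keeps a malformed kiro.json from breaking every request, at the cost of silently ignoring the bad value.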

Testing

  • bun test — all tests pass (adds coverage for the request pipeline,
    streaming/tool-call parsing, image cache, and the conversation/reauth-lock
    parts of the SQLite store)
  • bun run typecheck — clean
  • bun run build — clean
  • Live end-to-end verification against runtime.eu-central-1.kiro.dev,
    including multi-turn context retention and payload-trim behavior on oversized
    histories.

RvVeen added 8 commits July 1, 2026 09:48

Accounts with a profileArn (Kiro Pro / Q Developer Pro) use the
runtime.kiro.dev endpoint, which serves additional models (glm-5,
minimax, etc.) not available on q.amazonaws.com. Free Builder ID
accounts (no profileArn) keep using q.amazonaws.com, since
runtime.kiro.dev returns 400 without a profileArn.

Adds resolveKiroEndpoint() and tears down stale cached SDK clients on
token rotation or endpoint change.
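
The routing rule and the cache teardown described here could look roughly like the sketch below. Only the name resolveKiroEndpoint comes from this commit. Its signature, the KiroEndpoints shape, and the SdkClientCache class are assumptions for illustration, and the endpoint hosts are passed in by the caller so that no URL format is guessed.

```typescript
// Hypothetical shapes: only the name resolveKiroEndpoint appears in the PR.
interface KiroEndpoints {
  runtime: string; // host on runtime.kiro.dev for the account's region
  q: string; // host on q.amazonaws.com for the account's region
}

interface AccountLike {
  profileArn?: string;
}

function resolveKiroEndpoint(account: AccountLike, endpoints: KiroEndpoints): string {
  // Pro accounts carry a profileArn; the runtime endpoint 400s without one.
  return account.profileArn ? endpoints.runtime : endpoints.q;
}

interface ClosableClient {
  destroy(): void;
}

class SdkClientCache<C extends ClosableClient> {
  private entry?: { endpoint: string; token: string; client: C };

  get(endpoint: string, token: string, create: () => C): C {
    const cached = this.entry;
    if (cached && cached.endpoint === endpoint && cached.token === token) {
      return cached.client;
    }
    // Token rotated or endpoint changed: tear down the stale client first.
    cached?.client.destroy();
    const client = create();
    this.entry = { endpoint, token, client };
    return client;
  }
}
```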

(cherry picked from commit 0c0320c)

Handle inline content_block_start tool calls and bracket-format tool
calls without double-emitting, track per-tool stop state, and map
shortened Kiro tool names back to their original names on the way out.
Shorten long tool names when building history to stay within Kiro's
limits, and thread thinkingRequested through the response handler so
thinking output is surfaced when requested.
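
One way to shorten tool names reversibly is sketched below. The commit does not say how names are shortened or what Kiro's limit is, so the truncate-plus-hash scheme, the ToolNameMapper class, and the maxLen parameter are all assumptions for illustration.

```typescript
// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex characters.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical mapper: shortens on the way in, restores on the way out.
class ToolNameMapper {
  private readonly toOriginal = new Map<string, string>();
  private readonly maxLen: number;

  constructor(maxLen: number) {
    // Need room for at least one name character, "_", and the 8-char hash.
    if (maxLen < 10) throw new Error("maxLen must be at least 10");
    this.maxLen = maxLen;
  }

  shorten(name: string): string {
    if (name.length <= this.maxLen) return name;
    const short = name.slice(0, this.maxLen - 9) + "_" + fnv1a(name);
    this.toOriginal.set(short, name);
    return short;
  }

  restore(name: string): string {
    // Names that were never shortened pass through unchanged.
    return this.toOriginal.get(name) ?? name;
  }
}
```

The hash suffix keeps two long names with a common prefix distinct in practice, though a 32-bit hash can collide, so a production version would want to detect an existing mapping to a different original.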

(cherry picked from commit da71f35)

Track slept time (excludedMs) so long backoff waits don't count against
the overall retry deadline, and thread it through the error handler.
Harden locked-operations against contention and tidy account selection.
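
The accounting behind excludedMs can be sketched as follows. The RetryBudget class and its method names are assumptions made for this example. Only the idea of subtracting slept time from the elapsed time comes from the commit.

```typescript
// Hypothetical helper: tracks how much of the deadline was spent working,
// as opposed to sleeping between retries.
class RetryBudget {
  private excludedMs = 0;
  private readonly budgetMs: number;
  private readonly startedAt: number;

  constructor(budgetMs: number, startedAt: number) {
    this.budgetMs = budgetMs;
    this.startedAt = startedAt;
  }

  // Call after each backoff sleep with the time actually slept.
  excludeSleep(ms: number): void {
    this.excludedMs += Math.max(0, ms);
  }

  // Elapsed time minus backoff sleep.
  activeMs(now: number): number {
    return now - this.startedAt - this.excludedMs;
  }

  isExhausted(now: number): boolean {
    return this.activeMs(now) >= this.budgetMs;
  }
}
```

With a 1000 ms budget, a request that has been alive for 1500 ms but slept for 800 ms of that has used only 700 ms of its budget, so a rate-limited account gets another attempt instead of failing early.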

(cherry picked from commit 9b5e23b)

OpenCode strips image parts from conversation state on later turns.
Cache converted Kiro images per (workspace, fingerprint) and re-attach
them to currentMessage when a turn arrives without fresh images, so the
model keeps seeing them. Never restores onto tool-result turns (Kiro
400s on images+toolResults). Gated by the image_carry_forward config
flag (default on); Kiro bills per request, not per token, so re-sending
has no billing impact.
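
The carry-forward rule reads roughly like the sketch below. The Turn shape, the ImageCarryForward class, and the string-typed images are simplifications assumed for this example. The real cache stores converted Kiro image objects and is keyed per (workspace, fingerprint).

```typescript
// Hypothetical, simplified turn shape for illustration only.
interface Turn {
  text: string;
  images?: string[];
  toolResults?: unknown[];
}

class ImageCarryForward {
  private readonly cache = new Map<string, string[]>();

  // key is assumed to identify one (workspace, fingerprint) pair.
  apply(key: string, turn: Turn): Turn {
    if (turn.images && turn.images.length > 0) {
      // Fresh images replace whatever was cached.
      this.cache.set(key, turn.images);
      return turn;
    }
    if (turn.toolResults && turn.toolResults.length > 0) {
      // Never restore onto tool-result turns: Kiro rejects that combination.
      return turn;
    }
    const cached = this.cache.get(key);
    return cached ? { ...turn, images: cached } : turn;
  }
}
```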

(cherry picked from commit 0416dcc)

Add THINKING_BUDGETS mapping (low/medium/high/default) capped at Kiro's
200k max_thinking_length, used to translate OpenCode's reasoningEffort
selection into an explicit thinking budget.
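
A sketch of the lookup-and-cap step follows. The commit does not list the per-level budget values, so this example takes the mapping as a parameter and hard-codes only the 200k cap it mentions. The function name and the ReasoningEffort type are assumptions.

```typescript
type ReasoningEffort = "low" | "medium" | "high" | "default";

// Cap stated in the commit message (Kiro's max_thinking_length).
const MAX_THINKING_LENGTH = 200_000;

// Hypothetical helper: budgets would be the THINKING_BUDGETS mapping,
// whose actual values are not shown in the PR.
function resolveThinkingBudget(
  effort: ReasoningEffort | undefined,
  budgets: Record<ReasoningEffort, number>,
): number {
  const requested = budgets[effort ?? "default"];
  // Clamp to a non-negative integer no larger than the cap.
  return Math.min(Math.max(0, Math.floor(requested)), MAX_THINKING_LENGTH);
}
```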

(cherry picked from commit aad98f5)

Mint a stable conversationId + agentContinuationId per (workspace,
fingerprint) and persist them in SQLite so agentic turns continue the
same Kiro conversation instead of starting fresh each request. Isolate
parallel sessions in one workspace via the x-session-id header, and
reset a stale conversationId on 400 ValidationException. Add a
cross-process reauth lock (pid + TTL) so concurrent instances don't
re-authenticate simultaneously, with a bounded OAuth timeout. Dispose
SDK clients, image cache, and DB on plugin shutdown.
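
The pid + TTL lock semantics can be sketched as below. The ReauthLock class and its in-memory row are assumptions for illustration. A real cross-process lock would keep the row in the shared SQLite database and perform the check-and-write atomically, for example inside a transaction.

```typescript
interface LockRow {
  pid: number;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical single-row lock; the field stands in for a SQLite row.
class ReauthLock {
  private row?: LockRow;

  tryAcquire(pid: number, now: number, ttlMs: number): boolean {
    const held = this.row;
    // Another process holds a lock that has not expired yet: back off.
    if (held && held.pid !== pid && held.expiresAt > now) return false;
    // Free, expired, or already ours: take (or refresh) the lock.
    this.row = { pid, expiresAt: now + ttlMs };
    return true;
  }

  release(pid: number): void {
    // Only the owner may release, so a late release cannot drop a
    // lock that another process has since taken over.
    if (this.row && this.row.pid === pid) this.row = undefined;
  }
}
```

The TTL is what keeps a crashed process from blocking re-authentication forever: once expiresAt has passed, any other pid may take the lock.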

(cherry picked from commit b25be3a)

Add unit tests for the SDK client endpoint routing, stream/tool-call
transformer, response handler, error/retry handling, event-stream
parser, image cache, and the conversation/reauth-lock parts of the
SQLite store.

We trimmed conversation history at 600KB, over 6x too aggressive, which
discarded older turns prematurely and caused context loss in longer
agentic sessions. Kiro rejects oversized payloads with
CONTENT_LENGTH_EXCEEDS_THRESHOLD; the hard limit is structure-dependent
(verified against the live API): a single message survives up to ~7.6MB,
but many-entry histories are rejected as low as ~5.9MB.

Raise the trim threshold to a configurable max_payload_bytes (default
4MB), which stays safely below the lowest observed failure regardless of
history shape while allowing far more context. Add [PAYLOAD] and [TRIM]
debug logging so payload growth and trim events are visible.
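
A byte-budgeted trim of this kind might look like the sketch below. The function names and the payload shape ({ history, current }) are assumptions, and the real code may need to drop entries in user/assistant pairs, which this example does not attempt.

```typescript
// UTF-8 size of a string, which is what counts on the wire.
function utf8ByteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

// Hypothetical trim: drop the oldest history entries until the serialized
// payload fits within maxBytes. The current message is never dropped.
function trimHistory<T>(history: T[], current: T, maxBytes: number): T[] {
  const sizeOf = (h: T[]): number =>
    utf8ByteLength(JSON.stringify({ history: h, current }));
  let start = 0;
  while (start < history.length && sizeOf(history.slice(start)) > maxBytes) {
    start++;
  }
  return history.slice(start);
}
```

Re-serializing on every iteration is quadratic in the worst case. That is fine for a sketch, but a production version would likely track per-entry sizes instead.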

(cherry picked from commit ef53269)