
feat(adapters): add Kimi (Moonshot) HTTP adapter with thinking support#3087

Draft
georgeharker wants to merge 3 commits into olimorris:main from
georgeharker:kimi-adapter

Conversation

@georgeharker
Contributor

Description

Add a dedicated kimi adapter for Moonshot's Kimi K2 family. Although Moonshot's API is OpenAI-compatible enough to work via openai_compatible for simple chats, the K2-thinking variants impose a strict round-trip requirement that breaks tool-calling chats:

When think is enabled, every assistant message carrying tool_calls
must also carry a reasoning_content field on replay. Omitting it
yields a 400 — "thinking is enabled but reasoning_content is missing in assistant tool call message at index N" — on the second turn of any
tool-using conversation.

OpenAI's Chat Completions schema has no notion of reasoning_content on the wire (it nests reasoning behind a reasoning object, populated by adapters such as Copilot for gemini-3 or DeepSeek's reasoner), so the generic OpenAI form_messages cannot satisfy Kimi's validator.

This adapter wires the round-trip end-to-end:

  • chat_output (delegated to OpenAI) already routes non-standard streaming delta fields through extra, so delta.reasoning_content chunks land in extra.reasoning_content for free.
  • parse_message_meta lifts those fragments onto data.output.reasoning.content, the same shape DeepSeek and Copilot use, so CC stores it as msg.reasoning on the assistant message.
  • form_messages post-processes OpenAI's output: it rewrites the nested m.reasoning into Moonshot's flat reasoning_content string on assistant messages, and inserts reasoning_content = "" for assistant tool-call messages that have no captured reasoning (chat history that pre-dates the adapter, edited messages, model swaps mid-conversation). The empty-string fallback satisfies the validator without fabricating reasoning content.
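A minimal sketch of that post-processing, assuming the message shapes described above (the function name and exact fields are illustrative, not CodeCompanion's real handler signature):

```lua
-- Illustrative only: flatten nested reasoning into Moonshot's wire field,
-- and backfill an empty string on assistant tool-call messages when
-- thinking is enabled, so Kimi's validator accepts the replay.
local function rewrite_reasoning(messages, think)
  for _, m in ipairs(messages) do
    if m.role == "assistant" then
      if type(m.reasoning) == "table" and m.reasoning.content then
        m.reasoning_content = m.reasoning.content
        m.reasoning = nil
      elseif think and m.tool_calls and m.reasoning_content == nil then
        m.reasoning_content = "" -- history that pre-dates the adapter
      end
    end
  end
  return messages
end
```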

Other K2 quirks captured in the schema:

  • temperature defaults to 1 (kimi-k2-thinking rejects any other value).
  • top_p defaults to 0.95 (same — pinned by the model).
  • think schema field (boolean, default true) so the model is actually asked to reason; the adapter only produces a wire payload Kimi accepts when this is on for k2-thinking variants.
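In schema form, those defaults might look like the following (a hedged sketch; the real field definitions carry more metadata, such as ordering and descriptions):

```lua
-- Sketch of the K2 quirks as schema defaults (field layout is illustrative)
local schema = {
  temperature = { default = 1 },    -- kimi-k2-thinking rejects any other value
  top_p = { default = 0.95 },       -- likewise pinned by the model
  think = { type = "boolean", default = true }, -- ask the model to reason
}
```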

Models cover the current K2 line per https://platform.kimi.ai/docs/models: kimi-k2.6 (default), kimi-k2.5, kimi-k2-thinking{,-turbo}, kimi-k2-turbo-preview, kimi-k2-{0905,0711}-preview. Older moonshot-v1-* and vision-preview models are intentionally omitted because they don't support tool calling and would need extra setup-time gating. Schema follows the anthropic/openai static-choices convention.

Structure mirrors mistral.lua for review-friendliness: same top-level field order, same handler-key order with parse_message_meta slotted between chat_output and tools, same delegate-to-openai pattern for all standard handlers. The only kimi-specific handler bodies are the two reasoning-handling additions described above.

tests: cover Kimi (Moonshot) HTTP adapter

Mirrors the structure of test_mistral.lua: a top-level adapter set with a
pre_case hook that resolves the kimi adapter, then three nested groups —
form_messages, Streaming, and No Streaming — using the same hook shape and
test phrasing where behaviour overlaps.

Standard tests (mirrored from mistral):

  • form_messages: basic, with tools, and form_tools after extend()
  • Streaming: chat-buffer output (stream = true pre_case)
  • No Streaming: chat-buffer output, tools, inline assistant (stream = false
    pre_case)

Kimi-specific additions cover the reasoning_content round-trip the adapter
exists for:

  • form_messages rewrites m.reasoning into a flat reasoning_content string
    on assistant messages.
  • When think=true, an empty-string reasoning_content is inserted on
    assistant tool-call messages with no captured reasoning, satisfying
    Moonshot's validator on history that pre-dates the adapter.
  • When think=false, no fallback is inserted (negative case).
  • Streaming "can process thinking" walks chat_output → parse_message_meta
    and asserts both content and reasoning aggregate correctly.
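The aggregation that the streaming test asserts can be modelled in miniature (the chunk shape here is a toy stand-in, not the adapter's internal type):

```lua
-- Toy model: content and reasoning_content fragments accumulate separately
local function aggregate(deltas)
  local out = { content = "", reasoning = "" }
  for _, d in ipairs(deltas) do
    out.content = out.content .. (d.content or "")
    out.reasoning = out.reasoning .. (d.reasoning_content or "")
  end
  return out
end
```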

Stubs follow the OpenAI Chat-Completions wire format (the streaming stub
uses delta.reasoning_content; the tools-no-streaming stub includes a flat
reasoning_content string on the assistant message). Files added:

  • tests/adapters/http/test_kimi.lua
  • tests/adapters/http/stubs/kimi_streaming.txt
  • tests/adapters/http/stubs/kimi_no_streaming.txt
  • tests/adapters/http/stubs/kimi_tools_no_streaming.txt
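For orientation, one line of a streaming stub in this wire format could look like the following (payload values invented for illustration):

```
data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"reasoning_content":"Checking the tool result..."}}]}
```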

docs: document the Kimi (Moonshot) HTTP adapter

Adds the kimi adapter to the supported-LLMs list (README, doc/index.md,
regenerated doc/codecompanion.txt) and a Setup Examples entry in
doc/configuration/adapters-http.md.

The setup example covers:

  • Minimal config (just MOONSHOT_API_KEY plus interactions.chat.adapter).
  • Overriding the API-key source via the cmd: prefix (1Password CLI
    example) and switching the URL for region-specific endpoints
    (e.g. api.moonshot.cn).
  • Schema overrides for model and think so users can pick a non-thinking K2
    model or disable thinking on the K2-thinking variants.
  • An IMPORTANT callout that kimi-k2-thinking pins temperature=1 and
    top_p=0.95 server-side, matching the adapter's defaults.
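A hypothetical shape for the setup described above (the key names here are assumptions reconstructed from this summary; defer to doc/configuration/adapters-http.md for the authoritative example):

```lua
-- Illustrative sketch, not the documented example verbatim
require("codecompanion").setup({
  adapters = {
    kimi = function()
      return require("codecompanion.adapters").extend("kimi", {
        env = { api_key = "cmd:op read op://Personal/Moonshot/credential" }, -- 1Password CLI
        url = "https://api.moonshot.cn/v1/chat/completions", -- region-specific endpoint
        schema = {
          model = { default = "kimi-k2-turbo-preview" }, -- a non-thinking K2 model
          think = { default = false },                   -- or disable thinking
        },
      })
    end,
  },
  interactions = { chat = { adapter = "kimi" } },
})
```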

The example is placed between the llama.cpp and Ollama sections — both
neighbours involve OpenAI-compatible reasoning configuration, which keeps
the page topically grouped.

AI Usage

Claude Code was used to help determine the correct adapter format and the wiring for reasoning_content, and to help generate tests.

Related Issue(s)

N/a

Screenshots

N/a

Checklist

  • I've read the contributing guidelines and have adhered to them in this PR
  • I confirm that this PR has been majority created by me, and not AI (unless stated in the "AI Usage" section above)
  • I've run make all to ensure docs are generated, tests pass and StyLua has formatted the code
  • (optional) I've added test coverage for this fix/feature
  • (optional) I've updated the README and/or relevant docs pages

@georgeharker georgeharker marked this pull request as draft May 3, 2026 11:32
@olimorris
Owner

Hey @georgeharker. Appreciate you taking the time to make a PR for this.

I'm at the limit of what I want to merge into main for both ACP and HTTP adapters. The adapter system is robust enough that I'd like any new additions to be created and shared by the community. I don't want CodeCompanion to turn into that plugin that becomes a repository for adapters, as I noted in #3053 when I closed it.

@georgeharker
Contributor Author

georgeharker commented May 3, 2026

Totally understood!

In which case I'll re-spin this as a plugin

