feat(adapters): add Kimi (Moonshot) HTTP adapter with thinking support#3087
Draft
georgeharker wants to merge 3 commits into olimorris:main from
Conversation
Add a dedicated `kimi` adapter for Moonshot's Kimi K2 family. Although
Moonshot's API is OpenAI-compatible enough to work via `openai_compatible`
for simple chats, the K2-thinking variants impose a strict round-trip
requirement that breaks tool-calling chats:
When `think` is enabled, every assistant message carrying `tool_calls`
must also carry a `reasoning_content` field on replay. Omitting it
yields a 400 — `"thinking is enabled but reasoning_content is missing in
assistant tool call message at index N"` — on the second turn of any
tool-using conversation.
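To make the requirement concrete, here is a hypothetical replayed assistant message (illustrative only — the tool name, arguments, and id are invented, not taken from the PR) that passes Moonshot's validator on the second turn:

```python
import json

# Hypothetical turn-two replay payload: when thinking is enabled, an
# assistant message that carries tool_calls must also carry a
# reasoning_content field, or Moonshot's API returns a 400.
assistant_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "The user asked for the weather, so call the tool.",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
        }
    ],
}

print(json.dumps(assistant_turn, indent=2))
```

Dropping the `reasoning_content` key from this message is exactly the case that triggers the 400 quoted above.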
OpenAI's Chat Completions schema has no notion of `reasoning_content` on
the wire (it nests reasoning behind a `reasoning` object, populated by
adapters such as Copilot for gemini-3 or DeepSeek's reasoner), so the
generic OpenAI form_messages cannot satisfy Kimi's validator.
This adapter wires the round-trip end-to-end:
- `chat_output` (delegated to OpenAI) already routes non-standard
streaming delta fields through `extra`, so `delta.reasoning_content`
chunks land in `extra.reasoning_content` for free.
- `parse_message_meta` lifts those fragments onto
`data.output.reasoning.content`, the same shape DeepSeek and Copilot
use, so CC stores it as `msg.reasoning` on the assistant message.
- `form_messages` post-processes OpenAI's output: it rewrites the
nested `m.reasoning` into Moonshot's flat `reasoning_content` string
on assistant messages, and inserts `reasoning_content = ""` for
assistant tool-call messages that have no captured reasoning
(chat history that pre-dates the adapter, edited messages, model
swaps mid-conversation). The empty-string fallback satisfies the
validator without fabricating reasoning content.
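The post-processing step can be sketched as follows (illustrative Python — the adapter itself is Lua, and the exact message shapes here are assumptions based on the description above):

```python
# Sketch of the form_messages post-processing: lift a nested m.reasoning
# into Moonshot's flat reasoning_content string on assistant messages,
# and fall back to "" on assistant tool-call messages with no captured
# reasoning, so the validator accepts pre-adapter or edited history.
def patch_reasoning(messages, think=True):
    for m in messages:
        if m.get("role") != "assistant":
            continue
        reasoning = m.pop("reasoning", None)
        if isinstance(reasoning, dict) and reasoning.get("content"):
            m["reasoning_content"] = reasoning["content"]
        elif think and m.get("tool_calls") and "reasoning_content" not in m:
            # Empty string satisfies the validator without fabricating
            # reasoning content.
            m["reasoning_content"] = ""
    return messages

msgs = patch_reasoning([
    {"role": "assistant", "reasoning": {"content": "think first"}, "content": "hi"},
    {"role": "assistant", "tool_calls": [{"id": "call_0"}], "content": ""},
])
print(msgs[0]["reasoning_content"], "|", msgs[1]["reasoning_content"])
```

Note the `think=False` path inserts no fallback, matching the negative test case described later.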
Other K2 quirks captured in the schema:
- `temperature` defaults to `1` (kimi-k2-thinking rejects any other
value).
- `top_p` defaults to `0.95` (same — pinned by the model).
- `think` schema field (boolean, default `true`) so the model is
actually asked to reason; the adapter only produces a wire payload
Kimi accepts when this is on for k2-thinking variants.
Models cover the current K2 line per https://platform.kimi.ai/docs/models
— `kimi-k2.6` (default), `kimi-k2.5`, `kimi-k2-thinking{,-turbo}`,
`kimi-k2-turbo-preview`, `kimi-k2-{0905,0711}-preview`. Older
`moonshot-v1-*` and vision-preview models are intentionally omitted
because they don't support tool calling and would need extra setup-time
gating. Schema follows the anthropic/openai static-`choices` convention.
Structure mirrors `mistral.lua` for review-friendliness: same top-level
field order, same handler-key order with `parse_message_meta` slotted
between `chat_output` and `tools`, same delegate-to-openai pattern for
all standard handlers. The only kimi-specific handler bodies are the
two reasoning-handling additions described above.
tests: cover Kimi (Moonshot) HTTP adapter
Mirrors the structure of test_mistral.lua: a top-level adapter set with a
pre_case hook that resolves the kimi adapter, then three nested groups —
form_messages, Streaming, and No Streaming — using the same hook shape and
test phrasing where behaviour overlaps.
Standard tests (mirrored from mistral):
- form_messages: basic, with tools, and form_tools after extend()
- Streaming: chat-buffer output (stream = true pre_case)
- No Streaming: chat-buffer output, tools, inline assistant (stream = false
pre_case)
Kimi-specific additions cover the reasoning_content round-trip the adapter
exists for:
- form_messages rewrites m.reasoning into a flat reasoning_content string
on assistant messages.
- When think=true, an empty-string reasoning_content is inserted on
assistant tool-call messages with no captured reasoning, satisfying
Moonshot's validator on history that pre-dates the adapter.
- When think=false, no fallback is inserted (negative case).
- Streaming "can process thinking" walks chat_output → parse_message_meta
and asserts both content and reasoning aggregate correctly.
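The aggregation that the streaming test asserts can be sketched like this (illustrative Python, not the plugin's Lua; the chunk shapes follow the OpenAI-style streaming wire format the stubs use):

```python
# Sketch of the "can process thinking" assertion: delta.content and
# delta.reasoning_content fragments arriving in separate streaming
# events aggregate into one assistant message with both a content
# string and a reasoning string.
def aggregate(chunks):
    out = {"content": "", "reasoning": ""}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        out["content"] += delta.get("content") or ""
        out["reasoning"] += delta.get("reasoning_content") or ""
    return out

msg = aggregate([
    {"choices": [{"delta": {"reasoning_content": "Let me think. "}}]},
    {"choices": [{"delta": {"reasoning_content": "Done."}}]},
    {"choices": [{"delta": {"content": "Hello!"}}]},
])
print(msg["reasoning"], "->", msg["content"])
```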
Stubs follow the OpenAI Chat-Completions wire format (the streaming stub
uses delta.reasoning_content; the tools-no-streaming stub includes a flat
reasoning_content string on the assistant message). Files added:
- tests/adapters/http/test_kimi.lua
- tests/adapters/http/stubs/kimi_streaming.txt
- tests/adapters/http/stubs/kimi_no_streaming.txt
- tests/adapters/http/stubs/kimi_tools_no_streaming.txt
docs: document the Kimi (Moonshot) HTTP adapter
Adds the kimi adapter to the supported-LLMs list (README, doc/index.md,
regenerated doc/codecompanion.txt) and a Setup Examples entry in
doc/configuration/adapters-http.md.
The setup example covers:
- Minimal config (just MOONSHOT_API_KEY plus interactions.chat.adapter).
- Overriding the API-key source via the cmd: prefix (1Password CLI
example) and switching the URL for region-specific endpoints
(e.g. api.moonshot.cn).
- Schema overrides for `model` and `think` so users can pick a non-
thinking K2 model or disable thinking on the K2-thinking variants.
- An IMPORTANT callout that kimi-k2-thinking pins temperature=1 and
top_p=0.95 server-side, matching the adapter's defaults.
The example is placed between the llama.cpp and Ollama sections — both
neighbours involve OpenAI-compatible reasoning configuration, which keeps
the page topically grouped.
Owner
Hey @georgeharker. Appreciate you taking the time to make a PR for this. I'm at the limit of what I want to merge into
Contributor
Author
Totally understood! In which case I'll re-spin this as a plugin
AI Usage
Claude Code was used to help determine the correct format for adapters and the wiring for the thinking content, and to help generate tests.
Related Issue(s)
N/a
Screenshots
N/a
Checklist
make all to ensure docs are generated, tests pass and StyLua has formatted the code