fix(frontend): auto-detect force_reasoning when chat template appends `<think>` (#8240)
navmarri14 wants to merge 2 commits into ai-dynamo:main
Conversation
👋 Hi navmarri14! Thank you for contributing to ai-dynamo/dynamo.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed
🧹 Nitpick comments (1)
components/src/dynamo/frontend/sglang_prepost.py (1)
61-73: Consider adding a constant and type hint for clarity.

The function is well implemented, but a couple of minor improvements would enhance readability:

- The magic number `10` could be documented or made a constant
- The `tokenizer` parameter lacks a type hint (though this may be intentional due to multiple tokenizer types)

♻️ Optional: add a constant and a brief inline comment

```diff
+# Lookback tokens to decode for <think> detection; covers typical tokenizations
+_THINK_DETECT_LOOKBACK = 10
+
+
 def detect_force_reasoning(tokenizer, prompt_token_ids: list[int]) -> bool:
     """Check if the chat template's generation prompt ends with ``<think>``.

     When the template appends ``<think>`` to the prompt, the model output
     starts inside a reasoning block without an explicit opening tag.
     The reasoning parser must be told to begin in reasoning mode
     (``force_reasoning=True``) so that it correctly separates reasoning
     content from normal content.
     """
     if not prompt_token_ids:
         return False
-    tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
+    tail = tokenizer.decode(
+        prompt_token_ids[-_THINK_DETECT_LOOKBACK:], skip_special_tokens=False
+    )
     return tail.rstrip().endswith("<think>")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/frontend/sglang_prepost.py` around lines 61-73: the `detect_force_reasoning` function uses a magic number 10 and lacks a tokenizer type hint. Introduce a module-level constant (e.g., `PROMPT_TAIL_TOKEN_WINDOW = 10`) and replace the literal `10` in `detect_force_reasoning` with that constant, add a brief comment above the constant explaining that it controls how many tail tokens to inspect, and add a permissive type hint for `tokenizer` (e.g., `TokenizerLike` or `Any`) on the `detect_force_reasoning` signature to document the expected type while preserving compatibility with multiple tokenizer implementations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 427fe291-98f3-44c4-91b9-e5fdfd957a26
📒 Files selected for processing (2)
- components/src/dynamo/frontend/sglang_prepost.py
- components/src/dynamo/frontend/sglang_processor.py
/ok to test 52248ee
| """ | ||
| if not prompt_token_ids: | ||
| return False | ||
| tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False) |
can you explain this magic number?
Can you please attach a brief before and after in the PR description to help us debug in the future?
```python
def detect_force_reasoning(tokenizer, prompt_token_ids: list[int]) -> bool:
    """Check if the chat template's generation prompt ends with ``<think>``.

    When the template appends ``<think>`` to the prompt, the model output
    starts inside a reasoning block without an explicit opening tag.
    The reasoning parser must be told to begin in reasoning mode
    (``force_reasoning=True``) so that it correctly separates reasoning
    content from normal content.
    """
    if not prompt_token_ids:
        return False
    tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
    return tail.rstrip().endswith("<think>")
```
If possible, add unit testing for `detect_force_reasoning` that uses a lightweight or mock tokenizer. Some possible test cases:

- Empty prompt → `False`
- Prompt ending with `<think>` → `True`
- Prompt ending with `<think>\n` (trailing whitespace) → `True`
- Prompt NOT ending with `<think>` → `False`
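A sketch of such a test with a tiny mock tokenizer (the mock simply joins fixed token strings, which is enough to exercise the suffix check; `detect_force_reasoning` is reproduced here from this PR so the snippet is self-contained):

```python
class MockTokenizer:
    """Minimal stand-in for a real tokenizer: maps ids to fixed strings."""
    _VOCAB = {1: "Hello", 2: " world", 3: "<think>", 4: "\n"}

    def decode(self, token_ids, skip_special_tokens=False):
        return "".join(self._VOCAB[t] for t in token_ids)


def detect_force_reasoning(tokenizer, prompt_token_ids: list[int]) -> bool:
    """Reproduced from the PR: does the prompt tail end with <think>?"""
    if not prompt_token_ids:
        return False
    tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
    return tail.rstrip().endswith("<think>")


tok = MockTokenizer()
assert detect_force_reasoning(tok, []) is False          # empty prompt
assert detect_force_reasoning(tok, [1, 2, 3]) is True    # ends with <think>
assert detect_force_reasoning(tok, [1, 3, 4]) is True    # <think> then whitespace
assert detect_force_reasoning(tok, [1, 2]) is False      # no <think>
```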
```python
if force_reasoning:
    kwargs["force_reasoning"] = True
```
nit: Could remove the double `force_reasoning` derivation and just do something like:

```python
if reasoning_parser_name and tokenizer and prompt_token_ids:
    kwargs["force_reasoning"] = detect_force_reasoning(tokenizer, prompt_token_ids)
```

Would require adding `tokenizer` and `prompt_token_ids` to the arg list.
@ishandhanani before and after attached.
Does sglang also do this? I don't think so, TBH. I think we should not diverge from SGLang's chat implementation, because a lot of other models rely on this processor as well. Is there a specific case that sglang can handle but dynamo cannot? We need to find out where the real divergence happened.
What's sglang's output for the request? If this is a bug in sglang, we should fix the upstream first.
sglang handles this as well although slightly differently:
Thanks! Yeah, let's align with sglang. Someone could simply use vibe coding and say "align the dynamo sglang chat processor with sglang" in the prompt, and the coding agent could get your change reverted without anyone noticing. We'd better align closely.
richardhuo-nv left a comment
What's the sglang's output of the request?
If this is a bug in sglang, we should fix the upstream first.
@navmarri14 LGTM! Could you sign your commits? Also, could you resolve the issue in the pre-commit hooks: https://github.com/ai-dynamo/dynamo/actions/runs/24859235642/job/72780270461?pr=8240 Thanks for your contribution!
/ok to test e8f0006
Sorry, I think there are merge conflicts now, could you rebase and rerun?
/ok to test 6693101 |
@navmarri14 can you test locally with these three tests and see if it's solvable? Thanks!
… <think> Signed-off-by: Naveen Marri <[email protected]> Signed-off-by: Liangjun Feng <[email protected]>
Signed-off-by: Liangjun Feng <[email protected]>
@richardhuo-nv (navmarri14 is on leave, so I'm taking over the work) I've fixed the failing test cases in the new commit. It turned out that the test model is Qwen3; with this change we enable reasoning by default, so the model puts its output in the reasoning part instead of the normal response that the tool call parser expects to parse. Fixed by disabling the reasoning parser when tool choice is set to "required" or to a specific function (which aligns with sglang behavior). Also added more unit tests for the new logic. PTAL and run the tests, thank you.
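A minimal sketch of the gating described here. The function name and the exact `tool_choice` shapes are assumptions for illustration, not the code in this PR; it assumes OpenAI-style `tool_choice`, which is either a string ("auto", "none", "required") or a dict naming a specific function:

```python
def should_apply_reasoning(tool_choice, reasoning_parser_name) -> bool:
    """Skip reasoning parsing when a tool call is forced.

    Hypothetical helper: when tool_choice is "required" or names a specific
    function, the model output is expected to be a tool call, so the
    reasoning parser is disabled (mirroring the sglang behavior described
    in the comment above).
    """
    if not reasoning_parser_name:
        return False
    if tool_choice == "required":
        return False
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
        return False
    return True


assert should_apply_reasoning("auto", "qwen3") is True
assert should_apply_reasoning("required", "qwen3") is False
assert should_apply_reasoning(
    {"type": "function", "function": {"name": "get_weather"}}, "qwen3"
) is False
assert should_apply_reasoning("auto", None) is False
```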
/ok to test 4daad7b |
thank you so much! |
…
Overview:

Auto-detect `force_reasoning` when the chat template appends `<think>`.

Details:

When a chat template's generation prompt ends with `<think>`, the reasoning parser must start in reasoning mode so it correctly separates thinking content from normal output. Detect this by inspecting the tail of the tokenized prompt and pass `force_reasoning=True` to the `ReasoningParser`. Also pass `return_dict=False` to `apply_chat_template` for consistent tokenizer output.
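The detection flow described in the details can be sketched with a stub tokenizer whose chat template appends `<think>` to the generation prompt. The stub class and its toy vocabulary are illustrative stand-ins; the real code uses the model's tokenizer, whose `apply_chat_template` accepts the same `tokenize`/`return_dict` parameters:

```python
class StubTokenizer:
    """Stand-in tokenizer whose template appends <think> to the prompt."""
    _IDS = {"<|user|>": 1, "4+4?": 2, "<|assistant|>": 3, "<think>": 4}
    _TOKENS = {v: k for k, v in _IDS.items()}

    def apply_chat_template(self, messages, add_generation_prompt=True,
                            tokenize=True, return_dict=False):
        ids = [self._IDS["<|user|>"], self._IDS["4+4?"]]
        if add_generation_prompt:
            # The template ends the generation prompt with <think>.
            ids += [self._IDS["<|assistant|>"], self._IDS["<think>"]]
        return ids  # return_dict=False: a plain list of token ids

    def decode(self, token_ids, skip_special_tokens=False):
        return "".join(self._TOKENS[t] for t in token_ids)


tok = StubTokenizer()
prompt_token_ids = tok.apply_chat_template(
    [{"role": "user", "content": "4+4?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=False,
)
# Inspect the tail of the tokenized prompt to decide force_reasoning.
tail = tok.decode(prompt_token_ids[-10:], skip_special_tokens=False)
force_reasoning = tail.rstrip().endswith("<think>")
assert force_reasoning is True
```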
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Improvements