
fix(frontend): auto-detect force_reasoning when chat template appends <think>#8240

Open
navmarri14 wants to merge 2 commits into ai-dynamo:main from navmarri14:main

Conversation

@navmarri14

@navmarri14 navmarri14 commented Apr 15, 2026

Overview:

auto-detect force_reasoning when chat template appends <think>

Details:

When a chat template's generation prompt ends with <think>, the
reasoning parser must start in reasoning mode so it correctly separates
thinking content from normal output. Detect this by inspecting the tail
of the tokenized prompt and pass force_reasoning=True to the
ReasoningParser. Also pass return_dict=False to apply_chat_template
for consistent tokenizer output.

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Automatic detection of reasoning-enabled prompts to ensure proper parsing.
    • Enhanced reasoning parser configuration for improved handling of edge cases and complex scenarios.
  • Improvements

    • Optimized tokenizer template handling for more efficient prompt processing.
    • Refined multi-process reasoning support to maintain consistency across request processing.

@navmarri14 navmarri14 requested review from a team as code owners April 15, 2026 18:34
@copy-pr-bot

copy-pr-bot Bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 15, 2026
@github-actions
Contributor

👋 Hi navmarri14! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added the external-contribution Pull request is from an external contributor label Apr 15, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 15, 2026

Walkthrough

A new detect_force_reasoning helper function checks if prompts end with "<think>" tokens. The function is applied in preprocessing to detect force reasoning intent, and the result is passed through parser creation methods. Tokenizer template application was also updated to disable dictionary output.

Changes

Cohort / File(s) Summary
Force Reasoning Detection
components/src/dynamo/frontend/sglang_prepost.py
Added detect_force_reasoning() helper function that decodes prompt tail to detect "<think>" endings. Updated create_parsers() signature with force_reasoning parameter and modified preprocess_chat_request() to compute and pass force reasoning status to parser creation. Changed tokenizer template application to include return_dict: False.
Multi-Process Integration
components/src/dynamo/frontend/sglang_processor.py
Imported detect_force_reasoning and integrated force reasoning computation in _generator_inner_pool(). Updated create_parsers() call in the multi-process generation path to forward the computed force_reasoning flag.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: auto-detecting force_reasoning when a chat template appends <think>, which aligns directly with the primary objective and file changes.
Description check ✅ Passed The description covers the overview, details of changes, and implementation approach, though it lacks specific file recommendations and contains a placeholder for related issues.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
components/src/dynamo/frontend/sglang_prepost.py (1)

61-73: Consider adding a constant and type hint for clarity.

The function is well-implemented, but a couple of minor improvements would enhance readability:

  1. The magic number 10 could be documented or made a constant
  2. The tokenizer parameter lacks a type hint (though this may be intentional due to multiple tokenizer types)
♻️ Optional: Add constant and brief inline comment
+# Lookback tokens to decode for <think> detection; covers typical tokenizations
+_THINK_DETECT_LOOKBACK = 10
+
+
 def detect_force_reasoning(tokenizer, prompt_token_ids: list[int]) -> bool:
     """Check if the chat template's generation prompt ends with ``<think>``.
 
     When the template appends ``<think>`` to the prompt, the model output
     starts inside a reasoning block without an explicit opening tag.
     The reasoning parser must be told to begin in reasoning mode
     (``force_reasoning=True``) so that it correctly separates reasoning
     content from normal content.
     """
     if not prompt_token_ids:
         return False
-    tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
+    tail = tokenizer.decode(
+        prompt_token_ids[-_THINK_DETECT_LOOKBACK:], skip_special_tokens=False
+    )
     return tail.rstrip().endswith("<think>")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/frontend/sglang_prepost.py` around lines 61 - 73, The
detect_force_reasoning function uses a magic number 10 and lacks a tokenizer
type hint; introduce a module-level constant (e.g., PROMPT_TAIL_TOKEN_WINDOW =
10) and replace the literal 10 in detect_force_reasoning with that constant, add
a brief comment above the constant explaining it controls how many tail tokens
to inspect, and add a permissive type hint for tokenizer (e.g., TokenizerLike or
Any) on the detect_force_reasoning signature to document expected type while
preserving compatibility with multiple tokenizer implementations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 427fe291-98f3-44c4-91b9-e5fdfd957a26

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac22df and 8b06e79.

📒 Files selected for processing (2)
  • components/src/dynamo/frontend/sglang_prepost.py
  • components/src/dynamo/frontend/sglang_processor.py

@navmarri14 navmarri14 marked this pull request as draft April 16, 2026 05:38
@navmarri14 navmarri14 marked this pull request as ready for review April 21, 2026 17:34
@ishandhanani
Contributor

/ok to test 52248ee

"""
if not prompt_token_ids:
return False
tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
Contributor


can you explain this magic number?

@ishandhanani
Contributor

Can you please attach a brief before and after in the PR description to help us debug in the future?

Comment on lines +79 to +91
def detect_force_reasoning(tokenizer, prompt_token_ids: list[int]) -> bool:
    """Check if the chat template's generation prompt ends with ``<think>``.

    When the template appends ``<think>`` to the prompt, the model output
    starts inside a reasoning block without an explicit opening tag.
    The reasoning parser must be told to begin in reasoning mode
    (``force_reasoning=True``) so that it correctly separates reasoning
    content from normal content.
    """
    if not prompt_token_ids:
        return False
    tail = tokenizer.decode(prompt_token_ids[-10:], skip_special_tokens=False)
    return tail.rstrip().endswith("<think>")
Contributor

@KrishnanPrash KrishnanPrash Apr 22, 2026


If possible, add unit tests for detect_force_reasoning that use a lightweight or mock tokenizer. Some possible test cases:

  • Empty prompt → False
  • Prompt ending with <think> → True
  • Prompt ending with <think>\n (trailing whitespace) → True
  • Prompt NOT ending with <think> → False

Comment on lines +135 to +136
if force_reasoning:
    kwargs["force_reasoning"] = True
Contributor

@KrishnanPrash KrishnanPrash Apr 22, 2026


nit: Could remove the double force_reasoning derivation and just do something like:

if reasoning_parser_name and tokenizer and prompt_token_ids:
    kwargs["force_reasoning"] = detect_force_reasoning(tokenizer, prompt_token_ids)

Would require adding tokenizer and prompt_token_ids to arg list.

@navmarri14
Author

@ishandhanani
payload

curl -s http://localhost:8000/v1/chat/completions     -H "Content-Type: application/json"     -d '{
  "model": "/tmp-nvme/models/glm5-nvfp4",
  "messages": [
    {"role": "user", "content": "Compute 1+1!"}
  ],
  "stream": false,
  "max_tokens": 200,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculate_sum",
        "description": "Calculates the sum of two numbers.",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {"type": "number", "description": "The first number to add."},
            "b": {"type": "number", "description": "The second number to add."}
          },
          "required": ["a", "b"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}'

before

{"id":"chatcmpl-98a27d7a-7b7a-4056-88cd-66c3a999853f","choices":[{"index":0,"message":{"content":"The user is asking me to compute 1+1. This is a simple addition problem. I have a function called \"calculate_sum\" that takes two numbers and returns their sum. Since this is exactly what the user is asking for, I should use this function.\n\nThe parameters needed are:\n- a: 1 (the first number)\n- b: 1 (the second number)\n\nBoth parameters are provided implicitly in the request \"1+1\".</think>I'll help you compute 1+1 using the available calculation function.<tool_call>calculate_sum<arg_key>a</arg_key><arg_value>1</arg_value><arg_key>b</arg_key><arg_value>1</arg_value></tool_call>","role":"assistant","reasoning_content":null},"finish_reason":"stop"}],"created":1776202733,"model":"/mnt/models","object":"chat.completion","usage":{"prompt_tokens":173,"completion_tokens":125,"total_tokens":298},"nvext":{"worker_id":{"prefill_worker_id":782000407404662,"prefill_dp_rank":0,"decode_worker_id":782000407404662,"decode_dp_rank":0},"timing":{"request_received_ms":1776202733323,"prefill_wait_time_ms":0.8677819999999999,"prefill_time_ms":171.25239,"ttft_ms":172.120172,"total_time_ms":1700.60159,"kv_hit_rate":0.0,"router_queue_depth":0}}}

notice reasoning_content is null

after

{"id":"969fe798d036b549","choices":[{"index":0,"message":{"content":"I'll calculate 1+1 for you using the sum function.","tool_calls":[{"id":"call_a387f52be1b5be7e","type":"function","function":{"name":"calculate_sum","arguments":"{\"a\": 1, \"b\": 1}"}}],"role":"assistant","reasoning_content":"The user wants me to compute 1+1. I have a function called calculate_sum that can calculate the sum of two numbers. I should use this function to compute 1+1.\n\nLooking at the function parameters:\n- a: number (required) - I'll use 1\n- b: number (required) - I'll use 1\n\nBoth required parameters are provided, so I can make the function call."},"finish_reason":"tool_calls"}],"created":1776276679,"model":"/tmp-nvme/models/glm5-nvfp4","object":"chat.completion","usage":{"prompt_tokens":181,"completion_tokens":119,"total_tokens":300}}

notice reasoning content is populated.

@richardhuo-nv
Contributor

richardhuo-nv commented Apr 23, 2026

Does sglang also do this? I don't think so TBH.

I think we should not diverge from SGLang's chat implementation. Because a lot of other models are relying on this processor as well.

Is there a specific case that sglang can do but dynamo cannot? we need to find out where the real divergence happened.

@richardhuo-nv
Contributor

What's the sglang's output of the request?

If this is a bug in sglang, we should fix the upstream first.

@navmarri14
Author

Does sglang also do this? I don't think so TBH.

I think we should not diverge from SGLang's chat implementation. Because a lot of other models are relying on this processor as well.

Is there a specific case that sglang can do but dynamo cannot? we need to find out where the real divergence happened.

sglang handles this as well, although slightly differently:

  • Template-level static flag populated once on TemplateManager.force_reasoning (ref), used in serving_chat.py.
  • Per-request override in _get_reasoning_from_request, which is just a parser lookup table.
  • Both are then OR'd together and gated by request.separate_reasoning (ref).

I can re-align the implementation to be close to sglang.
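Combined, that logic amounts to something like the following (illustrative only; the names REASONING_PARSER_OVERRIDES and resolve_force_reasoning are assumptions, not sglang's actual identifiers):

```python
# Per-parser override lookup table (assumed contents, for illustration).
REASONING_PARSER_OVERRIDES = {"deepseek-r1": True}


def resolve_force_reasoning(template_force_reasoning, parser_name, separate_reasoning):
    """OR the static template-level flag with the per-request parser
    override, gated by the request's separate_reasoning setting."""
    if not separate_reasoning:
        # Gate: no reasoning separation requested, so never force it.
        return False
    per_request = REASONING_PARSER_OVERRIDES.get(parser_name, False)
    return template_force_reasoning or per_request
```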

@richardhuo-nv
Contributor


Thanks! Yeah, let's align with sglang. Because someone could simply use vibe coding and say "align the dynamo sglang chat processor with sglang" in the prompt and the coding agent could get your change reverted, and no one will notice it. We'd better align closely.

@pull-request-size pull-request-size Bot added size/L and removed size/M labels Apr 23, 2026

@richardhuo-nv
Contributor

@navmarri14 LGTM! could you sign your commits?

In your local branch, run: git rebase HEAD~4 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin main

also could you resolve the issue in the pre-commit hooks: https://github.com/ai-dynamo/dynamo/actions/runs/24859235642/job/72780270461?pr=8240

Thanks for your contribution!

@richardhuo-nv
Contributor

/ok to test e8f0006

@richardhuo-nv
Contributor

Sorry, I think there are merge conflicts now, could you rebase and rerun?

@richardhuo-nv
Contributor

/ok to test 6693101

@richardhuo-nv
Contributor

richardhuo-nv commented Apr 30, 2026

@navmarri14
looks like these three tool calling tests fail pretty consistently after your change:

FAILED tests/frontend/test_tool_calling_sglang.py::TestToolCallingProtocol::test_tool_choice_required_forces_a_tool_call[chat_processor_frontend]
FAILED tests/frontend/test_tool_calling_sglang.py::TestToolCallingProtocol::test_named_tool_choice_forces_specific_function[chat_processor_frontend]
FAILED tests/frontend/test_tool_calling_sglang.py::TestToolCallingProtocol::test_array_argument_schema_valid[chat_processor_frontend]

can you test locally with these three tests and see if it's solvable? Thanks!

navmarri14 and others added 2 commits May 6, 2026 04:30
… <think>

Signed-off-by: Naveen Marri <[email protected]>
Signed-off-by: Liangjun Feng <[email protected]>
@liangjuf

liangjuf commented May 6, 2026

@richardhuo-nv (navmarri14 is on leave, so I'm taking over the work) I've fixed the failing test cases in the new commit. It turned out the test model is Qwen3; with this change we enable reasoning by default, and the model puts its output in the reasoning part instead of the normal response that the tool-call parser reads. I fixed the issue by disabling the reasoning parser when tool_choice is set to required or a specific function (which aligns with sglang's behavior), and added more unit tests for the new logic. PTAL and run the tests, thank you.
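The gating described here might look roughly like this (a sketch with hypothetical names, not the PR's exact code):

```python
def reasoning_parser_enabled(reasoning_parser_name, tool_choice):
    """Disable the reasoning parser when tool_choice forces a tool call,
    so the model output reaches the tool-call parser instead of being
    captured as reasoning content (mirroring sglang's behavior)."""
    if reasoning_parser_name is None:
        return False
    # tool_choice may be "auto", "none", "required", or a dict naming
    # a specific function (per the OpenAI chat-completions schema).
    if tool_choice == "required" or isinstance(tool_choice, dict):
        return False
    return True
```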

@richardhuo-nv
Contributor

/ok to test 4daad7b

@richardhuo-nv
Contributor


thank you so much!


Labels

external-contribution Pull request is from an external contributor fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants