[Examples]: GPT-oss, Qwen3Moe streaming specdec example#1692
Conversation
Signed-off-by: h-guo18 <[email protected]>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis PR introduces streaming mode chat template support for speculative decoding, adds a comprehensive training chat template for gpt-oss-20b with tool calling capabilities, creates new multi-node DFlash and EAGLE3 pipeline configurations for both gpt-oss-20b and Qwen3-30B-A3B, and optimizes benchmark performance parameters across multiple existing Qwen3-8B and Kimi-K2.5 configurations. ChangesStreaming Chat Template Support
gpt-oss-20b Speculative Decoding Pipelines
Qwen3-30B-A3B Multi-Node Pipelines
Benchmark Optimization Updates
🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: h-guo18 <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1692 +/- ##
==========================================
- Coverage 67.72% 67.54% -0.18%
==========================================
Files 511 511
Lines 56168 56498 +330
==========================================
+ Hits 38037 38164 +127
- Misses 18131 18334 +203
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
Signed-off-by: h-guo18 <[email protected]>
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja (1)
136-140: 📐 Maintainability & Code Quality | 💤 Low valueRedundant conditional: both branches output the same value.
Both the
ifandelsebranches emit",\n". Simplify to a single statement:♻️ Suggested simplification
- {%- if not loop.last %} - {{- ",\n" }} - {%- else %} - {{- ",\n" }} - {%- endif -%} + {{- ",\n" }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja` around lines 136 - 140, The conditional block checking "if not loop.last" is redundant because both branches emit the same string; remove the entire if/else and replace it with a single output of the comma+newline token so the template simply emits ",\n" (preserving the existing whitespace control like the surrounding {{- / -}} markers) where the conditional currently appears.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja`:
- Around line 47-52: The variable has_object_variants is being set inside a
Jinja2 for-loop which creates a new local scope, so the outer flag never
changes; switch to a namespace() to persist the flag: create a namespace like ns
= namespace(has_object_variants=False), inside the for-loop assign
ns.has_object_variants = True when variant.type == "object", and later check
ns.has_object_variants instead of has_object_variants (references:
has_object_variants, param_spec.oneOf, variant.type).
In
`@tools/launcher/examples/openai/gpt-oss-20b/hf_streaming_dflash_multi_node.yaml`:
- Around line 97-98: The YAML currently hardcodes EXPORT_EXTRA_ARGS:
"--trust_remote_code", which propagates through train_eagle_streaming.sh into
export_hf_checkpoint.py and ultimately into load_vlm_or_llm()
(AutoConfig/AutoModel.from_pretrained trust_remote_code), creating an RCE risk;
remove the hardcoded value and make it opt-in by defaulting EXPORT_EXTRA_ARGS to
empty (or adding a dedicated flag like ENABLE_TRUST_REMOTE_CODE) in the YAML and
update train_eagle_streaming.sh to pass that flag only when explicitly set, and
ensure export_hf_checkpoint.py only forwards trust_remote_code when the flag is
present (or document a clear, audited exception) so load_vlm_or_llm() receives
trust_remote_code=true only when explicitly requested.
---
Nitpick comments:
In `@tools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja`:
- Around line 136-140: The conditional block checking "if not loop.last" is
redundant because both branches emit the same string; remove the entire if/else
and replace it with a single output of the comma+newline token so the template
simply emits ",\n" (preserving the existing whitespace control like the
surrounding {{- / -}} markers) where the conditional currently appears.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 34056ab8-ec72-4e9d-b618-ab8b466cd8a1
📒 Files selected for processing (14)
examples/speculative_decoding/eagle_utils.pytools/launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_dflash_multi_node.yamltools/launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_eagle3_multi_node.yamltools/launcher/examples/Qwen/Qwen3-8B/eagle3_quick_check.yamltools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yamltools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3_ptq.yamltools/launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yamltools/launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3.yamltools/launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3_multi_node.yamltools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash_multi_node.yamltools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3_multi_node.yamltools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinjatools/launcher/examples/openai/gpt-oss-20b/hf_streaming_dflash_multi_node.yamltools/launcher/examples/openai/gpt-oss-20b/hf_streaming_eagle3_multi_node.yaml
| {%- set has_object_variants = false -%} | ||
| {%- for variant in param_spec.oneOf -%} | ||
| {%- if variant.type == "object" -%} | ||
| {%- set has_object_variants = true -%} | ||
| {%- endif -%} | ||
| {%- endfor -%} |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Jinja2 scoping bug: has_object_variants never persists outside the loop.
In Jinja2, set inside a for loop creates a new local scope—the assignment won't affect the outer variable. This means has_object_variants will always be false after the loop, regardless of whether object variants exist. The condition at line 53 will never be true.
You correctly use the namespace() pattern elsewhere (lines 209, 277). Apply the same fix here:
🐛 Proposed fix
- {%- set has_object_variants = false -%}
+ {%- set ns = namespace(has_object_variants=false) -%}
{%- for variant in param_spec.oneOf -%}
{%- if variant.type == "object" -%}
- {%- set has_object_variants = true -%}
+ {%- set ns.has_object_variants = true -%}
{%- endif -%}
{%- endfor -%}
- {%- if has_object_variants and param_spec.oneOf|length > 1 -%}
+ {%- if ns.has_object_variants and param_spec.oneOf|length > 1 -%}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/openai/gpt-oss-20b/chat_template_train.jinja` around
lines 47 - 52, The variable has_object_variants is being set inside a Jinja2
for-loop which creates a new local scope, so the outer flag never changes;
switch to a namespace() to persist the flag: create a namespace like ns =
namespace(has_object_variants=False), inside the for-loop assign
ns.has_object_variants = True when variant.type == "object", and later check
ns.has_object_variants instead of has_object_variants (references:
has_object_variants, param_spec.oneOf, variant.type).
|
/claude review |
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
This is an examples-only PR (launcher YAMLs + a Jinja chat template + a small Python edit). Reviewed for correctness:
Scope:
- 4 new pipeline YAMLs (gpt-oss-20b/Qwen3-30B-A3B × DFlash/EAGLE3)
- New
chat_template_train.jinjafor gpt-oss-20b (copy of upstream OpenAI Harmony template +{% generation %}tags around assistant content foranswer_only_loss=true) eagle_utils.py: installdata.chat_templateon the tokenizer for streaming mode (correctly mirrors that the streaming dataset callsapply_chat_templatewithout achat_template=kwarg, while the online collator threads it through)- Concurrency 1→32 in 6 Qwen3-8B benchmark configs; max_steps 500→2000 for Kimi-K2.5
Verified:
- Layer-id math matches helper functions:
default_eagle_aux_layer_ids(48)=[1,23,44]anddefault_eagle_aux_layer_ids(24)=[1,11,20];build_target_layer_ids(48,5)=[1,12,23,34,45]andbuild_target_layer_ids(24,5)=[1,6,11,16,21]. Capture IDs in YAML are those +1 plus final layer, as the comments document. SERVE_MAX_MODEL_LENis consumed intrain_eagle_streaming.sh:154✓EXPORT_EXTRA_ARGS="--trust_remote_code"for DFlash is intentional and gated (used identically in existing Kimi DFlash examples; the export script's--trust_remote_codeis opt-in via argparse).data.chat_templatepath resolution matches existing pattern (examples/<Org>/<Model>/chat_template_train.jinjais relative to the script's working dir, same as existing Qwen3-0.6B example).- Tokenizer mutation (
tokenizer.chat_template = chat_template) is fine — one-shot setup, called once per training run.
Notes (not blocking):
- CodeRabbit flagged a Jinja2 scoping bug in
render_typescript_type's{%- set has_object_variants -%}inside a for-loop — that's a real bug, but it's inherited from the upstream OpenAI gpt-oss-20b chat template (this template is copied from there). It only triggers on tool schemas withoneOfcontaining object variants; daring-anteater (the validated training data) has no tool calls, so the bug is dead code in this training context. Worth fixing if/when upstream does, but doesn't block. - PR description's vLLM AL results table confirms all four configs train and pass acceptance-length eval end-to-end.
What does this PR do?
Type of change: new example
Adds streaming speculative-decoding examples (EAGLE3 + DFlash) for gpt-oss-20b and Qwen3-30B-A3B to the ModelOpt launcher, mirroring the existing Qwen3-8B/Kimi examples.
tools/launcher/examples/{openai/gpt-oss-20b,Qwen/Qwen3-30B-A3B}/hf_streaming_{eagle3,dflash}_multi_node.yaml, plus gpt-osschat_template_train.jinja(generation-tagged, foranswer_only_loss).eagle_utils.py: the streaming path now installs a customdata.chat_templateon the tokenizer (the online path already did) — needed for the tagged template.Usage
Testing
Pipeline sanity test on unsynthesized data (daring-anteater), 1× H100-80GB, 12k steps. All four train and pass the vLLM acceptance-length eval:
Before your PR is "Ready for review"
CONTRIBUTING.md: N/ASummary by CodeRabbit
Release Notes
New Features
Enhancements