perf(tokenizer): retain only decode lookback tokens in DecodeStream #9204
tangcy98 wants to merge 1 commit into ai-dynamo:main
Signed-off-by: tangcy98 <[email protected]>
Overview:

This PR reduces `DecodeStream` initialization overhead for long prompts by retaining only the prompt-token lookback window needed for incremental detokenization.

Previously, `DecodeStream::new` copied the full prompt token list into its internal buffer, even though subsequent decoding only reads from `prompt_len - INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET` onward. For long ISL requests, this made the detokenizer keep an unnecessary O(ISL) copy of the prompt tokens.

This change keeps only the retained decode window, reducing the initial `DecodeStream` prompt-token copy from O(ISL) to O(5), while preserving the exact token slices used by `DecodeStream::step`.

Details:

- Derive the retained window from `INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET`.
- Initialize `DecodeStream::all_token_ids` with only `prompt_token_ids[retained_start..]`.
- Rebase `prefix_offset` and `read_offset` to local offsets within the retained buffer.
- `skip_special_tokens` behavior is unchanged.

This does not change the `PreprocessedRequest.token_ids` path used by the engine/router. It only avoids an extra full-prompt copy inside the detokenizer state.

Where should the reviewer start?

Start with:

- `lib/tokenizers/src/lib.rs`

The key logic is in `DecodeStream::new`. The important invariant is that the slices passed to `decode()` in `DecodeStream::step()` are content-equivalent to those in the previous implementation, just indexed relative to the retained lookback window.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
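
The invariant described above can be illustrated with a small standalone sketch. This is not the code from `lib/tokenizers/src/lib.rs`: the `Window` type, its `full` and `retained` constructors, and the `LOOKBACK` value of 5 (inferred from the "O(5)" figure) are assumptions made for illustration, and no real tokenizer is involved. It only shows that the two slices a step would pass to `decode()` have the same contents whether the buffer holds the whole prompt or just the retained tail.

```rust
/// Assumed lookback size; stands in for INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET.
const LOOKBACK: usize = 5;

/// Hypothetical stand-in for the token-buffer part of the decode stream state.
#[derive(Debug)]
struct Window {
    all_token_ids: Vec<u32>,
    prefix_offset: usize,
    read_offset: usize,
}

impl Window {
    /// Old-style initialization: copy the whole prompt, keep global offsets.
    fn full(prompt: &[u32]) -> Self {
        let n = prompt.len();
        Window {
            all_token_ids: prompt.to_vec(),
            prefix_offset: n.saturating_sub(LOOKBACK),
            read_offset: n,
        }
    }

    /// Windowed initialization: copy only the tail, rebase offsets to it.
    fn retained(prompt: &[u32]) -> Self {
        // saturating_sub keeps prompts shorter than LOOKBACK from underflowing.
        let retained_start = prompt.len().saturating_sub(LOOKBACK);
        let tail = &prompt[retained_start..];
        Window {
            all_token_ids: tail.to_vec(),
            prefix_offset: 0,
            read_offset: tail.len(),
        }
    }

    /// Append a newly generated token id.
    fn push(&mut self, id: u32) {
        self.all_token_ids.push(id);
    }

    /// The two slices a step would hand to decode(): the already-read prefix
    /// and the prefix plus any newly appended tokens.
    fn slices(&self) -> (&[u32], &[u32]) {
        (
            &self.all_token_ids[self.prefix_offset..self.read_offset],
            &self.all_token_ids[self.prefix_offset..],
        )
    }
}
```

With a 1000-token prompt, `Window::retained` holds `LOOKBACK` ids instead of 1000, and `slices()` returns the same contents as `Window::full` both before and after a `push`. The sketch does not model how the offsets advance after a successful decode.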