DNM - dsr1 Mi355x test #1283
…transformers v5

Transformers v5 incorrectly rebuilds the pre_tokenizer/decoder components for models like DeepSeek-R1 that use LlamaTokenizerFast with a non-Llama tokenizer architecture. The sglang server fixes this at startup, but the benchmark client loads the tokenizer without these fixes, causing a ~5x token count inflation (e.g. 7000 tokens -> 35000 tokens) and false performance regressions in TTFT and throughput benchmarks.

Apply the same tokenizer fixes (pre_tokenizer/decoder restoration and add_bos_token recovery) that the sglang server applies, so client and server tokenize identically. No-op on transformers v4.

Made-with: Cursor
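
For readers following along, here is a minimal sketch of the two ideas in the commit message: restoring the `pre_tokenizer`/`decoder` components from a reference tokenizer, and measuring client-versus-server token count inflation. This is not the sglang implementation or the code in this PR. The helper names `restore_components` and `inflation_ratio` are hypothetical, and the objects are plain stand-ins rather than real tokenizer backends.

```python
from types import SimpleNamespace


def restore_components(backend, reference):
    """Copy pre_tokenizer and decoder from `reference` onto `backend`.

    Hypothetical helper: both arguments are assumed to expose
    `pre_tokenizer` and `decoder` attributes. Returns the names of
    the attributes that were replaced.
    """
    changed = []
    for name in ("pre_tokenizer", "decoder"):
        ref_value = getattr(reference, name, None)
        # Skip components the reference does not define, and ones
        # that already point at the reference object.
        if ref_value is not None and getattr(backend, name, None) is not ref_value:
            setattr(backend, name, ref_value)
            changed.append(name)
    return changed


def inflation_ratio(client_tokens, server_tokens):
    """Return how many times larger the client token count is."""
    if server_tokens <= 0:
        raise ValueError("server_tokens must be positive")
    return client_tokens / server_tokens


# Stand-ins for a rebuilt (incorrect) backend and the original components.
rebuilt = SimpleNamespace(pre_tokenizer="rebuilt", decoder="rebuilt")
original = SimpleNamespace(pre_tokenizer="original", decoder="original")

first_pass = restore_components(rebuilt, original)
second_pass = restore_components(rebuilt, original)  # nothing left to fix
ratio = inflation_ratio(35000, 7000)  # figures from the commit message
```

With a real tokenizer, one plausible source for the reference components would be the model's `tokenizer.json` loaded through the `tokenizers` library, but that wiring is an assumption and is not shown here. Comparing the client and server token counts for the same prompt before trusting TTFT or throughput numbers is a cheap way to catch this class of mismatch.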
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25395388503
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25397018357
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25399743027
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25401398398
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25402882560
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25403887062
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25409238836
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25413029415
4eab18e to b8b305b
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25464374364
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25466303569
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25467958241
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25468709876
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25469344105
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25471538848
No description provided.