Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090
Updated Jun 22, 2026 - C++
Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation
Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch