Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090
Updated Jun 22, 2026 - C++
Native C++/CUDA and CuTe DSL kernel library for edge MoE inference: flash decode, sync-free GroupGEMM+SwiGLU, head_dim=512 attention
Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation
Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch