rtx-spark

Four public repositories match this topic:

Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090

cuda moe mixture-of-experts edge-ai nvidia-blackwell cuda-graphs inference-runtime gittensor sn74 rtx-spark

Native C++/CUDA and CuTe DSL kernel library for edge MoE inference: flash decode, sync-free GroupGEMM+SwiGLU, head_dim=512 attention

cuda moe cutlass edge-ai flash-attention nvidia-blackwell cute-dsl gittensor grouped-gemm rtx-spark

Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation

benchmarking cuda moe edge-ai llm-inference nvidia-blackwell gittensor sn74 rtx-spark

Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch

cuda moe edge-ai unified-memory llm-inference nvidia-blackwell inference-runtime gittensor sn74 rtx-spark
