sn74

Here are 3 public repositories matching this topic...

Sync-free MoE dispatch engine with CUDA-graph-safe routing for Qwen3.5-35B and Gemma4 on RTX Spark and RTX 5090

cuda moe mixture-of-experts edge-ai nvidia-blackwell cuda-graphs inference-runtime gittensor sn74 rtx-spark

Reproducible MoE inference benchmarks for RTX Spark and RTX 5090: flash decode, grouped GEMM, end-to-end generation

benchmarking cuda moe edge-ai llm-inference nvidia-blackwell gittensor sn74 rtx-spark

Edge AI inference runtime: scheduler, memory manager, CUDA graph engine, KV cache, MoE dispatch

cuda moe edge-ai unified-memory llm-inference nvidia-blackwell inference-runtime gittensor sn74 rtx-spark
