Fast LLM speculative inference server for consumer hardware.
Updated Jun 8, 2026 - C++
Air.rs: 70B+ LLM inference on consumer GPUs, written in Rust.
A light, transparent, and modular inference & quantization engine for studying LLMs.