chunked-prefill

Here are 3 public repositories matching this topic...

An Efficient and Versatile Inference Engine for Distributed LLM Serving

A lightweight, educational LLM inference engine for studying continuous batching, paged KV cache, chunked prefill, and online serving.

A minimal, native Metal inference engine for Qwen3-30B-A3B on Apple Silicon Macs.

c macos metal objective-c transformer moe quantization inference-engine apple-silicon local-llm llm-inference gguf qwen3 chunked-prefill metal-kernels
