pagedattention
Here are 12 public repositories matching this topic...
LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving
Updated Apr 24, 2026 - Python
An Efficient and Versatile Inference Engine for Distributed LLM Serving
Updated Jul 5, 2026 - Python
Why is LLM inference slow — and how do you make it fast? A hands-on, first-principles course: roofline → KV cache → quantization → parallelism → vLLM/SGLang, with GPU labs on open models.
Updated Jul 1, 2026 - Python
A deterministic PyTorch autograd verification trap for catching silent KV-cache routing and block-alignment failures in vLLM and SGLang serving infrastructure.
Updated Jun 7, 2026 - Python
LLM inference kernels from scratch in Triton: KV cache, FlashAttention, PagedAttention, RMSNorm, RoPE, SwiGLU, and benchmarks.
Updated Jun 18, 2026 - Python
Deadline-aware KV-cache scheduling for protecting decode-critical request-state under long-context LLM inference pressure.
Updated Jun 19, 2026 - Python
Empirical benchmarking harness mapping the boundary between compute saturation and PagedAttention KV-cache preemption cascades under streamed test-time compute scaling.
Updated Jun 26, 2026 - Python
A minimal LLM inference engine implementing PagedAttention-style KV cache management on NanoGPT. Based on the "Efficient Memory Management for Large Language Model Serving with PagedAttention" paper.
Updated Apr 16, 2026 - Jupyter Notebook
What to consider when running AI Inference at scale on Kubernetes
Updated Jul 5, 2026
LLM inference serving prototype with continuous batching, paged KV-cache management, benchmark telemetry, and C++/ONNX Runtime gRPC backend scaffold.
Updated Jun 29, 2026 - Python
From-scratch model of an LLM serving engine's systems core: paged KV-cache, continuous batching, preemption, and prefix caching — GPU-free, with reproducible benchmarks.
Updated May 30, 2026 - Python