Going deep on the layer below the model: LLM serving engines, KV-cache and attention internals, and GPU kernels, all built from scratch.
- 🌐 jvoltci.github.io: the climb, and the log
- 📚 Mosaic: my open course on AI systems, ML compilers, and inference (7 tracks)
- 🛠 Currently building: a from-scratch LLM inference engine (mini-vLLM). Benchmarks soon.
