llm-inference-optimization

Here are 3 public repositories matching this topic...

handdl / efficient-batching

Optimizing LLM throughput via binned padding, sequence packing, and Flash Attention.

batching flash-att llm-inference-optimization

Updated May 11, 2026
Python

Qudsiaamir / efficiently-serving-llms

Production-oriented LLM serving examples covering KV-cache decoding, batching, quantization, LoRA, multi-LoRA, testing, benchmarking, and reproducible MLOps workflows.

python docker benchmarking makefile jupyter-notebook pytorch pytest lora cicd quantization ruff mlops github-actions kv-cache huggingface-transformers llm-inference multi-lora llm-inference-optimization reproducible-ml-workflows

Updated May 26, 2026
Jupyter Notebook

handdl / extreme-offloading

Layer-wise weight offloading with profiling-driven optimizations.

offloading llm-inference-optimization

Updated May 13, 2026
Python
