Popular repositories
-
qlora-finetuning-and-attention (Public)
QLoRA fine-tuning of two small open LLMs (decoder-only and encoder-decoder) for audience-adaptive Q&A, plus from-scratch scaled dot-product attention in PyTorch.
Jupyter Notebook 1
-
customer-service-analytics-agent-mohammad-ghanayim (Public)
Python
-
k8s-distributed-llm-finetuning (Public)
Multi-node PyTorch DDP fine-tuning of a causal LM on Nebius GPU Kubernetes, with SkyPilot workload orchestration: a 2-node torchrun job with verified NCCL collectives.
Python
-
observable-vllm-text2sql-agent (Public)
Text-to-SQL agent on vLLM (Qwen3-30B-A3B) with a LangGraph verify→revise loop, instrumented on two observability planes for metric-grounded SLO diagnosis.
Python
-
gpu-cuda-inference-optimization (Public)
Three measured notebooks on GPU inference optimization: roofline analysis, KV-cache decode optimization (4.21x), and CUDA-graph launch-overhead elimination (5.38x). Pure PyTorch.
Jupyter Notebook