MGhanayim

Popular repositories

qlora-finetuning-and-attention Public

QLoRA fine-tuning of two small open LLMs (decoder-only and encoder-decoder) for audience-adaptive Q&A, plus from-scratch scaled dot-product attention in PyTorch.

Jupyter Notebook 1
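The scaled dot-product attention mentioned in the description above can be illustrated with a minimal, dependency-free sketch. It is not the repository's PyTorch code: it uses only the Python standard library, and the function names `softmax` and `scaled_dot_product_attention`, the list-of-lists layout, and the toy inputs are all assumptions made for illustration. It computes softmax(QKᵀ / sqrt(d_k))·V for one unbatched, unmasked head.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Hypothetical helper: single-head attention on list-of-lists matrices.

    q: (n_queries x d_k), k: (n_keys x d_k), v: (n_keys x d_v).
    Returns (output, weights) where weights is (n_queries x n_keys).
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)  # the "scaled" part of the name

    weights = []
    for q_row in q:
        # Dot product of this query with every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights.append(softmax(scores))

    # Each output row is the attention-weighted sum of the value rows.
    d_v = len(v[0])
    output = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(d_v)]
        for w_row in weights
    ]
    return output, weights


# Toy inputs (assumed, not taken from the repository).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_product_attention(q, k, v)
```

Each row of `attn` sums to 1, and each query here puts the most weight on the key it matches. A PyTorch version would typically express the same steps with batched tensor operations and an optional mask.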
customer-service-analytics-agent-mohammad-ghanayim Public

Python
k8s-distributed-llm-finetuning Public

Multi-node PyTorch DDP fine-tuning of a causal LM on Nebius GPU Kubernetes, with SkyPilot workload orchestration: a 2-node torchrun job with verified NCCL collectives.

Python
observable-vllm-text2sql-agent Public

Text-to-SQL agent on vLLM (Qwen3-30B-A3B) with a LangGraph verify→revise loop, instrumented on two observability planes for metric-grounded SLO diagnosis.

Python
gpu-cuda-inference-optimization Public

Three measured notebooks on GPU inference optimization: roofline analysis, KV-cache decode optimization (4.21x), and CUDA-graph launch-overhead elimination (5.38x). Pure PyTorch.

Jupyter Notebook