likhith-v1/README.md

Hi, I'm Likhith 👋

AI/ML engineer-in-training building inference systems from scratch. B.Tech CS (AI & ML), Jain University, Bengaluru — graduating 2027.

I care about understanding systems down to the metal rather than gluing together APIs. That shows up as a preference for local-first, privacy-first, zero-external-dependency engineering across the whole pipeline, from data prep through inference serving, aimed at industrial environments where data never leaves the perimeter.

🔧 Currently building — inferd

A from-scratch LLM inference server, and my main portfolio project:

  • QLoRA fine-tuning pipeline on Qwen3-27B for domain adaptation
  • Speculative decoding with rejection sampling: the draft model is distilled via sequence-level knowledge distillation (KD) on the fine-tuned target's own generations
  • Paged KV-cache implemented in Triton
  • Continuous batching: benchmarked throughput scales ~6x going from one to eight concurrent users
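
For a feel of the rejection-sampling step in speculative decoding, here is a minimal toy sketch over plain Python lists. It is not inferd's actual code: `speculative_accept` and its arguments are hypothetical names, and a real server would run this on batched GPU tensors. It shows the standard rule: accept a drafted token with probability min(1, p/q), otherwise resample from the renormalized residual max(0, p - q), which keeps the output distributed exactly as the target model.

```python
import random


def speculative_accept(p_target, q_draft, token, rng):
    """Accept or replace one drafted token (hypothetical helper).

    p_target, q_draft: probability lists over the same vocabulary.
    token: index that was sampled from q_draft.
    Returns (output_token, accepted_flag).
    """
    # Accept the draft token with probability min(1, p/q).
    ratio = p_target[token] / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True

    # Rejected: resample from the residual max(0, p - q), renormalized.
    # A rejection implies p < q at `token`, so the residual mass is positive.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(weights)), weights=weights)[0], False


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # toy target distribution (assumed values)
    q = [0.2, 0.2, 0.6]  # toy draft distribution (assumed values)
    drafted = rng.choices(range(3), weights=q)[0]
    print(speculative_accept(p, q, drafted, rng))
```

Sampling many drafted tokens from `q` and passing them through this function should reproduce the frequencies in `p`, which is what makes the scheme exact rather than approximate.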

🗂️ Other work

  • Predictive Ghost-Text Daemon (ideation stage): always-on autocomplete via a resident Unix-socket inference process, with a sub-100 ms latency budget, prefix caching, and shell-history personalization through a fine-tuning pipeline with secrets-scrubbing
  • prr: a self-hosted code-review bot that pairs an Ollama model with ruff, mypy, and bandit, normalizes their results through a typed Pydantic schema, and posts inline PR comments on GitHub
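
The idea of normalizing findings from several tools into one typed record can be sketched roughly as follows. This is an illustration only, not prr's code: it uses a standard-library dataclass as a stand-in for the Pydantic schema, and the `Finding` fields, the `from_linter` helper, and the raw input keys (`file`, `line`, `code`, `msg`) are all hypothetical simplifications rather than the real ruff, mypy, or bandit output formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One normalized review finding (hypothetical schema)."""
    tool: str
    path: str
    line: int
    rule: str
    message: str


def from_linter(tool, raw):
    """Map a simplified, assumed raw dict onto the shared schema."""
    return Finding(
        tool=tool,
        path=raw["file"],
        line=int(raw["line"]),  # coerce, since tools may emit strings
        rule=raw["code"],
        message=raw["msg"].strip(),
    )


def group_by_location(findings):
    """Group findings by (path, line), the unit an inline comment targets."""
    grouped = {}
    for f in sorted(findings, key=lambda f: (f.path, f.line, f.tool)):
        grouped.setdefault((f.path, f.line), []).append(f)
    return grouped


if __name__ == "__main__":
    raw_results = [
        ("ruff", {"file": "app.py", "line": 3, "code": "F401", "msg": "unused import "}),
        ("bandit", {"file": "app.py", "line": "3", "code": "B101", "msg": "assert used"}),
    ]
    findings = [from_linter(tool, raw) for tool, raw in raw_results]
    for location, items in group_by_location(findings).items():
        print(location, [f.rule for f in items])
```

Once every tool speaks the same record type, downstream steps such as prompting the model or posting one comment per line only need to handle a single shape.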

🧠 Skills

Python · Java · SQL · PyTorch · TensorFlow · scikit-learn · XGBoost · CUDA · Triton · MLX · LoRA/QLoRA · speculative decoding · paged KV-cache · continuous batching · RAG (FAISS/ChromaDB) · FastAPI · Docker · Ollama · Jenkins · Git/GitHub Actions

Theory foundation: constrained optimization, KKT conditions, multi-armed bandits.

🎯 What I'm looking for

AI/ML engineering internships with a path to full-time — especially roles where inference performance, model serving, or applied ML infrastructure are the actual job, not an afterthought.

📫 Reach me


Pinned

  1. likhith-v1 (Public)

    AI/ML engineer — inference optimization, fine-tuning, industrial AI. This repo powers my profile README.

  2. inferd (Public)

    Local-first LLM stack on a single RTX 5090: QLoRA fine-tuning, exact speculative decoding, paged KV-cache, and continuous batching — served via FastAPI with a live React dashboard.

    Python 2

  3. prr (Public)

    A local Python code-review CLI that pairs ruff, mypy, and bandit with an Ollama model — review files, scan projects, and post inline GitHub PR comments.

    Python 1

  4. gemma4-mlx-finetune (Public)

    Python