I'm an MS in Artificial Intelligence student at Northeastern University (Khoury College), working at the intersection of foundation models, multimodal learning, and applied AI systems. Most of my recent work has been on adapting large pretrained models to new modalities and domains: vision transformers for medical imaging during my undergrad, and audio/language foundation models during my MS.
- CALM (Conformer Audio-Language Model): a cross-modal fusion architecture that pairs Gemma 4's audio Conformer with its text encoder via bidirectional attention. Reaches 83.9% on FMA-Medium 16-genre classification with ~4M trainable parameters, +9.1 pp over the best audio-only baseline. Currently extending it toward a paper submission.
- Local LLM serving infrastructure: a home GPU server (RTX 5060 Ti, Ubuntu) reachable over Tailscale, running Gemma and other open models via Ollama and llama.cpp for offline development workflows.
- Coronary artery segmentation with SAM2: fine-tuned SAM2 with two parameter-efficient fine-tuning (PEFT) approaches, reaching a 92.0% Dice score, and deployed it as an interactive Hugging Face Space. (Live demo)
Foundation models, multimodal learning, and systems that actually run in the real world. I lean toward research that ends in a working artifact, not just a paper, but I take papers seriously too: my undergrad work on transformer-based atrial fibrillation detection (ResFormer) was presented at ICIOT'25.
MS at Northeastern (Boston). Open to research assistant, co-op, and collaborator roles in applied ML, multimodal AI, and AI systems integration.
Python, PyTorch, Hugging Face, gRPC, Docker, Linux, SLURM/HPC (Northeastern Explorer cluster). Comfortable with Swift, TypeScript, and the local LLM stack (Ollama, llama.cpp, Continue). W&B for experiment tracking.
- 🤗 Hugging Face
- 💼 LinkedIn

