vinhnx0/README.md

Hi, I'm Xuân Vinh👋

AI Engineer / Data Science graduate focused on LLM applications, RAG systems, and practical AI solutions.
I build modular AI systems that combine retrieval, reasoning, and scalable data pipelines to solve real-world information problems.

  • Interested in: GenAI, Agentic RAG, NLP, AI Systems Engineering
  • Focused on building portfolio-driven, production-oriented AI projects
  • Currently exploring retrieval optimization, metadata-aware search, and evaluation for LLM systems

About Me

  • Built modular Financial RAG systems for SEC 10-K document QA with citation-backed retrieval
  • Experienced with hybrid retrieval, reranking, vector databases, and agentic workflows
  • Strong interest in practical AI engineering, evaluation pipelines, and scalable GenAI applications
  • Passionate about turning research concepts into deployable AI products

Featured Projects

AI Document Intelligence System (Agentic RAG)

Tech: Python, Qdrant, SentenceTransformers, FastAPI, LLMs, Docker

  • Built a modular Financial RAG pipeline for SEC 10-K document QA with citation-backed answers over Apple filings (2021–2025)
  • Implemented metadata-aware retrieval, cross-encoder reranking, and agentic retrieval workflows using Query Planner and Evidence Sufficiency Checker
  • Improved retrieval quality from Hit@1 0.54 → 0.88 and MRR 0.69 → 0.93 on a set of 52 financial QA queries, at ~4.9 s average latency
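
For readers unfamiliar with the two metrics above, here is a minimal sketch of how Hit@1 and MRR are typically computed. It is not the project's actual evaluation code: the function names, the `runs` structure, and the chunk ids are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import Sequence


def hit_at_k(ranked_ids: Sequence[str], relevant_ids: set[str], k: int = 1) -> float:
    """Return 1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """Return 1/rank of the first relevant id, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Sequence[tuple[Sequence[str], set[str]]]) -> dict[str, float]:
    """Average Hit@1 and reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    return {
        "hit@1": sum(hit_at_k(ranked, rel, k=1) for ranked, rel in runs) / n,
        "mrr": sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n,
    }


# Hypothetical toy data: three queries with made-up chunk ids.
runs = [
    (["c1", "c2", "c3"], {"c1"}),  # relevant chunk at rank 1
    (["c4", "c5", "c6"], {"c5"}),  # relevant chunk at rank 2
    (["c7", "c8", "c9"], {"c0"}),  # relevant chunk not retrieved
]
metrics = evaluate(runs)
```

On this toy data one of three queries has its relevant chunk at rank 1, so Hit@1 is 1/3, and MRR is (1 + 1/2 + 0) / 3 = 0.5.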

AI Academic Guidance and Course Planning System

Tech: Python, BM25, Qdrant, Streamlit, Knowledge Graphs, ILP Solver

  • Built an academic guidance chatbot combining GenAI-based RAG for course lookup and algorithmic reasoning for curriculum planning
  • Developed a Hybrid RAG pipeline using BM25 + Qdrant, improving course lookup accuracy from 29% → 76%
  • Designed an algorithmic reasoning engine using Knowledge Graphs and ILP solvers to generate valid multi-year study plans with 100% prerequisite constraint accuracy
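
The README does not say how the BM25 and Qdrant result lists are combined. One common option for hybrid retrieval is reciprocal rank fusion (RRF), sketched below under that assumption. The function name, the constant `k=60`, and the course ids are illustrative and not taken from the project.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists into one ranking using RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one course-lookup query.
bm25_hits = ["CS101", "CS201", "MA101"]   # lexical (BM25) ranking
dense_hits = ["CS201", "CS301", "CS101"]  # vector-search ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Here `CS201` ranks first after fusion because it sits near the top of both lists, while courses that appear in only one list fall behind those found by both retrievers.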

Tech Stack

Programming

Python, SQL

AI / ML

NLP, OCR, LLMs, RAG, Machine Learning, Deep Learning

Frameworks & Libraries

PyTorch, TensorFlow/Keras, scikit-learn, SentenceTransformers

Data & Infrastructure

Qdrant, PostgreSQL, Docker, Apache Kafka

Tools

Git, FastAPI, Flask, Streamlit, REST APIs


Currently Working On

  • Agentic RAG architectures for financial document intelligence
  • Retrieval evaluation and metadata-aware search optimization
  • LLM orchestration and low-latency AI pipelines
  • Production-ready GenAI applications with scalable backend systems

Profile Direction

Building practical AI systems that combine retrieval, reasoning, and scalable data engineering for real-world applications.

Pinned

  1. Agentic-RAG-Document-Intelligence (Public)

    Production-style Agentic RAG system for document intelligence, built with a modular pipeline from ingestion to retrieval, focusing on structured data processing and scalable LLM applications.

    Python

  2. IUStudyGuide (Public)

    An academic guidance system featuring a dual-mind architecture: an LLM-powered RAG engine for natural document retrieval and an algorithm-based solver (ILP + Knowledge Graph) for hallucination-free…

    Python

  3. 23092003e/Streaming-Fraud-Detection (Public)

    A real-time, end-to-end fraud detection system built with Python, Apache Kafka, PySpark, and PostgreSQL. A live dashboard for reporting and analytics is built with Power BI. All components are containe…

    Jupyter Notebook 2

  4. End-to-End-Fraud-Detection-Data-ML-Pipeline (Public)

    Production-grade end-to-end fraud detection pipeline leveraging a Medallion Data Lake architecture (Bronze/Silver/Gold), PySpark-based data processing, and machine learning for real-time prediction…

    Python