This repository contains four projects, each delivered as a Jupyter notebook (.ipynb), developed as part of a Machine Learning course. The notebooks cover data analysis, classical machine learning, deep learning, and modern generative AI solutions.
- Exploratory Data Analysis (EDA) & Preprocessing - Data cleaning, missing value imputation, feature engineering, and exploring correlations and distributions using data visualizations (Matplotlib/Seaborn), demonstrated on the Titanic passenger dataset.
- Classical Classification Models - Implementation, training, and evaluation of traditional machine learning algorithms, along with hyperparameter tuning using cross-validation.
- Computer Vision - Designing, compiling, and training a custom Convolutional Neural Network (CNN) in a GPU-accelerated environment for image classification and handwritten digit recognition using the MNIST dataset.
- Natural Language Processing (NLP) & LLMs - Implementation of an advanced RAG (Retrieval-Augmented Generation) system. The code covers document loading and text splitting from PDF files (PyPDFLoader, RecursiveCharacterTextSplitter), generating semantic embeddings (HuggingFaceEmbeddings), creating a FAISS vector store, and integrating it with a Large Language Model (LLM) via the LangChain framework for intelligent question-answering.
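
As a rough illustration of the retrieval idea behind the RAG item above, the sketch below chunks text, embeds it, and returns the chunks closest to a question. It is not the notebook's code: a toy bag-of-words vector stands in for HuggingFaceEmbeddings, a plain sorted list stands in for the FAISS index, fixed-size chunking stands in for recursive splitting, and the helper names (`split_text`, `embed`, `cosine`, `retrieve`) and sample documents are made up for this example.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 80, overlap: int = 20) -> list[str]:
    """Cut text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up sample documents, used only to exercise the functions.
docs = [
    "The Titanic dataset lists passengers, ticket classes and survival.",
    "MNIST contains images of handwritten digits from zero to nine.",
    "FAISS stores embedding vectors and supports nearest neighbour search.",
]
question = "Which dataset has handwritten digits?"
context = retrieve(question, docs, k=1)
print(context[0])
```

In a full RAG pipeline the retrieved chunks would then be passed to the LLM as context alongside the question.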
- Language: Python
- Core ML / Data Science Libraries: Pandas, NumPy, Scikit-Learn, pathlib (Python standard library)
- Deep Learning & NLP: PyTorch, Transformers, LangChain, LangChain-Community, Sentence-Transformers
- Vector Databases: FAISS
- Data Visualization: Matplotlib, Seaborn
- Upload the notebook to your Google Drive or open it directly in Google Colab.
- Upload the Titanic dataset files.
- Run the notebook cells sequentially.
- The notebook includes both code cells and markdown cells that explain each step of the workflow.
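
Before running the cells, it can help to confirm that the dataset files were actually uploaded. The snippet below is a minimal sketch using pathlib; the file names `train.csv` and `test.csv` and the helper `missing_files` are assumptions for illustration, so replace them with the files the notebook actually reads.

```python
from pathlib import Path

# Hypothetical file names; adjust to match the files the notebook loads.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


missing = missing_files(".")
if missing:
    print("Upload these files before running the notebook:", ", ".join(missing))
else:
    print("All dataset files found.")
```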
Model performance metrics are displayed inside the notebook with visualizations and explanations.