
AmeliaSroczynska/Machine_Learning_Course


Machine Learning Course - Titanic

This repository contains four projects, each completed as a Jupyter notebook (.ipynb), developed as part of a Machine Learning course. Together they cover data analysis, classical machine learning, deep learning, and modern generative AI.

What's inside the repository?

  • Exploratory Data Analysis (EDA) & Preprocessing - Data cleaning, missing-value imputation, feature engineering, and exploration of correlations and distributions with Matplotlib/Seaborn visualizations, demonstrated on the Titanic passenger dataset.
  • Classical Classification Models - Implementation, training, and evaluation of traditional machine learning algorithms, along with hyperparameter tuning using cross-validation.
  • Computer Vision - Designing, compiling, and training a custom Convolutional Neural Network (CNN) in a GPU-accelerated environment for image classification, applied to handwritten digit recognition on the MNIST dataset.
  • Natural Language Processing (NLP) & LLMs - Implementation of a RAG (Retrieval-Augmented Generation) system: loading PDF documents and splitting them into chunks (PyPDFLoader, RecursiveCharacterTextSplitter), generating semantic embeddings (HuggingFaceEmbeddings), indexing them in a FAISS vector store, and connecting the retriever to a Large Language Model (LLM) via the LangChain framework to answer questions using the retrieved document content.
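As an illustration of the preprocessing steps in the first bullet, here is a minimal pandas sketch. It assumes the column names of the commonly used Kaggle Titanic CSV (Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked) and uses a few hypothetical hand-made rows instead of the real file. The imputation choices (median age per passenger class, most frequent port) and the FamilySize/IsAlone features are common conventions, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the assumed Kaggle Titanic column names (not real data).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass":   [3, 1, 3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "female", "male", "male"],
    "Age":      [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "SibSp":    [1, 1, 0, 1, 0, 0],
    "Parch":    [0, 0, 0, 0, 0, 1],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 51.86],
    "Embarked": ["S", "C", "S", "S", None, "S"],
})

# Missing-value imputation: median age within each passenger class,
# most frequent value for the port of embarkation.
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("median"))
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Feature engineering: family size and a "travelling alone" flag.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Encode categorical columns numerically so they can enter correlations and models.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked", dtype=int)

# Correlation of every feature with the target, strongest first.
correlations = df.corr()["Survived"].drop("Survived").sort_values(ascending=False)
print(correlations)
```

On the real data, the hand-made DataFrame would be replaced by `pd.read_csv` on the uploaded training file, and the full correlation matrix can be visualized with `seaborn.heatmap(df.corr())`.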
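For the classical-models bullet, hyperparameter tuning with cross-validation in scikit-learn could be sketched as follows. The synthetic dataset from `make_classification`, the two candidate models, and their parameter grids are assumptions chosen for illustration; the notebook's actual algorithms and grids may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption: stands in for the real features).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with illustrative parameter grids.
candidates = {
    "logreg": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (model, grid) in candidates.items():
    # Grid search scores every parameter combination with 5-fold cross-validation.
    search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    # Store best parameters, mean CV accuracy, and accuracy on the held-out test split.
    results[name] = (search.best_params_, search.best_score_, search.score(X_test, y_test))

for name, (params, cv_acc, test_acc) in results.items():
    print(f"{name}: {params} CV={cv_acc:.3f} test={test_acc:.3f}")
```

Keeping the test split outside `GridSearchCV` means the reported test accuracy is not biased by the parameter search itself.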
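A compact PyTorch CNN for 28x28 grayscale MNIST digits could be structured as below. The layer sizes, optimizer, and learning rate are assumptions, and a random tensor batch stands in for a real MNIST `DataLoader` so the snippet runs without downloading data. The device check selects a CUDA GPU when one is available (for example, a Colab GPU runtime) and falls back to CPU otherwise.

```python
import torch
from torch import nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class SmallCNN(nn.Module):
    """Two conv blocks followed by a small classifier head (illustrative sizes)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 64x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(128, num_classes),                  # raw logits, one per digit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallCNN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch shaped like MNIST: (batch, channels, height, width).
images = torch.randn(16, 1, 28, 28, device=device)
labels = torch.randint(0, 10, (16,), device=device)

model.train()
optimizer.zero_grad()
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print(tuple(logits.shape), loss.item())
```

In a full training loop, this step would run over every batch of an MNIST `DataLoader` for several epochs, followed by evaluation under `model.eval()` and `torch.no_grad()`.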
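The RAG pipeline in the last bullet follows the stages split, embed, index, retrieve, and prompt. The dependency-light sketch below mirrors those stages with deliberately simplified stand-ins: a paragraph-based splitter with a fixed-size fallback instead of `RecursiveCharacterTextSplitter`, a hashed bag-of-words vector instead of `HuggingFaceEmbeddings`, and brute-force cosine similarity in NumPy instead of a FAISS index. It is not the LangChain API; the function names and the sample text are hypothetical, and the final LLM call is left out.

```python
import re
import zlib

import numpy as np


def split_text(text: str, chunk_size: int = 120, overlap: int = 20) -> list[str]:
    """Split on blank lines; cut long paragraphs into overlapping fixed-size windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks = []
    step = chunk_size - overlap
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if len(para) <= chunk_size:
            chunks.append(para)
        else:
            chunks.extend(para[i:i + chunk_size] for i in range(0, len(para) - overlap, step))
    return chunks


def embed(text: str, dim: int = 512) -> np.ndarray:
    """Toy embedding: hashed bag-of-words counts, L2-normalised (hypothetical stand-in)."""
    vec = np.zeros(dim)
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def build_index(chunks: list[str]) -> np.ndarray:
    """Stack chunk embeddings into a matrix (plays the role of the vector store)."""
    return np.vstack([embed(c) for c in chunks])


def retrieve(query: str, chunks: list[str], index: np.ndarray, k: int = 2) -> list[str]:
    """Rows are unit vectors, so the dot product equals cosine similarity."""
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble retrieved context and the question into one prompt for the LLM."""
    context = "\n---\n".join(context_chunks)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


document = (
    "Convolutional neural networks learn spatial filters for image classification.\n\n"
    "FAISS is a library for efficient similarity search over dense vectors.\n\n"
    "Cross-validation estimates how well a model generalizes to unseen data."
)
chunks = split_text(document)
index = build_index(chunks)
question = "Which library does similarity search over vectors?"
prompt = build_prompt(question, retrieve(question, chunks, index, k=1))
print(prompt)
```

In the LangChain version described above, the same roles are played by the text splitter, the embedding model, the FAISS vector store's retriever, and the LLM that receives the assembled prompt.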

Tech Stack

  • Language: Python
  • Core ML / Data Science Libraries: Pandas, NumPy, Scikit-Learn (plus Pathlib from the Python standard library for file handling)
  • Deep Learning & NLP: PyTorch, Transformers, LangChain, LangChain-Community, Sentence-Transformers
  • Vector Databases: FAISS
  • Data Visualization: Matplotlib, Seaborn

How to Use in Google Colab

  1. Upload the notebook to your Google Drive, or open it directly in Google Colab.
  2. Upload the Titanic dataset files to the Colab session.
  3. Run the notebook cells in order.
  4. Each notebook combines code cells with markdown cells that explain every step of the workflow.

Results

Model performance metrics are displayed inside each notebook, together with visualizations and explanations.
