Adilforest/ml-assignment-1

Machine Learning — Assignment 1 (AITU)

Classification of student academic performance using Decision Tree, Random Forest, and K-Nearest Neighbours on a real student performance dataset (1,000,000 records).

Stack: Python, Jupyter, scikit-learn, pandas


Overview

Dataset: student_performance.csv — 1,000,000 student records with four behavioural/academic features and a five-class grade label (A / B / C / D / F).

| Feature | Description |
| --- | --- |
| weekly_self_study_hours | Hours spent studying outside class |
| attendance_percentage | Percentage of classes attended |
| class_participation | Participation score |
| total_score | Composite academic score |
| grade | Target label: A / B / C / D / F |
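The notebook works on a stratified 100k sample of the full CSV. A minimal sketch of how such a sample could be drawn with pandas is shown here. The helper names `make_synthetic_students` and `stratified_sample`, the synthetic values, and the grade cut-offs are assumptions for illustration only; the notebook's actual sampling code may differ.

```python
import numpy as np
import pandas as pd


def make_synthetic_students(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for student_performance.csv (same column names, invented values)."""
    rng = np.random.default_rng(seed)
    total_score = rng.uniform(0, 100, n_rows)
    # Assumed grade cut-offs, chosen only so that all five classes appear.
    grade = pd.cut(
        total_score,
        bins=[-np.inf, 50, 60, 70, 85, np.inf],
        labels=["F", "D", "C", "B", "A"],
    ).astype(str)
    return pd.DataFrame({
        "weekly_self_study_hours": rng.uniform(0, 40, n_rows),
        "attendance_percentage": rng.uniform(50, 100, n_rows),
        "class_participation": rng.uniform(0, 10, n_rows),
        "total_score": total_score,
        "grade": grade,
    })


def stratified_sample(df: pd.DataFrame, label_col: str, n_total: int, seed: int = 42) -> pd.DataFrame:
    """Sample about n_total rows while keeping each class's share of the data."""
    frac = n_total / len(df)
    # Sampling the same fraction inside every class preserves class proportions.
    return df.groupby(label_col, group_keys=False).sample(frac=frac, random_state=seed)


# With the real file this would start from: full = pd.read_csv("student_performance.csv")
full = make_synthetic_students(10_000)
sample = stratified_sample(full, "grade", n_total=1_000)

print(full["grade"].value_counts(normalize=True).sort_index().round(3))
print(sample["grade"].value_counts(normalize=True).sort_index().round(3))
```

Because each class is rounded separately, the sample size can differ from `n_total` by a few rows.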

Goal: Build and compare multi-class classifiers that predict a student's grade category from these features.
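As a rough illustration of this goal, the sketch below encodes the grade label, scales the four features, makes an 80/20 stratified split, and fits the three classifiers named in this README. The data is synthetic (column names follow the table above, values and grade cut-offs are invented), and fitting the scaler on the training split only is an assumption about ordering, not a statement of what the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["weekly_self_study_hours", "attendance_percentage",
            "class_participation", "total_score"]

# Synthetic stand-in data: values and grade cut-offs are invented for illustration.
rng = np.random.default_rng(0)
n_rows = 2_000
df = pd.DataFrame({
    "weekly_self_study_hours": rng.uniform(0, 40, n_rows),
    "attendance_percentage": rng.uniform(50, 100, n_rows),
    "class_participation": rng.uniform(0, 10, n_rows),
    "total_score": rng.uniform(0, 100, n_rows),
})
df["grade"] = pd.cut(df["total_score"], bins=[-np.inf, 50, 60, 70, 85, np.inf],
                     labels=["F", "D", "C", "B", "A"]).astype(str)

# Encode string grades as integers (LabelEncoder sorts classes alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(df["grade"])
X = df[FEATURES].to_numpy()

# 80/20 split that keeps the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Fit the scaler on the training part only, then apply it to both parts.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test_s), average="weighted")

for name, score in scores.items():
    print(f"{name}: weighted F1 = {score:.4f}")
```

Scores on this synthetic data say nothing about the real dataset; the sketch only shows the shape of the comparison.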


Notebook structure

Assignment1_Adil_Ormanov.ipynb — fully executed notebook with all cell outputs saved:

| Section | Content |
| --- | --- |
| 1. Imports | Libraries, global settings |
| 2. Load Data | Read full CSV; draw stratified 100k sample |
| 3. EDA | Descriptive stats, missing-value check, histograms, correlation heatmap, box plots by grade |
| 4. Preprocessing | LabelEncoder for grade target; StandardScaler; 80/20 stratified split |
| 5. Model Training | DecisionTreeClassifier, RandomForestClassifier (100 trees), KNeighborsClassifier (k=5) |
| 6. Evaluation | classification_report per model; confusion-matrix plots; grouped bar chart comparison |
| 7. Split Sensitivity | Metrics vs. test-size ratio (20% to 40%), line plots per model |
| 8. Feature Importance | Horizontal bar charts for Decision Tree and Random Forest |
| 9. Export | Results saved to student_performance_results.csv |
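The split-sensitivity, feature-importance, and export steps could look roughly like the sketch below. It is not the notebook's code: the data comes from `make_classification` as a stand-in with four features and five classes, and the five test sizes (0.20 to 0.40 in steps of 0.05) are an assumption, since this README only gives the 20% to 40% range.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 4 features, 5 classes (not the real student dataset).
X, y = make_classification(
    n_samples=1_500, n_features=4, n_informative=3, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0)

TEST_SIZES = [0.20, 0.25, 0.30, 0.35, 0.40]  # assumed grid


def make_models():
    """Fresh, unfitted models for every split."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    }


rows = []
for test_size in TEST_SIZES:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=42)
    for name, model in make_models().items():
        # The pipeline refits the scaler on each training split.
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="weighted", zero_division=0)
        rows.append({
            "model": name, "test_size": test_size,
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision, "recall": recall, "f1": f1,
        })

results = pd.DataFrame(rows)
print(results.round(4))
# To export: results.to_csv("student_performance_results.csv", index=False)

# Impurity-based feature importances of a single tree (they sum to 1).
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
importances = tree.feature_importances_
print(importances.round(3))
```

The column names in `results` are illustrative; the layout of the real student_performance_results.csv is not documented here.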

Results (80/20 train/test split, 20,000 test samples)

| Model | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| Decision Tree | 0.9979 | 0.9978 | 0.9979 | 0.9978 |
| Random Forest | 0.9981 | 0.9982 | 0.9981 | 0.9981 |
| KNN (k=5) | 0.9705 | 0.9705 | 0.9705 | 0.9704 |

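Precision, recall, and F1-score are weighted averages across the five grade classes. Decision Tree and Random Forest perform almost identically (F1 differs by 0.0003); KNN trails them by about 2.7 to 2.8 percentage points on F1. Random Forest is the strongest single model (F1 = 0.9981).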
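For readers unfamiliar with weighted averaging, the sketch below recomputes a weighted F1 by hand on a tiny made-up label set and checks it against scikit-learn. It also shows that weighted recall always equals accuracy, which is why those two columns match in the table, and recomputes the F1 gaps from the reported figures. The toy labels are invented; only the values in `REPORTED_F1` come from this README.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_recall_fscore_support, recall_score)

# Toy labels, invented for illustration (not taken from the dataset).
y_true = list("AAABBCCCCF")
y_pred = list("AABBBCCCAF")
labels = ["A", "B", "C", "F"]

# Per-class scores plus the support (number of true samples per class).
_, _, f1_per_class, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average=None, zero_division=0)

# Weighted average: each class's F1 weighted by its support.
manual_weighted_f1 = np.average(f1_per_class, weights=support)
sklearn_weighted_f1 = f1_score(y_true, y_pred, average="weighted")

# Weighted recall reduces to (correct predictions) / (all samples), i.e. accuracy.
weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

print(manual_weighted_f1, sklearn_weighted_f1)
print(weighted_recall, accuracy)

# F1 gaps in percentage points, using the values reported in the results table.
REPORTED_F1 = {"Decision Tree": 0.9978, "Random Forest": 0.9981, "KNN (k=5)": 0.9704}
gap_vs_forest_pp = (REPORTED_F1["Random Forest"] - REPORTED_F1["KNN (k=5)"]) * 100
gap_vs_tree_pp = (REPORTED_F1["Decision Tree"] - REPORTED_F1["KNN (k=5)"]) * 100
print(f"KNN gap vs Random Forest: {gap_vs_forest_pp:.2f} pp")
print(f"KNN gap vs Decision Tree: {gap_vs_tree_pp:.2f} pp")
```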


Files

| File | Description |
| --- | --- |
| Assignment1_Adil_Ormanov.ipynb | Fully executed notebook (outputs included) |
| student_performance.csv | Input dataset (1,000,000 rows) |
| student_performance_results.csv | Per-model metrics across 5 split experiments |
| requirements.txt | Python dependencies |

Getting started

# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Open the notebook
jupyter lab Assignment1_Adil_Ormanov.ipynb

Adil Ormanov — GitHub
