Classification of student academic performance using Decision Tree, Random Forest, and K-Nearest Neighbours on a real student performance dataset (1,000,000 records).
Dataset: student_performance.csv — 1,000,000 student records with four behavioural/academic features and a five-class grade label (A / B / C / D / F).
| Feature | Description |
|---|---|
| `weekly_self_study_hours` | Hours spent studying outside class |
| `attendance_percentage` | Percentage of classes attended |
| `class_participation` | Participation score |
| `total_score` | Composite academic score |
| `grade` | Target label: A / B / C / D / F |
Goal: Build and compare multi-class classifiers that predict a student's grade category from these features.
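A minimal sketch of that comparison is shown below. It does not read `student_performance.csv`; instead it generates a small synthetic stand-in with the same four feature columns, and it assumes (purely for illustration) that grades follow fixed cut-offs on `total_score`. The helper names `make_synthetic_students` and `compare_models` are hypothetical, and fitting the scaler on the training split only is a choice made here, not something the notebook is confirmed to do.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["weekly_self_study_hours", "attendance_percentage",
            "class_participation", "total_score"]


def make_synthetic_students(n=2000, seed=0):
    """Hypothetical stand-in for student_performance.csv (not the real data)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(0, 40, n),    # weekly_self_study_hours
        rng.uniform(50, 100, n),  # attendance_percentage
        rng.uniform(0, 10, n),    # class_participation
        rng.uniform(0, 100, n),   # total_score
    ])
    # Assumed grading rule for illustration only: cut-offs on total_score.
    grades = np.array(["F", "D", "C", "B", "A"])[np.digitize(X[:, 3], [60, 70, 80, 90])]
    return X, grades


def compare_models(X, grades, seed=42):
    """Train the three classifiers and return their weighted F1 on a held-out split."""
    y = LabelEncoder().fit_transform(grades)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Fit the scaler on the training split only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train_s, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test_s), average="weighted")
    return scores


X, grades = make_synthetic_students()
scores = compare_models(X, grades)
for name, f1 in scores.items():
    print(f"{name}: weighted F1 = {f1:.4f}")
```

Scores from this synthetic data are not comparable to the results reported for the real dataset; the sketch only illustrates the shape of the workflow.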
Assignment1_Adil_Ormanov.ipynb — fully executed notebook with all cell outputs saved:
| Section | Content |
|---|---|
| 1. Imports | Libraries, global settings |
| 2. Load Data | Read full CSV; draw stratified 100k sample |
| 3. EDA | Descriptive stats, missing-value check, histograms, correlation heatmap, box plots by grade |
| 4. Preprocessing | LabelEncoder for grade target; StandardScaler; 80/20 stratified split |
| 5. Model Training | DecisionTreeClassifier, RandomForestClassifier (100 trees), KNeighborsClassifier (k=5) |
| 6. Evaluation | classification_report per model; confusion-matrix plots; grouped bar chart comparison |
| 7. Split Sensitivity | Metrics vs test-size ratio (20 %–40 %), line plots per model |
| 8. Feature Importance | Horizontal bar charts for Decision Tree and Random Forest |
| 9. Export | Results saved to student_performance_results.csv |
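
Sections 7 and 8 can be sketched roughly as follows. The example uses synthetic data with an assumed grading rule on `total_score`, and the intermediate test sizes (25 %, 30 %, 35 %) are an assumption: the notebook outline only states the 20 % to 40 % range. Only a Random Forest is shown, and scaling is omitted because tree-based models do not depend on feature scale.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = ["weekly_self_study_hours", "attendance_percentage",
            "class_participation", "total_score"]

# Synthetic stand-in data (hypothetical, not the real CSV).
rng = np.random.default_rng(0)
n = 1500
X = np.column_stack([
    rng.uniform(0, 40, n),    # weekly_self_study_hours
    rng.uniform(50, 100, n),  # attendance_percentage
    rng.uniform(0, 10, n),    # class_participation
    rng.uniform(0, 100, n),   # total_score
])
# Assumed rule for illustration: grade class derived from total_score cut-offs.
y = np.digitize(X[:, 3], [60, 70, 80, 90])

# Assumed grid of test-size ratios between 20 % and 40 %.
TEST_SIZES = [0.20, 0.25, 0.30, 0.35, 0.40]

accuracy_by_split = {}
for test_size in TEST_SIZES:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=42)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_tr, y_tr)
    accuracy_by_split[test_size] = accuracy_score(y_te, model.predict(X_te))

# Impurity-based importances from the last fitted forest; they sum to 1.
importances = dict(zip(FEATURES, model.feature_importances_))

for test_size, acc in accuracy_by_split.items():
    print(f"test_size={test_size:.2f}: accuracy={acc:.4f}")
for feature, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {value:.3f}")
```

Plotting `accuracy_by_split` as a line and `importances` as a horizontal bar chart would mirror the figures described in the outline.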
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Decision Tree | 0.9979 | 0.9978 | 0.9979 | 0.9978 |
| Random Forest | 0.9981 | 0.9982 | 0.9981 | 0.9981 |
| KNN (k=5) | 0.9705 | 0.9705 | 0.9705 | 0.9704 |
All metrics are weighted averages across the five grade classes. Decision Tree and Random Forest are near-identical in performance; KNN trails both by roughly 2.7 to 2.8 pp on F1 (0.9704 vs 0.9978 and 0.9981). Random Forest is the strongest single model (F1 = 0.9981).
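
To make "weighted average" concrete, the sketch below uses a small made-up label vector (not the notebook's predictions) to show that scikit-learn's `average="weighted"` is the support-weighted mean of per-class scores, and that weighted recall equals plain accuracy, which is why those two columns match in the table. It also recomputes the F1 gaps from the reported values.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels for illustration only (three classes, uneven support).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])

# Per-class scores plus the number of true samples per class (support).
_, per_class_recall, per_class_f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None)

# Weighted averages as computed by scikit-learn.
_, weighted_recall, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")

# The same F1 computed by hand: mean of per-class F1 weighted by support.
manual_f1 = np.average(per_class_f1, weights=support)
accuracy = accuracy_score(y_true, y_pred)

print(f"weighted F1 (sklearn) = {weighted_f1:.4f}, by hand = {manual_f1:.4f}")
print(f"weighted recall = {weighted_recall:.4f}, accuracy = {accuracy:.4f}")

# F1 values copied from the results table; gaps in percentage points vs KNN.
reported_f1 = {"Decision Tree": 0.9978, "Random Forest": 0.9981, "KNN (k=5)": 0.9704}
gap_pp = {
    name: round((f1 - reported_f1["KNN (k=5)"]) * 100, 2)
    for name, f1 in reported_f1.items()
    if name != "KNN (k=5)"
}
print(gap_pp)
```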
| File | Description |
|---|---|
| `Assignment1_Adil_Ormanov.ipynb` | Fully executed notebook (outputs included) |
| `student_performance.csv` | Input dataset (1,000,000 rows) |
| `student_performance_results.csv` | Per-model metrics across 5 split experiments |
| `requirements.txt` | Python dependencies |

    # 1. Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate

    # 2. Install dependencies
    pip install -r requirements.txt

    # 3. Open the notebook
    jupyter lab Assignment1_Adil_Ormanov.ipynb

Author: Adil Ormanov (GitHub)