🍕 NutriClass — Food Classification System

An intelligent food classification app powered by Machine Learning and an interactive Streamlit dashboard.

📖 About the Project

NutriClass is a machine learning-powered food classification system that predicts the most likely food type based on a given nutritional profile. It ingests a synthetic food dataset, applies a full data cleaning and preprocessing pipeline, trains a Random Forest classifier with class-imbalance handling, and serves predictions through an interactive Streamlit dashboard. Nutritionists, food-tech developers, and health app builders can use it to automatically tag or recommend food items based on macro- and micronutrient data.

🛠️ Development Process

1. 📦 Data Collection

Used a synthetic imbalanced CSV dataset (synthetic_food_dataset_imbalanced.csv) containing 10 food categories.
Dataset includes nutritional features: Calories, Protein, Fat, Carbs, Sugar, Fiber, Sodium, Cholesterol, Glycemic_Index, Water_Content, Serving_Size.
Also includes categorical features: Meal_Type, Preparation_Method, Is_Vegan, Is_Gluten_Free.

2. 🧹 Data Cleaning & Preprocessing

Removed duplicate records using drop_duplicates().
Filled missing values in all 11 numerical columns with the column median, which, unlike the mean, is robust to skewed distributions and outliers.
Applied IQR-based outlier capping (1.5× IQR rule) across all numerical columns — clipping rather than removing to preserve data volume.
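The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the app's own `food_Classification_Data_Cleaning()` function: the helper name `clean_numeric` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_numeric(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Drop duplicates, median-impute, then cap outliers with the 1.5x IQR rule."""
    df = df.drop_duplicates().copy()
    for col in numeric_cols:
        # Median imputation is robust to skew and to the outliers capped below
        df[col] = df[col].fillna(df[col].median())
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Clip instead of dropping rows, so no records are lost
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return df


# Toy data (hypothetical): one duplicate row, one missing value, one extreme value
raw = pd.DataFrame({
    "Calories": [100, 100, 120, np.nan, 140, 160, 10000],
    "Food_Name": ["Apple", "Apple", "Banana", "Salad", "Sushi", "Pasta", "Pizza"],
})
clean = clean_numeric(raw, ["Calories"])
print(clean["Calories"].tolist())  # [100.0, 120.0, 140.0, 140.0, 160.0, 200.0]
```

The missing value becomes the median (140), and the extreme 10000 is clipped to the upper fence Q3 + 1.5 × IQR = 155 + 45 = 200 instead of being removed.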

3. 🔧 Feature Engineering & Encoding

Applied one-hot encoding (via pd.get_dummies) to Meal_Type and Preparation_Method with drop_first=True to avoid multicollinearity.
Mapped boolean columns Is_Vegan and Is_Gluten_Free to binary integers (0/1).
Label-encoded the target column Food_Name into integers 0–9, mapping: Pizza→0, Burger→1, Donut→2, Pasta→3, Sushi→4, Ice Cream→5, Steak→6, Apple→7, Banana→8, Salad→9.
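A minimal sketch of these three encoding steps, assuming the column names listed above. The helper name `encode_features` and the `Meal_Type` / `Preparation_Method` values (`lunch`, `fried`, …) are hypothetical placeholders, not the dataset's actual categories.

```python
import pandas as pd

# Label mapping taken from the text above
FOOD_TO_LABEL = {
    "Pizza": 0, "Burger": 1, "Donut": 2, "Pasta": 3, "Sushi": 4,
    "Ice Cream": 5, "Steak": 6, "Apple": 7, "Banana": 8, "Salad": 9,
}


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for col in ["Is_Vegan", "Is_Gluten_Free"]:
        # Works for real booleans and for "True"/"False" strings alike
        df[col] = df[col].astype(str).map({"True": 1, "False": 0})
    df["Food_Name"] = df["Food_Name"].map(FOOD_TO_LABEL)
    # drop_first=True drops the first (sorted) category of each encoded column
    return pd.get_dummies(
        df, columns=["Meal_Type", "Preparation_Method"], drop_first=True, dtype=int
    )


# Hypothetical category values, for illustration only
sample = pd.DataFrame({
    "Calories": [250.0, 95.0, 550.0],
    "Meal_Type": ["lunch", "snack", "dinner"],
    "Preparation_Method": ["fried", "raw", "grilled"],
    "Is_Vegan": [False, True, False],
    "Is_Gluten_Free": ["False", "True", "True"],
    "Food_Name": ["Pizza", "Apple", "Steak"],
})
encoded = encode_features(sample)
print(list(encoded.columns))
```

Here `dinner` and `fried` become the implicit baselines: a row with zeros in every `Meal_Type_*` dummy is a dinner.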

4. ⚖️ Imbalance Handling

Addressed class imbalance using RandomUnderSampler (imblearn) applied only to the training set after the train-test split, preserving the integrity of the test set.
Used random_state=42 for reproducibility.
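The ordering matters: split first, then undersample only the training portion. The sketch below makes that order explicit. To stay dependency-light, it swaps in a hypothetical pure-pandas `undersample` helper that mimics RandomUnderSampler's default behaviour (every class cut down to the minority-class size). The app itself calls imblearn directly. The `stratify=y` argument is an assumption based on the "stratified split" entry in the Tech Stack table.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def undersample(X: pd.DataFrame, y: pd.Series, random_state: int = 42):
    """Randomly keep n_min rows per class, where n_min is the smallest class size.

    Pure-pandas stand-in for imblearn's default behaviour; the app itself uses
    RandomUnderSampler(random_state=42).fit_resample(X_train, y_train).
    """
    n_min = y.value_counts().min()
    keep = y.groupby(y).sample(n=n_min, random_state=random_state).index
    return X.loc[keep], y.loc[keep]


# Toy imbalanced data (hypothetical): 50 / 30 / 20 rows for three classes
rng = np.random.default_rng(0)
y = pd.Series([0] * 50 + [1] * 30 + [2] * 20, name="Food_Name")
X = pd.DataFrame({"Calories": rng.normal(300, 50, 100), "Protein": rng.normal(15, 5, 100)})

# 1) Split first, so the test set keeps the real class distribution
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# 2) Undersample the training portion only
X_bal, y_bal = undersample(X_train, y_train)
print(y_train.value_counts().to_dict(), "->", y_bal.value_counts().to_dict())
```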

5. 📐 Data Transformation

Applied StandardScaler to the 11 numerical columns.
Scaler fitted exclusively on the undersampled training set and then used to transform the test set — preventing data leakage.
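The leakage-safe pattern is `fit_transform` on the training rows and plain `transform` on the test rows. A minimal sketch follows. The helper name `scale_numeric` and the toy frames are hypothetical, and only the listed numeric columns are scaled, so encoded 0/1 columns pass through unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_numeric(X_train: pd.DataFrame, X_test: pd.DataFrame, numeric_cols: list):
    """Fit StandardScaler on training rows only, then reuse it for the test rows."""
    X_train, X_test = X_train.copy(), X_test.copy()
    scaler = StandardScaler()
    X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
    # transform (not fit_transform): test data never influences mean_ / scale_
    X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
    return X_train, X_test, scaler


X_train = pd.DataFrame({"Calories": [100.0, 200.0, 300.0], "Is_Vegan": [1, 0, 1]})
X_test = pd.DataFrame({"Calories": [400.0], "Is_Vegan": [0]})
train_s, test_s, scaler = scale_numeric(X_train, X_test, ["Calories"])
print(scaler.mean_, test_s["Calories"].iloc[0])
```

The same fitted `scaler` object is what the prediction page would reuse on user input.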

6. 🤖 Model Building

Trained a Random Forest Classifier (n_estimators=99, random_state=42).
Chose Random Forest for its robustness to multicollinearity, feature importance interpretability, and strong out-of-the-box performance on multi-class tabular data.
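A minimal sketch of the model configuration described above. Because the real preprocessed dataset is not reproduced here, the sketch trains on synthetic stand-in data from scikit-learn's `make_classification` (11 features, 10 classes), not on the project's CSV.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed training set: 11 features, 10 classes
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_redundant=0,
    n_classes=10, n_clusters_per_class=1, random_state=42,
)

# Same hyperparameters as described above
model = RandomForestClassifier(n_estimators=99, random_state=42)
model.fit(X, y)

proba = model.predict_proba(X[:3])  # one column per class, in model.classes_ order
importances = model.feature_importances_  # impurity-based, sums to 1
print(proba.shape, importances.round(3))
```

`feature_importances_` is what backs the "feature importance interpretability" point. Keep in mind that impurity-based importances are computed on training data and can be spread across correlated features.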

7. 📊 Model Evaluation

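Evaluated using four metrics: Accuracy plus weighted-average Precision, Recall, and F1 Score, all displayed on the dashboard as KPI cards.
Weighted averaging weights each class's score by its support (number of true test samples), so the averages reflect the imbalanced class distribution of the untouched test set.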
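A minimal sketch of the four metrics, using scikit-learn's `average="weighted"` option. The `evaluate` helper name and the tiny label lists are illustrative only. `zero_division=0` is an added assumption that silences warnings when a class receives no predictions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred) -> dict:
    """Accuracy plus support-weighted Precision, Recall and F1."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "Recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
        "F1 Score": f1_score(y_true, y_pred, average="weighted", zero_division=0),
    }


# Tiny hand-checkable case: class 0 has support 3, class 1 has support 1
scores = evaluate([0, 0, 0, 1], [0, 0, 1, 1])
print({k: round(float(v), 4) for k, v in scores.items()})
# {'Accuracy': 0.75, 'Precision': 0.875, 'Recall': 0.75, 'F1 Score': 0.7667}
```

Weighted recall always equals accuracy: each class's recall TP_c / n_c is weighted by n_c / N, so the sum collapses to ΣTP_c / N. The Accuracy and Recall cards will therefore always show the same value.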

8. 🚀 Dashboard Development

Built a two-page Streamlit app with st.session_state for page navigation (home → Prediction).
Applied custom CSS animations, gradient KPI cards, hover effects, and styled number inputs and select boxes.

9. 📈 Visualization & Analysis

Dataset overview with 4 KPI cards: Total Records, Features, Food Types, Avg Calories.
Model performance displayed as 4 KPI cards: Accuracy, Precision, Recall, F1 Score.
Raw cleaned dataset shown in an interactive st.dataframe table.

10. 🍽️ Prediction System

Prediction page collects 15 user inputs (11 numerical + 4 categorical).
Runs the full preprocessing pipeline on user input before inference.
Returns Top 5 predicted food types with probability percentages and Unsplash food images displayed in a 5-column layout.
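Ranking the Top 5 comes down to sorting one row of `predict_proba` output. A minimal sketch follows. The helper name `top_k_foods` and the probability values are hypothetical. The label-to-name mapping is the inverse of the encoding listed earlier. Probability columns follow `model.classes_`, so the index is mapped through it rather than used directly.

```python
import numpy as np

# Label-to-name mapping from the encoding step (inverse of Pizza→0 … Salad→9)
LABEL_TO_FOOD = {
    0: "Pizza", 1: "Burger", 2: "Donut", 3: "Pasta", 4: "Sushi",
    5: "Ice Cream", 6: "Steak", 7: "Apple", 8: "Banana", 9: "Salad",
}


def top_k_foods(proba_row: np.ndarray, classes: np.ndarray, k: int = 5) -> list:
    """Return the k most probable foods as (name, percent) pairs, highest first."""
    order = np.argsort(proba_row)[::-1][:k]
    # Probability columns follow classes (model.classes_), so map through it
    return [(LABEL_TO_FOOD[int(classes[i])], round(float(proba_row[i]) * 100, 1)) for i in order]


# In the app this row would be model.predict_proba(input_df)[0]; here it is made up
proba_row = np.array([0.05, 0.30, 0.02, 0.10, 0.01, 0.20, 0.04, 0.15, 0.08, 0.05])
print(top_k_foods(proba_row, classes=np.arange(10)))
# [('Burger', 30.0), ('Ice Cream', 20.0), ('Apple', 15.0), ('Pasta', 10.0), ('Banana', 8.0)]
```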

11. ⚡ Performance Optimization

Used @st.cache_data to cache raw CSV loading.
Used @st.cache_resource to cache the entire ML pipeline (cleaning → training → model artifacts), preventing redundant retraining on every rerun.

✨ Key Features

🔎 Multi-class Food Prediction

Classifies nutritional input into one of 10 food categories with probability scores — not just a top-1 answer but a full Top-5 ranking.

🧠 Random Forest Classifier

A 99-tree Random Forest, trained on the class-balanced training set, handles the 10-class problem and is robust to noisy nutritional features.

⚖️ Imbalance-Aware Training

RandomUnderSampler equalizes class counts in the training set, reducing the model's bias toward majority food classes and producing fairer multi-class predictions.

📊 Live Model Metrics Dashboard

Accuracy and weighted Precision, Recall, and F1 Score are computed on the held-out test set and displayed as animated KPI cards on the home page.

🖼️ Visual Food Recommendations

Top-5 predicted foods are displayed with curated Unsplash images and confidence percentages in a clean 5-column card layout.

🎨 Custom Animated UI

CSS animations, gradient KPI cards, hover lift effects, and styled Streamlit inputs give the app a polished, non-default look and feel.

⚡ Cached ML Pipeline

The entire data loading, cleaning, and model training pipeline is cached via @st.cache_resource, making navigation instant after the first load.

🧹 Robust Data Cleaning

IQR-based outlier capping and median imputation are applied across all 11 numerical columns before training, improving model stability.

🔬 Features (Detailed)

🏠 Home Dashboard

4 Dataset KPI Cards: Total Records, Number of Features, Unique Food Types, Average Calories.
Interactive Data Table: Full cleaned dataset displayed with st.dataframe for exploration.
4 Model KPI Cards: Accuracy plus weighted Precision, Recall, and F1 Score on the held-out test set.
Navigation Button: Animated "Predict" button transitions to the Prediction page via session state.

🍽️ Prediction Page

15 Input Fields: 11 number inputs (nutritional values) + 4 select boxes (Meal Type, Preparation Method, Is Vegan, Is Gluten Free).
Full Pipeline Inference: User inputs are encoded, one-hot expanded, column-aligned to training features, and scaled before prediction.
Top-5 Output: Shows the 5 most probable food types with image cards and probability percentages.
Back Navigation: Returns to the dashboard without losing cached model state.
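Column alignment is the step most likely to go wrong at inference time. A single form row only produces dummies for the categories the user picked, and `drop_first=True` on one row would drop the only dummy. The sketch below therefore one-hot encodes without `drop_first` and lets `reindex` add missing training columns as 0, drop the baseline column, and restore training order. The helper name, the category values, and `TRAINING_COLUMNS` are hypothetical. In the app the column list would come from the cached pipeline.

```python
import pandas as pd


def align_user_input(user: dict, training_columns: list) -> pd.DataFrame:
    """Turn one form submission into a single row with the training feature layout."""
    row = pd.DataFrame([user])
    for col in ["Is_Vegan", "Is_Gluten_Free"]:
        row[col] = row[col].astype(str).map({"True": 1, "False": 0})
    # No drop_first here: on a single row it would drop the only dummy column
    row = pd.get_dummies(row, columns=["Meal_Type", "Preparation_Method"], dtype=int)
    # Add missing dummies as 0, drop unseen/baseline ones, and match column order
    return row.reindex(columns=training_columns, fill_value=0)


# Hypothetical training layout (the baselines "dinner" and "fried" were dropped)
TRAINING_COLUMNS = [
    "Calories", "Is_Vegan", "Is_Gluten_Free",
    "Meal_Type_lunch", "Meal_Type_snack",
    "Preparation_Method_grilled", "Preparation_Method_raw",
]
form = {"Calories": 250.0, "Meal_Type": "snack", "Preparation_Method": "fried",
        "Is_Vegan": "False", "Is_Gluten_Free": "True"}
aligned = align_user_input(form, TRAINING_COLUMNS)
print(aligned.to_dict("records"))
```

After alignment, the numeric columns would be passed through the training-fitted scaler (`scaler.transform(...)`) before calling `predict_proba`.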

🛠️ Tech Stack

🖥️ Frontend / UI

Library	Purpose
`streamlit`	Web app framework, page layout, widgets, session state
Custom CSS	Gradient KPI cards, hover animations, styled inputs and buttons

🧠 Machine Learning

Library	Purpose
`sklearn.ensemble.RandomForestClassifier`	Multi-class food classification
`sklearn.linear_model.LogisticRegression`	Imported (available for comparison)
`sklearn.neighbors.KNeighborsClassifier`	Imported (available for comparison)
`sklearn.tree.DecisionTreeClassifier`	Imported (available for comparison)
`imblearn.under_sampling.RandomUnderSampler`	Training set imbalance correction
`sklearn.preprocessing.StandardScaler`	Numerical feature normalization
`sklearn.model_selection.train_test_split`	80/20 stratified split
`sklearn.metrics`	Accuracy, plus weighted Precision, Recall, F1 Score

📊 Data Processing & Analysis

Library	Purpose
`pandas`	Data loading, cleaning, encoding, feature alignment
`numpy`	IQR computation, outlier clipping, probability sorting

📈 Data Visualization

Library	Purpose
`plotly.express`	Imported for chart-based analysis extensions

⚙️ Caching & Performance

Decorator	What It Caches
`@st.cache_data`	Raw CSV load via `load_data()`
`@st.cache_resource`	Full pipeline: cleaned data, model, scaler, columns via `get_model_pipeline()`

⚙️ Setup & Installation

1. Clone the Repository

git clone https://github.com/your-username/NutriClass_Food_Classification.git
cd NutriClass_Food_Classification

2. Create a Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

Key libraries:

streamlit
pandas
numpy
scikit-learn
imbalanced-learn
plotly

4. Prepare the Dataset

Place the dataset file in the project directory (or update the path in load_data()):

synthetic_food_dataset_imbalanced.csv

The app reads it from:

pd.read_csv(r"D:\PROJECTS\Own_Project_2\NutriClass_Food_Classification-\synthetic_food_dataset_imbalanced.csv")

Update this path to match your local environment.
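One way to avoid editing an absolute Windows path is to resolve the CSV relative to the app script. A minimal sketch, assuming the CSV sits next to the script. `dataset_path` and `DATA_FILE` are hypothetical names, and the real `load_data()` would keep its `@st.cache_data` decorator.

```python
from pathlib import Path
from typing import Optional

import pandas as pd

DATA_FILE = "synthetic_food_dataset_imbalanced.csv"


def dataset_path(data_dir: Optional[Path] = None) -> Path:
    """Resolve the CSV relative to the app script rather than a fixed drive path."""
    if data_dir is None:
        # __file__ is set when the script runs via `streamlit run`
        data_dir = Path(__file__).resolve().parent
    return Path(data_dir) / DATA_FILE


def load_data(data_dir: Optional[Path] = None) -> pd.DataFrame:
    # In the app this function would keep its @st.cache_data decorator
    return pd.read_csv(dataset_path(data_dir))
```

With this layout, `load_data()` works unchanged after cloning on Windows, macOS, or Linux. Passing `data_dir` explicitly covers a dataset stored elsewhere.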

5. Run the Application

streamlit run app.py

6. Optional: Clear Cache

If you update the dataset or model, clear Streamlit's data and resource caches:

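# In the Streamlit UI → top-right menu → Clear cache
# Or programmatically, from inside the app script:
import streamlit as st
st.cache_data.clear()      # clears load_data()
st.cache_resource.clear()  # clears get_model_pipeline()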

💡 Use Cases

🥗 Nutrition App Tagging — Automatically classify user-logged meals by food type using only macro/micro-nutrient inputs, without requiring manual food naming.
🏥 Clinical Diet Planning — Healthcare providers can input a patient's target nutritional profile and receive ranked food recommendations that match their dietary constraints.
🛒 Grocery & Meal Kit Platforms — E-commerce food platforms can auto-tag SKUs with food category labels based on nutritional metadata from product databases.
📱 Fitness & Diet Tracking Apps — Integrate the classifier to suggest compatible foods based on a user's calorie and macronutrient goals for a given meal type.
🔬 Food Research & Benchmarking — Researchers can use the probability output to study how nutritional boundaries between food classes overlap and evolve in synthetic datasets.
🍽️ Restaurant Menu Intelligence — Restaurant chains can classify new menu items by nutritional content to ensure they meet dietary category standards (vegan, gluten-free, etc.).

🚀 Future Enhancements

🔥 Deep Learning Classifier — Replace Random Forest with a PyTorch or TensorFlow tabular model (e.g., TabNet) for improved boundary learning on overlapping nutritional profiles.
🌐 Real Nutritional API Integration — Connect to the USDA FoodData Central API to fetch real-world nutritional data instead of relying on synthetic CSVs.
🧠 SHAP Explainability — Add SHAP value plots to the prediction page to explain which nutritional features most influenced each food classification.
📊 Extended EDA Dashboard — Add a dedicated analytics page with Plotly charts (correlation heatmaps, calorie distributions per food type, nutrient radar charts).
📤 Batch Prediction Upload — Allow users to upload a CSV of nutritional profiles and download the classification results in bulk.
🗄️ Prediction History Database — Store each prediction (inputs + outputs + timestamp) to a SQLite or PostgreSQL database for audit and trend analysis.
🎯 Model Comparison Mode — Expose Logistic Regression, KNN, and Decision Tree models (already imported) for side-by-side metric comparison on the dashboard.
📱 Mobile-Responsive Layout — Optimize the Streamlit layout and CSS for smaller screens and deploy on Streamlit Community Cloud with a sharable public URL.

🏗️ How It Works

┌─────────────────────────────────────────────────────────────┐
│                     STREAMLIT UI LAYER                       │
│                                                             │
│   Page: "home"  ──────────────────►  Page: "Prediction"    │
│   (Dashboard)     st.session_state    (Input Form)          │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   CACHED PIPELINE LAYER                      │
│                                                             │
│   @st.cache_data          │   @st.cache_resource            │
│   load_data()             │   get_model_pipeline()          │
│   (Raw CSV Load)          │   (Full Pipeline Cache)         │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   DATA PIPELINE LAYER                        │
│                                                             │
│   food_Classification_Data_Cleaning(df)                     │
│   ├── drop_duplicates()                                     │
│   ├── fillna(median)  →  11 numerical columns               │
│   ├── Outlier_Detection()  →  IQR capping (1.5×)            │
│   └── Map Is_Vegan / Is_Gluten_Free → "True"/"False"        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   ML PIPELINE LAYER                          │
│                                                             │
│   ML_model(Data)                                            │
│   ├── pd.get_dummies()  →  Meal_Type, Preparation_Method    │
│   ├── Label encode  →  Food_Name (0–9)                      │
│   ├── train_test_split  →  80% train / 20% test             │
│   ├── RandomUnderSampler  →  Balance training classes        │
│   ├── StandardScaler  →  Fit on train, transform test       │
│   └── RandomForestClassifier(n_estimators=99)               │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   PREDICTION OUTPUT LAYER                    │
│                                                             │
│   model.predict_proba(input_df)                             │
│   ├── Top-5 food classes by probability                     │
│   ├── food_map  →  {0:"Pizza", 1:"Burger", ..., 9:"Salad"} │
│   └── Unsplash image cards  +  confidence % display         │
└─────────────────────────────────────────────────────────────┘

📋 Project Overview

NutriClass is a supervised multi-class food classification system built on a synthetic nutritional dataset containing 10 food categories: Pizza, Burger, Donut, Pasta, Sushi, Ice Cream, Steak, Apple, Banana, and Salad. The system applies a preprocessing pipeline of median imputation, IQR-based outlier capping, one-hot encoding for meal type and preparation method, and RandomUnderSampler-based class balancing of the training set. A Random Forest Classifier with 99 estimators is trained on the balanced training set and evaluated on the held-out test set using Accuracy and weighted Precision, Recall, and F1 Score. The interactive Streamlit dashboard shows model performance on the home page and provides a prediction interface where users enter 15 nutritional and categorical inputs to receive the Top-5 most probable food classifications, each displayed with an image card and confidence score. This system serves as a foundation for nutrition-aware food tagging, dietary recommendation, and meal planning features in health and food-tech applications.

⭐ If you find this project useful, give it a star on GitHub and share your feedback!
