sarank-21/NutriClass_Food_Classification-

🍕 NutriClass: Food Classification System

An intelligent food classification app powered by Machine Learning and an interactive Streamlit dashboard.


📖 About the Project

NutriClass is a machine learning-powered food classification system that predicts the most likely food type from a given nutritional profile. It ingests a synthetic food dataset, applies a full data cleaning and preprocessing pipeline, trains a Random Forest classifier with class-imbalance handling, and serves predictions through an interactive Streamlit dashboard. Nutritionists, food-tech developers, and health app builders can use it to automatically tag or recommend food items based on macro- and micronutrient data.


🛠️ Development Process

1. 📦 Data Collection

  • Used a synthetic imbalanced CSV dataset (synthetic_food_dataset_imbalanced.csv) containing 10 food categories.
  • Dataset includes nutritional features: Calories, Protein, Fat, Carbs, Sugar, Fiber, Sodium, Cholesterol, Glycemic_Index, Water_Content, Serving_Size.
  • Also includes categorical features: Meal_Type, Preparation_Method, Is_Vegan, Is_Gluten_Free.

2. 🧹 Data Cleaning & Preprocessing

  • Removed duplicate records using drop_duplicates().
  • Filled missing values in all 11 numerical columns using median imputation to handle skewed distributions.
  • Applied IQR-based outlier capping (1.5× IQR rule) across all numerical columns, clipping outliers rather than removing them to preserve data volume.
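The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's own `food_Classification_Data_Cleaning()`: the helper name `clean_numeric` and the toy values are hypothetical, and the quartiles are computed after imputation, following the order of the steps listed above.

```python
import pandas as pd

# The 11 numerical columns listed under Data Collection.
NUMERIC_COLS = [
    "Calories", "Protein", "Fat", "Carbs", "Sugar", "Fiber", "Sodium",
    "Cholesterol", "Glycemic_Index", "Water_Content", "Serving_Size",
]


def clean_numeric(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Drop duplicate rows, then median-impute and IQR-cap the given columns."""
    out = df.drop_duplicates().copy()
    for col in cols:
        # Median imputation is less sensitive to skewed distributions than the mean.
        out[col] = out[col].fillna(out[col].median())
        # 1.5x IQR rule: cap values outside the fences instead of dropping rows.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


# Hypothetical toy data; on the real dataset: clean_numeric(df, NUMERIC_COLS)
toy = pd.DataFrame({"Calories": [100.0, 110.0, None, 120.0, 5000.0, 100.0]})
cleaned = clean_numeric(toy, ["Calories"])
print(cleaned["Calories"].tolist())  # [100.0, 110.0, 115.0, 120.0, 135.0]
```

The duplicate `100.0` row is dropped, the missing value becomes the median (115.0), and the 5000-calorie outlier is capped at the upper fence (120 + 1.5 × 10 = 135) instead of being deleted.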

3. 🔧 Feature Engineering & Encoding

  • Applied one-hot encoding (via pd.get_dummies) to Meal_Type and Preparation_Method with drop_first=True to avoid multicollinearity.
  • Mapped boolean columns Is_Vegan and Is_Gluten_Free to binary integers (0/1).
  • Label-encoded the target column Food_Name into integers 0–9, mapping: Pizza→0, Burger→1, Donut→2, Pasta→3, Sushi→4, Ice Cream→5, Steak→6, Apple→7, Banana→8, Salad→9.
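A minimal sketch of the encoding step, assuming the flag columns hold either booleans or the strings "True"/"False" and that the target column is named `Food_Name`. The helper `encode_features` and the category values in the toy frame (`"dinner"`, `"raw"`, and so on) are hypothetical; the real dataset's categories may differ.

```python
import pandas as pd

# Target label mapping, as listed above.
FOOD_TO_ID = {
    "Pizza": 0, "Burger": 1, "Donut": 2, "Pasta": 3, "Sushi": 4,
    "Ice Cream": 5, "Steak": 6, "Apple": 7, "Banana": 8, "Salad": 9,
}


def encode_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Binarize flags, one-hot encode categoricals and label-encode Food_Name."""
    out = df.copy()
    for col in ["Is_Vegan", "Is_Gluten_Free"]:
        # Works for real booleans and for the strings "True"/"False".
        out[col] = out[col].astype(str).map({"True": 1, "False": 0})
    # drop_first=True drops one dummy per feature to avoid perfect multicollinearity.
    out = pd.get_dummies(
        out, columns=["Meal_Type", "Preparation_Method"], drop_first=True, dtype=int
    )
    y = out.pop("Food_Name").map(FOOD_TO_ID)
    return out, y


# Hypothetical toy rows.
toy = pd.DataFrame({
    "Calories": [250.0, 95.0, 300.0],
    "Meal_Type": ["dinner", "snack", "lunch"],
    "Preparation_Method": ["baked", "raw", "fried"],
    "Is_Vegan": ["False", "True", "False"],
    "Is_Gluten_Free": [False, True, True],
    "Food_Name": ["Pizza", "Apple", "Burger"],
})
X, y = encode_features(toy)
```

Going through `astype(str)` before mapping avoids a common pitfall: `astype(bool)` would turn the non-empty string "False" into `True`.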

4. ⚖️ Imbalance Handling

  • Addressed class imbalance using RandomUnderSampler (imblearn) applied only to the training set after the train-test split, preserving the integrity of the test set.
  • Used random_state=42 for reproducibility.
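With imbalanced-learn, this step would typically be written as `RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)`. To show what that call does, here is a hand-rolled pandas equivalent of random undersampling. It is an illustration only, not imblearn's implementation; `random_undersample` is a hypothetical name, and X and y are assumed to share a unique index.

```python
import pandas as pd


def random_undersample(X: pd.DataFrame, y: pd.Series, random_state: int = 42):
    """Randomly downsample every class to the size of the smallest class."""
    n_min = int(y.value_counts().min())
    # Draw n_min row labels per class, without replacement.
    keep = y.groupby(y).sample(n=n_min, random_state=random_state).index
    return X.loc[keep], y.loc[keep]


# Toy imbalanced training labels: class 0 has 8 rows, class 1 has 3, class 2 has 5.
y_train = pd.Series([0] * 8 + [1] * 3 + [2] * 5)
X_train = pd.DataFrame({"Calories": range(16)})
X_bal, y_bal = random_undersample(X_train, y_train)
# Every class now has 3 rows (the size of the minority class).
counts = y_bal.value_counts().sort_index()
```

Because only the training split is resampled, the test split keeps its natural class distribution.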

5. 📐 Data Transformation

  • Applied StandardScaler to the 11 numerical columns.
  • Fitted the scaler exclusively on the undersampled training set and then used it to transform the test set, preventing data leakage.
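A minimal sketch of leakage-free scaling with scikit-learn's `StandardScaler`, applied only to the numeric columns so that one-hot columns stay 0/1. The two-column `num_cols` list and the random toy frames are stand-ins for the project's 11 numeric columns and its real train/test splits.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)


def toy_split(n: int) -> pd.DataFrame:
    # Two numeric columns plus one already one-hot encoded column (toy data).
    return pd.DataFrame({
        "Calories": rng.normal(250, 80, n),
        "Protein": rng.normal(12, 5, n),
        "Meal_Type_lunch": rng.integers(0, 2, n),
    })


train, test = toy_split(100), toy_split(20)
num_cols = ["Calories", "Protein"]  # the project scales 11 numeric columns
train_means = train[num_cols].mean().to_numpy()

scaler = StandardScaler()
# Fit on the (undersampled) training split only...
train[num_cols] = scaler.fit_transform(train[num_cols])
# ...then reuse the training mean/std for the test split, so no test statistics leak in.
test[num_cols] = scaler.transform(test[num_cols])
```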

6. 🤖 Model Building

  • Trained a Random Forest Classifier (n_estimators=99, random_state=42).
  • Chose Random Forest for its robustness to multicollinearity, feature importance interpretability, and strong out-of-the-box performance on multi-class tabular data.
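A minimal training sketch with the hyperparameters listed above. Because the real encoded features are not reproduced here, `make_classification` generates a synthetic 10-class stand-in; its sample count and 15 features are arbitrary assumptions, while the stratified 80/20 split mirrors the Tech Stack table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded, scaled nutrition features (10 food classes).
X, y = make_classification(
    n_samples=1000, n_features=15, n_informative=8, n_classes=10,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=99, random_state=42)
model.fit(X_train, y_train)
# One probability column per class in model.classes_; each row sums to 1.
proba = model.predict_proba(X_test)
```

The column order of `proba` follows `model.classes_`, which is what a Top-5 ranking should be built from.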

7. 📊 Model Evaluation

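  • Evaluated using four metrics: Accuracy plus weighted-average Precision, Recall, and F1 Score, all displayed live on the dashboard as KPI cards.
  • Weighted averaging weights each class's score by its support in the test set, which still reflects the original class imbalance because undersampling is applied only to the training set.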
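The four KPI values can be computed with `sklearn.metrics` as sketched below. The tiny `y_true`/`y_pred` lists are made up so that the weighted averages can be checked by hand.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels: class supports are 4, 2 and 2.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    # Per-class precision 0.75, 2/3, 1.0 weighted by supports 4, 2, 2.
    "Precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
    "Recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
    "F1 Score": f1_score(y_true, y_pred, average="weighted", zero_division=0),
}
# Accuracy 0.75, Precision ~0.792, Recall 0.75, F1 Score ~0.742
```

One consequence of `average="weighted"`: weighted recall always equals accuracy, because each class's recall is weighted by its own support, so the Accuracy and Recall cards will show the same value. `zero_division=0` avoids warnings when a class is never predicted.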

8. 🚀 Dashboard Development

  • Built a two-page Streamlit app with st.session_state for page navigation (home → Prediction).
  • Applied custom CSS animations, gradient KPI cards, hover effects, and styled number inputs and select boxes.

9. 📈 Visualization & Analysis

  • Dataset overview with 4 KPI cards: Total Records, Features, Food Types, Avg Calories.
  • Model performance displayed as 4 KPI cards: Accuracy, Precision, Recall, F1 Score.
  • Raw cleaned dataset shown in an interactive st.dataframe table.

10. 🍽️ Prediction System

  • Prediction page collects 15 user inputs (11 numerical + 4 categorical).
  • Runs the full preprocessing pipeline on user input before inference.
  • Returns Top 5 predicted food types with probability percentages and Unsplash food images displayed in a 5-column layout.
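A sketch of how a single form submission could be aligned to the training columns and turned into a Top-5 list. `align_input`, `top_k`, and the toy column names are hypothetical; in the app, the aligned row would first pass through the fitted scaler, `proba_row` would come from `model.predict_proba(X_scaled)[0]`, and `classes` from `model.classes_`.

```python
import numpy as np
import pandas as pd

FOOD_MAP = {0: "Pizza", 1: "Burger", 2: "Donut", 3: "Pasta", 4: "Sushi",
            5: "Ice Cream", 6: "Steak", 7: "Apple", 8: "Banana", 9: "Salad"}


def align_input(user_row: dict, train_columns: list[str]) -> pd.DataFrame:
    """One-hot encode one form submission and align it to the training columns."""
    df = pd.get_dummies(
        pd.DataFrame([user_row]), columns=["Meal_Type", "Preparation_Method"], dtype=int
    )
    # Dummies absent from this row become 0; a dummy dropped in training is discarded.
    return df.reindex(columns=train_columns, fill_value=0)


def top_k(proba_row: np.ndarray, classes: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable food names with percentage scores."""
    order = np.argsort(proba_row)[::-1][:k]
    return [(FOOD_MAP[int(classes[i])], round(100 * float(proba_row[i]), 1)) for i in order]


# Hypothetical training columns and user input.
train_columns = ["Calories", "Is_Vegan", "Meal_Type_lunch", "Meal_Type_snack",
                 "Preparation_Method_fried", "Preparation_Method_raw"]
X_user = align_input({"Calories": 95.0, "Is_Vegan": 1, "Meal_Type": "snack",
                      "Preparation_Method": "raw"}, train_columns)

proba_row = np.array([0.05, 0.02, 0.01, 0.02, 0.03, 0.02, 0.01, 0.60, 0.20, 0.04])
print(top_k(proba_row, np.arange(10)))
# [('Apple', 60.0), ('Banana', 20.0), ('Pizza', 5.0), ('Salad', 4.0), ('Sushi', 3.0)]
```

`get_dummies` is deliberately called without `drop_first` here: a one-row frame contains only one category per column, so `drop_first=True` would always drop the user's choice. `reindex` then produces exactly the training layout.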

11. ⚑ Performance Optimization

  • Used @st.cache_data to cache raw CSV loading.
  • Used @st.cache_resource to cache the entire ML pipeline (cleaning → training → model artifacts), preventing redundant retraining on every rerun.

✨ Key Features

🔎 Multi-class Food Prediction

Classifies nutritional input into one of 10 food categories with probability scores, returning a full Top-5 ranking rather than a single top-1 answer.

🧠 Random Forest Classifier

A 99-estimator Random Forest handles the 10-class classification task; averaging many decision trees makes it relatively robust to noisy nutritional data.

⚖️ Imbalance-Aware Training

RandomUnderSampler randomly downsamples the majority food classes in the training set to the size of the smallest class, reducing the model's bias toward majority classes and producing fairer multi-class predictions.

📊 Live Model Metrics Dashboard

Accuracy, Precision, Recall, and F1 Score are computed on the held-out test set and displayed as animated KPI cards on the home page.

🖼️ Visual Food Recommendations

Top-5 predicted foods are displayed with curated Unsplash images and confidence percentages in a clean 5-column card layout.

🎨 Custom Animated UI

CSS animations, gradient KPI cards, hover lift effects, and styled Streamlit inputs give the app a polished, non-default look and feel.

⚑ Cached ML Pipeline

The entire data loading, cleaning, and model training pipeline is cached via @st.cache_resource, so the model is trained once and reused across reruns and page navigation instead of being retrained.

🧹 Robust Data Cleaning

IQR-based outlier capping and median imputation are applied across all 11 numerical columns before training, improving model stability.


🔬 Features (Detailed)

🏠 Home Dashboard

  • 4 Dataset KPI Cards: Total Records, Number of Features, Unique Food Types, Average Calories.
  • Interactive Data Table: Full cleaned dataset displayed with st.dataframe for exploration.
  • 4 Model KPI Cards: Accuracy plus weighted Precision, Recall, and F1 Score, computed on the held-out test set.
  • Navigation Button: Animated "Predict" button transitions to the Prediction page via session state.

🍽️ Prediction Page

  • 15 Input Fields: 11 number inputs (nutritional values) + 4 select boxes (Meal Type, Preparation Method, Is Vegan, Is Gluten Free).
  • Full Pipeline Inference: User inputs are encoded, one-hot expanded, column-aligned to training features, and scaled before prediction.
  • Top-5 Output: Shows the 5 most probable food types with image cards and probability percentages.
  • Back Navigation: Returns to the dashboard without losing cached model state.

🛠️ Tech Stack

🖥️ Frontend / UI

Library | Purpose
streamlit | Web app framework, page layout, widgets, session state
Custom CSS | Gradient KPI cards, hover animations, styled inputs and buttons

🧠 Machine Learning

Library | Purpose
sklearn.ensemble.RandomForestClassifier | Multi-class food classification
sklearn.linear_model.LogisticRegression | Imported (available for comparison)
sklearn.neighbors.KNeighborsClassifier | Imported (available for comparison)
sklearn.tree.DecisionTreeClassifier | Imported (available for comparison)
imblearn.under_sampling.RandomUnderSampler | Training-set imbalance correction
sklearn.preprocessing.StandardScaler | Numerical feature standardization
sklearn.model_selection.train_test_split | 80/20 stratified split
sklearn.metrics | Accuracy, Precision, Recall, F1 Score (weighted)

📊 Data Processing & Analysis

Library | Purpose
pandas | Data loading, cleaning, encoding, feature alignment
numpy | IQR computation, outlier clipping, probability sorting

📈 Data Visualization

Library | Purpose
plotly.express | Imported for chart-based analysis extensions

⚙️ Caching & Performance

Decorator | What It Caches
@st.cache_data | Raw CSV load via load_data()
@st.cache_resource | Full pipeline (cleaned data, model, scaler, columns) via get_model_pipeline()

⚙️ Setup & Installation

1. Clone the Repository

git clone https://github.com/sarank-21/NutriClass_Food_Classification-.git
cd NutriClass_Food_Classification-

2. Create a Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

Key libraries:

streamlit
pandas
numpy
scikit-learn
imbalanced-learn
plotly

4. Prepare the Dataset

Place the dataset file in the project directory (or update the path in load_data()):

synthetic_food_dataset_imbalanced.csv

The app reads it from:

pd.read_csv(r"D:\PROJECTS\Own_Project_2\NutriClass_Food_Classification-\synthetic_food_dataset_imbalanced.csv")

Update this path to match your local environment.
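One way to avoid the hard-coded absolute path is to resolve the CSV relative to the folder that contains `app.py`. This is a minimal sketch, assuming the CSV sits next to `app.py`; `data_path` is a hypothetical helper, and the `load_data()` signature shown here is an adaptation, not the project's current one.

```python
from pathlib import Path

import pandas as pd

DATA_FILE = "synthetic_food_dataset_imbalanced.csv"


def data_path(base_dir: str) -> Path:
    """Build the CSV path relative to a base directory (hypothetical helper)."""
    return Path(base_dir) / DATA_FILE


def load_data(base_dir: str) -> pd.DataFrame:
    # In app.py, keep the existing @st.cache_data decorator on this function.
    return pd.read_csv(data_path(base_dir))


# In app.py, pass the folder that contains the script itself:
# df = load_data(str(Path(__file__).resolve().parent))
```

Because the path is derived from `__file__`, the app should no longer depend on the working directory from which `streamlit run` is launched.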

5. Run the Application

streamlit run app.py

6. Optional: Clear Cache

If you update the dataset or model, clear Streamlit's data and resource caches:

# In the Streamlit UI: top-right menu → Clear cache
# Or programmatically:
st.cache_data.clear()
st.cache_resource.clear()

💡 Use Cases

  1. 🥗 Nutrition App Tagging: Automatically classify user-logged meals by food type using only macro- and micronutrient inputs, without requiring manual food naming.

  2. 🏥 Clinical Diet Planning: Healthcare providers can input a patient's target nutritional profile and receive ranked food recommendations that match their dietary constraints.

  3. 🛒 Grocery & Meal Kit Platforms: E-commerce food platforms can auto-tag SKUs with food category labels based on nutritional metadata from product databases.

  4. 📱 Fitness & Diet Tracking Apps: Integrate the classifier to suggest compatible foods based on a user's calorie and macronutrient goals for a given meal type.

  5. 🔬 Food Research & Benchmarking: Researchers can use the probability output to study how nutritional boundaries between food classes overlap and evolve in synthetic datasets.

  6. 🍽️ Restaurant Menu Intelligence: Restaurant chains can classify new menu items by nutritional content to ensure they meet dietary category standards (vegan, gluten-free, etc.).


🚀 Future Enhancements

  1. 🔥 Deep Learning Classifier: Replace Random Forest with a PyTorch or TensorFlow tabular model (e.g., TabNet) for improved boundary learning on overlapping nutritional profiles.
  2. 🌐 Real Nutritional API Integration: Connect to the USDA FoodData Central API to fetch real-world nutritional data instead of relying on synthetic CSVs.
  3. 🧠 SHAP Explainability: Add SHAP value plots to the prediction page to explain which nutritional features most influenced each food classification.
  4. 📊 Extended EDA Dashboard: Add a dedicated analytics page with Plotly charts (correlation heatmaps, calorie distributions per food type, nutrient radar charts).
  5. 📤 Batch Prediction Upload: Allow users to upload a CSV of nutritional profiles and download the classification results in bulk.
  6. 🗄️ Prediction History Database: Store each prediction (inputs, outputs, timestamp) in a SQLite or PostgreSQL database for audit and trend analysis.
  7. 🎯 Model Comparison Mode: Expose the already-imported Logistic Regression, KNN, and Decision Tree models for side-by-side metric comparison on the dashboard.
  8. 📱 Mobile-Responsive Layout: Optimize the Streamlit layout and CSS for smaller screens and deploy on Streamlit Community Cloud with a shareable public URL.
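For the Model Comparison Mode idea, a minimal sketch that fits the four imported classifiers on the same split and compares weighted F1. Synthetic data from `make_classification` stands in for the real features, and the default hyperparameters (plus `max_iter=1000` for Logistic Regression) are assumptions, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 10-class stand-in for the encoded, scaled features.
X, y = make_classification(n_samples=600, n_features=15, n_informative=8,
                           n_classes=10, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=99, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each model on the same split and record its weighted F1 on the test set.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = f1_score(y_test, clf.predict(X_test), average="weighted")
```

The resulting `scores` dict maps directly onto one KPI card (or one bar in a Plotly chart) per model.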

🏗️ How It Works

┌─────────────────────────────────────────────────────────────┐
│                     STREAMLIT UI LAYER                      │
│                                                             │
│   Page: "home"  ──────────────────►  Page: "Prediction"     │
│   (Dashboard)     st.session_state    (Input Form)          │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   CACHED PIPELINE LAYER                     │
│                                                             │
│   @st.cache_data         │   @st.cache_resource             │
│   load_data()            │   get_model_pipeline()           │
│   (Raw CSV Load)         │   (Full Pipeline Cache)          │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   DATA PIPELINE LAYER                       │
│                                                             │
│   food_Classification_Data_Cleaning(df)                     │
│   ├── drop_duplicates()                                     │
│   ├── fillna(median)  →  11 numerical columns               │
│   ├── Outlier_Detection()  →  IQR capping (1.5×)            │
│   └── Map Is_Vegan / Is_Gluten_Free → "True"/"False"        │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   ML PIPELINE LAYER                         │
│                                                             │
│   ML_model(Data)                                            │
│   ├── pd.get_dummies()  →  Meal_Type, Preparation_Method    │
│   ├── Label encode  →  Food_Name (0–9)                      │
│   ├── train_test_split  →  80% train / 20% test             │
│   ├── RandomUnderSampler  →  Balance training classes       │
│   ├── StandardScaler  →  Fit on train, transform test       │
│   └── RandomForestClassifier(n_estimators=99)               │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                   PREDICTION OUTPUT LAYER                   │
│                                                             │
│   model.predict_proba(input_df)                             │
│   ├── Top-5 food classes by probability                     │
│   ├── food_map  →  {0:"Pizza", 1:"Burger", ..., 9:"Salad"}  │
│   └── Unsplash image cards  +  confidence % display         │
└─────────────────────────────────────────────────────────────┘

📋 Project Overview

NutriClass is a supervised multi-class food classification system built on a synthetic nutritional dataset containing 10 food categories: Pizza, Burger, Donut, Pasta, Sushi, Ice Cream, Steak, Apple, Banana, and Salad. The system applies a preprocessing pipeline that includes median imputation, IQR-based outlier capping, one-hot encoding for meal type and preparation method, and RandomUnderSampler-based class balancing before training. A Random Forest Classifier with 99 estimators is trained on the balanced training set and evaluated on the held-out test set using Accuracy and weighted Precision, Recall, and F1 Score. The interactive Streamlit dashboard shows model performance on the home page and provides a prediction interface where users enter 15 nutritional and categorical inputs to receive the Top-5 most probable food classifications, each displayed with an image card and confidence score. This system serves as a foundation for building nutrition-aware food tagging, dietary recommendation, and meal planning features in health and food-tech applications.


⭐ If you find this project useful, give it a star on GitHub and share your feedback!

About

🍕 NutriClass: AI-powered food classifier that predicts food type from nutritional inputs (calories, protein, fat, carbs, and more) using Random Forest with imbalance handling, served via a Streamlit dashboard with Top-5 predictions and food image cards.
