An intelligent food classification app powered by Machine Learning and an interactive Streamlit dashboard.
NutriClass is a machine learning-powered food classification system that predicts the most likely food type from a given nutritional profile. It ingests a synthetic food dataset, applies a full data cleaning and preprocessing pipeline, trains a Random Forest classifier with class-imbalance handling, and serves predictions through an interactive Streamlit dashboard. Nutritionists, food-tech developers, and health app builders can use it to automatically tag or recommend food items based on macronutrient and micronutrient data.
- Used a synthetic, imbalanced CSV dataset (`synthetic_food_dataset_imbalanced.csv`) containing 10 food categories.
- The dataset includes 11 numerical nutritional features: `Calories`, `Protein`, `Fat`, `Carbs`, `Sugar`, `Fiber`, `Sodium`, `Cholesterol`, `Glycemic_Index`, `Water_Content`, `Serving_Size`.
- It also includes categorical features: `Meal_Type`, `Preparation_Method`, `Is_Vegan`, `Is_Gluten_Free`.
- Removed duplicate records using `drop_duplicates()`.
- Filled missing values in all 11 numerical columns using median imputation, which is robust to skewed distributions.
- Applied IQR-based outlier capping (1.5× IQR rule) across all numerical columns, clipping rather than removing outliers to preserve data volume.
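A minimal sketch of the cleaning steps above, assuming a pandas DataFrame that holds the 11 numerical columns. The helper names `cap_outliers_iqr` and `clean_food_data` are hypothetical, not the project's `food_Classification_Data_Cleaning(df)` / `Outlier_Detection()` code; only the order (deduplicate, impute with the median, then cap at Q1 − 1.5·IQR and Q3 + 1.5·IQR) follows the pipeline described here. `Series.clip` is used for the capping; the project's library table lists numpy for clipping, so the original may use `np.clip` instead.

```python
import pandas as pd

# The 11 numerical columns listed in the dataset description.
NUM_COLS = [
    "Calories", "Protein", "Fat", "Carbs", "Sugar", "Fiber", "Sodium",
    "Cholesterol", "Glycemic_Index", "Water_Content", "Serving_Size",
]


def cap_outliers_iqr(df, cols, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    out = df.copy()
    for col in cols:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def clean_food_data(df, cols=NUM_COLS):
    """Hypothetical helper: drop duplicate rows, median-impute, then IQR-cap."""
    df = df.drop_duplicates().copy()
    # fillna with a Series fills each column with its own median.
    df[cols] = df[cols].fillna(df[cols].median())
    return cap_outliers_iqr(df, cols)
```

Because the fences are computed after imputation, an extreme value such as a 10,000-calorie entry is pulled back to the upper fence rather than removed, so the row still contributes its other features.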
- Applied one-hot encoding (via `pd.get_dummies`) to `Meal_Type` and `Preparation_Method` with `drop_first=True` to avoid multicollinearity.
- Mapped the boolean columns `Is_Vegan` and `Is_Gluten_Free` to binary integers (0/1).
- Label-encoded the target column `Food_Name` into integers 0 to 9 with the mapping: Pizza → 0, Burger → 1, Donut → 2, Pasta → 3, Sushi → 4, Ice Cream → 5, Steak → 6, Apple → 7, Banana → 8, Salad → 9.
- Addressed class imbalance using `RandomUnderSampler` (from `imblearn`), applied only to the training set after the train-test split so the test set keeps its original class distribution.
- Used `random_state=42` for reproducibility.
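To make the split-then-undersample order concrete, here is a plain-pandas stand-in for what `RandomUnderSampler` does with its default settings: shrink every class to the minority-class count, sampling without replacement. In the project itself this step would be `RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)` from imblearn; `undersample` below is a hypothetical helper for illustration, and the toy data is made up.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def undersample(X, y, random_state=42):
    """Hypothetical stand-in for RandomUnderSampler's default behaviour:
    keep n_min randomly chosen rows of every class, without replacement."""
    n_min = y.value_counts().min()
    kept = y.groupby(y).sample(n=n_min, random_state=random_state)
    return X.loc[kept.index], y.loc[kept.index]


# Toy data: 70 rows of class 0 and 30 rows of class 1.
X = pd.DataFrame({"Calories": range(100)})
y = pd.Series([0] * 70 + [1] * 30)

# Split first (stratified 80/20), then undersample only the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_bal, y_bal = undersample(X_train, y_train)
print(y_bal.value_counts().to_dict())   # both classes now have 24 rows
print(y_test.value_counts().to_dict())  # test set keeps its 14 / 6 imbalance
```

Undersampling before the split would leak the balancing decision into the test set and make the reported metrics describe a distribution the model never faces in practice.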
- Applied `StandardScaler` to the 11 numerical columns.
- Fitted the scaler exclusively on the undersampled training set and then used it to transform the test set, preventing data leakage.
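A minimal sketch of leakage-free scaling, assuming pandas DataFrames and a list of numerical column names; `scale_numeric` is a hypothetical helper. The scaler learns its mean and standard deviation from the training rows only, and the test rows are transformed with those same statistics, leaving encoded columns such as `Is_Vegan` untouched.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_numeric(X_train, X_test, num_cols):
    """Fit StandardScaler on training rows only, then reuse it on test rows."""
    X_train, X_test = X_train.copy(), X_test.copy()
    scaler = StandardScaler()
    X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
    # transform, not fit_transform: test statistics never influence the scaler.
    X_test[num_cols] = scaler.transform(X_test[num_cols])
    return X_train, X_test, scaler
```

The returned scaler is the object worth keeping: the README lists the scaler among the artifacts `get_model_pipeline()` caches, and the same fitted instance is what user input on the Prediction page would pass through.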
- Trained a Random Forest Classifier (`n_estimators=99`, `random_state=42`).
- Chose Random Forest for its robustness to multicollinearity, interpretable feature importances, and strong out-of-the-box performance on multi-class tabular data.
- Evaluated using four metrics: Accuracy plus weighted-average Precision, Recall, and F1 Score, all displayed live on the dashboard as KPI cards.
- Weighted averaging weights each class's score by its number of test samples, which accounts for class imbalance in the test set.
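A sketch of training and evaluation with the settings listed above (`n_estimators=99`, `random_state=42`, weighted averaging). `train_and_evaluate` is a hypothetical helper, and `zero_division=0` is an added assumption that avoids warnings when a class receives no predictions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def train_and_evaluate(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit the forest and compute the four KPI values."""
    model = RandomForestClassifier(n_estimators=99, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "Recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
        "F1 Score": f1_score(y_test, y_pred, average="weighted", zero_division=0),
    }
    return model, metrics
```

One consequence of weighted averaging worth knowing: for single-label classification, weighted recall is mathematically identical to accuracy (each class's recall is weighted by its share of the test set, and those terms sum to correct predictions over total predictions), so the Accuracy and Recall cards will always show the same value.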
- Built a two-page Streamlit app that uses `st.session_state` to navigate between the home and Prediction pages.
- Applied custom CSS animations, gradient KPI cards, hover effects, and styled number inputs and select boxes.
- Dataset overview with 4 KPI cards: Total Records, Features, Food Types, Avg Calories.
- Model performance displayed as 4 KPI cards: Accuracy, Precision, Recall, F1 Score.
- Cleaned dataset shown in an interactive `st.dataframe` table.
- Prediction page collects 15 user inputs (11 numerical + 4 categorical).
- Runs the full preprocessing pipeline on user input before inference.
- Returns Top 5 predicted food types with probability percentages and Unsplash food images displayed in a 5-column layout.
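A minimal sketch of the Top-5 step, assuming a fitted scikit-learn classifier. `predict_proba` returns one column per entry in `model.classes_`, so the sketch maps column positions through `classes_` before looking up names. `top_k_foods` is hypothetical, and `FOOD_MAP` mirrors the `food_map` labels listed in this README.

```python
import numpy as np

# Integer class -> food name, as listed in this README.
FOOD_MAP = {
    0: "Pizza", 1: "Burger", 2: "Donut", 3: "Pasta", 4: "Sushi",
    5: "Ice Cream", 6: "Steak", 7: "Apple", 8: "Banana", 9: "Salad",
}


def top_k_foods(proba_row, classes, k=5):
    """Return (food_name, percent) pairs for the k most probable classes."""
    order = np.argsort(proba_row)[::-1][:k]  # column indices, highest first
    return [
        (FOOD_MAP[int(classes[i])], round(float(proba_row[i]) * 100, 2))
        for i in order
    ]
```

In the app this would be called as `top_k_foods(model.predict_proba(input_df)[0], model.classes_)`, yielding five `(name, percent)` pairs ready for the 5-column card layout.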
- Used `@st.cache_data` to cache raw CSV loading.
- Used `@st.cache_resource` to cache the entire ML pipeline (cleaning → training → model artifacts), preventing redundant retraining on every rerun.
Classifies nutritional input into one of 10 food categories with probability scores, returning a full Top-5 ranking rather than only a top-1 answer.
A 99-estimator Random Forest handles imbalanced multi-class classification and is robust to noisy nutritional data; its accuracy is reported live on the dashboard.
RandomUnderSampler balances class counts in the training set so the model is not biased toward majority food classes, producing fairer multi-class predictions.
Accuracy, Precision, Recall, and F1 Score are computed fresh each session and displayed as animated KPI cards on the home page.
Top-5 predicted foods are displayed with curated Unsplash images and confidence percentages in a clean 5-column card layout.
CSS animations, gradient KPI cards, hover lift effects, and styled Streamlit inputs give the app a polished, non-default look and feel.
The entire data loading, cleaning, and model training pipeline is cached via @st.cache_resource, making navigation instant after the first load.
IQR-based outlier capping and median imputation are applied across all 11 numerical columns before training, improving model stability.
- 4 Dataset KPI Cards: Total Records, Number of Features, Unique Food Types, Average Calories.
- Interactive Data Table: Full cleaned dataset displayed with `st.dataframe` for exploration.
- 4 Model KPI Cards: Accuracy and weighted Precision, Recall, F1 Score, computed every session.
- Navigation Button: Animated "Predict" button transitions to the Prediction page via session state.
- 15 Input Fields: 11 number inputs (nutritional values) + 4 select boxes (Meal Type, Preparation Method, Is Vegan, Is Gluten Free).
- Full Pipeline Inference: User inputs are encoded, one-hot expanded, column-aligned to training features, and scaled before prediction.
- Top-5 Output: Shows the 5 most probable food types with image cards and probability percentages.
- Back Navigation: Returns to the dashboard without losing cached model state.
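A sketch of the column-alignment step, assuming the training feature names are available as a list (the README lists the columns among the cached pipeline artifacts). `align_to_training` is hypothetical. It deliberately skips `drop_first` for the single input row: with only one row, `drop_first=True` would drop the row's own category. Reindexing to the training columns instead drops the baseline dummy and zero-fills every category the row does not have.

```python
import pandas as pd


def align_to_training(input_df, train_columns):
    """One-hot encode user input, then match the training column layout."""
    # No drop_first here: on a single row it would discard the chosen category.
    encoded = pd.get_dummies(
        input_df, columns=["Meal_Type", "Preparation_Method"], dtype=int
    )
    # Missing dummy columns become 0; columns unseen in training are dropped.
    return encoded.reindex(columns=train_columns, fill_value=0)
```

After alignment, the fitted scaler would transform the numerical columns before `model.predict_proba` is called.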
| Library | Purpose |
|---|---|
| `streamlit` | Web app framework, page layout, widgets, session state |
| Custom CSS | Gradient KPI cards, hover animations, styled inputs and buttons |
| Library | Purpose |
|---|---|
| `sklearn.ensemble.RandomForestClassifier` | Multi-class food classification |
| `sklearn.linear_model.LogisticRegression` | Imported (available for comparison) |
| `sklearn.neighbors.KNeighborsClassifier` | Imported (available for comparison) |
| `sklearn.tree.DecisionTreeClassifier` | Imported (available for comparison) |
| `imblearn.under_sampling.RandomUnderSampler` | Training set imbalance correction |
| `sklearn.preprocessing.StandardScaler` | Numerical feature normalization |
| `sklearn.model_selection.train_test_split` | 80/20 stratified split |
| `sklearn.metrics` | Accuracy, Precision, Recall, F1 Score (weighted) |
| Library | Purpose |
|---|---|
| `pandas` | Data loading, cleaning, encoding, feature alignment |
| `numpy` | IQR computation, outlier clipping, probability sorting |
| Library | Purpose |
|---|---|
| `plotly.express` | Imported for chart-based analysis extensions |
| Decorator | What It Caches |
|---|---|
| `@st.cache_data` | Raw CSV load via `load_data()` |
| `@st.cache_resource` | Full pipeline: cleaned data, model, scaler, columns via `get_model_pipeline()` |
git clone https://github.com/your-username/NutriClass_Food_Classification.git
cd NutriClass_Food_Classification
# Windows
python -m venv venv
venv\Scripts\activate
# macOS / Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Key libraries:
streamlit
pandas
numpy
scikit-learn
imbalanced-learn
plotly
Place the dataset file `synthetic_food_dataset_imbalanced.csv` in the project directory (or update the path in `load_data()`).
The app currently reads it from a hard-coded absolute path:
pd.read_csv(r"D:\PROJECTS\Own_Project_2\NutriClass_Food_Classification-\synthetic_food_dataset_imbalanced.csv")
Update this path to match your local environment.
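One way to avoid editing an absolute path is to resolve the CSV relative to the script. This is a hypothetical, portable variant of `load_data()`, not the project's code; `DATA_FILE` and the `data_dir` parameter are assumptions, and in `app.py` the function would keep its `@st.cache_data` decorator.

```python
from pathlib import Path

import pandas as pd

DATA_FILE = "synthetic_food_dataset_imbalanced.csv"


def load_data(data_dir=None):
    """Hypothetical portable loader: look next to this script by default."""
    base = Path(data_dir) if data_dir is not None else Path(__file__).resolve().parent
    return pd.read_csv(base / DATA_FILE)
```

With the CSV sitting next to `app.py`, `load_data()` works unchanged after cloning on Windows, macOS, or Linux.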
streamlit run app.py
If you update the dataset or model, clear Streamlit's caches:
# In the Streamlit UI: top-right menu → Clear cache
# Or programmatically:
st.cache_data.clear()
st.cache_resource.clear()
- Nutrition App Tagging: Automatically classify user-logged meals by food type using only macro/micro-nutrient inputs, without requiring manual food naming.
- Clinical Diet Planning: Healthcare providers can input a patient's target nutritional profile and receive ranked food recommendations that match their dietary constraints.
- Grocery & Meal Kit Platforms: E-commerce food platforms can auto-tag SKUs with food category labels based on nutritional metadata from product databases.
- Fitness & Diet Tracking Apps: Integrate the classifier to suggest compatible foods based on a user's calorie and macronutrient goals for a given meal type.
- Food Research & Benchmarking: Researchers can use the probability output to study how nutritional boundaries between food classes overlap and evolve in synthetic datasets.
- Restaurant Menu Intelligence: Restaurant chains can classify new menu items by nutritional content to ensure they meet dietary category standards (vegan, gluten-free, etc.).
- Deep Learning Classifier: Replace Random Forest with a PyTorch or TensorFlow tabular model (e.g., TabNet) for improved boundary learning on overlapping nutritional profiles.
- Real Nutritional API Integration: Connect to the USDA FoodData Central API to fetch real-world nutritional data instead of relying on synthetic CSVs.
- SHAP Explainability: Add SHAP value plots to the prediction page to explain which nutritional features most influenced each food classification.
- Extended EDA Dashboard: Add a dedicated analytics page with Plotly charts (correlation heatmaps, calorie distributions per food type, nutrient radar charts).
- Batch Prediction Upload: Allow users to upload a CSV of nutritional profiles and download the classification results in bulk.
- Prediction History Database: Store each prediction (inputs + outputs + timestamp) in a SQLite or PostgreSQL database for audit and trend analysis.
- Model Comparison Mode: Expose the Logistic Regression, KNN, and Decision Tree models (already imported) for side-by-side metric comparison on the dashboard.
- Mobile-Responsive Layout: Optimize the Streamlit layout and CSS for smaller screens and deploy on Streamlit Community Cloud with a shareable public URL.
┌────────────────────────────────────────────────────────────┐
│  STREAMLIT UI LAYER                                        │
│                                                            │
│  Page: "home" ──────────────────► Page: "Prediction"       │
│  (Dashboard)   st.session_state   (Input Form)             │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  CACHED PIPELINE LAYER                                     │
│                                                            │
│  @st.cache_data             │  @st.cache_resource          │
│  load_data()                │  get_model_pipeline()        │
│  (Raw CSV Load)             │  (Full Pipeline Cache)       │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  DATA PIPELINE LAYER                                       │
│                                                            │
│  food_Classification_Data_Cleaning(df)                     │
│  ├── drop_duplicates()                                     │
│  ├── fillna(median) → 11 numerical columns                 │
│  ├── Outlier_Detection() → IQR capping (1.5×)              │
│  └── Map Is_Vegan / Is_Gluten_Free → "True"/"False"        │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  ML PIPELINE LAYER                                         │
│                                                            │
│  ML_model(Data)                                            │
│  ├── pd.get_dummies() → Meal_Type, Preparation_Method      │
│  ├── Label encode → Food_Name (0-9)                        │
│  ├── train_test_split → 80% train / 20% test               │
│  ├── RandomUnderSampler → Balance training classes         │
│  ├── StandardScaler → Fit on train, transform test         │
│  └── RandomForestClassifier(n_estimators=99)               │
└─────────────────────────────┬──────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│  PREDICTION OUTPUT LAYER                                   │
│                                                            │
│  model.predict_proba(input_df)                             │
│  ├── Top-5 food classes by probability                     │
│  ├── food_map → {0:"Pizza", 1:"Burger", ..., 9:"Salad"}    │
│  └── Unsplash image cards + confidence % display           │
└────────────────────────────────────────────────────────────┘
NutriClass is a supervised multi-class food classification system built on a synthetic nutritional dataset containing 10 food categories: Pizza, Burger, Donut, Pasta, Sushi, Ice Cream, Steak, Apple, Banana, and Salad. The system applies a rigorous preprocessing pipeline, including median imputation, IQR-based outlier capping, one-hot encoding for meal type and preparation method, and RandomUnderSampler-based class-balance correction of the training set. A Random Forest Classifier with 99 estimators is trained on the balanced training set and evaluated on the held-out test set using Accuracy and weighted Precision, Recall, and F1 Score. The interactive Streamlit dashboard provides a live model performance overview on the home page and a full prediction interface where users enter 15 nutritional and categorical inputs to receive the Top-5 most probable food classifications, each displayed with a visual image card and confidence score. This system serves as a foundation for building nutrition-aware food tagging, dietary recommendation, and meal planning features in health and food-tech applications.
If you find this project useful, give it a star on GitHub and share your feedback!