
Mosshato/DataVisualisationDashboard


DataVisualisationDashboard — Project Documentation

Table of Contents

  1. Project Overview
  2. Project Structure
  3. Tech Stack
  4. Installation & Setup
  5. Running the Application
  6. Architecture & Data Flow
  7. Module Reference
  8. Dataset
  9. Dashboard Features
  10. Machine Learning Model
  11. Known Issues & Notes

Project Overview

DataVisualisationDashboard is an interactive web-based data visualization and machine learning dashboard for analyzing drug consumption patterns. It is built with Python/Dash and combines:

  • A full ETL pipeline that processes a raw drug consumption survey dataset into formats suitable for visualization and ML training.
  • A multi-tab interactive dashboard with 10+ chart types, sidebar filtering, and cross-chart selection linking.
  • An artificial neural network (ANN) that predicts an addiction sensitivity score from a user's psychometric profile and drug usage history.

Project Structure

DataVisualisationDashboard/
├── main.py                        # Application entry point
├── App.py                         # Dash app initialization
├── README.md                      # Brief readme
├── .gitignore
│
├── Data/
│   ├── Data.py                    # ETL pipeline logic
│   └── mappings.py                # Encoding/decoding maps for all categorical columns
│
├── Dataset/
│   ├── Raw/
│   │   └── drug_consumption.csv   # Original survey dataset
│   └── Processed/
│       ├── drug_consumption_processed_dashboard.csv   # Human-readable, used by UI
│       └── drug_consumption_processed_ml.csv          # Encoded, used for ML training
│
├── Model/
│   ├── Model.py                   # ANN model definition and training utilities
│   └── ss_model.keras             # Pre-trained model weights
│
└── PlotDashboard/
    ├── Dashboard.py               # Main layout and top-level tab routing
    ├── test_dashboard.py          # Development/test server runner
    ├── assets/
    │   └── styles.css             # Custom dark theme CSS
    └── tabs/
        ├── data_tab.py            # EDA and data exploration tab
        ├── visualisation_tab.py   # Interactive visualization tab
        └── model_tab.py           # ML prediction interface tab

Tech Stack

| Layer | Technology |
|---|---|
| Language | Python 3.x |
| Dashboard framework | Dash + dash-bootstrap-components |
| Charting | Plotly Express + plotly.graph_objects |
| Data processing | pandas, numpy |
| Machine learning | TensorFlow / Keras, scikit-learn |
| Styling | Custom CSS3 dark theme, Bootstrap grid |
| Underlying web server | Flask (via Dash) |

Installation & Setup

Prerequisites

  • Python 3.8 or higher
  • pip

Install dependencies

pip install dash dash-bootstrap-components pandas numpy tensorflow scikit-learn plotly

Fix the hard-coded data path

Important: PlotDashboard/tabs/data_tab.py contains a hard-coded absolute path:

DATA_PATH = r"C:\CORUNA\DataVi\Project\DVA_Final_Project\Dataset\Processed\drug_consumption_processed_dashboard.csv"

Change this to a path pointing to your local copy of the processed dashboard CSV:

DATA_PATH = r"<absolute-path-to-repo>\Dataset\Processed\drug_consumption_processed_dashboard.csv"

The paths used in model_tab.py are resolved dynamically relative to the file's own location, so they should work without changes.
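If you would rather not hard-code the path at all, one option is to resolve it relative to data_tab.py itself, in the same spirit as the dynamic paths in model_tab.py. This is a minimal sketch, assuming data_tab.py stays at PlotDashboard/tabs/ inside the repository; the helper name `dashboard_csv_path` is hypothetical.

```python
from pathlib import Path


def dashboard_csv_path(tab_file: str) -> Path:
    """Resolve the processed dashboard CSV from the location of data_tab.py.

    Assumes the layout <repo>/PlotDashboard/tabs/data_tab.py, so the
    repository root is two directories above the tabs/ folder.
    """
    # parents[0] = tabs/, parents[1] = PlotDashboard/, parents[2] = repo root
    repo_root = Path(tab_file).resolve().parents[2]
    return repo_root / "Dataset" / "Processed" / "drug_consumption_processed_dashboard.csv"


# Hypothetical replacement for the hard-coded constant inside data_tab.py:
# DATA_PATH = dashboard_csv_path(__file__)
```

pandas.read_csv accepts a Path object directly, so the rest of data_tab.py would not need to change.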

(Optional) Re-run the ETL pipeline

If you want to regenerate the processed CSVs from the raw dataset, run:

from Data.Data import Data

etl = Data()
etl.runETLforDashboard()   # generates drug_consumption_processed_dashboard.csv
etl.runETLforML()          # generates drug_consumption_processed_ml.csv

Running the Application

Option 1 — Main entry point (recommended)

python main.py

Option 2 — Development server with debug mode

cd PlotDashboard
python test_dashboard.py

Both options start a local Dash server. Open your browser at:

http://localhost:8050

Module Reference

Data Module

Data/mappings.py

Defines all encoding and decoding dictionaries used throughout the pipeline.

| Mapping | Contents |
|---|---|
| AGE_MAP | 6 age groups: 18-24, 25-34, 35-44, 45-54, 55-64, 65+ |
| EDUCATION_MAP | 9 education levels, from "Left school before 16" to "Doctorate degree" |
| GENDER_MAP | Male, Female |
| COUNTRY_MAP | 7 countries (UK, USA, Canada, Australia, Ireland, New Zealand, Other) |
| DRUG_CLASS_MAP | 7 frequency classes, from "Never Used" to "Used in Last Day" |
| Drug-specific maps | One map per drug (17 total) for individual decoding |

Data/Data.py: Data class

Manages the complete ETL pipeline.

| Method | Description |
|---|---|
| extractData() | Reads Dataset/Raw/drug_consumption.csv into a DataFrame |
| transformData(mode) | Applies the transformation pipeline. mode='D' → dashboard (decoded); mode='M' → ML (encoded) |
| loadData(mode) | Saves the processed DataFrame to the appropriate file in Dataset/Processed/ |
| runETLforDashboard() | Convenience method: extract → transform('D') → load |
| runETLforML() | Convenience method: extract → transform('M') → load |
| dropColumns(cols) | Drops a list of columns from the working DataFrame |
| encode(col) | Applies binary encoding to a categorical column |
| decode(col) | Maps numeric codes to human-readable string labels |
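The table does not spell out what "binary encoding" means in encode(). In the usual sense of the term, each category first gets an integer index, and that index is then written out as ceil(log2(k)) columns of 0/1 bits for k categories. The following is a minimal pandas sketch of that technique; the function name, the column naming scheme, and the category ordering are assumptions, and the repository's encode() may work differently (for example, it may use one-hot encoding).

```python
import math
from typing import Sequence

import pandas as pd


def binary_encode(series: pd.Series, categories: Sequence[str]) -> pd.DataFrame:
    """Binary-encode a categorical column into ceil(log2(k)) 0/1 columns.

    `categories` fixes the integer index of each label (its position in the
    list), so the same ordering must be reused at training and prediction time.
    """
    index = {label: i for i, label in enumerate(categories)}
    codes = series.map(index)
    if codes.isna().any():
        unknown = sorted(set(series[codes.isna()]))
        raise ValueError(f"Unknown categories in {series.name!r}: {unknown}")
    codes = codes.astype(int)

    # At least one bit, even for a single-category column.
    n_bits = max(1, math.ceil(math.log2(len(categories))))
    columns = {}
    for bit in range(n_bits):
        shift = n_bits - 1 - bit  # bit0 is the most significant bit
        columns[f"{series.name}_bit{bit}"] = (codes // (2 ** shift)) % 2
    return pd.DataFrame(columns, index=series.index)
```

With the seven frequency classes in their table order, "Never Used" (index 0) becomes 000, "Last Year" (index 3) becomes 011 and "Used in Last Day" (index 6) becomes 110, so each drug column expands to three bit columns instead of seven one-hot columns.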

Model Module

Model/Model.py: Model class

Defines, trains, and saves a regression ANN that predicts the Sensitivity Score (SS).

Architecture:

Input Layer  (n features)
    ↓
Dense(128, activation='relu')  +  L2 regularization
    ↓
Dropout(0.15)
    ↓
Dense(64, activation='relu')  +  L2 regularization
    ↓
Dropout(0.15)
    ↓
Dense(1, activation='linear')   ← regression output (SS value)

Training configuration:

| Parameter | Value |
|---|---|
| Optimizer | Adam, lr=1e-3 |
| Loss | Mean Squared Error (MSE) |
| Metrics | MAE, RMSE |
| Epochs | 200 (default) |
| Batch size | 32 |
| Data split | 70% train / 15% validation / 15% test |
| Regularization | L2 (1e-4) + Dropout (0.15) |
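The architecture and training configuration above translate into a few lines of Keras. This is a sketch built from those stated settings, not the contents of Model.py: the function name `build_ss_model` is hypothetical, and the early-stopping patience of 20 is an assumed value because the document does not give one.

```python
import tensorflow as tf


def build_ss_model(input_dim: int, l2: float = 1e-4, dropout: float = 0.15) -> tf.keras.Model:
    """Regression ANN: Dense(128) -> Dense(64) -> Dense(1), with L2 and dropout."""
    reg = tf.keras.regularizers.l2(l2)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1, activation="linear"),  # continuous SS output
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="mse",
        metrics=["mae", tf.keras.metrics.RootMeanSquaredError(name="rmse")],
    )
    return model


# Early stopping on validation loss; patience=20 is an assumed value.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True
)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, batch_size=32, callbacks=[early_stop])
```

With 10 input features this network has 1,408 + 8,256 + 65 = 9,729 trainable parameters; the dropout layers add none.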

Key methods:

| Method | Description |
|---|---|
| prepare_data(df) | Splits the processed ML CSV into train/validation/test sets held in a SplitData dataclass |
| build_model(input_dim) | Constructs the Keras model |
| train(splits) | Fits the model, using early stopping on validation loss |
| evaluate(splits) | Evaluates on the test set and returns MAE and RMSE |
| predict(X) | Returns sensitivity score predictions for feature array X |
| save_model(path) | Saves the trained model to a .keras file |
| load_model(path) | Loads the saved model from file (used at runtime by model_tab.py) |

Saved weights: Model/ss_model.keras
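prepare_data() is described as producing a 70% / 15% / 15% split held in a SplitData dataclass. Below is a dependency-light NumPy sketch of that idea. The SplitData field names, the seed, and the shuffle-then-slice approach are all assumptions; the repository may instead call scikit-learn's train_test_split twice.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SplitData:  # hypothetical field layout
    X_train: np.ndarray
    y_train: np.ndarray
    X_val: np.ndarray
    y_val: np.ndarray
    X_test: np.ndarray
    y_test: np.ndarray


def split_70_15_15(X: np.ndarray, y: np.ndarray, seed: int = 42) -> SplitData:
    """Shuffle rows once, then slice into 70% train, 15% val, 15% test."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # one shared permutation keeps X and y aligned
    n_train = int(round(0.70 * len(X)))
    n_val = int(round(0.15 * len(X)))
    train, val, test = np.split(order, [n_train, n_train + n_val])
    return SplitData(X[train], y[train], X[val], y[val], X[test], y[test])
```

Any rounding remainder goes to the test slice, so the three parts always cover every row exactly once.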


Dashboard Module

PlotDashboard/Dashboard.py

Builds the top-level Dash layout:

  • Sidebar (fixed left, 300px): contains filters for Age, Gender, Education, Country, and a numeric column selector.
  • Main area: three tabs routed via dcc.Tabs.

PlotDashboard/tabs/data_tab.py — Data & EDA Tab

Loads drug_consumption_processed_dashboard.csv and renders:

| Component | Description |
|---|---|
| KPI cards | Total rows, total columns, missing value count |
| Summary stats | Descriptive statistics table |
| Histogram + Boxplot | For any selected numeric column |
| Correlation heatmap | Pearson correlation across all numeric columns |
| Age distribution | Bar chart |
| Violin plots | Numeric metric split by gender |
| Scatter plot | Nscore vs Escore, colored by age group |
| Education distribution | Bar chart |
| Data table | Full dataset with column filtering and sorting |

All charts respond to sidebar filters (Age, Gender, Education, Country).
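Applying the sidebar filters amounts to building a boolean mask over the DataFrame before each chart is drawn. The minimal pandas sketch below assumes each sidebar control yields a list of selected labels, with an empty list or None meaning "no filter", and that the decoded CSV uses the column names Age, Gender, Education and Country. Both helper names are hypothetical. The second helper computes the three values shown on the KPI cards.

```python
from typing import Dict, List, Optional

import pandas as pd


def apply_filters(df: pd.DataFrame, selections: Dict[str, Optional[List[str]]]) -> pd.DataFrame:
    """Keep rows whose value is in the selected labels, for every active filter."""
    mask = pd.Series(True, index=df.index)
    for column, chosen in selections.items():
        if chosen:  # None or [] means the filter is inactive
            mask &= df[column].isin(chosen)
    return df[mask]


def kpis(df: pd.DataFrame) -> Dict[str, int]:
    """Values for the KPI cards: rows, columns and total missing cells."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
    }
```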

PlotDashboard/tabs/visualisation_tab.py — Visualisation & Interaction Tab

Contains two sub-tabs:

Psychometrics Explorer

  • Scatter plot with brush/lasso selection
  • Parallel coordinates chart — highlights selected data points
  • Violin plot — updates to reflect selection
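Cross-chart linking relies on the selectedData property that a dcc.Graph exposes after a box or lasso selection. This property is a dictionary whose "points" list describes each selected marker. The sketch below covers the pure-Python part of such a callback and assumes the scatter plot was built with each respondent's ID in its custom data, for example px.scatter(..., custom_data=["ID"]). The helper name and the component ids in the commented wiring are hypothetical.

```python
from typing import Any, Dict, Optional

import pandas as pd


def rows_in_selection(df: pd.DataFrame, selected_data: Optional[Dict[str, Any]]) -> pd.DataFrame:
    """Return the rows of df whose ID appears in a Plotly selectedData payload.

    With no active selection (None or no points), the full DataFrame is
    returned so the linked charts fall back to showing everything.
    """
    if not selected_data or not selected_data.get("points"):
        return df
    ids = set()
    for point in selected_data["points"]:
        custom = point.get("customdata")
        if custom is None:
            continue
        # px.scatter(custom_data=[...]) yields a list per point; take the ID.
        ids.add(custom[0] if isinstance(custom, (list, tuple)) else custom)
    return df[df["ID"].isin(ids)]


# Hypothetical Dash wiring:
# @app.callback(Output("parcoords", "figure"), Input("psy-scatter", "selectedData"))
# def update_parcoords(selected_data):
#     subset = rows_in_selection(df, selected_data)
#     return px.parallel_coordinates(subset, dimensions=[...])
```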

Consumption Patterns

  • Sankey diagram — shows drug co-consumption flows between substances
  • Heatmap — drug usage frequency across demographic groups
  • Stacked bar chart — consumption distribution per drug

PlotDashboard/tabs/model_tab.py — Model Prediction Tab

Interactive prediction panel:

| Input | Type |
|---|---|
| Nscore, Escore, Oscore, Ascore, Cscore, Impulsive | Sliders (continuous psychometric scores) |
| Age group | Dropdown |
| Individual drug usage (17 drugs) | Dropdowns (7 frequency classes each) |

Output:

  • Gauge chart — shows predicted Sensitivity Score from 0% to 100%
  • Debug panel — toggleable display of the exact feature vector sent to the model

Disclaimer displayed in UI: Predictions are for educational/research purposes only and do not constitute medical advice.

PlotDashboard/assets/styles.css

Custom dark theme using CSS variables:

| Variable | Default |
|---|---|
| Background | Dark navy, #0f172a |
| Card background | #1e293b |
| Accent (cyan) | #6ee7ff |
| Accent (violet) | #a78bfa |
| Sidebar width | 300px |
| Chart height | 340px |

Dataset

Source file: Dataset/Raw/drug_consumption.csv

32 columns:

| Group | Columns |
|---|---|
| ID | ID |
| Demographics | Age, Gender, Education, Country, Ethnicity |
| Psychometric scores | Nscore (Neuroticism), Escore (Extraversion), Oscore (Openness), Ascore (Agreeableness), Cscore (Conscientiousness), Impulsive, SS (Sensation Seeking / Sensitivity Score) |
| Drug consumption (19) | Alcohol, Amphet, Amyl, Benzos, Caff, Cannabis, Choc, Coke, Crack, Ecstasy, Heroin, Ketamine, Legalh, LSD, Meth, Mushrooms, Nicotine, Semer, VSA |

Raw data is numerically encoded. The ETL pipeline decodes it to human-readable labels (for the dashboard) or re-encodes it with binary encoding (for ML training).

Consumption frequency classes (7 levels):

| Code | Label |
|---|---|
| 0 | Never Used |
| 1 | Over a Decade Ago |
| 2 | Last Decade |
| 3 | Last Year |
| 4 | Last Month |
| 5 | Last Week |
| 6 | Used in Last Day |
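Decoding these classes is a dictionary lookup per column. The minimal pandas sketch below assumes a mapping shaped like the table above. The name FREQ_LABELS and the helper are hypothetical and do not necessarily match how DRUG_CLASS_MAP is defined in mappings.py. The UCI distribution of this dataset stores the classes as the strings CL0 to CL6 rather than bare integers, so the helper strips that prefix first in case your copy uses it.

```python
import pandas as pd

# Code -> label, following the frequency table above.
FREQ_LABELS = {
    0: "Never Used",
    1: "Over a Decade Ago",
    2: "Last Decade",
    3: "Last Year",
    4: "Last Month",
    5: "Last Week",
    6: "Used in Last Day",
}


def decode_frequency(series: pd.Series) -> pd.Series:
    """Map codes such as 3 or "CL3" to their human-readable labels."""
    codes = series.astype(str).str.replace("CL", "", regex=False).astype(int)
    labels = codes.map(FREQ_LABELS)
    if labels.isna().any():
        raise ValueError(f"Unexpected frequency codes in {series.name!r}")
    return labels
```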

Machine Learning Model

The model predicts Sensitivity Score (SS) — a continuous regression value representing addiction sensitivity derived from a respondent's psychometric profile and drug usage history.

Input features used for prediction:

  • Psychometric scores: Nscore, Escore, Oscore, Ascore, Cscore, Impulsive
  • Age group (binary encoded)
  • Usage frequency for each of the 17 drugs (binary encoded)

Demographic columns excluded from ML training: Gender, Education, Country, Ethnicity — dropped during runETLforML().

Model file: Model/ss_model.keras (loaded at runtime, not retrained on each startup)
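Before training, the inputs must be separated from the target: drop the excluded demographics and pull out SS. The sketch below assumes the column names from the Dataset section. Dropping ID as well is an assumption (a row identifier carries no predictive signal), and the helper name is hypothetical. Columns that runETLforML() has already removed are skipped, so the helper also works on the processed ML CSV.

```python
from typing import Tuple

import pandas as pd

EXCLUDED = ["Gender", "Education", "Country", "Ethnicity"]  # per runETLforML()
ASSUMED_EXTRA = ["ID"]  # assumption: the row identifier is not a feature


def features_and_target(df: pd.DataFrame, target: str = "SS") -> Tuple[pd.DataFrame, pd.Series]:
    """Return (X, y) with excluded columns removed and SS as the target."""
    to_drop = [c for c in EXCLUDED + ASSUMED_EXTRA if c in df.columns]
    X = df.drop(columns=to_drop + [target])
    y = df[target]
    return X, y
```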
