- Project Overview
- Project Structure
- Tech Stack
- Installation & Setup
- Running the Application
- Architecture & Data Flow
- Module Reference
- Dataset
- Dashboard Features
- Machine Learning Model
- Known Issues & Notes
DataVisualisationDashboard is an interactive web-based data visualization and machine learning dashboard for analyzing drug consumption patterns. It is built with Python/Dash and combines:
- A full ETL pipeline that processes a raw drug consumption survey dataset into formats suitable for visualization and ML training.
- A multi-tab interactive dashboard with 10+ chart types, sidebar filtering, and cross-chart selection linking.
- A neural network (ANN) that predicts an addiction sensitivity score from a user's psychometric profile and drug usage history.
```
DataVisualisationDashboard/
├── main.py                  # Application entry point
├── App.py                   # Dash app initialization
├── README.md                # Brief readme
├── .gitignore
│
├── Data/
│   ├── Data.py              # ETL pipeline logic
│   └── mappings.py          # Encoding/decoding maps for all categorical columns
│
├── Dataset/
│   ├── Raw/
│   │   └── drug_consumption.csv                       # Original survey dataset
│   └── Processed/
│       ├── drug_consumption_processed_dashboard.csv   # Human-readable, used by UI
│       └── drug_consumption_processed_ml.csv          # Encoded, used for ML training
│
├── Model/
│   ├── Model.py             # ANN model definition and training utilities
│   └── ss_model.keras       # Pre-trained model weights
│
└── PlotDashboard/
    ├── Dashboard.py         # Main layout and top-level tab routing
    ├── test_dashboard.py    # Development/test server runner
    ├── assets/
    │   └── styles.css       # Custom dark theme CSS
    └── tabs/
        ├── data_tab.py           # EDA and data exploration tab
        ├── visualisation_tab.py  # Interactive visualization tab
        └── model_tab.py          # ML prediction interface tab
```
| Layer | Technology |
|---|---|
| Language | Python 3.x |
| Dashboard framework | Dash + dash-bootstrap-components |
| Charting | Plotly Express + plotly.graph_objects |
| Data processing | pandas, numpy |
| Machine learning | TensorFlow / Keras, scikit-learn |
| Styling | Custom CSS3 dark theme, Bootstrap grid |
| Underlying web server | Flask (via Dash) |
- Python 3.8 or higher
- pip

```bash
pip install dash dash-bootstrap-components pandas numpy tensorflow scikit-learn plotly
```

Important: `PlotDashboard/tabs/data_tab.py` contains a hard-coded absolute path:

```python
DATA_PATH = r"C:\CORUNA\DataVi\Project\DVA_Final_Project\Dataset\Processed\drug_consumption_processed_dashboard.csv"
```

Change this to a path pointing to your local copy of the processed dashboard CSV:

```python
DATA_PATH = r"<absolute-path-to-repo>\Dataset\Processed\drug_consumption_processed_dashboard.csv"
```

The paths in `model_tab.py` are resolved dynamically relative to the file's own location and should work without changes.
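Another option is to resolve the CSV path in `data_tab.py` the same way `model_tab.py` resolves its paths, relative to the file's own location. The sketch below assumes `data_tab.py` stays in `PlotDashboard/tabs/`. The helper name `dashboard_csv_path` is hypothetical and is not part of the project.

```python
from pathlib import Path

CSV_NAME = "drug_consumption_processed_dashboard.csv"


def dashboard_csv_path(tab_file) -> Path:
    """Locate the processed dashboard CSV relative to a file in PlotDashboard/tabs/.

    tab_file is expected to be the path of data_tab.py, so the repository
    root is two directories above it (tabs/ -> PlotDashboard/ -> repo root).
    """
    repo_root = Path(tab_file).parents[2]
    return repo_root / "Dataset" / "Processed" / CSV_NAME


# Hypothetical replacement line inside data_tab.py:
# DATA_PATH = dashboard_csv_path(Path(__file__).resolve())
```

This approach works on Windows, macOS and Linux because `pathlib` inserts the correct separator.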
If you want to regenerate the processed CSVs from the raw dataset, run:

```python
from Data.Data import Data

etl = Data()
etl.runETLforDashboard()  # generates drug_consumption_processed_dashboard.csv
etl.runETLforML()         # generates drug_consumption_processed_ml.csv
```

To start the dashboard, either run the main entry point:

```bash
python main.py
```

or launch the development/test server:

```bash
cd PlotDashboard
python test_dashboard.py
```

Both options start a local Dash server. Open your browser at:
http://localhost:8050
`Data/mappings.py` defines all encoding and decoding dictionaries used throughout the pipeline.
| Mapping | Contents |
|---|---|
| `AGE_MAP` | 6 age groups: 18-24, 25-34, 35-44, 45-54, 55-64, 65+ |
| `EDUCATION_MAP` | 9 education levels from "Left school before 16" to "Doctorate degree" |
| `GENDER_MAP` | Male, Female |
| `COUNTRY_MAP` | 7 countries (UK, USA, Canada, Australia, Ireland, New Zealand, Other) |
| `DRUG_CLASS_MAP` | 7 frequency classes: Never Used → Used in Last Day |
| Drug-specific maps | One map per drug (17 total) for individual decoding |
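The following sketch shows how a decoding map is typically applied with pandas. It assumes the drug columns hold the integer codes 0-6 from the frequency-class table in the Dataset section. The dictionary below is an illustrative excerpt, not a copy of `mappings.py`, and `decode_column` is a hypothetical helper.

```python
import pandas as pd

# Illustrative excerpt; the real maps live in Data/mappings.py.
DRUG_CLASS_MAP = {
    0: "Never Used",
    1: "Over a Decade Ago",
    2: "Last Decade",
    3: "Last Year",
    4: "Last Month",
    5: "Last Week",
    6: "Used in Last Day",
}


def decode_column(series: pd.Series, mapping: dict) -> pd.Series:
    """Replace numeric codes with labels; codes missing from the map become NaN."""
    return series.map(mapping)


raw = pd.DataFrame({"Cannabis": [0, 3, 6], "Alcohol": [5, 5, 1]})
# apply() runs decode_column once per column, forwarding mapping= as a keyword.
decoded = raw.apply(decode_column, mapping=DRUG_CLASS_MAP)
print(decoded["Cannabis"].tolist())  # ['Never Used', 'Last Year', 'Used in Last Day']
```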
`Data/Data.py` defines the `Data` class, which manages the complete ETL pipeline.
| Method | Description |
|---|---|
| `extractData()` | Reads `Dataset/Raw/drug_consumption.csv` into a DataFrame |
| `transformData(mode)` | Applies the transformation pipeline. `mode='D'` → dashboard (decoded); `mode='M'` → ML (encoded) |
| `loadData(mode)` | Saves the processed DataFrame to the appropriate file in `Dataset/Processed/` |
| `runETLforDashboard()` | Convenience method: extract → transform('D') → load |
| `runETLforML()` | Convenience method: extract → transform('M') → load |
| `dropColumns(cols)` | Drops a list of columns from the working DataFrame |
| `encode(col)` | Applies binary encoding to a categorical column |
| `decode(col)` | Maps numeric codes to human-readable string labels |
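The table does not spell out what "binary encoding" means in `encode(col)`. A common reading is that each integer code is written in base 2 across a few 0/1 columns. Under that reading, the 7 frequency classes need 3 columns instead of 7 one-hot columns. The sketch below implements that scheme. The helper name `binary_encode` and the `_b0`, `_b1`, ... column suffixes are hypothetical, and the project may lay out its bits differently.

```python
import pandas as pd


def binary_encode(df: pd.DataFrame, col: str, n_classes: int) -> pd.DataFrame:
    """Replace integer-coded column `col` with one 0/1 column per bit.

    n_classes codes (0 .. n_classes-1) need (n_classes - 1).bit_length() bits,
    e.g. 3 bits for the 7 drug frequency classes or the 6 age groups.
    """
    n_bits = max(1, (n_classes - 1).bit_length())
    codes = df[col].astype(int)
    bits = {
        # Bit i of each code, least significant bit first.
        f"{col}_b{i}": (codes // 2**i) % 2
        for i in range(n_bits)
    }
    encoded = pd.DataFrame(bits, index=df.index)
    return pd.concat([df.drop(columns=[col]), encoded], axis=1)


df = pd.DataFrame({"Nscore": [0.1, -0.5], "Cannabis": [5, 2]})
out = binary_encode(df, "Cannabis", n_classes=7)
# Cannabis 5 -> bits (1, 0, 1); Cannabis 2 -> bits (0, 1, 0)
print(out)
```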
`Model/Model.py` defines, trains, and saves a regression ANN that predicts the Sensitivity Score (SS).
Architecture:

```
Input Layer (n features)
        ↓
Dense(128, activation='relu') + L2 regularization
        ↓
Dropout(0.15)
        ↓
Dense(64, activation='relu') + L2 regularization
        ↓
Dropout(0.15)
        ↓
Dense(1, activation='linear')   ← regression output (SS value)
```
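Each Dense layer has (inputs × units) weights plus one bias per unit. Dropout and L2 regularization add no trainable parameters. The trainable parameter count of the stack above is therefore (n + 1)·128 + 129·64 + 65 = 128n + 8449. The short check below confirms this. The feature count `n` depends on how many binary-encoded columns the ML CSV contains, so the value 10 is only an illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def ss_model_params(n_features: int) -> int:
    # Dropout and L2 regularization contribute no trainable parameters.
    return (
        dense_params(n_features, 128)
        + dense_params(128, 64)
        + dense_params(64, 1)
    )


print(ss_model_params(10))  # 9729, i.e. 128 * 10 + 8449
```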
Training configuration:
| Parameter | Value |
|---|---|
| Optimizer | Adam, lr=1e-3 |
| Loss | Mean Squared Error (MSE) |
| Metrics | MAE, RMSE |
| Epochs | 200 (default) |
| Batch size | 32 |
| Data split | 70% train / 15% validation / 15% test |
| Regularization | L2 (1e-4) + Dropout (0.15) |
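The loss (MSE) and the two reported metrics (MAE, RMSE) are closely related: RMSE is the square root of MSE, and MAE averages absolute rather than squared errors. The sketch below computes all three with NumPy, which makes it easy to sanity-check the values Keras reports. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MSE (the training loss), MAE and RMSE for a set of SS predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(mse)),
    }


print(regression_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

RMSE penalizes the single 1.0 error more heavily than MAE does, which is why RMSE is the larger of the two in this example.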
Key methods:

| Method | Description |
|---|---|
| `prepare_data(df)` | Splits the processed ML CSV into train/validation/test sets using the `SplitData` dataclass |
| `build_model(input_dim)` | Constructs the Keras model |
| `train(splits)` | Fits the model with early stopping on validation loss |
| `evaluate(splits)` | Evaluates on the test set and returns MAE and RMSE |
| `predict(X)` | Returns Sensitivity Score predictions for feature array `X` |
| `save_model(path)` | Saves the trained model to a `.keras` file |
| `load_model(path)` | Loads the saved model from file (used at runtime by `model_tab.py`) |
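One way to implement a 70% / 15% / 15% split is to shuffle the row indices once and cut them at fixed positions. The sketch below does this. The field names of this `SplitData`, the `seed` argument and the target column name `"SS"` are assumptions, because the source names the dataclass but not its fields.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class SplitData:
    X_train: np.ndarray
    y_train: np.ndarray
    X_val: np.ndarray
    y_val: np.ndarray
    X_test: np.ndarray
    y_test: np.ndarray


def prepare_data(df: pd.DataFrame, target: str = "SS", seed: int = 42) -> SplitData:
    """Shuffle rows once with a fixed seed, then cut them 70% / 15% / 15%."""
    X = df.drop(columns=[target]).to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    idx = np.random.default_rng(seed).permutation(len(df))
    # Integer arithmetic avoids float rounding in the cut points.
    n_train = len(df) * 70 // 100
    n_val = len(df) * 15 // 100
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return SplitData(X[train], y[train], X[val], y[val], X[test], y[test])
```

Any remainder rows from the integer division go to the test set. The fixed seed makes the split reproducible between runs.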
Saved model: `Model/ss_model.keras`
`PlotDashboard/Dashboard.py` builds the top-level Dash layout:
- Sidebar (fixed left, 300px): contains filters for Age, Gender, Education, Country, and a numeric column selector.
- Main area: three tabs (data exploration, visualisation, and model prediction) routed via `dcc.Tabs`.
`PlotDashboard/tabs/data_tab.py` loads `drug_consumption_processed_dashboard.csv` and renders:
| Component | Description |
|---|---|
| KPI cards | Total rows, total columns, missing value count |
| Summary stats | Descriptive statistics table |
| Histogram + Boxplot | For any selected numeric column |
| Correlation heatmap | Pearson correlation across all numeric columns |
| Age distribution | Bar chart |
| Violin plots | Numeric metric split by gender |
| Scatter plot | Nscore vs Escore, colored by age group |
| Education distribution | Bar chart |
| Data table | Full dataset with column filtering and sorting |
All charts respond to sidebar filters (Age, Gender, Education, Country).
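The sidebar filtering and the KPI cards can be expressed as plain pandas functions that a Dash callback would call. The sketch below assumes each sidebar control yields a list of selected labels, or `None`/empty for "all". Both function names are hypothetical.

```python
import pandas as pd

FILTER_COLUMNS = ("Age", "Gender", "Education", "Country")


def apply_sidebar_filters(df: pd.DataFrame, selections: dict) -> pd.DataFrame:
    """Keep rows matching every non-empty selection; empty or None means 'all'."""
    mask = pd.Series(True, index=df.index)
    for col in FILTER_COLUMNS:
        chosen = selections.get(col)
        if chosen:
            mask &= df[col].isin(chosen)
    return df[mask]


def kpis(df: pd.DataFrame) -> dict:
    """Values for the KPI cards: rows, columns and total missing cells."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
    }
```

Because every chart reads from the filtered frame, the KPIs and plots stay consistent with the current sidebar state.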
`PlotDashboard/tabs/visualisation_tab.py` contains two sub-tabs:
Psychometrics Explorer
- Scatter plot with brush/lasso selection
- Parallel coordinates chart — highlights selected data points
- Violin plot — updates to reflect selection
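Cross-chart linking depends on turning the scatter plot's `selectedData` payload into a set of rows. Once points are colored by group, Plotly splits them into several traces, so `pointIndex` is only unique within one trace. For that reason, the sketch below assumes the scatter carries each respondent's `ID` as its first `customdata` field. That choice, and both helper names, are hypothetical.

```python
import pandas as pd


def selected_row_ids(selected_data):
    """Row IDs from a Plotly selectedData payload, or None if nothing is selected."""
    if not selected_data or not selected_data.get("points"):
        return None
    return [p["customdata"][0] for p in selected_data["points"] if "customdata" in p]


def highlight_subset(df: pd.DataFrame, ids):
    """Rows to emphasize in the linked charts; no selection keeps everything."""
    return df if ids is None else df[df["ID"].isin(ids)]
```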
Consumption Patterns
- Sankey diagram — shows drug co-consumption flows between substances
- Heatmap — drug usage frequency across demographic groups
- Stacked bar chart — consumption distribution per drug
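A Sankey diagram of co-consumption needs a table of weighted links between drug pairs. The sketch below counts, for every pair of drugs, the respondents who report any use of both. It assumes "any use" means a class other than "Never Used"; the dashboard may apply a stricter recency threshold. The function name is hypothetical.

```python
from itertools import combinations

import pandas as pd


def co_consumption_links(df: pd.DataFrame, drugs: list, min_count: int = 1) -> pd.DataFrame:
    """One row per drug pair: respondents who used both at some point."""
    used = df[drugs].ne("Never Used")
    rows = []
    for a, b in combinations(drugs, 2):
        n = int((used[a] & used[b]).sum())
        if n >= min_count:
            rows.append({"source": a, "target": b, "value": n})
    return pd.DataFrame(rows, columns=["source", "target", "value"])
```

To draw the diagram, map each drug name to an integer node index. Then pass the labels to `plotly.graph_objects.Sankey(node=dict(label=...))`, and pass the indexed `source`, `target` and `value` columns to its `link=dict(...)` argument.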
`PlotDashboard/tabs/model_tab.py` provides an interactive prediction panel:
| Input | Type |
|---|---|
| Nscore, Escore, Oscore, Ascore, Cscore, Impulsive | Sliders (continuous psychometric scores) |
| Age group | Dropdown |
| Individual drug usage (17 drugs) | Dropdowns (7 frequency classes each) |
Output:
- Gauge chart — shows predicted Sensitivity Score from 0% to 100%
- Debug panel — toggleable display of the exact feature vector sent to the model
Disclaimer displayed in UI: Predictions are for educational/research purposes only and do not constitute medical advice.
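Before calling the model, the panel's slider and dropdown values must be arranged in exactly the column order the model was trained on. The sketch below shows one way to do that with `DataFrame.reindex`. It assumes the drug and age inputs have already been binary-encoded into named columns. The column names in the usage example and the function name are hypothetical.

```python
import numpy as np
import pandas as pd


def build_feature_vector(inputs: dict, training_columns: list) -> np.ndarray:
    """Arrange UI inputs in the training column order.

    Names the model never saw (e.g. Gender) are dropped and missing features
    default to 0. Returns shape (1, n_features), a single-sample batch.
    """
    row = pd.DataFrame([inputs]).reindex(columns=training_columns, fill_value=0)
    return row.to_numpy(dtype=float)


cols = ["Nscore", "Escore", "Age_b0", "Age_b1", "Cannabis_b0"]
vec = build_feature_vector({"Escore": 0.5, "Nscore": -1.0, "Age_b1": 1, "Gender": "Male"}, cols)
print(vec)  # [[-1.   0.5  0.   1.   0. ]]
```

A debug panel could display this array directly, since it is exactly what would be passed to the model.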
`PlotDashboard/assets/styles.css` defines a custom dark theme using CSS variables:
| Variable | Default |
|---|---|
| Background | Dark navy #0f172a |
| Card background | #1e293b |
| Accent (cyan) | #6ee7ff |
| Accent (violet) | #a78bfa |
| Sidebar width | 300px |
| Chart height | 340px |
Source file: Dataset/Raw/drug_consumption.csv
32 columns:
| Group | Columns |
|---|---|
| ID | ID |
| Demographics | Age, Gender, Education, Country, Ethnicity |
| Psychometric scores | Nscore (Neuroticism), Escore (Extraversion), Oscore (Openness), Ascore (Agreeableness), Cscore (Conscientiousness), Impulsive, SS (Sensation Seeking / Sensitivity Score) |
| Drug consumption (19) | Alcohol, Amphet, Amyl, Benzos, Caff, Cannabis, Choc, Coke, Crack, Ecstasy, Heroin, Ketamine, Legalh, LSD, Meth, Mushrooms, Nicotine, Semer, VSA |
Raw data is numerically encoded. The ETL pipeline decodes it to human-readable labels (for the dashboard) or re-encodes it with binary encoding (for ML training).
Consumption frequency classes (7 levels):
| Code | Label |
|---|---|
| 0 | Never Used |
| 1 | Over a Decade Ago |
| 2 | Last Decade |
| 3 | Last Year |
| 4 | Last Month |
| 5 | Last Week |
| 6 | Used in Last Day |
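Because these labels have a natural order, storing them as an ordered categorical keeps bar and heatmap axes in code order rather than alphabetical order. It also allows recency comparisons. The sketch below is one way to do this in pandas. The constant and function names are hypothetical.

```python
import pandas as pd

FREQUENCY_ORDER = [
    "Never Used", "Over a Decade Ago", "Last Decade", "Last Year",
    "Last Month", "Last Week", "Used in Last Day",
]
FREQUENCY_DTYPE = pd.CategoricalDtype(categories=FREQUENCY_ORDER, ordered=True)


def frequency_counts(series: pd.Series) -> pd.Series:
    """Counts per class in code order 0-6, including classes nobody chose."""
    return series.astype(FREQUENCY_DTYPE).value_counts(sort=False)


def used_within_last_year(series: pd.Series) -> pd.Series:
    """True where the class is 'Last Year' or more recent (ordered comparison)."""
    return series.astype(FREQUENCY_DTYPE) >= "Last Year"
```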
The model predicts Sensitivity Score (SS) — a continuous regression value representing addiction sensitivity derived from a respondent's psychometric profile and drug usage history.
Input features used for prediction:
- Psychometric scores: `Nscore`, `Escore`, `Oscore`, `Ascore`, `Cscore`, `Impulsive`
- Age group (binary encoded)
- Usage frequency for each of the 17 drugs (binary encoded)

Demographic columns excluded from ML training: Gender, Education, Country, Ethnicity (dropped during `runETLforML()`).
Model file: Model/ss_model.keras (loaded at runtime, not retrained on each startup)