
HyperSense

Explainable AI-Powered Hypertension Risk Screening for West Africa

Live App License: MIT Python XGBoost

Answer a few questions. Know when to get checked.

→ Try HyperSense Live


What Is HyperSense?

HyperSense is an explainable machine learning system that estimates an individual's likelihood of hypertension using six non-invasive inputs — no clinical equipment, no blood test, no blood pressure cuff required.

Given age, sex, place of residence, education level, tobacco use, and BMI, HyperSense returns:

  • A hypertension risk tier (High / Low)
  • A probability estimate of elevated blood pressure
  • A SHAP-based explanation of the personal factors driving the result
  • A personalised recommendation for preventive action

⚠️ HyperSense is a screening and awareness tool. It does not diagnose hypertension. All results should be confirmed with a measured blood pressure reading from a trained healthcare professional.


Why It Was Built

Hypertension affects an estimated 30% of Nigerian adults, yet fewer than 29% of those affected are aware of their condition and only 22% receive treatment. With about 3.8 physicians per 10,000 people, far below the WHO recommendation, clinical screening at scale is not feasible through the existing health workforce alone.

Existing risk prediction tools — including the Framingham Risk Score, ASCVD Pooled Cohort Equations, and ESH/ESC 2018 models — were derived from predominantly Western cohorts. They may not accurately reflect the epidemiological, dietary, and demographic characteristics of West African populations.

HyperSense was built to address two simultaneous gaps:

  1. The tool gap — no publicly accessible, non-invasive hypertension screener calibrated for West African populations
  2. The data gap — Nigeria's national health surveys collect no measured blood pressure data, a surveillance failure documented and reported as part of this project

Data Sources

HyperSense is trained on fieldworker-measured blood pressure data from two nationally representative surveys:

| Dataset | Survey Round | Women | Men | BP Measurement |
|---|---|---|---|---|
| Ghana DHS 2014 | Standard DHS (DHS-7) | 9,396 | 4,388 | Fieldworker-measured |
| Benin DHS 2017–18 | Standard DHS (DHS-7) | 15,928 | 7,595 | Fieldworker-measured |
| Combined | | ~25,300 | ~12,000 | n = 20,446 (with valid BP) |

Label derivation:

htn_status = 1  if  SBP ≥ 140 mmHg  OR  DBP ≥ 90 mmHg
htn_status = 0  otherwise

Labels are derived exclusively from measured blood pressure values. Self-reported hypertension diagnosis was not used.

Survey weights (V005 / 1,000,000) were applied during model training to ensure population-representative estimates.
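
The label rule and weight scaling above can be sketched in a few lines of Python. This is an illustration only: the record keys `sbp`, `dbp`, and `v005` are hypothetical names, and the sample values are made up rather than taken from DHS data.

```python
def derive_label(sbp: float, dbp: float) -> int:
    """Return 1 if measured BP meets the hypertension rule, else 0."""
    return 1 if (sbp >= 140 or dbp >= 90) else 0


def survey_weight(v005: int) -> float:
    """Rescale the raw DHS weight (V005) by dividing by 1,000,000."""
    return v005 / 1_000_000


# Illustrative records only; keys and values are assumptions, not DHS data.
records = [
    {"sbp": 152, "dbp": 84, "v005": 1_250_000},  # SBP meets the rule
    {"sbp": 118, "dbp": 76, "v005": 800_000},    # neither meets the rule
    {"sbp": 132, "dbp": 92, "v005": 1_000_000},  # DBP meets the rule
]

for record in records:
    record["htn_status"] = derive_label(record["sbp"], record["dbp"])
    record["weight"] = survey_weight(record["v005"])

labels = [record["htn_status"] for record in records]
weights = [record["weight"] for record in records]
print(labels, weights)
```

Either reading alone is enough to set the label, which is why the third record is labelled 1 despite a systolic value below 140 mmHg.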


Model Performance

| Metric | Value |
|---|---|
| Algorithm | XGBoost (tuned via RandomizedSearchCV) |
| Training sample | n = 16,356 |
| Test sample | n = 4,090 |
| ROC-AUC | 0.735 |
| Sensitivity at deployment threshold | 0.80 |
| Specificity at deployment threshold | 0.54 |
| Brier Score | 0.162 |
| Deployment threshold | 0.353 |
| Class imbalance handling | scale_pos_weight = 5 |

Clinical interpretation at deployment threshold:

  • 376 / 470 hypertensives in the test set correctly flagged (80%)
  • 94 / 470 missed (20% false negative rate — acknowledged limitation)
  • 1,659 normotensives referred unnecessarily (low clinical harm — BP check only)
  • 1,961 correctly reassured
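
The four counts above form a confusion matrix, so the headline metrics can be re-derived directly from them. The short sketch below does that arithmetic; the predictive values (PPV and NPV) are computed from the same counts and are not figures reported elsewhere in this README.

```python
# Counts taken from the clinical interpretation list above.
tp = 376    # hypertensives correctly flagged
fn = 94     # hypertensives missed
fp = 1659   # normotensives referred unnecessarily
tn = 1961   # normotensives correctly reassured

total = tp + fn + fp + tn

sensitivity = tp / (tp + fn)   # share of hypertensives flagged
specificity = tn / (tn + fp)   # share of normotensives reassured
ppv = tp / (tp + fp)           # share of flagged users who are hypertensive
npv = tn / (tn + fn)           # share of reassured users who are normotensive

print(f"test set size: {total}")
print(f"sensitivity:   {sensitivity:.2f}")
print(f"specificity:   {specificity:.2f}")
print(f"PPV:           {ppv:.3f}")
print(f"NPV:           {npv:.3f}")
```

On these test-set counts, about 18% of users flagged as high risk were hypertensive and about 95% of users reassured were normotensive. Both values depend on the prevalence in the test set and would shift in a population with a different prevalence.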

Performance is consistent with published non-invasive hypertension screening models in sub-Saharan African populations (reported AUC range: 0.68–0.78).
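
For readers unfamiliar with the Brier score listed in the metrics table: it is the mean squared difference between the predicted probability and the observed outcome, so lower is better and 0 is a perfect score. A minimal sketch with made-up toy values (not HyperSense predictions):

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    squared_errors = [(p - y) ** 2 for y, p in zip(y_true, y_prob)]
    return sum(squared_errors) / len(squared_errors)


# Toy example: two hypertensive and two normotensive individuals.
y_true = [1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.6]

score = brier_score(y_true, y_prob)
print(round(score, 4))
```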


SHAP Explainability

Every prediction includes a SHAP (SHapley Additive exPlanations) explanation generated by TreeExplainer. Users see:

  • The direction each factor pushes their risk (increasing or decreasing)
  • The magnitude of each factor's contribution
  • Plain-language interpretation of the top drivers
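
One way the three elements above could be produced from per-feature SHAP values is sketched below. This is not the code in app.py: the function name `describe_drivers`, the message wording, and the example values are all hypothetical, and the sketch assumes the SHAP values for one prediction are already available as a feature-to-value mapping.

```python
def describe_drivers(contributions: dict[str, float], top_n: int = 3) -> list[str]:
    """Turn per-feature SHAP values into short plain-language statements.

    Features are ranked by absolute contribution (magnitude); the sign of
    each value gives the direction in which it pushes the estimated risk.
    """
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    statements = []
    for feature, value in ranked[:top_n]:
        if value > 0:
            direction = "increases"
        elif value < 0:
            direction = "decreases"
        else:
            direction = "does not change"
        statements.append(f"{feature} {direction} your estimated risk (SHAP {value:+.2f})")
    return statements


# Hypothetical SHAP values for one user; not real model output.
example = {
    "Age": 0.62,
    "Education": -0.18,
    "Residence": 0.05,
    "Sex": -0.02,
    "Tobacco use": 0.11,
}

drivers = describe_drivers(example)
for statement in drivers:
    print(statement)
```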

Global feature importance ranking (SHAP):

  1. Age (dominant predictor)
  2. Educational level
  3. Residence (urban/rural)
  4. Sex
  5. Tobacco use

Age dominance is clinically expected and consistent with the epidemiological literature on hypertension in West Africa.


Methodology Notes

BMI inclusion: During early model development, BMI was excluded because anthropometric measurements and blood pressure measurements did not fully overlap in the Benin DHS 2017–18 survey. Restricting analysis to participants with complete BMI data reduced the available sample from 20,446 to 7,844 observations (62% reduction), raising concerns about selection bias and loss of statistical power.

As a result, the initial HyperSense deployment (v1.0) prioritised a larger population-representative sample without BMI. Following additional model development and evaluation, BMI has been incorporated into HyperSense v1.1 through a secondary modelling pipeline, enabling more personalised risk assessment while acknowledging the trade-off between feature richness and sample size.

Why Ghana and Benin, not Nigeria? A systematic search of publicly available Nigerian health datasets found no nationally representative Nigerian dataset with fieldworker-measured blood pressure values. The Nigeria DHS (all rounds, including 2023–24) collects only self-reported hypertension awareness. The 2023 Nigeria STEPS Survey, which includes measured biomarkers, is under validation as of mid-2026. Data access requests are pending with authors who have contributed large-scale blood pressure data in recent years.

Threshold selection: The deployment threshold (0.353) was selected to achieve 80% sensitivity, consistent with the clinical priority of a screening tool: minimise missed hypertensives and accept a higher false-positive rate in exchange.
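
As a rough illustration of choosing a cut-off for a target sensitivity, the sketch below picks the highest threshold that still flags the required share of true hypertensives. The function names and toy scores are assumptions for illustration; the README does not state the exact procedure or the data split used to arrive at 0.353.

```python
import math


def threshold_for_sensitivity(y_true, scores, target=0.80):
    """Return the highest cut-off whose sensitivity is at least `target`.

    A case is flagged when its score is >= the returned threshold.
    """
    positive_scores = sorted(
        (s for y, s in zip(y_true, scores) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("at least one positive case is required")
    # Number of positives that must be flagged; the small epsilon guards
    # against floating-point noise pushing the ceiling up by one.
    needed = max(1, math.ceil(target * len(positive_scores) - 1e-9))
    return positive_scores[needed - 1]


def sensitivity_specificity(y_true, scores, threshold):
    """Compute sensitivity and specificity at a given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: five hypertensive (1) and five normotensive (0) individuals.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.55, 0.40, 0.20, 0.60, 0.35, 0.30, 0.15, 0.10]

threshold = threshold_for_sensitivity(y_true, scores, target=0.80)
sens, spec = sensitivity_specificity(y_true, scores, threshold)
print(threshold, sens, spec)
```

Raising the target sensitivity lowers the threshold, which flags more normotensive users as well. That is the trade-off described above.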

Survey weights: DHS surveys use stratified cluster sampling. Survey weights were applied during XGBoost training (sample_weight parameter) to ensure population-representative learning. Evaluation metrics were computed on the unweighted test set to reflect expected performance on individual users at deployment.


Limitations

| Limitation | Detail |
|---|---|
| Geographic generalisability | Trained on Ghanaian and Beninese adults; not validated on Nigerian data yet |
| Age ceiling | DHS surveys sample ages 15–64; highest-risk demographic (60+) unrepresented |
| Cross-sectional design | Screens current likelihood of hypertension, not future cardiovascular events |
| BMI absent | Primary model excludes BMI due to subsample non-overlap |
| Calibration | Formal isotonic calibration not completed; Brier score reported from test set |
| Not a diagnostic tool | High-risk output does not confirm hypertension; low-risk does not exclude it |

Roadmap

  • Phase 2 — Fine-tuning on Nigerian measured BP data (data access requests active)
  • BMI sensitivity analysis — secondary model on n=7,844 complete-case sample
  • Formal calibration assessment (isotonic regression on dedicated holdout)
  • TRIPOD-compliant methodology write-up for peer review
  • OSF pre-registration for pilot evaluation study
  • Multilingual support — Yoruba, Hausa, Igbo

Version 1.1 Update

HyperSense v1.1 introduces Body Mass Index (BMI) as a predictive feature derived from user-entered height and weight.
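
BMI is weight in kilograms divided by the square of height in metres. A minimal sketch of that calculation follows. It assumes weight is entered in kilograms and height in centimetres, which is an assumption about the interface rather than something stated here, and `compute_bmi` is a hypothetical helper name.

```python
def compute_bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100  # convert centimetres to metres
    return weight_kg / height_m ** 2


bmi = compute_bmi(weight_kg=70, height_cm=175)
print(round(bmi, 1))  # 22.9
```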

What's New

  • BMI added as a model input
  • Height and weight collection added to the screening interface
  • Updated XGBoost model retrained with BMI-enhanced feature set
  • Refined screening result messaging
  • Improved clinical disclaimers and user guidance

Why BMI?

BMI is a well-established risk factor associated with hypertension and contributes meaningful predictive information beyond demographic variables alone. Incorporating BMI allows HyperSense to provide a more individualised screening assessment while maintaining a fully non-invasive workflow.

Methodological Trade-off

The original HyperSense v1.0 model was developed on 20,446 adults with valid blood pressure measurements across Ghana and Benin (16,356 for training and 4,090 for testing). However, BMI data were not available for all participants due to differences in survey design and data collection procedures.

Including BMI required restricting training to a substantially smaller complete-case sample. This reduced population coverage but enabled the model to leverage an important physiological risk factor that was previously unavailable.

HyperSense v1.1 therefore prioritises richer individual-level information over maximum sample size. Both approaches have advantages: larger datasets generally improve population representativeness, while BMI-enhanced models may provide more personalised risk assessment.

Important Note

The HyperSense score remains a screening result, not a diagnosis. Users with elevated screening scores should obtain a blood pressure measurement from a qualified healthcare provider.


Tech Stack

Python 3.14.2    · Data processing and modelling
XGBoost          · Primary classification model
SHAP             · Explainability (TreeExplainer)
scikit-learn     · Pipeline, preprocessing, evaluation
pandas / numpy   · Data wrangling
Streamlit        · Web application
pyreadstat       · DHS .dta file ingestion

Deployed on: Streamlit Community Cloud
Repository: github.com/Phawazz/HyperSense


Project Structure

HyperSense/
├── app.py                          # Streamlit application
├── requirements.txt
├── assets/                         # Logo and visual assets
├── models/                         # Trained model artifacts
│   ├── hypersense_model.pkl
│   ├── hypersense_explainer.pkl
│   └── model_config.json
└── notebooks/
    ├── 01_extraction.ipynb         # DHS data extraction
    ├── 02_harmonization.ipynb      # Cross-country harmonization
    ├── 03_eda.ipynb                # Exploratory data analysis
    ├── 04_modeling.ipynb           # Model development & evaluation
    ├── 05_shap.ipynb               # SHapley Additive exPlanations
    └── 06_deployment.ipynb         # Pre-deployment checks

Data Access

The DHS datasets used in this project are publicly available upon registration at dhsprogram.com. Per DHS terms of use, raw data files are not included in this repository.


Citation

If you use HyperSense in research or build on this work:

@software{hypersense2026,
  author    = {Bello, Fawaz Ariyo},
  title     = {HyperSense: Explainable ML for Hypertension Risk Screening in West Africa},
  year      = {2026},
  url       = {https://github.com/Phawazz/HyperSense},
  note      = {Trained on Ghana DHS 2014 + Benin DHS 2017--18}
}

Training data:

The DHS Program. Ghana Standard DHS 2014; Benin Standard DHS 2017–18. ICF, Rockville, Maryland, USA. dhsprogram.com


Author

Fawaz Bello
Medical Student · College of Medicine, University of Ibadan
ML Engineer · Cardiovascular Health Journal Club B - CRIH

GitHub LinkedIn


HyperSense v1.0 · MIT License · For research and educational use only · Not for clinical diagnosis
