GitHub - Phawazz/HyperSense: XGBoost + SHAP hypertension risk screener trained on Ghana & Benin DHS data. Non-invasive. Explainable. Built for West Africa.

Explainable AI-Powered Hypertension Risk Screening for West Africa

Answer a few questions. Know when to get checked.

What Is HyperSense?

HyperSense is an explainable machine learning system that estimates an individual's likelihood of hypertension using six non-invasive inputs — no clinical equipment, no blood test, no blood pressure cuff required.

Given age, sex, place of residence, education level, tobacco use, and BMI, HyperSense returns:

A hypertension risk tier (High / Low)
A probability estimate of elevated blood pressure
A SHAP-based explanation of the personal factors driving the result
A personalised recommendation for preventive action
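
A minimal sketch of how these four outputs might be bundled together. The `ScreeningResult` container and `build_result` helper are hypothetical names, not taken from the repository. The 0.353 cut-off is the deployment threshold reported under Model Performance; treating a score at or above it as High is an assumption, and the recommendation wording is a placeholder.

```python
from dataclasses import dataclass

DEPLOYMENT_THRESHOLD = 0.353  # deployment threshold reported in the README


@dataclass
class ScreeningResult:
    """Hypothetical container for the four outputs listed above."""
    risk_tier: str        # "High" or "Low"
    probability: float    # estimated probability of elevated blood pressure
    explanation: dict     # feature name -> SHAP contribution
    recommendation: str   # preventive-action message


def build_result(probability: float, shap_values: dict) -> ScreeningResult:
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Assumption: a score at or above the threshold is reported as High.
    tier = "High" if probability >= DEPLOYMENT_THRESHOLD else "Low"
    # Placeholder wording, not the messages used in the app.
    if tier == "High":
        advice = "Have your blood pressure measured by a healthcare professional."
    else:
        advice = "Keep up routine blood pressure checks."
    return ScreeningResult(tier, probability, dict(shap_values), advice)


# Illustrative inputs only; the SHAP values are made up.
result = build_result(0.41, {"age": 0.62, "bmi": 0.20, "sex": -0.05})
print(result.risk_tier, f"{result.probability:.0%}")
```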

⚠️ HyperSense is a screening and awareness tool. It does not diagnose hypertension. All results should be confirmed with a measured blood pressure reading from a trained healthcare professional.

Why It Was Built

Hypertension affects an estimated 30% of Nigerian adults, yet fewer than 29% are aware of their condition and only 22% receive treatment. With 3.8 physicians per 10,000 people, far below the WHO recommendation, clinical screening at scale is neither feasible nor sustainable through the existing health workforce alone.

Existing risk prediction tools — including the Framingham Risk Score, ASCVD Pooled Cohort Equations, and ESH/ESC 2018 models — were derived from predominantly Western cohorts. They may not accurately reflect the epidemiological, dietary, and demographic characteristics of West African populations.

HyperSense was built to address two simultaneous gaps:

The tool gap — no publicly accessible, non-invasive hypertension screener calibrated for West African populations
The data gap — Nigeria's national health surveys collect no measured blood pressure data, a surveillance failure documented and reported as part of this project

Data Sources

HyperSense is trained on fieldworker-measured blood pressure data from two nationally representative surveys:

Dataset	Survey Round	Women	Men	BP Measurement
Ghana DHS 2014	Standard DHS (DHS-7)	9,396	4,388	Fieldworker-measured
Benin DHS 2017–18	Standard DHS (DHS-7)	15,928	7,595	Fieldworker-measured
Combined		~25,300	~12,000	n = 20,446 (with valid BP)

Label derivation:

htn_status = 1  if  SBP ≥ 140 mmHg  OR  DBP ≥ 90 mmHg
htn_status = 0  otherwise

Labels are derived exclusively from measured blood pressure values. Self-reported hypertension diagnosis was not used.
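
The labelling rule above can be sketched as a small function. The parameter names are illustrative, and how repeated DHS readings are combined into a single SBP/DBP pair is not specified here, so the function simply takes one value of each.

```python
def htn_status(sbp_mmhg: float, dbp_mmhg: float) -> int:
    """Return 1 if SBP >= 140 mmHg or DBP >= 90 mmHg, otherwise 0."""
    return int(sbp_mmhg >= 140 or dbp_mmhg >= 90)


# Illustrative readings, not survey data.
readings = [(152, 88), (128, 92), (139, 89)]
labels = [htn_status(sbp, dbp) for sbp, dbp in readings]
print(labels)
```

Either criterion alone is enough to set the label, so isolated systolic and isolated diastolic elevation are both counted.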

Survey weights (V005 / 1,000,000) were applied during model training to ensure population-representative estimates.
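
As a rough illustration of the weight conversion, the sketch below scales raw V005 values and uses them for a weighted prevalence. The respondent rows are invented, and `dhs_weight` is a hypothetical helper name.

```python
def dhs_weight(v005: int) -> float:
    """Convert a raw DHS V005 value to a sample weight (V005 / 1,000,000)."""
    return v005 / 1_000_000


# Invented respondents for illustration only.
rows = [
    {"v005": 1_250_000, "htn_status": 1},
    {"v005": 800_000, "htn_status": 0},
    {"v005": 950_000, "htn_status": 0},
]

weights = [dhs_weight(r["v005"]) for r in rows]
weighted_prev = sum(w * r["htn_status"] for w, r in zip(weights, rows)) / sum(weights)
unweighted_prev = sum(r["htn_status"] for r in rows) / len(rows)
print(f"weighted={weighted_prev:.3f} unweighted={unweighted_prev:.3f}")
```

The same per-row weights are what a training routine would receive as its sample weights.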

Model Performance

Metric	Value
Algorithm	XGBoost (tuned via RandomizedSearchCV)
Training sample	n = 16,356
Test sample	n = 4,090
ROC-AUC	0.735
Sensitivity at deployment threshold	0.80
Specificity at deployment threshold	0.54
Brier Score	0.162
Deployment threshold	0.353
Class imbalance handling	`scale_pos_weight` = 5
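
For readers unfamiliar with the Brier score in the table, it is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better. A minimal sketch with made-up predictions (the `brier_score` helper is written here for illustration, not copied from the project):

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes and predicted probabilities.
y_true = [1, 0, 0, 1]
y_prob = [0.8, 0.2, 0.4, 0.6]
print(round(brier_score(y_true, y_prob), 3))
```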

Clinical interpretation at deployment threshold:

376 / 470 hypertensives in the test set correctly flagged (80%)
94 / 470 missed (20% false negative rate — acknowledged limitation)
1,659 normotensives referred unnecessarily (low clinical harm — BP check only)
1,961 correctly reassured
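
The four counts above form a confusion matrix, and the reported sensitivity and specificity follow from them directly. The sketch below reproduces that arithmetic; the predictive values it also prints are derived here from the same counts and are not figures stated by the author.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # hypertensives correctly flagged
        "specificity": tn / (tn + fp),  # normotensives correctly reassured
        "ppv": tp / (tp + fp),          # flagged users who are hypertensive
        "npv": tn / (tn + fn),          # reassured users who are normotensive
    }


# Counts taken from the test-set breakdown above.
m = screening_metrics(tp=376, fn=94, fp=1659, tn=1961)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The counts sum to the test sample of 4,090. On this test set, about 18% of flagged users are hypertensive and about 95% of reassured users are normotensive, which is the expected pattern for a high-sensitivity screener applied where roughly one in nine people has the condition.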

Performance is consistent with published non-invasive hypertension screening models in sub-Saharan African populations (reported AUC range: 0.68–0.78).

SHAP Explainability

Every prediction includes a SHAP (SHapley Additive Explanations) explanation generated by TreeExplainer. Users see:

The direction each factor pushes their risk (increasing or decreasing)
The magnitude of each factor's contribution
Plain-language interpretation of the top drivers
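
One way the per-user explanation could be turned into text is sketched below. It assumes the SHAP contributions for a single prediction are already available as a feature-to-value mapping; `describe_drivers` is a hypothetical helper and the example values are invented. For XGBoost classifiers, TreeExplainer typically reports contributions in log-odds units, so the numbers indicate direction and relative size rather than percentage points.

```python
def describe_drivers(contributions: dict, top_n: int = 3) -> list:
    """Rank non-zero SHAP contributions by magnitude and describe the top ones."""
    ranked = sorted(
        ((name, value) for name, value in contributions.items() if value != 0),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    lines = []
    for name, value in ranked[:top_n]:
        direction = "increases" if value > 0 else "decreases"
        lines.append(f"{name} {direction} your estimated risk (SHAP {value:+.2f})")
    return lines


# Invented contributions for one user.
example = {
    "Age": 0.62,
    "Education level": -0.15,
    "Residence": 0.08,
    "Sex": -0.03,
    "Tobacco use": 0.0,
}
for line in describe_drivers(example):
    print(line)
```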

Global feature importance ranking (SHAP):

Age (dominant predictor)
Educational level
Residence (urban/rural)
Sex
Tobacco use

Age dominance is clinically expected and consistent with the epidemiological literature on hypertension in West Africa.

Methodology Notes

BMI inclusion: During early model development, BMI was excluded because anthropometric measurements and blood pressure measurements did not fully overlap in the Benin DHS 2017–18 survey. Restricting analysis to participants with complete BMI data reduced the available sample from 20,446 to 7,844 observations (62% reduction), raising concerns about selection bias and loss of statistical power.

As a result, the initial HyperSense deployment (v1.0) prioritised a larger population-representative sample without BMI. Following additional model development and evaluation, BMI has been incorporated into HyperSense v1.1 through a secondary modelling pipeline, enabling more personalised risk assessment while acknowledging the trade-off between feature richness and sample size.

Why Ghana and Benin, not Nigeria? A systematic search of all publicly available Nigerian health datasets confirmed that no nationally representative Nigerian dataset with fieldworker-measured blood pressure values currently exists. The Nigeria DHS (all rounds, including 2023–24) collects only self-reported hypertension awareness. The 2023 Nigeria STEPS Survey, which includes measured biomarkers, is under validation as of mid-2026. Data access requests are pending with authors who have collected blood pressure data at scale in recent years.

Threshold selection: The deployment threshold (0.353) was selected to achieve 80% sensitivity, consistent with the clinical priority of a screening tool: minimise missed hypertensives and accept a higher false-positive rate.
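
A sensitivity-targeted threshold of this kind can be found by scanning candidate cut-offs from high to low and stopping at the first one that flags enough true cases. The sketch below shows that idea on toy data; `pick_threshold` is a hypothetical helper, the flagging rule (probability at or above the threshold) is an assumption, and the data split the author used for selection is not stated.

```python
def pick_threshold(y_true, y_prob, target_sensitivity=0.80):
    """Return the highest threshold whose sensitivity meets the target.

    A case is flagged when its probability is >= the threshold.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive case")
    for threshold in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        if tp / positives >= target_sensitivity:
            return threshold
    raise ValueError("target sensitivity cannot be reached")


# Toy labels and scores, not model output.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.4, 0.2, 0.8, 0.5, 0.3, 0.3, 0.1]
print(pick_threshold(y_true, y_prob))
```

Choosing the highest qualifying threshold keeps false positives as low as the sensitivity target allows.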

Survey weights: DHS surveys use stratified cluster sampling. Survey weights were applied during XGBoost training (sample_weight parameter) to ensure population-representative learning. Evaluation metrics were computed on the unweighted test set to reflect expected performance on individual users at deployment.

Limitations

Limitation	Detail
Geographic generalisability	Trained on Ghanaian and Beninese adults; not yet validated on Nigerian data
Age ceiling	DHS surveys sample ages 15–64; the highest-risk demographic (60+) is under-represented
Cross-sectional design	Screens current likelihood of hypertension, not future cardiovascular events
BMI absent	The primary (v1.0) model excludes BMI because the BMI and blood pressure subsamples do not fully overlap; v1.1 adds BMI on a smaller complete-case sample
Calibration	Formal isotonic calibration not completed; Brier score reported from test set
Not a diagnostic tool	High-risk output does not confirm hypertension; low-risk does not exclude it

Roadmap

Phase 2 — Fine-tuning on Nigerian measured BP data (data access requests active)
BMI sensitivity analysis — secondary model on n=7,844 complete-case sample
Formal calibration assessment (isotonic regression on dedicated holdout)
TRIPOD-compliant methodology write-up for peer review
OSF pre-registration for pilot evaluation study
Multilingual support — Yoruba, Hausa, Igbo

Version 1.1 Update

HyperSense v1.1 introduces Body Mass Index (BMI) as a predictive feature derived from user-entered height and weight.
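
BMI is weight in kilograms divided by the square of height in metres. A minimal sketch of that derivation follows, assuming the interface collects weight in kilograms and height in centimetres (the units used by the app are not stated here, and `bmi` is an illustrative helper name).

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


print(round(bmi(70, 175), 1))
```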

What's New

BMI added as a model input
Height and weight collection added to the screening interface
Updated XGBoost model retrained with BMI-enhanced feature set
Refined screening result messaging
Improved clinical disclaimers and user guidance

Why BMI?

BMI is a well-established risk factor for hypertension and contributes meaningful predictive information beyond demographic variables alone. Incorporating BMI allows HyperSense to provide a more individualised screening assessment while maintaining a fully non-invasive workflow.

Methodological Trade-off

The original HyperSense v1.0 model was trained on 20,446 adults with valid blood pressure measurements across Ghana and Benin. However, BMI data were not available for all participants due to differences in survey design and data collection procedures.

Including BMI required restricting training to a substantially smaller complete-case sample. This reduced population coverage but enabled the model to leverage an important physiological risk factor that was previously unavailable.

HyperSense v1.1 therefore prioritises richer individual-level information over maximum sample size. Both approaches have advantages: larger datasets generally improve population representativeness, while BMI-enhanced models may provide more personalised risk assessment.

Important Note

The HyperSense score remains a screening result, not a diagnosis. Users with elevated screening scores should obtain a blood pressure measurement from a qualified healthcare provider.

Tech Stack

Python 3.14.2    · Data processing and modelling
XGBoost          · Primary classification model
SHAP             · Explainability (TreeExplainer)
scikit-learn     · Pipeline, preprocessing, evaluation
pandas / numpy   · Data wrangling
Streamlit        · Web application
pyreadstat       · DHS .dta file ingestion

Deployed on: Streamlit Community Cloud
Repository: github.com/Phawazz/HyperSense

Project Structure

HyperSense/
├── app.py                          # Streamlit application
├── requirements.txt
├── assets/                         # Logo and visual assets
├── models/                         # Trained model artifacts
│   ├── hypersense_model.pkl
│   ├── hypersense_explainer.pkl
│   └── model_config.json
└── notebooks/
    ├── 01_extraction.ipynb         # DHS data extraction
    ├── 02_harmonization.ipynb      # Cross-country harmonization
    ├── 03_eda.ipynb                # Exploratory data analysis
    ├── 04_modeling.ipynb           # Model development & evaluation
    ├── 05_shap.ipynb               # SHapley Additive exPlanations
    └── 06_deployment.ipynb         # Pre-deployment checks

Data Access

The DHS datasets used in this project are publicly available upon registration at dhsprogram.com. Per DHS terms of use, raw data files are not included in this repository.

Citation

If you use HyperSense in research or build on this work:

@software{hypersense2026,
  author    = {Bello, Fawaz Ariyo},
  title     = {HyperSense: Explainable ML for Hypertension Risk Screening in West Africa},
  year      = {2026},
  url       = {https://github.com/Phawazz/HyperSense},
  note      = {Trained on Ghana DHS 2014 + Benin DHS 2017--18}
}

Training data:

The DHS Program. Ghana Standard DHS 2014; Benin Standard DHS 2017–18. ICF, Rockville, Maryland, USA. dhsprogram.com

Author

Fawaz Bello
Medical Student · College of Medicine, University of Ibadan
ML Engineer · Cardiovascular Health Journal Club B - CRIH

HyperSense v1.0 · MIT License · For research and educational use only · Not for clinical diagnosis
