HyperSense is an explainable machine learning system that estimates an individual's likelihood of hypertension using six non-invasive inputs — no clinical equipment, no blood test, no blood pressure cuff required.
Given age, sex, place of residence, education level, tobacco use, and BMI, HyperSense returns:
- A hypertension risk tier (High / Low)
- A probability estimate of elevated blood pressure
- A SHAP-based explanation of the personal factors driving the result
- A personalised recommendation for preventive action
⚠️ HyperSense is a screening and awareness tool. It does not diagnose hypertension. All results should be confirmed with a measured blood pressure reading from a trained healthcare professional.
Hypertension affects an estimated 30% of Nigerian adults, yet fewer than 29% of those affected are aware of their condition. Only 22% receive treatment. With 3.8 physicians per 10,000 people, far below the WHO recommendation, clinical screening at scale is neither feasible nor sustainable through the existing health workforce alone.
Existing risk prediction tools — including the Framingham Risk Score, ASCVD Pooled Cohort Equations, and ESH/ESC 2018 models — were derived from predominantly Western cohorts. They may not accurately reflect the epidemiological, dietary, and demographic characteristics of West African populations.
HyperSense was built to address two simultaneous gaps:
- The tool gap — no publicly accessible, non-invasive hypertension screener calibrated for West African populations
- The data gap — Nigeria's national health surveys collect no measured blood pressure data, a surveillance failure documented and reported as part of this project
HyperSense is trained on fieldworker-measured blood pressure data from two nationally representative surveys:
| Dataset | Survey Round | Women | Men | BP Measurement |
|---|---|---|---|---|
| Ghana DHS 2014 | Standard DHS (DHS-7) | 9,396 | 4,388 | Fieldworker-measured |
| Benin DHS 2017–18 | Standard DHS (DHS-7) | 15,928 | 7,595 | Fieldworker-measured |
| Combined | | ~25,300 | ~12,000 | n = 20,446 (with valid BP) |
Label derivation:
htn_status = 1 if SBP ≥ 140 mmHg OR DBP ≥ 90 mmHg
htn_status = 0 otherwise
Labels are derived exclusively from measured blood pressure values. Self-reported hypertension diagnosis was not used.
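The labelling rule above can be sketched in a few lines of Python. The function name `derive_htn_status` and the handling of missing readings (returning `None` so the record can be dropped) are illustrative assumptions, not the project's actual code.

```python
def derive_htn_status(sbp, dbp):
    """Label one record from measured blood pressure (mmHg).

    Returns 1 if SBP >= 140 or DBP >= 90, 0 otherwise.
    Returns None when either reading is missing (assumed handling:
    such records carry no label and are excluded).
    """
    if sbp is None or dbp is None:
        return None
    return 1 if (sbp >= 140 or dbp >= 90) else 0


# Made-up readings, for illustration only
readings = [(152, 88), (128, 92), (118, 76), (None, 80)]
labels = [derive_htn_status(sbp, dbp) for sbp, dbp in readings]
print(labels)
```

Either threshold alone is enough to assign the positive label, which is why the second made-up reading (normal systolic, elevated diastolic) is labelled 1.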
Survey weights (V005 / 1,000,000) were applied during model training to ensure population-representative estimates.
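As a minimal sketch of the weighting step, the snippet below rescales raw V005 values and uses them for a weighted prevalence estimate. The V005 values and labels are made up, and the helper names are hypothetical; in training, the same weights would be supplied through the model's `sample_weight` argument.

```python
def dhs_weight(v005):
    """Convert a raw DHS V005 value to a sampling weight."""
    return v005 / 1_000_000


def weighted_prevalence(labels, weights):
    """Weighted share of positive labels (labels are 0 or 1)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for y, w in zip(labels, weights)) / total


# Made-up values, for illustration only
raw_v005 = [1_250_000, 800_000, 1_000_000]
htn_status = [1, 0, 0]

weights = [dhs_weight(v) for v in raw_v005]
prevalence = weighted_prevalence(htn_status, weights)
print(weights, round(prevalence, 3))
```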
| Metric | Value |
|---|---|
| Algorithm | XGBoost (tuned via RandomizedSearchCV) |
| Training sample | n = 16,356 |
| Test sample | n = 4,090 |
| ROC-AUC | 0.735 |
| Sensitivity at deployment threshold | 0.80 |
| Specificity at deployment threshold | 0.54 |
| Brier Score | 0.162 |
| Deployment threshold | 0.353 |
| Class imbalance handling | scale_pos_weight = 5 |
Clinical interpretation at deployment threshold:
- 376 / 470 hypertensives in the test set correctly flagged (80%)
- 94 / 470 missed (20% false negative rate — acknowledged limitation)
- 1,659 normotensives referred unnecessarily (low clinical harm — BP check only)
- 1,961 correctly reassured
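The counts above form a confusion matrix, and the reported sensitivity and specificity can be checked directly from them. The sketch below also derives the positive and negative predictive values; these two figures are not reported in the table and are computed here only from the four counts listed.

```python
# Test-set counts at the deployment threshold (from the list above)
tp, fn = 376, 94      # hypertensives flagged / missed
fp, tn = 1659, 1961   # normotensives referred / reassured

sensitivity = tp / (tp + fn)   # share of hypertensives flagged
specificity = tn / (tn + fp)   # share of normotensives reassured
ppv = tp / (tp + fp)           # share of flagged users who are hypertensive
npv = tn / (tn + fn)           # share of reassured users who are normotensive

print(f"n = {tp + fn + fp + tn}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

The derived predictive values depend on the prevalence in this test set and would shift in a population with a different hypertension rate.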
Performance is consistent with published non-invasive hypertension screening models in sub-Saharan African populations (reported AUC range: 0.68–0.78).
Every prediction includes a SHAP (SHapley Additive Explanations) explanation generated by TreeExplainer. Users see:
- The direction each factor pushes their risk (increasing or decreasing)
- The magnitude of each factor's contribution
- Plain-language interpretation of the top drivers
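One way the three elements above could be produced from per-feature SHAP values is sketched below. The function `describe_drivers`, the message wording, and the example values are hypothetical; the real values would come from the TreeExplainer output for a single prediction.

```python
def describe_drivers(shap_values, top_n=3):
    """Turn per-feature SHAP values into plain-language statements.

    shap_values: dict mapping feature name -> SHAP contribution.
    Features are ranked by absolute contribution (magnitude);
    the sign gives the direction of the push.
    """
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = []
    for feature, value in ranked[:top_n]:
        if value > 0:
            direction = "increases"
        elif value < 0:
            direction = "decreases"
        else:
            direction = "does not change"
        lines.append(f"{feature} {direction} your estimated risk (contribution {value:+.2f})")
    return lines


# Made-up SHAP values for one user, for illustration only
example = {"age": 0.62, "education": -0.18, "residence": 0.05, "sex": -0.02, "tobacco": 0.11}
for line in describe_drivers(example, top_n=2):
    print(line)
```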
Global feature importance ranking (SHAP):
- Age (dominant predictor)
- Educational level
- Residence (urban/rural)
- Sex
- Tobacco use
Age dominance is clinically expected and consistent with the epidemiological literature on hypertension in West Africa.
BMI inclusion: During early model development, BMI was excluded because anthropometric measurements and blood pressure measurements did not fully overlap in the Benin DHS 2017–18 survey. Restricting analysis to participants with complete BMI data reduced the available sample from 20,446 to 7,844 observations (62% reduction), raising concerns about selection bias and loss of statistical power.
As a result, the initial HyperSense deployment (v1.0) prioritised a larger population-representative sample without BMI. Following additional model development and evaluation, BMI has been incorporated into HyperSense v1.1 through a secondary modelling pipeline, enabling more personalised risk assessment while acknowledging the trade-off between feature richness and sample size.
Why Ghana and Benin, not Nigeria? A systematic search of all publicly available Nigerian health datasets confirmed that no nationally representative Nigerian dataset with fieldworker-measured blood pressure values currently exists. The Nigeria DHS (all rounds including 2023–24) collects only self-reported hypertension awareness. The 2023 Nigeria STEPS Survey with measured biomarkers is under validation as of mid-2026. Data access requests are pending with researchers who have collected large-scale blood pressure data in recent years.
Threshold selection: The deployment threshold (0.353) was selected to achieve 80% sensitivity, consistent with the clinical priority of a screening tool: minimise missed hypertensives at the cost of a higher false-positive rate.
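A sensitivity-targeted threshold can be found by scanning the predicted probabilities of the true positives. The sketch below is one plausible way to do this, not necessarily the procedure used for HyperSense; `pick_threshold` and the toy scores are assumptions for illustration.

```python
def pick_threshold(y_true, y_prob, target_sensitivity=0.80):
    """Return the highest threshold whose sensitivity meets the target.

    A case is flagged when its probability is >= the threshold.
    Sensitivity only changes at the scores of positive cases, so
    those scores are the only candidates that need checking.
    """
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    if not positives:
        raise ValueError("need at least one positive case")
    for t in sorted(set(positives), reverse=True):
        flagged = sum(1 for p in positives if p >= t)
        if flagged / len(positives) >= target_sensitivity:
            return t
    return min(positives)  # not reached when target_sensitivity <= 1


# Toy data: five hypertensive and five normotensive cases (made up)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.70, 0.55, 0.40, 0.20, 0.60, 0.35, 0.30, 0.15, 0.10]

threshold = pick_threshold(y_true, y_prob)
print(threshold)
```

Lowering the threshold further would catch the remaining positive case but would also flag more normotensive cases, which is the trade-off described above.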
Survey weights: DHS surveys use stratified cluster sampling. Survey weights were applied during XGBoost training (sample_weight parameter) to ensure population-representative learning. Evaluation metrics were computed on the unweighted test set to reflect expected performance on individual users at deployment.
| Limitation | Detail |
|---|---|
| Geographic generalisability | Trained on Ghanaian and Beninese adults; not validated on Nigerian data yet |
| Age ceiling | DHS surveys sample ages 15–64; the highest-risk demographic (60+) is largely unrepresented |
| Cross-sectional design | Screens current likelihood of hypertension, not future cardiovascular events |
| BMI absent | Primary model excludes BMI due to subsample non-overlap |
| Calibration | Formal isotonic calibration not completed; Brier score reported from test set |
| Not a diagnostic tool | High-risk output does not confirm hypertension; low-risk does not exclude it |
- Phase 2 — Fine-tuning on Nigerian measured BP data (data access requests active)
- BMI sensitivity analysis — secondary model on n=7,844 complete-case sample
- Formal calibration assessment (isotonic regression on dedicated holdout)
- TRIPOD-compliant methodology write-up for peer review
- OSF pre-registration for pilot evaluation study
- Multilingual support — Yoruba, Hausa, Igbo
HyperSense v1.1 introduces Body Mass Index (BMI) as a predictive feature derived from user-entered height and weight.
- BMI added as a model input
- Height and weight collection added to the screening interface
- Updated XGBoost model retrained with BMI-enhanced feature set
- Refined screening result messaging
- Improved clinical disclaimers and user guidance
BMI is a well-established risk factor associated with hypertension and contributes meaningful predictive information beyond demographic variables alone. Incorporating BMI allows HyperSense to provide a more individualized screening assessment while maintaining a fully non-invasive workflow.
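BMI is weight in kilograms divided by the square of height in metres. The sketch below assumes the interface collects weight in kilograms and height in centimetres; the units, the function name `compute_bmi`, and the input check are assumptions rather than details taken from the application code.

```python
def compute_bmi(weight_kg, height_cm):
    """Body Mass Index in kg/m^2 from weight (kg) and height (cm)."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Example: 70 kg at 175 cm
print(round(compute_bmi(70, 175), 1))
```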
The original HyperSense v1.0 model was developed on 20,446 adults with valid blood pressure measurements across Ghana and Benin (16,356 for training and 4,090 held out for testing). However, BMI data were not available for all participants due to differences in survey design and data collection procedures.
Including BMI required restricting training to a substantially smaller complete-case sample. This reduced population coverage but enabled the model to leverage an important physiological risk factor that was previously unavailable.
HyperSense v1.1 therefore prioritizes richer individual-level information over maximum sample size. Both approaches have advantages: larger datasets generally improve population representativeness, while BMI-enhanced models may provide more personalized risk assessment.
The HyperSense score remains a screening result, not a diagnosis. Users with elevated screening scores should obtain a blood pressure measurement from a qualified healthcare provider.
Python 3.14.2 · Data processing and modelling
XGBoost · Primary classification model
SHAP · Explainability (TreeExplainer)
scikit-learn · Pipeline, preprocessing, evaluation
pandas / numpy · Data wrangling
Streamlit · Web application
pyreadstat · DHS .dta file ingestion
Deployed on: Streamlit Community Cloud
Repository: github.com/Phawazz/HyperSense
HyperSense/
├── app.py                      # Streamlit application
├── requirements.txt
├── assets/                     # Logo and visual assets
├── models/                     # Trained model artifacts
│   ├── hypersense_model.pkl
│   ├── hypersense_explainer.pkl
│   └── model_config.json
└── notebooks/
    ├── 01_extraction.ipynb     # DHS data extraction
    ├── 02_harmonization.ipynb  # Cross-country harmonization
    ├── 03_eda.ipynb            # Exploratory data analysis
    ├── 04_modeling.ipynb       # Model development & evaluation
    ├── 05_shap.ipynb           # SHapley Additive exPlanations
    └── 06_deployment.ipynb     # Pre-deployment checks
The DHS datasets used in this project are publicly available upon registration at dhsprogram.com. Per DHS terms of use, raw data files are not included in this repository.
If you use HyperSense in research or build on this work:
@software{hypersense2026,
author = {Bello, Fawaz Ariyo},
title = {HyperSense: Explainable ML for Hypertension Risk Screening in West Africa},
year = {2026},
url = {https://github.com/Phawazz/HyperSense},
note = {Trained on Ghana DHS 2014 + Benin DHS 2017--18}
}

Training data:
The DHS Program. Ghana Standard DHS 2014; Benin Standard DHS 2017–18. ICF, Rockville, Maryland, USA. dhsprogram.com
Fawaz Bello
Medical Student · College of Medicine, University of Ibadan
ML Engineer · Cardiovascular Health Journal Club B - CRIH
HyperSense v1.0 · MIT License · For research and educational use only · Not for clinical diagnosis
