Early Detection of Hypertension Risk: A Supervised Machine Learning Approach
Halliru Sani
Mardiyya Lawal Bagiwa
Musbahu Salisu
Abstract
Hypertension is a major risk factor for cardiovascular disease, stroke, and kidney complications, yet it often remains undetected until severe outcomes occur. This study proposes a supervised machine-learning-based system for the early prediction of hypertension risk, using health records from the National Health and Nutrition Examination Survey (NHANES) accessed via Kaggle. The data were cleaned, preprocessed, and analysed before training Random Forest and Support Vector Machine (SVM) classifiers. Model performance was evaluated using accuracy, precision, recall, and F1-score. Random Forest outperformed SVM, achieving an accuracy of 82.2%, precision of 81.1%, recall of 81.9%, and F1-score of 81.5%, compared with SVM’s accuracy of 74.6% and F1-score of 73.0%. Cross-validation further confirmed the robustness of Random Forest, while feature importance analysis identified haemoglobin level and chronic kidney disease as dominant predictors. These findings demonstrate the value of ensemble-based models for complex health prediction tasks and show that machine learning can feasibly support practical, contextually relevant tools for the early detection of hypertension. Such systems hold promise for strengthening preventive healthcare and reducing the burden of hypertension-related complications.
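As a rough illustration of how the four reported metrics relate to one another, the sketch below computes accuracy, precision, recall, and F1-score from binary predictions. The function name `classification_metrics` and the label vectors are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = hypertensive, 0 = not), NOT the study's data:
# 50 true positives cases of which 45 are detected, 50 negatives of which
# 15 are wrongly flagged.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 45 + [0] * 5 + [1] * 15 + [0] * 35

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values matters in a screening setting: a model can reach high accuracy while missing many true cases, which recall exposes, or while raising many false alarms, which precision exposes.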