ORIGINAL RESEARCH article
Front. Public Health
Sec. Digital Public Health
Volume 12 - 2024
doi: 10.3389/fpubh.2024.1511689
STI/HIV Risk Prediction Model Development - A Novel Use of Public Data to Forecast STIs/HIV Risk for Men Who Have Sex with Men
Provisionally accepted
- 1 School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, Australia
- 2 School of Nursing and Midwifery, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, Australia
- 3 School of Psychology and Wellbeing, Centre for Health Research, Institute for Resilient Regions, University of Southern Queensland, Ipswich, Australia
- 4 School of Public Health, Faculty of Medicine, The University of Queensland, Herston, Queensland, Australia
A novel automatic framework is proposed for global risk prediction of sexually transmissible infections (STIs) and HIV. Four machine learning methods, namely Gradient Boosting Machine (GBM), Random Forest (RF), XGBoost, and a GBM-RF-XGBoost ensemble, are applied to and evaluated on data from the Demographic and Health Surveys Program (DHSP), with thirteen features ultimately selected as the most predictive. Classification and generalization experiments are conducted to assess the accuracy, precision, F1-score, and area under the curve (AUC) of these four algorithms. Two techniques for handling imbalanced data are also applied to reduce bias and improve classification performance. The experimental results show that Random Forest performs best for HIV prediction, achieving an accuracy of 0.99 and an AUC of 0.99. STI prediction performs best when the Synthetic Minority Oversampling Technique (SMOTE) is applied (accuracy = 0.99, AUC = 0.99), outperforming state-of-the-art baselines. Two factors that may affect classification and generalization performance are further analyzed. This automatic classification model can help make HIV testing more convenient and less costly.
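The abstract reports that SMOTE gives the best STI results. To illustrate what SMOTE does in general (this is not the authors' pipeline, which the abstract does not describe), the minimal pure-Python sketch below creates synthetic minority-class samples by interpolating between a randomly chosen minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters `k` and `seed`, and the toy data are hypothetical. In practice, a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used, and oversampling should be applied only to the training split so that synthetic points do not leak into evaluation.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Point], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Point]:
    """Hypothetical SMOTE-style oversampler (illustrative sketch only).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance and keep the k closest.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: _euclidean(base, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic
```

For example, `smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_synthetic=6, k=2)` returns six new two-feature points, all inside the unit square spanned by the minority samples. Because the new points are interpolations rather than copies, SMOTE tends to overfit less than naive duplication of minority rows.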
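The abstract evaluates models by accuracy, precision, F1-score, and AUC. The short pure-Python sketch below shows how these quantities are computed for a binary classifier. It assumes a decision threshold of 0.5 and labels 1 for the positive class; the abstract states neither, so both are assumptions. The helper name `binary_metrics` is hypothetical. Recall is computed as well because F1 is the harmonic mean of precision and recall. AUC is computed through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting as one half.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, F1 and AUC.

    y_true holds 0/1 labels; y_score holds predicted probabilities of class 1.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    # Threshold the scores to obtain hard predictions (assumed cut-off).
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # i.e. the normalised Mann-Whitney U statistic; ties count as 0.5.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}
```

With `y_true = [0, 0, 1, 1, 1, 0]` and `y_score = [0.1, 0.6, 0.8, 0.4, 0.9, 0.2]`, the sketch gives an accuracy, precision, recall, and F1 of 2/3 each, and an AUC of 8/9 (8 of the 9 positive-negative pairs are ranked correctly). On heavily imbalanced data, accuracy alone can look high even when most minority cases are missed. This is one reason precision, F1, and AUC are useful alongside it.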
Keywords: Human immunodeficiency virus (HIV), sexually transmissible infections (STIs), Artificial intelligence (AI), machine learning, risk prediction
Received: 15 Oct 2024; Accepted: 29 Nov 2024.
Copyright: © 2024 Ji, Tang, Osborne, Nguyen, Mullens, Dean and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Xiaopeng Ji, School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba, Australia
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.