AUTHOR=Gutiérrez-Esparza Guadalupe , Pulido Tomas , Martínez-García Mireya , Ramírez-delReal Tania , Groves-Miralrio Lucero E. , Márquez-Murillo Manlio F. , Amezcua-Guerra Luis M. , Vargas-Alarcón Gilberto , Hernández-Lemus Enrique TITLE=A machine learning approach to personalized predictors of dyslipidemia: a cohort study JOURNAL=Frontiers in Public Health VOLUME=11 YEAR=2023 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1213926 DOI=10.3389/fpubh.2023.1213926 ISSN=2296-2565 ABSTRACT=Introduction

Mexico ranks second in the global prevalence of obesity in the adult population, which increases the probability of developing dyslipidemia. Dyslipidemia is closely related to cardiovascular diseases, which are the leading cause of death in the country. Therefore, developing tools that facilitate the prediction of dyslipidemias is essential for prevention and early treatment.

Methods

In this study, we utilized a dataset from a Mexico City cohort consisting of 2,621 participants, men and women aged between 20 and 50 years, with and without some type of dyslipidemia. Our primary objective was to identify potential factors associated with different types of dyslipidemia in both men and women. Machine learning algorithms were employed to achieve this goal. To facilitate feature selection, we applied the Variable Importance Measures (VIM) of Random Forest (RF), XGBoost, and Gradient Boosting Machine (GBM). Additionally, to address class imbalance, we employed Synthetic Minority Over-sampling Technique (SMOTE) for dataset resampling. The dataset encompassed anthropometric measurements, biochemical tests, dietary intake, family health history, and other health parameters, including smoking habits, alcohol consumption, quality of sleep, and physical activity.

Results

Our results revealed that the VIM algorithm of RF yielded the most optimal subset of attributes, closely followed by GBM, achieving a balanced accuracy of up to 80%. The selection of the best subset of attributes was based on the comparative performance of classifiers, evaluated through balanced accuracy, sensitivity, and specificity metrics.

Discussion

The top five features contributing to an increased risk of various types of dyslipidemia were identified through the machine learning technique. These features include body mass index, elevated uric acid levels, age, sleep disorders, and anxiety. The findings of this study shed light on significant factors that play a role in dyslipidemia development, aiding in the early identification, prevention, and treatment of this condition.