To explore the application of machine learning to myopia prediction and to analyze the factors that influence myopia.
Stratified cluster random sampling was used to select elementary school students in Shenzhen, China for inclusion in this case-control study. Myopia screening, ocular biological parameter measurements, and questionnaire surveys were conducted. Random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost), support vector machine (SVM), and logistic regression (LR) algorithms were used to construct five myopia prediction models in R software (version 4.3.0). These models were then used to investigate the associations of ocular biological parameters, environmental factors, behavioral factors, and genetic factors with myopia.
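The sampling design named above can be illustrated with a minimal sketch. It assumes that schools act as the clusters and that they are grouped into strata; the abstract specifies neither, so the stratum names, school labels, and the function `stratified_cluster_sample` are hypothetical placeholders. The study itself was carried out in R; Python is used here only for illustration.

```python
import random


def stratified_cluster_sample(clusters_by_stratum, clusters_per_stratum, seed=0):
    """Randomly select whole clusters (e.g. schools) within every stratum.

    All students in a selected cluster would then be invited, which is
    what distinguishes cluster sampling from sampling individuals.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = {}
    for stratum, clusters in sorted(clusters_by_stratum.items()):
        # Never ask for more clusters than the stratum contains.
        k = min(clusters_per_stratum, len(clusters))
        selected[stratum] = sorted(rng.sample(sorted(clusters), k))
    return selected


# Hypothetical sampling frame: names are placeholders, not study data.
frame = {
    "stratum_A": ["school_1", "school_2", "school_3", "school_4"],
    "stratum_B": ["school_5", "school_6", "school_7"],
}
chosen = stratified_cluster_sample(frame, clusters_per_stratum=2, seed=42)
```

Every stratum contributes the same number of clusters in this sketch; a real design might instead allocate clusters in proportion to stratum size.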
This study included 2,947 elementary school students, with a myopia prevalence of 47.2%. All five prediction models had an area under the receiver operating characteristic curve (AUC) above 0.75, with prediction accuracy and precision exceeding 0.70. The AUCs in the testing set were 0.846, 0.837, 0.833, and 0.815 for SVM, LR, RF, and XGBoost, respectively, all higher than that of DT (0.791). In the RF model, the five most important variables were axial length, age, sex, maternal myopia, and feeding pattern. LR identified axial length as the strongest risk factor for myopia [odds ratio (OR) = 8.203], followed by sex (OR = 2.349), maternal myopia (OR = 1.437), reading and writing posture (OR = 1.270), infant feeding pattern (OR = 1.207), and age (OR = 1.168); corneal radius (OR = 0.034) and anterior chamber depth (OR = 0.516) were protective factors.
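Two quantities in these results can be made concrete with a short sketch. The first function computes AUC as the probability that a randomly chosen myopic case receives a higher predicted score than a randomly chosen non-myopic case. The second converts a reported OR back to its logistic-regression coefficient via ln(OR), so ORs above 1 correspond to positive coefficients and ORs below 1 to negative ones. The OR values are those reported above; the toy labels and scores are invented for illustration, and the function names are hypothetical. The abstract does not state the unit of each predictor, so the coefficients are per whatever unit the authors used.

```python
import math


def auc_from_scores(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_odds_coefficient(odds_ratio):
    """Logistic-regression coefficient implied by an odds ratio: beta = ln(OR)."""
    return math.log(odds_ratio)


# Odds ratios as reported in the results above.
reported_or = {
    "axial length": 8.203,
    "sex": 2.349,
    "maternal myopia": 1.437,
    "reading and writing posture": 1.270,
    "infant feeding pattern": 1.207,
    "age": 1.168,
    "corneal radius": 0.034,
    "anterior chamber depth": 0.516,
}
coefs = {name: log_odds_coefficient(v) for name, v in reported_or.items()}

# Toy data (not study data): 1 = myopic, 0 = non-myopic.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_auc = auc_from_scores(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based formula would be preferable for a testing set of realistic size.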
Myopia prediction models based on machine learning demonstrated favorable predictive performance and accurately identified myopia risk factors, and may therefore aid in the implementation of myopia prevention and control measures among high-risk individuals.