ORIGINAL RESEARCH article

Front. Med., 09 December 2021
Sec. Ophthalmology
This article is part of the Research Topic Computational Medicine in Visual Impairment and Its Related Disorders

Usefulness of Machine Learning for Identification of Referable Diabetic Retinopathy in a Large-Scale Population-Based Study

Cheng Yang1†, Qingyang Liu2†, Haike Guo3,4, Min Zhang2, Lixin Zhang5, Guanrong Zhang6, Jin Zeng1, Zhongning Huang1, Qianli Meng1* and Ying Cui1*
  • 1Department of Ophthalmology, Guangdong Provincial People's Hospital, Guangdong Eye Institute, Guangdong Academy of Medical Sciences, Guangzhou, China
  • 2Department of Ophthalmology, Dongguan People's Hospital, Dongguan, China
  • 3Shanghai Peace Eye Hospital, Shanghai, China
  • 4Xiamen Eye Center, Xiamen University, Xiamen, China
  • 5Department of Ophthalmology, Hengli Hospital, Dongguan, China
  • 6Information and Statistical Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China

Purpose: To develop and validate machine learning-based classifiers that use simple non-ocular metrics to detect referable diabetic retinopathy (RDR) in a large-scale Chinese population-based survey.

Methods: Data from the 1,418 patients with diabetes mellitus identified among 8,952 rural residents screened in the population-based Dongguan Eye Study were used for model development and validation. Eight algorithms [extreme gradient boosting (XGBoost), random forest, naïve Bayes, k-nearest neighbor (KNN), AdaBoost, LightGBM, artificial neural network (ANN), and logistic regression] were used to detect RDR in individuals with diabetes. The area under the receiver operating characteristic curve (AUC) and its 95% confidence interval (95% CI) were estimated using five-fold cross-validation as well as an 80:20 split into training and validation sets.

Results: The 10 most important features in the machine learning models were duration of diabetes, HbA1c, systolic blood pressure, triglycerides, body mass index, serum creatinine, age, educational level, duration of hypertension, and income level. Based on these top 10 variables, the XGBoost model achieved the best discriminative performance, with an AUC of 0.816 (95% CI: 0.812, 0.820). The AUCs for logistic regression, AdaBoost, naïve Bayes, and random forest were 0.766 (95% CI: 0.756, 0.776), 0.754 (95% CI: 0.744, 0.764), 0.753 (95% CI: 0.743, 0.763), and 0.705 (95% CI: 0.697, 0.713), respectively.

Conclusions: A machine learning–based classifier that used 10 easily obtained non-ocular variables was able to detect patients with RDR effectively. The importance scores of the variables may offer insight into preventing the occurrence of RDR. Screening for RDR with machine learning provides a useful complementary tool for clinical practice in resource-poor areas with limited ophthalmic infrastructure.

Introduction

Diabetes mellitus affects 463 million adults and consumes 1.8% of gross domestic product globally, posing a huge burden on healthcare systems, especially in remote, underserved areas (1). Diabetic retinopathy (DR) is a vision-threatening condition that affects 22.27% of adults with diabetes (2). With the diabetes pandemic spreading from wealthy industrialized countries to developing regions, the number of people with DR will increase from 103.12 million in 2020 to 160.50 million in 2045 (2). Visual impairment and blindness due to DR can be significantly reduced if diagnosed at an early stage and treated appropriately. However, due to the high cost and low accessibility of eye services, <70% of people with diabetes receive eye examinations at regular intervals (3, 4).

The current strategy for detecting DR is based on clinical examination by an ophthalmologist or grading of retinal photographs via telemedicine, which relies on highly trained staff or requires expensive equipment. In addition, whether the recommended screening interval can be extended has attracted extensive debate because a large number of DR-negative patients receive repeated annual fundus screenings (5). It was estimated that the DR service would be reduced by 40% if people with no visible retinopathy at two consecutive screens received 2-year rather than annual screening in the Scottish Diabetic Retinopathy Screening programme (6). The National Health Service Foundation Trust claimed that screening people with type 2 diabetes every 2 years, rather than annually, would reduce screening costs by 25% (7). Therefore, establishing simple, practical methods for identifying people at high risk of referable DR (RDR) based on easily accessible indicators has become an important goal, which will help to target screening and prevention (8, 9).

Modeling for RDR is challenging because most medical data have a non-linear, non-normal, and non-independent distribution, and traditional regression analysis techniques would lose information (10). Machine learning (ML) techniques offer an alternative solution that captures non-linear relationships in the data without prior assumptions. Furthermore, ML is able to rank variables by importance. Previous studies have demonstrated that ML-based methods can accurately identify diabetes in the general population (11, 12). However, few studies based on ML for DR classification are available to date (13, 14). To fill this knowledge gap, this study aims to develop RDR classifiers based on four ML techniques using simple non-ocular indicators and compare them with traditional logistic regression models to evaluate their usefulness in screening for RDR in a large population-based survey.

Methods

Data Source and Participants

This study is a secondary analysis based on the Dongguan Eye Study (DES), which is a large-scale population-based survey conducted in Guangdong Province, Southern China (15, 16). The present study protocol was approved by the ethics committees of Guangdong Provincial People's Hospital. The study was performed in accordance with the Declaration of Helsinki. Written consent was obtained from all participants before entering the study.

The detailed methodology of the DES has been reported in previous articles (15, 16). In brief, 11,357 eligible participants residing in Hengli Town, Dongguan City were recruited between September 2011 and February 2012, with 8,952 (response rate 78.8%) completing the systemic and ophthalmic examinations. Standardized questionnaires were used to obtain data on demographics, lifestyle, socio-economic status, quality of life, and medical and family history. Height, weight, waist and hip circumference, and blood pressure were measured using standard protocols. Fasting venous blood was collected to obtain the following measurements: fasting blood glucose (FBG), hemoglobin A1c (HbA1c), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and blood uric acid (UA). All participants with diabetes or hypertension received a comprehensive ocular examination that covered visual acuity, automatic refraction, slit lamp, intraocular pressure, and retinal photography.

Definition of the Outcome

The diagnosis of diabetes was based on medical history and endocrinologists' records, the use of insulin therapy or oral hypoglycaemic drugs, or the latest criteria according to the Chinese Guidelines for the Management of Diabetes Mellitus in the Elderly (2021 Edition): typical symptoms of diabetes mellitus (polydipsia, polyuria, polyphagia, and unexplained weight loss) plus random plasma glucose ≥ 11.1 mmol/L; FBG ≥ 7.0 mmol/L; 2-h plasma glucose level ≥ 11.1 mmol/L during a 75-g oral glucose tolerance test (OGTT); or HbA1c ≥ 6.5%. Re-testing on another day was performed to confirm the diagnosis.
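As a rough illustration of the laboratory thresholds quoted above, the check can be written as a small Python function. This is only a sketch: the function and parameter names are hypothetical, and it deliberately ignores the other routes to diagnosis mentioned in the text (medical history, endocrinologists' records, current treatment) as well as the confirmatory re-test on another day.

```python
from typing import Optional


def meets_glycaemic_criteria(
    fbg_mmol_l: Optional[float] = None,
    ogtt_2h_mmol_l: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
    random_glucose_mmol_l: Optional[float] = None,
    typical_symptoms: bool = False,
) -> bool:
    """Return True if any one of the quoted laboratory criteria is met.

    Hypothetical helper for illustration; missing measurements are passed as None.
    """
    # Fasting blood glucose >= 7.0 mmol/L
    if fbg_mmol_l is not None and fbg_mmol_l >= 7.0:
        return True
    # 2-h plasma glucose >= 11.1 mmol/L during a 75-g OGTT
    if ogtt_2h_mmol_l is not None and ogtt_2h_mmol_l >= 11.1:
        return True
    # HbA1c >= 6.5%
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    # Random plasma glucose >= 11.1 mmol/L counts only with typical symptoms
    if (
        typical_symptoms
        and random_glucose_mmol_l is not None
        and random_glucose_mmol_l >= 11.1
    ):
        return True
    return False
```

For example, a participant with an FBG of 6.2 mmol/L but an HbA1c of 6.8% would satisfy the check through the HbA1c criterion alone.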

The DR status was graded based on fundus photography according to the classification system designed by the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) and the Early Treatment Diabetic Retinopathy Study (ETDRS) (17). Diabetic macular oedema (DME) was diagnosed according to the International Diabetic Macular Edema Severity Scale, defined as having significant retinal thickening or hard exudate in the posterior pole. Fundus fluorescein angiography (FFA) was performed to confirm the diagnosis in participants with suspected severe non-proliferative diabetic retinopathy (NPDR) or proliferative diabetic retinopathy (PDR), macular oedema, retinal vasculopathy, posterior uveitis, or other retinochoroidal diseases. DR was categorized into no diabetic retinopathy, mild NPDR, moderate NPDR, severe NPDR, or PDR. RDR was adopted as the primary outcome in the present study and was defined as the presence of moderate NPDR, severe NPDR, PDR, or DME. RDR is clinically important because people with RDR are referred to ophthalmologists for review in a diabetic retinopathy screening programme, while those without RDR continue to be screened in primary care (18).
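The outcome definition above reduces to a simple rule on the DR grade and the DME status. A minimal Python sketch follows; the label strings and the function name are illustrative assumptions, not identifiers from the study.

```python
# The five DR categories named in the text (label strings are illustrative).
DR_GRADES = ("no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR")

# Grades that count as referable on their own.
REFERABLE_GRADES = {"moderate NPDR", "severe NPDR", "PDR"}


def is_referable(dr_grade: str, has_dme: bool) -> bool:
    """Return True for RDR: moderate NPDR or worse, or any DME."""
    if dr_grade not in DR_GRADES:
        raise ValueError(f"unknown DR grade: {dr_grade!r}")
    return has_dme or dr_grade in REFERABLE_GRADES
```

Note that under this definition a participant with mild NPDR, or even no DR, is still referable when DME is present.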

Variables for Modeling

Table 1 shows the potential variables included in the model, including age, gender, body mass index (BMI), waist-to-hip ratio (WHR), waist circumference, blood pressure, lifestyle, and medical and family history. Lifestyle information included smoking, alcohol consumption, and dietary habits. Those who smoked at least one cigarette a day for 6 months were defined as smokers, while those who drank alcohol at least once a week for 6 months were defined as drinkers. Duration of diabetes was defined as the time between the first diagnosis of diabetes by an endocrinologist and entry into this study. For newly diagnosed diabetes, diabetes duration was defined as 0 years. In addition, laboratory serum parameters were included, as these tests are routinely performed in government health institutions in China.

Table 1. A list of the variables that were used for modeling in this study.

Statistical Analysis

All analyses were performed in R software version 4.0.3. The distribution of demographic and clinical characteristics was presented using mean ± standard deviation (SD) for continuous variables and by number and percentage for categorical data. Differences between RDR and non-RDR patients were evaluated by using the independent t-test for continuous normally distributed variables, the Mann-Whitney test for non-normally distributed variables, and the chi-squared test for categorical variables. All tests were two-tailed, and p < 0.05 was considered to be statistically significant.

Figure 1 shows the analytical framework for this study. Eight algorithms were used to construct models for detecting RDR: extreme gradient boosting (XGBoost), random forest, naïve Bayes, k-nearest neighbor (KNN), AdaBoost, light gradient boosting machine (LightGBM), artificial neural network (ANN), and logistic regression. ML techniques can quantify the importance of each variable, i.e., its contribution to the fitted model. To identify the most important features for diagnosing RDR (Table 1), we applied XGBoost, random forest, naïve Bayes, and KNN to rank the importance of the variables. The top 10 variables that were present in all four ML algorithms were entered into the subsequent model development. After data cleaning, the data were randomly divided into a training set and a test set (at an 80:20 ratio) to assess the reliability of these classifiers. To obtain realistic and generalisable estimates as well as conservative confidence intervals, five-fold cross-validation and variance estimation were performed. Each model was fitted on the training dataset, and its accuracy was assessed on the test dataset. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated to evaluate the performance of each model.
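The resampling scheme described above can be sketched with the Python standard library alone. The analyses in this study were run in R, so the code below is an illustration rather than the authors' implementation. The function names, the seed, and the choice to run the five folds inside the training portion are all assumptions, since the text does not specify them.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n: int, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(
    indices: List[int], k: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Return k (fit, held_out) pairs; each index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((fit, held_out))
    return splits


# 1,418 participants with diabetes, as reported in this study.
train_idx, test_idx = train_test_split_indices(1418, test_fraction=0.2)
cv_splits = k_fold_indices(train_idx, k=5)
```

With 1,418 participants this yields 1,134 training and 284 test rows. The split here is purely random, as in the text; stratifying by outcome would be a different design choice.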

Figure 1. Machine learning flowchart of this study. ML, machine learning; XGBoost, extreme gradient boosting; ANN, artificial neural network; AdaBoost, adaptive boosting; GBM, gradient boosting machine.

Results

Table 2 summarizes the demographic and clinical characteristics of the included participants. A total of 1,418 eligible patients with diabetes (82 RDR and 1,336 non-RDR) were included in the model development and internal validation. The mean ages of participants with RDR and non-RDR were 61.1 ± 10.7 years and 60.0 ± 11.0 years, respectively. The participants with RDR had higher systolic blood pressure (SBP), a longer duration of diabetes, higher levels of FBG and HbA1c, and a higher proportion of family history of diabetes (all p < 0.05). Other characteristics were similar in the two groups, such as sex, BMI, WHR, and blood urea nitrogen (BUN) (all p > 0.05).

Table 2. Characteristics of the included participants.

Relative Importance of Variables

Figure 2 shows the top 10 features for RDR in the four ML algorithms. The duration of diabetes, BUN, BMI, FBG, TG, HbA1c, TC, age, SBP, and WHR were identified as the 10 most important factors in the XGBoost model. The duration of diabetes, FBG, HbA1c, serum creatinine (SCr), TC, TG, BUN, BMI, WHR, and age ranked as the 10 most important factors in the random forest model. In the naïve Bayes model, the top 10 factors were the duration of diabetes, drinking status, history of hyperlipidaemia, use of insulin, education level, daily alcohol consumption, marital status, TG, duration of drinking, and duration of smoking. Among the top 20 factors in each model, there were 10 essential factors present in all models: the duration of diabetes, HbA1c, SBP, TG, BMI, SCr, age, education level, duration of hypertension, and income level (Supplementary Table 1; Figure 3).
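Selecting the features shared across models amounts to intersecting the top-k sets of each ranking. The Python sketch below uses a hypothetical helper, `shared_top_features`, and is not the authors' code. Because only the top-10 lists of three models are spelled out in the text above, the demonstration uses those lists with k = 10, whereas the study intersected the top 20 of all four models.

```python
from typing import Dict, List, Set


def shared_top_features(rankings: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the features that appear in the top k of every ranking."""
    top_sets = [set(ranked[:k]) for ranked in rankings.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Top-10 lists as reported in the text, most important first.
rankings = {
    "XGBoost": [
        "duration of diabetes", "BUN", "BMI", "FBG", "TG",
        "HbA1c", "TC", "age", "SBP", "WHR",
    ],
    "random forest": [
        "duration of diabetes", "FBG", "HbA1c", "SCr", "TC",
        "TG", "BUN", "BMI", "WHR", "age",
    ],
    "naive Bayes": [
        "duration of diabetes", "drinking status",
        "history of hyperlipidaemia", "use of insulin", "education level",
        "daily alcohol consumption", "marital status", "TG",
        "duration of drinking", "duration of smoking",
    ],
}

shared = shared_top_features(rankings, k=10)
print(sorted(shared))
```

Restricted to these three top-10 lists, only the duration of diabetes and TG are common to all models, which illustrates why a wider window (the top 20) is needed to reach 10 shared variables.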

Figure 2. Feature importance contributed to each machine learning model. (A) XGBoost. (B) Random forest. (C) Naïve Bayes. (D) KNN.

Figure 3. Venn plot showing the most important features in each model for detecting referable diabetic retinopathy.

Performance of the ML Algorithms

Table 3 shows the discriminative performance of the algorithms using five-fold cross-validation and an 80:20 split into training and validation sets. Figure 4 shows the performance of the top five models. The XGBoost algorithm was nominally the best, with an AUC of 0.816 (95% CI: 0.812, 0.820). The AUCs for logistic regression, AdaBoost, naïve Bayes, and random forest were 0.766 (95% CI: 0.756, 0.776), 0.754 (95% CI: 0.744, 0.764), 0.753 (95% CI: 0.743, 0.763), and 0.705 (95% CI: 0.697, 0.713), respectively.
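For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen participant with RDR receives a higher model score than a randomly chosen participant without RDR, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name and the toy data are illustrative and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; labels are 1 for RDR and 0 for non-RDR."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive case ranked above negative case
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores: 3 of the 4 positive/negative pairs
# are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is adequate at this scale (82 RDR by 1,336 non-RDR participants gives about 110,000 pairs); a rank-based formulation would be preferable for much larger samples.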

Table 3. The performance of machine learning models for diagnosing referable diabetic retinopathy.

Figure 4. Receiver operating characteristic curves of five algorithms for detecting referable diabetic retinopathy based on top-10 important variables.

Discussion

This study developed and validated an ML-based model for screening RDR in a Chinese population using common and readily available variables. After ranking the importance of the risk factors, the top 10 essential risk factors were adopted for modeling by eight ML algorithms. The XGBoost classifier exhibited the best performance with an AUC of 0.816, which was confirmed on a held-out test set. To our knowledge, this is the first diagnostic model for RDR in the Chinese diabetic population based on ML and simple variables, and it has the potential for accurate and rapid RDR screening.

State-of-the-art ML methods were adopted in this study. Traditional regression analysis relies on hypothesis-driven assumptions, while the ML techniques used do not require a predetermined assumption. This feature allows for data-driven exploration for non-linear patterns that predict risk for a given individual, i.e., precise risk stratification (10, 19). As observed in this study, the ranking of the importance showed that the duration of diabetes, HbA1c, systolic blood pressure, TG, BMI, serum creatinine, age, education level, duration of hypertension, and income level were the 10 most important factors for RDR. Furthermore, the given ML algorithm requires only minimal input during the model development stage, which is particularly important given that ML models can easily incorporate new data to update and optimize, thereby continuously improving their discriminative performance over time (20). Our models provided information for DR screening in high-risk populations and can help to reduce the frequency of ocular examinations in low-risk populations (21).

Limited studies were available on risk stratification of DR based on ML and non-ocular parameters. Azizi-Soleiman et al. reported a model for detecting DR in Iranians based on outpatient clinical data (22). By training the data of 1,782 patients (without using cross-validation), the logit model obtained an AUC of 0.760 based on backward elimination as a feature selection strategy (22). Tsao et al. divided the clinical data of 536 patients in Taiwan into training and validation sets (at an 80:20 ratio), and compared the performance of four models (support vector machine, decision tree, ANN, and logistic regression) for DR detection, and found that support vector machine performed best with an AUC of 0.839 (14). Yao et al. reported that a back propagation artificial neural network outperformed logistic regression for DR detection with AUCs of 0.84 and 0.77, respectively (13). The abovementioned studies were based on hospital-based data, but population-based data are more relevant to the reality of DR screening programmes (5). This study applied ML techniques to population-based data and demonstrated their usefulness for RDR detection with similar AUCs to those in hospital-based studies.

The XGBoost algorithm, which has attracted attention in recent years due to its excellent performance and efficient training speed, performed best in this study. This model has been evaluated in several other ocular diseases. Oh et al. compared four ML models (support vector machine, C5.0, random forest, and XGBoost) for detecting glaucoma and reported that XGBoost performed best with an AUC of 0.945, accuracy of 0.947, sensitivity of 0.941, and specificity of 0.950 (23). Xu et al. demonstrated that the XGBoost classifier had the highest accuracies for predicting subretinal fluid absorption at 1, 3, and 6 months in patients with central serous chorioretinopathy (24). Wu et al. reported that the intraocular pressure in children with myopia treated with topical atropine can be predicted by using ML methods, and that XGBoost ranked among the best predictive models (25). The present study suggests that XGBoost is also a good tool for DR screening.

This study has several strengths. First, all variables were derived from easily accessible non-ocular examinations and questionnaires. The model is especially suitable for primary hospitals and diabetic clinics without the need for expensive laboratory tests and ocular specialists equipped with ophthalmic equipment, which is especially useful in areas of low socio-economic status and with limited health resources. Second, the model is derived from a large population-based survey in China, making it highly representative and generalisable. Third, the majority of previous studies divided smoking and alcohol consumption into only two categories (with or without), and therefore they do not reflect the effect of frequency and quantity on disease. The importance ranking analysis showed that the amount and duration of smoking and drinking were also important for RDR. Finally, the ranking of risk factors might provide insight into the prevention of DR. This study also has limitations. Only Chinese adults were included in the present study; however, ethnic variations in DR onset and progression have been confirmed in population studies (26, 27). Therefore, this study needs to be repeated with other races. In addition, this study evaluated the feasibility and performance of ML, but not its implementation. However, a population-based study is especially suited to assessing the initial feasibility of ML algorithms in the real world.

Conclusion

In this secondary analysis of a large-scale population-based survey, we first extracted demographic variables, laboratory test results, and medical and family history, and then applied different ML algorithms to rank risk factors and identify RDR. The XGBoost algorithm achieved the best performance based on 10 simple variables. Using ML algorithms to rank epidemiological risk factors (rather than relying on ophthalmic examinations) to identify referable patients could reduce costs and has high application value in resource-poor areas in China.

Data Availability Statement

Data are available from the authors upon reasonable request and with permission of Guangdong Provincial People's Hospital.

Ethics Statement

Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author Contributions

CY and QL: conceptualization, investigation, formal analysis, and writing—original draft. HG: validation, resources, material support, administrative, and writing—review and editing. MZ: investigation, resources, material support, and administrative. LZ: investigation, material support, and review and editing. GZ: data analysis and review and editing. QM: project administration, conceptualization, investigation, supervision, formal analysis, and writing—review and editing. YC: project administration, conceptualization, investigation, supervision, formal analysis, and writing—review and editing. All authors contributed to the article and approved the submitted version.

Funding

This research was supported by the National Natural Science Foundation of China, Beijing, China (81800829, 82000897); Guangdong Basic and Applied Basic Research Foundation (2019A1515010697); the Guangzhou Science and Technology Program Project (202002030400). The funding organizations had no role in the design or conduct of this research.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2021.773881/full#supplementary-material

References

1. Bommer C, Sagalova V, Heesemann E, Manne-Goehler J, Atun R, Barnighausen T, et al. Global economic burden of diabetes in adults: projections from 2015 to 2030. Diabetes Care. (2018) 41:963-70. doi: 10.2337/dc17-1962

2. Teo ZL, Tham YC, Yu M, Chee ML, Rim TH, Cheung N, et al. Global prevalence of diabetic retinopathy and projection of burden through 2045: systematic review and meta-analysis. Ophthalmology. (2021) 2021:S161-6420. doi: 10.1016/j.ophtha.2021.04.027

3. Benoit SR, Swenor B, Geiss LS, Gregg EW, Saaddine JB. Eye care utilization among insured people with diabetes in the U.S., 2010-2014. Diabetes Care. (2019) 42:427-33. doi: 10.2337/dc18-0828

4. Eppley SE, Mansberger SL, Ramanathan S, Lowry EA. Characteristics associated with adherence to annual dilated eye examinations among US patients with diagnosed diabetes. Ophthalmology. (2019) 126:1492-9. doi: 10.1016/j.ophtha.2019.05.033

5. Taylor-Phillips S, Mistry H, Leslie R, Todkill D, Tsertsvadze A, Connock M, et al. Extending the diabetic retinopathy screening interval beyond 1 year: systematic review. Br J Ophthalmol. (2016) 100:105-14. doi: 10.1136/bjophthalmol-2014-305938

6. Looker HC, Nyangoma SO, Cromie DT, Olson JA, Leese GP, Philip S, et al. Predicted impact of extending the screening interval for diabetic retinopathy: The Scottish Diabetic Retinopathy Screening programme. Diabetologia. (2013) 56:1716-25. doi: 10.1007/s00125-013-2928-7

7. Chalk D, Pitt M, Vaidya B, Stein K. Can the retinal screening interval be safely increased to 2 years for type 2 diabetic patients without retinopathy? Diabetes Care. (2012) 35:1663-8. doi: 10.2337/dc11-2282

8. Vujosevic S, Aldington SJ, Silva P, Hernandez C, Scanlon P, Peto T, et al. Screening for diabetic retinopathy: new perspectives and challenges. Lancet Diabetes Endocrinol. (2020) 8:337-47. doi: 10.1016/S2213-8587(19)30411-5

9. Jampol LM, Glassman AR, Sun J. Evaluation and care of patients with diabetic retinopathy. N Engl J Med. (2020) 382:1629-37. doi: 10.1056/NEJMra1909637

10. Jamshidi A, Pelletier JP, Martel-Pelletier J. Machine-learning-based patient-specific prediction models for knee osteoarthritis. Nat Rev Rheumatol. (2019) 15:49-60. doi: 10.1038/s41584-018-0130-5

11. Boutilier JJ, Chan T, Ranjan M, Deo S. Risk stratification for early detection of diabetes and hypertension in Resource-Limited settings: machine learning analysis. J Med Internet Res. (2021) 23:e20123. doi: 10.2196/20123

12. Zhang L, Shang X, Sreedharan S, Yan X, Liu J, Keel S, et al. Predicting the development of type 2 diabetes in a large australian cohort using Machine-Learning techniques: longitudinal survey study. JMIR Med Inform. (2020) 8:e16850. doi: 10.2196/16850

13. Yao L, Zhong Y, Wu J, Zhang G, Chen L, Guan P, et al. Multivariable logistic regression and back propagation artificial neural network to predict diabetic retinopathy. Diabetes Metab Syndr Obes. (2019) 12:1943-51. doi: 10.2147/DMSO.S219842

14. Tsao HY, Chan PY, Su EC. Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms. BMC Bioinformatics. (2018) 19:283. doi: 10.1186/s12859-018-2277-0

15. Cui Y, Zhang M, Zhang L, Zhang L, Kuang J, Zhang G, et al. Prevalence and risk factors for diabetic retinopathy in a cross-sectional population-based study from rural southern China: Dongguan Eye Study. BMJ Open. (2019) 9:e23586. doi: 10.1136/bmjopen-2018-023586

16. Meng Q, Cui Y, Zhang M, Zhang L, Zhang L, Zhang J, et al. Design and baseline characteristics of a population-based study of eye disease in southern Chinese people: the Dongguan Eye Study. Clin Exp Ophthalmol. (2016) 44:170-80. doi: 10.1111/ceo.12670

17. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs - an extension of the modified Airlie house classification: ETDRS Report Number 10. Ophthalmology. (2020) 127:S99-119. doi: 10.1016/j.ophtha.2020.01.030

18. Cheung CY, Sabanayagam C, Law AK, Kumari N, Ting DS, Tan G, et al. Retinal vascular geometry and 6 year incidence and progression of diabetic retinopathy. Diabetologia. (2017) 60:1770-81. doi: 10.1007/s00125-017-4333-0

19. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. (2018) 319:1317. doi: 10.1001/jama.2017.18391

20. Weldy CS, Ashley EA. Towards precision medicine in heart failure. Nat Rev Cardiol. (2021) 18:745-62. doi: 10.1038/s41569-021-00566-9

21. Emamipour S, van der Heijden A, Nijpels G, Elders P, Beulens J, Postma MJ, et al. A personalised screening strategy for diabetic retinopathy: a cost-effectiveness perspective. Diabetologia. (2020) 63:2452-61. doi: 10.1007/s00125-020-05239-9

22. Azizi-Soleiman F, Heidari-Beni M, Ambler G, Omar R, Amini M, Hosseini SM. Iranian risk model as a predictive tool for retinopathy in patients with type 2 diabetes. Can J Diabetes. (2015) 39:358-63. doi: 10.1016/j.jcjd.2015.01.290

23. Oh S, Park Y, Cho KJ, Kim SJ. Explainable machine learning model for glaucoma diagnosis and its interpretation. Diagnostics. (2021) 11:510. doi: 10.3390/diagnostics11030510

24. Xu F, Xiang Y, Wan C, You Q, Zhou L, Li C, et al. Predicting subretinal fluid absorption with machine learning in patients with central serous chorioretinopathy. Ann Transl Med. (2021) 9:242. doi: 10.21037/atm-20-1519

25. Wu TE, Chen HA, Jhou MJ, Chen YN, Chang TJ, Lu CJ. Evaluating the effect of topical atropine use for myopia control on intraocular pressure by using machine learning. J Clin Med. (2020) 10:111. doi: 10.3390/jcm10010111

26. Sabanayagam C, Banu R, Chee ML, Lee R, Wang YX, Tan G, et al. Incidence and progression of diabetic retinopathy: A systematic review. Lancet Diabetes Endocrinol. (2019) 7:140-9. doi: 10.1016/S2213-8587(18)30128-1

27. Tan GS, Gan A, Sabanayagam C, Tham YC, Neelam K, Mitchell P, et al. Ethnic differences in the prevalence and risk factors of diabetic retinopathy: the Singapore epidemiology of eye diseases study. Ophthalmology. (2018) 125:529-36. doi: 10.1016/j.ophtha.2017.10.026

Keywords: diabetic retinopathy, machine learning, population-based study, classifier, XGBoost

Citation: Yang C, Liu Q, Guo H, Zhang M, Zhang L, Zhang G, Zeng J, Huang Z, Meng Q and Cui Y (2021) Usefulness of Machine Learning for Identification of Referable Diabetic Retinopathy in a Large-Scale Population-Based Study. Front. Med. 8:773881. doi: 10.3389/fmed.2021.773881

Received: 10 September 2021; Accepted: 11 November 2021;
Published: 09 December 2021.

Edited by:

Wei Wang, Sun Yat-sen University, China

Reviewed by:

Mo Yang, People's Liberation Army General Hospital, China
Minwen Zhou, Shanghai General Hospital, China
Xia Gong, Sun Yat-sen University, China

Copyright © 2021 Yang, Liu, Guo, Zhang, Zhang, Zhang, Zeng, Huang, Meng and Cui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Qianli Meng, mengqly@163.com; Ying Cui, cuiying-sysu@163.com

†These authors have contributed equally to this work
