AUTHOR=Li Wanyue , Zeng Li , Yuan Shiqi , Shang Yaru , Zhuang Weisheng , Chen Zhuoming , Lyu Jun TITLE=Machine learning for the prediction of cognitive impairment in older adults JOURNAL=Frontiers in Neuroscience VOLUME=17 YEAR=2023 URL=https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1158141 DOI=10.3389/fnins.2023.1158141 ISSN=1662-453X ABSTRACT=Objective

The purpose of this study was to develop and validate a predictive model of cognitive impairment in older adults based on a novel machine learning (ML) algorithm.

Methods

Complete data for 2,226 participants aged 60–80 years were extracted from the 2011–2014 National Health and Nutrition Examination Survey database. Cognitive ability was assessed using a composite cognitive functioning score (Z-score), calculated using a correlation test among the Consortium to Establish a Registry for Alzheimer's Disease Word Learning and Delayed Recall tests, the Animal Fluency Test, and the Digit Symbol Substitution Test. Thirteen demographic characteristics and risk factors associated with cognitive impairment were considered: age, sex, race, body mass index (BMI), alcohol consumption, smoking status, direct HDL-cholesterol level, stroke history, dietary inflammatory index (DII), glycated hemoglobin (HbA1c), Patient Health Questionnaire-9 (PHQ-9) score, sleep duration, and albumin level. Feature selection was performed using the Boruta algorithm. Models were built with ten-fold cross-validation using five ML algorithms: generalized linear model (GLM), random forest (RF), support vector machine (SVM), artificial neural network (ANN), and stochastic gradient boosting (SGB). The performance of these models was evaluated in terms of discriminatory power and clinical application.
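
The abstract does not spell out how the composite Z-score is computed. A minimal sketch of one common construction follows, assuming each test is standardized across participants and the per-test z-scores are then averaged. The function names, the use of the population standard deviation, and the toy scores are illustrative assumptions, not the authors' procedure or NHANES data.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD; a sample SD would also work
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sd for v in values]


def composite_z(tests):
    """Average the per-test z-scores for each participant.

    `tests` maps a test name to a list of raw scores, one per participant,
    with the same participant order in every list.
    """
    standardized = [z_scores(scores) for scores in tests.values()]
    return [mean(per_person) for per_person in zip(*standardized)]


# Hypothetical raw scores for four participants (illustrative only).
toy_scores = {
    "word_learning": [18, 22, 15, 25],
    "delayed_recall": [5, 7, 3, 9],
    "animal_fluency": [14, 18, 10, 22],
    "digit_symbol": [40, 55, 30, 65],
}
composite = composite_z(toy_scores)
```

Under this construction the composite scores sum to zero across participants, and a participant who scores lowest on every test also has the lowest composite score. A cutoff on the composite would then define impairment; the abstract does not state which cutoff was used.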

Results

The study ultimately included 2,226 older adults, of whom 384 (17.25%) had cognitive impairment. After random assignment, 1,559 and 667 older adults were included in the training and test sets, respectively. Ten variables (age, race, BMI, direct HDL-cholesterol level, stroke history, DII, HbA1c, PHQ-9 score, sleep duration, and albumin level) were selected to construct the models. In the test set, the GLM, RF, SVM, ANN, and SGB models achieved areas under the receiver operating characteristic curve (AUC) of 0.779, 0.754, 0.726, 0.776, and 0.754, respectively. Among all models, the GLM had the best predictive performance in terms of discriminatory power and clinical application.
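
To make the discrimination metric concrete, the sketch below computes the area under the receiver operating characteristic curve from its pairwise definition: the probability that a randomly chosen impaired participant receives a higher predicted risk than a randomly chosen unimpaired one, with ties counted as one half. The function name and the toy labels and scores are hypothetical; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative cases.

    `labels` holds 1 for impaired and 0 for unimpaired; `scores` holds the
    model's predicted risk for the same participants, in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six participants (illustrative only).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
```

In the toy data, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is acceptable for a test set of 667 participants.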

Conclusions

ML models can be a reliable tool for predicting cognitive impairment in older adults. This study used ML methods to develop and validate a well-performing risk prediction model for cognitive impairment in older adults.