AUTHOR=Zhu Yibing, Zhang Jin, Wang Guowei, Yao Renqi, Ren Chao, Chen Ge, Jin Xin, Guo Junyang, Liu Shi, Zheng Hua, Chen Yan, Guo Qianqian, Li Lin, Du Bin, Xi Xiuming, Li Wei, Huang Huibin, Li Yang, Yu Qian

TITLE=Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database

JOURNAL=Frontiers in Medicine

VOLUME=Volume 8 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2021.662340

DOI=10.3389/fmed.2021.662340

ISSN=2296-858X

ABSTRACT=<p><bold>Background:</bold> Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. Several prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA), are widely used in the general ICU population. We aimed to develop prediction models for mechanically ventilated patients by combining these disease severity scores with other features available on the first day of admission.</p><p><bold>Methods:</bold> We conducted a retrospective administrative database study using the Medical Information Mart for Intensive Care III (MIMIC-III) database. The exposures of interest were demographics, pre-ICU comorbidity, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was the outcome. We built models with the following machine learning methods: <italic>k</italic>-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network. We used 70% of the cohort as the training set and the remaining 30% as the testing set. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were constructed to evaluate and compare the models' performance. The significance of the risk factors was assessed through the models, and the top factors were reported.</p><p><bold>Results:</bold> A total of 28,530 subjects were enrolled through the screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were fitted on the training set and achieved AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively, on the testing set. The calibration curves indicated good calibration for all models except the neural network. The XGBoost model performed best among the seven models, with the highest AUC (0.821). The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate.</p><p><bold>Conclusion:</bold> The current study indicates that models based on risk factors available on the first day of ICU admission can be successfully established to predict mortality in ventilated patients. The XGBoost model performs best among the seven machine learning models.</p>
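
The evaluation protocol described in the abstract (a 70%/30% train/test split and AUC on the held-out set) can be illustrated with a minimal sketch. This is not the authors' code: the function names `train_test_split` and `roc_auc` are hypothetical helpers written here for illustration, the split is a plain random shuffle (the abstract does not say whether the split was stratified), and the labels and scores are toy values, not MIMIC-III data.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and return (train, test); hypothetical helper, not from the paper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i:j] and give each the average rank.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 1 = died in hospital, 0 = survived (illustrative values only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)

# 70/30 split of ten placeholder patient identifiers.
train_ids, test_ids = train_test_split(range(10))
```

In the toy example, three of the four positive/negative pairs are ordered correctly, so `toy_auc` is 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.743 to 0.821 should be read.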