AUTHOR=Huang Tao, Huang Zhihai, Peng Xiaodong, Pang Lingpin, Sun Jie, Wu Jinbo, He Jinman, Fu Kaili, Wu Jun, Sun Xishi TITLE=Construction and validation of risk prediction models for pulmonary embolism in hospitalized patients based on different machine learning methods JOURNAL=Frontiers in Cardiovascular Medicine VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2024.1308017 DOI=10.3389/fcvm.2024.1308017 ISSN=2297-055X ABSTRACT=Objective

This study aims to apply different machine learning (ML) methods to construct risk prediction models for pulmonary embolism (PE) in hospitalized patients, and to evaluate and compare the predictive efficacy and clinical benefit of each model.

Methods

We conducted a retrospective study involving 332 participants (172 PE-positive cases and 160 PE-negative cases) recruited from Guangdong Medical University. Participants were randomly divided into a training group (70%) and a validation group (30%). Baseline data were analyzed using univariate analysis, and potential independent risk factors associated with PE were further identified through univariate and multivariate logistic regression analysis. Six ML models, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), and AdaBoost, were developed. The predictive efficacy of each model was compared using receiver operating characteristic (ROC) curve analysis and the area under the curve (AUC). Clinical benefit was assessed using decision curve analysis (DCA).
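
The evaluation steps described above can be illustrated with a minimal sketch. It assumes nothing about the software the authors used, which the abstract does not state. The function names (`split_train_validation`, `roc_auc`, `net_benefit`) and the toy labels and scores are hypothetical and are not study data. The sketch shows a random 70/30 split, the AUC computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the net benefit that a decision curve plots at a given threshold probability.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data (NOT from the study): 1 = PE positive, 0 = PE negative.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

if __name__ == "__main__":
    # Split 332 placeholder record IDs in the same 70/30 proportion.
    train, validation = split_train_validation(range(332))
    print(len(train), len(validation))
    print(roc_auc(toy_labels, toy_scores))
    print(net_benefit(toy_labels, toy_scores, threshold=0.5))
```

In a decision curve, `net_benefit` would be evaluated over a range of thresholds for each model and compared with the "treat all" and "treat none" strategies. The model with the highest curve over the clinically relevant thresholds offers the greatest benefit.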

Results

Logistic regression analysis identified lower extremity deep venous thrombosis, elevated D-dimer, shortened activated partial thromboplastin time, and increased red blood cell distribution width as potential independent risk factors for PE. Among the six ML models, the RF model achieved the highest AUC of 0.778. Additionally, DCA consistently indicated that the RF model offered the greatest clinical benefit.

Conclusion

This study developed six ML models, with the RF model exhibiting the highest predictive efficacy and clinical benefit in the identification and prediction of PE occurrence in hospitalized patients.