AUTHOR=Jajcay Nikola, Bezak Branislav, Segev Amitai, Matetzky Shlomi, Jankova Jana, Spartalis Michael, El Tahlawi Mohammad, Guerra Federico, Friebel Julian, Thevathasan Tharusan, Berta Imrich, Pölzl Leo, Nägele Felix, Pogran Edita, Cader F. Aaysha, Jarakovic Milana, Gollmann-Tepeköylü Can, Kollarova Marta, Petrikova Katarina, Tica Otilia, Krychtiuk Konstantin A., Tavazzi Guido, Skurk Carsten, Huber Kurt, Böhm Allan TITLE=Data processing pipeline for cardiogenic shock prediction using machine learning JOURNAL=Frontiers in Cardiovascular Medicine VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2023.1132680 DOI=10.3389/fcvm.2023.1132680 ISSN=2297-055X ABSTRACT=

Introduction

Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC-III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients early could allow pre-emptive measures to be taken and thus help prevent the development of CS.

Methods

We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosting framework that uses a tree-based classifier for cardiogenic shock prediction.
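
One common way to compare imputation algorithms is to hide values that are actually known, impute them, and measure the error on the hidden cells. The sketch below illustrates that idea with a small k-nearest-neighbours imputer and a column-mean baseline in plain Python. It is not the pipeline from the paper: the toy matrix, the masked cells, the function names, the choice of k = 2, and the distance rescaling are all assumptions made for illustration, and the SVD-based and chained-equations methods are not shown.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def _distance(a, b):
    """Euclidean distance over columns observed in both rows,
    rescaled by the fraction of columns that could be compared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows
    that have the column observed; fall back to the column mean."""
    fallback = mean_impute(rows)
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [d for d in donors if d[0] != math.inf][:k]
            if donors:
                result[i][j] = sum(val for _, val in donors) / len(donors)
            else:
                result[i][j] = fallback[i][j]
    return result

def rmse_on_masked(complete, imputed, cells):
    """Root-mean-square error restricted to the artificially hidden cells."""
    total = sum((complete[i][j] - imputed[i][j]) ** 2 for i, j in cells)
    return math.sqrt(total / len(cells))

# Made-up, fully observed toy data (two loose clusters of rows).
complete = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [5.0, 6.0, 7.0],
    [5.1, 6.1, 7.1],
    [0.9, 1.9, 2.9],
    [4.9, 5.9, 6.9],
]
masked_cells = [(0, 2), (3, 0)]  # hypothetical cells hidden for evaluation

incomplete = [list(r) for r in complete]
for i, j in masked_cells:
    incomplete[i][j] = None

knn_filled = knn_impute(incomplete, k=2)
mean_filled = mean_impute(incomplete)
knn_error = rmse_on_masked(complete, knn_filled, masked_cells)
mean_error = rmse_on_masked(complete, mean_filled, masked_cells)
print(f"kNN RMSE: {knn_error:.3f}, column-mean RMSE: {mean_error:.3f}")
```

On clustered data like this toy matrix, the neighbour-based imputer recovers the hidden values far more closely than the column mean, because the mean pools rows from both clusters. With real clinical data the ranking of methods is an empirical question, which is why a masked-value comparison of this kind is useful.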

Results

With data cleaning and imputation, and without any hyperparameter optimization, we achieved good classification performance (cross-validated mean area under the curve of 0.805).
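
To make the reported metric concrete, the following sketch shows how a cross-validated mean area under the curve can be computed: the data are split into folds, a model is fitted on the training folds, the area under the curve is measured on the held-out fold, and the per-fold values are averaged. It assumes the curve is the receiver operating characteristic (ROC) curve. The model here is a deliberately simple one-feature centroid scorer standing in for the gradient-boosted tree classifier, and the data, fold assignment, and function names are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count as one half); equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_scorer(train_x, train_y):
    """Stand-in model (NOT gradient boosting): a sample scores higher the
    closer it lies to the positive-class mean relative to the negative one."""
    pos_mean = sum(x for x, y in zip(train_x, train_y) if y == 1) / train_y.count(1)
    neg_mean = sum(x for x, y in zip(train_x, train_y) if y == 0) / train_y.count(0)
    return lambda x: abs(x - neg_mean) - abs(x - pos_mean)

def cross_validated_auc(xs, ys, n_folds):
    """Fit on all folds but one, evaluate AUC on the held-out fold, average."""
    fold_aucs = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        score = centroid_scorer([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
        fold_aucs.append(roc_auc([ys[i] for i in test_idx],
                                 [score(xs[i]) for i in test_idx]))
    return sum(fold_aucs) / n_folds, fold_aucs

# Made-up single-feature data; 1 marks the positive class.
xs = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3, 0.95, 0.5, 0.85, 0.15]
ys = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0]

mean_auc, fold_aucs = cross_validated_auc(xs, ys, n_folds=3)
print(f"per-fold AUC: {fold_aucs}, mean AUC: {mean_auc:.3f}")
```

Reporting the mean across folds, rather than a single train/test split, reduces the dependence of the estimate on one particular partition of the patients.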

Conclusion

We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.