AUTHOR=Jajcay Nikola, Bezak Branislav, Segev Amitai, Matetzky Shlomi, Jankova Jana, Spartalis Michael, El Tahlawi Mohammad, Guerra Federico, Friebel Julian, Thevathasan Tharusan, Berta Imrich, Pölzl Leo, Nägele Felix, Pogran Edita, Cader F. Aaysha, Jarakovic Milana, Gollmann-Tepeköylü Can, Kollarova Marta, Petrikova Katarina, Tica Otilia, Krychtiuk Konstantin A., Tavazzi Guido, Skurk Carsten, Huber Kurt, Böhm Allan

TITLE=Data processing pipeline for cardiogenic shock prediction using machine learning

JOURNAL=Frontiers in Cardiovascular Medicine

VOLUME=Volume 10 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2023.1132680

DOI=10.3389/fcvm.2023.1132680

ISSN=2297-055X

ABSTRACT=Introduction: Recent advances in machine learning provide new possibilities to process and analyse observational patient data to predict patient outcomes. In this paper, we introduce a data processing pipeline for cardiogenic shock (CS) prediction from the MIMIC-III database of intensive cardiac care unit patients with acute coronary syndrome. Identifying high-risk patients early could allow pre-emptive measures to be taken and thus help prevent the development of CS.
Methods: We focus mainly on the imputation of missing data: we build an imputation pipeline and compare the performance of several multivariate imputation algorithms, including k-nearest neighbours, two singular value decomposition (SVD)-based methods, and Multiple Imputation by Chained Equations. After imputation, we select the final subjects and variables from the imputed dataset and showcase the performance of a gradient-boosted framework that uses a tree-based classifier for cardiogenic shock prediction.
Results: Thanks to data cleaning and imputation, we achieved good classification performance (cross-validated mean area under the curve of 0.805) without hyperparameter optimization.
Conclusion: We believe our pre-processing pipeline would also prove helpful for other classification and regression experiments.
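
The comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes synthetic data in place of MIMIC-III, uses scikit-learn's `KNNImputer` for k-nearest neighbours, uses `IterativeImputer` as a single-imputation stand-in for Multiple Imputation by Chained Equations, and includes a hypothetical `iterative_svd_impute` helper as one possible SVD-based method. The classifier is scikit-learn's `GradientBoostingClassifier` with default settings, and all function names and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.model_selection import cross_val_score


def iterative_svd_impute(X, rank=3, n_iter=50):
    """Hypothetical SVD-based imputer: refill NaNs from a low-rank reconstruction."""
    missing = np.isnan(X)
    # Start from column means, then refine only the missing cells.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]
    return filled


def make_masked_data(seed=0, missing_rate=0.2):
    """Synthetic stand-in for patient data, with values removed at random."""
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, random_state=seed
    )
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < missing_rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X, X_missing, mask, y


def compare_imputers(seed=0):
    """Return imputation RMSE and cross-validated AUC for each imputer."""
    X_true, X_missing, mask, y = make_masked_data(seed)
    imputers = {
        "knn": lambda X: KNNImputer(n_neighbors=5).fit_transform(X),
        "iterative_svd": iterative_svd_impute,
        "chained_equations": lambda X: IterativeImputer(
            max_iter=10, random_state=seed
        ).fit_transform(X),
    }
    results = {}
    for name, impute in imputers.items():
        X_imp = impute(X_missing.copy())
        # Error on the cells that were artificially removed.
        rmse = float(np.sqrt(np.mean((X_imp[mask] - X_true[mask]) ** 2)))
        # Default hyperparameters, i.e. no hyperparameter optimization.
        clf = GradientBoostingClassifier(random_state=seed)
        auc = float(
            cross_val_score(clf, X_imp, y, cv=5, scoring="roc_auc").mean()
        )
        results[name] = {"rmse": rmse, "auc": auc}
    return results


if __name__ == "__main__":
    for name, scores in compare_imputers().items():
        print(f"{name}: RMSE={scores['rmse']:.3f}, mean AUC={scores['auc']:.3f}")
```

For brevity, the sketch imputes the full matrix before cross-validation. In a stricter evaluation the imputer would be fitted inside each training fold so that no information from the held-out fold reaches the model.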