AUTHOR=Lopes Ricardo R. , Mamprin Marco , Zelis Jo M. , Tonino Pim A. L. , van Mourik Martijn S. , Vis Marije M. , Zinger Svitlana , de Mol Bas A. J. M. , de With Peter H. N. , Marquering Henk A.
TITLE=Local and Distributed Machine Learning for Inter-hospital Data Utilization: An Application for TAVI Outcome Prediction
JOURNAL=Frontiers in Cardiovascular Medicine
VOLUME=8
YEAR=2021
URL=https://www.frontiersin.org/journals/cardiovascular-medicine/articles/10.3389/fcvm.2021.787246
DOI=10.3389/fcvm.2021.787246
ISSN=2297-055X
ABSTRACT=
Background: Machine learning (ML) models have been developed for numerous medical prognostic purposes. These models are commonly developed using data from single centers or regional registries. Including data from multiple centers improves the robustness and accuracy of prognostic models. However, sharing data between centers is complex, mainly because of regulations and patient privacy concerns.
Objective: We aim to overcome data-sharing impediments by using distributed ML and local learning followed by model integration. We applied these techniques to develop models estimating 1-year mortality after transcatheter aortic valve implantation (TAVI) with data from two centers, without sharing any data between them.
Methods: A distributed ML technique and local learning followed by model integration were used to develop models that predict 1-year mortality after TAVI. We included two populations of 1,160 (Center A) and 631 (Center B) patients. Five traditional ML algorithms were implemented. The results were compared with those of models trained on each center's data alone (mono-center models).
Results: The combined learning techniques outperformed the mono-center models. For Center A, the combined local XGBoost model achieved an area under the receiver operating characteristic curve (AUC) of 0.67, compared with a mono-center AUC of 0.65; for Center B, a distributed neural network achieved an AUC of 0.68, compared with a mono-center AUC of 0.64.
Conclusion: This study shows that distributed ML and combined local model techniques can overcome data-sharing limitations and yield more accurate models for TAVI mortality estimation. These techniques improved prognostic accuracy for both centers and can also serve as an alternative way to address the limited amount of data available when creating prognostic models.