AUTHOR=Rath Adyasha , Mishra Debahuti , Panda Ganapati TITLE=Imbalanced ECG signal-based heart disease classification using ensemble machine learning technique JOURNAL=Frontiers in Big Data VOLUME=5 YEAR=2022 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2022.1021518 DOI=10.3389/fdata.2022.1021518 ISSN=2624-909X ABSTRACT=

Machine learning (ML)-based classification models are widely used for the automated detection of heart diseases (HDs) from physiological signals such as electrocardiogram (ECG), magnetocardiography (MCG), heart sound (HS), and impedance cardiography (ICG) signals. Of these, ECG-based HD identification is the one most commonly used by clinicians. In the current investigation, the ECG records of subjects have been sampled and used as inputs to the classification model to distinguish between normal and abnormal patients. The study has employed an imbalanced number of ECG samples for training the various classification models. A few ML methods that have rarely been used for HD detection, such as support vector machine (SVM), logistic regression (LR), and adaptive boosting (AdaBoost), have been selected. The performance of the developed models has been evaluated in terms of accuracy, F1-score, and area under the curve (AUC) using ECG signals of subjects from the publicly available PTB-ECG and MIT-BIH datasets. The models have been ranked on these performance metrics, and the AdaBoost and LR classifiers are found to rank first and second, respectively. These two models have been combined into an ensemble based on the majority voting principle, and the performance of this ensemble model has also been determined. In general, the proposed ensemble model demonstrates the best HD detection performance, achieving accuracy, F1-score, and AUC values of 0.946, 0.949, and 0.951, respectively, on the PTB-ECG dataset and 0.921, 0.926, and 0.950, respectively, on the MIT-BIH dataset. The proposed methodology can also be employed for the classification of HD using ICG, MCG, and HS signals as inputs, and it can further be applied to the detection of other diseases.
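The ensembling and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the toy probabilities, and in particular the tie-breaking rule are assumptions made for illustration. With only two voters, a hard majority vote ties whenever the models disagree, and the abstract does not say how such ties are resolved; here a tie falls back to the mean predicted probability.

```python
from typing import List, Sequence


def majority_vote(model_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> List[int]:
    """Combine per-model probabilities of the abnormal class (label 1).

    Each model casts a hard vote (probability >= threshold means label 1).
    ASSUMPTION: a tied vote is resolved by the mean probability across
    models; the source does not specify a tie-breaking rule.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    labels = []
    for i in range(n_samples):
        probs = [p[i] for p in model_probs]
        votes = sum(1 for p in probs if p >= threshold)
        if 2 * votes > n_models:
            labels.append(1)
        elif 2 * votes < n_models:
            labels.append(0)
        else:
            # Tie: fall back to the averaged probability (assumed rule).
            labels.append(1 if sum(probs) / n_models >= threshold else 0)
    return labels


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score for the positive (abnormal) class: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data (not from PTB-ECG or MIT-BIH): an imbalanced set
# with six abnormal (1) and two normal (0) subjects.
y_true = [1, 1, 1, 1, 0, 1, 0, 1]
ada_probs = [0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.7, 0.3]  # stand-in for AdaBoost
lr_probs = [0.8, 0.3, 0.9, 0.6, 0.1, 0.7, 0.2, 0.4]   # stand-in for LR

if __name__ == "__main__":
    y_pred = majority_vote([ada_probs, lr_probs])
    print("ensemble labels:", y_pred)
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1-score:", f1_score(y_true, y_pred))
```

F1-score is reported alongside accuracy because, on imbalanced data, accuracy alone can look favourable for a model that mostly predicts the majority class. AUC is omitted from the sketch for brevity.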