- 1 Division of Cardiology, Johns Hopkins University, Baltimore, MD, United States
- 2 School of Electrical and Electronic Engineering, Chongqing University of Technology, Chongqing, China
- 3 Division of Radiology, Johns Hopkins University, Baltimore, MD, United States
Background: Automatic coronary angiography (CAG) assessment may help in faster screening and diagnosis of stenosis in patients with atherosclerotic disease. We aimed to provide an end-to-end workflow that separates cases with normal or mild stenoses from those with higher stenosis severities to facilitate safety screening of a large volume of CAG images.
Methods: A deep learning-based end-to-end workflow was employed as follows: (1) candidate frame selection from CAG videograms with a Convolutional Neural Network (CNN) + Long Short-Term Memory (LSTM) network, (2) stenosis classification with Inception-v3 using two or three categories (<25%, 25–99%, and/or total occlusion) with and without redundancy training, and (3) stenosis localization with two methods, a class activation map (CAM) and an anchor-based feature pyramid network (FPN). Overall, 13,744 frames from 230 studies were used for stenosis classification training and fourfold cross-validation at the image, artery, and patient levels. For stenosis localization training and fourfold cross-validation, 690 images with >25% stenosis were used.
Results: Our model achieved an accuracy of 0.85, sensitivity of 0.96, and AUC of 0.86 in patient-level stenosis classification. Redundancy training effectively improved classification performance. Stenosis localization was adequate, with better quantitative results for the anchor-based FPN model, which achieved global sensitivities of 0.68 and 0.70 for the left coronary artery (LCA) and right coronary artery (RCA), respectively.
Conclusion: We demonstrated a fully automatic end-to-end deep learning-based workflow that eliminates the vessel extraction and segmentation step in coronary artery stenosis classification and localization on CAG images. This tool may be useful to facilitate safety screening in high-volume centers and in clinical trial settings.
Introduction
Coronary artery disease (CAD) is the leading cause of morbidity and mortality worldwide (1). X-ray coronary angiography (CAG) is the current gold standard imaging technique for CAD diagnosis. Expert CAG interpretation requires considerable “hands-on” training both visually and cognitively. In clinical practice and also for quality control purposes in research settings, screening CAG studies visually to distinguish cases with normal or mild stenosis from those with higher stenosis severity is a time-consuming process even for experienced readers. Developing an automatic CAG assessment tool to exclude normal or mild stenosis cases would facilitate diagnosis and treatment and enable the screening of large data sets for quality control purposes.
Recent studies have confirmed the feasibility of using deep learning methods for CAG stenosis detection. Generally, the method consists of multiple steps. The most widely used vessel-based workflow starts with the visual or automatic selection of candidate frames (2–4) or regions (5, 6) from a CAG video. This is followed by artery extraction using image segmentation algorithms (7) such as center-tracking (8, 9), model-based (10), or Convolutional Neural Network (CNN)-based methods (11–15). Finally, individual stenotic lesions are localized and classified in one of two ways: patch-wise (16–18) or image-wise (2, 3, 6).
However, previous CAG stenosis classification and detection methods have limitations. One of the main drawbacks is that vessel shape and characterization (19, 20) were not well exploited from multi-view CAG studies, resulting in relatively low accuracy in detecting stenotic lesions, especially in curved or bifurcation regions of the vascular tree (21, 22). Another limitation is that some methods require numerous manual or automatic pre-processing stages (15, 18, 23), such as detecting keyframes, regions, or views from a CAG sequence, annotating vessel segmentations, or preparing patches and labels for training. The need for extensive human interaction during image and training-label preparation, in addition to sampling imbalance during supervised learning, has led to algorithms that are commonly evaluated on small datasets prone to overfitting (7). Clinically, those studies generally aimed to differentiate significant from non-significant stenosis on CAG images, whereas a tool to facilitate safety screening of a large volume of CAG images by separating cases with normal or mild stenoses from those with higher stenosis severities has not been targeted (24).
In this study, we propose a fully automatic, deep learning-based end-to-end CAG stenosis detection method to achieve efficient safety screening and precise localization of stenoses. Our method has the following unique features: (1) it eliminates the vessel extraction and segmentation step for supervised learning; (2) a CNN + LSTM structure is designed for automatic detection of candidate frames from CAG sequences to improve training efficiency and reduce overfitting; (3) a multi-view analysis architecture is established to train CNNs for the different angle views and to generate classification results at the artery and patient levels; (4) a redundancy training strategy is proposed to eliminate the negative effect of background and irrelevant features during training; and (5) unsupervised and supervised learning methods, including an anchor-based feature pyramid network (FPN), are explored to localize coronary stenoses in CAG images.
Materials and methods
Study population
This research was retrospectively performed on 230 participants with available data from the "Combined Non-invasive Coronary Angiography and Myocardial Perfusion Imaging Using 320 Detector Computed Tomography (CORE320)" study (NCT00934037), a prospective, multicenter, international study that assessed the performance of combined 320-row CTA and myocardial CT perfusion imaging (CTP) in comparison with the combination of invasive CAG and single-photon emission computed tomography myocardial perfusion imaging (SPECT-MPI) for detecting myocardial perfusion defects and luminal stenosis in patients with suspected CAD (25, 26). For the stenosis classification, 36 of the 230 studies were excluded from training because of low image quality or poor contrast opacification; these studies were, however, included for evaluation. The original CORE320 study was approved by central and local institutional review boards, and written informed consent was obtained from all participants (25, 26). Given the retrospective and ancillary nature of the data, the current study is covered by the original CORE320 study IRB.
Candidate frame selection
The entire study workflow is summarized in Figure 1. All CAG studies were saved in the universal DICOM format with a resolution of 512 × 512 at 15 fps, typically with 60–200 frames per view. The detailed imaging parameters are summarized in Supplementary Table 1. Coronary artery type (left or right coronary artery; LCA or RCA) was initially classified by experts in a small subset (19 patients). This subset was then used to train an Inception-v3 classifier (27) for automated coronary selection (100% classification accuracy was obtained). DICOM tags were used to identify the angle views of the CAG images. Overall, four angle views for the LCA [left anterior oblique (LAO) Cranial, LAO Caudal, right anterior oblique (RAO) Cranial, and RAO Caudal] and three for the RCA (LAO, straight RAO, and shallow LAO/RAO Cranial) were used based on the optimal view map (OVM) (20).
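As a rough illustration of how angle views can be read from DICOM metadata, the sketch below uses the standard positioner angle attributes. The study does not specify which tags or thresholds were used, so the attribute names, sign conventions, and the mapping to the OVM view groups are assumptions, not the authors' implementation.

```python
# Illustrative sketch only: assumes standard XA positioner tags are present
# and the common convention that positive primary angle = LAO and positive
# secondary angle = cranial. The actual view assignment in the study followed
# the optimal view map (OVM).
import pydicom

def read_positioner_angles(dicom_path):
    """Return (primary, secondary) positioner angles in degrees."""
    ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
    return float(ds.PositionerPrimaryAngle), float(ds.PositionerSecondaryAngle)

def lca_view(primary_deg, secondary_deg):
    """Map positioner angles to one of the four LCA views used in the study."""
    lateral = "LAO" if primary_deg >= 0 else "RAO"
    axial = "Cranial" if secondary_deg >= 0 else "Caudal"
    return f"{lateral} {axial}"
```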
Figure 1. Dataset and algorithm workflow. The three steps of data preparation, stenosis classification, and stenosis positioning are presented. The image and training-label preparation steps, including coronary artery selection, viewing angle selection, and contrasting frame detection, were designed in a fully automatic manner. Stenosis severity classification training was performed at the image, artery, and patient levels. Stenosis positioning was performed with two methods: CAM-based and anchor-based. CAM, class activation map; QCA, quantitative coronary angiography.
A CNN + Long Short-Term Memory (LSTM) network was implemented for candidate frame selection, trained on data from 19 patients (146 videos, 18,688 frames overall). A candidate frame was defined as a video frame with good quality, full contrast opacification, a clear vessel border, and anatomical significance of stenosis (if stenosis was present). Inception-v3 was employed as the basic classifier to recognize full-contrast and non-contrast frames as candidate or redundancy frames. The output of the fully connected layer of Inception-v3 was then fed to a bi-directional LSTM with 32 time steps (units) and concatenated with the output of the forward and backward LSTM units. The concatenated result was connected to a multi-layer perceptron (MLP, with one hidden layer) and a binary activation layer (sigmoid). The detailed structure of Inception-v3 and the LSTM is provided in Supplementary Figure 1. The Inception model was initialized with ImageNet weights and pre-trained for 200 epochs with an initial learning rate (LR) of 1e-4 and a binary cross-entropy loss. The LSTM was initialized using the Xavier uniform method for kernels and an orthogonal matrix for recurrent weights, then trained for 100 epochs with LR = 4e-5 and a convolutional F1-score loss. Typically, this strategy selected 5–10 candidate frames per video.
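A minimal Keras sketch of the described CNN + LSTM frame selector is given below, assuming TensorFlow 2.x as reported in the Statistical analysis section. The LSTM width, the MLP hidden size, and the use of binary cross-entropy in place of the paper's convolutional F1-score loss are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch of the CNN + LSTM candidate frame selector (TensorFlow 2.x).
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

TIME_STEPS = 32                     # frames per sequence fed to the LSTM
IMG_SHAPE = (512, 512, 3)

# Frame-wise feature extractor: ImageNet-initialized Inception-v3; the pooled
# output stands in for the paper's fully connected layer output.
backbone = InceptionV3(include_top=False, weights="imagenet",
                       input_shape=IMG_SHAPE, pooling="avg")

frames = layers.Input(shape=(TIME_STEPS,) + IMG_SHAPE)
feats = layers.TimeDistributed(backbone)(frames)             # (T, 2048)

# Bi-directional LSTM over the frame sequence; per-step outputs are kept so
# every frame receives a candidate/redundancy decision.
lstm_out = layers.Bidirectional(
    layers.LSTM(64, return_sequences=True,
                kernel_initializer="glorot_uniform",         # Xavier uniform
                recurrent_initializer="orthogonal"))(feats)

# Concatenate the recurrent output with the frame features, then MLP + sigmoid.
x = layers.Concatenate()([lstm_out, feats])
x = layers.TimeDistributed(layers.Dense(128, activation="relu"))(x)
frame_scores = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)

model = Model(frames, frame_scores)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=4e-5),
              loss="binary_crossentropy")
```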
The performance of candidate frame detection was tested on 582 videos from 175 patients using the mean error and standard deviation of the beginning contrasting frame (BCF) and ending contrasting frame (ECF) between ground truth and prediction. Acceptance and error rates were also calculated from the average BCF and ECF differences relative to pre-defined ranges (2): the acceptance rate with an error ≤3 frames and the error rate with an error ≥10 frames. Performance was reported using classification accuracy, F1, and Kappa.
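The sketch below shows how these frame-level metrics could be computed, assuming the acceptance and error rates are taken over the per-video average of the BCF and ECF differences; the exact aggregation used in the study is not spelled out, so this is an interpretation, not the authors' code.

```python
import numpy as np

def frame_detection_metrics(bcf_true, bcf_pred, ecf_true, ecf_pred,
                            accept_thr=3, error_thr=10):
    """Mean BCF/ECF frame errors plus acceptance and error rates.

    Acceptance: per-video average of BCF and ECF errors <= 3 frames;
    error: per-video average >= 10 frames (thresholds as defined in the text).
    """
    bcf_err = np.abs(np.asarray(bcf_pred) - np.asarray(bcf_true))
    ecf_err = np.abs(np.asarray(ecf_pred) - np.asarray(ecf_true))
    per_video = (bcf_err + ecf_err) / 2.0
    return {
        "bcf_mean_error": bcf_err.mean(), "bcf_std": bcf_err.std(),
        "ecf_mean_error": ecf_err.mean(), "ecf_std": ecf_err.std(),
        "accept_rate": float(np.mean(per_video <= accept_thr)),
        "error_rate": float(np.mean(per_video >= error_thr)),
    }
```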
Stenosis classification
For the stenosis classification, the quantitative coronary angiography (QCA) results previously documented at the segmental level in the CORE320 study were used as the reference (25, 26, 28). In the current study, to accommodate our study goal of separating cases with normal coronary arteries or mild stenoses from those with higher stenosis severities, coronary stenosis severities were re-categorized per coronary artery (i.e., per LCA or RCA) and grouped into three categories of <25%, 25–99%, and total occlusion (3-CAT), or two categories of <25% and ≥25% (2-CAT). It is known that there is a mismatch between coronary stenosis severity and functional significance; even an intermediate lesion can be functionally significant by fractional flow reserve (29, 30). Since we aimed to develop a safety screening tool for a large volume of CAG images, we selected a stenosis threshold with high specificity to correctly separate cases that do and do not need further functional stenosis assessment.
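A hypothetical helper illustrating this re-categorization is sketched below; reducing the per-artery QCA results to the worst lesion per artery is our reading of the text, not a documented implementation detail.

```python
def stenosis_category(max_percent_stenosis: float, scheme: str = "2-CAT") -> str:
    """Map the worst per-artery QCA stenosis (%) to the study's categories.

    2-CAT: <25% vs. >=25% (including total occlusion);
    3-CAT: <25%, 25-99%, total occlusion (100%).
    """
    if scheme == "3-CAT":
        if max_percent_stenosis < 25:
            return "<25%"
        if max_percent_stenosis < 100:
            return "25-99%"
        return "total occlusion"
    return "<25%" if max_percent_stenosis < 25 else ">=25%"
```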
Different CNN architectures (ResNet-50, ResNet-101, Inception-v3, and InceptionResNet-v2) were evaluated for image-level stenosis classification training and prediction. Inception-v3 was ultimately employed for image-, artery-, and patient-level stenosis prediction because it offers a good balance of inference speed, parameter size, and performance. Training was performed on four LCA models (one per angle view) and one RCA model combining the three angle views, owing to the more complicated features of the LCA compared with the RCA (31).
Artery-level and patient-level classification predictions were implemented with a multi-view analysis architecture, as described in Figure 2. For artery-level prediction, CNN scores from the four (LCA) or three (RCA) angle views were combined and fed into a max-pooling layer to generate the LCA or RCA classification result, respectively. Similarly, the patient-level prediction scores were generated by feeding the LCA and RCA scores into another max-pooling layer (Figure 2). For image-level labeling, 2- or 3-CAT stenosis categories were assigned in each angle view. For artery-level labeling, 2-CAT stenosis categories were assigned to each coronary artery, i.e., the LCA and the RCA. Overall, 10,872 frames from 194 studies were used for image-level stenosis classification training, and 13,744 frames from 230 studies were used for the fourfold cross-validation. The distribution of cases at the image, artery, and patient levels is summarized in Table 1. Image-level classification performance on 3-CAT and 2-CAT, with and without redundancy training, was reported using accuracy, sensitivity, F1, Kappa, and area under the curve (AUC). Artery-level and patient-level classification performance was assessed using the 2-CAT image-level results with redundancy training and reported using accuracy, sensitivity, and AUC.
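A minimal sketch of this multi-view aggregation follows, assuming each angle view contributes a predicted probability of ≥25% stenosis and letting plain NumPy maxima stand in for the max-pooling layers of Figure 2.

```python
import numpy as np

def artery_level_score(view_scores):
    """Max-pool per-view CNN scores (one per angle view) into an artery score."""
    return float(np.max(view_scores))

def patient_level_score(lca_view_scores, rca_view_scores):
    """Max-pool the LCA (4 views) and RCA (3 views) scores into a patient score."""
    return max(artery_level_score(lca_view_scores),
               artery_level_score(rca_view_scores))

# Hypothetical example: four LCA views and three RCA views.
lca = [0.12, 0.81, 0.34, 0.25]
rca = [0.05, 0.10, 0.08]
print(patient_level_score(lca, rca))   # 0.81 -> predicted >=25% stenosis
```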
Figure 2. The architecture of the output of the stenosis classification inception model. A max-pooling layer was added to the output of inception to evaluate the artery-level stenosis prediction and the patient-level stenosis prediction. LCA, left coronary artery; LAO, left anterior oblique; RAO, right anterior oblique; RCA, right coronary artery; QCA, quantitative coronary angiography.
Table 1. Distribution of the cases in each stenosis severity category used for the image-, artery-, and patient-level validation.
Redundancy training
In image-level stenosis classification training, redundancy frames were additionally included in the training dataset but not in the validation set. A redundancy frame was defined as a background frame without any contrast agent in the arteries. The redundancy categories comprised background frames with roughly the same number of samples as the target categories in the training dataset. Consequently, 12,351 redundancy frames were combined with the 10,872 candidate frames in 3- and 2-CAT image-level training (redundancy training), similar to methods used previously (3, 32). The use of redundancy frames is expected to hedge against invalid feature learning and reduce train/test overfitting.
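A sketch of how redundancy frames could be folded into the training set is shown below; for simplicity it appends a single extra background class, whereas the study describes redundancy categories matched in size to the target categories, so the number of extra classes here is an assumption.

```python
import numpy as np

def build_redundancy_training_set(candidate_frames, candidate_labels,
                                  background_frames, n_target_classes):
    """Append non-contrast background frames as an extra 'redundancy' class.

    candidate_labels are integer stenosis categories in [0, n_target_classes);
    background frames receive the extra label n_target_classes. The validation
    set should contain candidate frames only, as described in the text.
    """
    redundancy_labels = np.full(len(background_frames), n_target_classes, dtype=int)
    frames = np.concatenate([candidate_frames, background_frames], axis=0)
    labels = np.concatenate([candidate_labels, redundancy_labels], axis=0)
    perm = np.random.permutation(len(frames))   # shuffle the combined set
    return frames[perm], labels[perm]
```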
Stenosis localization
For stenosis positioning, two methods were investigated: (1) a class activation map (CAM) (33) based on back-propagation from the stenosis classification decision and (2) an anchor-based FPN. The anchor-based FPN model was developed from RetinaNet (34) and FPNs (35), using pre-trained Inception-v3 as the backbone. The network structure is shown in Supplementary Figure 2. The first, second, and third feature maps in the pyramid were derived from the concatenated features before the first, second, and third pooling layers, respectively; the fourth and fifth feature maps were down-sampled from the previous layers. For the FPN inputs, 1,588 positioning boxes with a minimal size of 35 × 35 pixels were annotated by two independent expert cardiologists. Anchor shapes were preset with the K-Means clustering method using seven groups of height and width. The anchor-based model was trained with a learning rate of 1e-4 over 500 epochs, and the FPN was built on the feature maps of the pre-trained classification models. The same reader-annotated bounding boxes were also used to evaluate the CAM-based localization technique. For the positioning training and fourfold evaluation, 690 frames with >25% stenosis were used (Figure 1).
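The anchor presetting step can be illustrated with plain K-Means over the annotated box widths and heights, as sketched below. Whether the study clustered in pixel space or used an IoU-based distance is not stated, so Euclidean clustering is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def preset_anchor_shapes(boxes_wh, n_anchors=7, seed=0):
    """Cluster annotated box (width, height) pairs into preset anchor shapes.

    boxes_wh: array of shape (N, 2) with box widths and heights in pixels,
    e.g., derived from the 1,588 annotated stenosis boxes. Returns the
    n_anchors cluster centers, sorted from small to large area.
    """
    km = KMeans(n_clusters=n_anchors, random_state=seed, n_init=10)
    km.fit(np.asarray(boxes_wh, dtype=float))
    centers = km.cluster_centers_
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]
```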
The performance of the two stenosis localization methods was assessed using global sensitivity, per-stenosis sensitivity (Sens_s), per-stenosis specificity (Spec_s), and mean squared error (MSE). Global sensitivity was defined as the recall of localization for the most significant stenosis in each image, which is similar to AR^(max=1) in the COCO benchmark (21). Sens_s and Spec_s were defined over all stenosis localizations in the images. MSE was assessed on 512 × 512 images for both the CAM-based and anchor-based models. Owing to the lower resolution of the CAM output, metrics for the CAM-based model were calculated with Intersection over Union (IoU) > 0.2, whereas IoU > 0.5 was used for the anchor-based model.
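A minimal sketch of the IoU matching and the global-sensitivity metric under these definitions is given below; the (x1, y1, x2, y2) box format and the matching of the single most significant ground-truth stenosis per image against any prediction are assumptions consistent with the text.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def global_sensitivity(gt_main_boxes, pred_boxes_per_image, thr=0.5):
    """Fraction of images whose most significant stenosis is recovered by at
    least one predicted box above the IoU threshold (0.5 for the anchor-based
    model, 0.2 for the lower-resolution CAM output)."""
    hits = sum(
        any(iou(gt, p) >= thr for p in preds)
        for gt, preds in zip(gt_main_boxes, pred_boxes_per_image)
    )
    return hits / max(len(gt_main_boxes), 1)
```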
Statistical analysis
All statistical evaluation was performed in Python (version 3.6; Python Software Foundation, Wilmington, DE). In this study, diagnostic performance was calculated using a per-patient approach, including image-level severity classification. Accuracy, F1-score, and Cohen's Kappa were calculated for image-level stenosis classification; receiver operating characteristic (ROC) analysis and areas under the curves (AUC) were used to further evaluate image-, artery-, and patient-level diagnostic performance. Stenosis positioning was evaluated by sensitivity, specificity, and MSE as described above. The CNN, LSTM, CAM, and anchor-based models were implemented in TensorFlow (version 2.4.0) with Python (version 3.6) on Ubuntu (version 20.04). All metrics were computed using scikit-learn, version 0.19.1. Normally distributed continuous variables were summarized and reported as means ± standard deviations.
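For reference, the image-level classification metrics can be computed with scikit-learn as sketched below for the binary (2-CAT) case; the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, cohen_kappa_score,
                             roc_auc_score)

def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, F1, Cohen's kappa, and AUC for the 2-CAT (binary) case."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }
```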
Results
Patient characteristics
The study participants’ characteristics are given in Table 2. A total of 230 individuals were included in our analysis. The median age was 62 years (IQR 55, 69), 70% were men, 45% were white, 82% had hypertension, 71% had dyslipidemia, 16% were current smokers, and 27% had a high pretest probability of obstructive CAD.
Candidate frame selection
The automatic model achieved mean errors of 2.05 and 2.27 frames in BCF and ECF detection, respectively. The acceptance and error rates were 83% and 5.0%, respectively. A common feature of misclassified cases was a relatively short contrast duration in the video (typically <5 frames with adequate vessel-to-background contrast). The network did not handle this condition adequately because the training dataset contained very few short-duration contrast sequences.
Stenosis classification
The stenosis classification results for 3-CAT and 2-CAT, with and without redundancy training, are summarized in Table 3 and Figure 3. In brief, image-level classification performance was better in 2-CAT than in 3-CAT for the LCA, while not significantly different for the RCA. Redundancy training improved the AUC values for both 2-CAT and 3-CAT, as well as the accuracy, F1-score, and kappa score in 2-CAT. Based on the better performance in 2-CAT and our aim of separating normal coronary arteries or mild stenoses from higher-severity stenoses, the 2-CAT evaluation was carried forward to artery-level (LCA and RCA) and patient-level classification. The accuracies were 0.83, 0.81, and 0.85; the sensitivities were 0.94, 0.90, and 0.96; and the AUCs were 0.87, 0.88, and 0.86 for the LCA, the RCA, and the patient level, respectively. A representative image illustrating the effect of redundancy training is shown in Figure 4, with visualization aided by a heatmap: the overfitting caused by background structures is markedly reduced, likely resulting in the improvement in classification performance.
Table 3. The image-level stenosis classification performance for the 2-category and 3-category severity levels.
Figure 3. Performance of coronary stenosis classifications in image, coronary artery, and patient levels. (A,B) ROC curves of image-level classification on 3-CAT and 2-CAT with and without redundancy training on LCA and RCA. (C,D) ROC curves of coronary artery level classification on LCA and RCA. (E) ROC curve of patient-level classification. The AUC values are summarized in Table 3. RCA, right coronary artery; LCA, left coronary artery; AUC, area under the curve.
Figure 4. A representative image of the effect of the redundancy training demonstrated in a heatmap style. In the original training, the model had mid-to-high level attention on background regions. The redundancy training reduced the overfitting caused by background structures and improved the performance of stenosis classification.
Additionally, the image-level classification performance of the different CNN models (ResNet-50, ResNet-101, Inception-v3, and InceptionResNet-v2) is compared in Table 4. The comparison suggests that Inception-v3 is the most suitable of the four models because of its fast inference speed, small size, and high accuracy across tasks.
Table 4. The comparative study of different CNN models in image-level stenosis classification performance.
Stenosis localization
Quantitative results are summarized in Table 5. In brief, the anchor-based FPN method outperformed the CAM-based method on all metrics studied. Both localization techniques performed better for RCA images than for LCA images. For both methods, sensitivity was low because many annotations highlighted small lesions with ambiguous feature patterns in the arteries. Performance was also lower when there were multiple stenoses in distal coronary arteries or branches (see Figure 5 for an illustration).
Figure 5. Representative images from the stenosis localization experiments. The anchor-based model produced more accurate boxes than the CAM-based model. Multiple stenoses in distal coronary arteries or branches were difficult to localize correctly, which was the main reason for the failed cases.
Discussion
In this study, we developed a CAG stenosis detection and localization tool to facilitate safety screening of a large volume of CAG images. The main findings are summarized as follows: (1) a fully automatic, end-to-end workflow that eliminates the vessel extraction and segmentation step for supervised learning was developed; (2) a multi-view CAG analysis architecture for artery- and patient-level stenosis classification achieved an accuracy of 0.85, a sensitivity of 0.96, and an AUC of 0.86 at the patient level; (3) redundancy training improved classification performance, hedged against invalid feature learning, and reduced the error between the training and validation sets; and (4) stenosis localization was investigated with two methods, CAM-based and anchor-based models, with superior quantitative results from the anchor-based models.
An end-to-end workflow is advantageous in reducing human interaction steps. Once applied to CAG videos, our model automatically selects the optimal frames, performs stenosis classification, and localizes stenosis positions, providing robust results at both the artery and patient levels. Our workflow is advantageous in a high-volume clinical setting or for quality control purposes because timely screening of many CAG videos to identify cases with normal coronary arteries or only mild stenoses frees more time for cases with higher stenosis severities, which can translate into improved productivity and facilitated safety screening. Additionally, by providing stenosis classification and localization, the reader or physician can quickly focus on the lesion and perform quantitative CAG efficiently.
The candidate frame selection performance presented here was better than that of a previous publication by another group (4), likely because the bi-directional CNN + LSTM network effectively extracts high-dimensional features of contrast flow from the images, allowing the network to detect temporal trends and find contrast frames with higher accuracy than an RNN-only method (4). The stenosis classification results in the current study are encouraging and comparable to, and sometimes better than, methods reported in previous studies: image-based stenosis classification methods (5, 36) reported patient-level 2-CAT sensitivities of 0.80 and 0.87, and three vessel- and patch-based studies (6, 9, 17) reported accuracies of 0.94, 0.97, and 0.92, respectively. We attribute our favorable results to addressing different aspects of a typical CAG study, such as multiple angle views, background frames, and visually insignificant features of vessel stenoses, through redundancy training to reduce overfitting during classification training.
Redundancy training is an effective tool for improving classification accuracy and reducing the error between the training and validation sets. In the original training, CNNs may be activated by invalid features such as image background and artifacts, as visualized by the CAM heatmap. With redundancy frames introduced as new categories in redundancy training, the stenosis classifiers were activated more by effective features such as vessel morphology, intensity change, and luminal narrowing.
The comparison between 2-CAT and 3-CAT classification reveals additional characteristics of stenosis analysis. Experimentally, image-level classification performance was better in 2-CAT than in 3-CAT for the LCA (accuracy = 0.77 vs. 0.71) but not significantly different for the RCA (accuracy = 0.84 vs. 0.83). One possible reason is that LCA anatomy presents more variation than the RCA (31), making it harder for CNN models to detect vascular blockage and occlusion in 3-CAT classification. Another explanation is that the category distributions are more imbalanced in 3-CAT than in 2-CAT classification, reducing accuracy (in the LCA) and sensitivity (in both the LCA and RCA).
Our study also explores a solution to the stenosis localization problem via an object detection framework. Two stenosis localization methods, CAM and FPN, were compared. The CAM-based model has the strength of a simple derivation that uses the stenosis classification network as a backbone. However, because the activation map is computed from feature maps in deep CNN layers, the CAM method is unfavorable for fine-grained and multiple-object detection, such as small vessel stenoses within the same CAG image. In contrast, the anchor-based model showed better stenosis positioning performance because the feature pyramid structure exploits features at different scales. The trade-off is that additional annotations and a supervised learning procedure were necessary to train the anchor-based model. Additionally, the comparison of stenosis localization performance between the RCA and LCA supports our view that the complexity of vessel morphology and structure on angiography may severely degrade algorithm accuracy (classification and localization). In LCA angle views, the two main arteries (LAD and LCX) may overlap on the 2-dimensional CAG image and twist around each other, making stenosis visualization difficult. In some cases, there are multiple lesions in separate LCA vessels or segments (such as the second diagonal, second obtuse marginal, or posterolateral branches) with vague and insignificant visual characteristics. By comparison, the RCA has clearer vessel shapes and simpler morphologic characteristics, and therefore more salient stenosis features. These factors lead to better localization performance for the RCA than for the LCA with both methods.
Future work will aim at the following aspects. (1) External validation on different studies could help generalize our technique and further improve performance. We believe the proposed method will achieve good results with new images; however, because a new dataset may have different imaging parameters (angle views, phase intervals, and FOVs), the image pre-processing algorithm may need to be adjusted to accommodate the new images, and transfer learning on a small subset could also improve performance. (2) The method could be applied in a variety of clinical or investigative scenarios beyond safety screening, with different clinical goals such as fine-grained stenosis classification and localization.
Our study had a few important limitations. Training and evaluation were performed in the same cohort; a validation study using an external cohort is needed to accurately assess the performance of our techniques. Stenosis severity was simply categorized into three groups (<25%, 25–99%, and total occlusion) for 3-CAT and two groups (<25% and 25–100%) for 2-CAT. Our aim was to develop a tool that identifies normal and mild stenosis cases within a large cohort; more granular categories for mild to moderate stenosis may be considered for different clinical or investigational purposes, such as the detection of hemodynamically significant stenosis.
Conclusion
In conclusion, a fully automatic end-to-end deep learning-based workflow for CAG images that eliminates the vessel extraction and segmentation step was accomplished. Our redundancy-based algorithm showed high accuracy for stenosis classification, and accurate localization was achieved by an anchor-based model. This end-to-end approach may facilitate safety screening in high-volume centers and in clinical trial settings.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving human participants were reviewed and approved by the Johns Hopkins Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.
Author contributions
CC and YK contributed to conception and design of the study, data analysis, manuscript drafting, and revisions. HV, MO, JL, and BA-V participated in the design and coordination of the study as well as manuscript revisions. All authors have read and approved the final manuscript.
Funding
This study was supported by Master Research Agreement 09–115 and Artificial Intelligence Health Information Exchange (AIHEX).
Conflict of interest
JL reported receipt of grant support from Canon Medical Systems.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2023.944135/full#supplementary-material
Supplementary Figure 1 | The detailed structure of Inception-v3 and the LSTM. Inception-v3 was employed as the basic classifier to recognize full-contrast and non-contrast frames as candidate or redundancy frames. The output of the fully connected layer of Inception-v3 was then fed to a bi-directional LSTM with 32 time steps (units) and concatenated with the output of the forward and backward LSTM units. The concatenated result was connected to a multi-layer perceptron (MLP, with one hidden layer) and a binary activation layer (sigmoid).
Supplementary Figure 2 | The architecture of the anchor-based feature pyramid network for stenosis localization. The first, second, and third feature maps in the pyramid were derived from the concatenated features before the first, second, and third pooling layers, respectively. The fourth and fifth feature maps were down-sampled from the previous layers. Anchor shapes were preset with the K-Means clustering method using seven groups of height and width.
References
1. Tsao C, Aday A, Almarzooq Z, Alonso A, Beaton A, Bittencourt M, et al. Heart disease and stroke statistics—2022 update: a report from the American heart association. Circulation. (2022) 145:e153–639. doi: 10.1161/CIR.0000000000001052
2. Pang K, Ai D, Fang H, Fan J, Song H, Yang J. Stenosis-detnet: sequence consistency-based stenosis detection for X-Ray coronary angiography. Comput Med Imaging Grap. (2021) 89:101900. doi: 10.1016/j.compmedimag.2021.101900
3. Cong C, Kato Y, Vasconcellos H, Lima J, Venkatesh B editors. Automated stenosis detection and classification in x-ray angiography using deep neural network. 2019 IEEE international conference on bioinformatics and biomedicine (BIBM). Piscataway, NJ: IEEE (2019).
4. Ma H, Ambrosini P, van Walsum T editors. Fast prospective detection of contrast inflow in X-ray angiograms with convolutional neural network and recurrent neural network. International conference on medical image computing and computer-assisted intervention. Berlin: Springer (2017).
5. Wu W, Zhang J, Xie H, Zhao Y, Zhang S, Gu L. Automatic detection of coronary artery stenosis by convolutional neural network with temporal constraint. Comput Biol Med. (2020) 118:103657. doi: 10.1016/j.compbiomed.2020.103657
6. Moon J, Cha W, Chung M, Lee K, Cho B, Choi J. Automatic stenosis recognition from coronary angiography using convolutional neural networks. Comput Methods Programs Biomed. (2021) 198:105819. doi: 10.1016/j.cmpb.2020.105819
7. Moccia S, De Momi E, El Hadji S, Mattos L. Blood vessel segmentation algorithms — review of methods, datasets and evaluation metrics. Comput Methods Programs Biomed. (2018) 158:71–91. doi: 10.1016/j.cmpb.2018.02.001
8. Compas C, Syeda-Mahmood T, McNeillie P, Beymer D editors. Automatic Detection of Coronary Stenosis in X-Ray Angiography through Spatio-Temporal Tracking. 2014 IEEE 11th international symposium on biomedical imaging (ISBI). Piscataway, NJ: IEEE (2014).
9. Wan T, Feng H, Tong C, Li D, Qin Z. Automated identification and grading of coronary artery stenoses with X-Ray angiography. Comput Methods Programs Biomed. (2018) 167:13–22. doi: 10.1016/j.cmpb.2018.10.013
10. Hernandez-Vela A, Gatta C, Escalera S, Igual L, Martin-Yuste V, Sabate M, et al. Accurate coronary centerline extraction, caliber estimation, and catheter detection in angiographies. IEEE Trans Inform Technol Biomed. (2012) 16:1332–40. doi: 10.1109/TITB.2012.2220781
11. Yang S, Kweon J, Roh J, Lee J, Kang H, Park L, et al. Deep learning segmentation of major vessels in X-Ray coronary angiography. Sci Rep. (2019) 9:16897. doi: 10.1038/s41598-019-53254-7
12. Nasr-Esfahani E, Samavi S, Karimi N, Soroushmehr S, Ward K, Jafari M, et al. editors. Vessel Extraction in X-Ray Angiograms Using Deep Learning. 2016 38th Annual international conference of the IEEE engineering in medicine and biology society (EMBC). Piscataway, NJ: IEEE (2016).
13. Nasr-Esfahani E, Karimi N, Jafari M, Soroushmehr S, Samavi S, Nallamothu B, et al. Segmentation of vessels in angiograms using convolutional neural networks. Biomed Signal Processing Control. (2018) 40:240–51. doi: 10.1016/j.bspc.2017.09.012
14. Jo K, Kweon J, Kim Y, Choi J. Segmentation of the main vessel of the left anterior descending artery using selective feature mapping in coronary angiography. IEEE Access. (2019) 7:919–30. doi: 10.1109/ACCESS.2018.2886009
15. Liu Y, Zhang X, Wan W, Liu S, Liu Y, Liu H, et al. Two new stenosis detection methods of coronary angiograms. Int J Comput Assisted Radiol Surgery. (2022) 17:521–30. doi: 10.1007/s11548-021-02551-6
16. Antczak K, Liberadzki Ł editors. Stenosis detection with deep convolutional neural networks. MATEC web of conferences. Les Ulis: EDP Sciences (2018).
17. Ovalle-Magallanes E, Avina-Cervantes J, Cruz-Aceves I, Ruiz-Pinales J. Hybrid classical–quantum convolutional neural network for stenosis detection in X-Ray coronary angiography. Expert Syst Appl. (2022) 189:116112. doi: 10.1016/j.eswa.2021.116112
18. Ovalle-Magallanes E, Avina-Cervantes J, Cruz-Aceves I, Ruiz-Pinales J. Transfer learning for stenosis detection in X-Ray coronary angiography. Mathematics. (2020) 8:1510. doi: 10.3390/math8091510
19. Galassi F, Alkhalil M, Lee R, Martindale P, Kharbanda R, Channon K, et al. 3d reconstruction of coronary arteries from 2d angiographic projections using non uniform rational basis splines (NURBS) for accurate modelling of coronary stenoses. PLoS One. (2018) 13:23. doi: 10.1371/journal.pone.0190650
20. Garcia J, Movassaghi B, Casserly I, Klein A, Chen S, Messenger J, et al. Determination of optimal viewing regions for x-ray coronary angiography based on a quantitative analysis of 3d reconstructed models. Int J Cardiovasc Imaging. (2009) 25:455–62. doi: 10.1007/s10554-008-9402-5
21. Rodriguez-Granillo G, García-García H, Wentzel J, Valgimigli M, Tsuchida K, van der Giessen W, et al. Plaque composition and its relationship with acknowledged shear stress patterns in coronary arteries. J Am Coll Cardiol. (2006) 47:884–5. doi: 10.1016/j.jacc.2005.11.027
22. Markl M, Wegent F, Zech T, Bauer S, Strecker C, Schumacher M, et al. In Vivo wall shear stress distribution in the carotid artery: effect of bifurcation geometry, internal carotid artery stenosis, and recanalization therapy. Circulation. (2010) 3:647–55. doi: 10.1161/CIRCIMAGING.110.958504
23. Nasr-Esfahani E, Samavi S, Karimi N, Soroushmehr SMR, Ward K, Jafari MH. Vessel extraction in X-Ray angiograms using deep learning. Orlando, FL: IEEE (2016). doi: 10.1109/EMBC.2016.7590784
24. Ovalle-Magallanes E, Alvarado-Carrillo D, Avina-Cervantes J, Cruz-Aceves I, Ruiz-Pinales J, Correa R. Deep Learning-Based Coronary Stenosis Detection In x-Ray Angiography Images: Overview And future Trends. In: C Lim, A Vaidya, Y Chen, V Jain, L Jain editors. Artificial intelligence and machine learning for healthcare: vol 2: emerging methodologies and trends. Cham: Springer International Publishing (2023). p. 197–223.
25. Vavere A, Simon G, George R, Rochitte C, Arai A, Miller J, et al. Diagnostic performance of combined noninvasive coronary angiography and myocardial perfusion imaging using 320 row detector computed tomography: design and implementation of the core320 multicenter, multinational diagnostic study. J Cardiovasc Comput Tomograp. (2011) 5:370–81. doi: 10.1016/j.jcct.2011.11.001
26. Rochitte C, George R, Chen M, Arbab-Zadeh A, Dewey M, Miller J, et al. Computed tomography angiography and perfusion to assess coronary artery stenosis causing perfusion defects by single photon emission computed tomography: the core320 study. Eur Heart J. (2013) 35:1120–30. doi: 10.1093/eurheartj/eht488
27. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z editors. Rethinking the inception architecture for computer vision. Las Vegas, NV: IEEE (2016).
28. Alderman E, Stadius M. The angiographic definitions of the bypass angioplasty revascularization investigation. Coronary Artery Dis. (1992) 3:1189–208.
29. Park S, Kang S, Ahn J, Shim E, Kim Y, Yun S, et al. Visual-functional mismatch between coronary angiography and fractional flow reserve. JACC. (2012) 5:1029–36. doi: 10.1016/j.jcin.2012.07.007
30. Johnson N, Kirkeeide R, Gould K. Coronary anatomy to predict physiology. Circulation. (2013) 6:817–32. doi: 10.1161/CIRCIMAGING.113.000373
31. Singh S, Ajayi N, Lazarus L, Satyapal K. Anatomic study of the morphology of the right and left coronary arteries. Folia Morphol. (2017) 76:668–74. doi: 10.5603/FM.a2017.0043
32. Wang J, Wei J, Yang Z, Wang S. Feature selection by maximizing independent classification information. IEEE Trans Knowl Data Eng. (2017) 29:828–41. doi: 10.1109/TKDE.2017.2650906
33. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A editors. Learning deep features for discriminative localization. arXiv. (2016). [Preprint]. doi: 10.48550/arXiv.1512.04150
34. Lin T, Goyal P, Girshick R, He K, Dollár P editors. Focal loss for dense object detection. proceedings of the IEEE international conference on computer vision. Piscataway, NJ: IEEE (2017).
35. Lin T, Dollár P, Girshick R, He K, Hariharan B, Belongie S editors. Feature pyramid networks for object detection. proceedings of the IEEE conference on computer vision and pattern recognition. San Juan, PR: IEEE (2017).
Keywords: stenosis localization, stenosis classification, catheter coronary angiography, end-to-end workflow, deep learning, redundancy training
Citation: Cong C, Kato Y, Vasconcellos HDD, Ostovaneh MR, Lima JAC and Ambale-Venkatesh B (2023) Deep learning-based end-to-end automated stenosis classification and localization on catheter coronary angiography. Front. Cardiovasc. Med. 10:944135. doi: 10.3389/fcvm.2023.944135
Received: 14 May 2022; Accepted: 16 January 2023;
Published: 07 February 2023.
Edited by:
Sebastian Kelle, German Heart Center Berlin, Germany
Reviewed by:
Mingxing Xie, Huazhong University of Science and Technology, China; Omid Memarian Sorkhabi, University of Isfahan, Iran
Copyright © 2023 Cong, Kato, Vasconcellos, Ostovaneh, Lima and Ambale-Venkatesh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Bharath Ambale-Venkatesh, bambale1@jhmi.edu
†These authors have contributed equally to this work and share first authorship