To compare the mammographic malignant architectural distortion (AD) detection performance of radiologists who read mammographic examinations unaided versus those who read these examinations with the support of artificial intelligence (AI) systems.
This retrospective case-control study was based on a double-reading of clinical mammograms between January 2011 and December 2016 at a large tertiary academic medical center. The study included 177 malignant and 90 benign architectural distortion (AD) patients. The model was built based on the ResNeXt-50 network. Algorithms used deep learning convolutional neural networks, feature classifiers, image analysis algorithms to depict AD and output a score that translated to malignant. The accuracy for malignant AD detection was evaluated using area under the curve (AUC).
The overall AUC was 0.733 (95% CI, 0.673-0.792) for Reader First-1, 0.652 (95% CI, 0.586-0.717) for Reader First-2, and 0.655 (95% CI, 0.590-0.719) for Reader First-3. and the overall AUCs for Reader Second-1, 2, 3 were 0.875 (95% CI, 0.830-0.919), 0.882 (95% CI, 0.839-0.926), 0.884 (95% CI, 0.841-0.927),respectively. The AUCs for all the reader-second radiologists were significantly higher than those for all the reader-first radiologists (Reader First-1 vs. Reader Second-1, P= 0.004). The overall AUC was 0.792 (95% CI, 0.660-0.925) for AI algorithms. The combination assessment of AI algorithms and Reader First-1 achieved an AUC of 0.880 (95% CI, 0.793-0.968), increased than the Reader First-1 alone and AI algorithms alone. AI algorithms alone achieved a specificity of 61.1% and a sensitivity of 80.6%. The specificity for Reader First-1 was 55.5%, and the sensitivity was 86.1%. The results of the combined assessment of AI and Reader First-1 showed a specificity of 72.7% and sensitivity of 91.7%. The performance showed significant improvements compared with AI alone (p<0.001) as well as the reader first-1 alone (p=0.006).
While the single AI algorithm did not outperform radiologists, an ensemble of AI algorithms combined with junior radiologist assessments were found to improve the overall accuracy. This study underscores the potential of using machine learning methods to enhance mammography interpretation, especially in remote areas and primary hospitals.