AUTHOR=Liu Zhou , Li Li , Li Tianran , Luo Douqiang , Wang Xiaoliang , Luo Dehong TITLE=Does a Deep Learning–Based Computer-Assisted Diagnosis System Outperform Conventional Double Reading by Radiologists in Distinguishing Benign and Malignant Lung Nodules? JOURNAL=Frontiers in Oncology VOLUME=10 YEAR=2020 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2020.545862 DOI=10.3389/fonc.2020.545862 ISSN=2234-943X ABSTRACT=Background

In differentiating indeterminate pulmonary nodules, multiple studies have indicated that deep learning–based computer-assisted diagnosis systems (DL-CADx) are superior to conventional double reading by radiologists, a finding that still needs external validation. Our aim was therefore to externally validate the performance of a commercial DL-CADx in differentiating benign from malignant lung nodules.

Methods

In this retrospective study, 233 patients with 261 pathologically confirmed lung nodules were enrolled. In double reading, each nodule was rated on a four-level malignancy scale: unlikely (0–25%), malignancy cannot be completely excluded (25–50%), highly likely (50–75%), and considered malignant (75–100%); any disagreement was resolved through discussion. DL-CADx automatically assigned each nodule a malignancy likelihood from 0 to 100%, which was then quadrichotomized into the same four categories. The intraclass correlation coefficient (ICC) was used to evaluate agreement in malignancy risk rating between DL-CADx and double reading, with ICC values of <0.5, 0.5 to 0.75, 0.75 to 0.9, and >0.9 indicating poor, moderate, good, and perfect agreement, respectively. With a malignancy likelihood of >50% as the cut-off for malignancy and pathological results as the gold standard, sensitivity, specificity, and accuracy were calculated separately for double reading and DL-CADx.
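The rating scheme described above can be sketched in a few lines of Python. This is only an illustration, not the study's software: the function names are hypothetical, and the handling of values that fall exactly on a category boundary (25, 50, 75) is an assumption, chosen so that the categories line up with the stated >50% cut-off.

```python
def quadrichotomize(likelihood_pct: float) -> int:
    """Map a 0-100% malignancy likelihood to one of four categories.

    1 = unlikely, 2 = malignancy cannot be completely excluded,
    3 = highly likely, 4 = considered malignant.
    Assumption: upper bounds are inclusive, so 50% falls in category 2.
    """
    if not 0 <= likelihood_pct <= 100:
        raise ValueError("likelihood must be between 0 and 100")
    if likelihood_pct <= 25:
        return 1
    if likelihood_pct <= 50:
        return 2
    if likelihood_pct <= 75:
        return 3
    return 4


def is_called_malignant(likelihood_pct: float) -> bool:
    """Apply the >50% cut-off for a malignant call."""
    return likelihood_pct > 50


# Hypothetical likelihood values, for illustration only.
for value in (10.0, 50.0, 62.5, 90.0):
    print(value, quadrichotomize(value), is_called_malignant(value))
```

Under this boundary convention, a nodule is called malignant exactly when it falls in category 3 or 4.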

Results

Among the 261 nodules, 247 were successfully detected by DL-CADx, a detection rate of 94.7%. Regarding malignancy rating, DL-CADx was in moderate agreement with double reading (ICC = 0.555, 95% CI 0.424 to 0.655). DL-CADx misdiagnosed 40 truly malignant nodules as benign and 30 truly benign nodules as malignant, giving a sensitivity, specificity, and accuracy of 79.2%, 45.5%, and 71.7%, respectively. In contrast, double reading performed better, misdiagnosing 16 truly malignant nodules as benign and 26 truly benign nodules as malignant, for a sensitivity, specificity, and accuracy of 91.7%, 52.7%, and 83.0%, respectively.
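The reported percentages can be reproduced from the misclassification counts with the standard definitions of sensitivity, specificity, and accuracy. The sketch below assumes that the 247 detected nodules comprised 192 malignant and 55 benign nodules; the abstract does not state these totals, and they are inferred here only because they are consistent with the reported figures. The function name is hypothetical.

```python
def diagnostic_metrics(fn: int, fp: int, n_malignant: int, n_benign: int):
    """Return (sensitivity, specificity, accuracy) in percent, rounded to 0.1."""
    tp = n_malignant - fn  # malignant nodules correctly called malignant
    tn = n_benign - fp     # benign nodules correctly called benign
    sensitivity = 100 * tp / n_malignant
    specificity = 100 * tn / n_benign
    accuracy = 100 * (tp + tn) / (n_malignant + n_benign)
    return tuple(round(x, 1) for x in (sensitivity, specificity, accuracy))


# Assumed totals (not stated in the abstract): 192 malignant + 55 benign = 247.
N_MALIGNANT, N_BENIGN = 192, 55

dl_cadx = diagnostic_metrics(fn=40, fp=30, n_malignant=N_MALIGNANT, n_benign=N_BENIGN)
double_reading = diagnostic_metrics(fn=16, fp=26, n_malignant=N_MALIGNANT, n_benign=N_BENIGN)

print("DL-CADx:       ", dl_cadx)         # (79.2, 45.5, 71.7)
print("Double reading:", double_reading)  # (91.7, 52.7, 83.0)
```

Under the assumed totals, both rows match the sensitivity, specificity, and accuracy given in the Results.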

Conclusion

Compared with double reading, the DL-CADx we used still showed inferior performance in differentiating malignant from benign nodules.