AUTHOR=Hallinan James Thomas Patrick Decourcy , Zhu Lei , Zhang Wenqiao , Ge Shuliang , Muhamat Nor Faimee Erwan , Ong Han Yang , Eide Sterling Ellis , Cheng Amanda J. L. , Kuah Tricia , Lim Desmond Shi Wei , Low Xi Zhen , Yeong Kuan Yuen , AlMuhaish Mona I. , Alsooreti Ahmed Mohamed , Kumarakulasinghe Nesaretnam Barr , Teo Ee Chin , Yap Qai Ven , Chan Yiong Huak , Lin Shuxun , Tan Jiong Hao , Kumar Naresh , Vellayappan Balamurugan A. , Ooi Beng Chin , Quek Swee Tian , Makmur Andrew TITLE=Deep learning assessment compared to radiologist reporting for metastatic spinal cord compression on CT JOURNAL=Frontiers in Oncology VOLUME=13 YEAR=2023 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2023.1151073 DOI=10.3389/fonc.2023.1151073 ISSN=2234-943X ABSTRACT=Introduction

Metastatic spinal cord compression (MSCC) is a disastrous complication of advanced malignancy. A deep learning (DL) algorithm for MSCC classification on CT could expedite timely diagnosis. In this study, we externally test a DL algorithm for MSCC classification on CT and compare with radiologist assessment.

Methods

CT and corresponding MRI scans from patients with suspected MSCC were collected retrospectively from September 2007 to September 2020. Scans with instrumentation, no intravenous contrast, motion artefacts or non-thoracic coverage were excluded. The internal CT dataset was split 84% for training/validation and 16% for testing. An external test set was also utilised. Internal training/validation sets were labelled by radiologists with spine imaging specialization (6 and 11 years post-board certification) and were used to further develop a DL algorithm for MSCC classification. The spine imaging specialist with 11 years of experience labelled the test sets (reference standard). To evaluate DL algorithm performance, internal and external test data were independently reviewed by four radiologists: two spine specialists (Rad1 and Rad2, 7 and 5 years post-board certification, respectively) and two oncological imaging specialists (Rad3 and Rad4, 3 and 5 years post-board certification, respectively). DL model performance was also compared against the CT report issued by the radiologist in a real clinical setting. Inter-rater agreement (Gwet’s kappa), sensitivity, specificity and AUCs were calculated.
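As a rough illustration of the agreement and accuracy statistics named above, the sketch below computes an unweighted Gwet coefficient (AC1) between a reader and the reference standard, plus sensitivity and specificity for one class treated as positive. It is not the study's analysis code: the abstract does not say whether a weighted variant was used, and the function names and the integer grade encoding (0, 1, 2, with 2 standing for high-grade disease) are assumptions made only for this example.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Unweighted Gwet AC1 for two raters over a fixed set of categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    # Observed agreement: fraction of cases given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: average each category's prevalence across the two
    # raters, then sum pi * (1 - pi) and divide by (number of categories - 1).
    chance = 0.0
    for category in categories:
        pi = (counts_a[category] + counts_b[category]) / (2 * n)
        chance += pi * (1 - pi)
    chance /= len(categories) - 1

    return (observed - chance) / (1 - chance)


def sensitivity_specificity(reference, predicted, positive):
    """Sensitivity and specificity with one grade treated as the positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels only (hypothetical encoding, not study data).
    reference = [0, 0, 1, 1, 2, 2]
    reader = [0, 0, 1, 2, 2, 2]
    print(gwet_ac1(reference, reader, categories=[0, 1, 2]))
    print(sensitivity_specificity(reference, reader, positive=2))
```

Collapsing the three grades to a single positive class, as in the second function, mirrors the way high-grade disease is singled out when sensitivity is reported; the choice of positive class here is illustrative.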

Results

Overall, 420 CT scans were evaluated (225 patients, mean age 60 ± 11.9 years [SD]); 354 (84%) CTs were used for training/validation and 66 (16%) CTs for internal testing. The DL algorithm showed high inter-rater agreement for three-class MSCC grading with kappas of 0.872 (p<0.001) and 0.844 (p<0.001) on internal and external testing, respectively. On internal testing, DL algorithm inter-rater agreement (κ=0.872) was superior to Rad2 (κ=0.795) and Rad3 (κ=0.724) (both p<0.001). On external testing, the DL algorithm kappa of 0.844 was superior to Rad3 (κ=0.721) (p<0.001). CT report classification of high-grade MSCC disease was poor, with only slight inter-rater agreement (κ=0.027) and low sensitivity (44.0%), relative to the DL algorithm with almost-perfect inter-rater agreement (κ=0.813) and high sensitivity (94.0%) (p<0.001).

Conclusion

The deep learning algorithm for metastatic spinal cord compression on CT showed superior performance to the CT report issued by experienced radiologists and could aid earlier diagnosis.