AUTHOR=Xu Fan, Xiong Yuchao, Ye Guoxi, Liang Yingying, Guo Wei, Deng Qiuping, Wu Li, Jia Wuyi, Wu Dilang, Chen Song, Liang Zhiping, Zeng Xuwen

TITLE=Deep learning-based artificial intelligence model for classification of vertebral compression fractures: A multicenter diagnostic study

JOURNAL=Frontiers in Endocrinology

VOLUME=Volume 14 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/endocrinology/articles/10.3389/fendo.2023.1025749

DOI=10.3389/fendo.2023.1025749

ISSN=1664-2392

ABSTRACT=Objective: To develop and validate an artificial intelligence diagnostic system based on X-ray imaging data for diagnosing vertebral compression fractures (VCFs).
Methods: In total, 1904 patients who underwent X-ray imaging at four independent hospitals were enrolled retrospectively (n=1847) or prospectively (n=57). The participants were divided into a development cohort, a prospective test cohort, and three external test cohorts. The proposed model used a transfer learning method based on the ResNet-18 architecture. The diagnostic performance of the model was evaluated using receiver operating characteristic curve analysis and validated using a prospective validation set and three external sets. The performance of the model was compared with that of radiologists at three levels of musculoskeletal expertise: expert, competent, and trainee.
Results: The diagnostic accuracy for identifying compression fractures was 0.850 in the testing set, 0.829 in the prospective set, and ranged from 0.757 to 0.832 in the three external validation sets. In the human and deep learning collaboration dataset, the overall accuracy of the deep learning model was 0.764, which was significantly higher than that of the trainee radiologist (0.707), similar to that of the competent radiologist (0.769), and slightly lower than that of the expert radiologist (0.782). When combined with the deep learning model, the accuracies and sensitivities of the expert, competent, and trainee radiologists were significantly improved (accuracy: 0.853 vs. 0.782, 0.816 vs. 0.769, and 0.778 vs. 0.707; sensitivity: 0.776 vs. 0.673, 0.727 vs. 0.653, and 0.667 vs. 0.560).
Conclusions: Our study offers a high-accuracy multi-class deep learning model that could assist community-based hospitals in improving the diagnostic accuracy of VCFs.
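
The reader-assistance figures in the Results can be restated as absolute gains. The following is a minimal sketch in Python that uses only the accuracy and sensitivity values quoted in the abstract; the names `reported` and `absolute_gains` are illustrative and do not come from the study, and the sketch says nothing about how the authors computed significance.

```python
# Values quoted in the abstract, stored as (with model, without model).
reported = {
    "expert": {"accuracy": (0.853, 0.782), "sensitivity": (0.776, 0.673)},
    "competent": {"accuracy": (0.816, 0.769), "sensitivity": (0.727, 0.653)},
    "trainee": {"accuracy": (0.778, 0.707), "sensitivity": (0.667, 0.560)},
}


def absolute_gains(table):
    """Return the with-model minus without-model difference per reader and metric."""
    return {
        reader: {
            metric: round(assisted - alone, 3)
            for metric, (assisted, alone) in metrics.items()
        }
        for reader, metrics in table.items()
    }


gains = absolute_gains(reported)
for reader, metrics in gains.items():
    print(reader, metrics)
```

The differences are rounded to three decimals to match the precision of the reported values.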