AUTHOR=Ye Pengyu , Li Sihe , Wang Zhongzheng , Tian Siyu , Luo Yi , Wu Zhanyong , Zhuang Yan , Zhang Yingze , Grzegorzek Marcin , Hou Zhiyong
TITLE=Development and validation of a deep learning-based model to distinguish acetabular fractures on pelvic anteroposterior radiographs
JOURNAL=Frontiers in Physiology
VOLUME=14
YEAR=2023
URL=https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2023.1146910
DOI=10.3389/fphys.2023.1146910
ISSN=1664-042X
ABSTRACT=
Objective: To develop and test a deep learning (DL) model to distinguish acetabular fractures (AFs) on pelvic anteroposterior radiographs (PARs) and compare its performance to that of clinicians.
Materials and methods: A total of 1,120 patients from a big level-I trauma center were enrolled and allocated at a 3:1 ratio for the DL model’s development and internal test. Another 86 patients from two independent hospitals were collected for external validation. A DL model for identifying AFs was constructed based on DenseNet. AFs were classified into types A, B, and C according to the three-column classification theory. Ten clinicians were recruited for AF detection. A potential misdiagnosed case (PMC) was defined based on clinicians’ detection results. The detection performance of the clinicians and DL model were evaluated and compared. The detection performance of different subtypes using DL was assessed using the area under the receiver operating characteristic curve (AUC).
Results: The means of 10 clinicians’ sensitivity, specificity, and accuracy to identify AFs were 0.750/0.735, 0.909/0.909, and 0.829/0.822, in the internal test/external validation set, respectively. The sensitivity, specificity, and accuracy of the DL detection model were 0.926/0.872, 0.978/0.988, and 0.952/0.930, respectively. The DL model identified type A fractures with an AUC of 0.963 [95% confidence interval (CI): 0.927–0.985]/0.950 (95% CI: 0.867–0.989); type B fractures with an AUC of 0.991 (95% CI: 0.967–0.999)/0.989 (95% CI: 0.930–1.000); and type C fractures with an AUC of 1.000 (95% CI: 0.975–1.000)/1.000 (95% CI: 0.897–1.000) in the test/validation set. The DL model correctly recognized 56.5% (26/46) of PMCs.
Conclusion: A DL model for distinguishing AFs on PARs is feasible. In this study, the DL model achieved a diagnostic performance comparable to or even superior to that of clinicians.