AUTHOR=Wang Dongdong , Wu Zhenhua , Fan Guoxin , Liu Huaqing , Liao Xiang , Chen Yanxi , Zhang Hailong TITLE=Accuracy and reliability analysis of a machine learning based segmentation tool for intertrochanteric femoral fracture CT JOURNAL=Frontiers in Surgery VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2022.913385 DOI=10.3389/fsurg.2022.913385 ISSN=2296-875X ABSTRACT=Introduction

Three-dimensional (3D) reconstruction of fracture fragments on hip computed tomography (CT) may benefit detailed injury evaluation and preoperative planning for intertrochanteric femoral fracture (IFF). Manual segmentation of bony structures is tedious and time-consuming. The purpose of this study was to propose an artificial intelligence (AI) segmentation tool that achieves semantic segmentation and precise reconstruction of IFF fracture fragments on hip CTs.

Materials and Methods

A total of 50 CT cases were manually segmented and labeled with Slicer 4.11.0. The 50 labeled cases were split into training, validation and testing sets at a ratio of 33:10:7. A simplified V-Net architecture was adopted to build the AI tool, named IFFCT, for automatic segmentation of fracture fragments. The Dice score, precision and sensitivity were computed to assess the segmentation performance of IFFCT. The 2D masks of 80 unlabeled CTs segmented by the AI tool and by humans were further assessed to validate the segmentation accuracy. The femoral head diameter (FHD) was measured on 3D models to validate the reliability of 3D reconstruction.
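
As an illustration of the three overlap metrics named above, the following is a minimal sketch and not the study's evaluation code. It assumes the predicted and reference segmentations are available as flattened sequences of integer voxel labels; the function name `overlap_metrics` and the label ids (0 background, 1 proximal femur, 2 fragment, 3 distal femur) are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable


def overlap_metrics(pred: Iterable[int], truth: Iterable[int], label: int) -> Dict[str, float]:
    """Dice, precision and sensitivity for one label over flattened voxel labels.

    Both sequences are assumed to have the same length and voxel order.
    """
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p == label and t == label:
            tp += 1  # voxel correctly assigned to the label
        elif p == label:
            fp += 1  # predicted as the label, but the reference disagrees
        elif t == label:
            fn += 1  # reference voxel of the label that was missed

    def ratio(num: int, den: int) -> float:
        # Undefined when the label is absent from both masks.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
    }


# Toy data with hypothetical label ids:
# 0 background, 1 proximal femur, 2 fragment, 3 distal femur
truth = [0, 1, 1, 1, 2, 2, 3, 3]
pred = [0, 1, 1, 2, 2, 2, 3, 0]
metrics = overlap_metrics(pred, truth, label=2)
print(metrics)
```

Computing the metrics per label, as here, matches the way the abstract reports a separate Dice score for each of the three structures; averaging over cases would be a further step that this sketch leaves out.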

Results

The average Dice scores of IFFCT on the local test dataset for “proximal femur”, “fragment” and “distal femur” were 91.62%, 80.42% and 87.05%, respectively. IFFCT showed similar segmentation performance in cross-dataset testing, and its performance was comparable to that of a human expert in the human-computer competition, with significantly reduced segmentation time (p < 0.01). Significant differences were observed between 2D masks generated from semantic segmentation and conventional threshold-based segmentation (p < 0.01). The average FHD in the automatic segmentation group was 47.5 ± 4.1 mm (41.29∼56.59 mm), and the average FHD in the manual segmentation group was 45.9 ± 6.1 mm (40.34∼64.93 mm). The mean absolute errors of FHD in the two groups were 3.38 mm and 3.52 mm, respectively. No significant differences in FHD measurements were observed between the two groups (p > 0.05). All intraclass correlation coefficients (ICCs) were greater than 0.8.

Conclusion

The proposed AI segmentation tool could effectively segment the bony structures from IFF CTs, with performance comparable to that of human experts. The 2D masks and 3D models generated from automatic segmentation were effective and reliable, which could benefit detailed injury evaluation and preoperative planning of IFFs.