AUTHOR=Wang Xing-Rui, Ma Xi, Jin Liu-Xu, Gao Yan-Jun, Xue Yong-Jie, Li Jing-Long, Bai Wei-Xian, Han Miao-Fei, Zhou Qing, Shi Feng, Wang Jing
TITLE=Application value of a deep learning method based on a 3D V-Net convolutional neural network in the recognition and segmentation of the auditory ossicles
JOURNAL=Frontiers in Neuroinformatics
VOLUME=16
YEAR=2022
URL=https://www.frontiersin.org/journals/neuroinformatics/articles/10.3389/fninf.2022.937891
DOI=10.3389/fninf.2022.937891
ISSN=1662-5196
ABSTRACT=

Objective

To explore the feasibility of using a deep learning three-dimensional (3D) V-Net convolutional neural network to build models that recognize and segment the auditory ossicles on high-resolution computed tomography (HRCT) images.

Methods

Temporal bone HRCT images of 158 patients were collected retrospectively, and the malleus, incus, and stapes were manually segmented. The 3D V-Net and 3D U-Net convolutional neural networks were selected as the deep learning methods for segmenting the auditory ossicles. The temporal bone images were randomly divided into a training set (126 cases), a test set (16 cases), and a validation set (16 cases). The segmentation results of each model were compared, with manual segmentation as the reference.
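The random split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_cases`, the integer case identifiers, and the fixed seed are all assumptions made for the example, and the abstract does not say how the randomization was actually performed.

```python
import random


def split_cases(case_ids, n_train=126, n_test=16, n_val=16, seed=0):
    """Shuffle case identifiers and cut them into three disjoint subsets.

    The default sizes (126/16/16) are the ones reported in the abstract;
    the seed is a hypothetical choice that only makes the sketch repeatable.
    """
    case_ids = list(case_ids)
    if n_train + n_test + n_val != len(case_ids):
        raise ValueError("subset sizes must add up to the number of cases")

    shuffled = case_ids[:]
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


# Hypothetical identifiers 0..157 standing in for the 158 patients.
train, test, val = split_cases(range(158))
print(len(train), len(test), len(val))
```

Splitting at the patient level, as here, keeps all images from one case in a single subset.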

Results

For the 3D V-Net convolutional neural network, the Dice similarity coefficients (DSCs) between the automatic and manual segmentations of the malleus, incus, and stapes on the HRCT images were 0.920 ± 0.014, 0.925 ± 0.014, and 0.835 ± 0.035, respectively. The average surface distances (ASDs) were 0.257 ± 0.054, 0.236 ± 0.047, and 0.258 ± 0.077, respectively. The 95% Hausdorff distances (HD95) were 1.016 ± 0.080, 1.000 ± 0.000, and 1.027 ± 0.102, respectively. For the 3D U-Net convolutional neural network, the DSCs between the automatic and manual segmentations of the malleus, incus, and stapes were 0.876 ± 0.025, 0.889 ± 0.023, and 0.758 ± 0.044, respectively. The ASDs were 0.439 ± 0.208, 0.361 ± 0.077, and 0.433 ± 0.108, respectively. The HD95 values were 1.361 ± 0.872, 1.174 ± 0.350, and 1.455 ± 0.618, respectively. The differences between the two networks were statistically significant (P < 0.001).
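For readers unfamiliar with the three metrics, the sketch below shows one common way to compute them for a pair of binary masks. It is an illustration under stated assumptions, not the evaluation code used in the study: masks are represented as non-empty sets of integer (x, y, z) voxel coordinates, surfaces are taken as voxels with at least one 6-connected neighbour outside the mask, distances are in voxel units (no physical spacing is applied), and HD95 is taken as the larger of the two directed 95th-percentile surface distances using a nearest-rank percentile. Published definitions of ASD and HD95 vary in these details. The DSC itself is the standard 2|A ∩ B| / (|A| + |B|).

```python
import math

# Offsets of the six face-adjacent neighbours of a voxel.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice similarity coefficient of two voxel sets."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of the mask that touch at least one voxel outside it."""
    return {
        v for v in mask
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
               for dx, dy, dz in NEIGHBOURS)
    }


def directed_distances(src, dst):
    """Sorted distances from each point in src to its nearest point in dst."""
    return sorted(min(math.dist(p, q) for q in dst) for p in src)


def asd(a, b):
    """Symmetric average surface distance (brute force, small masks only)."""
    sa, sb = surface(a), surface(b)
    d = directed_distances(sa, sb) + directed_distances(sb, sa)
    return sum(d) / len(d)


def percentile95(sorted_values):
    """Nearest-rank 95th percentile of an already sorted list."""
    rank = math.ceil(0.95 * len(sorted_values))
    return sorted_values[rank - 1]


def hd95(a, b):
    """95th-percentile Hausdorff distance between the two surfaces."""
    sa, sb = surface(a), surface(b)
    return max(percentile95(directed_distances(sa, sb)),
               percentile95(directed_distances(sb, sa)))


# Toy example: a 2x2x2 cube and the same cube shifted by one voxel along x.
cube = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
shifted = {(x + 1, y, z) for (x, y, z) in cube}
print(dice(cube, shifted), asd(cube, shifted), hd95(cube, shifted))
```

The brute-force nearest-neighbour search is quadratic in the number of surface voxels, which is acceptable for structures as small as the ossicles but would normally be replaced by a distance transform for larger masks.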

Conclusion

The 3D V-Net convolutional neural network achieved automatic recognition and segmentation of the auditory ossicles, with accuracy similar to that of manual segmentation.