AUTHOR=Sun Kailun , Fan Chanyuan , Feng Zhaoyan , Min Xiangde , Wang Yu , Sun Ziyan , Li Yan , Cai Wei , Yin Xi , Zhang Peipei , Liu Qiuyu , Xia Liming TITLE=Magnetic resonance imaging based deep-learning model: a rapid, high-performance, automated tool for testicular volume measurements JOURNAL=Frontiers in Medicine VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2023.1277535 DOI=10.3389/fmed.2023.1277535 ISSN=2296-858X ABSTRACT=Background

Testicular volume (TV) is an essential parameter for monitoring testicular functions and pathologies. Nevertheless, current measurement tools, including orchidometers and ultrasonography, encounter challenges in obtaining accurate and personalized TV measurements.

Purpose

Based on magnetic resonance imaging (MRI), this study aimed to establish a deep learning model and evaluate its efficacy in segmenting the testes and measuring TV.

Materials and methods

The study cohort consisted of retrospectively collected patient data (N = 200) and a prospectively collected dataset comprising 10 healthy volunteers. The retrospective dataset was divided into training and independent validation sets, with an 8:2 random distribution. Each of the 10 healthy volunteers underwent 5 scans (forming the testing dataset) to evaluate the measurement reproducibility. A ResUNet algorithm was applied to segment the testes. Volume of each testis was calculated by multiplying the voxel volume by the number of voxels. Manually determined masks by experts were used as ground truth to assess the performance of the deep learning model.

Results

The deep learning model achieved a mean Dice score of 0.926 ± 0.034 (0.921 ± 0.026 for the left testis and 0.926 ± 0.034 for the right testis) in the validation cohort and a mean Dice score of 0.922 ± 0.02 (0.931 ± 0.019 for the left testis and 0.932 ± 0.022 for the right testis) in the testing cohort. There was strong correlation between the manual and automated TV (R2 ranging from 0.974 to 0.987 in the validation cohort; R2 ranging from 0.936 to 0.973 in the testing cohort). The volume differences between the manual and automated measurements were 0.838 ± 0.991 (0.209 ± 0.665 for LTV and 0.630 ± 0.728 for RTV) in the validation cohort and 0.815 ± 0.824 (0.303 ± 0.664 for LTV and 0.511 ± 0.444 for RTV) in the testing cohort. Additionally, the deep-learning model exhibited excellent reproducibility (intraclass correlation >0.9) in determining TV.

Conclusion

The MRI-based deep learning model is an accurate and reliable tool for measuring TV.