The aim of this study was to propose and evaluate a novel three-dimensional (3D) V-Net and two-dimensional (2D) U-Net mixed (VUMix-Net) architecture for a fully automatic and accurate gross tumor volume (GTV) in esophageal cancer (EC)–delineated contours.
We collected the computed tomography (CT) scans of 215 EC patients. 3D V-Net, 2D U-Net, and VUMix-Net were developed and further applied simultaneously to delineate GTVs. The Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (95HD) were used as quantitative metrics to evaluate the performance of the three models in ECs from different segments. The CT data of 20 patients were randomly selected as the ground truth (GT) masks, and the corresponding delineation results were generated by artificial intelligence (AI). Score differences between the two groups (GT versus AI) and the evaluation consistency were compared.
In all patients, there was a significant difference in the 2D DSCs from U-Net, V-Net, and VUMix-Net (p=0.01). In addition, VUMix-Net showed achieved better 3D-DSC and 95HD values. There was a significant difference among the 3D-DSC (mean ± STD) and 95HD values for upper-, middle-, and lower-segment EC (p<0.001), and the middle EC values were the best. In middle-segment EC, VUMix-Net achieved the highest 2D-DSC values (p<0.001) and lowest 95HD values (p=0.044).
The new model (VUMix-Net) showed certain advantages in delineating the GTVs of EC. Additionally, it can generate the GTVs of EC that meet clinical requirements and have the same quality as human-generated contours. The system demonstrated the best performance for the ECs of the middle segment.