AUTHOR=Ye Xianghua , Guo Dazhou , Tseng Chen-Kan , Ge Jia , Hung Tsung-Min , Pai Ping-Ching , Ren Yanping , Zheng Lu , Zhu Xinli , Peng Ling , Chen Ying , Chen Xiaohua , Chou Chen-Yu , Chen Danni , Yu Jiaze , Chen Yuzhen , Jiao Feiran , Xin Yi , Huang Lingyun , Xie Guotong , Xiao Jing , Lu Le , Yan Senxiang , Jin Dakai , Ho Tsung-Ying TITLE=Multi-Institutional Validation of Two-Streamed Deep Learning Method for Automated Delineation of Esophageal Gross Tumor Volume Using Planning CT and FDG-PET/CT JOURNAL=Frontiers in Oncology VOLUME=11 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2021.785788 DOI=10.3389/fonc.2021.785788 ISSN=2234-943X ABSTRACT=Background

The current clinical workflow for esophageal gross tumor volume (GTV) contouring relies on manual delineation, which is labor-intensive and subject to inter-user variability.

Purpose

To validate the clinical applicability of a deep learning multimodality esophageal GTV contouring model that was developed at one institution and tested at multiple institutions.

Materials and Methods

We retrospectively collected data from 606 patients with esophageal cancer at four institutions. Among them, 252 patients from institution 1 had both a treatment planning CT (pCT) and a paired diagnostic FDG-PET/CT scan; 354 patients from the three other institutions had only pCT scans, owing to different staging protocols or the lack of PET scanners. A two-streamed deep learning model for GTV segmentation was developed using the pCT and PET/CT scans of a subset (148 patients) from institution 1. The resulting model had the flexibility to segment GTVs from pCT alone or from pCT and PET/CT combined when both were available. For independent evaluation, the remaining 104 patients from institution 1 served as an unseen internal testing set, and the 354 patients from the other three institutions were used for external testing. Human experts further evaluated the degree of manual revision required, to assess the contour-editing effort. Furthermore, the deep model's performance was compared against that of four radiation oncologists in a multi-user study using 20 randomly chosen external patients. Contouring accuracy and time were recorded before and after deep learning-assisted delineation.
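The cohort bookkeeping described above can be checked with a short sketch. This is only an illustration of the arithmetic: the patient counts come from the text, while the `Cohort` class, the cohort labels, and the `input_mode` helper are hypothetical names introduced here and are not part of the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical record for one patient cohort (not from the study's code)."""
    name: str
    n_patients: int
    has_pet: bool  # True if a paired FDG-PET/CT is available alongside the pCT


# Patient counts are taken from the text; the labels are illustrative.
development = Cohort("institution 1, development", 148, has_pet=True)
internal_test = Cohort("institution 1, internal testing", 104, has_pet=True)
external_test = Cohort("institutions 2-4, external testing", 354, has_pet=False)

cohorts = [development, internal_test, external_test]

# Totals implied by the text.
total = sum(c.n_patients for c in cohorts)
institution_1 = development.n_patients + internal_test.n_patients
unseen = internal_test.n_patients + external_test.n_patients


def input_mode(cohort: Cohort) -> str:
    """Return the richest input available to the model for a cohort.

    Assumption: pCT is always present, and PET/CT is used only when available.
    """
    return "pCT+PET/CT" if cohort.has_pet else "pCT only"


for c in cohorts:
    print(f"{c.name}: {c.n_patients} patients, {input_mode(c)}")
print(f"total={total}, institution 1={institution_1}, unseen test={unseen}")
```

The three cohorts add up to the 606 patients reported, and the 252 patients of institution 1 split into 148 for development and 104 for internal testing.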