AUTHOR=Peng Yiru, Liu Yaoying, Chen Zhaocai, Zhang Gaolong, Ma Changsheng, Xu Shouping, Yin Yong

TITLE=Accuracy Improvement Method Based on Characteristic Database Classification for IMRT Dose Prediction in Cervical Cancer: Scientifically Training Data Selection

JOURNAL=Frontiers in Oncology

VOLUME=Volume 12 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.808580

DOI=10.3389/fonc.2022.808580

ISSN=2234-943X

ABSTRACT=Purpose: 
Consistent training and testing datasets can lead to good performance for deep learning (DL) models. However, a large, high-quality training dataset for unusual clinical scenarios is usually difficult to collect. This work aims to find optimal training data collection strategies for DL-based dose prediction models.
Material and methods: 
A total of 325 clinically approved cervical IMRT plans were used. We designed comparison experiments to investigate the impact of 1) beam angles, 2) the number of beams, and 3) patient position on DL dose prediction models. In addition, a novel geometry-based beam mask generation method was proposed to provide beam setting information during model training. We also proposed a new training strategy, the ‘full-database pre-trained strategy.’
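As an illustration of how a geometry-based beam mask could encode beam settings, the following is a minimal sketch and not the authors' implementation. It assumes a single 2D axial slice, a parallel-beam approximation, and the PTV centroid as a stand-in for the isocenter; the function name `beam_mask` and the half-pixel tolerance are hypothetical choices made for this example.

```python
import numpy as np

def beam_mask(ptv, angles_deg):
    """Union of parallel-beam corridors covering the PTV on one 2D slice.

    Illustrative assumption only: each beam is modeled as a straight
    corridor whose width equals the lateral extent of the PTV as seen
    from that angle.
    """
    rows, cols = np.indices(ptv.shape)
    # Center the coordinates on the PTV centroid (assumed isocenter).
    cy, cx = np.argwhere(ptv).mean(axis=0)
    y, x = rows - cy, cols - cx
    mask = np.zeros(ptv.shape, dtype=bool)
    for angle in np.deg2rad(angles_deg):
        # Coordinate of each pixel perpendicular to the beam's travel direction.
        lateral = x * np.cos(angle) + y * np.sin(angle)
        inside = lateral[ptv]
        # Keep pixels within the PTV's lateral extent (half-pixel tolerance).
        mask |= (lateral >= inside.min() - 0.5) & (lateral <= inside.max() + 0.5)
    return mask

# Usage: a 3x3 PTV in an 11x11 slice, with beams at 0 and 90 degrees.
ptv = np.zeros((11, 11), dtype=bool)
ptv[4:7, 4:7] = True
two_beam_mask = beam_mask(ptv, [0, 90])
```

A mask of this kind could be stacked with the anatomy channels as an extra model input, so that plans with different beam angles or beam counts produce different inputs.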
Results: 
The model trained with a homogeneous dataset with the same beam settings achieved the best performance (mean prediction errors of PTV, bladder and rectum: 0.29±0.15%, 3.1±2.55% and 3.15±1.69%) compared with the model trained with a large dataset of plans with mixed beam settings (mean errors of PTV, bladder and rectum: 0.8±0.14%, 5.03±2.2% and 4.45±1.4%). Without other processing approaches, an accurate dose prediction model is easier to train with a homogeneous dataset (mean errors of PTV, bladder and rectum: 2.2±0.15%, 5±2.1% and 3.23±1.53%) than with a non-homogeneous one (mean errors of PTV, bladder and rectum: 2.55±0.12%, 6.33±2.46% and 4.76±2.91%). The added beam mask consistently improved model performance, especially for datasets with different beam settings (mean errors of PTV, bladder and rectum improved from 0.8±0.14%, 5.03±2.2% and 4.45±1.4% to 0.29±0.15%, 3.1±2.55% and 3.15±1.69%).
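The abstract does not define how the per-structure errors above are computed. The sketch below shows one common way to obtain a "mean ± standard deviation" percentage of that form. It assumes the error is the absolute difference in mean dose within a structure, normalized to the prescription dose; the function names and this definition are assumptions, not the authors' stated metric.

```python
import numpy as np

def structure_error_percent(pred, truth, roi, prescription):
    """Absolute difference of mean dose inside `roi`, as % of prescription.

    Assumed definition for illustration; the source abstract does not
    specify the exact formula.
    """
    return abs(pred[roi].mean() - truth[roi].mean()) / prescription * 100.0

def summarize(errors):
    """Return (mean, population standard deviation) across patients."""
    errors = np.asarray(errors, dtype=float)
    return errors.mean(), errors.std()

# Usage with a toy dose grid: the prediction is 1 Gy too high inside the ROI.
truth = np.full((4, 4), 50.0)
pred = truth.copy()
pred[:2] += 1.0
roi = np.zeros((4, 4), dtype=bool)
roi[:2] = True
err = structure_error_percent(pred, truth, roi, prescription=50.0)
mean_err, std_err = summarize([1.0, 3.0])
```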
Conclusions: 
A consistent dataset is recommended for building a patient-specific IMRT dose prediction model. When a consistent dataset cannot be collected, a large dataset with different beam angles, combined with beam information during training, can also yield a relatively good model. The full-database pre-trained strategy can rapidly produce an accurate model from a pre-trained model. The proposed beam mask can effectively improve model performance. Our study may help future dose prediction studies establish their training databases.