AUTHOR=Feng Huichun, Wang Hui, Xu Lixia, Ren Yao, Ni Qianxi, Yang Zhen, Ma Shenglin, Deng Qinghua, Chen Xueqin, Xia Bing, Kuang Yu, Li Xiadong TITLE=Prediction of radiation-induced acute skin toxicity in breast cancer patients using data encapsulation screening and dose-gradient-based multi-region radiomics technique: A multicenter study JOURNAL=Frontiers in Oncology VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.1017435 DOI=10.3389/fonc.2022.1017435 ISSN=2234-943X ABSTRACT=Purpose

Radiation-induced dermatitis is one of the most common side effects in breast cancer patients treated with radiation therapy (RT). Acute complications can considerably affect tumor control and quality of life. In this study, we aimed to develop a novel, quantitative, high-accuracy machine learning tool that predicts grade ≥ 2 radiation-induced dermatitis (RD 2+) before RT. The tool applies data encapsulation screening and multi-region dose-gradient-based radiomics to pre-treatment planning computed tomography (CT) images, together with the clinical and dosimetric information of breast cancer patients.

Methods and Materials

A total of 214 patients with breast cancer who underwent RT between 2018 and 2021 were retrospectively enrolled from 3 cancer centers in China. CT images and clinical and dosimetric information were retrieved from the medical records. Six regions of interest (ROIs) were contoured for radiomics feature extraction: 3 planning target volume (PTV) dose-related ROIs, defined as the irradiated volumes covered by 100%, 105%, and 108% of the prescribed dose, and 3 skin dose-related ROIs, defined as the irradiated volumes within the skin covered by the 20-Gy, 30-Gy, and 40-Gy isodose lines. A total of 4280 radiomics features were extracted from the 6 ROIs, and 29 clinical and dosimetric characteristics were also included in the analysis. A data encapsulation screening algorithm was applied for data cleaning. Multivariable logistic regression and a gradient boosting decision tree (GBDT) with 5-fold cross-validation were used for model training and validation, and performance was evaluated using receiver operating characteristic (ROC) analysis.
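
As a rough illustration of the six dose-defined ROIs, the sketch below thresholds a toy 2D dose slice. It is not the authors' contouring workflow: the 50-Gy prescription, the toy dose grid, the skin mask, and all function names are hypothetical, and the PTV dose-related ROIs are assumed here to be plain isodose thresholds on the dose grid.

```python
from typing import Dict, List

Grid = List[List[float]]  # toy 2D dose slice, values in Gy
Mask = List[List[bool]]   # True where a voxel belongs to the region


def isodose_mask(dose: Grid, threshold_gy: float) -> Mask:
    """Mark voxels receiving at least threshold_gy."""
    return [[d >= threshold_gy for d in row] for row in dose]


def intersect(a: Mask, b: Mask) -> Mask:
    """Voxel-wise logical AND of two masks of the same shape."""
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dose_gradient_rois(dose: Grid, skin: Mask, prescription_gy: float) -> Dict[str, Mask]:
    """Build 3 prescription-relative ROIs and 3 skin isodose ROIs (assumed definitions)."""
    rois: Dict[str, Mask] = {}
    for pct in (100, 105, 108):
        rois[f"PTV_{pct}pct"] = isodose_mask(dose, prescription_gy * pct / 100)
    for gy in (20, 30, 40):
        rois[f"skin_{gy}Gy"] = intersect(isodose_mask(dose, gy), skin)
    return rois


def voxel_count(mask: Mask) -> int:
    return sum(sum(row) for row in mask)


# Hypothetical inputs: a 3x4 dose slice and a skin mask covering the outer rim.
dose = [
    [10.0, 25.0, 35.0, 45.0],
    [22.0, 50.0, 52.5, 54.0],
    [5.0, 51.0, 53.0, 30.0],
]
skin = [
    [True, True, True, True],
    [True, False, False, True],
    [True, False, False, True],
]
rois = dose_gradient_rois(dose, skin, prescription_gy=50.0)
counts = {name: voxel_count(mask) for name, mask in rois.items()}
print(counts)
```

The ROIs nest by construction: each higher threshold selects a subset of the voxels selected by the lower one, which is what makes the set "dose-gradient-based". In practice the masks would be 3D and radiomics features would be computed from the CT voxels inside each mask.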

Results

The best predictor of symptomatic RD 2+ was a combination of 20 radiomics features and 8 clinical and dosimetric variables. In the GBDT model with 5-fold cross-validation, this combination achieved an area under the curve (AUC) of 0.998 [95% CI: 0.996-1.0] in the training dataset and 0.911 [95% CI: 0.838-0.983] in the validation dataset. The 12 most important characteristics for RD 2+ prediction in the GBDT model were also identified, together with their corresponding importance measures.
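
For readers unfamiliar with how an AUC and its confidence interval are obtained, the following is a minimal sketch using the pairwise-ranking definition of AUC and a percentile bootstrap. The abstract does not state how the 95% CIs were computed, so the bootstrap is an assumption, and the labels and scores are made-up toy values, not study data.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = RD 2+, 0 = no RD 2+; scores are hypothetical model outputs.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.65, 0.9]
auc = roc_auc(labels, scores)  # 13 of 16 positive-negative pairs ranked correctly
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.4f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

With only a handful of cases the interval is very wide, which illustrates why the validation-set interval reported above is broader than the training-set one.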

Conclusions

A novel multi-region dose-gradient-based GBDT machine learning framework that integrates a random-forest-based data encapsulation screening method can achieve high-accuracy prediction of acute RD 2+ in breast cancer patients.