AUTHOR=Teng Xinzhi, Zhang Jiang, Ma Zongrui, Zhang Yuanpeng, Lam Saikit, Li Wen, Xiao Haonan, Li Tian, Li Bing, Zhou Ta, Ren Ge, Lee Francis Kar-ho, Au Kwok-hung, Lee Victor Ho-fun, Chang Amy Tien Yee, Cai Jing

TITLE=Improving radiomic model reliability using robust features from perturbations for head-and-neck carcinoma

JOURNAL=Frontiers in Oncology

VOLUME=Volume 12 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.974467

DOI=10.3389/fonc.2022.974467

ISSN=2234-943X

ABSTRACT=Abstract:
Background: Using highly robust radiomic features in modeling is recommended, yet their impact on radiomic models is unclear. This study evaluated the robustness and generalizability of radiomic models after low-robust features were screened out before modeling. The results were validated on four datasets and two clinically relevant tasks.
Materials and methods: Computed tomography images, gross tumor volume segmentations, and clinically relevant outcomes (distant metastasis and local-regional recurrence) of 1,419 head-and-neck cancer patients were collected from four publicly available datasets. A perturbation method was used to simulate images, and radiomic feature robustness was quantified using the intraclass correlation coefficient (ICC). Three radiomic models were built using all features (ICC > 0), good-robust features (ICC > 0.75), and excellent-robust features (ICC > 0.95), respectively. Filter-based feature selection and Ridge classification were used to construct the radiomic models. Model performance was assessed in terms of both robustness and generalizability. Model robustness was evaluated by the ICC, and model generalizability was quantified by the train-test difference in the Area Under the Receiver Operating Characteristic Curve (AUC).
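The robustness screening described above can be illustrated with a minimal sketch. The abstract does not state which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, and the function names (`icc_oneway`, `select_robust_features`) and the toy feature values are hypothetical, not taken from the study.

```python
from statistics import mean


def icc_oneway(rows):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the source).

    rows: one list per patient, holding the same feature measured on
    each perturbed copy of that patient's image.
    """
    n = len(rows)          # number of patients
    k = len(rows[0])       # number of perturbations per patient
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2 for row, m in zip(rows, row_means) for v in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def select_robust_features(features, threshold):
    """Return names of features whose ICC exceeds the threshold."""
    return [name for name, rows in features.items() if icc_oneway(rows) > threshold]


# Toy values (hypothetical): 4 patients x 2 perturbations per feature.
features = {
    "stable":   [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.1]],
    "moderate": [[1.0, 1.6], [2.0, 2.4], [3.0, 2.5], [4.0, 4.5]],
    "unstable": [[1.0, 3.0], [2.0, 0.5], [3.0, 1.0], [4.0, 2.5]],
}

good = select_robust_features(features, 0.75)       # "good-robust" threshold
excellent = select_robust_features(features, 0.95)  # "excellent-robust" threshold
print(good, excellent)
```

In this toy data the stricter threshold keeps fewer features, which is the trade-off between robustness and retained information that the three models in the study are meant to compare.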
Results: The average model robustness ICC improved significantly from 0.65 to 0.78 (P < 0.0001) using good-robust features and to 0.91 (P < 0.0001) using excellent-robust features. Model generalizability also increased substantially: the mean train-test AUC difference was reduced from 0.21 to 0.18 (P < 0.001) with good-robust features and to 0.12 (P < 0.0001) with excellent-robust features. Furthermore, good-robust features yielded the best average AUC on the unseen datasets, 0.58 (P < 0.001), across the four datasets and two clinical outcomes.
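The generalizability measure, the gap between training and testing AUC, can be sketched as follows. This is an illustration only: the AUC is computed with the standard pairwise-ranking definition, and the helper names (`auc`, `train_test_gap`), labels, and scores are hypothetical toy values rather than the study's data or code.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def train_test_gap(train, test):
    """Training AUC minus testing AUC; a smaller gap suggests better generalizability."""
    return auc(*train) - auc(*test)


# Toy (labels, scores) pairs, hypothetical values.
train = ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
test = ([0, 1, 0, 1], [0.2, 0.3, 0.6, 0.5])

print(auc(*train), auc(*test), train_test_gap(train, test))
```

A model that fits the training set much better than the unseen set produces a large gap, so the reduction reported above indicates less overfitting when low-robust features are removed.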
Conclusions: Including only robust features in radiomic modeling significantly improves model robustness and generalizability in unseen datasets. However, the robustness of a radiomic model still has to be verified even when it is built with robust radiomic features, and an overly strict feature-robustness threshold may prevent optimal model performance in unseen datasets, as it may lower the discriminative power of the model.