This study aimed to establish an online model for predicting cervical lymph node metastasis (CLNM) in children and adolescents with differentiated thyroid cancer (caDTC), and to analyze the association of socioeconomic disparities and regional environment with CLNM.
We retrospectively analyzed clinicopathological and sociodemographic data of caDTC recorded in the Surveillance, Epidemiology, and End Results (SEER) database between 2000 and 2019. Risk factors for CLNM in caDTC were identified using univariate and multivariate logistic regression (LR). We then used the extreme gradient boosting (XGBoost) algorithm and other commonly used machine learning (ML) algorithms to build CLNM prediction models. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), and model predictions were interpreted and visualized using SHapley Additive exPlanations (SHAP).
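As an illustration of the performance metric, the AUROC can be read as the probability that a randomly chosen CLNM-positive case receives a higher predicted risk than a randomly chosen CLNM-negative case. The following is a minimal sketch in plain Python, not the code used in this study; the function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for a positive case, 0 for a negative case.
    scores: predicted risk; higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks; tied scores share their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U statistic of the positives, normalised by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (hypothetical data, not SEER records).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

In the toy example, three of the four positive/negative pairs are ordered correctly, which gives 0.75. A value of 0.5 corresponds to chance-level discrimination.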
In addition to common risk factors, our study found that median household income and region of residence were strongly associated with CLNM. Among the ML models built on these variables, the XGBoost model had the best predictive performance in both the training set and the validation set. After 10-fold cross-validation, the XGBoost model achieved an AUROC of 0.766 (95% CI: 0.745-0.786) in the training set, 0.736 (95% CI: 0.670-0.802) in the validation set, and 0.733 (95% CI: 0.683-0.783) in the test set. Based on this XGBoost model combined with the SHAP method, we constructed a web-based predictive system.
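The 10-fold cross-validation step can be sketched as follows. This is an illustrative sketch only: the abstract does not state whether the folds were stratified by CLNM status, so stratification is an assumption here, and the function names `stratified_folds` and `cv_splits` are hypothetical.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_splits(labels, k=10, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        val = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, val


# Toy cohort (hypothetical): 30 CLNM-positive and 70 CLNM-negative cases.
toy_labels = [1] * 30 + [0] * 70
for train_idx, val_idx in cv_splits(toy_labels, k=10):
    n_pos = sum(toy_labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), n_pos)
```

Each sample is held out exactly once. In practice a model would be fitted on every training split and scored on the matching held-out fold, and the fold-wise scores would then be summarised.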
The online prediction model based on the XGBoost algorithm can dynamically estimate the probability of CLNM in caDTC and can thereby support personalized treatment advice for patients.