AUTHOR=Ding Li, Zhang Chi, Wang Kun, Zhang Yang, Wu Chuang, Xia Wentao, Li Shuaishuai, Li Wang, Wang Junqi TITLE=A machine learning-based model for predicting the risk of early-stage inguinal lymph node metastases in patients with squamous cell carcinoma of the penis JOURNAL=Frontiers in Surgery VOLUME=10 YEAR=2023 URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2023.1095545 DOI=10.3389/fsurg.2023.1095545 ISSN=2296-875X ABSTRACT=Objective

Inguinal lymph node metastasis (ILNM) is significantly associated with poor prognosis in patients with squamous cell carcinoma of the penis (SCCP). Prognosis could improve if the probability of ILNM were predicted accurately at an early stage. To this end, we developed machine learning-based predictive models using a large population-based dataset.

Methods

Data on patients diagnosed with SCCP were obtained from the Surveillance, Epidemiology, and End Results (SEER) Program research data. Using variables that represented the patients' clinical characteristics, we built predictive models with five machine learning algorithms: logistic regression, eXtreme Gradient Boosting (XGB), Random Forest, Support Vector Machine, and k-Nearest Neighbor. Model performance was evaluated by ten-fold cross-validation, using receiver operating characteristic curves to calculate the area under the curve (AUC) of each model as a measure of predictive accuracy. Decision curve analysis was conducted to estimate the clinical utility of the models. An external validation cohort of 74 SCCP patients was drawn from the Affiliated Hospital of Xuzhou Medical University (February 2008 to March 2021).
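
The evaluation steps named above (ten-fold cross-validation, area under the ROC curve, and decision curve analysis) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `roc_auc`, `stratified_folds`, and `net_benefit` are hypothetical, the labels and scores are toy values, and the fold assignment assumes a stratified split, which the abstract does not specify.

```python
import random

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive case outranks a random
    negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y_true, k=10, seed=0):
    """Assign every sample index to one of k folds, dealing each class
    out round-robin so the class balance is similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def net_benefit(y_true, probs, threshold):
    """Decision-curve net benefit of treating every patient whose
    predicted risk is at or above the threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Toy example (illustrative values only, not study data).
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, risks))           # 0.75
print(net_benefit(labels, risks, 0.5))  # 0.25
```

In a full workflow, each fold from `stratified_folds` would serve once as the held-out set while a model is fitted on the remaining nine, and the held-out predictions would feed `roc_auc`. Plotting `net_benefit` over a range of thresholds, alongside the treat-all and treat-none strategies, gives the decision curve.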

Results

A total of 1,056 patients with SCCP from the SEER database were enrolled as the training cohort, of whom 164 (15.5%) developed early-stage ILNM. In the external validation cohort, 16.2% of patients developed early-stage ILNM. Multivariate logistic regression showed that tumor grade, inguinal lymph node dissection, radiotherapy, and chemotherapy were independent predictors of early-stage ILNM risk. The model based on the eXtreme Gradient Boosting algorithm showed stable and efficient prediction performance in both the training and external validation cohorts.

Conclusion

The machine learning model based on the eXtreme Gradient Boosting (XGB) algorithm showed high predictive performance and may be used to predict early-stage ILNM risk in SCCP patients. It may therefore prove useful in clinical decision-making.