The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Oncol.
Sec. Gastrointestinal Cancers: Colorectal Cancer
Volume 15 - 2025
doi: 10.3389/fonc.2025.1517846
Development and Validation of a Machine Learning Model for Predicting Lymph Node Metastasis in Rectal Cancer Using Clinicoradiological Data
Provisionally accepted
- 1 Department of General Surgery, First Affiliated Hospital of Anhui Medical University, Hefei, China
- 2 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui Province, China
- 3 Anhui Medical University, Hefei, Anhui Province, China
- 4 Department of Radiology, First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, China
- 5 School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui Province, China
Background Rectal cancer (RC) is a common malignant tumor, and lymph node metastasis (LNM) is a critical determinant of patient prognosis. Traditional diagnostic methods have limitations, which motivates the development of predictive models based on clinical data. This study aimed to construct and validate machine learning (ML) models that predict LNM risk in RC patients from clinical data.
Methods Retrospective clinical data from 2,454 RC patients in the SEER database were used for training and internal validation. Data from an additional 500 RC patients at the First Affiliated Hospital of Anhui Medical University formed the external validation set. Lymph nodes were identified on CT scans, and these findings were combined with clinicopathological data. Variables were selected using LASSO, followed by univariate and multivariate logistic regression. Eleven ML models were developed: LR, NBC, SVM, KNN, RF, ET, XGB, GBM, LightGBM, AdaBoost, and MLP. Model performance was assessed using AUC, correlation heatmaps, permutation analysis, and decision curve analysis.
Results The training cohort included 1,954 patients, and the internal and external validation cohorts each comprised 500 patients. LNM was present in 526 (26.9%), 135 (27%), and 405 (81%) patients, respectively. Independent predictors of LNM were tumor pathological grade, clinical T stage, clinical N stage, tumor length, nerve invasion, and total number of lymph nodes. In the internal validation cohort, AUC values ranged from 0.859 to 0.964, with Random Forest and ET achieving the highest AUC of 0.964, followed by LightGBM (0.943) and XGBoost (0.942). In the external validation cohort, AUC values ranged from 0.735 to 0.838, with Gradient Boosting achieving the highest AUC of 0.879, followed by XGBoost (0.832) and LightGBM (0.831). Decision curve analysis showed that the XGBoost model was optimal in terms of net benefit and the range of threshold probabilities over which it applied.
Conclusions This study developed and validated 11 ML models to predict LNM risk in RC. The XGBoost model was optimal; overall, 10 models achieved AUC > 0.9 in internal validation and 7 achieved AUC > 0.8 in external validation. The identified predictors of LNM can facilitate early diagnosis and personalized treatment, highlighting the potential of integrating CT scan data with clinicopathological findings to build effective predictive models.
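For readers unfamiliar with the two headline metrics in the abstract, the following is a minimal sketch of how AUC and decision-curve net benefit are commonly computed. It is not the authors' code: the function names, the toy labels, and the toy predicted probabilities are all hypothetical, and the net benefit formula used is the standard one (true positives per patient minus false positives per patient, weighted by the odds of the threshold probability).

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a random
    negative case (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy on a decision curve: treat every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = LNM present, 0 = LNM absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

if __name__ == "__main__":
    print("AUC:", auc(y_true, y_prob))
    print("Net benefit (model, pt=0.5):", net_benefit(y_true, y_prob, 0.5))
    print("Net benefit (treat all, pt=0.5):", net_benefit_treat_all(y_true, 0.5))
```

A decision curve plots `net_benefit` across a range of threshold probabilities; a model is considered useful over the thresholds where its curve lies above both the treat-all reference and zero (treat none).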
Keywords: rectal cancer, lymph node metastasis, machine learning model, prediction models, clinical data, prognosis, XGBoost
Received: 27 Oct 2024; Accepted: 24 Jan 2025.
Copyright: © 2025 Hou, Wang, Bian, Wang, Wan and Zou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Bingbing Zou, Department of General Surgery, First Affiliated Hospital of Anhui Medical University, Hefei, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.