AUTHOR=Cheng Xiaoyun, Li Jinzhang, Xu Tianming, Li Kemin, Li Jingnan TITLE=Predicting Survival of Patients With Rectal Neuroendocrine Tumors Using Machine Learning: A SEER-Based Population Study JOURNAL=Frontiers in Surgery VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/surgery/articles/10.3389/fsurg.2021.745220 DOI=10.3389/fsurg.2021.745220 ISSN=2296-875X ABSTRACT=

Background: The number of patients diagnosed with rectal neuroendocrine tumors (R-NETs) is increasing year by year, and an integrated predictive model is needed to estimate their prognosis. This study aimed to explore the epidemiological characteristics of R-NETs through a retrospective analysis of the Surveillance, Epidemiology, and End Results (SEER) database and to predict the survival of patients with R-NETs using machine learning.

Methods: Data on patients with R-NETs were extracted from the SEER database (2000–2017) and were also collected retrospectively from a single medical center in China. The main outcome measure was 5-year survival status. Risk factors affecting survival were analyzed by Cox regression, and six common machine learning algorithms were used to build predictive models. The SEER data were divided into a training set and an internal validation set, using the year 2010 as the cutoff. The data from China served as an external validation set. The best-performing machine learning model was compared with the American Joint Committee on Cancer (AJCC) seventh edition staging system to evaluate its predictive performance in the internal and external validation sets.
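
The temporal split described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' code: the field name `year_of_diagnosis`, the helper `temporal_split`, and the toy records are hypothetical, and the abstract does not state which side of 2010 formed the training set, so the direction used here (earlier years for training) is an assumption.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def temporal_split(
    records: List[Record],
    cutoff_year: int = 2010,
    year_key: str = "year_of_diagnosis",  # hypothetical field name
) -> Tuple[List[Record], List[Record]]:
    """Split records into an earlier and a later cohort around a cutoff year.

    Assumption: records dated before the cutoff form the training set, and
    records from the cutoff year onward form the internal validation set.
    """
    train = [r for r in records if r[year_key] < cutoff_year]
    valid = [r for r in records if r[year_key] >= cutoff_year]
    return train, valid


# Toy records, invented purely for illustration.
toy_records = [
    {"id": 1, "year_of_diagnosis": 2003},
    {"id": 2, "year_of_diagnosis": 2009},
    {"id": 3, "year_of_diagnosis": 2010},
    {"id": 4, "year_of_diagnosis": 2015},
]

train_set, valid_set = temporal_split(toy_records)
print([r["id"] for r in train_set], [r["id"] for r in valid_set])
```

Splitting by calendar year rather than at random keeps later patients out of training, so the internal validation set behaves more like future, unseen cases.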

Results: A total of 10,580 patients from the SEER database and 68 patients from a single medical center were included in the analysis. Age, gender, race, histologic type, tumor size, tumor number, summary stage, and surgical treatment were risk factors affecting survival status. After parameter tuning and comparison of the algorithms, the model built with the eXtreme Gradient Boosting (XGBoost) algorithm had the best predictive performance in the training set [area under the curve (AUC) = 0.87, 95% CI: 0.86–0.88]. In the internal validation, XGBoost outperformed the AJCC seventh edition staging system (AUC: 0.90 vs. 0.78). In the external validation, the XGBoost model (AUC = 0.89) also performed better than the AJCC seventh edition staging system (AUC = 0.83).
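
For readers unfamiliar with the AUC values reported above, the sketch below shows one standard way to compute AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is an illustration only and does not reproduce the study's figures. The function `roc_auc`, the toy labels and scores, and the coding of the outcome (1 = died within 5 years) are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 on a tie.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (assumed coding: 1 = died within 5 years).
labels = [1, 1, 0, 0, 1, 0]
model_scores = [0.9, 0.8, 0.3, 0.75, 0.7, 0.2]  # hypothetical model risk scores
stage_scores = [3, 2, 2, 1, 1, 1]  # hypothetical ordinal stage used as a score

print(round(roc_auc(labels, model_scores), 3))
print(round(roc_auc(labels, stage_scores), 3))
```

Because AUC depends only on how cases are ranked, an ordinal staging system can be scored with the same function as a continuous model output, which is what makes a direct comparison between the two possible.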

Conclusions: The XGBoost model had better predictive power than the AJCC seventh edition staging system and has potential value for clinical application.