- 1 Department of General Surgery, Ningbo No.9 Hospital, Ningbo, Zhejiang, China
- 2 Department of Thoracic Surgery, Ningbo No.9 Hospital, Ningbo, Zhejiang, China
Objective: To develop and validate a deep learning predictive model with better performance in survival estimation of esophageal adenocarcinoma (EAC).
Method: Cases diagnosed between January 2010 and December 2018 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. A deep learning survival neural network was developed and validated based on 17 variables, including demographic information, clinicopathological characteristics, and treatment details. Based on the total risk score derived from this algorithm, a novel risk classification system was constructed and compared with the 8th edition of the tumor, node, and metastasis (TNM) staging system.
Results: Of 7,764 EAC patients eligible for the study, 6,818 (87.8%) were men and the median (interquartile range, IQR) age was 65 (58–72) years. The deep learning model generated significantly superior predictions to the 8th edition staging system on the test data set (C-index: 0.773 [95% CI, 0.757–0.789] vs. 0.683 [95% CI, 0.667–0.699]; P < 0.001). Calibration curves revealed that the deep learning model was well calibrated for 1- and 3-year overall survival (OS), with most points lying close to the 45° line. Decision curve analyses (DCAs) showed that the novel risk classification system yielded a greater positive net benefit than the TNM staging system. A user-friendly and precise web-based calculator, together with a portable executable file, was implemented to visualize the deep learning predictive model.
Conclusion: A deep learning predictive model was developed and validated, which showed better calibration and discrimination in survival prediction of EAC than the 8th edition of the TNM staging system. The novel risk classification system based on the deep learning algorithm may serve as a useful tool in clinical decision making given its ease of use and better clinical applicability.
Introduction
Over the past few decades, the incidence of esophageal adenocarcinoma (EAC) has increased substantially in many Western populations, with most patients diagnosed at advanced stages (1–3). Despite recent advances in multimodality treatment, the prognosis of EAC remains poor, with a dismal overall 5-year survival rate of around 20% (4, 5). Precise risk stratification according to the survival outcomes of patients with EAC is a crucial determinant of treatment (6). The eighth edition of the American Joint Committee on Cancer (AJCC) staging scheme still classifies patients with EAC based on tumor, node, and metastasis (TNM) frameworks, which may offer only limited individualized survival estimates (7, 8).
It has been widely acknowledged that a variety of potential prognostic factors unaccounted for by the current TNM stage groupings (such as age, gender, tumor differentiation, and treatment choices) could significantly contribute to individualized predictions of survival (9–11). As such, studies using different methods to improve the accuracy of prognostication have been conducted. However, the majority of these studies generated prognostic tools based on the Cox proportional hazards (CPH) model, which can hardly handle potentially non-linear relationships in survival analyses. As a result, the discriminative ability of these tools may be only modest (12–14).
With the recent rapid progress in artificial intelligence (AI), deep learning has emerged as a promising solution to this problem (15). As a state-of-the-art algorithm, deep learning allows a prognostic network to automatically discover potentially non-linear relationships through the use of multiple neural layers (16). In application, these networks, especially when combined with a large-scale cohort, have shown great potential in many prognostic studies, such as those on lung cancer and breast cancer (17, 18). However, to date, studies taking advantage of deep learning algorithms are lacking in the prognosis of EAC.
Using a large population-based cancer database, the present study was designed to develop a deep learning survival neural network with better predictive performance for patients with EAC. With this neural network, we also attempted to construct a novel and more precise risk classification system based on the 8th edition of the TNM frameworks.
Methods
Patient selection and data preparation
The current study used data from a prospectively maintained, nationwide cancer database, the Surveillance, Epidemiology, and End Results (SEER) database. All patients with primary EAC pathologically diagnosed between January 2010 and December 2018 were initially considered eligible for our study. We collected the demographic information of patients (age, gender, race, and marital status), clinicopathological characteristics (tumor [T] stage, nodal [N] stage, metastasis [M] stage, metastatic site [bone, brain, liver, lung], histologic grade, tumor location and size), and treatment choices (surgery, chemotherapy, and radiation), for a total of 17 potentially influencing variables. We excluded cases identified by autopsy or death certificate only, cases with follow-up of less than 1 month, and cases lacking any of the features listed above. Finally, a total of 7,764 cases were selected for further analyses and randomly divided into the training and test cohorts at a ratio of 8:2 (Figure 1A). This study was deemed exempt from institutional review board (IRB) approval, since no identifiable patient information is available in this database. All methods were carried out in accordance with relevant guidelines and regulations.
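The random 8:2 division described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the seed value, and the use of Python's standard `random` module (rather than Scikit-learn) are assumptions made only for illustration.

```python
import random

def split_indices(n_cases, train_fraction=0.8, seed=0):
    """Shuffle case indices and split them into training and test sets.

    The seed is an arbitrary illustrative choice, not a value from the study.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_train = int(n_cases * train_fraction)   # floor of the training share
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(7764)
print(len(train_idx), len(test_idx))  # 6211 1553
```

With 7,764 cases, an 80% training share rounds down to 6,211, leaving 1,553 cases for testing, which agrees with the cohort sizes reported in the Results.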
Figure 1 Analytical framework for survival prediction. (A) Flowchart showing derivation of the training and test cohorts. (B) A detailed pipeline to develop, validate, and test the deep learning model.
In this study, marital status was reclassified as single, married, divorced/separated, and widowed. Since only 38 cases were unmarried or domestic partners, we reassigned them as single. The TNM stage was also reclassified to generate a uniform dataset according to the eighth edition of the AJCC staging system. Thus, for patients diagnosed before 2018, we manually translated 7th-edition stages into their corresponding 8th-edition stages. Then, all T stages were regrouped into T1, T2, T3, T4, and TX. For instance, T1 comprised T1a and T1b. Similarly, all N stages were regrouped into N0, N1, N2, N3, and NX.
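The regrouping of T and N substages can be sketched as follows. This is a hypothetical helper, not the authors' procedure: the function name `collapse_stage`, the accepted label format, and the choice to map unrecognized labels to the unknown category (TX or NX) are all assumptions. The manual translation from 7th-edition to 8th-edition stages is not reproduced here.

```python
import re

# Main categories allowed per stage type, taken from the lists in the text.
ALLOWED = {"T": "1234X", "N": "0123X"}

def collapse_stage(raw, prefix):
    """Collapse a detailed stage label (e.g. 'T1a') to its main category ('T1').

    Assumption: labels that do not match the expected pattern are treated
    as unknown and returned as '<prefix>X'.
    """
    label = raw.strip().upper()
    match = re.fullmatch(rf"{prefix}([{ALLOWED[prefix]}])[A-C]?", label)
    if match is None:
        return f"{prefix}X"
    return f"{prefix}{match.group(1)}"

print(collapse_stage("T1a", "T"))  # T1
print(collapse_stage("T1b", "T"))  # T1
print(collapse_stage("N2", "N"))   # N2
```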
Deep learning survival neural network
In this study, we developed the deep learning survival neural network with a method referred to as DeepSurv, which was designed by Katzman et al. (Figure 1B) (19). In brief, the deep learning model contained a fully connected feed-forward neural network with a single output node that calculates the survival risk of each patient, trained with the negative log-partial likelihood as the loss function. Firstly, all numerical covariates were standardized and categorical features were transformed into dummy variables. In the present study, all 17 variables described above were included. A detailed dataset description is presented in Table 1. Next, the grid search method was adopted to select the optimal hyperparameters of DeepSurv. To minimize model overfitting, the optimal hyperparameters were determined according to the lowest validation loss from fivefold cross validation. Then, based on the optimal hyperparameters, we developed a deep learning survival neural network comprising three hidden layers, each of which has 40 neurons. The selected optimal hyperparameters were as follows: the dropout rate was 0.3, the learning rate was 0.002, the batch size was 200, and the optimizer was Adam. Lastly, to confirm the robustness of our neural network, we also developed the model with different random seeds. As shown in Supplementary Figure S1, the discriminative ability of the network was relatively robust. The current study was in line with the TRIPOD guideline (Supplementary Material: Tripod-Checklist-Prediction-Model-Development).
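To make the loss function concrete, the sketch below computes an average negative log-partial likelihood for a small batch in plain Python. It is an illustration only and not the authors' PyTorch implementation: the function name `neg_log_partial_likelihood` is hypothetical, the risk set is taken as all patients whose follow-up time is at least that of the patient who died, and any regularization term is omitted.

```python
import math

def neg_log_partial_likelihood(risk, time, event):
    """Average negative log-partial likelihood over observed events.

    risk  : predicted log-risk scores (the network's single output)
    time  : follow-up times
    event : 1 if death was observed, 0 if censored
    """
    n_events = sum(event)
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    total = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time[i].
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_sum
    return -total / n_events

# Higher predicted risk for earlier deaths gives a lower loss.
good = neg_log_partial_likelihood([2.0, 1.0, 0.0], [1, 2, 3], [1, 1, 1])
bad = neg_log_partial_likelihood([0.0, 1.0, 2.0], [1, 2, 3], [1, 1, 1])
print(good < bad)  # True
```

Minimizing this quantity pushes the network to assign higher scores to patients who die earlier, which is the property that the C-index later measures.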
Statistical analysis
The primary end point was overall survival (OS), which was calculated from the date of diagnosis to the date of death from any cause or the last follow-up. Continuous variables are presented as medians with interquartile range (IQR), whereas categorical variables are presented as frequencies with percentages. The training and test cohorts were compared with the Mann–Whitney test for continuous variables or the chi-square test for categorical variables. In addition to the neural network, a CPH model was also constructed by applying a backward selection approach based on the Akaike information criterion (AIC). Harrell’s concordance index (C-index) and calibration curves were used to evaluate predictive performance, which was compared between the proposed models and the 8th edition of the AJCC staging system.
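For readers unfamiliar with Harrell's C-index, the following minimal sketch shows how it can be computed from follow-up times, event indicators, and predicted risk scores. It is a simplified illustration, not the routine used in the study: the function name `harrell_c_index` is hypothetical, pairs with tied follow-up times are skipped, and ties in predicted risk count as half-concordant.

```python
def harrell_c_index(time, event, risk):
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed death and
    patient j was followed for longer than i.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk died first
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # 1.0
print(harrell_c_index(times, events, [1, 1, 1, 1]))  # 0.5
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the C-indices reported in the Results.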
Then, we further derived the total risk score from the neural network. According to the quartiles of the total risk score, patients with EAC were divided into four groups to construct a novel risk classification system. Survival curves were plotted with the Kaplan–Meier method and compared using the log-rank test. Decision curve analyses (DCAs) were performed to evaluate clinical utility, which was compared between the novel risk classification system and the 8th edition of the AJCC staging system. The pandas and Scikit-learn packages were used for data processing, and the deep learning survival neural network was developed on the PyTorch framework. Other statistical analyses were performed with R software (version 4.0.2, R Foundation, Vienna, Austria); a 2-sided P-value < 0.05 was considered statistically significant.
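The quartile-based grouping can be sketched as below. This is a hedged illustration rather than the authors' code: the helper names `quartile_cutoffs` and `assign_risk_group` are hypothetical, the cut-offs are assumed to be derived from one reference cohort and then applied to any patient, and scores equal to a cut-off are assigned to the higher group.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def quartile_cutoffs(reference_scores):
    """Return the three cut points that split the scores into quartiles."""
    return quantiles(reference_scores, n=4, method="inclusive")

def assign_risk_group(score, cutoffs):
    """Map a total risk score to a risk group from 1 (lowest) to 4 (highest)."""
    return bisect_right(cutoffs, score) + 1

# Toy scores 1..100 stand in for total risk scores from the network.
scores = list(range(1, 101))
cutoffs = quartile_cutoffs(scores)
groups = Counter(assign_risk_group(s, cutoffs) for s in scores)
print(sorted(groups.items()))  # [(1, 25), (2, 25), (3, 25), (4, 25)]
```

Fixing the cut-offs once and reusing them keeps the group definitions identical across cohorts, so that survival curves for the same group are comparable.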
Results
Patient demographics and characteristics
A total of 7,764 patients with EAC were eligible for the study (median [IQR] age, 65 [58–72] years; 6,818 [87.8%] men), of whom 6,211 were assigned to the training cohort and 1,553 to the test cohort (Figure 1). The training and test cohorts showed similar distributions in demographic and clinical characteristics (Table 1). More than 90% of EAC patients were white, with the majority diagnosed at advanced stages (5,222 [67.3%]). Median follow-up was 50 months (95% CI, 48–52 months). A total of 5,050 patients (65.0%) died during the follow-up period, of whom 4,418 (56.9% of all patients) died from EAC.
Calibration and validation of the deep learning model in the test cohort
We compared the discriminative ability of the deep learning model to that of the 8th edition of the TNM staging system in the test cohort (Table 2). The deep learning model generated significantly superior predictions to the 8th-edition staging system (C-index: 0.773 [95% CI, 0.757–0.789] vs. 0.683 [95% CI, 0.667–0.699]; P < 0.001). Similarly, the C-index of the deep learning model was also significantly superior to that of the CPH model (C-index for CPH was 0.748 [95% CI, 0.732–0.764]; P < 0.001). Calibration curves revealed that the deep learning model was the best calibrated, with close agreement between the predicted probabilities and the actual outcomes of all patients with EAC for 1- and 3-year OS and most points lying close to the 45° line (Supplementary Figure S2).
Table 2 Comparison of C-indices between the proposed models and the 8th edition of the AJCC staging system.
Model visualization and establishment of a novel risk classification system
We established an easy-to-use and precise web-based calculator (https://web-calculator.shinyapps.io/DynNomapp/) with a portable executable file (https://pan.baidu.com/s/1DUU-x6XbpYHfmf1Kf1dFNA?pwd=1234). The application automatically calculates the total risk score for each patient according to the input characteristics (Figure 2A). The web-based calculator then provides the predicted probability of OS based on the total risk score (Figure 2B).
Figure 2 Model visualization of the deep learning predictive model. (A) A portable executable file to calculate the total risk score according to the input characteristics. (B) A web-based calculator for overall survival (OS) estimation in patients with EAC.
Next, according to the quartiles of the total risk score, we assigned patients with EAC to four risk groups to construct a novel risk classification system. Compared with the 8th edition of the TNM staging system, the Kaplan–Meier curves showed that the novel risk classification system seemed to better distinguish patients with different risks in both the training and test cohorts (Figure 3). Further, DCAs also showed that the novel risk classification system yielded a greater positive net benefit than the 8th edition of the TNM staging system (Supplementary Figure S3).
Figure 3 Kaplan–Meier curves for overall survival (OS) according to the different staging systems. The novel risk classification system in the training cohort (A) and test cohort (C). The eighth edition of the TNM staging system in the training cohort (B) and test cohort (D).
Discussion
Overall, our pilot study was designed to construct a deep learning neural network for survival prediction in newly diagnosed patients with EAC. This large-scale study demonstrated that the deep learning model had significantly better predictive performance than the traditional TNM staging system. Moreover, we succeeded in constructing a novel and more precise risk classification system based on the 8th edition of the TNM frameworks, which may help clinicians in clinical decision making.
During the past few years, EAC has experienced a dramatic increase in incidence and has surpassed esophageal squamous cell carcinoma (ESCC) in many Western countries, including the United States (20, 21). It is estimated that the incidence of EAC will continue to increase up to 2030, which will certainly impose economic burdens (22). As such, evidence-based recommendations on the optimal treatment strategy are crucial for reducing such burdens. The eighth edition of the TNM staging scheme is the most well-validated prognostic indicator for EAC. However, survival varies widely among patients within the same stage group (12). Other potential prognostic factors that are not included in the TNM stage groupings could affect the outcome, from patient-specific factors such as age and gender to tumor-related information such as grade and tumor location (23). Incorporating these various characteristics of each EAC patient to provide an accurate prediction of prognosis is challenging in the absence of easy-to-use and comprehensive predictive models.
Previous research has reported a variety of predictive tools based on the linear CPH model for survival prediction in patients with EAC (12–14). By integrating other potentially independent factors with the TNM stage, these linear models to some extent derive more precise risk stratification for patients with EAC. However, the CPH model assumes that a patient’s log-risk of death is a linear combination of the observed covariates, so it cannot handle non-linear relationships and fails to include some potentially collinear but important prognostic factors (17, 24).
AI has become increasingly popular in various disciplines for its ability to mimic the cognitive behavior of the human mind (22). In gastroenterology, AI-based technologies, with deep learning as the state-of-the-art algorithm, have already been applied in many areas, such as diagnosing dysplasia in Barrett’s esophagus (BE), identifying Helicobacter pylori in the upper gastrointestinal (UGI) tract, and other diagnostic applications (25–27). Nevertheless, to our knowledge, few studies have focused on its performance in predicting survival for EAC patients. The present study is the first to develop a more accurate predictive tool for EAC patients using a novel deep learning model, DeepSurv. In this study, a deep learning survival neural network integrating the demographic information of patients, characteristics of tumor extent, and type of treatment was established and validated, with calibration curves revealing excellent agreement between the predicted and actual OS probability. Moreover, the results showed that this neural network significantly outperformed the TNM staging alone, as well as the CPH model, which provides additional evidence of the superior predictive accuracy of deep learning models over conventional approaches.
In addition, according to the quartiles of the total risk score derived from the deep learning model, we assigned EAC patients to four risk groups to construct a novel risk classification system. The DCAs showed that the novel risk classification system yielded a greater positive net benefit than the TNM staging system. Given that an easy-to-use and precise web-based calculator was implemented, we believe this novel risk classification system could gain wide acceptance and play an important role in clinical decision making.
Despite these strengths, it is important to recognize several limitations of the present study. Firstly, the SEER database does not provide information on lifestyle habits, overall comorbidity, peritoneal metastases, or laboratory biomarkers, which might further improve predictive accuracy (28). Secondly, we acknowledge that it is hard to interpret how the deep learning model works, because its prediction process is much like a black box. Thirdly, although a more precise risk classification system was established using a large population-based US database, credible validation in a non-US population is still warranted.
Conclusion
The present study is the first, to our knowledge, to construct and validate a deep learning model for survival prediction of EAC, which showed better calibration and discrimination than the 8th edition of the TNM staging system. The novel risk classification system may serve as a useful tool in clinical decision making given its ease of use and clinical applicability. Further credible studies in a non-US population are warranted to validate our deep learning model.
Data availability statement
Publicly available datasets were analyzed in this study. These data can be found here: https://seer.cancer.gov/.
Author contributions
Dr QS had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: QS, HC. Acquisition, analysis, or interpretation of data: QS, HC. Statistical analysis: QS. Drafting of the manuscript: QS. Supervision: HC. All authors contributed to the article and approved the submitted version.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2022.887841/full#supplementary-material
Supplementary Figure 1 | The discriminative ability of the deep learning network in the test cohort with different random seeds.
Supplementary Figure 2 | Calibration curves for overall survival (OS) for the deep learning (DL) model, Cox proportional hazards (CPH) model and TNM staging system. DL at 1 year (A) and 3 years (B); CPH at 1 year (C) and 3 years (D); TNM at 1 year (E) and 3 years (F).
Supplementary Figure 3 | Decision curve analyses (DCAs) demonstrating the net benefits associated with the use of two different staging systems. (A) In the training cohort; (B) In the test cohort.
References
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin (2021) 71(3):209–49. doi: 10.3322/caac.21660
2. Skinner HD, Lee JH, Bhutani MS, Weston B, Hofstetter W, Komaki R, et al. A validated miRNA profile predicts response to therapy in esophageal adenocarcinoma. Cancer. (2014) 120(23):3635–41. doi: 10.1002/cncr.28911
3. Lanuti M, Liu G, Goodwin JM, Zhai R, Fuchs BC, Asomaning K, et al. A functional epidermal growth factor (EGF) polymorphism, EGF serum levels, and esophageal adenocarcinoma risk and outcome. Clin Cancer Res (2008) 14(10):3216–22. doi: 10.1158/1078-0432.CCR-07-4932
4. Dong J, Buas MF, Gharahkhani P, Kendall BJ, Onstad L, Zhao S, et al. Determining risk of barrett's esophagus and esophageal adenocarcinoma based on epidemiologic factors and genetic variants. Gastroenterology. (2018) 154(5):1273–81.e3. doi: 10.1053/j.gastro.2017.12.003
5. Steins A, Ebbing EA, Creemers A, van der Zalm AP, Jibodh RA, Waasdorp C, et al. Chemoradiation induces epithelial-to-mesenchymal transition in esophageal adenocarcinoma. Int J Cancer (2019) 145(10):2792–803. doi: 10.1002/ijc.32364
6. Raja S, Ahmad U. Is newer actually better? where does the 8th edition outperform the 7th edition of the esophageal TNM staging system? Ann Surg Oncol (2021) 28(2):596–7. doi: 10.1245/s10434-020-09199-7
7. Rice TW, Patil DT, Blackstone EH. 8th edition AJCC/UICC staging of cancers of the esophagus and esophagogastric junction: Application to clinical practice. Ann Cardiothorac Surg (2017) 6(2):119–30. doi: 10.21037/acs.2017.03.14
8. Hu K, Kang N, Liu Y, Guo D, Jing W, Lu J, et al. Proposed revision of n categories to the 8th edition of the AJCC-TNM staging system for non-surgical esophageal squamous cell cancer. Cancer Sci (2019) 110(2):717–25. doi: 10.1111/cas.13891
9. Goense L, van Rossum PSN, Xi M, Maru DM, Carter BW, Meijer GJ, et al. Preoperative nomogram to risk stratify patients for the benefit of trimodality therapy in esophageal adenocarcinoma. Ann Surg Oncol (2018) 25(6):1598–607. doi: 10.1245/s10434-018-6435-4
10. Gotink AW, van de Ven SEM, Ten Kate FJC, Nieboer D, Suzuki L, Weusten B, et al. Individual risk calculator to predict lymph node metastases in patients with submucosal (T1b) esophageal adenocarcinoma: A multicenter cohort study. Endoscopy (2021) 54(24):109–17. doi: 10.1055/a-1399-4989
11. Shao Y, Geng Y, Gu W, Ning Z, Huang J, Pei H, et al. Assessment of lymph node ratio to replace the pN categories system of classification of the TNM system in esophageal squamous cell carcinoma. J Thorac Oncol (2016) 11(10):1774–84. doi: 10.1016/j.jtho.2016.06.019
12. Shao CY, Yu Y, Li QF, Liu XL, Song HZ, Shen Y, et al. Development and validation of a clinical prognostic nomogram for esophageal adenocarcinoma patients. Front Oncol (2021) 11:736573. doi: 10.3389/fonc.2021.736573
13. Gabriel E, Attwood K, Shah R, Nurkin S, Hochwald S, Kukar M. Novel calculator to estimate overall survival benefit from neoadjuvant chemoradiation in patients with esophageal adenocarcinoma. J Am Coll Surg (2017) 224(5):884–94 e1. doi: 10.1016/j.jamcollsurg.2017.01.043
14. Shao CY, Liu XL, Yao S, Li ZJ, Cong ZZ, Luo J, et al. Development and validation of a new clinical staging system to predict survival for esophageal squamous cell carcinoma patients: Application of the nomogram. Eur J Surg Oncol (2021) 47(6):1473–80. doi: 10.1016/j.ejso.2020.12.004
15. Qu W, Liu Q, Jiao X, Zhang T, Wang B, Li N, et al. Development and validation of a personalized survival prediction model for uterine adenosarcoma: A population-based deep learning study. Front Oncol (2020) 10:623818. doi: 10.3389/fonc.2020.623818
16. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. (2015) 521(7553):436–44. doi: 10.1038/nature14539
17. She Y, Jin Z, Wu J, Deng J, Zhang L, Su H, et al. Development and validation of a deep learning model for non-small cell lung cancer survival. JAMA Netw Open (2020) 3(6):e205842. doi: 10.1001/jamanetworkopen.2020.5842
18. Bice N, Kirby N, Bahr T, Rasmussen K, Saenz D, Wagner T, et al. Deep learning-based survival analysis for brain metastasis patients with the national cancer database. J Appl Clin Med Phys (2020) 21(9):187–92. doi: 10.1002/acm2.12995
19. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a cox proportional hazards deep neural network. BMC Med Res Methodol (2018) 18(1):24. doi: 10.1186/s12874-018-0482-1
20. Nguyen GH, Schetter AJ, Chou DB, Bowman ED, Zhao R, Hawkes JE, et al. Inflammatory and microRNA gene expression as prognostic classifier of barrett's-associated esophageal adenocarcinoma. Clin Cancer Res (2010) 16(23):5824–34. doi: 10.1158/1078-0432.CCR-10-1110
21. Wang Z, Da Silva TG, Jin K, Han X, Ranganathan P, Zhu X, et al. Notch signaling drives stemness and tumorigenicity of esophageal adenocarcinoma. Cancer Res (2014) 74(21):6364–74. doi: 10.1158/0008-5472.CAN-14-2051
22. Zhang YH, Guo LJ, Yuan XL, Hu B. Artificial intelligence-assisted esophageal cancer management: Now and future. World J Gastroenterol (2020) 26(35):5256–71. doi: 10.3748/wjg.v26.i35.5256
23. Davies AR, Pillai A, Sinha P, Sandhu H, Adeniran A, Mattsson F, et al. Factors associated with early recurrence and death after esophagectomy for cancer. J Surg Oncol (2014) 109(5):459–64. doi: 10.1002/jso.23511
24. Randall RL, Cable MG. Nominal nomograms and marginal margins: what is the law of the line? Lancet Oncol (2016) 17(5):554–6. doi: 10.1016/S1470-2045(16)00072-3
25. Mori Y, Kudo SE, Mohmed HEN, Misawa M, Ogata N, Itoh H, et al. Artificial intelligence and upper gastrointestinal endoscopy: Current status and future perspective. Dig Endosc (2019) 31(4):378–88. doi: 10.1111/den.13317
26. Xu Y, Selaru FM, Yin J, Zou TT, Shustova V, Mori Y, et al. Artificial neural networks and gene filtering distinguish between global gene expression profiles of barrett's esophagus and esophageal cancer. Cancer Res (2002) 62(12):3493–7. doi: 10.1111/den.13317
27. Le Berre C, Sandborn WJ, Aridhi S, Devignes MD, Fournier L, Smail-Tabbone M, et al. Application of artificial intelligence to gastroenterology and hepatology. Gastroenterology. (2020) 158(1):76–94.e72. doi: 10.1053/j.gastro.2019.08.058
Keywords: deep learning, clinical decision-making, prognosis, prognosis carcinoma, esophageal adenocarcinoma (EAC)
Citation: Shen Q and Chen H (2022) A novel risk classification system based on the eighth edition of TNM frameworks for esophageal adenocarcinoma patients: A deep learning approach. Front. Oncol. 12:887841. doi: 10.3389/fonc.2022.887841
Received: 04 March 2022; Accepted: 18 November 2022;
Published: 07 December 2022.
Edited by:
Emmanuel Gabriel, Mayo Clinic, United States
Reviewed by:
David Wang, University of Texas Southwestern Medical Center, United States
Ye Bai, Chongqing Medical University, China
Fang Liao, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, China
Copyright © 2022 Shen and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hongyu Chen