Machine learning to predict distant metastasis and prognostic analysis of moderately differentiated gastric adenocarcinoma patients: a novel focus on lymph node indicators

Yang, Kangping; Wu, Jiaqiang; Xu, Tian; Zhou, Yuepeng; Liu, Wenchun; Yang, Liang

doi:10.3389/fimmu.2024.1398685

ORIGINAL RESEARCH article

Front. Immunol., 19 September 2024

Sec. Cancer Immunity and Immunotherapy

Volume 15 - 2024 | https://doi.org/10.3389/fimmu.2024.1398685

This article is part of the Research Topic: Application of Bioinformatics, Machine Learning, and Artificial Intelligence to Improve Diagnosis, Prognosis and Treatment of Cancer

Kangping Yang¹†

Jiaqiang Wu²†

Tian Xu³†

Yuepeng Zhou¹

Wenchun Liu⁴

Liang Yang¹*

¹Department of Gastroenterological Surgery, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
²Department of General Surgery, First Medical Center of the Chinese People's Liberation Army General Hospital, Beijing, China
³Department of Gastroenterological Surgery, Jiangxi Hospital of Integrated Traditional Chinese and Western Medicine, Nanchang, Jiangxi, China
⁴The Second Department of Internal Medicine, Anfu People’s Hospital, Anfu, Jiangxi, China

Background: Moderately differentiated gastric adenocarcinoma (MDGA) has a high risk of metastasis and individual variation, which strongly affects patient prognosis. Using large-scale datasets and machine learning algorithms for prediction can improve individualized treatment. The specific efficacy of several lymph node indicators in predicting distant metastasis (DM) and patient prognosis in MDGA remains obscure.

Methods: We collected data from MDGA patients from the SEER database from 2010 to 2019. Additionally, we collected data from MDGA patients in China. We used nine machine learning algorithms to predict DM. Subsequently, we used Cox regression analysis to determine the risk factors affecting overall survival (OS) and cancer-specific survival (CSS) in DM patients and constructed nomograms. Furthermore, we used logistic regression and Cox regression analyses to assess the specific impact of six lymph node indicators on DM incidence and patient prognosis.

Results: We collected data from 5,377 MDGA patients from the SEER database and 109 MDGA patients from hospitals. T stage, N stage, tumor size, primary site, number of positive lymph nodes, and chemotherapy were identified as independent risk factors for DM. The random forest prediction model had the best overall predictive performance (AUC = 0.919). T stage, primary site, chemotherapy, and the number of regional lymph nodes were identified as prognostic factors for OS. Moreover, T stage, number of regional lymph nodes, primary site, and chemotherapy were also influential factors for CSS. The nomograms showed good predictive value and stability in predicting the 1-, 3-, and 5-year OS and CSS in DM patients. Additionally, the log odds of a metastatic lymph node and the number of negative lymph nodes may be risk factors for DM, while the regional lymph node ratio and the number of regional lymph nodes are prognostic factors for OS.

Conclusion: The random forest prediction model accurately identified high-risk populations, and we established OS and CSS survival prediction models for MDGA patients with DM. Our hospital samples demonstrated different characteristics of lymph node indicators in terms of distant metastasis and prognosis.

1 Introduction

Gastric cancer is a highly prevalent gastrointestinal tumor and the fifth most common cancer worldwide (1). In 2020, there were more than one million new cases of gastric cancer (2). The histologic type of gastric cancer is predominantly adenocarcinoma, and the pathologic grades are well, moderately, and poorly differentiated and undifferentiated (3, 4). Although advanced gastric cancer is predominantly poorly differentiated, some moderately differentiated gastric adenocarcinomas (MDGAs) still carry a high risk of metastasis and show marked individual differences, as reported in animal models and clinical studies (5–7). The occurrence of distant metastasis (DM) directly affects patient prognosis (8). According to the eighth edition of the UICC/AJCC TNM classification for gastric cancer, once DM occurs, the disease is stage IV, at which point the patient’s chances of survival are extremely poor (9). A retrospective study showed that the median overall survival (OS) of patients with liver metastases from gastric cancer was 7 months and that of patients with lung and brain metastases was only 5 months (10). Timely and accurate determination of DM status in gastric cancer patients therefore helps avoid missed opportunities for early, effective intervention and may improve patient survival.

Currently, the detection of DM relies mainly on multidetector computed tomography (CT), positron emission tomography-CT (PET/CT), and other imaging methods (11, 12). However, all of these methods have insufficient sensitivity in practice (13). For example, in PET/CT, some poorly differentiated carcinomas, mucinous carcinomas, and signet ring cell carcinomas usually have low ¹⁸F-FDG uptake, which often leads to false-negative results and delayed therapy (14). Therefore, there is an urgent need for an accurate, convenient, and affordable method for DM diagnosis and prediction. Using machine learning (ML) algorithms and large-scale datasets to construct predictive models is currently a popular solution (15–17). ML algorithms can process raw data from databases, analyze the relationships between key variables, and ultimately build and select the best predictive model (18–21). Such a prediction model, which integrates clinical manifestations and imaging data into a comprehensive assessment tool, can help identify the presence or absence of DM early and accurately and can better guide subsequent clinical diagnosis and treatment.

For patients who have already developed DM, the median OS after conventional chemotherapy is approximately 12 months (22). With regard to cancer-specific survival (CSS), the 1- and 3-year CSS rates were 29.0% and 6.2%, respectively, in the younger group (≤60 years of age), compared with 22.8% and 4.8% in the older group (>60 years of age) (23). These findings suggest that many factors influence the prognosis of DM patients, and clarifying their effects and applying them in a targeted manner is an important way to improve patient prognosis. Many studies have demonstrated that factors such as age, tumor size, sex, degree of differentiation, and primary site are directly associated with DM patient prognosis (24–26). Moreover, recent studies have demonstrated a strong association between various lymph node indicators and both DM and the prognosis of MDGA patients. These lymph node-specific indicators include the number of positive lymph nodes (PLNs), the lymph node ratio (LNR), and the log odds of metastatic LNs (LODDS) (27–29). However, the specific efficacy of these lymph node indicators in predicting DM and patient prognosis is unclear (30–33). This study explored these prognostic factors in MDGA patients with DM to provide theoretical support for individualized treatment in this population. The above factors were then combined to construct 1-, 3-, and 5-year OS and CSS prognostic nomograms for MDGA patients with DM; a nomogram is a simplified visual model for statistical prediction based on independent factors.

Our goal was to develop models for predicting DM in MDGA patients and to ensure the stability and accuracy of these models through both database validation and external validation. A prognostic analysis of DM patients was then performed to construct OS and CSS prognostic nomograms for MDGA patients. Importantly, we focused on exploring the relationships of various lymph node indicators, whose efficacy is still unclear, with DM and prognosis to further promote the application of lymph node indicators in the clinical management of gastric cancer.

2 Materials and methods

2.1 Sources of data and sample selection

The primary training dataset was obtained by collecting all 2010–2019 gastric cancer patient data from the Surveillance, Epidemiology, and End Results (SEER) database. The SEER database is the most detailed publicly available cancer database. Moreover, we collected the clinical data of MDGA patients treated at the Second Affiliated Hospital of Nanchang University between 2008 and 2010 as an external validation dataset. The inclusion criteria were as follows: 1) had a diagnosis of MDGA, 2) did not receive preoperative radiotherapy or immunotherapy, and 3) had comprehensive and searchable prognostic data. The exclusion criteria were as follows: 1) patients whose primary tumor was not gastric cancer, 2) patients whose tumor and lymph node status were not clear, and 3) patients whose other basic information was incomplete. The specific data selection steps are illustrated below in Figure 1.

Figure 1. Flowchart of the data screening process. The figure shows the process of filtering eligible patient data from the SEER database.

2.2 Variable selection

Variables in the present study included age, TNM stage, primary site, tumor size, sex (male or female), and two therapeutic variables (chemotherapy and radiation) obtained from the diagnostic information, as well as several lymph node indicators. The lymph node indicators included the number of regional lymph nodes (Reg LNs), number of all LNs, number of negative lymph nodes (Neg LNs), gross LN metastasis, LN positivity rate, log odds of metastatic LNs, and lymph node ratio (the ratio of the number of metastatic LNs to the total number of LNs examined).
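As a rough illustration of how two of these indicators can be derived from node counts, the following Python sketch computes the LNR as defined above and a LODDS value. The LODDS formula used here, the natural log of (positive + 0.5) / (negative + 0.5), is a commonly used definition and is an assumption; the paper does not state the exact formula or correction term it applied. The function names and the example patient are hypothetical.

```python
import math


def lymph_node_ratio(positive: int, examined: int) -> float:
    """LNR: metastatic (positive) nodes divided by all nodes examined."""
    if examined <= 0:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive <= examined:
        raise ValueError("positive must lie between 0 and examined")
    return positive / examined


def lodds(positive: int, examined: int, correction: float = 0.5) -> float:
    """Log odds of metastatic nodes.

    Assumed definition: ln((positive + c) / (negative + c)) with c = 0.5,
    which keeps the value finite when either count is zero.
    """
    if not 0 <= positive <= examined:
        raise ValueError("positive must lie between 0 and examined")
    negative = examined - positive
    return math.log((positive + correction) / (negative + correction))


# Hypothetical patient: 20 nodes examined, 5 of them metastatic.
example_lnr = lymph_node_ratio(5, 20)
example_lodds = lodds(5, 20)
print(f"LNR = {example_lnr:.2f}, LODDS = {example_lodds:.3f}")
```

Unlike the LNR, the corrected LODDS still distinguishes patients with zero positive nodes by how many negative nodes were examined, which is one reason it is often studied alongside the LNR.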

OS and CSS were the main outcomes for predicting the prognosis of patients with DM. In the OS analysis, death from any cause was counted as an event, whereas in the CSS analysis, only deaths due to MDGA were counted as events; deaths due to other causes and patients who were still alive were not counted as events.

2.3 Statistical methods

The research procedure is illustrated in Figure 2. A correlation heatmap was first generated to examine the relationships among the proposed study variables. We used regression analysis and machine learning for dual validation of risk factors: regression analysis was performed using the full SEER data, and machine learning used the training set, the test set, and the external validation set to construct and evaluate predictive models. Independent risk factors influencing DM in MDGA patients were screened by logistic regression analysis. The outcomes are expressed as hazard ratios (HRs) and 95% confidence intervals (CIs). The patient data screened from the SEER dataset were randomized 7:3 into a training set and a test set. The training set was used to build the predictive models, which were then tested and evaluated using the test set. We constructed nine ML models in the training set, including RF (random forest), LR (logistic regression), LASSO (least absolute shrinkage and selection operator), SVM (support vector machine), KNN (K-nearest neighbor), NBC (naive Bayes classifier), and ANN (artificial neural network). The receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), sensitivity, specificity, F1 score, and accuracy were used to compare the performance of the models. Self-collected hospital patient data were used as an external validation set to assess the generalization ability of the best predictive model.
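To make the comparison metrics concrete, the following Python sketch computes sensitivity, specificity, accuracy, F1 score, and a rank-based AUC from labels and predicted scores. It is an illustration only: the toy labels, scores, and the 0.5 cut-off are hypothetical, and the study itself computed these quantities in R.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F1 from binary labels (1 = DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
    }


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 patients with DM, 5 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics, round(auc, 3))
```

The AUC is threshold-free, whereas the other four metrics depend on the chosen cut-off, so the two kinds of measure can rank models differently.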

Figure 2. Data analysis guide. The figure shows the procedure of this study for processing and analyzing the screened data.

We used several R packages for data analysis and visualization. The createDataPartition function of the caret package was used to split the data into training and validation sets. The importance function of the randomForest package was used to obtain the importance scores of the RF model. The coords function of the pROC package was used to construct the confusion matrix. The randomForest, MASS, rms, glmnet, e1071, xgboost, adabag, and neuralnet packages were used for machine learning model construction. The ggplot2 and pROC packages were used for the visualization of ROC curves and importance scores.
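The idea behind a class-balanced 7:3 partition can be sketched in Python as follows. This is only loosely analogous to what caret does for a categorical outcome: the seed, the rounding rule, and the function name are assumptions, and the labels are synthetic placeholders that merely reuse the class counts reported for the SEER cohort in the Results (749 with DM, 4,628 without).

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), sampling within each class so that the
    class proportions are approximately preserved in both subsets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))  # assumed rounding rule
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels only: 1 = distant metastasis, 0 = no distant metastasis.
labels = [1] * 749 + [0] * 4628
train_idx, test_idx = stratified_split(labels)

train_dm = sum(labels[i] for i in train_idx)
test_dm = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), train_dm, test_dm)
```

Splitting within each class keeps the DM proportion nearly identical in the two subsets, which matters when the positive class makes up only about 14% of the cohort.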

For survival prognostic analyses, univariate Cox regression analysis was first used to screen the variables that could influence prognosis (P < 0.05), and multivariate analysis was then carried out on the screened variables. Moreover, we used Kaplan−Meier curves to assess the differences in survival among patients stratified by different variables and compared the results by means of the log-rank test. The independent risk factors identified through Cox regression analysis were used to construct the nomograms. Using multivariate Cox regression analysis, the regression coefficient (β) of each variable was normalized and displayed as a risk score on the nomograms. The accuracy and discriminatory power of the nomograms were assessed with the AUC, calibration curves, and concordance index (C-index). In addition, we evaluated the clinical value of the nomograms by using decision curve analysis (DCA), a commonly used method that assesses clinical usefulness by quantifying the net benefit across a range of threshold probabilities.
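For readers unfamiliar with the C-index, the following Python sketch implements Harrell's version on right-censored data. It is a simplified illustration rather than the routine used in the study: pairs with tied survival times are ignored, and the follow-up times, event indicators, and risk scores are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. The pair is concordant when the
    model assigned patient i the higher risk; ties in risk count one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: follow-up in months, event = 1 (death) or 0 (censored).
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.6, 0.1]

c_index = concordance_index(times, events, risk)
print(c_index)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for C-index values in the range of roughly 0.70 to 0.73.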

Finally, using patient data collected at our institution, we investigated the impact of multiple more detailed tumor-associated LN indicators on the development of DM in MDGA patients by logistic regression analysis and examined the factors affecting patient prognosis. For descriptive statistics, categorical variables were compared using the chi-square test or Fisher’s exact test. P < 0.05 indicated statistical significance.

2.4 Ethics approval

The use of patient data in this research has been authorized. The approval document from the Ethics Committee is shown in the attachment. Patients from the SEER database provided consent for open research in any scientific study worldwide.

3 Results

3.1 Basic features and patient subgroups

Altogether, 5,377 patients from the database were included in this study; 749 (13.93%) had DM, and 4,628 (86.07%) had no DM. The local patient dataset, which served as an external validation set, included a total of 109 patients, of whom 15 (13.76%) had DM and 94 (86.24%) had no DM. The patient data screened from the SEER dataset were randomized 7:3 into training and testing sets. Table 1 summarizes the demographic and clinical characteristics of the three groups of MDGA patients. There were no statistically significant differences (P > 0.05) in any of the clinical characteristics analyzed, such as tumor size, primary site, TNM stage, or number of Reg LNs, between the patients in the training and testing sets.
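The proportions quoted above can be reproduced with a few lines of Python, and the same counts serve to illustrate the kind of chi-square comparison named in the statistical methods. The comparison of the two cohorts shown here is for illustration only and is not an analysis reported in the paper; the function name is hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom, using the
    identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts taken from the text: DM versus no DM in each cohort.
seer_dm, seer_no_dm = 749, 4628
hosp_dm, hosp_no_dm = 15, 94

seer_rate = 100 * seer_dm / (seer_dm + seer_no_dm)
hosp_rate = 100 * hosp_dm / (hosp_dm + hosp_no_dm)
statistic, p_value = chi_square_2x2(seer_dm, seer_no_dm, hosp_dm, hosp_no_dm)

print(f"SEER DM rate: {seer_rate:.2f}%")      # 13.93%
print(f"Hospital DM rate: {hosp_rate:.2f}%")  # 13.76%
print(f"chi-square = {statistic:.4f}, p = {p_value:.3f}")
```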

Table 1. Comparison of the general features of the training and test sets.

3.2 Comparison and analysis of model variables

Pearson correlation analysis was used to examine the relationship between each pair of variables (Figure 3A). Taking into account the type of data, distribution characteristics, and other factors, all the variables were independent and well distributed and could be included in the subsequent statistical analysis. Multivariate logistic regression analysis revealed that six variables were statistically significant in predicting the occurrence of DM in patients with MDGA (Table 2). These included the T and N stages, whereas the M stage did not appear to differ significantly. Other variables included primary site, tumor size, number of Reg LNs, and chemotherapy. In addition, the importance scores from the random forest model indicated the significance of each variable (Figure 3B). The number of Reg LNs, N stage, T stage, chemotherapy, age, tumor size, and primary site were positively related to the occurrence of DM in MDGA patients. These results were the same as the findings of the preceding analyses, except for age. With distant metastasis as the outcome variable, we conducted univariate and multivariate logistic regression analyses on eight factors: primary site, tumor size, age, sex, T stage, N stage, number of positive LNs, and chemotherapy. In the multivariate regression with forward stepwise selection, the P-values for T stage, N stage, primary site, number of positive LNs, tumor size, and chemotherapy were less than 0.05, and these variables were considered statistically significant independent risk factors. The results of forward regression analysis indicated the meaningful impact of six variables on distant metastasis: sex, T stage, N stage, primary site, tumor size, and number of positive LNs (the detailed results are presented in the Supplementary Material).
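As a reminder of what the correlation heatmap summarizes, the following Python sketch computes pairwise Pearson coefficients for a small set of variables. The variable names and values are hypothetical toy data, not study data, and the helper functions are illustrative rather than the R routines used by the authors.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical toy values for five patients.
toy = {
    "tumor_size_mm": [20, 35, 50, 65, 80],
    "positive_lns": [0, 1, 3, 4, 7],
    "age": [70, 55, 62, 48, 66],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Coefficients close to ±1 between two predictors would signal collinearity, which is the concern such a heatmap is usually meant to rule out before regression modeling.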

Figure 3. Results of Pearson correlation analysis for each variable (A) and ranking of importance of predictive model characteristics (B). The results of Pearson correlation analysis for each variable (A) showed that all variables existed independently of each other. The predictive model characteristics (B) were the number of Reg LNs, N stage, T stage, chemotherapy, age, tumor size, and primary site, in order of importance.

Table 2. Multifactorial analysis of moderately differentiated distant metastatic gastric adenocarcinoma.

3.3 Establishment of predictive models for distant metastasis

This research used nine distinct machine learning algorithms to construct distant metastasis prediction models for MDGA patients. The models were trained with data from the training set. The model parameters were fine-tuned to stabilize the models and prevent overfitting.

Table 3 and Figure 4A present the evaluation metrics for each model, including ROC curves, specificity, sensitivity, accuracy, recall, and F1 score. Based on the comparison, the random forest model had the highest predictive value. Its AUC (0.913), specificity (0.891), and accuracy (0.880) were the best among the nine models. The results in the testing set confirmed this finding: the AUC of the ROC curve for the RF model was 0.848 (Figure 4B), which was noticeably superior to those of the other eight models. Finally, the RF model was externally validated using our 109 hospital patients (AUC = 0.728) (Figure 4C). We also aggregated the previous ROC curves (Figure 4D). In summary, we trained nine machine learning prediction models with data from the training set. The results in the test set and external validation set showed that the RF model can predict the risk of DM in MDGA patients relatively accurately and has high clinical value.

Table 3. Comparison of the predictive performance values of nine forecasting models in the training set.

Figure 4. Receiver operating characteristic (ROC) curves for the training set, test set, and external validation set prediction models. (A) Training set; (B) test set; (C) external validation set. The aggregation of the previous ROC curves for the RF model (D). AUC, area under the ROC curve; RF, random forest; LR, logistic regression; LASSO, least absolute shrinkage and selection operator; SVM, support vector machine; KNN, K-nearest neighbor; NBC, naive Bayes classifier; ANN, artificial neural network.

4 Prognostic analysis and prediction of MDGA patients with established DM

4.1 Patient baseline characteristics

The 749 eligible MDGA patients with DM were randomized into two groups using the same 7:3 split. The training set included 524 patients, while the testing set included 225 patients. In terms of clinical characteristics, 40–60 years was the most common age range for distant metastasis among MDGA patients (55.41% according to the SEER data), and the highest proportion of distant metastases in MDGA originated in the cardia (42.86% according to the SEER data). Other clinical characteristics are summarized in Table 4. No statistically significant differences were found between the basic characteristics of the two datasets.

Table 4. Basic information about MDGA patients with DM.

4.2 Analysis of prognosis-related factors

Using OS and CSS as prognostic endpoints, we performed univariate and multivariate Cox regression analyses on data from eligible patients screened from the SEER database. Nine variables were included in the univariate analysis, and the detailed results are shown in the left half of Tables 5 and 6. Afterward, according to the outcome, statistically significant variables were included in the multivariate analyses.

Table 5. Cox regression analysis of OS in the SEER cohort.

Table 6. Cox regression analysis of CSS according to the SEER data.

The Cox regression results for OS are shown in Table 5. The detailed outcomes suggested that age, T stage, primary site, chemotherapy, radiation, and the number of Reg LNs were correlated with OS in MDGA patients. Multifactorial analysis for OS revealed that only T stage (2 and 3), primary site, chemotherapy, and number of Reg LNs were statistically significant independent risk factors for MDGA patients with established DM. Moreover, patients with higher T stages (T3 and 4) and without chemotherapy had significantly shorter survival and worse outcomes. Patients with superficial primary sites (gastric antrum and greater curvature) and a greater number of Reg LNs could have improved outcomes. More comprehensive OS analysis information, such as the analytical CIs and P-values for each variable, is collated and displayed in Table 5.

The outcome of the Cox regression analysis using CSS as the endpoint is presented in Table 6. The primary site, number of Reg LNs, age, T stage, chemotherapy, and radiotherapy variables were integrated into the multifactorial analysis. The analysis indicated that the number of Reg LNs, T stage, primary site, and chemotherapy were considered statistically significant independent risk factors for CSS. The CIs and the corresponding P-values are summarized in Table 6.

4.3 Nomogram

According to the outcomes obtained in the previous steps, this study developed a visual nomogram to predict the survival time of MDGA patients with established DM. The nomogram, derived from the prognostically relevant risk factors that have been identified, provides a score based on the patient’s current condition. Physicians can assess a patient’s probability of 1-, 3-, and 5-year OS/CSS based on this nomogram (Figure 5). According to the OS nomogram (Figure 5A), of the five independent risk factors, chemotherapy had the greatest impact on survival, followed by the primary tumor site, while T stage had the least impact. According to the CSS nomogram (Figure 5B), the presence or absence of chemotherapy was considered to be the most influential factor for survival, followed by the lymph node positivity rate.

Figure 5. Nomograms for 1-, 3-, and 5-year OS (A) and CSS (B) in MDGA patients with DM.

A simple example of how to use the nomogram is given below. Suppose a 60-year-old patient with distant metastases from MDGA has received conventional chemotherapy but no radiotherapy. The pathological findings suggest that the tumor originated in the greater curvature, the current T stage is 3, and the number of regional LNs is more than four. An approximate score can then be calculated from the nomogram (age, 17.5 points; T stage 3, 2 points; primary site, 17 points; number of regional LNs, 11 points; received chemotherapy, 0 points; did not receive radiotherapy, 18 points). This hypothetical patient would therefore have a total score of 65.5, which is located on the total points scale. Drawing a vertical line from this point to the survival probability scales gives approximate overall survival probabilities of 0.78, 0.55, and 0.45 at 1, 3, and 5 years, respectively. Similarly, the corresponding CSS for this patient can be calculated using the same steps.
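The arithmetic of this worked example can be checked with a short Python sketch. The point values are those quoted in the text for the hypothetical patient; the dictionary keys are illustrative labels only, and the mapping from total points to survival probability still has to be read from the nomogram itself.

```python
# Points read from the OS nomogram for the hypothetical patient in the text.
points = {
    "age (60 years)": 17.5,
    "T stage 3": 2.0,
    "primary site (greater curvature)": 17.0,
    "regional LNs (more than four)": 11.0,
    "chemotherapy received": 0.0,
    "no radiotherapy": 18.0,
}

total_points = sum(points.values())
print(total_points)  # 65.5
```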

4.4 Evaluation and validation of the nomograms

The predictive performance and clinical value of the nomograms were assessed and verified by the C-index, AUC, calibration curve, and DCA. In the training set, the AUC values for predicting 1-, 3-, and 5-year OS were 0.797, 0.807, and 0.737, respectively (Figure 6A), while in the validation set, they were 0.757, 0.737, and 0.718, respectively (Figure 6B). The C-index of the OS nomogram was 0.726 (95% CI, 0.703–0.748) in the training set and 0.703 (95% CI, 0.663–0.744) in the validation set. The fit of the 1-, 3-, and 5-year calibration curves for predicting OS was satisfactory (Figures 6C–H). In the calibration curves of the nomograms in the training and validation cohorts, the red fitted curve closely matches the gray diagonal line (representing the ideal agreement between predicted and observed probabilities), suggesting that the survival probability predicted by the nomograms is generally consistent with the observed survival probability, with no marked overestimation or underestimation of risk. The DCA curves in Figures 6I–N suggest that this nomogram for OS has excellent net clinical benefit.
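The net benefit plotted in a decision curve can be illustrated with the following Python sketch. It uses the standard binary-outcome formula, net benefit = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability. This is a simplification: it ignores the censoring that a survival-based DCA must handle, and the outcomes, predicted risks, and threshold shown are hypothetical.

```python
def net_benefit(y_true, predicted_risk, threshold):
    """Net benefit of intervening on patients whose predicted risk is at or
    above the threshold probability (binary outcome, no censoring)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, predicted_risk)
             if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, predicted_risk)
             if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and model-predicted risks for 10 patients.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
risk = [0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.9, 0.5]

nb_model = net_benefit(y_true, risk, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))
```

A decision curve repeats this calculation over a range of thresholds and compares the model with the treat-all and treat-none (net benefit of zero) strategies; a model is clinically useful at a given threshold when its curve lies above both.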

Figure 6. Evaluation of the ability of the nomogram to predict OS. ROC curves validating the OS prediction nomogram for 1-, 3-, and 5-year OS in the training set (A) and validation set (B). Calibration curves validating the OS prediction nomograms for 1-, 3-, and 5-year OS in the training set (C–E) and validation set (F–H). Decision curve analysis validating the OS prediction nomogram for 1-, 3-, and 5-year OS in the training set (I–K) and validation set (L–N).

Similarly, the evaluation of the CSS nomograms showed good applicability. The C-index was 0.727 (95% CI, 0.703–0.751) for the training set and 0.705 (95% CI, 0.663–0.748) for the validation set. In addition, the AUCs of the nomograms were 0.747, 0.737, and 0.699 for 1-, 3-, and 5-year CSS, respectively, in the training cohort (Figure 7A), and in the validation cohort, the AUCs were 0.661, 0.713, and 0.892, respectively (Figure 7B). Moreover, both the calibration curves and DCA curves for the 1-, 3-, and 5-year CSS predictions also exhibited satisfactory fits and net benefits (Figures 7C–N). In summary, the nomograms produced to predict the prognosis of MDGA patients with DM had considerable discrimination and calibration power.

Figure 7. Evaluation of the ability of the nomogram to predict CSS. ROC curves validating the CSS prediction nomogram for 1-, 3-, and 5-year CSS in the training set (A) and validation set (B). Calibration curves validating the CSS prediction nomograms for 1-, 3-, and 5-year survival in the training cohort (C–E) and validation cohort (F–H). Decision curve analysis validating the CSS prediction nomogram for 1-, 3-, and 5-year CSS in the training set (I–K) and validation set (L–N).

5 Analysis of the impact of more detailed LN indicators on the occurrence of DM and prognosis of MDGA

The above analyses suggest a strong association between multiple lymph node indices and both DM and the prognosis of MDGA. Although good predictive efficacy can be achieved by categorizing the number of positive LNs (0, 1 to 3, 4 or more), 70% of patients in the database had a positive lymph node count of 0. This suggests that the existing lymph node indices may not describe a patient’s prognosis specifically; thus, more diversified ways of evaluating the metastasis and immune mechanisms of patients are needed. The lymph node positivity rate, the specific number of negative/positive lymph nodes, and gross LN metastasis may be better indicators of DM risk and survival; therefore, we collected more detailed data from our institution and performed a logistic regression analysis to identify risk factors associated with DM.

We collected data from 109 patients with moderately differentiated gastric adenocarcinoma in our hospital. Indicators such as the LODDS and the number of Neg LNs were calculated, followed by logistic regression to explore the risk factors for distant metastasis in patients with MDGA and Cox regression to analyze the risk factors affecting the prognosis of patients with MDGA. Our univariate logistic regression results showed that the number of negative LNs and the LODDS were influential factors for the occurrence of DM in MDGA (Table 7). However, it is noteworthy that in the multivariate logistic regression analysis of the variables of interest, our results lacked statistical significance. We then conducted univariate and multivariate Cox analyses of our patient data. As shown in Table 8, 15 variables were included. The results of the univariate analysis revealed that nine variables, including the number of Reg LNs, LNR, age >80 years, TNM stage, tumor size, and gross LN metastasis, had an impact on the prognosis of MDGA patients (P < 0.05). These variables were subsequently incorporated into a multivariate analysis, which indicated that the LNR, T stage (1 and 2), and gross LN metastasis 3 cm away from the tumor were independent risk factors, whereas the number of Reg LNs and the number of Reg LNs in groups 1–3 were considered protective factors. More specific data are shown in Tables 7 and 8.

Table 7. The risk factors for developing DM in MDGA patients were analyzed by logistic regression based on our data.

Table 8. Cox regression analysis of risk factors affecting patient OS based on our data.

6 Discussion

Moderately differentiated gastric adenocarcinoma is common in clinical practice and has a high risk of metastasis and individual variability (34). Once a patient develops DM, the prognosis becomes extremely poor (35, 36). The OS of MDGA patients without DM is generally considered to be more than 22.3 months after surgical treatment (37). However, after the onset of DM, survival decreases in patients receiving conventional chemotherapy, with a median overall survival of just under 12 months (22, 38). Determining whether a patient has distant metastases is therefore particularly important and is vital for providing individualized prevention and treatment strategies in the clinic. In addition, current prognostic methods for patients with DM are relatively limited, and some DM-related indices, especially lymph node indices such as the LNR and LODDS, are considered to be important indicators of prognosis (39, 40). However, their specific clinical effects have still not been extensively and comprehensively tested.

Our major objectives were to develop a model to predict the development of DM in patients with MDGA and to analyze the risk factors influencing the prognosis of patients with DM. In addition, this study analyzed the specific ability of six lymph node indicators in our patients to predict DM and prognosis using logistic and Cox regression. Nine machine learning algorithms were used to predict distant metastasis, with the RF model considered the most effective. Multivariate Cox regression analysis for MDGA patients who already had DM indicated that higher T stage (2 and 3), primary site, chemotherapy, and number of Reg LNs were independent risk factors for prognosis. Moreover, the nomograms created from our analysis results were evaluated and tested and showed convincing prognostic discrimination and calibration capabilities.

Notably, the categorization of the number of positive LNs (0, 1 to 3, 4 or more) in the SEER database may achieve good predictive efficacy. However, nearly 70% of the patients in the database had a positive lymph node count of 0. This suggests that relying solely on lymph node dissection results may not effectively characterize patient prognosis, and that more diverse classifications and metrics are needed to evaluate a patient’s metastatic and prognostic condition. In this study, based on our patient data, we found that the LNR, gross LN metastasis, and the number of Reg LNs were independent factors influencing the prognosis of MDGA patients with DM.
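The three-level grouping of positive LN counts (0, 1 to 3, 4 or more) can be sketched as follows, together with a helper that reports the share of patients in each group. The function names and category labels are hypothetical, and the counts in the usage note are invented for illustration rather than drawn from SEER or from our cohort.

```python
from collections import Counter
from typing import Dict, Iterable


def positive_ln_category(n_positive: int) -> str:
    """Map a positive lymph node count to the groups 0, 1 to 3, 4 or more."""
    if n_positive < 0:
        raise ValueError("n_positive must be non-negative")
    if n_positive == 0:
        return "0"
    if n_positive <= 3:
        return "1-3"
    return ">=4"


def category_shares(counts: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of patients falling into each category."""
    tally = Counter(positive_ln_category(n) for n in counts)
    total = sum(tally.values())
    return {category: n / total for category, n in tally.items()}
```

For example, `category_shares([0, 0, 0, 2, 5])` places three of five hypothetical patients in the "0" group. When one group dominates in this way, the categorical variable carries little information for most patients, which is the concern raised above.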

It is worth noting that the AUC in the validation set is generally slightly lower than that in the training set, which is a relatively common phenomenon. Possible reasons are that the training set consists of U.S. population samples from the SEER database, so the models may not extrapolate well to the Chinese population, or that the sample size of the validation set is insufficient. In future research, we will further consider population extrapolation and sample adequacy to improve predictive performance in the validation set.
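To make the training versus validation comparison concrete, the sketch below computes the AUC directly from its rank interpretation: the probability that a randomly chosen patient with DM receives a higher predicted score than a randomly chosen patient without DM, with ties counted as one half. This is a generic, dependency-free illustration and not the evaluation code used in this study. The function name is an assumption.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the event (e.g. DM), 0 otherwise.
    scores: predicted risk, where higher means more likely to be positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

Calling `auc` separately on the training cohort and on the external cohort and taking the difference gives the gap discussed above. Because the estimate is an average over positive and negative pairs, a small validation cohort yields few pairs and a noisier AUC, consistent with the sample size explanation.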

Nevertheless, this research is limited by its retrospective nature. Although the SEER database is detailed and reliable, it cannot provide some more granular data (41). For example, data on some noteworthy laboratory tests were not included, and some of the more nuanced lymph node details, as previously mentioned, were lacking. Furthermore, practical application of the nomogram requires additional clinical information, including the patient’s ethnicity, geographical location, and other pertinent factors. These data, which are absent from the database and not included in the study, affect the results, and more information is required to refine the nomogram. For our own data, the limited sample size and other factors reduced the effectiveness of some statistical analyses; in the future, more cases and patient information should be collected for more in-depth analysis.

7 Conclusion

In conclusion, this research investigated the variables linked to the development of DM in MDGA, including T stage, N stage, primary site, tumor size, number of positive LNs, and chemotherapy. We then investigated the prognostic factors in MDGA patients with DM, including T stage, primary site, chemotherapy, and number of Reg LNs. Based on the prognostic analysis, separate nomograms for OS and CSS were constructed from the relevant influencing factors. Finally, the effect of multiple lymph node indicators on the metastasis and prognosis of MDGA patients was investigated. This study provides a reference for subsequent clinical studies and further suggests the importance of lymph node indicators.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by The Ethics Committee of the Second Affiliated Hospital of Nanchang University (examination and approval no. review (2019) no. (105)). The studies were conducted in accordance with the local legislation and institutional requirements. The human samples used in this study were acquired from primarily isolated samples as part of our previous study for which ethical approval was obtained. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.

Author contributions

KY: Writing – original draft. JW: Writing – original draft. TX: Writing – original draft. YZ: Writing – original draft. WL: Writing – original draft. LY: Writing – review & editing.

Funding

The present research is funded by grants from the National Natural Science Foundation of China (81960103 to LY).

Acknowledgments

The graphical abstracts were created with BioRender software (BioRender.com).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2024.1398685/full#supplementary-material

References

1. Lin Y, Zheng Y, Wang HL, Wu J. Global patterns and trends in gastric cancer incidence rates (1988-2012) and predictions to 2030. Gastroenterology. (2021) 161:116–27.e8. doi: 10.1053/j.gastro.2021.03.023

2. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J Clin. (2021) 71:209–49. doi: 10.3322/caac.21660

3. López Sala P, Leturia Etxeberria M, Inchausti Iguíñiz E, Astiazaran Rodríguez A, Aguirre Oteiza MI, Zubizarreta Etxaniz M. Gastric adenocarcinoma: A review of the TNM classification system and ways of spreading. Radiologia. (2023) 65:66–80. doi: 10.1016/j.rxeng.2022.10.011

4. Nagtegaal ID, Odze RD, Klimstra D, Paradis V, Rugge M, Schirmacher P, et al. The 2019 WHO classification of tumours of the digestive system. Histopathology. (2020) 76:182–8. doi: 10.1111/his.13975

5. Xiang Y, Yao L. Retrospective analysis of diagnosis and treatment of gastric cancer at Huzhou central hospital. Altern Ther Health Med. (2023) 29:302–9.

6. Liao JB, Lee HP, Fu HT, Lee HS. Assessment of EGFR and ERBB2 (HER2) in gastric and gastroesophageal carcinomas: EGFR amplification is associated with a worse prognosis in early stage and well to moderately differentiated carcinoma. Appl Immunohistochem Mol Morphol. (2018) 26:374–82. doi: 10.1097/PAI.0000000000000437

7. Li H, Zhang YC, Tsuchihashi Y. Invasion and metastasis of SY86B human gastric carcinoma cells in nude mice. Jpn J Cancer Res. (1988) 79:750–6. doi: 10.1111/j.1349-7006.1988.tb02232.x

8. Lord AC, D'Souza N, Shaw A, Rokan Z, Moran B, Abulafi M, et al. MRI-diagnosed tumor deposits and EMVI status have superior prognostic accuracy to current clinical TNM staging in rectal cancer. Ann Surg. (2022) 276:334–44. doi: 10.1097/SLA.0000000000004499

9. Guan WL, He Y, Xu RH. Gastric cancer treatment: recent progress and future perspectives. J Hematol Oncol. (2023) 16:57. doi: 10.1186/s13045-023-01451-3

10. Lin Z, Wang R, Zhou Y, Wang Q, Yang CY, Hao BC, et al. Prediction of distant metastasis and survival prediction of gastric cancer patients with metastasis to the liver, lung, bone, and brain: research based on the SEER database. Ann Trans Med. (2022) 10:16. doi: 10.21037/atm-21-6295

11. Young JJ, Pahwa A, Patel M, Jude CM, Nguyen M, Deshmukh M, et al. Ligaments and lymphatic pathways in gastric adenocarcinoma. Radiographics: Rev Publ Radiological Soc North America Inc. (2019) 39:668–89. doi: 10.1148/rg.2019180113

12. Lim JS, Yun MJ, Kim MJ, Hyung WJ, Park MS, Choi JY, et al. CT and PET in stomach cancer: preoperative staging and monitoring of response to therapy. Radiographics: Rev Publ Radiological Soc North America Inc. (2006) 26:143–56. doi: 10.1148/rg.261055078

13. Kwee RM, Kwee TC. Modern imaging techniques for preoperative detection of distant metastases in gastric cancer. World J Gastroenterol. (2015) 21:10502–9. doi: 10.3748/wjg.v21.i37.10502

14. Lordick F, Carneiro F, Cascinu S, Fleitas T, Haustermans K, Piessen G, et al. Gastric cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann oncology: Off J Eur Soc Med Oncol. (2022) 33:1005–20. doi: 10.1016/j.annonc.2022.07.004

15. Xiang R, Song W, Ren J, Wu J, Fu J, Fu T. Identification of stem cell-related subtypes and risk scoring for gastric cancer based on stem genomic profiling. Stem Cell Res Ther. (2021) 12:563. doi: 10.1186/s13287-021-02633-x

16. Pai RK, Hartman D, Schaeffer DF, Rosty C, Shivji S, Kirsch R, et al. Development and initial validation of a deep learning algorithm to quantify histological features in colorectal carcinoma including tumour budding/poorly differentiated clusters. Histopathology. (2021) 79:391–405. doi: 10.1111/his.14353

17. Liu X, Zhang D, Liu Z, Li Z, Xie P, Sun K, et al. Deep learning radiomics-based prediction of distant metastasis in patients with locally advanced rectal cancer after neoadjuvant chemoradiotherapy: A multicentre study. EBioMedicine. (2021) 69:103442. doi: 10.1016/j.ebiom.2021.103442

18. Tian H, Liu Z, Liu J, Zong Z, Chen Y, Zhang Z, et al. Application of machine learning algorithm in predicting distant metastasis of T1 gastric cancer. Sci Rep. (2023) 13:5741. doi: 10.1038/s41598-023-31880-6

19. Allesøe RL, Thompson WK, Bybjerg-Grauholm J, Hougaard DM, Nordentoft M, Werge T, et al. Deep learning for cross-diagnostic prediction of mental disorder diagnosis and prognosis using danish nationwide register and genetic data. JAMA Psychiatry. (2023) 80:146–55. doi: 10.1001/jamapsychiatry.2022.4076

20. van der Vliet R, Selles RW, Andrinopoulou ER, Nijland R, Ribbers GM, Frens MA, et al. Predicting upper limb motor impairment recovery after stroke: A mixture model. Ann Neurol. (2020) 87:383–93. doi: 10.1002/ana.25679

21. Poldrack RA, Huckins G, Varoquaux G. Establishment of best practices for evidence for prediction: A review. JAMA Psychiatry. (2020) 77:534–40. doi: 10.1001/jamapsychiatry.2019.3671

22. Shitara K. Chemotherapy for advanced gastric cancer: future perspective in Japan. Gastric cancer: Off J Int Gastric Cancer Assoc Japanese Gastric Cancer Assoc. (2017) 20:102–10. doi: 10.1007/s10120-016-0648-7

23. Li X, Wang W, Ruan C, Wang Y, Wang H, Liang X, et al. Age-specific impact on the survival of gastric cancer patients with distant metastasis: an analysis of SEER database. Oncotarget. (2017) 8:97090–100. doi: 10.18632/oncotarget.21350

24. Yeoh KG, Tan P. Mapping the genomic diaspora of gastric cancer. Nat Rev Cancer. (2022) 22:71–84. doi: 10.1038/s41568-021-00412-7

25. Li H, Wang C, Lan L, Behrens A, Tomaschko M, Ruiz J, et al. High expression of vinculin predicts poor prognosis and distant metastasis and associates with influencing tumor-associated NK cell infiltration and epithelial-mesenchymal transition in gastric cancer. Aging (Albany NY). (2021) 13:5197–225. doi: 10.18632/aging.202440

26. Guo S, Shang MY, Dong Z, Zhang J, Wang Y, Zheng ZC, et al. Clinicopathological features and prognostic analysis of signet ring cell gastric carcinoma: a population-based study. Transl Cancer Res. (2019) 8:1918–30. doi: 10.21037/tcr.2019.09.06

27. Zhu YF, Liu K, Zhang WH, Song XH, Peng BQ, Liao XL, et al. Is no. 12a lymph node dissection compliance necessary in patients who undergo D2 gastrectomy for gastric adenocarcinomas? A population-based retrospective propensity score matching study. Cancers (Basel). (2023) 15(3):749. doi: 10.3390/cancers15030749

28. Matsushima J, Sato T, Yoshimura Y, Mizutani H, Koto S, Matsusaka K, et al. Clinical utility of artificial intelligence assistance in histopathologic review of lymph node metastasis for gastric adenocarcinoma. Int J Clin Oncol. (2023) 28:1033–42. doi: 10.1007/s10147-023-02356-4

29. Zhou R, Zhang J, Sun H, Liao Y, Liao W. Comparison of three lymph node classifications for survival prediction in distant metastatic gastric cancer. Int J Surg. (2016) 35:165–71. doi: 10.1016/j.ijsu.2016.09.096

30. Feng X, Hong T, Liu W, Xu C, Li W, Yang B, et al. Development and validation of a machine learning model to predict the risk of lymph node metastasis in renal carcinoma. Front Endocrinol (Lausanne). (2022) 13:1054358. doi: 10.3389/fendo.2022.1054358

31. Wang X, Chen Y, Gao Y, Zhang H, Guan Z, Dong Z, et al. Predicting gastric cancer outcome from resected lymph node histopathology images using deep learning. Nat Commun. (2021) 12:1637. doi: 10.1038/s41467-021-21674-7

32. Sun Y, Wu X, Lin H, Lu X, Huang Y, Chi P. Lymph node regression to neoadjuvant chemoradiotherapy in patients with locally advanced rectal cancer: prognostic implication and a predictive model. J Gastrointest Surg. (2021) 25:1019–28. doi: 10.1007/s11605-020-04566-x

33. Gao P, Zhu T, Gao J, Li H, Liu X, Zhang X. Impact of examined lymph node count and lymph node density on overall survival of penile cancer. Front Oncol. (2021) 11:706531. doi: 10.3389/fonc.2021.706531

34. Feng F, Liu J, Wang F, Zheng G, Wang Q, Liu S, et al. Prognostic value of differentiation status in gastric cancer. BMC Cancer. (2018) 18:865. doi: 10.1186/s12885-018-4780-0

35. Willmann J, Vlaskou Badra E, Adilovic S, Christ SM, Ahmadsei M, Mayinger M, et al. Distant metastasis velocity as a novel prognostic score for overall survival after disease progression following stereotactic body radiation therapy for oligometastatic disease. Int J Radiat Oncol Biol Phys. (2022) 114:871–82. doi: 10.1016/j.ijrobp.2022.06.064

36. Chen J, Wu L, Zhang Z, Zheng S, Lin Y, Ding N, et al. A clinical model to predict distant metastasis in patients with superficial gastric cancer with negative lymph node metastasis and a survival analysis for patients with metastasis. Cancer Med. (2021) 10:944–55. doi: 10.1002/cam4.v10.3

37. Lazăr D, Tăban S, Sporea I, Dema A, Cornianu M, Lazăr E, et al. Gastric cancer: correlation between clinicopathological factors and survival of patients. II. Rom J Morphol Embryol. (2009) 50:185–94.

38. Sato Y, Okamoto K, Kawano Y, Kasai A, Kawaguchi T, Sagawa T, et al. Novel biomarkers of gastric cancer: current research and future perspectives. J Clin Med. (2023) 12(14):4646. doi: 10.3390/jcm12144646

39. Spolverato G, Capelli G, Mari V, Lorenzoni G, Gregori D, Poultsides G, et al. Very early recurrence after curative-intent surgery for gastric adenocarcinoma. Ann Surg Oncol. (2022) 29:8653–61. doi: 10.1245/s10434-022-12434-y

40. Kim J, Park J, Park H, Choi MS, Jang HW, Kim TH, et al. Metastatic lymph node ratio for predicting recurrence in medullary thyroid cancer. Cancers (Basel). (2021) 13(22):5842. doi: 10.3390/cancers13225842

41. Charlton ME, Kahl AR, McDowell BD, Miller RS, Komatsoulis G, Koskimaki JE, et al. Cancer registry data linkage of electronic health record data from ASCO's cancerLinQ: evaluation of advantages, limitations, and lessons learned. JCO Clin Cancer Inform. (2022) 6:e2100149. doi: 10.1200/CCI.21.00149

Keywords: moderately differentiated gastric adenocarcinoma, prognosis, nomogram, lymph node indicators, distant metastasis, machine learning

Citation: Yang K, Wu J, Xu T, Zhou Y, Liu W and Yang L (2024) Machine learning to predict distant metastasis and prognostic analysis of moderately differentiated gastric adenocarcinoma patients: a novel focus on lymph node indicators. Front. Immunol. 15:1398685. doi: 10.3389/fimmu.2024.1398685

Received: 10 March 2024; Accepted: 29 August 2024;
Published: 19 September 2024.

Edited by:

Petar Ozretić, Rudjer Boskovic Institute, Croatia

Reviewed by:

Somayah Abdullah Albaradei, King Abdullah University of Science and Technology, Saudi Arabia
Arkady Bedzhanyan, Petrovsky National Research Center of Surgery, Russia

Copyright © 2024 Yang, Wu, Xu, Zhou, Liu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Liang Yang, bmN1X3lhbmdsaWFuZ0AxNjMuY29t

^†These authors have contributed equally to this work
