- 1Department of Gastroenterology, Affiliated Hospital of Jiangsu University, Jiangsu University, Zhenjiang, China
- 2Department of Cardiology, Sixth Medical Center, PLA General Hospital, Beijing, China
- 3Department of Cell Biology, School of Medicine, Jiangsu University, Zhenjiang, China
- 4Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- 5Department of Pathology, Affiliated Hospital of Jiangsu University, Jiangsu University, Zhenjiang, China
- 6Faculty of Dentistry, University of Debrecen, Debrecen, Hungary
- 7Department of Clinical Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- 8Faculty of Chinese Medicine, Nanchang Medical College, Nanchang, China
- 9Department of Oral and Maxillofacial-Head and Neck Oncology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, College of Stomatology, Shanghai Jiao Tong University, Shanghai, China
- 10National Center for Stomatology and National Clinical Research Center for Oral Diseases, Shanghai JiaoTong University, Shanghai, China
- 11Shanghai Key Laboratory of Stomatology, Shanghai JiaoTong University, Shanghai, China
Background: Gastric cancer (GC) is a malignancy driven by a combination of genetic, environmental, and microbial factors. Targeting lysosomes shows significant potential in the treatment of numerous diseases, yet lysosome-related genetic markers for early GC detection have not been established. Identifying such markers with artificial intelligence algorithms could considerably increase their value in translational medicine, particularly for immunotherapy.
Methods: Using transcriptomic and single-cell data and integrating 20 mainstream machine-learning (ML) algorithms, we optimized an AI-based predictor for GC diagnosis. The reliability of the model was initially assessed with commonly used enrichment analyses. We then explored the immunological implications of the genes comprising the predictor and evaluated the response of GC patients to immunotherapy and chemotherapy. Finally, we performed systematic laboratory work to evaluate the central genes at both the expression and functional levels, which also allowed us to assess the reliability of the model for guiding cancer immunotherapy.
Results: Eight lysosome-related genes were selected for predictive model construction, using RMSE as the reference standard and the RF algorithm for ranking: ADRB2, KCNE2, MYO7A, IFI30, LAMP3, TPP1, HPS4, and NEU4. Based on accuracy, precision, recall, and F1 measurements, the Extra Tree and Random Forest algorithms were preliminarily identified as the best performers. When the ROC-AUC value was also taken into account, the Extra Tree model appeared to be the optimal option, with an AUC of 0.92. The value of the diagnostic signature was also reflected in the analysis of immune features.
Conclusion: In summary, this study is the first to integrate 20 mainstream ML algorithms to construct an AI-based diagnostic predictor for gastric cancer based on lysosome-related genes. This model may facilitate the accurate prediction of early gastric cancer and the subsequent risk assessment or precise individualized immunotherapy, thus improving the survival prognosis of GC patients.
Introduction
Gastric cancer is a malignant tumor caused by a combination of genetic, site-specific, and microbiological factors, and the overall prognosis of gastric cancer patients has not improved significantly (1, 2). Both the incidence and mortality rates of gastric cancer are among the highest of all malignancies today, and the situation is becoming increasingly alarming in Eastern Asia (3). Under the paradigm of precision medicine, scholars are working to achieve more accurate and convenient early cancer screening by exploiting multi-omics data and combining informatics technology with traditional clinical screening tools at different levels (2, 4, 5). In this process, the exploration of new early diagnostic and prognostic biomarkers, or of preclinical diagnostic or grading models, could benefit gastric cancer diagnosis and treatment by enabling targeted prevention and personalized medical care.
In light of the emerging usefulness of lysosomes in the treatment of a wide range of malignant and non-malignant conditions, such as neurological and cardiovascular diseases, targeting lysosomes presents a novel approach to the prevention of gastric cancer (6). The lysosome was long thought to act as a caretaker or housekeeper of the individual cell, mainly because it was regarded as a static organelle whose initially perceived function, degradation, was unrelated to changes in the state of the cell (7). Nevertheless, increasingly fine-grained microscopic perspectives, combined with multi-omics and bioinformatics methodologies, have made it clear that lysosomes are far from being isolated from other cell organelles (8). Beyond their role in intracellular homeostasis, lysosomes have been shown to affect metabolic signaling, cell proliferation and differentiation, immune responses, and other processes (9). Additionally, their tight association with autophagy means that they are likewise engaged in diverse modes of cell death, such as ferroptosis, autophagy-dependent cell death, apoptosis, and pyroptosis (10, 11). Considering the crucial role of lysosomes in the progression of various human diseases as well as their prevalence, the value of lysosomes in translational medicine could be maximized by integrating multi-omics data in the era of precision medicine (12).
Therefore, we integrated and compared 20 mainstream classification algorithms in machine learning to identify the most suitable diagnostic model for stomach adenocarcinoma (STAD). Subsequent analyses of the tumor immune microenvironment and drug sensitivity revealed the potential immunotherapeutic applicability of our lysosomal gene model. In addition, a series of validations of one of the model genes, IFI30, including expression validation in different dimensions and functional assays, supported the view that the model genes themselves are non-negligible risk factors for GC (Figure 1).
Materials and methods
Data collection and processing
In the present study, we retrospectively collected lysosome-related genes from the MSigDB repository (https://www.gsea-msigdb.org/gsea/msigdb/cards/LYSOSOME) and transcriptomic data with matching clinical information for Stomach Adenocarcinoma (STAD) from the TCGA cohort (https://www.cancer.gov/tcga) (13). All data were processed with R and Python and randomly divided into a training set and a validation set, with 80% of the samples assigned to the training set. Unless otherwise specified, a P-value < 0.05 was considered statistically significant and is annotated as * within the figures. Likewise, **, ***, and **** within the figures indicate P-values below 0.01, 0.001, and 0.0001, respectively.
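As an illustration of the random partition described above, the following minimal sketch splits a list of sample identifiers into training and validation subsets. The identifiers, the seed, and the helper name `split_samples` are hypothetical; the source does not state whether the split was stratified or which function was used.

```python
import random

def split_samples(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle sample IDs and split them into training and validation lists."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = int(round(train_fraction * len(ids)))
    return ids[:cut], ids[cut:]

# Hypothetical sample identifiers standing in for TCGA barcodes.
samples = [f"sample_{i:03d}" for i in range(100)]
train_ids, valid_ids = split_samples(samples)
print(len(train_ids), len(valid_ids))
```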
Feature gene selection prior to diagnostic model construction
We first used the Recursive Feature Elimination (RFE) approach to determine the optimal number of feature genes for model construction (14). We then applied the Random Forest (RF) algorithm to rank the genes by importance, from highest to lowest (15). The top-ranked feature genes were then used for model construction.
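The two-step selection described above could be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the data are synthetic, the gene names are placeholders, and the choice of estimator, cross-validation folds, and the RMSE-based scorer are assumptions made only to mirror the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for an expression matrix (samples x genes); not TCGA data.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)
gene_names = np.array([f"gene_{i}" for i in range(X.shape[1])])

# Step 1: recursive feature elimination with cross-validation, scored by
# (negative) RMSE, to choose how many features to keep (assumed settings).
selector = RFECV(RandomForestClassifier(n_estimators=30, random_state=0),
                 step=1, cv=3, scoring="neg_root_mean_squared_error")
selector.fit(X, y)
n_selected = selector.n_features_

# Step 2: rank all features by Random Forest importance, highest first,
# and keep the top n_selected of them.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
top_genes = gene_names[order][:n_selected]
print(n_selected, list(top_genes))
```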
Machine learning
In accordance with the “No Free Lunch” theorem, which implies that no single algorithm is best for every problem, we ran 20 mainstream machine-learning algorithms to find the most suitable diagnostic model for STAD (16). The algorithms were Linear Regression, Ridge Regression, RidgeCV, Linear LASSO, LASSO, ElasticNet, BayesianRidge, Logistic Regression, SGD, SVM, KNN, Naive Bayes, Decision Tree, Bagging, Random Forest, Extra Tree, AdaBoost, GradientBoosting, Voting, and ANN. Their performance was mainly assessed by diagnostic Receiver Operating Characteristic (ROC) curves, in which the Area Under Curve (AUC) represents the predictive power, but we also considered accuracy, precision, recall, and F1 measurement to ensure that our criteria were sufficiently rigorous. A greater AUC value indicates better accuracy and robustness of the model. All the aforementioned curves were generated using the Python package “sklearn”.
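A comparison of this kind can be sketched with scikit-learn as shown below. The example uses synthetic data and only four of the 20 algorithms; the hyperparameters are arbitrary, and mapping "Extra Tree" to `ExtraTreesClassifier` is an assumption, since the source does not name the exact class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 8-feature data standing in for the 8-gene expression matrix.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Extra Tree": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

# Pick the model with the highest held-out ROC-AUC.
best = max(results, key=lambda n: results[n]["roc_auc"])
print(best, results[best])
```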
Learning curve
A learning curve is a graphical depiction of the relationship between proficiency and experience. In essence, machine learning allows artificial intelligence to mimic human learning behavior. Therefore, by visualizing the learning process of the model in the training set alongside its predictive performance in the validation set, we could directly assess whether the model performed robustly.
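The quantities behind such a curve can be computed with scikit-learn's `learning_curve`, as in the minimal sketch below. The data are synthetic, and the estimator, fold count, and training-size grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import learning_curve

# Synthetic data; not the study cohort.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

train_sizes, train_scores, valid_scores = learning_curve(
    ExtraTreesClassifier(n_estimators=50, random_state=0), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5, scoring="accuracy")

# Average over the cross-validation folds at each training size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
gap = train_mean - valid_mean  # a small gap suggests good generalization
for size, tr, va in zip(train_sizes, train_mean, valid_mean):
    print(size, round(tr, 3), round(va, 3))
```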
Decision curve analysis
Prognostic models and diagnostic tests are usually evaluated mathematically with measures of accuracy that do not consider clinical outcomes. To overcome this limitation, we introduced decision curve analysis (DCA) into the present study; DCA is often used to compare the clinical benefit of different predictive models when false positives and false negatives are inevitable (17, 18).
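For readers unfamiliar with DCA, the sketch below computes the standard net benefit at a single threshold probability, NB = TP/n - (FP/n) x pt/(1 - pt), and compares it with the treat-all strategy (treat-none always has a net benefit of 0). The labels and predicted probabilities are made-up illustrative values, not data from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = cancer) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
print(nb_model, nb_all)
```

A decision curve is obtained by repeating this calculation over a range of thresholds and plotting net benefit against the threshold.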
Single-sample gene set enrichment analysis
Single-sample gene set enrichment analysis (ssGSEA) is an extension of Gene Set Enrichment Analysis (GSEA) that produces a distinct enrichment score for every pairing of a sample and a gene set (19, 20). Each ssGSEA score shows the extent to which the genes in a given gene set are coordinately up- or down-regulated within an individual sample. In the present study, we defined the sum of the ssGSEA scores for the feature genes selected above as the Lysosomal Index (LI). All TCGA samples were allocated to high- and low-LI groups for the subsequent bioinformatic analyses.
Functional enrichment analysis
Traditionally, enrichment analysis uses the hypergeometric distribution to test whether a certain functional class is significantly over-represented in a group of genes. In recent years, such analyses have also been performed at the gene-set level, treating the genes of interest as a whole. In the present study, functional enrichment analysis was carried out in both ways.
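The hypergeometric test underlying traditional over-representation analysis can be written in a few lines. In this sketch, N is the number of background genes, K the number of background genes annotated to a functional class, n the size of the gene list of interest, and k the overlap; the numbers in the example are arbitrary, and the function name is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(overlap >= k) when n genes are drawn from N, of which K are annotated."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 20 background genes, 5 annotated, 5 selected, all 5 overlap.
p_value = hypergeom_enrichment_p(N=20, K=5, n=5, k=5)
print(p_value)
```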
Analysis of the tumor microenvironment
The R package “ESTIMATE” was employed to calculate the scores of stromal cells, infiltrating immune cells, and tumor purity on the basis of gene expression. In this way, we unraveled the in-depth correlation between the LI and the surroundings of the malignant cells. The immune cell infiltration analysis was done by the CIBERSORT algorithm which is a widely used immunoinformatic tool to uncover the immunological implications of various diseases nowadays (21).
Cell culturing
The human gastric mucosal epithelial cell line GES-1 and the human gastric cancer cell lines SGC-7901, BGC-823, MGC-803, and HGC-27, all authenticated by short tandem repeat (STR) DNA typing, were purchased from the Shanghai Institutes of Biological Sciences (CAS). The cells were cultured in DMEM (HyClone) supplemented with 10% fetal bovine serum (Gibco, Carlsbad, CA, USA) at 37°C in a humidified incubator containing 5% CO2, and the culture medium was changed once a day.
Real-time PCR
Total RNA was extracted using RNAiso Plus (Takara, Dalian, China) and reverse-transcribed with the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s protocol. SYBR Green-based real-time quantitative polymerase chain reaction (RT-qPCR) was then carried out using GAPDH as the internal reference. The primers were GAPDH-F: GGTGAAGGTCGGTGTGAACG; GAPDH-R: CTCGCTCCTGGAAGATGGTG; IFI30-F: GTGGGAGTTCAAGTGCCAGCAT; IFI30-R: GCAGACAATGGTCAGGAAGGCT. Relative fold changes in RNA expression were calculated by the 2^-ΔΔCT method.
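The 2^-ΔΔCT calculation can be illustrated as follows: ΔCT is the target CT minus the reference-gene CT within each condition, and ΔΔCT is the ΔCT of the sample minus the ΔCT of the control. The CT values below are invented for illustration and are not measurements from the study.

```python
def relative_fold_change(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCT method."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCT in the test sample
    delta_control = ct_target_control - ct_ref_control   # dCT in the control
    delta_delta = delta_sample - delta_control           # ddCT
    return 2 ** (-delta_delta)

# Hypothetical CT values: target gene vs. GAPDH in a cancer line and in the control line.
fold = relative_fold_change(ct_target_sample=22.0, ct_ref_sample=18.0,
                            ct_target_control=25.0, ct_ref_control=18.0)
print(fold)  # 8.0
```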
Western blot
Target cells were washed with cold PBS and lysed with cell lysis buffer according to the reagent manufacturer’s recommendation. The extracted proteins were separated by SDS-PAGE and then transferred to PVDF membranes. After blocking with 5% bovine serum albumin, the membranes were incubated with primary antibodies overnight at 4°C, followed by incubation with secondary antibodies.
Immunofluorescence staining
The five cell lines mentioned above (GES-1, SGC-7901, BGC-823, MGC-803, and HGC-27) were treated as recommended by the reagent supplier and incubated with IFI30 primary antibody for 12 h, followed by incubation with a specific secondary antibody at 37°C in the dark for 2 h. After staining with DAPI at room temperature, fluorescent images were captured with a confocal microscope system.
Immunohistochemical staining
Pre-fixed and paraffin-embedded pathological specimens were cut into 5 μm sections and dewaxed. Endogenous peroxidase was inactivated with 3% H2O2. The treated sections were incubated overnight at 4°C with the corresponding primary antibodies, then incubated with secondary antibodies for 30 min, followed by chromogenic development with freshly prepared DAB reagent.
Transwell, invasion and wound healing assay
Transfected gastric cancer cells with high expression of IFI30 were seeded in 6-well plates, and the cell monolayer was scratched manually and then kept under standard culture conditions for 48 h. Photographs were taken every 12 h to observe wound healing. For the transwell assays, 100,000 pretreated cells (in 100 μL) or control cells were seeded into transwell chambers, and the migration and invasion assays were performed without or with a matrix gel coating, respectively. After 24 h of cell growth, the cells were fixed with paraformaldehyde, stained with 0.05% crystal violet for over 30 min, and then counted.
Statistical analysis
The bioinformatic analyses in this study were performed with R and Python. The t-test and the Kruskal-Wallis test were used to compare transcriptomic data between groups, and the Pearson or Spearman method was used for correlation analyses. P-values < 0.05 were considered significant (*, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001).
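The star notation used in the figures maps to P-value thresholds as in the small helper below. The function name is hypothetical, and the "ns" label for non-significant results follows the convention used in the Figure 6 caption.

```python
def significance_label(p_value):
    """Translate a P-value into the star annotation used in the figures."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # not significant (p >= 0.05)

labels = [significance_label(p) for p in (0.2, 0.03, 0.004, 0.0005, 0.00001)]
print(labels)  # ['ns', '*', '**', '***', '****']
```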
Results
8 lysosome-related genes were chosen to construct the predictor
Based on the stratification of tumor samples and healthy controls, we screened the differentially expressed genes (DEGs), among which the majority of lysosome-related genes were represented (Figure 2A). Then, using the Recursive Feature Elimination (RFE) algorithm, we observed that when fewer than 8 genes were involved in model construction, the Root Mean Square Error (RMSE) increased significantly, whereas when the number exceeded 8, the RMSE fluctuated within an acceptable range (Figure 2B). We therefore determined that using 8 lysosome-related genes for predictor construction was the most suitable solution. To specify the top 8 candidate genes, we used the Random Forest (RF) algorithm to rank their importance in Stomach Adenocarcinoma (STAD) diagnosis. As a result, ADRB2, KCNE2, MYO7A, IFI30, LAMP3, TPP1, HPS4, and NEU4 were chosen (Figure 2C). Inter-correlation analyses between these genes were also conducted to characterize them in STAD and healthy controls (Supplementary S1). Additionally, we explored differences in the enrichment of lysosome-related gene sets. Notably, lysosome and lysosomal membrane were among the most enriched items across both the KEGG and Reactome databases (Figure 3A). The expression level of each gene used to construct the predictive model is shown as a box plot (Figure 3B). To gain a deeper understanding of the model genes, we examined their expression in the gastric cancer single-cell dataset GSE167297 (Figures 3C, D). All of the genes were found to be expressed in cell clusters, with LAMP3, IFI30, TPP1, KCNE2, and ADRB2 being more prominently expressed. These five genes were most prominently represented in functional cells such as dendritic cells (DCs), macrophages, endothelial cells, and mast cells, which lends some support to the reliability of the current model (Figure 3E).
Figure 2 The detailed process of feature gene selection. (A) Volcano plot demonstrating the up- and down-regulated genes in Stomach Adenocarcinoma (STAD). (B) Scree plot demonstrating the change in cross-validation Root Mean Square Error (RMSE) with different numbers of feature genes involved in the construction of the diagnostic predictor. (C) Importance ranking by the Random Forest (RF) algorithm. The top 8 genes were selected.
Figure 3 (A) Box plot demonstrating the enriched items with statistical significance. (B) Box plot demonstrating the expression of selected genes in the TCGA dataset. (C) Annotation of all cell types in GSE167297 and percentage of each cell type. (D) Illustrations of the percentage of cells in different samples. (E) Expression of ADRB2, KCNE2, MYO7A, IFI30, LAMP3, TPP1, HPS4, and NEU4 in diverse cells. *p< 0.05; ***, p < 0.001.
Extra tree was the most superior machine learning algorithm
To ensure a comprehensive comparison of the 20 mainstream machine-learning algorithms, we evaluated the models not only by ROC-AUC values but also by accuracy, precision, recall, and F1 measurement. For accuracy, recall, and F1 measurement, Extra Tree and Random Forest were the best-performing models (Figures 4A, C, D), while for precision, all models except Linear LASSO and LASSO gave satisfactory predictions (Figure 4B). All models achieved a ROC-AUC value above 0.7, the threshold generally used to consider a model good enough for classification problems; Extra Tree had the leading ROC-AUC value of 0.92, followed by Bagging and Naïve Bayes (Figure 4E). We then used the DCA curve to inspect the clinical benefit that the Extra Tree model could bring to real-world practice. As indicated, the model offered an improvement over the treat-all and treat-none strategies (Figure 4F). We also reviewed the learning process behind the model, visualized as a learning curve (Figure 4G). The curve showed that the learning score in the training set was stable and perfect. Overall, the difference between the training score and the testing score was less than 10%; the model was therefore deemed to have high generalization ability.
Figure 4 Multifaceted evaluation of 20 mainstream machine-learning models. (A-D) Radar plots demonstrating accuracy (A), precision (B), recall (C), and F1 measurement (D) in the training set and test set. (E) Receiver Operating Characteristic (ROC) curves comparing the Area Under Curve (AUC) value of each machine-learning model. In general, an AUC value over 0.7 is considered good predictive performance. (F) Decision Curve Analysis (DCA) for the Extra Tree model. The higher the position of the curve, the greater the clinical benefit. (G) Learning curve of the Extra Tree model. The closer the learning and testing results, the more robust the model.
Patients with high- and low-lysosome index showed significant morphological changes in their external gastric appearance
Traditional enrichment analysis was performed to identify whether the enriched gene ontology (GO) terms and signaling pathways differed between the high- and low-LI groups. The GO results indicated that cornification was the most distinguishing biological process, followed by digestion, keratinocyte differentiation, muscle contraction, and keratinization (Figure 5A). Notably, differences in cellular components, including contractile fiber and cornified envelope, were also prominent. In short, differences in LI appear to be associated with morphological changes on the gastric surface. On the other hand, the most enriched pathways differed when the KEGG and Reactome databases were applied separately. In KEGG, pathways relevant to secretion stood out, such as Pancreatic secretion, Bile secretion, Salivary secretion, Gastric acid secretion, and Insulin secretion (Figure 5B). The Reactome results, however, supported the GO enrichment results, as the top pathways included Formation of the cornified envelope, Keratinization, Muscle contraction, and so on (Figure 5C). We therefore conducted a GSEA analysis to investigate this discrepancy further. Again, changes in epithelial morphology were seen in the GO enrichment results (Figure 5D), while the KEGG enrichment results remained secretion-centered (Figure 5E). Moreover, Keratinization again appeared among the most enriched Reactome pathways (Figure 5F).
Figure 5 LI-based enrichment analysis. (A-C) Presentation of the top 10 differential pathways from GO (A), KEGG (B), and Reactome (C) enrichment analysis via the traditional method. (D-F) Results of GSEA analysis according to the GO (D), KEGG (E), and Reactome (F) databases, respectively.
Patients in the high- and low-LI groups possessed different tumor immunological microenvironment characteristics, predictive immunotherapy efficacy, and chemosensitivity
The R package “ESTIMATE” was used to quantitatively characterize the tumor immune microenvironment (TIME). Except for the stromal score, the differences in immune score, ESTIMATE score, and tumor purity between groups were statistically significant, and higher immune and ESTIMATE scores were observed in the low-LI group than in the high-LI group (Figure 6A). We therefore also explored the abundance of infiltrating immune cells in each patient. Although some fluctuation was observed in their distribution, immune cell infiltration was evident in both LI groups (Figure 6B). To show the difference between the high- and low-LI groups, we present the comparison as a heatmap, in which specific immune cells, such as CD4 memory T cells, CD8 T cells, follicular T helper cells, regulatory T cells, and M0, M1, and M2 macrophages, differed with statistical significance (Figure 6C). Furthermore, the TIDE algorithm was applied to predict immunotherapy efficacy. We confirmed the close association of CD8 T cells, the main force against tumor malignancy, with the LI groups (Figure 6D). According to the developers of TIDE, higher TIDE scores are usually accompanied by poorer immunotherapy efficacy. The higher-LI group corresponded to lower TIDE scores, hinting at a potential advantage for the higher-LI population in receiving immunotherapy. Finally, we screened possible drugs targeting LI genes using the Cancer Genome Project (CGP) database. Remarkable differences in the IC50 values of 8 drugs, namely Cisplatin, Elesclomol, FMK, GSK1070916, GSK429286A, HG−5−113−01, T0901317, and Talazoparib, were observed between the high- and low-LI groups (Figure 6E).
Notably, the lower-LI group showed reduced IC50 values for all 8 drugs, suggesting that patients in the lower-LI group were more sensitive to chemotherapy.
Figure 6 TIME characteristics, predictive immunotherapy efficacy, and chemosensitivity in high- and low-LI groups. (A) ESTIMATEScore, ImmuneScore, StromalScore, and tumor purity of the high- and low-LI groups. (B) Stacked graph demonstrating the abundance and distribution of the infiltrating immune cell in each sample. (C) Heatmap demonstrating the statistically significant infiltrating immune cells in high- and low-LI groups. (D-E) Box plots demonstrating the results of TIDE prediction of high- and low-LI groups (D) and the comparisons of chemosensitivity for each drug (E). Ns, p≥0.05; **, p<0.01; ***, p < 0.001.
Aberrant overexpression of IFI30 in gastric cancer impacts on tumor cell viability
To further support the diagnostic efficacy of the model, and after excluding genes that are already relatively well studied in GC in the literature, we focused on IFI30 for wet-lab validation. We examined its mRNA and protein expression levels in the normal gastric mucosal epithelial cell line GES-1 and four gastric cancer cell lines. At the transcriptional level, RT-qPCR revealed that IFI30 expression was significantly higher in the tumor cell lines than in GES-1 (Figure 7A). This was further verified at the protein level by Western blot (Figure 7B). Overall, the trend was consistent in both validations, with BGC-823 exhibiting the highest IFI30 expression level, followed by SGC-7901, HGC-27, and MGC-803. These conclusions were also supported by our immunofluorescence assays (Figure 7C). Immunohistochemical staining of 3 pairs of patient tumor samples and their paracancerous tissues likewise supported these conclusions (Figure 7D). Next, an siRNA targeting IFI30 was designed, and PCR assays verified that it significantly attenuated IFI30 expression (Figure 8A). When IFI30 expression in GC cells was attenuated, a significant decrease in migration and invasive ability was observed, suggesting that the genes involved in model construction may themselves be major risk factors for gastric cancer, independent of the model (Figures 8B-H).
Figure 7 Validation of IFI30 as a potential diagnostic biomarker in GC. (A) Results of RT-qPCR of IFI30 in the GES-1, BGC-823, SGC-7901, HGC-27, and MGC-803 cell lines. (B) Results of Western blot in GC cell lines. (C) Immunofluorescence staining at 100X and 400X magnification demonstrating the expression abundance of IFI30 in GC cell lines. (D) Immunohistochemical staining of 3 pairs of patient tumor samples and their paracancerous tissues (20X). *p< 0.05; **, p<0.01; ***, p < 0.001.
Figure 8 (A) siRNA-IFI30 efficiently depresses the expression of IFI30. (B) Relative cell migration number of migration assay. (C) Relative cell invasion numbers in the invasion assay. (D) Relative scratch healing area of BGC-823 and SGC-7901. (E) Migration assay after reduction of IFI30 expression in BGC-823 and SGC-7901. (F) Invasion assay after reducing the expression of IFI30 in BGC-823 and SGC-7901. (G, H) Wound healing assay after reducing the expression of IFI30 in BGC-823 and SGC-7901. *p< 0.05; **, p<0.01; ***, p < 0.001.
Discussion
Early diagnosis could change the plight of gastric cancer treatment: patients diagnosed at an early stage have a survival chance of about 90%, whereas those with advanced gastric cancer have a survival chance of less than one third owing to significant heterogeneity (22). The early diagnosis of gastric cancer therefore remains a critical focus for many scholars (22, 23). On the other hand, targeting lysosomes has shown immense potential in the treatment of diseases including malignancies, although current therapeutic tools are limited by the difficulty of precisely targeting lysosomes and of the subsequent modulation measures (9). Indeed, both autophagy itself and the autophagic lysosomal pathway have been the subject of a flurry of research in human cancers because of their potential for deciphering diverse diseases (24). Autophagy is a regulatory mechanism that sustains cellular homeostasis by degrading cellular components marked for elimination under stresses such as senescence or damage (25, 26). The proper functioning of the autophagic system relies on the degradative capacity of lysosomes, a process carried out in response to cellular stress induced by starvation or proteotoxic aggregates (25, 27). When cell growth becomes dysregulated, as in cancer, the autophagy-lysosome pathway adapts to abnormal stress signals in the tumor microenvironment and thereby differentially affects tumor progression; this process involves key hallmarks such as immune infiltration and tumor metabolism and can either suppress tumors or contribute to cancer (26).
For gastric cancer, Tan et al. previously analyzed genome-wide association study data and reported that genetic variation in genes across the autophagic lysosomal pathway may be significantly associated with susceptibility to gastric cancer (24). Beyond data analytics, Kuang et al. attempted to develop a new lysosome-targeted therapeutic agent for gastric cancer, further highlighting such possibilities (28). Meanwhile, machine learning methods are currently applied mainly to the processing of medical images of gastric cancer, including endoscopy, radiological imaging, and pathology, from which radiomics and pathomics have been derived (29, 30). However, applying artificial intelligence to genomic information to establish a diagnostic prediction model for gastric cancer would complement these macroscopic features and provide new opportunities for personalized medicine for patients with gastric cancer.
Based on the assumptions described above, our study supports such a possibility. To begin with, we selected eight lysosome-related genes for predictive model construction, using RMSE as the reference standard and the RF algorithm for ranking: ADRB2, KCNE2, MYO7A, IFI30, LAMP3, TPP1, HPS4, and NEU4. The ADRB2 signaling pathway has been recognized in previous studies as contributing to gastric carcinogenesis and metastasis through β-adrenergic stress activation, possibly involving autophagy, and ADRB2 has been shown to act as a negative prognostic biomarker for gastric cancer (31–33). Multiple lines of histological evidence indicate that KCNE2 is expressed at lower levels in gastric cancer than in normal tissues and that its deficiency is likely to be a potential risk factor for gastric cancer (34, 35). Unlike these two genes, LAMP3 (CD208) exerts its biological function mainly through its influence on the tumor microenvironment of gastric cancer. In an earlier study, Ishigami et al. noted that LAMP3 could be considered a marker of mature dendritic cells owing to its specific expression upon activation of human dendritic cells, and speculated that the degree of infiltration of CD208-positive cells was negatively correlated with surgical outcome in patients undergoing radical gastric cancer surgery (36). Sun et al. described this in more detail, showing that LAMP3+ DCs are involved in mediating T-cell activity and can form aggregation sites for cell-to-cell interactions in the gastric cancer tumor microenvironment, and on this basis emphasized the possibility of targeting LAMP3 in GC (36, 37). In addition, TPP1 has been demonstrated to act as a biomarker for gastric cancer during its progression (38).
IFI30, MYO7A, HPS4, and NEU4 have not been mentioned in studies of gastric cancer, and we speculate that these four genes have sufficient potential to influence its progression; further exploration could help improve the current state of gastric cancer research and clinical application. In fact, our experimental validation of IFI30, the most contributing gene, supported this speculation to a certain extent. Previous studies similarly demonstrated the ability of IFI30 either to affect the redox state of cells, leading to the regulation of autophagy, cell activation, and proliferation, or to modulate T-cell tolerance and thus limit potentially arising autoimmunity (39). In recent years, many scholars have extended this work to tumor immunity, revealing the great potential of IFI30 in the tumor immune microenvironment. For example, in melanoma, IFI30 could boost the processing and presentation of the tumor antigens TRP1 and TRP2, resulting in enhanced anti-tumor T-cell responses and ultimately higher patient survival (40–42). The same potential was observed in DLBC, BRCA, COAD, GBM, and other cancers (42, 43).
Furthermore, we compared current mainstream machine learning algorithms on the eight selected genes to identify the optimal predictive model. Based on accuracy, precision, recall, and F1 score, the extra tree and random forest algorithms were shortlisted. After also considering the ROC-AUC value, we concluded that the Extra Tree model built on lysosomal genes was the optimal choice for diagnosis, a conclusion further supported by the DCA curves and learning curves. We therefore propose that the Extra Tree model could complement gastroscopy, currently the main screening tool for early gastric cancer. Guided by the model's predictions, risk factors could be comprehensively assessed and actively controlled to reduce or delay the occurrence and progression of the disease, or secondary and tertiary prevention could be initiated promptly to counteract its deterioration (44–46).
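As an illustration of how the comparison metrics named above are computed, the following minimal sketch derives accuracy, precision, recall, F1, and ROC-AUC from predicted probabilities in plain Python. The labels, the two score lists, the 0.5 threshold, and all function names are hypothetical and chosen for illustration only; they are not the study data or the authors' pipeline.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = tumor)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(y_true: Sequence[int], y_score: Sequence[float],
              threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores (assumed cutoff 0.5) and compute the five metrics."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    c = confusion_counts(y_true, y_pred)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy labels and predicted probabilities (hypothetical, not study data).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = {
    "extra_trees": [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1],
    "random_forest": [0.9, 0.6, 0.4, 0.3, 0.5, 0.2, 0.7, 0.1],
}
report = {name: summarize(labels, s) for name, s in scores.items()}
best = max(report, key=lambda name: report[name]["roc_auc"])
for name, metrics in report.items():
    print(name, metrics)
```

Ranking by ROC-AUC after a first pass on the threshold-based metrics mirrors the order of considerations described above; in practice the metrics would be computed on held-out data.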
To further examine these observations, samples were scored and grouped using ssGSEA. The three main enrichment analyses performed on the resulting groups converged on two key terms: secretion and keratinization. We interpret this from a pathological and morphological point of view. Deregulated growth beyond normal tissue structure is a hallmark of cancerous tissue, and, given the structure of the gastric glands, it is conceivable that gastric glandular cells with a cancerous tendency exhibit abnormalities in both gastric acid secretion and keratinization, which is broadly consistent with existing research (47). Additionally, we explored immunotherapy response as a further direction (48). First, we assessed immune cell abundance in the two LI-based subgroups and found that immune infiltrating cells, which play an integral role in the tumor microenvironment, differed significantly between them. This is in line with our expectations, as current studies indicate that differences in the immune microenvironment can be an essential contributor to resistance or sensitivity to immunotherapy (49–51). We then used the TIDE algorithm to predict immunotherapy response based on our model. The higher-LI group had lower TIDE scores, suggesting that patients in this subgroup might benefit more from immunotherapy, which could help optimize current gastric cancer treatment regimens.
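The scoring-and-grouping step can be pictured with the simplified sketch below. It is not ssGSEA itself: it replaces the enrichment statistic with the mean within-sample rank of the signature genes, and it assumes a median cutoff for forming the high and low groups, which the text does not specify. The background genes BG1 and BG2, the expression values, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict, List

# The eight signature genes named in the text.
SIGNATURE = ["ADRB2", "KCNE2", "MYO7A", "IFI30", "LAMP3", "TPP1", "HPS4", "NEU4"]


def rank_score(expression: Dict[str, float], signature: List[str]) -> float:
    """Mean within-sample rank of the signature genes, scaled to (0, 1].

    Simplified stand-in for ssGSEA: genes are ranked inside one sample
    (lowest expression = rank 1) and the ranks of the signature genes
    present in that sample are averaged. Ties are not handled specially.
    """
    ordered = sorted(expression, key=expression.get)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    present = [g for g in signature if g in rank]
    if not present:
        raise ValueError("no signature genes found in sample")
    return sum(rank[g] for g in present) / (len(present) * len(ordered))


def median_split(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'high' or 'low' relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in scores.items()}


# Toy expression profiles (hypothetical values; BG1/BG2 are made-up background genes).
samples = {
    "S1": {"IFI30": 9.0, "LAMP3": 8.0, "BG1": 2.0, "BG2": 1.0},
    "S2": {"IFI30": 1.0, "LAMP3": 2.0, "BG1": 8.0, "BG2": 9.0},
    "S3": {"IFI30": 7.0, "LAMP3": 1.0, "BG1": 5.0, "BG2": 3.0},
    "S4": {"IFI30": 1.0, "LAMP3": 6.0, "BG1": 3.0, "BG2": 8.0},
}
li_scores = {name: rank_score(expr, SIGNATURE) for name, expr in samples.items()}
li_groups = median_split(li_scores)
print(li_scores)
print(li_groups)
```

Downstream comparisons such as immune cell abundance or TIDE scores would then be made between the two resulting groups.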
Moreover, available data indicate that single-modality therapies are less effective than the combination regimens currently being explored, mainly combinations of chemotherapy, targeted therapy, and radiotherapy, so escalating therapeutic combinations could offer personalized treatment options for gastric cancer patients (50, 52). Prediction of drug targets from transcriptomic data is now a common approach (53). Taking this as a starting point, we screened candidate drugs targeting the LI genes. Eight drugs drew our attention: cisplatin, eletriptanil, FMK, GSK1070916, GSK429286A, HG-5-113-01, T0901317, and talazoparib. The results implied that patients in the lower-LI group were more sensitive to these drugs, which may help inform individualized treatment of GC.
Overall, our study focused on the potential application of lysosome-related genes in GC. By integrating around 20 mainstream machine learning algorithms, we constructed an AI-based diagnostic predictor and developed the lysosomal index (LI), which supports immune assessment of patients as well as drug prediction. This could benefit intelligent adjunctive therapeutic approaches for GC patients and thereby facilitate personalized management regimens.
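A group difference in predicted drug sensitivity of this kind can be checked with a simple nonparametric test. The sketch below uses an exact two-sided permutation test on the difference in group means; the text does not state which test was used, so this is only one reasonable choice. The sensitivity values are hypothetical, with lower values assumed to mean greater sensitivity (for example, an estimated IC50).

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of assigning the pooled values to a group of
    size len(group_a) and counts how often the absolute mean difference
    is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Toy predicted sensitivity values (hypothetical; lower = more sensitive).
low_li = [1.0, 1.2, 1.1, 0.9]
high_li = [2.0, 2.3, 1.9, 2.2]
p_value = permutation_p_value(low_li, high_li)
print(f"mean low-LI = {mean(low_li):.2f}, mean high-LI = {mean(high_li):.2f}, p = {p_value:.4f}")
```

Exact enumeration is only practical for small groups; with cohort-sized groups a rank-based test or a sampled permutation test would be the usual substitute.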
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
Data retrieved from the TCGA database was collected from patients who provided informed consent based on guidelines laid out by the TCGA Ethics, Law, and Policy Group. The procedures used in this study adhere to the tenets of the Declaration of Helsinki and approval was obtained from the ethics committee of the Affiliated Hospital of Jiangsu University.
Author contributions
QW, YL, ZZL, and YT designed the present study, prepared the figures, and drafted the manuscript. XH, SZ, LW, and YL provided professional guidance in pathology and were responsible for data curation and preprocessing. WL, HX, and ZZL were in charge of visualization, enhanced the figures, and polished the language. ZRL and MX edited and revised the manuscript and supervised the project. MX was responsible for funding acquisition. All authors contributed to the article and approved the submitted version.
Funding
This study was supported by the National Natural Science Foundation of China (No. 82072754), Jiangsu Provincial Key Research and Development Program (No. BE2018689), Natural Science Foundation of Jiangsu Province (No. M2020011), and Zhenjiang Key Research and Development Program (No. SH2018033).
Acknowledgments
We express our deep gratitude to the public databases, including TCGA, GeneCards, OMIM, CTD, and others, for providing openly accessible, high-quality research resources. We also sincerely thank the National Natural Science Foundation of China and the Natural Science Foundation of Jiangsu Province for generously funding the present study.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2023.1182277/full#supplementary-material
References
1. Kinoshita T, Uyama I, Terashima M, Noshiro H, Nagai E, Obama K, et al. Long-term outcomes of laparoscopic versus open surgery for clinical stage II/III gastric cancer: a multicenter cohort study in Japan (LOC-a study). Ann Surg (2019) 269(5):887–94. doi: 10.1097/SLA.0000000000002768
2. Golubnitschaja O, Baban B, Boniolo G, Wang W, Bubnov R, Kapalla M, et al. Medicine in the early twenty-first century: paradigm and anticipation - EPMA position paper 2016. Epma J (2016) 7(1):23. doi: 10.1186/s13167-016-0072-4
3. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin (2021) 71(3):209–49. doi: 10.3322/caac.21660
4. Hirasawa T, Ikenoyama Y, Ishioka M, Namikawa K, Horiuchi Y, Nakashima H, et al. Current status and future perspective of artificial intelligence applications in endoscopic diagnosis and management of gastric cancer. Dig Endosc (2021) 33(2):263–72. doi: 10.1111/den.13890
5. Zou Y, Xie J, Zheng S, Liu W, Tang Y, Tian W, et al. Leveraging diverse cell-death patterns to predict the prognosis and drug sensitivity of triple-negative breast cancer patients after surgery. Int J Surg (2022) 107:106936. doi: 10.1016/j.ijsu.2022.106936
6. Iulianna T, Kuldeep N, Eric F. The achilles' heel of cancer: targeting tumors via lysosome-induced immunogenic cell death. Cell Death Dis (2022) 13(5):509. doi: 10.1038/s41419-022-04912-8
7. Ballabio A, Bonifacino JS. Lysosomes as dynamic regulators of cell and organismal homeostasis. Nat Rev Mol Cell Biol (2020) 21(2):101–18. doi: 10.1038/s41580-019-0185-4
8. Zhang Z, Yue P, Lu T, Wang Y, Wei Y, Wei X. Role of lysosomes in physiological activities, diseases, and therapy. J Hematol Oncol (2021) 14(1):79. doi: 10.1186/s13045-021-01087-1
9. Cao M, Luo X, Wu K, He X. Targeting lysosomes in human disease: from basic research to clinical applications. Signal Transduct Target Ther (2021) 6(1):379. doi: 10.1038/s41392-021-00778-y
10. Mahapatra KK, Mishra SR, Behera BP, Patil S, Gewirtz DA, Bhutia SK. The lysosome as an imperative regulator of autophagy and cell death. Cell Mol Life Sci (2021) 78(23):7435–49. doi: 10.1007/s00018-021-03988-3
11. Noguchi M, Hirata N, Tanaka T, Suizu F, Nakajima H, Chiorini JA. Autophagy as a modulator of cell death machinery. Cell Death Dis (2020) 11(7):517. doi: 10.1038/s41419-020-2724-5
12. Lu M, Zhan H, Liu B, Li D, Li W, Chen X, et al. N6-methyladenosine-related non-coding RNAs are potential prognostic and immunotherapeutic responsiveness biomarkers for bladder cancer. Epma J (2021) 12(4):589–604. doi: 10.1007/s13167-021-00259-w
13. Zhang J, Bajari R, Andric D, Gerthoffert F, Lepsa A, Nahal-Bose H, et al. The international cancer genome consortium data portal. Nat Biotechnol (2019) 37(4):367–9. doi: 10.1038/s41587-019-0055-9
14. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learning (2002) 46(1):389–422. doi: 10.1023/A:1012487302797
15. Ho TK. Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition (1995) 1:278–82. doi: 10.1109/ICDAR.1995.598994
16. Wolpert DH, Macready WG. No free lunch theorems for search, working papers. (1995) Santa Fe Institute.
17. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res (2019) 3:18. doi: 10.1186/s41512-019-0064-7
18. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making (2006) 26(6):565–74. doi: 10.1177/0272989X06295361
19. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet (2003) 34(3):267–73. doi: 10.1038/ng1180
20. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA (2005) 102(43):15545–50. doi: 10.1073/pnas.0506580102
21. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol Biol (2018) 1711:243–59. doi: 10.1007/978-1-4939-7493-1_12
22. Grady WM, Yu M, Markowitz SD. Epigenetic alterations in the gastrointestinal tract: current and emerging use for biomarkers of cancer. Gastroenterology (2021) 160(3):690–709. doi: 10.1053/j.gastro.2020.09.058
23. Mabe K, Inoue K, Kamada T, Kato K, Kato M, Haruma K. Endoscopic screening for gastric cancer in Japan: current status and future perspectives. Dig Endosc (2022) 34(3):412–9. doi: 10.1111/den.14063
24. Tan J, Fu L, Chen H, Guan J, Chen Y, Fang J. Association study of genetic variation in the autophagy lysosome pathway genes and risk of eight kinds of cancers. Int J Cancer (2018) 143(1):80–7. doi: 10.1002/ijc.31288
25. Russell RC, Guan KL. The multifaceted role of autophagy in cancer. EMBO J (2022) 41(13):e110031. doi: 10.15252/embj.2021110031
26. Poillet-Perez L, Sarry JE, Joffre C. Autophagy is a major metabolic regulator involved in cancer therapy resistance. Cell Rep (2021) 36(7):109528. doi: 10.1016/j.celrep.2021.109528
27. Jain V, Singh MP, Amaravadi RK. Recent advances in targeting autophagy in cancer. Trends Pharmacol Sci (2023) 44(5):290–302. doi: 10.1016/j.tips.2023.02.003
28. Kuang S, Liao X, Zhang X, Rees TW, Guan R, Xiong K, et al. FerriIridium: a lysosome-targeting Iron(III)-activated Iridium(III) prodrug for chemotherapy in gastric cancer cells. Angew Chem Int Ed Engl (2020) 59(8):3315–21. doi: 10.1002/anie.201915828
29. Sharma P, Hassan C. Artificial intelligence and deep learning for upper gastrointestinal neoplasia. Gastroenterology (2022) 162(4):1056–66. doi: 10.1053/j.gastro.2021.11.040
30. Hussein M, González-Bueno Puyal J, Lines D, Sehgal V, Toth D, Ahmad OF, et al. A new artificial intelligence system successfully detects and localises early neoplasia in barrett's esophagus by using convolutional neural networks. United Eur Gastroenterol J (2022) 10(6):528–37. doi: 10.1002/ueg2.12233
31. Zhi X, Li B, Li Z, Zhang J, Yu J, Zhang L, et al. Adrenergic modulation of AMPK−dependent autophagy by chronic stress enhances cell proliferation and survival in gastric cancer. Int J Oncol (2019) 54(5):1625–38. doi: 10.3892/ijo.2019.4753
32. Zong C, Yang M, Guo X, Ji W. Chronic restraint stress promotes gastric epithelial malignant transformation by activating the Akt/p53 signaling pathway via ADRB2. Oncol Lett (2022) 24(3):300. doi: 10.3892/ol.2022.13420
33. Zhang X, Zhang Y, He Z, Yin K, Li B, Zhang L, et al. Chronic stress promotes gastric cancer progression and metastasis: an essential role for ADRB2. Cell Death Dis (2019) 10(11):788. doi: 10.1038/s41419-019-2030-2
34. Roepke TK, Purtell K, King EC, La Perle KM, Lerner DJ, Abbott GW. Targeted deletion of Kcne2 causes gastritis cystica profunda and gastric neoplasia. PloS One (2010) 5(7):e11451. doi: 10.1371/journal.pone.0011451
35. Yanglin P, Lina Z, Zhiguo L, Na L, Haifeng J, Guoyun Z, et al. KCNE2, a down-regulated gene identified by in silico analysis, suppressed proliferation of gastric cancer cells. Cancer Lett (2007) 246(1-2):129–38. doi: 10.1016/j.canlet.2006.02.010
36. Ishigami S, Ueno S, Matsumoto M, Okumura H, Arigami T, Uchikado Y, et al. Prognostic value of CD208-positive cell infiltration in gastric cancer. Cancer Immunol Immunother (2010) 59(3):389–95. doi: 10.1007/s00262-009-0758-8
37. Sun K, Xu R, Ma F, Yang N, Li Y, Sun X, et al. scRNA-seq of gastric tumor shows complex intercellular interaction with an alternative T cell exhaustion trajectory. Nat Commun (2022) 13(1):4943. doi: 10.1038/s41467-022-32627-z
38. Huang K, Chen S, Xie R, Jiang P, Yu C, Fang J, et al. Identification of three predictors of gastric cancer progression and prognosis. FEBS Open Bio (2020) 10(9):1891–9. doi: 10.1002/2211-5463.12943
39. Rausch MP, Meador LR, Metzger TC, Li H, Qiu S, Anderson MS, et al. GILT in thymic epithelial cells facilitates central CD4 T cell tolerance to a tissue-restricted, melanoma-associated self-antigen. J Immunol (2020) 204(11):2877–86. doi: 10.4049/jimmunol.1900523
40. Nguyen J, Bernert R, In K, Kang P, Sebastiao N, Hu C, et al. Gamma-interferon-inducible lysosomal thiol reductase is upregulated in human melanoma. Melanoma Res (2016) 26(2):125–37. doi: 10.1097/CMR.0000000000000230
41. Hathaway-Schrader JD, Doonan BP, Hossain A, Radwan FFY, Zhang L, Haque A. Autophagy-dependent crosstalk between GILT and PAX-3 influences radiation sensitivity of human melanoma cells. J Cell Biochem (2018) 119(2):2212–21. doi: 10.1002/jcb.26383
42. Buetow KH, Meador LR, Menon H, Lu YK, Brill J, Cui H, et al. High GILT expression and an active and intact MHC class II antigen presentation pathway are associated with improved survival in melanoma. J Immunol (2019) 203(10):2577–87. doi: 10.4049/jimmunol.1900476
43. Ye C, Zhou W, Wang F, Yin G, Zhang X, Kong L, et al. Prognostic value of gamma-interferon-inducible lysosomal thiol reductase expression in female patients diagnosed with breast cancer. Int J Cancer (2022) 150(4):705–17. doi: 10.1002/ijc.33843
44. Gu YJ, Chen LM, Gu ME, Xu HX, Li J, Wu LY. Body mass index-based predictions and personalized clinical strategies for colorectal cancer in the context of PPPM. Epma J (2022) 13(4):615–32. doi: 10.1007/s13167-022-00306-0
45. Ainiwan M, Wang Q, Yesitayi G, Ma X. Identification of FERMT1 and SGCD as key marker in acute aortic dissection from the perspective of predictive, preventive, and personalized medicine. Epma J (2022) 13(4):597–614. doi: 10.1007/s13167-022-00302-4
46. Shi W, Li C, Wartmann T, Kahlert C, Du R, Perrakis A, et al. Sensory ion channel candidates inform on the clinical course of pancreatic cancer and present potential targets for repurposing of FDA-approved agents. J Pers Med (2022) 12(3):478. doi: 10.3390/jpm12030478
47. Battista S, Ambrosio MR, Limarzi F, Gallo G, Saragoni L. Molecular alterations in gastric preneoplastic lesions and early gastric cancer. Int J Mol Sci (2021) 22(13):6652. doi: 10.3390/ijms22136652
48. Hamdy NM, Shaker FH, Zhan X, Basalious EB. Tangled quest of post-COVID-19 infection-caused neuropathology and what 3P nano-bio-medicine can solve? Epma J (2022) 13(2):261–84. doi: 10.1007/s13167-022-00285-2
49. Kono K, Nakajima S, Mimura K. Current status of immune checkpoint inhibitors for gastric cancer. Gastric Cancer (2020) 23(4):565–78. doi: 10.1007/s10120-020-01090-4
50. Walcher L, Kistenmacher AK, Suo H, Kitte R, Dluczek S, Strauß A, et al. Cancer stem cells-origins and biomarkers: perspectives for targeted personalized therapies. Front Immunol (2020) 11:1280. doi: 10.3389/fimmu.2020.01280
51. Shi W, Chen Z, Liu H, Miao C, Feng R, Wang G, et al. COL11A1 as an novel biomarker for breast cancer with machine learning and immunohistochemistry validation. Front Immunol (2022) 13:937125. doi: 10.3389/fimmu.2022.937125
52. Oster P, Vaillant L, McMillan B, Velin D. The efficacy of cancer immunotherapies is compromised by helicobacter pylori infection. Front Immunol (2022) 13:899161. doi: 10.3389/fimmu.2022.899161
Keywords: lysosome, gastric cancer, diagnosis, machine learning, immunotherapy, chemotherapy
Citation: Wang Q, Liu Y, Li Z, Tang Y, Long W, Xin H, Huang X, Zhou S, Wang L, Liang B, Li Z and Xu M (2023) Establishment of a novel lysosomal signature for the diagnosis of gastric cancer with in-vitro and in-situ validation. Front. Immunol. 14:1182277. doi: 10.3389/fimmu.2023.1182277
Received: 08 March 2023; Accepted: 21 April 2023;
Published: 05 May 2023.
Edited by:
Chun Wai Mai, UCSI University, Malaysia
Reviewed by:
Xintian Cai, People’s Hospital of Xinjiang Uygur Autonomous Region, China
Chen Li, Free University of Berlin, Germany
Copyright © 2023 Wang, Liu, Li, Tang, Long, Xin, Huang, Zhou, Wang, Liang, Li and Xu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhengrui Li, bHpyXzAxMDhAc2p0dS5lZHUuY24=; Min Xu, cGV0ZXJ4dTE5NzRAMTYzLmNvbQ==
†These authors have contributed equally to this work