- 1Department of Thoracic and Cardiovascular Surgery, School of Medicine, The Second Affiliated Hospital of Nantong University, Nantong University, Nantong, China
- 2Department of Thoracic and Cardiovascular Surgery, Tongji Hospital, School of Medicine, Tongji University, Shanghai, China
- 3Department of Vascular Surgery, The Second Affiliated Hospital of Nantong University, Nantong, China
Background: Coronary artery disease (CAD) is a leading cause of the rising mortality from cardiovascular disease (CVD) worldwide. We aimed to identify marker genes and develop a diagnostic model for CAD.
Methods: CAD-related target genes were retrieved from DisGeNET. Count expression data and clinical information were obtained from the GSE202625 dataset. Differentially expressed genes (DEGs) were identified with the edgeR package. Protein-protein interactions (PPI) were predicted using the online STRING tool and Cytoscape. The WebGestaltR package was used for functional enrichment analysis, and Metascape was used for module-based network analysis. The VarElect algorithm provided gene-phenotype correlation analysis. Immune infiltration was assessed with the ESTIMATE package and ssGSEA analysis. mRNAsi was determined by one-class logistic regression (OCLR). A diagnostic model was constructed with the SVM algorithm.
Results: Intersecting 1,714 DEGs with 1,708 CAD-related target genes yielded 162 target genes. PPI analysis retained 137 of these 162 genes, which were enriched in inflammatory cytokine pathways such as the chemokine signaling pathway and the IL-17 signaling pathway. Four functional modules (MCODE1-4) were extracted from the 137 target genes. Of the 162 potential targets, 161 were directly and 22 were indirectly associated with the CAD phenotype. Finally, 5 hub genes (CCL2, PTGS2, NLRP3, VEGFA, LTA) were obtained by intersecting the top 20 directly and indirectly associated genes with the genes in MCODE1. PTGS2, NLRP3 and VEGFA were positively correlated with immune cell scores, whereas LTA was negatively correlated. PTGS2, NLRP3 and VEGFA were negatively correlated with mRNAsi, whereas LTA was positively correlated. The diagnostic model achieved 92.59% sensitivity and an AUC of 0.9230 in the GSE202625 dataset, and 94.11% sensitivity and an AUC of 0.9706 in the GSE120774 dataset.
Conclusion: In this work, we identified 5 hub genes, which may be associated with CAD development.
Introduction
Coronary artery disease (CAD), a type of cardiovascular disease, has high incidence, mortality and recurrence rates and an unfavorable prognosis (1). The basic pathological change in CAD is atherosclerosis. Previous studies (2, 3) have confirmed that lipid metabolism and the inflammatory response are involved in the pathogenesis of CAD. Antiplatelet drugs and statins help prevent CAD-induced adverse events (4). However, the bleeding risk of antiplatelet therapy and the liver toxicity of statins limit their therapeutic benefit to a certain extent (5). Cardiac MRI, single-photon emission computed tomography and angiography are often used to diagnose CAD with high accuracy, but unstable image quality and invasive examination methods limit their clinical application. Therefore, finding a less invasive and more accurate examination method has become a focus of clinical attention (6).
With the maturation of gene sequencing technology and the reduction in its cost, the application of genomics in medicine has gradually expanded. Compared with the validation of a single target or pathway in traditional research, genomics-based bioinformatics methods can obtain and analyze massive gene expression data in a short time and uncover the underlying mechanisms and core pathways of diseases (7). The Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI) is a major public repository of gene expression data (8). In this study, bioinformatics methods were used to mine CAD-related gene datasets in GEO, screen DEGs and conduct functional enrichment analysis to explore the potential targets and mechanisms of CAD, so as to strengthen the theoretical basis for CAD diagnosis and treatment.
Materials and methods
Identification of DEGs
Count expression data and clinical information for the GSE202625 dataset were obtained from the public GEO database. DEGs between the CAD and control groups in GSE202625 were identified with the edgeR package, using p < 0.05 and |fold change (FC)| > 1.2 as screening thresholds.
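The screening rule can be illustrated with a minimal Python sketch. It is not the edgeR workflow used in the study; the gene names and values are hypothetical, and it assumes the fold change is a linear ratio so that down-regulation is tested symmetrically (FC < 1/1.2). If the threshold was applied to log fold change instead, the comparison would differ.

```python
# Hypothetical differential-expression results: (gene, fold change, p-value).
# These values are illustrative and are not taken from GSE202625.
results = [
    ("GENE_A", 2.10, 0.001),
    ("GENE_B", 0.50, 0.020),
    ("GENE_C", 1.10, 0.030),
    ("GENE_D", 1.90, 0.200),
]

P_CUTOFF = 0.05
FC_CUTOFF = 1.2


def is_deg(fold_change, p_value):
    """Return True when a gene passes both screening thresholds.

    Assumption: fold_change is a linear ratio, so a gene counts as changed
    when it is above FC_CUTOFF or below 1 / FC_CUTOFF.
    """
    changed = fold_change > FC_CUTOFF or fold_change < 1.0 / FC_CUTOFF
    return p_value < P_CUTOFF and changed


degs = [gene for gene, fc, p in results if is_deg(fc, p)]
print(degs)
```

In this toy input, GENE_C fails the fold-change threshold and GENE_D fails the p-value threshold.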
Collection of CAD related genes
DisGeNET (http://www.disgenet.org/) was used to acquire CAD-related targets by searching for the term “coronary artery disease” on the platform. In total, 1,708 genes were obtained.
Functional enrichment analysis
The R package WebGestaltR (v0.4.4) was used for GO functional enrichment analysis and KEGG pathway analysis; GO entries and pathways with a p value < 0.05 were considered enriched.
PPI network construction
The intersection of the GSE202625 DEGs and the CAD-related targets was taken as the set of possible therapeutic targets for CAD. The String database (https://stringdb.org/, version 11.5) was used to analyze known and predicted PPI, and Cytoscape (http://cytoscape.org/, version 7 2) was used to visualize the resulting network.
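The intersection step amounts to a set operation. The sketch below is illustrative only: the five hub gene symbols come from this article, while the other entries are hypothetical placeholders standing in for the full DEG and DisGeNET lists.

```python
# Placeholder gene sets; the real lists hold 1,714 DEGs and 1,708 CAD targets.
degs = {"CCL2", "PTGS2", "NLRP3", "VEGFA", "LTA", "DEG_ONLY_1", "DEG_ONLY_2"}
cad_targets = {"CCL2", "PTGS2", "NLRP3", "VEGFA", "LTA", "CAD_ONLY_1"}

# Genes present in both lists are treated as candidate therapeutic targets.
candidate_targets = sorted(degs & cad_targets)
print(candidate_targets)
```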
Gene phenotype correlation analysis
The VarElect tool (http://ve.genecards.org) is a free web-based phenotype-dependent variant/gene prioritizer. It employs the search and scoring functions of GeneCards, an integrated genomic database, and its algorithm infers direct as well as indirect links between uploaded genes and an entered disease or phenotype. The tool can therefore rank genes by their likelihood of being related to a specific disease. We used VarElect to assess the association between the potential therapeutic targets of CAD and the “coronary artery disease” phenotype.
Correlation between hub genes and immunity
The ESTIMATE R package estimates stromal and immune cells in tissue from gene expression data and generates three scores: (i) ImmuneScore, representing the infiltration of immune cells in CAD tissue; (ii) StromalScore, capturing the presence of stroma in CAD tissue; and (iii) ESTIMATEScore, which infers tumor purity in the package's original setting and was used here to predict CAD purity. We used ESTIMATE to calculate the three scores (StromalScore, ImmuneScore, ESTIMATEScore) for the GSE202625 dataset. To quantify the relative abundance of immune cells in CAD samples, 28 types of immune cells were identified with high sensitivity and specificity using the ssGSEA algorithm. We then calculated the Pearson correlation coefficient between the immune scores and the hub genes.
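As a reminder of what the final correlation step computes, here is a minimal pure-Python sketch of the Pearson coefficient. The score and expression vectors are hypothetical toy values, not study data, and the function name `pearson` is introduced only for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data: one immune score per sample and two hypothetical genes.
immune_score = [1.0, 2.0, 3.0, 4.0]
gene_rising = [2.0, 4.0, 6.0, 8.0]   # increases with the immune score
gene_falling = [8.0, 6.0, 4.0, 2.0]  # decreases with the immune score

r_rising = pearson(immune_score, gene_rising)
r_falling = pearson(immune_score, gene_falling)
print(r_rising, r_falling)
```

A coefficient near +1 corresponds to a positive correlation with the immune score and a coefficient near -1 to a negative one.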
The calculation of mRNAsi
The expression data of pluripotent stem cell samples (ESC and iPSC) from the Progenitor Cell Biology Consortium (PCBC) database were used to predict and calculate the stem cell index with the one-class logistic regression (OCLR) method. First, only the ESC and iPSC sample data were kept; these are collectively referred to as SC samples. The Ensembl IDs of the SC samples were converted into gene symbols, and only protein-coding genes were kept. This gave a total of 78 SC samples, each with expression profiles for 8,087 mRNA genes. Each sample in the resulting expression profile was mean-centered. Finally, the OCLR method in the R package gelnet v1.2.1 was applied to the processed data to calculate the weight vector over genes.
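The text describes how the OCLR weight vector is derived but not how it is applied to a sample. The sketch below assumes the commonly described scoring scheme, namely a Spearman correlation between the weight vector and each sample's expression profile followed by min-max scaling to [0, 1]. That scheme, the weights, and the sample values are assumptions for illustration, not details confirmed by this article.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-gene weights (stand-in for the OCLR weight vector).
weights = [0.8, -0.5, 0.3, -0.1]
# Hypothetical expression profiles over the same four genes.
samples = {
    "S1": [9.0, 1.0, 6.0, 3.0],
    "S2": [1.0, 9.0, 3.0, 6.0],
    "S3": [5.0, 5.5, 4.0, 6.0],
}

raw = {name: spearman(weights, expr) for name, expr in samples.items()}
low, high = min(raw.values()), max(raw.values())
# Min-max scaling maps the raw correlations onto the [0, 1] index range.
mrnasi = {name: (score - low) / (high - low) for name, score in raw.items()}
print(mrnasi)
```

Because the scoring is rank-based, mean-centering a sample would not change its score in this sketch; it is omitted here for brevity.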
Construction of diagnostic models
The SVM algorithm (9) was used to construct the diagnostic model on the GSE202625 dataset based on the hub genes, and 10-fold cross-validation was performed (10). The GSE120774 dataset served as an independent validation dataset.
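To make the 10-fold cross-validation step concrete, the following sketch shows one way to partition samples into class-balanced folds. It does not train an SVM, the helper name `stratified_folds` is hypothetical, and the 27/25 class split is a placeholder chosen for illustration rather than a figure stated in the methods.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the last class stopped
        # so that fold sizes stay within one sample of each other.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


# Placeholder labels: 1 = CAD, 0 = control (class sizes are assumptions).
labels = [1] * 27 + [0] * 25
folds = stratified_folds(labels, k=10)

for fold_number, test_indices in enumerate(folds):
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be fitted on train_indices and scored on test_indices.
    print(fold_number, len(train_indices), len(test_indices))
```

Each sample is held out exactly once, so every prediction used for evaluation comes from a model that did not see that sample during training.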
Results
Identification of target genes for CAD
The study flowchart is shown in Figure 1. In the GSE202625 dataset, we identified 1,714 DEGs between CAD and normal samples (Figure 2A). By intersecting the 1,708 CAD-related genes with the 1,714 DEGs, we screened 162 target genes (Figure 2B). We also drew a network of DEGs and CAD-related genes in CAD (Figure 2C).
Figure 2. Identification of target genes. (A) DEGs between CAD and normal samples in the GSE202625 dataset. (B) Venn diagram of DEGs and CAD-related genes. (C) Network of DEGs and CAD-related genes. The red diamond in the middle is CAD, ovals are genes, green represents down-regulated expression, and yellow represents up-regulated expression.
The 162 intersection targets were submitted to the String database, and interactions with a confidence score > 0.4 were retained. Cytoscape was then used to build the relationship network for the targets from the String PPI data. After removing nodes with few edges, 137 target genes remained in the PPI network (Figure 3).
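The filtering logic can be sketched in a few lines of Python. All gene names and scores below are hypothetical, and because the text does not give the edge-count cutoff used to drop sparsely connected nodes, the sketch assumes that only nodes with no remaining edge are removed.

```python
from collections import defaultdict

MIN_SCORE = 0.4   # confidence threshold stated in the text
MIN_DEGREE = 1    # assumption: drop only nodes left with no edges

# Hypothetical candidate targets and scored interactions.
candidates = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F", "GENE_G"}
scored_edges = [
    ("GENE_A", "GENE_B", 0.92),
    ("GENE_B", "GENE_C", 0.71),
    ("GENE_C", "GENE_D", 0.55),
    ("GENE_E", "GENE_F", 0.30),  # below the confidence threshold
]


def build_network(edges, min_score):
    """Keep edges above min_score and return a neighbor map."""
    neighbors = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if score > min_score:
            neighbors[gene_a].add(gene_b)
            neighbors[gene_b].add(gene_a)
    return neighbors


network = build_network(scored_edges, MIN_SCORE)
kept_nodes = sorted(
    gene for gene in candidates if len(network.get(gene, ())) >= MIN_DEGREE
)
print(kept_nodes)
```

Here GENE_E and GENE_F lose their only edge to the score filter, and GENE_G never had one, so all three drop out of the network.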
Functional enrichment analysis of 137 target genes
To further clarify the mechanisms shared by the CAD target genes at a system level, GO and KEGG enrichment analyses of the 137 target genes were conducted using WebGestaltR. The top 10 biological process, cellular component and molecular function GO entries and KEGG signaling pathways involving these targets are shown in Figures 4A–D. The HIF-1 signaling pathway, chemokine signaling pathway, IL-17 signaling pathway, VEGF signaling pathway, cytokine-cytokine receptor interaction and PI3K-Akt signaling pathway were enriched.
Figure 4. Functional enrichment analysis of 137 target genes. (A) BP annotated map. (B) CC annotated map. (C) MF annotated map. (D) KEGG pathways.
Module-based network analysis of potential targets
In the network of potential targets, densely connected protein groups were identified with the MCODE algorithm, and the biological functions of each group were annotated. From the 137 targets above, we extracted four functional modules (Figures 5A–D).
Gene-phenotype correlation analysis of CAD
Correlation analysis between the potential therapeutic targets and the CAD phenotype identified genes directly and indirectly associated with the phenotype (Figure 6). Among the 162 potential targets, 161 were directly associated with the CAD phenotype and 22 were indirectly associated with it. Table 1 lists the top 20 target genes directly and indirectly related to the CAD phenotype.
Figure 6. Gene–phenotype correlation analysis of CAD. Intersection genes were categorized as directly (oval, blue font) or indirectly (diamond, black font) associated with the CAD phenotype. The darker the color, the higher the score.
The correlation analysis between immune cells and hub genes
Five hub genes were determined by intersecting the top 20 target genes above with the genes in MCODE1. ESTIMATE analysis and ssGSEA analysis were used to calculate immune cell scores (Figure 7A). LTA expression was negatively correlated with the immune score, whereas PTGS2, NLRP3 and VEGFA expression was positively correlated with the immune score (Figure 7B).
Figure 7. Correlation analysis between the 5 hub genes and immune scores. (A) Heatmap of the correlation between immune cell scores and the 5 hub genes. (B) Correlation between immune scores and the 5 hub genes.
The correlation analysis between mRNAsi and hub genes
The correlation analysis between the 5 hub genes and mRNAsi showed that PTGS2 and LTA expression was positively correlated with mRNAsi, whereas NLRP3 and VEGFA expression was negatively correlated with mRNAsi (Figure 8).
Establishment and validation of diagnosis model
The 5 hub genes were used as features, and their expression profiles were obtained from the training dataset (GSE202625). An SVM classification model was then developed. In the training dataset, 48 of 52 samples were correctly classified; the model had a sensitivity of 92.59%, a specificity of 92%, and an area under the ROC curve (AUC) of 0.92 (Figure 9A). In the validation dataset (GSE120774), 35 of 36 samples were correctly classified (97.22%); the sensitivity was 94.11%, the specificity was 100%, and the AUC was 0.9706 (Figure 9B).
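For readers who want to check how such figures relate to one another, the sketch below computes sensitivity, specificity and accuracy from a confusion matrix. The individual cell counts are not reported in the text; the values used here are an assumption, chosen because they are consistent with the reported training-set results (48 of 52 correct, 92.59% sensitivity, 92% specificity).

```python
# Assumed training-set confusion matrix (not stated in the article).
true_positive = 25   # CAD samples classified as CAD
false_negative = 2   # CAD samples classified as control
true_negative = 23   # control samples classified as control
false_positive = 2   # control samples classified as CAD

total = true_positive + false_negative + true_negative + false_positive

# Sensitivity: share of CAD samples that the model detects.
sensitivity = true_positive / (true_positive + false_negative)
# Specificity: share of control samples that the model clears.
specificity = true_negative / (true_negative + false_positive)
# Accuracy: share of all samples classified correctly.
accuracy = (true_positive + true_negative) / total

print(round(sensitivity * 100, 2))  # 92.59
print(round(specificity * 100, 2))  # 92.0
print(round(accuracy * 100, 2))     # 92.31
```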
Figure 9. Construction of diagnosis model. (A) The classification results and ROC curve of GSE202625 samples in the diagnostic model. (B) The classification results and ROC curve of GSE120774 samples in the diagnostic model.
Discussion
Numerous studies have shown that levels of inflammatory factors [tumor necrosis factor alpha, interleukin (IL)-1 and IL-6] are elevated in patients with atherosclerotic heart disease (11, 12). In addition, reducing the inflammatory response has slowed the development of atherosclerosis and reduced cardiovascular events (13). An increased white blood cell count has been found to be an independent predictor of death from acute myocardial infarction (AMI) in a clinical study (14). Lymphocytopenia is associated with a poor prognosis in various diseases, such as stable CAD, acute coronary syndrome and heart failure (15–17). In addition, a low lymphocyte count is positively associated with the occurrence of cardiovascular events (15). In our study, the identified target genes were enriched in several inflammatory factor-related pathways, such as cytokine-cytokine receptor interaction, the chemokine signaling pathway and the IL-17 signaling pathway.
Monocyte activity is associated with clinical indicators of atherosclerosis, heart failure syndrome and chronic kidney disease (CKD) (18). Previous studies have shown that CD8 T cells secrete a variety of inflammatory cytokines that exacerbate the inflammatory response and increase atherosclerotic plaque instability (19). Conversely, targeting antigen-presenting cells and modulating the cytotoxic activity of CD8 T cell subsets may inhibit atherosclerosis by attenuating the immune response (19). Other immune cell types, including mast cells (20) and neutrophils (21), also play a key role in the development of cardiovascular disease. These data suggest that the immune system is very important in the development and progression of CAD. To further evaluate the relationship between immune cells and the 5 hub genes in CAD, ESTIMATE and ssGSEA analyses were used to comprehensively evaluate immune cell infiltration. The results showed that the 5 hub genes were closely associated with immune cell scores.
We next constructed a diagnostic model based on the 5 hub genes (CCL2, PTGS2, NLRP3, VEGFA and LTA). Among these, PTGS2 expression has been reported to be upregulated in advanced stages of atherosclerosis and positively associated with the severity of atherosclerosis (22). The chemokine CCL2 was significantly elevated in diseased arteries, and its expression correlated significantly with the predictive value of atherosclerosis (23). In CAD patients, NLRP3 gene expression was almost doubled (24). Several studies have reported that the VEGFA signaling pathway is involved in CAD (25–27). Previous case-control studies suggested that single nucleotide polymorphisms of the LTA gene are associated with CAD and myocardial infarction (28). These data support the reliability of our diagnostic model.
This study still has some shortcomings. First, the results were obtained only from public databases, and experimental and clinical verification is needed. Second, the study is based on a relatively small number of samples, and validation in larger independent datasets would be desirable to confirm the robustness of the diagnostic model. Additionally, the study focuses on gene expression data and provides no information on potential post-transcriptional or post-translational regulation of the target genes, which may affect the validity of the findings. In conclusion, there are significant differences in gene expression during the pathogenesis of CAD, and its pathological process is related to inflammation and other signaling pathways. CCL2, PTGS2, NLRP3, VEGFA and LTA may be key genes in the pathogenesis of CAD.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.
Author contributions
ZM and HH conceived and designed the research. PZ acquired the data. TX, HL and XL analyzed and interpreted the data. XL and HD obtained funding. XY and CX drafted the manuscript, and CZ revised the manuscript for important intellectual content. All authors contributed to the article and approved the submitted version.
Funding
This study was funded by the Jiangsu Province “Six Talents Peak” High-Level Talent Project (WSN-269) and the Nantong Medical Key Talent Project.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcvm.2023.1086127/full#supplementary-material
References
1. Khera AV, Kathiresan S. Genetics of coronary artery disease: discovery, biology and clinical translation. Nat Rev Genet. (2017) 18(6):331–44. doi: 10.1038/nrg.2016.160
2. Cohain AT, Barrington WT, Jordan DM, Beckmann ND, Argmann CA, Houten SM, et al. An integrative multiomic network model links lipid metabolism to glucose regulation in coronary artery disease. Nat Commun. (2021) 12(1):547. doi: 10.1038/s41467-020-20750-8
3. Ali M, Girgis S, Hassan A, Rudick S, Becker RC. Inflammation and coronary artery disease: from pathophysiology to canakinumab anti-inflammatory thrombosis outcomes study (CANTOS). Coron Artery Dis. (2018) 29(5):429–37. doi: 10.1097/MCA.0000000000000625
4. Braun MM, Stevens WA, Barstow CH. Stable coronary artery disease: treatment. Am Fam Physician. (2018) 97(6):376–84. PMID: 29671538
5. Tung YC, See LC, Chang SH, Liu JR, Kuo CT, Chang CJ. Impact of bleeding during dual antiplatelet therapy in patients with coronary artery disease. Sci Rep. (2020) 10(1):21345. doi: 10.1038/s41598-020-78400-4
6. CT coronary angiography in patients with suspected angina due to coronary heart disease (SCOT-HEART): an open-label, parallel-group, multicentre trial. Lancet (London, England). (2015) 385(9985):2383–91. doi: 10.1016/S0140-6736(15)60291-4
7. Anashkina AA, Leberfarb EY, Orlov YL. Recent trends in cancer genomics and bioinformatics tools development. Int J Mol Sci. (2021) 22(22):12146. doi: 10.3390/ijms222212146
8. Clough E, Barrett T. The gene expression omnibus database. Methods Mol Biol. (2016) 1418:93–110. doi: 10.1007/978-1-4939-3578-9_5
9. Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W. Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genomics Proteomics. (2018) 15(1):41–51. doi: 10.21873/cgp.20063
10. Hossain SMM, Khatun L, Ray S, Mukhopadhyay A. Pan-cancer classification by regularized multi-task learning. Sci Rep. (2021) 11(1):24252. doi: 10.1038/s41598-021-03554-8
11. Libby P, Ridker PM, Hansson GK. Inflammation in atherosclerosis: from pathophysiology to practice. J Am Coll Cardiol. (2009) 54(23):2129–38. doi: 10.1016/j.jacc.2009.09.009
12. Montecucco F, Liberale L, Bonaventura A, Vecchiè A, Dallegri F, Carbone F. The role of inflammation in cardiovascular outcome. Curr Atheroscler Rep. (2017) 19(3):11. doi: 10.1007/s11883-017-0646-1
13. Soria-Florido MT, Schröder H, Grau M, Fitó M, Lassale C. High density lipoprotein functionality and cardiovascular events and mortality: a systematic review and meta-analysis. Atherosclerosis. (2020) 302:36–42. doi: 10.1016/j.atherosclerosis.2020.04.015
14. Núñez J, Miñana G, Bodí V, Núñez E, Sanchis J, Husser O, et al. Low lymphocyte count and cardiovascular diseases. Curr Med Chem. (2011) 18(21):3226–33. doi: 10.2174/092986711796391633
15. Ommen SR, Gibbons RJ, Hodge DO, Thomson SP. Usefulness of the lymphocyte concentration as a prognostic marker in coronary artery disease. Am J Cardiol. (1997) 79(6):812–4. doi: 10.1016/S0002-9149(96)00878-8
16. Levy WC, Mozaffarian D, Linker DT, Sutradhar SC, Anker SD, Cropp AB, et al. The Seattle heart failure model: prediction of survival in heart failure. Circulation. (2006) 113(11):1424–33. doi: 10.1161/CIRCULATIONAHA.105.584102
17. Núñez J, Núñez E, Bodí V, Sanchis J, Mainar L, Miñana G, et al. Low lymphocyte count in acute phase of ST-segment elevation myocardial infarction predicts long-term recurrent myocardial infarction. Coron Artery Dis. (2010) 21(1):1–7. doi: 10.1097/MCA.0b013e328332ee15
18. Dounousi E, Duni A, Naka KK, Vartholomatos G, Zoccali C. The innate immune system and cardiovascular disease in ESKD: monocytes and natural killer cells. Curr Vasc Pharmacol. (2021) 19(1):63–76. doi: 10.2174/18756212MTA3yNzEe1
19. van Duijn J, Kuiper J, Slütter B. The many faces of CD8+ T cells in atherosclerosis. Curr Opin Lipidol. (2018) 29(5):411–6. doi: 10.1097/MOL.0000000000000541
20. Varricchi G, Marone G, Kovanen PT. Cardiac mast cells: underappreciated immune cells in cardiovascular homeostasis and disease. Trends Immunol. (2020) 41(8):734–46. doi: 10.1016/j.it.2020.06.006
21. Grégory F. Role of mechanical stress and neutrophils in the pathogenesis of plaque erosion. Atherosclerosis. (2021) 318:60–9. doi: 10.1016/j.atherosclerosis.2020.11.002
22. Zhou Y, Zhou H, Hua L, Hou C, Jia Q, Chen J, et al. Verification of ferroptosis and pyroptosis and identification of PTGS2 as the hub gene in human coronary artery atherosclerosis. Free Radical Biol Med. (2021) 171:55–68. doi: 10.1016/j.freeradbiomed.2021.05.009
23. Hernández-Aguilera A, Fibla M, Cabré N, Luciano-Mateo F, Camps J, Fernández-Arroyo S, et al. Chemokine (C-C motif) ligand 2 and coronary artery disease: tissue expression of functional and atypical receptors. Cytokine. (2020) 126:154923. doi: 10.1016/j.cyto.2019.154923
24. Shateri H, Manafi B, Tayebinia H, Karimi J, Khodadadi I. Imbalance in thioredoxin system activates NLRP3 inflammasome pathway in epicardial adipose tissue of patients with coronary artery disease. Mol Biol Rep. (2021) 48(2):1181–91. doi: 10.1007/s11033-021-06208-0
25. Ouyang S, Li Y, Wu X, Wang Y, Liu F, Zhang J, et al. GPR4 Signaling is essential for the promotion of acid-mediated angiogenic capacity of endothelial progenitor cells by activating STAT3/VEGFA pathway in patients with coronary artery disease. Stem Cell Res Ther. (2021) 12(1):149. doi: 10.1186/s13287-021-02221-z
26. Lin J, Jiang J, Zhou R, Li X, Ye J. MicroRNA-451b participates in coronary heart disease by targeting VEGFA. Open Med (Wars). (2018) 15:1–7. doi: 10.1515/med-2020-0001
27. Wang Y, Wu B, Lu P, Zhang D, Wu B, Varshney S, et al. Uncontrolled angiogenic precursor expansion causes coronary artery anomalies in mice lacking Pofut1. Nat Commun. (2017) 8(1):578. doi: 10.1038/s41467-017-00654-w
Keywords: gene, diagnosis, CAD development, cardiovascular disease, coronary artery disease
Citation: Zhu P, Huang H, Xie T, Liang H, Li X, Li X, Dong H, Yu X, Xia C, Zhong C and Ming Z (2023) Identification of 5 hub genes for diagnosis of coronary artery disease. Front. Cardiovasc. Med. 10:1086127. doi: 10.3389/fcvm.2023.1086127
Received: 3 November 2022; Accepted: 19 June 2023;
Published: 5 July 2023.
Edited by:
Seitaro Nomura, The University of Tokyo, Japan
Reviewed by:
Madankumar Ghatge, The University of Iowa, United States
Bing Wang, Fifth Affiliated Hospital of Zhengzhou University, China
© 2023 Zhu, Huang, Xie, Liang, Li, Li, Dong, Yu, Xia, Zhong and Ming. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhibing Ming
†These authors have contributed equally to this work