- 1Department of Anesthesiology and Perioperative Medicine, School of Medcine, Shanghai Fourth People’s Hospital, Tongji University, Shanghai, China
- 2Translational Research Institute of Brain and Brain-Like Intelligence, School of Medcine, Shanghai Fourth People’s Hospital, Tongji University, Shanghai, China
- 3Clinical Research Center for Anesthesiology and Perioperative Medicine, Tongji University, Shanghai, China
Idiopathic pulmonary fibrosis (IPF) is a progressive disease whose etiology remains unknown. The purpose of this study was to explore hub genes and pathways related to IPF development and prognosis. Multiple gene expression datasets were downloaded from the Gene Expression Omnibus database. Weighted correlation network analysis (WGCNA) was performed and differentially expressed genes (DEGs) identified to investigate Hub modules and genes correlated with IPF. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, and protein-protein interaction (PPI) network analysis were performed on selected key genes. In the PPI network and cytoHubba plugin, 11 hub genes were identified, including ASPN, CDH2, COL1A1, COL1A2, COL3A1, COL14A1, CTSK, MMP1, MMP7, POSTN, and SPP1. Correlation between hub genes was displayed and validated. Expression levels of hub genes were verified using quantitative real-time PCR (qRT-PCR). Dysregulated expression of these genes and their crosstalk might impact the development of IPF through modulating IPF-related biological processes and signaling pathways. Among these genes, expression levels of COL1A1, COL3A1, CTSK, MMP1, MMP7, POSTN, and SPP1 were positively correlated with IPF prognosis. The present study provides further insights into individualized treatment and prognosis for IPF.
Introduction
Idiopathic pulmonary fibrosis (IPF), a chronic and progressive lung disease of unknown etiology (King et al., 2014; Martinez et al., 2017; Raghu et al., 2018), is characterized by diffuse alveolar inflammation and alveolar structural damage (Xu et al., 2016; Claude and Harold, 2018; David et al., 2019). The global annual incidence of IPF is 0.2–93.7 per 100,000 (Hutchinson et al., 2015) and the median survival time is only 2–3 years after diagnosis is confirmed (Tran and Suissa, 2018; Schäfer et al., 2020). Clinical biomarkers reliably reflecting the progression of IPF are small in number. In terms of treatment, medications recommended by current clinical guidelines - nintedanib and pirfenidone have a good therapeutic effect on IPF in general (Maher et al., 2019; Saito et al., 2019; Behr et al., 2021), but not that effective for late stage IPF. Therefore, it is important to illuminate the pathogenesis of IPF and to explore potential biomarkers for early stage IPF in order to improve the therapeutic effect on IPF.
In recent years, the amount of biological data for research has dramatically increased with the development of transcriptomic analysis, providing new viewpoints for exploring the etiology, pathogenesis, and novel targets for clinical treatment (DePianto et al., 2015; Wang et al., 2017). Previous studies primarily tested individual genes in diseased conditions. However, genes with similar expression patterns are likely to be tightly co-regulated in vivo with closely related functions, expressed in the same signaling pathways or processes (Pei et al., 2017). Identifying these genes and their interactions will provide further understanding of biological pathways related to IPF.
In the present study, multiple bioinformatic methods were used to search for hub genes significantly correlated with the occurrence and prognosis of IPF. Their expression has also been validated using quantitative real-time PCR (qRT-PCR). This work will provide further insights into the underlying molecular mechanisms of IPF and potential molecular targets for developing novel interventional strategies.
Materials and Methods
Acquisition of Microarray Data
In this study, seven gene array expression Series Matrix Files of IPF containing GSE110147, GSE53845, GSE24206, GSE32537, GSE68239, GSE10667, and GSE70866 were obtained from the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/). The main features of these 7 datasets were shown in Table 1. These profiles consisted of gene expression matrices, probe sets, and clinical characteristics. Subsequent analyses were conducted on these datasets.
Removal of the Batch Effect
All of the raw data of GSE110147, GSE53845, GSE24206, GSE32537, and GSE68239 were merged and normalized into the training group using the RMA algorithm provided by the Limma package 3.40.6 of R, followed by removing the batch effect using R package SVA. The principal component analysis (PCA) was used to estimate whether the batch effect was removed. The validation group consisted of data from GSE10667 and GSE70866.
Identification of Significant Modules Using the Weighted Correlation Network Analysis
The WGCNA package was used to determine key genes significantly associated with IPF in the training and validation groups. The best soft threshold power was set to identify the module-trait relationship, module membership (MM), and gene significance (GS). In brief, a weighted adjacency matrix was first constructed based on the selected soft threshold power. Subsequently, the connectivity per gene was deduced through calculating connection strengths with other genes. After validating the module structure preservation using the module preservation R function, the gene expression profile of each module was summarized by the module eigengene on whom IPF traits were regressed in the Limma R package (Huang et al., 2020). The IPF-related module was selected with the highest coefficient square (r2) and the p value <0.001.
Identification of Differentially Expressed Genes
The Limma package 3.40.6 was used to compare DEGs between the normal group and the IPF group in the training group. The Benjamini-Hochberg’s method was used to control the false discovery rate (FDR), and DEGs were selected through adopting the commonly used threshold of FDR p value <0.01 and |Log2 fold-change| > 1. The volcano plot of DEGs was generated by the ffplot2 package, and a heatmap of DEGs was produced using the pheatmap package in the R software.
Identification of Key Genes and Gene Set Enrichment Analysis
An online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was employed to construct a Venn diagram to overlap the key genes in the significant modules and DEGs in the training group. The analyses of key genes of Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were conducted using the GOplot, clusterProfiler, DOSE, colorspace, and the stringi packages in the R software.
Protein-Protein Interaction Network Analysis
The Search Tool for the Retrieval of Interacting Genes (STRING, https://www. string-db. org/) was used to identify functional interactions between the products of the key genes in the training group. The PPI network was constructed using STRING by adopting the default threshold of a combined score >0.6. Then, the number of nodes of all the related proteins in the PPI network was counted using the R software and visualized using Cytoscape 3.8.2.
The Correlation Between Hub Genes
The ggcorrpolt and the ggthemes of the R package were used to determine the correlation between hub genes in the training and the validation groups, respectively. p value <0.01 was considered statistically significant.
Ethics Statement and Animal Treatment
In this study, 12 8 week-old male C57BL/6 mice weighing 20–25 g were obtained from Shanghai SLAC Laboratory Animal Ltd., China. Mice were bred at 22–24°C under a 12 h/12 h light/dark cycle and with free access to food and water. All procedures were implemented in conformity to the guidelines for animal care published by the United States’ National Institutes of Health (NIH) for animal care (Guide for the Care and Use of Laboratory Animals, Department of Health and Human Services, NIH publication No. 86–23, revised 1985). All procedures were approved by Renji hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China (approval number: RJ-20170930). 5 mg/kg LPS was intraperitoneally injected for 5 consecutive days to 6 mice to induce pulmonary fibrosis as reported in our previous article (Wan et al., 2019). The other 6 mice served as control. The lung tissue of all mice was collected 30 days after LPS injection.
Hematoxylin and Eosin, Masson’s Trichrome Staining and Immunohistochemistry
Lung tissue collected from both control and LPS injected mice was fixed in 4% paraformaldehyde solution for overnight, followed by dehydration in 70% ethanol and embedding in the paraffin wax. 5 μm thick sections were cut and subject to H&E and Masson’s trichrome staining, respectively, as previously described (Wan et al., 2019). Expression of α-SMA in the lung tissue was evaluated using immunohistochemical staining. Sample sections were deparaffinized and incubated with 5% goat serum containing 0.1% Triton X-100 at room temperature for 2 h, followed by sequential incubation in the primary (rabbit anti-α-SMA, 1:2000, CST, Cat No 19245) and secondary antibody solutions (goat anti-rabbit IgG, 1:1000, Jackson Immunoresearch Laboratory, Cat No 111-035-003) antibodies, each for 2 h. After 3 rinses with 1x PBS, the sections were incubated in a 3,3,-diaminobenzidine (DAB) reaction complex (Vector lab, Burlingame, CA, United States) until an optimal colour developed. At the end of the procedure, the sections were mounted and dehydrated before being coverslipped.
Quantitative Real-Time PCR and Statistical Analysis
Total RNA of the mouse lung tissue was isolated using Trizol (Thermofisher) according to the manufacturer’s instructions. Primer Script RT reagent kit (Takara, Japan) was used to synthesize complementary DNA, and real-time PCR was run on a Quant Studio 1 real-time PCR system (Thermofisher, United States) using the TB Green Premix Ex TaqTM kit (Takara, Japan). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is stably expressed in the lung tissue during the process of pulmonary fibrosis. Therefore, it serves as a reliable endogenous control in the reverse transcription-polymerase chain reaction (Loitsch et al., 1999; Cruz-Bermúdez et al., 2019). The 2−△△Ct method was employed to assess relative expression levels of genes of interest. All of the data were presented as mean ± standard error of mean (SEM) and analyzed using the GraphPad PRISM8 software (United States). Two-tailed Student’s t test was used to compare between the two groups. p < 0.001 was considered statistically significant. Sequences of the primers could be found in Table 2.
Survival Analysis of Hub Genes
The Kaplan-Meier method (K-M method; product-limit method) is suitable for analysis with a small sample size. Survival analysis was performed using survfit R package in the R software. The difference in survival curves for IPF patients with different expression levels of hub genes was analyzed using the log-rank method provided by the survival package.
Results
Removal of Batch Effects by Cross-Platform Normalization
To eliminate batch effects from different platforms and batches of datasets, the ComBat function of SVA package was used (Chen et al., 2020). Before the removal of batch effects, samples were clustered by batches based on the first two principal components (PC) of unnormalized expression values (Figures 1A,C). In contrast, the scatter plot of principal component analysis (PCA) based on normalized expression showed that batch processing effects from different platforms were eliminated (Figures 1B,D). These results confirmed that cross-platform standardization successfully eliminated batch effects.
FIGURE 1. Principal component analysis (PCA) of the training group. The points of the scatter plots represented the samples without (A,C) and with (B,D) the removal of batch effect based on the top two principal components (PC1 and PC2) of gene expression profiles. Samples from five different datasets were color coded (A,B), and the colors represented the IPF group and the normal group, respectively (C,D).
Identification of IPF-Associated Weighted Correlation 26 Network Analysis Modules
A total of 287 samples and 6,263 genes retrieved from the training group were used for the co‐expression network analysis (Figure 2A). An eigengene correlation coefficient square (r2) of 0.8 and a soft threshold power of 3 were set to identify the module-trait relationship (Figure 2B). A hierarchical clustering tree was constructed following a dynamic hybrid cut (Figure 2C). The eigengene dendrogram and heatmap were used to quantify module similarity by eigengene correlation (Figures 2D,E). Among the 15 gene modules identified (Figure 2F), the green module (Figure 2G) showed a relatively higher positive correlation with the IPF group (cor = 0.71, p < 0.001), and the brown module (Figure 2H) exhibited significantly positive correlation with the IPF group (cor = −0.69, p < 0.001).
FIGURE 2. Key modules identified by WGCNA in the training group. (A) Sample clustering to detect outliers. (B) Determination of soft-thresholding power. (C) Clustering dendrogram. Both of the original and the merged modules were presented. (D) Dendrogram of consensus module eigengenes obtained from WGCNA on the consensus correlation. (E) Dendrogram of merged module eigengenes obtained from WGCNA. (F) The module-trait relationships were evaluated by correlating module eigengenes with clinical traits. (G) Scatter plots of module eigengenes in the green modules. (H) Scatter plots of module eigengenes in the brown modules.
Idiopathic Pulmonary Fibrosis-Associated Key Genes and Their Functions
A total of 177 DEGs, including 121 up-regulated and 56 down-regulated genes, were identified in the training group (Figures 3A,B, Table 3). Among them, 72 genes were determined as key genes because they were shared by the green and brown modules (Figure 4A and Table 4). KEGG analysis revealed that the most notably enriched pathways were protein digestion and absorption and focal adhesion (Figure 4B, Supplementary Table S1). GO analysis showed that these genes were mainly involved in nine cellular components (Figure 4D), nine molecular functions (Figure 4E), and nine biological processes, including extracellular matrix organization (CCDC80, COL14A1, COL15A1, COL17A1, COL1A1, COL1A2, COL3A1, COMP, CTSK, FAP, MMP1, MMP7, POSTN, SFRP2, SPP1, SULF1, and VCAM1), extracellular structure organization (CCDC80, COL14A1, COL15A1, COL17A1, COL1A1, COL1A2, COL3A1, COMP, CTSK, FAP, MMP1, MMP7, POSTN, SFRP2, SPP1, SULF1, and VCAM1), response to TGF beta (ACVRL1, ASPN, CLDN1, COL1A1, COL1A2, COL3A1, ID1, LRRC32, LTBP1, MXRA5, POSTN, SMAD6), and others (Figure 4C), indicating that these genes could play important roles in the occurrence of IPF through influencing these biological processes and molecular functions.
FIGURE 3. Identification of DEGs in IPF and the enrichment of these genes in the training group. (A) Volcano plot of all DEGs. (B) Heatmap of all DEGs.
FIGURE 4. GO and KEGG enrichment analysis of key genes in the training group. (A) The Venn diagram demonstrating key genes in the training group. (B) Kyoto encyclopedia of genes and genomes (KEGG) analysis of key genes (E). Gene ontology (GO) analysis of key genes in the training group (C–E), (C) BP: biological process, (D) CC: cellular component, (E) MF: molecular function.
Identification of Hub Genes
To explore hub genes, the PPI network analysis and cytoHubba plugin within cytoscape version 3.8.2 were performed based on the rank of the connection degree (number) of each gene. As shown in Figures 5A,B, nodes of 72 genes in the PPI network were identified according to the medium confidence ≥0.6. Finally, twelve hub genes were confirmed based on their node degrees within cytoHubba plugin (Figure 5A), including collagen alpha-1 chain (I) (COL1A1), collagen alpha-2 chain (I) (COL1A2), collagen alpha-1 (III) chain (COL3A1), collagen alpha-1 chain (XIV) (COL14A1), collagen alpha-1 chain (XV) (COL15A1), periostin (POSTN), secreted phosphoprotein 1 (SPP1), matrix metallopeptidase 1 (MMP1), matrix metallopeptidase 7 (MMP7), asporin (ASPN), cadherin 2 (CDH2), and cathepsin K (CTSK). In addition, correlation analysis showed a highly consistent positive correlation between 12 hub genes in both the training group (Figures 5D,F) and the validation group (Figures 5E,G), such as COL1A2 and COL1A1, COL1A2 and COL3A1, COL1A2 and COL14A1, COL1A1 and COL3A1, COL1A1 and COL14A1, COL3A1 and COL14A1, SPP1 and MMP1, MMP1 and MMP7, POSTN and COL14A1, COL14A1 and COL15A1 (0.17 < cor <0.96, 2.2E-16 < p < 0.039), which further validated the results by PPI and cytoHubba plugin analysis (Figure 5C).
FIGURE 5. Cluster analysis of the PPI network in the training group. (A) Protein-protein interaction (PPI) network for key genes. (B) Histogram of key genes. (C) The hub gene network analyzed by Cytosacpe. (D) Correlation between hub genes in the training group. (E) Correlation between hub genes in the validation group. (F) Display of correlations between ten groups of genes with a correlation coefficient >0.8 in the training group. (G) Display of correlations between ten groups of genes in the validation group.
To further confirm the role of these 12 hub genes in IPF, another WGCNA for 6,252 genes obtained from GSE10667 of the training group was performed. It was found that the black and greenyellow modules, containing 12 hub genes, out of the 14 gene modules identified (Figure S1a) showed strong association with IPF (black module: cor = -0.63, p < 0.001; greenyellow module: cor = 0.81, p < 0.001) (Supplementary Figures S1B–D). These hub genes were involved in many important IPF-related biological processes, such as extracellular matrix organization, response to TGF beta, collagen metabolic process, fibrillar collagen trimer, etc (Supplementary Table S2). These results suggested that the abovementioned hub genes play important roles in the occurrence of IPF.
Expression Validation of Hub Genes by Quantitative Real-Time -PCR
The clustering heatmap showed expression levels of hub genes in the training (Figure 6A) and the validation groups (Figure 6B). Expression levels of 12 hub genes showed consistent upregulation in both the training (Figure 6C) and the validation groups (Figure 6D). To further validate the results, qRT-PCR was conducted in the LPS-induced pulmonary fibrosis model. The modeling process was described in detail in our previous study (Wan et al., 2019). Firstly, mice were intraperitoneally injected with saline or LPS (5 mg/kg) for 5 consecutive days, and samples were collected on day 30 after LPS injection. H&E E/MASSON staining and α-SMA immunohistochemistry (Figure 7A) showed that the model of pulmonary fibrosis was successfully constructed. The lung tissue of 6 control and 6 LPS injected mice was collected. As shown in Figure 7B, expression levels of COL1A1, COL1A2, COL3A1, COL14A1, MMP1, MMP7, ASPN, CDH2, SPP1, POSTN, and CTSK were significantly increased in the lung tissue of LPS injected mice compared with that of control mice (p < 0.001), whereas the expression level of COL15A1 was undetectable in the lung tissue of either the control or LPS injected mice. These results demonstrated important roles COL1A1, COL1A2, COL3A1, COL14A1, MMP1, MMP7, ASPN, CDH2, SPP1, POSTN and CTSK play in IPF occurrence.
FIGURE 6. Correlation analysis of hub genes. (A) Heatmap of the training group. (B) Heatmap of the validation group. (C) Expression levels of twelve hub genes (healthy control samples vs IPF samples) from the training group. (D) Expression levels of twelve hub genes from the validation group (*p < 0.05, **p < 0.01).
FIGURE 7. Expression validation of hub genes. (A) The severity of pulmonary fibrosis was determined by hematoxylin-eosin (HE) staining, collagen deposition was assessed by Masson’s trichrome staining, and α-SMA expression in the lung tissue was detected by immunohistochemistry (magnification, ×200). (B) Expression levels of COL1A1, COL1A2, COL3A1, COL14A1, MMP1, MMP7, ASPN, CDH2, SPP1, POSRN, CTSK mRNAs in lung tissue samples (**p < 0.01).
Survival Analysis of Hub Genes in the Validation Group
To explore the prognostic value of these 12 hub genes for IPF patients, overall survival analysis was performed using the GSE70866 dataset. It was found that IPF patients with high expression levels of COL1A1, CTSK, MMP1, MMP7, and SPP1 had poorer overall survival compared to those with low expression levels (p < 0.05). No significant correlation was found between expression levels of other hub genes and survival of IPF patients (p > 0.05, Figure 8).
FIGURE 8. Survival analysis for hub genes in the validation group. Kaplan-Meier plot depicting association of the expression level of twelve individual hub genes with patient survival.
Discussion
Idiopathic pulmonary fibrosis (IPF) is an end-stage lung disease with a short survival time after diagnosis is confirmed (Liu et al., 2018). Therefore, it is vital to illuminate the underlying mechanism of IPF pathogenesis and to explore potential biomarkers for early diagnosis in order to improve the prognosis of IPF patients.
The present study used bioinformatic methods to screen IPF-related genes and tentatively tested their roles in the occurrence and prognosis of IPF based on the gene expression data of IPF patients in the GEO databases. Compared to other bioinformatics analyses, WGCNA is more valuable due to the comprehensive examination of links between co-expression modules and clinical traits with high reliability and biological significance (Wei et al., 2014; Kong et al., 2020; Lund et al., 2020). In the present study, 11 hub genes (COL1A2, COL1A1, COL3A1, SPP1, MMP1, POSTN, ASPN, CDH2, COL14A1, CTSK, MMP7) were found to be associated with the occurrence of IPF using WGCNA, differential gene expression analysis in the training datasets, PPI network analysis, and cytoHubba plugin analyses. Expression of these genes was validated in the GSE10667 dataset and in pulmonary fibrosis model mice using qRT-PCR. These results are consistent with those of previous studies in that COL1A1, COL1A2, COL3A1, COL14A1, POSTN, SPP1, MMP1, ASPN, MMP7, CDH2, and CTSK were up-regulated in a variety of samples from IPF patients and IPF-related mouse models, including blood, bronchoalveolar lavage, airway fibroblasts, and the lung tissue of human and mice, as well as involved in the occurrence of IPF (Supplementary Table S3) (Tzortzaki et al., 2003; Brule et al., 2005; Obermajer et al., 2006; Ivan et al., 2008; Naik et al., 2012; Vittal et al., 2013; Zhang et al., 2013; Hamai et al., 2016; Ghavami et al., 2018; Liu et al., 2018; Morimoto et al., 2018; Gao et al., 2019; Yu et al., 2020).
The collagen binding protein ASPN, a member of the family of leucine-rich repeat proteins, is produced by fibroblasts (Nakajima et al., 2007). A previous study showed that the expression of ASPN was up-regulated in the IPF(+) soluble fractions (Åhrman et al., 2018). COL1A1, a functional gene that encodes the alpha 1 chain of type I collagen, is the main constituent of ECM component (Li et al., 2020) and participates in cell proliferation, infiltration, metastasis, and angiogenesis. Its expression is related to many types of tumors (Bi et al., 2017; Ma et al., 2019; Dong et al., 2020). It is known that type III collagen interacts with type I and type II collagen in fibril formation and is an essential regulator of fibril diameter. The increase in type I, III, XIV, XV collagen content will lead to the formation of fibrils. Like many fibrotic disorders, IPF is characterized by enhanced deposition and remodeling of ECM (Tjin et al., 2017). Matrix metalloproteinases (MMPs) are a family of calcium- and zinc-dependent endopeptidases (Visse and Nagase, 2003) whose function is to regulate abnormal epithelial response to injury, fibroblast proliferation, extracellular matrix accumulation, and aberrant tissue remodeling (Ivan et al., 2008). MMPs play a role in different pathological processes such as atherosclerosis (Wågsäter et al., 2011), arthritis (Huang et al., 2017), tumor invasion, and pulmonary fibrosis (Ivan et al., 2008). POSTN plays a notable role in ECM structure and organization and principally in collagen assembly. Secreted phosphoprotein 1 (SPP1), also known as osteopontin-like protein, is a multifunctional secretory acidic glycoprotein (Zhang et al., 2017). Earlier research displayed that POSTN can promote myofibroblast differentiation and type I collagen production, contributing to aberrant lung fibrosis. Serum levels of POSTN may predict the clinical progression of IPF (O'Dwyer and Moore, 2017; Morse et al., 2019; Alimperti and Andreadis, 2015). SPP1 can be secreted by various cells, such as osteoclasts, macrophages, epithelial cells, and endothelial cells. Prior studies reported that low-level local proliferation of macrophages that highly express SPP1 in the normal lung tissue was strikingly increased in IPF lungs, and macrophages that highly express SPP1 importantly contribute to lung fibrosis in IPF34. CDH2, a calcium-dependent cell adhesion protein and also a mesenchymal cell marker, preferentially mediates homotypic cell-cell adhesion by dimerization with a CDH2 chain from another cell (Alimperti and Andreadis, 2015). Previous study showed that CDH2 was involved in epithelial-mesenchymal transition (EMT) which contributes to pulmonary fibrosis (Gao et al., 2019). CTSK displays potent endoprotease activity against fibrinogen (Obermajer et al., 2006), and the expression level of CTSK was up-regulated in the silica induced pulmonary fibrosis model (Brule et al., 2005). These suggested that 11 hub genes identified in the present study play important roles in IPF occurrence.
To explore the prognostic value of these hub genes, survival analysis was performed. It was found that COL1A1, CTSK, MMP1, MMP7, and SPP1 could be potential biomarkers for poor prognosis of IPF patients because the survival rate of IPF patients with high expression levels of COL1A1, CTSK, MMP1, MMP7, and SPP1 was lower than that of those with low expression levels of the abovementioned genes. A number of studies have reported that type I (COL1A1) collagen and matrix metalloproteinases (MMP1 and MMP7) are fibrotic genes associated with the occurrence and prognosis of IPF (Ivan et al., 2008; Giménez et al., 2017; Adegunsoye et al., 2020; Liu et al., 2020; Xu et al., 2020; Mishra et al., 2021). However, further clinical investigation on these genes is warranted.
GO and KEGG pathway enrichment analyses revealed that COL1A1, COL1A2, COL3A1, COL14A1, POSTN, SPP1, MMP1, ASPN, MMP7, CDH2, and CTSK were involved in multiple biological processes related to extracellular matrix organization and collagen metabolism by protein digestion and absorption signaling pathways. In addition, correlation analysis, and PPI as well as cytoHubba plugin analyses showed a highly consistent positive correlation between hub genes in both the training group and the validation group. These results suggest that altered expression of these genes and their crosstalk might impact the development of IPF by modulating the above biological processes and signaling pathways.
The present study analyzed results of seven GEO datasets including 337 IPF samples and 120 control samples to identify possible biomarkers that facilitate elucidation of the underlying molecular mechanism of IPF using a variety of bioinformatics methods (WGCNA, different gene expression analysis, PPI, correlation analysis, etc.). These results were confirmed in 7 GEO datasets, indicating a high reliability and validity. They will shed light on the potential pathogenic mechanism of IPF and guide the development of more potent drugs for IPF.
Conclusion
In summary, results of the present study suggest that ASPN, CDH2, COL1A1, COL1A2, COL3A1, COL14A1, MMP1, POSTN, SPP1, MMP7, and CTSK are potential biomarkers of IPF. Altered expression of these genes and their cross-talk might impact the development of IPF by modulating IPF-related biological processes and signaling pathways. COL1A1, CTSK, MMP1, MMP7, and SPP1 are positively correlated with IPF prognosis. This study provides further insights into individualized treatment and prognosis for IPF.
Data Availability Statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.
Ethics Statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. The animal study was reviewed and approved by the Animal Research Ethics Committee of Renji hospital, Shanghai Jiao Tong University School of Medicine.
Author Contributions
HW and XH designed the experiments, prepared the manuscript and performed data analysis. PC, MH, AC, TW, DD, WL, LT and XG did the data curation. LX and HL oversaw the project and revised the manuscript.
Funding
The present study was supported by a grant from Shanghai Fourth People’s Hospital (No. 2019001).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2021.711239/full#supplementary-material
References
Adegunsoye, A., Alqalyoobi, S., Linderholm, A., Bowman, W. S., Lee, C. T., Pugashetti, J. V., et al. (2020). Circulating Plasma Biomarkers of Survival in Antifibrotic-Treated Patients with Idiopathic Pulmonary Fibrosis. Chest 158, 1526–1534. doi:10.1016/j.chest.2020.04.066
Åhrman, E., Hallgren, O., Malmström, L., Hedström, U., Malmström, A., Bjermer, L., et al. (2018). Quantitative Proteomic Characterization of the Lung Extracellular Matrix in Chronic Obstructive Pulmonary Disease and Idiopathic Pulmonary Fibrosis. J. Proteomics 189, 23–33. doi:10.1016/j.jprot.2018.02.027
Alimperti, S., and Andreadis, S. T. (2015). CDH2 and CDH11 Act as Regulators of Stem Cell Fate Decisions. Stem Cel Res. 14, 270–282. doi:10.1016/j.scr.2015.02.002
Behr, J., Nathan, S. D., Wuyts, W. A., Mogulkoc Bishop, N., Bouros, D. E., Antoniou, K., et al. (2021). Efficacy and Safety of Sildenafil Added to Pirfenidone in Patients with Advanced Idiopathic Pulmonary Fibrosis and Risk of Pulmonary Hypertension: a Double-Blind, Randomised, Placebo-Controlled, Phase 2b Trial. Lancet Respir. Med. 9, 85–95. doi:10.1016/s2213-2600(20)30356-8
Bi, S., Chai, L., Yuan, X., Cao, C., and Li, S. (2017). MicroRNA-98 Inhibits the Cell Proliferation of Human Hypertrophic Scar Fibroblasts via Targeting Col1A1. Biol. Res. 50, 22. doi:10.1186/s40659-017-0127-6
Brule, S., Misson, P., Buhling, F., Lison, D., and Huaux, F. (2005). Overexpression of Cathepsin K during Silica-Induced Lung Fibrosis and Control by TGF-Beta. Respir. Res. 6, 84. doi:10.1186/1465-9921-6-84
Chen, W., Zhang, S., Williams, J., Ju, B., Shaner, B., Easton, J., et al. (2020). A Comparison of Methods Accounting for Batch Effects in Differential Expression Analysis of UMI Count Based Single Cell RNA Sequencing. Comput. Struct. Biotechnol. J. 18, 861–873. doi:10.1016/j.csbj.2020.03.026
Claude, J., and Harold, A. (2018). Idiopathic Pulmonary Fibrosis: Cell Death and Inflammation Revisited. Am. J. Respir. Cel Mol. Biol. 59, 137–138. doi:10.1165/rcmb.2018-0083ED
Cruz-Bermúdez, A., Laza-Briviesca, R., Vicente-Blanco, R. J., García-Grande, A., Coronado, M. J., Laine-Menéndez, S., et al. (2019). Cancer-associated Fibroblasts Modify Lung Cancer Metabolism Involving ROS and TGF-β Signaling. Free Radic. Biol. Med. 130, 163–173. doi:10.1016/j.freeradbiomed.2018.10.450
DePianto, D. J., Chandriani, S., Abbas, A. R., Jia, G., N'Diaye, E. N., Caplazi, P., et al. (2015). Heterogeneous Gene Expression Signatures Correspond to Distinct Lung Pathologies and Biomarkers of Disease Severity in Idiopathic Pulmonary Fibrosis. Thorax 70, 48–56. doi:10.1136/thoraxjnl-2013-204596
Dong, X.-Z., Zhao, Z.-R., Hu, Y., Lu, Y.-P., Liu, P., and Zhang, L. (2020). LncRNA COL1A1-014 Is Involved in the Progression of Gastric Cancer via Regulating CXCL12-CXCR4 axis. Gastric Cancer 23, 260–272. doi:10.1007/s10120-019-01011-0
Gao, L., Jiang, D., Geng, J., Dong, R., and Dai, H. (2019). Hydrogen Inhalation Attenuated Bleomycin‐induced Pulmonary Fibrosis by Inhibiting Transforming Growth Factor‐β1 and Relevant Oxidative Stress and Epithelial‐to‐mesenchymal Transition. Exp. Physiol. 104, 1942–1951. doi:10.1113/ep088028
Ghavami, S., Yeganeh, B., Zeki, A. A., Shojaei, S., Kenyon, N. J., Ott, S., et al. (2018). Autophagy and the Unfolded Protein Response Promote Profibrotic Effects of TGF-Β1 in Human Lung Fibroblasts. Am. J. Physiology-Lung Cell Mol. Physiol. 314, L493–L504. doi:10.1152/ajplung.00372.2017
Giménez, A., Duch, P., Puig, M., Gabasa, M., Xaubet, A., and Alcaraz, J. (2017). Dysregulated Collagen Homeostasis by Matrix Stiffening and TGF-Β1 in Fibroblasts from Idiopathic Pulmonary Fibrosis Patients: Role of FAK/Akt. Ijms 18, 2431. doi:10.3390/ijms18112431
Hamai, K., Iwamoto, H., Ishikawa, N., Horimasu, Y., Masuda, T., Miyamoto, S., et al. (2016). Comparative Study of Circulating MMP-7, CCL18, KL-6, SP-A, and SP-D as Disease Markers of Idiopathic Pulmonary Fibrosis. Dis. Markers 2016, 4759040. doi:10.1155/2016/4759040
Huang, T.-L., Mu, N., Gu, J.-T., Shu, Z., Zhang, K., Zhao, J.-K., et al. (2017). DDR2-CYR61-MMP1 Signaling Pathway Promotes Bone Erosion in Rheumatoid Arthritis through Regulating Migration and Invasion of Fibroblast-like Synoviocytes. J. Bone Miner Res. 32, 407–418. doi:10.1002/jbmr.2993
Huang, X., Lv, D., Yang, X., Li, M., and Zhang, H. (2020). m6A RNA Methylation Regulators Could Contribute to the Occurrence of Chronic Obstructive Pulmonary Disease. J. Cel. Mol. Med. 24 (21), 12706–12715. doi:10.1111/jcmm.15848
Hutchinson, J., Fogarty, A., Hubbard, R., and McKeever, T. (2015). Global Incidence and Mortality of Idiopathic Pulmonary Fibrosis: a Systematic Review. Eur. Respir. J. 46, 795–806. doi:10.1183/09031936.00185114
Ivan, O., Thomas, J., and Kazuhisa, K. (2008). MMP1 and MMP7 as Potential Peripheral Blood Biomarkers in Idiopathic Pulmonary Fibrosis. Plos Med. 5, e93. doi:10.1371/journal.pmed.0050093
King, T. E., Bradford, W. Z., Castro-Bernardini, S., Fagan, E. A., Glaspole, I., Glassberg, M. K., et al. (2014). A Phase 3 Trial of Pirfenidone in Patients with Idiopathic Pulmonary Fibrosis. N. Engl. J. Med. 370, 2083–2092. doi:10.1056/nejmoa1402582
Kong, Y., Feng, Z. C., Zhang, Y. L., Liu, X. F., Ma, Y., and Zhao, Z. M. (2020). Identification of Immune-Related Genes Contributing to the Development of Glioblastoma Using Weighted Gene Co-expression Network Analysis. Front. Immunol. 11, 1281. doi:10.3389/fimmu.2020.01281
Li, H., Chang, H. M., Shi, Z., and Leung, P. C. K. (2020). The P38 Signaling Pathway Mediates the TGF‐β1‐induced Increase in Type I Collagen Deposition in Human Granulosa Cells. FASEB j. 34, 15591–15604. doi:10.1096/fj.202001377r
Liu, Y., Wang, Y., Lu, F., Wang, L., Miao, L., and Wang, X. (2020). BTB and CNC Homology 1 Inhibition Ameliorates Fibrosis and Inflammation via Blocking ERK Pathway in Pulmonary Fibrosis. Exp. Lung Res. 26, 1–11. doi:10.1080/01902148.2020.1849448
Liu, Y., Zhu, M., Geng, J., Ban, C., Zhang, S., Chen, W., et al. (2018). Incidence and Radiologic‐pathological Features of Lung Cancer in Idiopathic Pulmonary Fibrosis. Clin. Respir. J. 12, 1700–1705. doi:10.1111/crj.12732
Loitsch, S. M., Kippenberger, S., Dauletbaev, N., Wagner, T. O., and Bargon, J. (1999). Reverse Transcription-Competitive Multiplex PCR Improves Quantification of mRNA in Clinical Samples-Application to the Low Abundance CFTR mRNA. Clin. Chem. 45 (5), 619–624. doi:10.1093/clinchem/45.5.619
Lund, J. B., Li, S., Baumbach, J., Christensen, K., Li, W., Mohammadnejad, A., et al. (2020). Weighted Gene Coregulation Network Analysis of Promoter DNA Methylation on All-Cause Mortality in Old-Aged Birth Cohorts Finds Modules of High-Risk Associated Biomarkers. J. Gerontol. A. Biol. Sci. Med. Sci. 75, 2249–2257. doi:10.1093/gerona/glaa066
Ma, H.-P., Chang, H.-L., Bamodu, O. A., Yadav, V. K., Huang, T.-Y., Wu, A. T. H., et al. (2019). Collagen 1A1 (COL1A1) Is a Reliable Biomarker and Putative Therapeutic Target for Hepatocellular Carcinogenesis and Metastasis. Cancers 11, 786. doi:10.3390/cancers11060786
Maher, T. M., Stowasser, S., Nishioka, Y., White, E. S., Cottin, V., Noth, I., et al. (2019). Biomarkers of Extracellular Matrix Turnover in Patients with Idiopathic Pulmonary Fibrosis Given Nintedanib (INMARK Study): a Randomised, Placebo-Controlled Study. Lancet Respir. Med. 7, 771–779. doi:10.1016/S2213-2600(19)30255-3
Martinez, F. J., Collard, H. R., Pardo, A., Raghu, G., Richeldi, L., Selman, M., et al. (2017). Idiopathic Pulmonary Fibrosis. Nat. Rev. Dis. Primers 3, 17074. doi:10.1038/nrdp.2017.74
Mishra, S., Shah, M. I., Udhaya Kumar, S., Thirumal Kumar, D., Gopalakrishnan, C., Al-Subaie, A. M., et al. (2021). Network Analysis of Transcriptomics Data for the Prediction and Prioritization of Membrane-Associated Biomarkers for Idiopathic Pulmonary Fibrosis (IPF) by Bioinformatics Approach. Adv. Protein Chem. Struct. Biol. 123, 241–273. doi:10.1016/bs.apcsb.2020.10.003
Morimoto, Y., Hirahara, K., Kiuchi, M., Wada, T., Ichikawa, T., Kanno, T., et al. (2018). Amphiregulin-Producing Pathogenic Memory T Helper 2 Cells Instruct Eosinophils to Secrete Osteopontin and Facilitate Airway Fibrosis. Immunity 49, 134–150. doi:10.1016/j.immuni.2018.04.023
Morse, C., Tabib, T., Sembrat, J., Buschur, K. L., Bittar, H. T., Valenzi, E., et al. (2019). Proliferating SPP1/MERTK-Expressing Macrophages in Idiopathic Pulmonary Fibrosis. Eur. Respir. J. 54, 1802441. doi:10.1183/13993003.02441-2018
Naik, P. K., Bozyk, P. D., Bentley, J. K., Popova, A. P., Birch, C. M., Wilke, C. A., et al. (2012). Periostin Promotes Fibrosis and Predicts Progression in Patients with Idiopathic Pulmonary Fibrosis. Am. J. Physiology-Lung Cell Mol. Physiol. 303, L1046–L1056. doi:10.1152/ajplung.00139.2012
Nakajima, M., Kizawa, H., Saitoh, M., Kou, I., Miyazono, K., and Ikegawa, S. (2007). Mechanisms for Asporin Function and Regulation in Articular Cartilage. J. Biol. Chem. 282, 32185–32192. doi:10.1074/jbc.m700522200
O'Dwyer, D. N., Ashley, S. L., Gurczynski, S. J., Xia, M., Wilke, C., Falkowski, N. R, et al. (2019). Lung Microbiota Contribute to Pulmonary Inflammation and Disease Progression in Pulmonary Fibrosis. Am. J. Respir. Crit. Care Med. 199, 1127–1138. doi:10.1164/rccm.201809-1650OC
O'Dwyer, D. N., and Moore, B. B. (2017). The Role of Periostin in Lung Fibrosis and Airway Remodeling. Cell Mol Life Sci. 74, 4305–4314. doi:10.1007/s00018-017-2649-z
Obermajer, N., Premzl, A., Zavašnik Bergant, T., Turk, B., and Kos, J. (2006). Carboxypeptidase Cathepsin X Mediates β2-integrin-dependent Adhesion of Differentiated U-937 Cells. Exp. Cel Res. 312, 2515–2527. doi:10.1016/j.yexcr.2006.04.019
Pei, G., Chen, L., and Zhang, W. (2017). WGCNA Application to Proteomic and Metabolomic Data Analysis. Methods Enzymol. 585, 135–158. doi:10.1016/bs.mie.2016.09.016
Raghu, G., Remy-Jardin, M., Myers, J. L., Richeldi, L., Ryerson, C. J., Lederer, D. J., et al. (2018). Diagnosis of Idiopathic Pulmonary Fibrosis. An Official ATS/ERS/JRS/ALAT Clinical Practice Guideline. Am. J. Respir. Crit. Care Med. 198, e44–e68. doi:10.1164/rccm.201807-1255st
Saito, S., Alkhatib, A., Kolls, J. K., Kondoh, Y., and Lasky, J. A. (2019). Pharmacotherapy and Adjunctive Treatment for Idiopathic Pulmonary Fibrosis (IPF). J. Thorac. Dis. 11, S1740–S1754. doi:10.21037/jtd.2019.04.62
Schäfer, S. C., Funke-Chambour, M., and Berezowska, S. (2020). Idiopathische Lungenfibrose - Epidemiologie, Ursachen und klinischer Verlauf. Pathologe 41, 46–51. doi:10.1007/s00292-019-00747-x
Tjin, G., White, E. S., Faiz, A., Sicard, D., Tschumperlin, D. J., Mahar, A., et al. (2017). Lysyl Oxidases Regulate Fibrillar Collagen Remodelling in Idiopathic Pulmonary Fibrosis. Dis. Model. Mech. 10, 1301–1312. doi:10.1242/dmm.030114
Tran, T., and Suissa, S. (2018). The Effect of Anti-acid Therapy on Survival in Idiopathic Pulmonary Fibrosis: a Methodological Review of Observational Studies. Eur. Respir. J. 51. doi:10.1183/13993003.00376-2018
Tzortzaki, E. G., Tischfield, J. A., Sahota, A., Siafakas, N. M., Gordon, M. K., and Gerecke, D. R. (2003). Expression of FACIT Collagens XII and XIV during Bleomycin-Induced Pulmonary Fibrosis in Mice. Anat. Rec. 275A, 1073–1080. doi:10.1002/ar.a.10120
Visse, R., and Nagase, H. (2003). Matrix Metalloproteinases and Tissue Inhibitors of Metalloproteinases. Circ. Res. 92, 827–839. doi:10.1161/01.res.0000070112.80711.3d
Vittal, R., Mickler, E. A., Fisher, A. J., Zhang, C., Rothhaar, K., Gu, H., et al. (2013). Type V Collagen Induced Tolerance Suppresses Collagen Deposition, TGF-Beta and Associated Transcripts in Pulmonary Fibrosis. PLoS One 8, e76451. doi:10.1371/journal.pone.0076451
Wågsäter, D., Zhu, C., Björkegren, J., Skogsberg, J., and Eriksson, P. (2011). MMP-2 and MMP-9 Are Prominent Matrix Metalloproteinases during Atherosclerosis Development in the Ldlr(-/-)Apob(100/100) Mouse. Int. J. Mol. Med. 28, 247–253. doi:10.3892/ijmm.2011.693
Wan, H., Xie, T., Xu, Q., Hu, X., Xing, S., Yang, H., et al. (2019). Thy-1 Depletion and Integrin β3 Upregulation-Mediated PI3K-Akt-mTOR Pathway Activation Inhibits Lung Fibroblast Autophagy in Lipopolysaccharide-Induced Pulmonary Fibrosis. Lab. Invest. 99 (11), 1636–1649. doi:10.1038/s41374-019-0281-2
Wang, Y., Yella, J., Chen, J., McCormack, F. X., Madala, S. K., and Jegga, A. G. (2017). Unsupervised Gene Expression Analyses Identify IPF-Severity Correlated Signatures, Associated Genes and Biomarkers. BMC Pulm. Med. 17, 133. doi:10.1186/s12890-017-0472-9
Wei, C., An-Lin, C., Marco, B., Chou, W. C., Cheng, A. L., and Brotto, M. (2014). Visual Gene-Network Analysis Reveals the Cancer Gene Co-expression in Human Endometrial Cancer. BMC Genomics 15, 300. doi:10.1186/1471-2164-15-300
Xu, Y., Mizuno, T., Sridharan, A., Du, Y., Guo, M., Tang, J., et al. (2016). Single-cell RNA Sequencing Identifies Diverse Roles of Epithelial Cells in Idiopathic Pulmonary Fibrosis. JCI Insight 1, e90558. doi:10.1172/jci.insight.90558
Xu, Z., Mo, L., Feng, X., Huang, M., and Li, L. (2020). Using Bioinformatics Approach Identifies Key Genes and Pathways in Idiopathic Pulmonary Fibrosis. Medicine (Baltimore) 99 (36), e22099. doi:10.1097/md.0000000000022099
Yu, D. H., Ruan, X. L., Huang, J. Y., Liu, X. P., Ma, H. L., Chen, C., et al. (2020). Analysis of the Interaction Network of Hub miRNAs-Hub Genes, Being Involved in Idiopathic Pulmonary Fibers and its Emerging Role in Non-small Cell Lung Cancer. Front. Genet. 11, 302. doi:10.3389/fgene.2020.00302
Zhang, X., Liu, H., Hock, T., Thannickal, V., and Sanders, Y. (2013). Histone Deacetylase Inhibition Downregulates Collagen 3A1 in Fibrotic Lung Fibroblasts. Ijms 14, 19605–19617. doi:10.3390/ijms141019605
Keywords: idiopathic pulmonary fibrosiss, differentially expressed gene, hub genes, bioinformatics, WGCNA (weighted gene Co-expression network analyses), PPI
Citation: Wan H, Huang X, Cong P, He M, Chen A, Wu T, Dai D, Li W, Gao X, Tian L, Liang H and Xiong L (2021) Identification of Hub Genes and Pathways Associated With Idiopathic Pulmonary Fibrosis via Bioinformatics Analysis. Front. Mol. Biosci. 8:711239. doi: 10.3389/fmolb.2021.711239
Received: 18 May 2021; Accepted: 02 August 2021;
Published: 12 August 2021.
Edited by:
Xianglin Yuan, Huazhong University of Science and Technology, ChinaReviewed by:
Sayan Chatterjee, Guru Gobind Singh Indraprastha University, IndiaSercin Ozlem Caliskan, Uşak University, Turkey
Copyright © 2021 Wan, Huang, Cong, He, Chen, Wu, Dai, Li, Gao, Tian, Liang and Xiong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Lize Xiong, mzkxlz@126.com; Huazheng Liang, andyliang2018@126.com
†These authors have contributed equally to this work