- 1Harris Interdisciplinary Research, Davenport University, Grand Rapids, MI, United States
- 2Institute for Cyber-Enabled Research, Michigan State University, East Lansing, MI, United States
Background: Severe Acute Respiratory Syndrome (SARS) corona virus (CoV) infections are a serious public health threat because of their pandemic-causing potential. This work is the first to analyze mRNA expression data from SARS infections through meta-analysis of gene signatures, possibly identifying therapeutic targets associated with major SARS infections.
Methods: This work defines 37 gene signatures representing SARS-CoV, Middle East Respiratory Syndrome (MERS)-CoV, and SARS-CoV2 infections in human lung cultures and/or mouse lung cultures or samples and compares them through Gene Set Enrichment Analysis (GSEA). To do this, positive and negative infectious clone SARS (icSARS) gene panels are defined from GSEA-identified leading-edge genes between two icSARS-CoV derived signatures, both from human cultures. GSEA then is used to assess enrichment and identify leading-edge icSARS panel genes between icSARS gene panels and 27 other SARS-CoV gene signatures. The meta-analysis is expanded to include five MERS-CoV and three SARS-CoV2 gene signatures. Genes associated with SARS infection are predicted by examining the intersecting membership of GSEA-identified leading-edges across gene signatures.
Results: Significant enrichment (GSEA p<0.001) is observed between two icSARS-CoV derived signatures, and those leading-edge genes defined the positive (233 genes) and negative (114 genes) icSARS panels. Non-random significant enrichment (null distribution p<0.001) is observed between icSARS panels and all verification icSARSvsmock signatures derived from human cultures, from which 51 over- and 22 under-expressed genes are shared across leading-edges with 10 over-expressed genes already associated with icSARS infection. For the icSARSvsmock mouse signature, significant, non-random significant enrichment held for only the positive icSARS panel, from which nine genes are shared with icSARS infection in human cultures. Considering other SARS strains, significant, non-random enrichment (p<0.05) is observed across signatures derived from other SARS strains for the positive icSARS panel. Five positive icSARS panel genes, CXCL10, OAS3, OASL, IFIT3, and XAF1, are found across mice and human signatures regardless of SARS strains.
Conclusion: The GSEA-based meta-analysis approach used here identifies genes with and without reported associations with SARS-CoV infections, highlighting this approach’s predictability and usefulness in identifying genes that have potential as therapeutic targets to preclude or overcome SARS infections.
Background
Human β-coronaviruses (CoV) are enveloped, positive-sense RNA viruses that infect humans and a variety of animal species (1). Human CoV infections typically cause mild upper respiratory distress, referred to as the common cold, and generally were non-lethal (1, 2). However, in 2002 a novel CoV was found to cause the potentially life-threatening disease, severe acute respiratory syndrome (SARS). The initial SARS-CoV outbreak infected an estimated 8,400 people with over a 9% mortality rate (2–5). In 2012, another highly lethal CoV causing Middle East Respiratory Syndrome (MERS) emerged with an over 30% mortality rate (6, 7). Fortunately, both the SARS-CoV and MERS-CoV outbreaks were quickly contained through aggressive infection control measures. Efforts are still ongoing to control the pandemic of SARS-CoV2, a closely related CoV strain with almost 80% genome similarity to SARS-CoV and 50% similarity to MERS-CoV (8, 9). SARS-CoV2 is the causative agent of coronavirus disease 2019 (COVID-19), which is already responsible for over a 3.8 million deaths worldwide as of July 2021 (10–12). Several new variants of SARS-CoV2 are circulating the globe already, with some variants having increased infectiousness such as B.1.1.7 which originated in the United Kingdom in early 2021, prompting scientists to consider outbreaks of a future SARS-CoV3 strain (10, 13–15).
Options to successfully treat SARS-infected patients are critical to improving patient outcomes. Limited therapeutic options for SARS-CoV infections were available at the time. For example, the RNA-dependent RNA polymerase inhibitor, ribavirin, and corticosteroids were cornerstones of SARS-CoV treatment. Unfortunately, several reports exist showing a lack of efficacy for ribavirin and debating the effectiveness of corticosteroid therapy in SARS-CoV treatment (5, 16). Remdesivir is a nucleotide analogue that inhibits viral RNA synthesis in SARS via an unknown mechanism. Remdesivir achieved mixed results when treating SARS patients (17–19). MERS-CoV treatment options were limited also, with most therapies being existing drugs for other disease indications shown to target MERS-CoV replication in vitro (20). Clinical therapies previously used to treat SARS-CoV, like ribavirin and remdesivir, were common treatments for MERS-CoV infections and there was little investment in developing new therapeutic options for SARS infections (21). That initiative changed dramatically over the past year with many resources devoted to improving treatment options for SARS-CoV2 due to its pandemic status. For example, over 284 clinical trials at different phases are underway to examine efficacy and safety for repurposed drug and new therapeutic molecule development as potential treatment options against COVID-19 (21). Therapies previously used to treat SARS-CoV and MERS-CoV, like ribavirin and remdesivir, are being re-evaluated as treatments for COVID-19 (18, 19, 21). Newer therapies, such as inhibitors of membrane fusion, viral replicases, and human and viral proteases, also are under extensive examination as potential therapeutics against COVID-19 (21). Of particular interest are emerging therapies that target immune response, particularly the cytokine storm frequently described in COVID-19 patients, such as interferon (IFN) and interleukin (IL)−6 receptor antagonist therapies (21–27). SARS viruses encode at least 10 proteins to combat the induction and antiviral action of IFN (28). A delayed induction of the IFN-stimulating gene expression through virus-induced modulation of the basal activity of transcriptional activity of STAT1 and PKR pathways leads to a peak of coronavirus replication prior to IFN-I response, suggesting that IFN-I response at least partially controls SARS infection and potentially contributing to the progression of the disease. While these various treatment options show promise, no specific therapy is available currently to treat SARS-CoV2 infections as death tolls from the virus continue to increase (24). New therapeutic strategies are needed to successfully treat current and future SARS infections.
A complete understanding of the molecular changes driving SARS infections can assist in the development of new therapies to fight COVID-19 and other future SARS outbreaks. Many studies have been done to elucidate molecular changes associated with a SARS infection by examining gene expression changes and some changes have been confirmed via experimental techniques such as quantitative real-time polymerase chain reaction (qRT-PCR). These studies conducted differential expression analysis of individual genes using statistical methods such as fold change and/or T-test p−value to identify genes of interest (2, 29–33). This approach usually generates an exhaustive list of several hundred genes that meet pre-established statistical criteria (e.g., fold change <2 and/or T−test p−value<0.05). Further analysis is needed to refine lists of identified genes to identify the most promising gene candidates to target therapeutically. Since experimental work is time consuming and expensive, computational approaches to interpret and prioritize gene lists are used widely.
Several different computational approaches have been used to interpret experimentally generated differentially expressed gene lists. For example, functional enrichment was performed to interpret generated gene lists using a variety of bioinformatic approaches including network analysis (29, 30, 33, 34) and/or hypergeometric test-based approaches (e.g., Fisher’s Exact Test) that utilize established gene sets from public knowledgebases like Gene Ontology (GO) (2, 30–32). These studies identified several gene expression changes associated with SARS-CoV infection including increased expression of inflammatory mediators IL-6, IL−8, CXCL10 (i.e., INF γ-induced protein 10), IFN-λ, IFIT1, OASL, and OAS3 in SARS-CoV infected human lung cultures and mouse lung samples (29, 31, 33, 34). Comparative analysis of differential gene expression across SARS strains found significant increases in gene expression of some genes, like CXCL10, IFIT1, and OAS3, or their gene relations, such as CXCL1, CXCL2, and IFIT3, in MERS-CoV and/or SARS-CoV2 infections in human epithelial lung cells and lung samples from cynomolgus monkey and mice (35–38). Other genes were unique to a SARS strain, such as XAF1 which has been reported only in SARS-CoV2 infections (35–38). Hypergeometric test-based approaches are known to be limited because only genes that meet an established cut-off (e.g., T-test p−value<0.05) are considered (39). To overcome this limitation by considering all genes in an expression dataset, other studies utilized Gene Set Enrichment Analysis (GSEA) to calculate the enrichment of an established gene set from a public knowledgebase (e.g., MSigDB, Blood Transcriptional Modules, and/or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways) in a gene signature. This was done by defining a gene list ranked by differential expression between mock and SARS infected samples by an appropriate statistical method (29, 40). Findings from these studies included positively enriched modules associated with antiviral IFN, cell cycle and proliferation, and monocytes and dendritic cells in peripheral blood mononuclear cells from SARS-CoV2 patients (40). Further, GSEA has been used to confirm enrichment for a query gene set defined from a network analysis approach, and identified regulatory genes associated with the pathogenicity of SARS-CoV have been identified (29). This demonstrated that GSEA is a useful computational tool to validate gene candidates, though all reported studies had used GSEA on established query genes sets rather than using GSEA to establish new sets. Therefore, we sought to use a GSEA-based approach to identify gene candidates from SARS mRNA expression datasets here.
In a prior study to identify genes associated with macrolide resistance in Streptococcus pneumoniae, we demonstrated a GSEA-approach for gene identification that compared differential gene expression between mRNA expression datasets (41). This study successfully identified known and novel genes though it was limited due to incomplete genome annotation, a common problem for many microbial genomes (42–49). Applying our GSEA-approach to mRNA expression data derived from a species with a more completely annotated genome, such as human or mouse, our achievable results would improve. Therefore, in this paper we use the GSEA-approach previously used on Streptococcus pneumoniae to identify gene expression changes associated with SARS infection in human lung cell cultures and mouse lung samples. Using GSEA in this way generates a list of gene candidates associated with a SARS infection which is similar in nature to gene candidate lists generated previously by using single-gene analysis. To refine and prioritize the exhaustive gene lists generated from such an analysis, we performed a GSEA-based meta-analysis that incorporates over 35 different gene signatures. We hypothesized that therapeutically targeting gene candidates identified from our nested GSEA meta-analysis has the potential to improve treatment options for SARS infections
Methods
mRNA Expression Resources
SARS-CoV Datasets
To identify molecular changes associated with SARS infection in human lung cell cultures, we searched the Gene Expression Omnibus (GEO) repository (50–52) to find seven datasets for use in our study (Table 1). Since most datasets contained samples infected with an infectious clone of SARS-CoV Urbani (icSARS), we selected the published SuperSeries GSE47963, which contained three independent datasets (GSE47960, GSE47961, and GSE47962) that examined gene expression in icSARS and mock infected human primary tracheobronchial epithelium cells (29). We used this SuperSeries to identify and verify gene expression changes associated with icSARS infection mechanisms. These datasets also contained gene expression data for cells infected with two other strains: 1) icSARS-dORF6, the icSARS strain with a genetic deletion causing a lack of expression of ORF6 which encodes an IFN antagonist protein that increases virulence by blocking human nuclear translocation (2, 3, 29), or 2) SARS-BatSRBD, a SARS-CoV like virus isolated from bats that was synthetically modified to contain the spike receptor binding domain (SRBD) from the wild type Urbani strain to allow for infection of human and non-human primate cells (29, 53). To extend our verification of identified gene expression changes associated with icSARS infection into another lung cell type, we selected unpublished datasets GSE37827 and GSE48142 that examined gene expression in icSARS and mock infected human lung adenocarcinoma 2B-4 cells. GSE37827 also contained SARS-BatSRBD infected samples, and GSE48142 had samples infected with mutant strains containing either 1) a genetic modification of NSP16 (deltaNSP16) that encodes a non-structural 2’O methyltransferase whose disruption increases sensitivity to type 1 and 3 IFN responses (32), or 2) ExoNI, which has no formal description in public knowledgebases. All gene expression data for these five datasets were profiled on the commercial probe name version of the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F platform (GEO GPL6480). Further, we selected GSE33267 because it had mock, icSARS, and dORF6 infected samples of Calu-3 cells, from which 2B-4 was a clonal derivative. GSE33267 was profiled on the feature number version of the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (GEO GPL4133). To compare identified changes associated with icSARS infection to changes associated with the icSARS’s parent Urbani SARS strain, we selected GSE17400 since it had samples of 2B-4 with mock or Urbani infection (31). GSE17400 was profiled on the Affymetrix Human Genome U133 Plus 2.0 Array (GEO GPL570).
To determine if the molecular changes associated with SARS infection observed in human lung cell cultures were reproducible in an in vivo mouse model, we selected four datasets that examined lung samples from mock or SARS-CoV infected 20-week-old C57BL6 mice. GSE50000 contained samples from mice infected with icSARS, SARS-BatSRBD, or MA15, a mouse adapted SARS-CoV strain, at two different inoculation doses (104 and 105). GSE33266 had samples over an inoculation dose range (102, 103, 104, and 105). GSE49262 had samples from mice infected with MA15 (105 inoculation dose) or dORF6 mutant strains, and GSE49263 had samples infected with MA15 (105 inoculation dose) or dNSP16 mutant strains. All gene expression data for these datasets were profiled on the Probe Name version of the commercial Agilent-014868 Whole Mouse Genome Microarray 4x44K G4122F platform (GEO GPL7202), except for GSE33266 which was profiled on the Feature Number version of the same platform (GEO GPL4134).
All SARS-CoV datasets measured gene expression over a time course, so we used time-matched samples collected at 48hrs post-infection unless otherwise stated. Expression data provided by GEO for all datasets except GSE49262 and GSE49263 were unnormalized intensities, so we z−scored across all samples within the dataset for normalization prior to use when appropriate. For data cleaning, we converted probes to Entrez ID for each gene using the GEO provided platform data table when necessary. Probes with no provided Entrez ID were removed from analysis. If multiple probes with the same Entrez ID existed, the probe with the highest coefficient of variance across duplicate probes was selected.
MERS-CoV Datasets
To compare molecular changes identified in SARS-CoV to changes associated with MERS-CoV infections in human and mouse lung cell cultures, we found three MERS-CoV datasets in GEO for use in our study (Table 2). GSE81909 and GSE100504 examined cultures of primary human airway epithelial cells infected with mock or wild type MERS-CoV (icMERS). Gene expression data for these datasets were profiled on the Probe Name version of the commercial Agilent-026652 Whole Human Genome Microarray 4x44K v2 (GEO GPL13497). GSE108594 had mouse lung cell cultures that were mock or MERS-CoV infected across an inoculation dose range (0, 104, 105, and 106) and profiled on the Probe Name version of Agilent-026655 Whole Mouse Genome Microarray 4x44K v2 (GEO GPL11202). For all MERS-CoV datasets, expression data provided by GEO were already normalized so we selected data for the 48hr post-infection time point as needed and cleaned data as previously described for SARS-CoV datasets.
SARS-CoV2 Datasets
To compare molecular changes identified in SARS-CoV to changes associated with SARS-CoV2 infections in human lung cell cultures, we found three SARS-CoV2 datasets in GEO with mock or SARS-CoV2 infected samples collected at 48hrs for use in our study (Table 2). GSE152586 contained gene expression data for cultures of human alveolar type II cell organoids (54) that were profiled on HiSeq X Ten (GEO GPL20795 and GPL28694). GSE160435 and GSE155518 contained gene expression data from organoids generated from human primary alveolar type II cells and were profiled on Illumina NovaSeq 6000 (GEO GPL24676 and/or GPL29320). GEO had no available SARS-CoV2 datasets utilizing mouse cultures or samples appropriate for use in this study. All provided SARS-CoV2 expression data was z-score normalized and cleaned as previously described for SARS-CoV datasets.
Defining Gene Signatures
We measured differential gene expression by Welch’s two-sample T-test score of normalized values to generate gene signatures (i.e., gene lists ranked by differential gene expression between SARS and mock infected samples). Genes that were over-expressed in SARS- compared to mock infected samples (e.g., positive T-score) fall within the positive tail of the gene signature while under-expressed genes (e.g., negative T-score) fall in the negative tail (Figure 1A). Genes with no substantial change in expression between SARS and mock infected samples (e.g., T-score around 0) were located toward the middle of the gene signature. Therefore, genes that fall within the tails of a gene signature likely changed in response to a specific SARS infection. We noted that three signatures were substantially skewed (i.e., rank of genes where T-score crosses from positive to negative was in top or bottom quartile of signature). For substantially skewed signatures, we adjusted all T-scores in the signature by the T-score of the gene at mid-point so that genes in the signature were balanced between positive and negative T-scores.
Figure 1 Gene Signature Definition and Generation icSARS Gene Panels. (A) Schematic definition of a gene signature. Differences in gene expression between two groups, such as SARS and mock infected lung cells, are measured by Welch’s two-sample T−test score. Gene signatures are ranked lists of genes from high (red) to low (blue) differential mRNA expression between groups. (B) Generation of icSARS gene panels for use in this study. To identify differentially expressed genes associated with icSARS infection in human airway epithelial cell cultures, query gene sets containing either the 500 most over- or under-expressed genes from positive or negative tails of the gene signature generated from the Gene Expression Omnibus (GEO) accession number GSE47960 mRNA expression dataset. The positive and negative tail query sets were compared individually to the gene signature generated from the GEO GSE47961 dataset, which was used as reference for Gene Set Enrichment Analysis (GSEA). From this GSEA computed two enrichment plots, one for each query set, and their associated normalized enrichment score (NES) and p-value which represent the extent of enrichment between query set and reference signature. GSEA also identified leading-edge genes, which are genes that contribute most to achieving maximum enrichment. Two gene panels were defined from leading-edge genes identified in each query set. These gene panels were used in this study for three purposes: 1) identification of gene expression changes associated with icSARS infection in human airway epithelial cell cultures, 2) verification of identified findings in independent datasets, and 3) comparison to other gene signatures representing changes in gene expression associated with other SARS infections.
From the 17 SARS datasets previously described, we defined a total of 37 gene signatures. We generated 29 SARS-CoV gene signatures, which included 17 gene signatures from human lung cultures and 12 signatures from mouse lung samples (Table 3). Specifically, we defined seven icSARSvsmock signatures (six in human cultures, one in mouse samples), one Urbanivsmock signature with the Urbani strain in human cultures, eight MA15vsmock signatures doses in mouse samples representing varying inoculation, five ORF6vsmock signatures (four in human cultures, one in mouse samples), five BATSRBDvsmock signatures (four in human cultures, one in mouse samples), two NSP16vsmock signatures (one in human cultures, one in mouse samples), and one ExoNIvsmock signature from human cultures. For MERS-CoV datasets, we defined five gene signatures that included two gene signatures from human lung cultures and three signatures from mouse lung samples (Table 4). For SARS-CoV2 datasets, we defined three gene signatures from human lung cultures (Table 4).
Overview of Gene Set Enrichment Analysis
We used Gene Set Enrichment Analysis (GSEA) as a statistical method that estimated enrichment between a query gene set (i.e., unranked list of genes) and a reference gene signature (39). GSEA used T-score to calculate a running summation enrichment score where hits (i.e., matches between query set and reference signature) increased the enrichment score proportional to the ranking statistical metric (e.g., T-score) and a miss (i.e., non-matches between query set and reference signature) decreased the enrichment score. From this, GSEA determined a maximum enrichment score for the specific query set and reference signature. Leading-edge genes contributed to reaching the maximum enrichment score, indicating leading-edge genes were associated with cellular response to a specific β−coronavirus infection. Further, GSEA calculated a normalized enrichment score (NES) from 1000 permutations of the reference signature to estimate the significance of enrichment between a specific query set and reference signature. This work used the javaGSEA Desktop Application release 3.0 version of GSEA available from Broad Institute to perform gene T−ranking for signature formation and GSEA for gene identification, verification, and comparison.
Identification of icSARS Associated Genes
To identify gene expression changes associated with icSARS infection, we generated two icSARS gene panels (Figure 1B). To do this, we selected 500 genes from the positive and negative tails from the GSE47960-derived icSARSvsmock gene signature and used them to form two individual query gene sets. GSEA compared each query gene set to the GSE47961-derived icSARSvsmock gene signature (reference). Leading-edge genes from each analysis were used to define the two icSARS gene panels, one panel per tail, and we identified genes included in icSARS panels as being associated with icSARS infection. Pathway enrichment analysis was performed on both icSARS gene panels using Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8. DAVID was a web-accessible knowledgebase with a comprehensive set of functional annotation tools for researchers to understand biological meaning behind large list of genes (55, 56).
Verification of icSARS Gene Panels
To verify the icSARS gene panels, we performed GSEA between icSARS gene panels and GSE47962-derived, GSE37827-derived, GSE48142-derived, and GSE33267-derived icSARSvsmock signatures. To assess if results generated from GSEA could be achieved randomly, we randomly selected 1000 gene panels consisting of either 233- or 114-genes, to match the number of genes in the positive and negative icSARS panels, respectively, from the GPL6480 platform. These analyses generated a null distribution of NES to which we compared the NES achieved by icSARS gene panels for each reference gene signature and counted the number of equal or better NES to estimate significance (i.e., distribution p-value). Histogram data and associated graphs (e.g., distribution curves and box and whiskers plot) were generated using XLStat version 2020.3 (57, 58), which was a user-friendly, commercial, data analysis add-on for Microsoft Excel.
Comparison to icSARS-Induced Gene Expression Changes in Mice and Across Other SARS Strains
To examine differential gene expression of genes from the icSARS gene panels in mice, we performed GSEA between icSARS gene panels (queries) and the GSE50000-derived icSARSvsmock signature (reference). We compared gene membership across leading-edges identified in the analysis of icSARS verification signatures and the icSARS mouse signature to identify genes associated with icSARS infection between models. Any icSARS panel genes that were not included the dataset’s platform still received consideration when examining across identified leading-edge genes. To expand this analysis by comparing gene signatures across infections caused by other SARS-CoV strains, we repeated this GSEA-based meta-analysis approach on infections with other strains, such as Urbani, MA15, SARS-BatSRBD, and mutants in ORF6, NSP16, or ExoNI. All SARS-CoV strains except ExoNI had strain-matched samples in both mouse samples and human lung cultures. We included MA15-infected mouse signatures over a range of inoculation doses (Table 3) to mimic the range of infection severity that would be encountered clinically (i.e., asymptomatic to severe patient presentation). Genes associated with SARS-CoV infection were identified by gene membership across leading-edges identified in these 22 analyses. Further, we repeated this GSEA-based meta-analysis approach to include gene signatures derived from MERS-CoV and SARS-CoV2 infections to compare gene signatures across major SARS infectious strains and examine gene membership across leading-edges. Random modelling for all comparisons were done as previously described. Heat maps were generated by a user-friendly, web-based program from Broad Institute that produces customizable heat maps named Morpheus, https://software.broadinstitute.org/morpheus.
Results
Gene Signature Approach Identified Gene Expression Changes Associated With icSARS Infection for Human Lung Epithelium Cells In Vitro
To identify genes associated with response to an icSARS infection, we first defined the GSE47960-derived and GSE47961-derived icSARSvsmock gene signatures (Table 2). We used the GSE47960-derived icSARSvsmock to generate two gene sets containing the 500 most differentially expressed genes from the positive and negative tails of GSE47960-derived icSARSvsmock (T-score >2.9 and <-3.2 for positive and negative tails, respectively). We chose 500 genes to capture maximum coverage of the signature that was allowable by GSEA (39). To assess similarity between these two gene signatures, we calculated enrichment using GSEA between either GSE47960-derived icSARSvsmock positive or negative tail gene sets and the GSE47961-derived icSARSvsmock and achieved NES=3.28 and NES=-1.87 for positive and negative tail query gene sets, respectively, both with a GSEA p−value<0.001. We defined separate positive and negative icSARS gene panels from the 233 and 114 leading-edge genes identified (Table 5, details in Supplementary Table 1), representing over- and under-expressed genes associated with icSARS infection.
To explore potential factors that could impact our results, we wanted to see if similar results were generated from panels defined using 1) reversed query and reference signatures, 2) different query set sizes, and 3) different time points. First, we repeated the process of defining panels except reversing the query and reference signatures by using the 500-gene tails from GSE47961-derived icSARSvsmock signature and GSE47960-derived icSARSvsmock, respectively. We noted that the size of the positive panel did not substantially change from 233 genes (Table 1) to 240 genes (Supplementary Table 2). However, we noted that the size of the negative icSARS panel did change substantially from 114 to 331 genes. Despite this substantial change in size, we observed that enrichment did not substantially change for either positive or negative tail query gene sets when reversed (Supplementary Figure 1). From this we conclude that reversing which signature was used as reference and to derive query gene sets does not change the overall result. Next, we repeated the process of defining panels except we used smaller query sizes (100, 200, 300, and 400 genes). We found similar enrichment regardless of query size (Supplementary Figure 1), highlighting the strong similarity between these signatures. Finally, we examined how time point selection (24, 48, 72, and 96hrs) may impact our results. We found reproducible, statistically significant enrichment beginning at 48hrs, suggesting that 48hrs was the earliest timepoint where gene expression changes could be reliably detected. Based on these results collectively, we focused our meta-analysis on the 48hr positive (233 genes) and negative (114 genes) icSARS gene panels defined from using the 500-gene tails of GSE47960-derived icSARSvsmock gene signature as query, since identified leading-edge genes represented over- and under-expressed genes associated with icSARS infection at the earliest time point with consistent detectable gene signature similarities.
Among genes in positive icSARS panel, we found 13 genes (5.5% of the panel) had previously reported associations with icSARS infections via single-gene analysis with minimum fold change of 2.0 and maximum false discovery rate-corrected p-value of 0.05 between mock and icSARS time-matched samples (29), demonstrating the predictive ability of our gene signature approach. There were no genes in the negative icSARS gene panel with established associations with icSARS infection. We also identified genes with no prior association with icSARS infection. These genes included 12 genes had a zinc finger in addition to the two that were already reported (5.2% of the panel), six genes encoding IFN-induced proteins (2.6%), five genes from the solute carrier family (2.1%), and six genes encoding for uncharacterized proteins (2.6%) in the positive icSARS gene panel. In the negative icSARS gene panel, we found three zinc finger protein genes (2.6%), three genes from the solute carrier family (2.6%), and five genes encoding for uncharacterized proteins (4.4%). We speculated that these identified genes without previously reported associations with icSARS infections in human lung epithelium cultures were also associated with an icSARS infection.
To expand on our analysis, we examined the cellular roles that genes in the icSARS panels were involved in. We used DAVID to calculate enrichment between the icSARS panels and pathways in commonly used knowledgebases. We noted that GO Biological Processes (BP) database returned the most identified significantly enriched pathways compared to other databases (data not shown), so we focused this discussion on GO-BP data to avoid confusion from overlapping pathway and gene inclusion variations across different knowledgebases. DAVID identified 49 significant GO-BP pathways (EASE score p-value<0.05) from the positive icSARS panel and 10 significant pathways from the negative icSARS panel (Supplementary Table 3). Most significantly enriched pathways have experimentally established associations with icSARS and other β-coronavirus infections, such as up-regulation of type I IFN signaling pathway (GO:0060337, p−value<0.001), NF-kappaB processes (transcription factor activity: GO:0051092, p-value=0.001; signaling: GO:0043123, p-value=0.043), p38MAPK cascade (GO:1900745, p-value=0.015) and apoptotic process (GO:0006915, p-value=0.018) and down-regulation of cilium morphogenesis (GO: 0060271, p−value=0.038) and assembly (GO: 0042384, p−value=0.0.030), demonstrating our gene signature approach’s ability to detect pathways associated with response to an icSARS infections (59–61). We also find several pathways, such as SMAD protein signal transduction (GO:0060395, p−value=0.037), with no prior associations to icSARS infection though they have been identified in mouse infections with the dORF strain (2). Therefore, we speculated that pathways without prior association to icSARS infections identified here were also involved in response to an icSARS infection.
Enrichment of icSARS Gene Panels and Specific icSARS Panel Genes Verified in Independent Datasets
To verify our icSARS gene panels were associated with response to an icSARS infection, we used GSEA to calculate enrichment between our icSARS panels and four icSARSvsmock verification gene signatures derived from independent datasets (Table 2). We found significant similarity between positive icSARS panels and all icSARSvsmock verification signatures (Figures 2A–D, NES>2.95, GSEA p-value <0.001). We also observed significant similarity between negative icSARS panels and the icSARSvsmock verification signatures (Figures 2E–H, NES<−2.14, p-value <0.001). To determine how likely the NES achieved for icSARS panels would be achieved by random chance, we generated a random model to match the size and potential composition of our icSARS panels. We observed NES achieved by our icSARS panels were consistently outside the range of NES generated randomly (null distribution p−value<0.001, Figures 2I–L), illustrating that NES achieved by our icSARS panels were non-random. Taken together, these results demonstrated that the enrichment achieved from our icSARS panels was true and reproducible.
Figure 2 Verification of icSARS Gene Panels in Independent Datasets. (A) Gene Set Enrichment Analysis (GSEA) calculated enrichment, as determined by normalized enrichment score (NES), between the positive icSARS gene panel and the GSE47962-derived icSARSvsmock gene signature. (B) GSEA between the positive icSARS gene panel and GSE37827-derived icSARSvsmock gene signature. (C) GSEA between the positive icSARS gene panel and GSE48142-derived icSARSvsmock gene signature. (D) GSEA between the positive icSARS gene panel and GSE33267-derived icSARSvsmock gene signature. (E) GSEA between the negative icSARS panel and the GSE47962-derived icSARSvsmock signature. (F) GSEA between the negative icSARS panel and the GSE37827-derived icSARSvsmock signature. (G) GSEA between the negative icSARS panel and the GSE48142-derived icSARSvsmock signature. (H) GSEA between the negative icSARS panel and the GSE33267-derived icSARSvsmock signature. (I) Distribution plot of NES from 1000 randomly generated gene panels (individual queries) compared to the GSE47962-derived icSARSvsmock signature. (J) Distribution plot of NES from 1000 randomly generated gene panels compared to the GSE37827-derived icSARSvsmock signature. (K) Distribution plot of NES from 1000 randomly generated gene panels compared to the GSE48142-derived icSARSvsmock signature. (L) Distribution plot of NES from 1000 randomly generated gene panels compared to the GSE33267-derived icSARSvsmock signature.
To determine which icSARS panel genes were verified across all signatures, we examined leading-edge genes identified by GSEA for each verification signature (Supplementary Table 4). We observed 51 genes from the positive icSARS panel and 22 genes from the negative icSARS panel were shared across all verification signatures. Of the 51 positive icSARS panel genes, we noted 10 genes had reported associations with icSARS infection via single-gene analysis with minimum fold change of 2.0 and maximum false discovery rate-corrected p-value of 0.05 between mock and icSARS time-matched samples (29): nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha (Entrez ID 4792), zinc finger CCCH-type containing 12A (ID 80149), cysteine-serine-rich nuclear protein 1 (ID 64651), zinc finger protein 433 (ID 163059), hairy and enhancer of split 1 from Drosophila (ID 3280), transformer 2 beta homolog (ID 6434), V-rel reticuloendotheliosis viral oncogene homolog B (ID 5971), pim-3 oncogene (ID 415116), chemokine (C-X-C motif) ligand 2 (ID 2920), and C-X-C motif chemokine ligand 10 (ID 3627). These data together supported the conclusion that our shared leading-edge genes were associated with icSARS infection in human lung cultures and supported the hypothesis that identified genes without previously reported associations were also associated with icSARS infection in human lung cultures.
Positive icSARS Gene Panel Enriched in Mouse-Derived icSARS Gene Signature
To determine if icSARS gene panels were also associated with response to an icSARS infection in a mouse model, we used GSEA to calculate enrichment between icSARS panels and the GSE50000-derived icSARSvsmock gene signature (Table 3). We observed significant enrichment with the positive icSARS panel (NES=1.54, GSEA p-value<0.001, Figure 3A) but not the negative icSARS panel (NES=0.81, p−value=0.848, Figure 3B). Achieved NES were non-random for the positive icSARS panel (NES range: -1.65 to 1.57, null distribution p-value=0.002, Figure 3C), but not for the negative icSARS panel (NES range: -1.72 to 1.64, p-value=0.601). These results suggested that genes in the positive icSARS panel were associated with an icSARS infection in both in human lung cultures and mouse lung samples, but the same cannot be concluded for the negative icSARS panel. Therefore, we did not consider leading-edge genes identified by GSEA from the negative icSARS panel.
Figure 3 Positive icSARS Panel Enrichment in icSARS Infected Mouse Model Revealed Genes Associated with icSARS Infection. (A) Gene Set Enrichment Analysis (GSEA) calculated enrichment, as determined by normalized enrichment score (NES), between the positive icSARS gene panel and the GSE50000-derived icSARSvsmock gene signature. (B) GSEA between the negative icSARS panel and the GSE50000-derived icSARSvsmock signature. (C) Distribution plot of NES from 1000 randomly generated gene panels (individual queries) compared to the GSE50000-derived icSARSvsmock signature (reference). (D) Venn diagram of the inclusion and overlap of positive icSARS panel genes in identified leading-edges and dataset platforms across icSARS-CoV human and mouse gene signatures. (E) Heat map of T−scores for the 20 positive icSARS panel leading-edge genes identified in (D).
When examining across leading-edge genes identified by GSEA from the positive icSARS panel (Supplementary Table 5), our meta-analysis approach found nine of the 51 genes verified across human lung cultures were also in the leading-edge from mouse lung samples (Figure 3D): IFN−induced protein with tetratricopeptide repeats 3 (human Entrez ID: 3437, mouse Entrez ID: 15959, gene symbol: IFIT3), C−X-C motif chemokine ligand 10 (human: 3627, mouse: 15945, symbol: CXCL10), XIAP associated factor 1 (human: 54739, mouse: 327959, symbol: XAF1), 2’-5’-oligoadenylate synthetase 3 100kDa (human: 4940, mouse: 246727, symbol: OAS3), zinc finger CCHC domain containing 2 (human: 54877, mouse: 227449, symbol: ZCCHC2), IFN−induced protein 35 (human: 3430, mouse: 70110, symbol: IFI35), yrdC domain containing (human: 79693, mouse: 230734, symbol: YRDC), poliovirus receptor-related 4 (human: 81607, mouse: 71740, symbol: PVRL4), and cyclin L1 (human: 57018, mouse: 56706, symbol: CCNL1). We also noted that several icSARS panel genes were not included in the mouse dataset’s platform (Supplementary Table 6) with nine of these genes verified across leading-edges for icSARS gene signatures derived from human cell cultures and two genes not included in all verification icSARS signatures. We did not exclude genes from further consideration based on platform inclusion. A heat map of differential gene expression (i.e., T-scores) for these 20 genes across all six icSARS gene signatures revealed that only XAF1 was statistically significant (Welch’s two-sampled, two-sided T−test p−value<0.05) in all icSARS gene signatures (Figure 3E), highlighting the advantage of using a GSEA-based rather than single-gene (e.g., T-score only) meta-analysis, which would have likely missed the other genes identified here because of borderline significance in at least one signature.
Positive icSARS Gene Panel Significantly Enriched Across Other SARS-CoV Strains
After identifying and verifying genes associated with an icSARS infection, we wanted to find differential gene expression similarities between icSARS infection and infections from other SARS-CoV stains with varying levels of virulence (29, 31–34). Identified gene similarities represented genes associated with a SARS-CoV infection regardless of virulence, and we hypothesized that shared genes may be useful targets clinically to preclude or overcome SARS infection. To identify genes associated with SARS-CoV infection across strains, we used GSEA to compare icSARS gene panels to 18 gene signatures derived from samples of human lung cultures or mouse lung samples mock or SARS-CoV infected with one of six strains (Urbani, MA15, dORF6, BAT-SRBD, dNSP16, and ExoNI; Table 2). Urbani and MA15 were fully virulent, while BAT, dORF6, dNSP and ExoNI were attenuated by established differences in host range or gene mutation mechanisms. These strains were selected so all available GEO datasets containing SARS-CoV and mock samples collected at 48hrs post infection were included in our meta-analysis. We found the positive icSARS panel significantly enriched (NES>1.66, GSEA p−value<0.01) across all 18 gene signatures (Figure 4A). We confirmed via random modelling as previously described for all icSARS signatures that achieved significant positive NES were non-random (null distribution p−value<0.002, Figure 4B), except for the GSE33266-derived MA15(102)vsmock signature which was the lowest infectious dose examined in this study. Due to this finding, we remove the MA15(102)vsmock signature from further analysis. The negative icSARS panel was significantly enriched across most signatures (NES<−1.52, p−value<0.01) except two BATvsmock signatures, the GSE47961-derived signature from human cultures (NES=-0.95, p−value=0.569) and the GSE50000-derived signature from mouse samples (NES=-0.81, p−value=0.846), and the GSE33266 MA15(103)vsmock signature from mouse samples (NES=0.90, p−value=0.635). Random modelling with gene sets the same size as the negative icSARS panel supported enrichment findings observed with the negative icSARS panel since enriched signatures were not randomly enriched (Figure 4C, p−value<0.002) but unenriched signatures were randomly enriched (p-value>0.075). As additional verification of our findings, we re-ran GSEA using the panels generated from GSE47961-derived icSARSvsmock (500-gene queries), and noticed this subtle change did not affect observed enrichment across 27 signatures (Supplementary Figure 2) Taken together, these data supported our previous findings of consistent enrichment of the positive icSARS panel and inconsistent enrichment of the negative icSARS panel across icSARS signatures, suggesting that genes shared across positive icSARS panel leading-edges were associated with SARS-CoV infections generally.
Figure 4 icSARS Panel Enrichment Detected Differential Gene Expression Similarities Across SARS Strains with Varying Virulen. (A) Heat map of Gene Set Enrichment Analysis (GSEA) calculated normalized enrichment scores (NES) of the positive and negative icSARS panels across SARS-CoV strains with varying levels of virulence in both human lung cultures and mouse lung samples. (B) Box and whisker plots of NES from 1000 randomly generated gene panels containing 233 genes (individual queries) compared to gene signatures (individual references) used in (A). (C) Box and whisker plots of NES from 1000 randomly generated gene panels containing 114 genes (individual queries) compared to gene signatures (individual references) used in (A).
Meta-Analysis of Leading-Edge Genes From Positive icSARS Panel GSEA Revealed Five Top Gene Candidates
To determine which positive icSARS panel genes were most associated with SARS-CoV infections, we compared inclusion of leading-edge genes identified by GSEA across 22 gene signatures derived from five SARS-CoV strains. We examined leading-edge gene inclusion in each SARS-CoV strain specifically (Figure 5A). Genes identified through leading-edge intersections represent genes associated with infection of that specific SARS-CoV strain. We noted 13 of the 61 leading-edge genes identified from the Urbanivsmock signature in human lung cultures were shared in leading-edges across the seven MA15vsmock signatures from mouse lung samples (Supplementary Tables 7, 8). There were 18 leading-edge genes from the Urbanivsmock signature that were not included across all platforms used to profile MA15 gene expressions and 10 positive icSARS panel genes not included in all platforms used to profile Urbani and MA15 gene expressions. We found 12 of the 49 leading-edge genes identified across the four BATvsmock signatures from human lung cultures were shared in BATvsmock signature leading-edge from mouse lung samples (Supplementary Tables 9, 10). There were 20 positive icSARS panel genes not included in all platforms used to profile BAT gene expressions that we included in further analysis with 10 of these genes shared across leading-edges from BATvsmock human culture signatures. Out of the 49 leading-edge genes identified across the four ORF6vsmock signatures from human lung cultures, 12 were shared with the ORF6vsmock signature leading-edge from mouse lung samples (Supplementary Tables 11, 12). There were 14 positive icSARS panel genes not included in all platforms used to profile ORF6 gene expressions. Finally, we found 43 of the 99 leading-edge genes identified in the NSP16vsmock signature from human lung cultures were shared in the NSP16vsmock signature leading-edge from mouse lung samples (Supplementary Tables 13, 14). There were 21 positive icSARS panel genes not included in all platforms used to profile NSP16 gene expressions. There were 73 leading-edge genes identified from the one ExoNIvsmock gene signature used in this study (Supplementary Table 15). Overall, we identified several genes with known associations to infection with specific SARS-CoV strains, such as fos and jun that were found in leading-edges from signatures derived from dORF6 (2) and Urbani (31) infections in human lung cell cultures. We also found several genes with no previously reported associations to that specific SARS infection. These findings were akin to findings from our meta-analysis of icSARS infection earlier that demonstrate the ability of our meta-analysis approach to identify genes associated with specific SARS infections and predict new genes associated with infections of specific SARS strains.
Figure 5 Meta-analysis Across 28 Gene Signatures Representing Seven SARS-CoV Strains Varying in Virulence Identified Five Over-Expressed Genes Associated with SARS-CoV Infection. (A) Venn diagrams of the inclusion and overlap of positive icSARS panel genes in identified leading-edges and dataset platforms from human and mouse gene signatures shared in individual strains of SARS-CoV. (B) Venn diagram of the inclusion and overlap of shared positive icSARS panel genes identified in SARS-CoV leading-edges across individual strains of SARS-CoV. (C) Heat map of T−scores for the five positive icSARS panel leading-edge genes identified in (B).
Next, we analyzed the intersection of common leading-edge genes across all six SARS strains examined in this study. We found five positive icSARS panel genes, IFN-induced protein with C−X−C motif chemokine ligand 10 (CXCL10), 2’-5’-oligoadenylate synthetase 3 (OAS3), 2’-5’-oligoadenylate synthetase-like (OASL), tetratricopeptide repeats 3 (IFIT3), and XIAP associated factor 1 (XAF1), in all 22 positive icSARS panel leading-edges (Figure 5B). As additional verification of our findings, we re-ran GSEA using the panels generated from GSE47961-derived icSARSvsmock (500-gene queries) and noticed this change did not affect the inclusion of our five gene candidates across leading-edges. We further noted all five gene candidates identified here were included in leading-edges identified by the 500 gene query set size at the 72hr and 96hr time points first examined in Supplementary Figure 1, suggesting that these gene candidates likely would have been identified if later time points were selected for use, thus not altering our overall result. Differential gene expression (i.e., T-scores) heat maps illustrated the strong consistency and extent of expression changes observed across gene signatures (Figures 3C, 5C), further supporting the conclusion that these five genes were associated with SARS-CoV infection regardless of strain.
Top Five Gene Candidates Also Associated With MERS-CoV and SARS-CoV2
We expanded our analysis to examine icSARS panel enrichment and inclusion of leading-edge genes identified by GSEA across five MERS-CoV gene signatures and three SARS-CoV2 signatures (Figure 6). Two MERS-CoV gene signatures and all SARS-CoV2 signatures were in human lung cultures while the other three MERS-CoV gene signatures were derived from mouse lung cultures with various inoculation doses (54). Using GSEA to calculate enrichment between icSARS panels and MERS-CoV, we found both icSARS panels significantly enriched (NES>1.97, GSEA p−value<0.001 for positive panel, NES<−1.68, p-value<0.003 for negative panel) across all five gene signatures (Figure 6A). For comparison to SARS-CoV2 gene signatures, we found the positive icSARS panel significantly enriched (NES>1.85, p-value<0.001) for two of the three signatures while the negative icSARS panel was not consistently enriched (NES<-1.13, p-value<0.210), which was not surprising since enrichment of the negative icSARS panel was inconsistent in icSARS signatures. We confirmed that achieved significant NES for the positive icSARS panel were non-random (null distribution p−value<0.02, Figure 6B) via random modelling for all signatures, which did not hold true for the negative icSARS panel (Figure 6C, p-value<0.573). Interestingly, we noted that NES for the positive icSARS panel compared to the GSE155518 signature was negative. To explain this result, we directed our inquiry to the depositor on record for GSE155518, who shared that the quality of the RNA collected for this dataset was reduced compared to their GSE160435 dataset, suggesting that the quality of RNA may impact the detection ability of our computational approach.
Figure 6 Five Over-Expressed Genes Identified in SARS-CoV Meta-analysis Found in Meta-analysis of MERS-CoV and SARS-CoV2 Signatures. (A) Heat map of Gene Set Enrichment Analysis calculated normalized enrichment scores for positive and negative icSARS panels across gene signatures derived from MERS-CoV and SARS-CoV2 infections in human or mouse lung cultures. (B) Box and whisker plots of normalized enrichment scores from 1000 randomly generated gene panels containing 233 genes (individual queries) compared to MERS-CoV and SARS-CoV2 gene signatures (individual references). (C) Box and whisker plots of normalized enrichment scores from 1000 randomly generated gene panels containing 114 genes (individual queries) compared to MERS-CoV and SARS-CoV2 gene signatures (individual references). (D) Venn diagram of the inclusion and overlap of positive icSARS panel genes in identified leading-edges and dataset platforms across MERS-CoV human and mouse gene signatures. (E) Venn diagram of the inclusion and overlap of shared positive icSARS panel genes in identified in SARS-CoV (Figure 6), MERS-CoV (from D), and SARS-CoV2 gene signatures. (F) Heat map of T−scores for the five positive icSARS panel leading-edge genes identified in (E).
When examining leading-edge gene inclusion in each SARS strain specifically (Figure 6D), we noted 32 of the 60 leading-edge genes identified from two MERSvsmock signatures in human lung cultures were shared in leading-edges across the three MERSvsmock signatures from mouse lung cultures (Supplementary Tables 16, 17). There were six leading-edge genes from MERSvsmock signatures in human cultures that were not included across all platforms used to profile MERS gene expressions. For SARS-CoV2 signatures, there were 44 leading-edge genes shared across the three signatures (Supplementary Table 18). These 44 genes are detailed in Table 6. When looking at leading-edges across all SARS strains examined in this study, we again found the same five gene candidates with strong consistency and extent of expression changes observed across gene signatures as determined by T-score (Figure 6E), supporting the conclusion that these five genes were associated with SARS infection overall.
Discussion
SARS infections remain a serious public health threat due to their strong pandemic causing potential. While efforts have gone into developing effective therapeutics for SARS-infected patients, treatment options were still limited, due in part to an incomplete understanding of the molecular changes driving SARS infections. Identification of differentially expressed genes associated with SARS infections can improve our understanding of SARS-induced molecular changes, potentially contributing to the development of new therapeutic options to use in the fight against SARS and future SARS outbreaks. This work performed a meta-analysis of gene signatures generated from mRNA expression data across SARS-CoV, MERS-CoV, and SARS-CoV2 infections to reveal differentially expressed genes associated with SARS infections.
Among genes identified by our GSEA-based meta-analysis approach, we found five IFN-inducible gene candidates, CXCL10, OAS3, OASL, IFIT3, and XAF1, stood out consistently across SARS strains. CXCL10 was an IFN γ-induced protein with a strong connection to inflammatory and infectious diseases including viral infections (62). CXCL10 was associated with infections of several SARS strains, specifically icSARS (29) and Urbani (31) in human lung cell cultures, MA15 in mouse lung samples (34), and SARS-CoV2 in clinical bronchoalveolar lavage fluid and plasma samples (63). Also, increased levels of CXCL10 were associated with acute respiratory distress syndrome (ARDS), a clinical result of the cytokine storm frequently described across SARS infections, especially SARS-CoV2 where high CXCL10 levels have been implicated in increased disease severity and poorer patient outcomes (24, 63–66). The two OAS genes (OAS3 and OASL) were anti-viral restriction factors (67). Both OAS genes had reported associations with Urbani infections in human lung cell cultures (31). Further, associations between the OAS pathway and MERS-CoV and SARS-CoV2 infections have been reported (68) and single nucleotide polymorphisms in OAS genes were found to be involved in the protective effects of Neandertal haplotypes against SARS-CoV2 (69, 70). IFIT3 was one of four IFN-induced proteins with tetratricopeptide repeats whose expression was greatly enhanced by viral infection, IFN treatment, and pathogen-associated molecular patterns (71). While IFIT3 was not specifically named in published SARS-CoV reports examining these datasets used in this study, IFIT1 had reported associations with MA15 infections in mouse lung samples (33, 34). All IFIT proteins including IFIT3 have been connected to SARS-CoV2 infection in human lung cell cultures and samples (28). XAF1 was an apoptotic gene whose reduced or absent expression in tumor samples and cell lines leads to poorer survival in gastric adenocarcinomas (72, 73). This paper was the first study to report an association between XAF1 and SARS-CoV infections for any strain, to the best of our knowledge, though a recent report examining differential gene expression associated with SARS-CoV-2 infection identified increased expression of CXCL10, OAS3, IFIT3, and XAF1 in human epithelial lung cells and lung samples from Cynomolgus maca (cynomolgus monkey) and mice (35). Taken together, these results supported the conclusion that targeting IFN response therapeutically, particularly one or more of these five identified genes, might improve outcomes for patients with SARS infections. Our results supported recent reports detailing the connections between SARS-CoV2 and type I IFN response, including one report demonstrating that pretreatment with IFNβ protected both Calu3 and Caco2 (human intestinal epithelium) cells against SARS-CoV2 infection (26, 28). Several reports also examine the success of IFN therapy for SARS-CoV2 patients with promising results though questions around timing, type of IFN, and administration route remain debatable (21, 25–27, 74). While these five genes were noted among those identified by pathway enrichment analysis like the one performed here using GO (Supplementary Table 3), our meta-analysis approach improves upon the existing method by refining the gene candidate list and examining a larger number of datasets.
In our study, we observed our meta-analysis approach had threshold of gene detection limits that may have biological implications. For example, we failed to observe non-random enrichment in the GSE33266 MA15(102)vsmock signature, which was the lowest inoculation dose used in this meta-analysis (Figure 4B). While we removed the GSE33266 MA15(102)vsmock signature from inclusion in our meta-analysis because its observed enrichment was not statistically different from NES achieved randomly, we noted that IFIT3 was the only top candidate in the leading-edge of the MA15(102)vsmock signature and IFIT3 was not statistically significant individually (Welch’s T-test p−value=0.160). These findings suggest there was a lower inoculation dose limit to our approach’s ability to detect relevant genes, which should be considered when applying our meta-analysis to future research. Further, since gene expression has been known to change over a time course (2), we repeated the process of defining icSARS gene panels over a range of time points (24, 48, 72, and 96hrs) and found consistent enrichment and identification of gene candidates for all timepoints but the 24hr timepoint (Supplementary Figure 1). This finding suggested that gene expression changes generated by an icSARS infection become more predictable at later time points. These gene detection threshold limits were interesting considering recent reports of that SARS-CoV2 suppresses IFN response in early infection stages to establish infection and early IFN treatments of SARS-CoV2 patients improved mortality (21, 27, 28). We cannot rule out the idea that the threshold of gene detection limits observed in our study were not the result of IFN therapeutic limitations previously reported.
While this meta-analysis revealed genes with already well-established associations to SARS infections, this purely bioinformatic work was limited by a lack of direct experimental evidence. We were unable to conduct follow-up experiments using other techniques, such as Western blotting or qRT-PCR, to confirm our top gene candidate predictions that were generated only from mRNA expression data. Further, datasets selected for this study contained only human lung cultures or mouse lung samples due to their public availability at the time this study was conducted and inclusion of 48hr time point data. Based on results from this study, gene expression data from IFN-treated and untreated SARS-CoV2 patients would be of particular interest for future studies. We predict IFNtreatedSARS2vsuntreatedSARS2 gene signatures would be reversed (i.e., positive icSARS panel achieves significant enrichment with a negative NES), indicating IFN-treated samples looked more like untreated cultures and samples from this study, further supporting the conclusion of targeting IFN as viable therapeutic option for SARS infections.
Conclusion
This work used mRNA expression data to predict genes associated with SARS infections through a meta-analysis examining gene signatures. Through our GSEA-based meta-analysis approach, we identified five over-expressed IFN-inducible genes, CXCL10, OAS3, OASL, IFIT3, and XAF1, as being most associated with SARS infections. We concluded from this finding that targeting type I IFN response either in a stand−alone or combination therapy, particularly the five genes identified here, might improve treatment options for SARS and other β−coronavirus infections. Our conclusion supported prior reports of successful outcomes from IFN therapy in patients with less severe SARS infections and at earlier time points. Overall, this work demonstrated the gene detection ability and reproducibility of our meta-analysis approach and presents it as a useful computational approach through application on mRNA expression data from SARS and mock infected cultures and samples.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Author Contributions
LH lead manuscript preparation. AP generated the supplemental materials and time course data. All authors contributed to the article and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
The authors would like to thank Jared Geller for the pathway analysis and Terri Pulice for the graphic design assistance.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2021.694355/full#supplementary-material
References
1. Li X, Luk HKH, Lau SKP, Woo PCY. Human Coronaviruses: General Features. Reference Module Biomed Sci (2019) 2019:B978–970-912-801238-801233.895704-801230. doi: 10.1016/B978-0-12-801238-3.95704-0
2. Sims AC, Tilton SC, Menachery VD, Gralinski LE, Schafer A, Matzke MM, et al. Release of Severe Acute Respiratory Syndrome Coronavirus Nuclear Import Block Enhances Host Transcription in Human Lung Cells. J Virol (2013) 87(7):3885–902. doi: 10.1128/JVI.02520-12
3. Yount B, Roberts RS, Sims AC, Deming D, Frieman MB, Sparks J, et al. Severe Acute Respiratory Syndrome Coronavirus Group-Specific Open Reading Frames Encode Nonessential Functions for Replication in Cell Cultures and Mice. J Virol (2005) 79(23):14909–22. doi: 10.1128/JVI.79.23.14909-14922.2005
4. Boulos MN. Descriptive Review of Geographic Mapping of Severe Acute Respiratory Syndrome (SARS) on the Internet. Int J Health Geogr (2004) 3(1):2. doi: 10.1186/1476-072X-3-2
5. Lai ST. Treatment of Severe Acute Respiratory Syndrome. Eur J Clin Microbiol Infect Dis (2005) 24(9):583–91. doi: 10.1007/s10096-005-0004-z
6. Petrosillo N, Viceconte G, Ergonul O, Ippolito G, Petersen E. COVID-19, SARS and MERS: Are They Closely Related? Clin Microbiol Infect (2020) 26(6):729–34. doi: 10.1016/j.cmi.2020.03.026
7. Bahadur S, Long W, Shuaib M. Human Coronaviruses With Emphasis on the COVID-19 Outbreak. Virusdisease (2020) 31(2):1–5. doi: 10.1007/s13337-020-00594-y
8. Bhattacharya M, Sharma AR, Mallick B, Sharma G, Lee SS, Chakraborty C. Immunoinformatics Approach to Understand Molecular Interaction Between Multi-Epitopic Regions of SARS-CoV-2 Spike-Protein With TLR4/MD-2 Complex. Infect Genet Evol (2020) 85:104587. doi: 10.1016/j.meegid.2020.104587
9. Rabaan AA, Al-Ahmed SH, Haque S, Sah R, Tiwari R, Malik YS, et al. SARS-CoV-2, SARS-CoV, and MERS-COV: A Comparative Overview. Infez Med (2020) 28(2):174–84.
10. Yi Y, Lagniton PNP, Ye S, Li E, Xu RH. COVID-19: What has Been Learned and to be Learned About the Novel Coronavirus Disease. Int J Biol Sci (2020) 16(10):1753–66. doi: 10.7150/ijbs.45134
11. Chen Z, Hu J, Liu L, Zhang Y, Liu D, Xiong M, et al. Clinical Characteristics of Patients With Severe and Critical COVID-19 in Wuhan: A Single-Center, Retrospective Study. Infect Dis Ther (2021) 10(1):421–38. doi: 10.1007/s40121-020-00379-2
12. Dong E, Du H, Gardner L. An Interactive Web-Based Dashboard to Track COVID-19 in Real Time. Lancet Infect Dis (2020) 20(5):533–4. doi: 10.1016/S1473-3099(20)30120-1
13. Baric RS. Emergence of a Highly Fit SARS-CoV-2 Variant. N Engl J Med (2020) 383(27):2684–6. doi: 10.1056/NEJMcibr2032888
14. Wu Y, Ho W, Huang Y, Jin DY, Li S, Liu SL, et al. SARS-CoV-2 is an Appropriate Name for the New Coronavirus. Lancet (2020) 395(10228):949–50. doi: 10.1016/S0140-6736(20)30557-2
15. Kirby T. New Variant of SARS-CoV-2 in UK Causes Surge of COVID-19. Lancet Respir Med (2021) 9(2):e20–1. doi: 10.1016/S2213-2600(21)00005-9
16. Yu WC, Hui DS, Chan-Yeung M. Antiviral Agents and Corticosteroids in the Treatment of Severe Acute Respiratory Syndrome (SARS). Thorax (2004) 59(8):643–5. doi: 10.1136/thx.2003.017665
17. Saha A, Sharma AR, Bhattacharya M, Sharma G, Lee SS, Chakraborty C. Probable Molecular Mechanism of Remdesivir for the Treatment of COVID-19: Need to Know More. Arch Med Res (2020) 51(6):585–6. doi: 10.1016/j.arcmed.2020.05.001
18. Singh AK, Singh A, Singh R, Misra A. Remdesivir in COVID-19: A Critical Review of Pharmacology, Pre-Clinical and Clinical Studies. Diabetes Metab Syndr (2020) 14(4):641–8. doi: 10.1016/j.dsx.2020.05.018
19. Williamson BN, Feldmann F, Schwarz B, Meade-White K, Porter DP, Schulz J, et al. Clinical Benefit of Remdesivir in Rhesus Macaques Infected With SARS-CoV-2. Nature (2020) 585(7824):273–6. doi: 10.1038/s41586-020-2423-5
20. Modjarrad K. Treatment Strategies for Middle East Respiratory Syndrome Coronavirus. J Virus Erad (2016) 2(1):1–4. doi: 10.1016/S2055-6640(20)30696-8
21. Saha RP, Sharma AR, Singh MK, Samanta S, Bhakta S, Mandal S, et al. Repurposing Drugs, Ongoing Vaccine, and New Therapeutic Development Initiatives Against COVID-19. Front Pharmacol (2020) 11:1258. doi: 10.3389/fphar.2020.01258
22. Chakraborty C, Sharma AR, Bhattacharya M, Sharma G, Lee SS, Agoramoorthy G. COVID-19: Consider IL-6 Receptor Antagonist for the Therapy of Cytokine Storm Syndrome in SARS-CoV-2 Infected Patients. J Med Virol (2020) 92(11):2260–2. doi: 10.1002/jmv.26078
23. Pooladanda V, Thatikonda S, Godugu C. The Current Understanding and Potential Therapeutic Options to Combat COVID-19. Life Sci (2020) 254:117765. doi: 10.1016/j.lfs.2020.117765
24. Chakraborty C, Sharma AR, Sharma G, Bhattacharya M, Lee SS. SARS-CoV-2 Causing Pneumonia-Associated Respiratory Disorder (COVID-19): Diagnostic and Proposed Therapeutic Options. Eur Rev Med Pharmacol Sci (2020) 24(7):4016–26. doi: 10.26355/eurrev_202004_20871
25. Haji Abdolvahab M, Moradi-Kalbolandi S, Zarei M, Bose D, Majidzadeh AK, Farahmand L. Potential Role of Interferons in Treating COVID-19 Patients. Int Immunopharmacol (2021) 90:107171. doi: 10.1016/j.intimp.2020.107171
26. Shuai H, Chu H, Hou Y, Yang D, Wang Y, Hu B, et al. Differential Immune Activation Profile of SARS-CoV-2 and SARS-CoV Infection in Human Lung and Intestinal Cells: Implications for Treatment With IFN-Beta and IFN Inducer. J Infect (2020) 81(4):e1–e10. doi: 10.1016/j.jinf.2020.07.016
27. Calabrese LH, Lenfant T, Calabrese C. Interferon Therapy for COVID-19 and Emerging Infections: Prospects and Concerns. Cleve Clin J Med (2020). doi: 10.3949/ccjm.87a.ccc066
28. Sa Ribero M, Jouvenet N, Dreux M, Nisole S. Interplay Between SARS-CoV-2 and the Type I Interferon Response. PloS Pathog (2020) 16(7):e1008737. doi: 10.1371/journal.ppat.1008737
29. Mitchell HD, Eisfeld AJ, Sims AC, McDermott JE, Matzke MM, Webb-Robertson BJ, et al. A Network Integration Approach to Predict Conserved Regulators Related to Pathogenicity of Influenza and SARS-CoV Respiratory Viruses. PloS One (2013) 8(7):e69374. doi: 10.1371/journal.pone.0069374
30. Gralinski LE, Bankhead A, Jeng S, VD M, Proll S, SE B, et al. Mechanisms of Severe Acute Respiratory Syndrome Coronavirus-Induced Acute Lung Injury. mBio (2013) 4(4):e00271–13. doi: 10.1128/mBio.00271-13
31. Yoshikawa T, Hill TE, Yoshikawa N, Popov VL, Galindo CL, Garner HR, et al. Dynamic Innate Immune Responses of Human Bronchial Epithelial Cells to Severe Acute Respiratory Syndrome-Associated Coronavirus Infection. PloS One (2010) 5(1):e8729. doi: 10.1371/journal.pone.0008729
32. Menachery VD, Gralinski LE, Mitchell HD, Dinnon KH 3rd, SR L, Yount BL Jr., et al. Combination Attenuation Offers Strategy for Live Attenuated Coronavirus Vaccines. J Virol (2018) 92(17):e00710–18. doi: 10.1128/JVI.00710-18
33. Totura AL, Whitmore A, Agnihothram S, Schafer A, Katze MG, Heise MT, et al. Toll-Like Receptor 3 Signaling via TRIF Contributes to a Protective Innate Immune Response to Severe Acute Respiratory Syndrome Coronavirus Infection. mBio (2015) 6(3):e00638–00615. doi: 10.1128/mBio.00638-15
34. Gralinski LE, Sheahan TP, Morrison TE, Menachery VD, Jensen K, Leist SR, et al. Complement Activation Contributes to Severe Acute Respiratory Syndrome Coronavirus Pathogenesis. mBio (2018) 9(5):e01753–18. doi: 10.1128/mBio.01753-18
35. Hachim MY, Al Heialy S, Hachim IY, Halwani R, Senok AC, Maghazachi AA, et al. Interferon-Induced Transmembrane Protein (IFITM3) Is Upregulated Explicitly in SARS-CoV-2 Infected Lung Epithelial Cells. Front Immunol (2020) 11:1372. doi: 10.3389/fimmu.2020.01372
36. Alsamman AM, Zayed H. The Transcriptomic Profiling of SARS-CoV-2 Compared to SARS, MERS, EBOV, and H1N1. PloS One (2020) 15(12):e0243270. doi: 10.1371/journal.pone.0243270
37. Ochsner SA, Pillich RT, McKenna NJ. Consensus Transcriptional Regulatory Networks of Coronavirus-Infected Human Cells. Sci Data (2020) 7(1):314. doi: 10.1038/s41597-020-00628-6
38. Jang Y, Seo SH. Gene Expression Pattern Differences in Primary Human Pulmonary Epithelial Cells Infected With MERS-CoV or SARS-CoV-2. Arch Virol (2020) 165(10):2205–11. doi: 10.1007/s00705-020-04730-3
39. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proc Natl Acad Sci USA (2005) 102(43):15545–50. doi: 10.1073/pnas.0506580102
40. Gardinassi LG, Souza COS, Sales-Campos H, Fonseca SG. Immune and Metabolic Signatures of COVID-19 Revealed by Transcriptomics Data Reuse. Front Immunol (2020) 11:1636. doi: 10.3389/fimmu.2020.01636
41. Goad B, Harris LK. Identification and Prioritization of Macrolide Resistance Genes With Hypothetical Annotation in Streptococcus Pneumoniae. Bioinformation (2018) 14(9):488–98. doi: 10.6026/97320630014488
42. Mohan R, Venugopal S. Computational Structural and Functional Analysis of Hypothetical Proteins of Staphylococcus Aureus. Bioinformation (2012) 8(15):722–8. doi: 10.6026/97320630008722
43. Ijaq J, Chandrasekharan M, Poddar R, Bethi N, Sundararajan VS. Annotation and Curation of Uncharacterized Proteins- Challenges. Front Genet (2015) 6:119. doi: 10.3389/fgene.2015.00119
44. Bharat Siva Varma P, Adimulam YB, Kodukula S. In Silico Functional Annotation of a Hypothetical Protein From Staphylococcus Aureus. J Infect Public Health (2015) 8(6):526–32. doi: 10.1016/j.jiph.2015.03.007
45. School K, Marklevitz J, W KS, L KH. Predictive Characterization of Hypothetical Proteins in Staphylococcus Aureus NCTC 8325. Bioinformation (2016) 12(3):209–20. doi: 10.6026/97320630012209
46. Sivashankari S, Shanmughavel P. Functional Annotation of Hypothetical Proteins - A Review. Bioinformation (2006) 1(8):335–8. doi: 10.6026/97320630001335
47. Islam MS, Shahik SM, Sohel M, Patwary NI, Hasan MA. In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio Cholerae O139. Genomics Inform (2015) 13(2):53–9. doi: 10.5808/GI.2015.13.2.53
48. Kolker E, Makarova KS, Shabalina S, Picone AF, Purvine S, Holzman T, et al. Identification and Functional Analysis of ‘Hypothetical’ Genes Expressed in Haemophilus Influenzae. Nucleic Acids Res (2004) 32(8):2353–61. doi: 10.1093/nar/gkh555
49. Omeershffudin UNM, Kumar S. In Silico Approach for Mining of Potential Drug Targets From Hypothetical Proteins of Bacterial Proteome. Int J Mol Biol Open Access (2019) 4(4):145–52. doi: 10.15406/ijmboa.2019.04.00111
50. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: Archive for Functional Genomics Data Sets–Update. Nucleic Acids Res (2013) 41(Database issue):D991–995. doi: 10.1093/nar/gks1193
51. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, et al. NCBI GEO: Archive for Functional Genomics Data Sets–10 Years on. Nucleic Acids Res (2011) 39(Database issue):D1005–1010. doi: 10.1093/nar/gkq1184
52. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI Gene Expression and Hybridization Array Data Repository. Nucleic Acids Res (2002) 30(1):207–10. doi: 10.1093/nar/30.1.207
53. Becker MM, Graham RL, Donaldson EF, Rockx B, Sims AC, Sheahan T, et al. Synthetic Recombinant Bat SARS-Like Coronavirus is Infectious in Cultured Cells and in Mice. Proc Natl Acad Sci USA (2008) 105(50):19944–9. doi: 10.1073/pnas.0808116105
54. Katsura H, Sontake V, Tata A, Kobayashi Y, Edwards CE, Heaton BE, et al. Human Lung Stem Cell-Based Alveolospheres Provide Insights Into SARS-CoV-2-Mediated Interferon Responses and Pneumocyte Dysfunction. Cell Stem Cell (2020) 27(6):890–904 e898. doi: 10.1016/j.stem.2020.10.005
55. Huang da W, Sherman BT, Lempicki RA. Bioinformatics Enrichment Tools: Paths Toward the Comprehensive Functional Analysis of Large Gene Lists. Nucleic Acids Res (2009) 37(1):1–13. doi: 10.1093/nar/gkn923
56. Huang da W, Sherman BT, Lempicki RA. Systematic and Integrative Analysis of Large Gene Lists Using DAVID Bioinformatics Resources. Nat Protoc (2009) 4(1):44–57. doi: 10.1038/nprot.2008.211
58. XLSTAT A. Data Analysis and Statistics Software for Microsoft Excel. New York, NY: Addinsoft (2013).
59. Fung TS, Liu DX. Human Coronavirus: Host-Pathogen Interaction. Annu Rev Microbiol (2019) 73:529–57. doi: 10.1146/annurev-micro-020518-115759
60. Mizutani T, Fukushi S, Saijo M, Kurane I, Morikawa S. Phosphorylation of P38 MAPK and its Downstream Targets in SARS Coronavirus-Infected Cells. Biochem Biophys Res Commun (2004) 319(4):1228–34. doi: 10.1016/j.bbrc.2004.05.107
61. Wehbe Z, Hammoud S, Soudani N, Zaraket H, El-Yazbi A, Eid AH. Molecular Insights Into SARS COV-2 Interaction With Cardiovascular Disease: Role of RAAS and MAPK Signaling. Front Pharmacol (2020) 11(836). doi: 10.3389/fphar.2020.00836
62. Liu M, Guo S, Hibbert JM, Jain V, Singh N, Wilson NO, et al. CXCL10/IP-10 in Infectious Diseases Pathogenesis and Potential Therapeutic Implications. Cytokine Growth Factor Rev (2011) 22(3):121–30. doi: 10.1016/j.cytogfr.2011.06.001
63. Blot M, Jacquier M, Aho Glele L-S, Beltramo G, Nguyen M, Bonniaud P, et al. CXCL10 Could Drive Longer Duration of Mechanical Ventilation During COVID-19 ARDS. Crit Care (2020) 24(1):632. doi: 10.1186/s13054-020-03328-0
64. Coperchini F, Chiovato L, Croce L, Magri F, Rotondi M. The Cytokine Storm in COVID-19: An Overview of the Involvement of the Chemokine/Chemokine-Receptor System. Cytokine Growth Factor Rev (2020) 53:25–32. doi: 10.1016/j.cytogfr.2020.05.003
65. Wang J, Jiang M, Chen X, Montaner LJ. Cytokine Storm and Leukocyte Changes in Mild Versus Severe SARS-CoV-2 Infection: Review of 3939 COVID-19 Patients in China and Emerging Pathogenesis and Therapy Concepts. J Leukoc Biol (2020) 108(1):17–41. doi: 10.1002/JLB.3COVR0520-272R
66. Vaninov N. In the Eye of the COVID-19 Cytokine Storm. Nat Rev Immunol (2020) 20(5):277–7. doi: 10.1038/s41577-020-0305-6
67. Ibsen MS, Gad HH, Thavachelvam K, Boesen T, Despres P, Hartmann R. The 2’-5’-Oligoadenylate Synthetase 3 Enzyme Potently Synthesizes the 2’-5’-Oligoadenylates Required for RNase L Activation. J Virol (2014) 88(24):14222–31. doi: 10.1128/JVI.01763-14
68. Pairo-Castineira E, Clohisey S, Klaric L, Bretherick AD, Rawlik K, Pasko D, et al. Genetic Mechanisms of Critical Illness in COVID-19. Nature (2021) 591(7848):92–8. doi: 10.1101/2020.09.24.20200048
69. Zhou S, Butler-Laporte G, Nakanishi T, Morrison DR, Afilalo J, Afilalo M, et al. A Neanderthal OAS1 Isoform Protects Individuals of European Ancestry Against COVID-19 Susceptibility and Severity. Nat Med (2021) 27(4):659–67. doi: 10.1038/s41591-021-01281-1
70. Zeberg H, Paabo S. A Genomic Region Associated With Protection Against Severe COVID-19 is Inherited From Neandertals. Proc Natl Acad Sci USA (2021) 118(9):e2026309118. doi: 10.1073/pnas.2026309118
71. Pidugu VK, Pidugu HB, Wu MM, Liu CJ, Lee TC. Emerging Functions of Human IFIT Proteins in Cancer. Front Mol Biosci (2019) 6:148. doi: 10.3389/fmolb.2019.00148
72. Shibata T, Noguchi T, Takeno S, Gabbert HE, Ramp U, Kawahara K. Disturbed XIAP and XAF1 Expression Balance is an Independent Prognostic Factor in Gastric Adenocarcinomas. Ann Surg Oncol (2008) 15(12):3579–87. doi: 10.1245/s10434-008-0062-4
73. Xia Y, Novak R, Lewis J, Duckett CS, Phillips AC. Xaf1 can Cooperate With TNFalpha in the Induction of Apoptosis, Independently of Interaction With XIAP. Mol Cell Biochem (2006) 286(1-2):67–76. doi: 10.1007/s11010-005-9094-2
Keywords: SARS-CoV, SARS-CoV2, coronavirus, gene expression, meta-analysis, gene set enrichment analysis, MERS-CoV
Citation: Park A and Harris LK (2021) Gene Expression Meta-Analysis Reveals Interferon-Induced Genes Associated With SARS Infection in Lungs. Front. Immunol. 12:694355. doi: 10.3389/fimmu.2021.694355
Received: 13 April 2021; Accepted: 05 July 2021;
Published: 23 July 2021.
Edited by:
Vijay Kumar, University of Tennessee Health Science Center (UTHSC), United StatesCopyright © 2021 Park and Harris. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Laura K. Harris, oesterei@msu.edu