- Department of Biology, The University of Winnipeg, Winnipeg, MB, Canada
Genome-wide assays of expression between species and their hybrids have identified genes that become either over- or underexpressed relative to the parental species (i.e., transgressive). Transgressive expression in hybrids is of interest because it highlights possible changes in gene regulation linked to hybrid dysfunction. Previous studies in Drosophila that used long-diverged species pairs with complete or nearly complete isolation (i.e., full sterility and partial inviability of hybrids) and high-levels of genome misregulation have found correlations between expression and coding sequence divergence. The work highlighted the possible effects of directional selection driving sequence divergence and transgressive expression. Whether the same is true for taxa at early stages of divergence that have only achieved partial isolation remains untested. Here, we reanalyze previously published genome expression data and available genome sequence reads from a pair of partially isolated subspecies of Drosophila to compare expression and sequence divergence. We find a significant correlation in rates of expression and sequence evolution, but no support for directional selection driving transgressive expression in hybrids. We find that most transgressive genes in hybrids show no differential expression between parental subspecies and used SNP data to explore the role of stabilizing selection through compensatory mutations. We also examine possible misregulation through cascade effects that could be driven by interacting gene networks or co-option of off-target cis-regulatory elements.
Introduction
Studies that have addressed the genetic basis of incompatibilities in hybrids between species, or diverging populations, have traditionally resorted to mapping loci and interactions between them (Coyne and Orr, 1989; Masly and Presgraves, 2007; Presgraves, 2008; Victoria Cattani and Presgraves, 2012; Dufresnes et al., 2016). This approach has been fruitful in that ultimately a few major protein-coding genes have been identified (Ting et al., 1998; Masly et al., 2006; Mihola et al., 2009; Phadnis and Allen Orr, 2009), and in some cases, the effect of these major genes require interactions with other genetic factors. Major genes often show patterns of rapid evolution between divergent populations or species (Ting et al., 1998; Presgraves et al., 2003; Maheshwari and Barbash, 2011) suggesting that, at least in part, changes in protein composition might exert effects on phenotype and function through alterations in patterns of expression of genes targeted by such proteins. Moreover, genome-wide surveys have provided evidence to support that many genes and complex systems of epistasis are linked to hybrid incompatibility phenotypes (Morán and Fontdevila, 2014; Turner and Harr, 2014; Turner et al., 2014; Fontdevila, 2016). While co-evolution among interacting genes keeps function within populations and species, hybridization between divergent isolated populations and incipient species brings together incompatible interloci allele interactions resulting in a reduction in hybrid fitness (Dobzhansky, 1937; Muller, 1942; Orr, 1996). The reduced fitness of hybrids serves as a postzygotic barrier among divergent taxa.
The role of divergence in the regulation of gene expression has been long acknowledged (King and Wilson, 1975), but not until recently has genome-wide divergence in gene expression during speciation been addressed. Recent reviews have summarized how changes in gene expression could impact hybrid phenotypes (Civetta, 2016; Mack and Nachman, 2017). Using genome-wide approaches, questions have been addressed as to the proportion of genome-wide misregulation in hybrids, the relative contribution of cis- vs. trans-regulatory elements in gene misregulation, and the identity of misregulated genes that might contribute to hybrid fitness breakdown (Ranz et al., 2004; Haerty and Singh, 2006; Renaut et al., 2009; Tirosh et al., 2009; McManus et al., 2010; Llopart, 2012; Coolon et al., 2014; Gomes and Civetta, 2015; Brill et al., 2016; Mack et al., 2016). Often, genome-wide assays of expression in hybrids reveal gene regulatory dysfunctions as patterns of transgressive gene expression (i.e., expression beyond levels found in parental species). This can be a consequence of directional selection or drift causing changes at cis- and trans-regulatory elements that drive divergence in expression between taxa and transgressive expression in hybrids. Previous studies have found positive correlations between protein-coding evolution and gene expression divergence between species of Drosophila (Castillo-Davis et al., 2004; Nuzhdin et al., 2004; Lemos et al., 2005; Artieri et al., 2007). Moreover, the finding of a similar significant positive correlation between non-synonymous (dN) and non-synonymous/synonymous (dN/dS) divergence and gene expression differences between hybrids and parental species has been used to suggest sequence divergence driving regulatory incompatibilities and to highlight the potential effects of directional selection in gene expression during speciation (Artieri et al., 2007; Hunt et al., 2013). However, the species pairs used were typically long diverged with hybrids exhibiting complete or nearly complete isolation and high levels of genome misregulation (Ranz et al., 2004; Haerty and Singh, 2006; Artieri et al., 2007; McManus et al., 2010; Coolon et al., 2014). The use of divergent populations within species of copepods have found no significant relationship between hybrid transgressive expression and estimates of sequence divergence, and the authors offered an alternative physiological explanation for the detected pattern (Barreto et al., 2015, 2018).
There are, in fact, alternative explanations that could explain the lack of relationship between sequence and expression divergence. Mutations within taxa can work to compensate the effect of deleterious mutations on expression (i.e., stabilizing selection). The possibility that cis–trans mutations may cause compensation within species but lead to transgressive expression in hybrids is supported by studies that report abundant cis–trans epistasis (Mackay, 2014; Mackay and Moore, 2014; He et al., 2016; Vonesch et al., 2016). However, the strength of selection for a secondary compensatory mutation might be small (Bourguet, 1999). It is also possible for transgressive expression in hybrids to arise as a response to hybrid dysfunction within gene-interacting networks or metabolic pathways. While this could work to ameliorate fitness problems in hybrids, it could also exacerbate hybrid dysfunction. This might be particularly the case for fitness breakdown between diverging populations (Barreto et al., 2015, 2018). Finally, we speculate that newly arising mutations in trans regulatory elements that result from divergence between taxa or compensatory mutations within, could co-opt preexisting cis-regulatory elements among multiple genes thereby causing widespread misregulation.
Here, we used a pair of geographically separated subspecies of D. pseudoobscura, D. p. pseudoobscura, and D. p. bogotana that have diverged for at least 0.15 Myr (Schaeffer and Miller, 1991; Wang et al., 1997) and whose hybrids exhibit unidirectional male sterility where only male hybrids produced by D. p. bogotana females are sterile. We reanalyze previously published transcriptomics data (Gomes and Civetta, 2015) using a newer D. p. pseudoobscura genome release (r3.04) and updated mapping and expression analysis tools to explore relationships between genome expression and gene-coding sequence divergence. Our report identifies no relationship between sequence divergence and transgressive expression in hybrids suggesting a need for broader examinations of transgressive expression between recently diverged populations and species across taxa. We find that most transgressive genes in hybrids are not differentially expressed between subspecies. We explore explanations for transgressive expression other than incompatibilities in regulation arising from rapid divergence between subspecies, such as compensatory mutations, gene-interaction networks, and the co-option of multiple cis-regulatory elements by trans-regulatory elements. While we find some support for these alternative hypotheses, we acknowledge that they do not fully explain transgressive expression in hybrids. We discuss some caveats and offer other possible explanations in the hope that they will trigger further inquiry. Ultimately, full comprehension of transgressive expression in hybrids will require combining information on genome expression and sequencing with the identification of interactomes and a proper characterization of mechanism of trans effects on characterized cis-regulatory targets.
Materials and Methods
RNA-Sequence Data
Raw RNA sequence data used in this analysis were from a genome-wide transcriptomics study of the Drosophila pseudoobscura subspecies pair and their reciprocal hybrids by Gomes and Civetta (2015). Briefly, RNA was extracted from the whole male reproductive tract. Biological replicates were obtained for the parental subspecies and their reciprocal F1 hybrids with each replicate containing 30–40 male reproductive tracts. cDNA libraries were prepared using the Illumina TruSeq Stranded mRNA sample preparation kit and multiplexed on a single lane of an Illumina HiSeq2000 platform with 100 bp paired-end sequencing. A quality check was performed on the raw reads using FastQC (Andrews, 2010). Read processing and adapter trimming were performed with Trimmomatic (Bolger et al., 2014), and reads with a Phred score below 30 and a final length of 50 bp were excluded.
Mapping and Differential Expression Analysis
We mapped processed reads to the latest release (r3.04) of the D. p. pseudoobscura reference genome1 using STAR, chosen for its reliability (Dobin et al., 2013; Baruzzo et al., 2017) over the previously used TopHat approach (Gomes and Civetta, 2015). Read counting was performed at the gene level using featureCounts (Liao et al., 2014) with the reversely stranded (-s 2) and fragment counting (-p) parameters and the latest version of the D. p. pseudoobscura annotation serving as a guide.
Pairwise differential expression across all groups was performed using both DESeq2 (Love et al., 2014) and edgeR (Robinson et al., 2009). In the analysis using edgeR, genes with less than 1 count per million (CPM) in at least one group were excluded from further analysis, and the per gene counts for each sample were normalized using the TMM method (Robinson and Oshlack, 2010). The default settings were used to obtain normalized counts from the DESeq2 analysis. The consensus list of differentially expressed genes from both tools were used for all downstream analyses. Differentially expressed genes among the hybrids were identified as transgressive if their expression were significantly above or below the range found in the parental subspecies. Further, log2 fold-changes (lfc) thresholds of 0.5 and 1 were applied to increase our statistical yield of true positives (Schurch et al., 2016). All tools for the analysis were ran on Galaxy2.
Coding Sequence and Expression Divergence
Rates of coding sequence divergence between D. p. bogotana and D. p. pseudoobscura were estimated for differentially expressed genes between the parental subspecies and for transgressive genes in fertile and sterile F1 hybrids. Since the RNA-seq data provided only partial sequences from each gene analyzed, we retrieved raw DNA sequence reads from the sequence read archives (SRA) under the accession number SRX091468 (D. p. bogotana). The D. p. bogotana raw sequence reads were aligned to all gene regions from the r.3.04 D. p. pseudoobscura reference genome (see text footnote 1) using BWA (Li and Durbin, 2010) ran on Galaxy (see text footnote 2) under default settings except for the maximum number of gap extensions, which was set to 4. The “extract consensus from assembly” workflow in UGene (Okonechnikov et al., 2012) was then used to extract the D. p. bogotana gene regions, and these were aligned to the longest available transcript for D. p. pseudoobscura from FlyBase (see text footnote 1) using MAFFT (Katoh and Standley, 2013). The alignments were modified using Gblocks v.0.91b (Castresana, 2000) with default settings except for the block parameters, which allowed gap positions with half within the final blocks–this removes unaligned introns from the D. p. bogotana gene region while preserving possible indels. Alignments from Gblocks were inspected to ensure that the coding sequences were intact open reading frames and were a multiple of three.
Rates of synonymous (dS) and non-synonymous (dN) nucleotide substitutions were estimated using the SeqinR package (Charif and Lobry, 2007) loaded on RStudio version 1.1.463. Non-parametric Spearman rank sum correlation coefficients were calculated to test the relationship between coding sequence divergence (dN, dS, and dN/dS) and expression difference. For the parental subspecies, expression differences were calculated as the absolute difference of [log2(D. p. pseudoobscura) - log2(D. p. bogotana)]. For the transgressive genes, expression differences were calculated for each hybrid relative to each parental subspecies as the absolute difference of [log2(Fert or Ster) - log2(D. p. pseudoobscura or D. p. bogotana)]. The lower absolute difference value was kept as a measure of minimum transgressive expression (Barreto et al., 2015).
Allele-Specific Expression
To determine the role of cis and/or trans changes to transgressive gene expression in the hybrids, we identified fixed species-specific single nucleotide polymorphisms (SNPs) and their relative allele expression in the hybrids. SNPs between the parental subspecies were identified from their mapped reads using Naïve variant caller followed by processing with the Variant annotator (Blankenberg et al., 2014). SNPs were considered fixed in each parental subspecies if each parent had a single different allele and at least three supporting reads. Allele-specific expression (ASE) in the hybrids was measured by first assigning their RNA-seq reads to a parent of origin based on the identity of the allele at fixed SNP positions in each parent. Reads with fixed SNPs mapping to a single gene were summed, and any gene with less than 20 mapped reads from both parental subspecies combined were discarded from further analysis (McManus et al., 2010; Gomes and Civetta, 2015). SNP counts for each gene were then adjusted to account for differences in sequencing depth between samples. Samples with zero SNP counts were given a value of 1 to allow for statistical testing. To detect significant differences between the ratio of parental SNP counts to counts of each parental allele in the sterile and fertile hybrids, respectively, the Fisher’s exact test was used (McManus et al., 2010; Gomes and Civetta, 2015). Transgressive genes that showed differential expression between the parental subspecies were classified as driven by cis–trans divergence if the Fisher’s exact test was significant and cis regulatory divergence when the Fisher’s exact test was not significant (McManus et al., 2010). For transgressive genes that were not differentially expressed between the parental subspecies, a significant result for the Fisher’s exact test indicated evidence for compensatory cis and trans mutations (McManus et al., 2010), while a non-significant result suggested a conservation in regulatory interactions and classified as non-compensatory.
Interactions and Sequence Similarity
Interactions among proteins were predicted using STRING (v11.0; Szklarczyk et al., 2019). Gene-Ontology and UniProt keyword enrichments were assessed from outputs using STRING and DAVID (v6.8; Huang et al., 2009a,b). We used the extended gene regions (which includes 2-kb 5′ and 3′) for genes that showed transgressive expression driven by trans regulatory elements (i.e., cis–trans divergent or compensatory) to perform a BLASTn against a database containing all transgressive genes and against another database with all D. p. pseudoobscura extended gene regions within the genome to identify similarities between upstream regions for plus/plus matches or between the upstream and downstream regions for plus/minus matches. We retained only hits that were lower than 1 × 10–14 and unique among transgressive sequences and not shared with other genes in the genome. Retained hits had E-values lower than 8 × 10–15, with nucleotide alignments of at least 173 base pairs and identities higher than 64%.
Results
The re-analysis of our previously published data (Gomes and Civetta, 2015) by mapping reads onto a newer released genome assembly and using more recently developed analytical pipelines found similar results in terms of lack of bias in mapping, low proportion of differentially expressed genes between subspecies, and significant excess of transgressive expression in sterile relative to fertile hybrids (Supplementary Material).
Transgressive Gene Expression in Hybrids Does Not Correlate With Accelerated Rates of Evolution as Expected Under a Scenario of Divergent Selection Between Subspecies
Under the assumption that regulatory evolution and structural protein evolution are under similar selective pressures, a correlation is expected between expression difference and nucleotide sequence evolution. Of the 819 differentially expressed genes between the parental subspecies, 604 (73.7%) were protein-coding genes with the remaining 215 (26.3%) being non-coding RNAs or coding genes without full coding sequences available for both subspecies. The percentage of differentially expressed protein-coding genes between subspecies increases significantly when a less stringent lfc threshold of 0.5 was applied (82.7%; Z = 5.51, P < 0.001) (Supplementary Figure 1). We found a significant correlation for expression differences between subspecies and non-synonymous (dN) sequence divergence (N = 604; Spearman’s ρ = 0.091, P = 0.026) but not between differences in expression and synonymous substitutions (dS) (Spearman’s ρ = -0.046, P = 0.261). The dN/dS ratio was also positively correlated with expression differences (ρ = 0.108, P = 0.011). Using the less stringent lfc threshold of 0.5, dN, dS, and dN/dS were all significantly correlated with gene expression divergence between subspecies (N = 1,801; ρ = 0.121, P = 2.39 × 10–7; ρ = 0.065, P = 0.005; and ρ = 0.096, P = 8.4 × 10–5, respectively) (Figure 1A). These results are overall in agreement with previous findings in Drosophila and other organisms confirming that protein sequence and expression divergence are influenced by similar selective processes (Castillo-Davis et al., 2004; Nuzhdin et al., 2004; Khaitovich et al., 2005; Artieri et al., 2007; Ortíz-Barrientos et al., 2007).
Figure 1. Correlation analysis between expression and coding sequence divergence. Spearman’s rank-sum coefficient and P-values are displayed in each frame. (A) Analysis on differentially expressed genes between the parental subspecies. (B) Analysis on genes showing transgressive expression in hybrids (fertile and sterile).
Given that protein-coding sequence differentiation serves as a good predictor of expression divergence, some studies have explored correlations between rates of protein divergence with expression of misregulated genes in hybrids. Misregulated genes with transgressive expression in hybrids are of interest in speciation as they associate with hybrid disrupted phenotypes (Moehring et al., 2007; Catron and Noor, 2008; Sundararajan and Civetta, 2011; Gomes and Civetta, 2015; Brill et al., 2016; Civetta, 2016). Significant positive correlations are suggestive of either directional selection or relaxation of selective constraints fueling regulatory incompatibilities (Artieri et al., 2007; Hunt et al., 2013; Barreto et al., 2015). Of the 44 transgressive genes in the hybrids, 35 had available sequence data for the estimation of coding sequence divergence. The analysis showed no significant correlations between sequence divergence and expression difference (N = 35; dN, ρ = 0.078, P = 0.655; dS, ρ = 0.242, P = 0.161; dN/dS, ρ = -0.112, P = 0.547). This result holds when a less stringent lfc threshold of 0.5 was used, with 223 of the 262 transgressive genes having sequence data available for analysis (N = 223 dN, ρ = -0.078, P = 0.245; dS, ρ = 0.081, P = 0.230; dN/dS, ρ = -0.092, P = 0.208) (Figure 1B).
Alternative Explanations for Transgressive Expression in Hybrids: Compensatory Mutations, Interaction Networks, and Transcriptional Drive by Sequence Similarity Among Targets
One possibility for a lack of correlation between transgressive expression in hybrids and sequence divergence is that transgressive expression might be a consequence of occasional deleterious mutations that are followed by compensatory DNA changes to overcome detrimental effects on gene expression (i.e., a side effect of stabilizing selection between divergent taxa) (Figure 2A–Gene 1). Our data show that 32 out of 44 (72.72%) transgressive genes in the hybrids were not differentially expressed between parental subspecies. The low number of transgressive genes is likely a consequence of our stringent use of a twofold change (lfc = 1) in expression threshold to maximize our statistical yield of true positives. Given the low sample size, we decided to continue using a less stringent lfc threshold of 0.5 and found, as with the more stringent threshold, a large proportion of transgressive genes without differential expression between parental subspecies (79%, 207/262). If genes without differential expression between subspecies are under stabilizing selection favoring compensatory mutations to buffer deleterious mutations and restore expression to similar levels among parental subspecies, we expect their rate of sequence divergence to be lower than those of genes experiencing divergence in regulation, and thus expression, between subspecies. Our data shows no significantly lower rates of change (dN and dN/dS) for genes with transgressive expression in hybrids and no differential expression between parentals (Mann–Whitney FDR corrected P-values) (Table 1).
Figure 2. Scenarios of regulatory divergence for cis and trans regulatory elements. (A) Gene 1 shows compensatory cis and trans mutations, denoted by asterisks, wherein D. p. bogotana experiences an initial mutation in cis followed by a mutation in trans restoring gene expression to similar levels between parental subspecies. Gene 2 shows similar levels of expression in parental subspecies. In the hybrid background, the D. p. bogotana trans factor for gene 1 interacts with the D. p. pseudoobscura trans factor for gene 2 leading to a conformation change. This new trans factor complex can now bind optimally to the cis region of genes 1 and 2 (red arrows) resulting in transgressive expression (i.e., expression above parental levels). The allelic ratio of gene 2 in the hybrid is equal to the allelic ratio of the two subspecies, and the gene is classified as non-compensatory through SNP analysis. (B) Gene 3 shows divergence in cis in one subspecies and trans in the other subspecies. This leads to suboptimal binding in both subspecies and differential expression. The regulatory incompatibilities persist within the hybrid background leading to unequal allelic ratios. Gene 3 is classified as cis–trans divergent by SNP analysis. Gene 4 shows a situation of cis-only divergence between the parental subspecies. Regulatory incompatibilities would occur in D. p. bogotana but not D. p. pseudoobscura resulting in differential expression between the subspecies. Similar interactions for this gene would occur in the hybrid resulting in equal allelic ratio. Gene 4 is classified as cis-only by SNP analysis.
Table 1. Average evolutionary rates (±SD) for differentially expressed genes between parental subspecies that do not show transgressive expression in hybrids [(P1 ≠ P2)NT], transgressive genes that show differential expression between subspecies [(P1 ≠ P2)T], and transgressive genes that do not show differential expression between subspecies [(P1 = P2)T].
We used informative SNPs to identify genes with transgressive expression in hybrids driven by compensatory mutations or cis–trans divergence (Figures 2A,B–Genes 1 and 3). Twenty-five percent of the transgressive genes (65/262) had non-informative SNPs to allow us to classify parent of origin for the alleles found in the hybrids. Of the remaining 197 transgressive genes, we found that for 65% of them, transgressive expression could be explained by compensatory mutations (97 genes) or cis–trans divergence (31 genes) (Figures 2A,B–Genes 1 and 3). The remaining are cases in which the transgressive gene shows similar ratios of subspecies allele expression in parents and hybrids. Of these, 62 were classified as non-compensatory and 7 as having experienced cis divergence (Figures 2A,B–Genes 2 and 4).
We explored whether transgressive expression in hybrids for genes that do not show evidence of compensatory or cis–trans mutations could be a cascade triggered by interactions in a shared gene network and/or pathway (Bader et al., 2015; Barreto et al., 2015). This will predict clusters of interacting and functionally related proteins to be misregulated in the hybrids. We detected a protein–protein interaction (PPI) network of 90 genes (34% of the 262 transgressive genes) (Figure 3) with a significant (i.e., more interactions than randomly expected) PPI enrichment (P = 4.29 × 10–2). We found no evidence of known functional enrichment in the network, but a significant overrepresentation of “Signal” genes based on UniProt keywords (FDR corrected P = 1.25 × 10–7). More importantly, when we partitioned the network analysis by sterile vs. fertile hybrids, the PPI analysis was significant for sterile hybrids (PPI enrichment P = 1.06 × 10–2, 79 nodes) but not for fertile hybrids (PPI enrichment P = 0.106, four nodes). Twenty-two genes in the network were cis or non-compensatory, thus their misregulation could be driven by interactions with other misregulated genes in the network (Figure 3). We found no significant PPI for transgressive genes differentially expressed between subspecies (P = 0.597). Finally, we explored whether transgressive expression in hybrids could be a consequence of what we refer to as “transcriptional drive.” That is, the ability of a modified trans factors to control and drive the transcription of multiple prior non-target genes with cis-sequence similarity (Figure 2A–red arrows). We found 46 genes (18% -46/262) with possible evidence of co-option by newly evolved trans mutations. Of these genes, 15 were classified as compensatory, 10 had cis–trans divergence, 9 were non-compensatory, and 12 had non-informative SNPs for classification (Supplementary Table 1).
Figure 3. STRING protein–protein interaction network for all transgressive genes in hybrids. Circles represent transgressive genes that are unique to the sterile hybrid (78), squares are genes unique to the fertile hybrid (11), and the triangle represents a gene that shows transgressive expression in both fertile and sterile hybrids. Non-compensatory genes (20) are colored green, red represents compensatory genes (37), yellow for genes with cis–trans divergence (16), blue for cis-only genes (2), and black represents genes with no informative SNPs (15).
Discussion
Genome-wide, our results are in agreement with previous reports of correlated evolution between sequence and expression divergence (Castillo-Davis et al., 2004; Nuzhdin et al., 2004; Khaitovich et al., 2005; Lemos et al., 2005; Artieri et al., 2007; Hunt et al., 2013; Whittle et al., 2014; Barreto et al., 2015), but provide no support for positive selection or relaxation of selective constraints as drivers of change causing misregulation and transgressive expression in hybrids. It is possible that the lack of correlation for hybrids results from the relatively low divergence between the subspecies. However, we did find a significant correlation for expression differences between subspecies. Genes with no differential expression between subspecies and transgressive expression in hybrids did not show overall evidence of lower sequence divergence than transgressive genes with differential expression between subspecies. This result is unexpected under a scenario of compensation favoring mutations that restore divergence in gene expression between parental subspecies (i.e., stabilizing selection). We used SNPs to tease apart regulatory divergence among transgressive genes in hybrids. Transgressive expression results from divergence in cis and trans regulatory elements, leading to differential expression between parental species as well as hybrids. Alternatively, such changes can be buffered by compensatory mutations within lineages to restore levels of expression to similar levels between species but cause misexpression in hybrids (Landry et al., 2005; McManus et al., 2010; Mack and Nachman, 2017). Studies of divergence in gene expression between species provides support for changes in transcript levels being often deleterious, with large mutational effects, and equilibrium levels of genetic variation maintained by stabilizing selection (Rifkin et al., 2003; Lemos et al., 2005; Hodgins-Davis et al., 2015). Our study shows that the majority (79%) of transgressive genes in hybrids between D. p. pseudoobscura and D. p. bogotana were not differentially expressed between the subspecies, and the SNP analysis supports a good proportion of transgressive expression caused by compensatory changes (49%) during early stages of species divergence, with another (16%) caused by cis–trans divergence.
Informative SNPs are limited between closely related subspecies. Thus 25% of transgressive genes could not be analyzed this way. Moreover, for any gene, not all reads have informative SNPs imposing some analytical limitations. While this might lead to an underestimation, our result of 49% compensatory evolution for a pair of very closely related subspecies of Drosophila is expected when compared to estimates of 73% compensatory evolution for hybrids between more distantly related species of D. simulans and D. sechellia (Coolon et al., 2014) and 67% for yeast (Wang et al., 2015). The proportion of compensatory mutations within lineage (49%) is larger than cis–trans divergence between lineages (16%) and suggests that hybrids between closely related taxa might be more vulnerable to a breakdown of co-adaptations within species than misregulation caused by divergent evolution. Two important caveats are that we used one line per subspecies, which might overestimate true divergence between these subspecies by overlooking possible shared polymorphisms, and that there is inherent bias toward a possible overestimate of the role of compensatory evolution when using an ASE approach (Fraser, 2019; Zhang and Emerson, 2019). Therefore, it is important to explore possible alternative explanations for a large proportion of transgressive genes, which could not be explained by cis–trans compensation or divergent cis–trans evolution.
We found that genes with transgressive expression in hybrids that experienced divergence in regulation between subspecies produced proteins that did not show enrichment for interactions. On the other hand, transgressive genes with no evidence of divergence between subspecies were enriched for protein interactions, particularly for the sterile hybrids. The overrepresentation of protein interactions among transgressive genes may be due to possible tissue or cell type atrophy in the hybrids. However, these two subspecies are very closely related, and previous work have shown no evidence of tissue atrophy in the hybrids (Gomes and Civetta, 2014). Furthermore, we did not find an overrepresentation of underexpressed genes as expected under tissue/cell atrophy. This result suggests that in some cases, misregulation and transgressive expression could be a cascade effect driven by networks of interacting proteins and that such domino effect could work to exacerbate initial incompatibilities in hybrids between early stage diverging lineages. The role of gene-network effects is expected under the Bateson–Dobzhansky–Muller model of speciation (Turner et al., 2014), and while there has been some support for gene-networks buffering allelic variation among yeast strains (Bader et al., 2015), its importance in speciation is largely unexplored. Finally, we entertained the idea that newly arising trans mutations in either divergent or compensatory cases could possibly generate a cascade effect of misregulation of targets that might have not experienced cis-regulatory mutations between divergent taxa (Figure 2A–red arrows). We explored the idea of “transcriptional drive by sequence similarity among targets” by seeking sequence similarity within proximal (2,000 bp) putative cis-regulatory elements between transgressive genes showing evidence of cis–trans divergence or compensation and those showing no evidence of such sequence divergence. Our analysis showed some support for this idea with 18% of genes being possibly co-opted. However, only nine genes classified as non-compensatory appear as possible targets. One important limitation is that we only addressed sequence similarities between nearby upstream sequence regions of compensatory or cis–trans transgressive genes and upstream sequence regions of other transgressive genes, leaving unexplored the possibility that misregulation could be exerted by more distant cis-regulatory elements.
Data Availability Statement
Publicly available datasets were analyzed in this study. This data can be found here: Drosophila bogotana SRA at GenBank: SRX091468. D. pseudoobscura extracted from FlyBase (http://flybase.org/). Expression data and other data can be accessed on Dryad (10.5061/dryad.2z34tmpjv).
Author Contributions
AC designed the study. AG conducted the data analysis. Both authors contributed to writing and editing the manuscript.
Funding
This work was supported by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada (NSERC-RGPIN-2017-04599) to AC and an NSERC Canada Graduate Scholarship (CGS M) to AG.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.599292/full#supplementary-material
Footnotes
References
Andrews, S. (2010). FastQC:A Quality Control Tool for High Throughput Sequence Data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (accessed September 28, 2019).
Artieri, C. G., Haerty, W., and Singh, R. S. (2007). Association between levels of coding sequence divergence and gene misregulation in Drosophila male hybrids. J. Mol. Evol. 65, 697–704. doi: 10.1007/s00239-007-9048-2
Bader, D. M., Wilkening, S., Lin, G., Tekkedil, M. M., Dietrich, K., Steinmetz, L. M., et al. (2015). Negative feedback buffers effects of regulatory variants. Mol. Syst. Biol. 11:785. doi: 10.15252/msb.20145844
Barreto, F. S., Pereira, R. J., and Burton, R. S. (2015). Hybrid dysfunction and physiological compensation in gene expression. Mol. Biol. Evol. 32, 613–622. doi: 10.1093/molbev/msu321
Barreto, F. S., Watson, E. T., Lima, T. G., Willett, C. S., Edmands, S., Li, W., et al. (2018). Genomic signatures of mitonuclear coevolution across populations of Tigriopus californicus. Nat. Ecol. Evol. 2, 1250–1257. doi: 10.1038/s41559-018-0588-1
Baruzzo, G., Hayer, K. E., Kim, E. J., Di Camillo, B., Fitzgerald, G. A., and Grant, G. R. (2017). Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat. Methods 14, 135–139. doi: 10.1038/nmeth.4106
Blankenberg, D., Von Kuster, G., Bouvier, E., Baker, D., Afgan, E., Stoler, N., et al. (2014). Dissemination of scientific software with Galaxy ToolShed. Genome Biol. 15:403. doi: 10.1186/gb4161
Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. doi: 10.1093/bioinformatics/btu170
Bourguet, D. (1999). The evolution of dominance. Heredity 83(Pt 1), 1–4. doi: 10.1038/sj.hdy.6885600
Brill, E., Kang, L., Michalak, K., Michalak, P., and Price, D. K. (2016). Hybrid sterility and evolution in Hawaiian Drosophila: differential gene and allele-specific expression analysis of backcross males. Heredity 117, 100–108. doi: 10.1038/hdy.2016.31
Castillo-Davis, C. I., Hartl, D. L., and Achaz, G. (2004). cis-Regulatory and protein evolution in orthologous and duplicate genes. Genome Res. 14, 1530–1536. doi: 10.1101/gr.2662504
Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552. doi: 10.1093/oxfordjournals.molbev.a026334
Catron, D. J., and Noor, M. A. F. (2008). Gene expression disruptions of organism versus organ in Drosophila species hybirds. PLoS One 30:e3009. doi: 10.1371/journal.pone.0003009
Charif, D., and Lobry, J. R. (2007). “SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis,” in Structural Approaches to Sequence Evolution. Biological and Medical Physics, Biomedical Engineering, eds U. Bastolla, M. Porto, H. E. Roman, and M. Vendruscolo (Berlin: Springer), 207–232. doi: 10.1007/978-3-540-35306-5_10
Civetta, A. (2016). Misregulation of gene expression and sterility in interspecies hybrids: causal links and alternative hypotheses. J. Mol. Evol. 82, 176–182. doi: 10.1007/s00239-016-9734-z
Coolon, J. D., McManus, C. J., Stevenson, K. R., Graveley, B. R., and Wittkopp, P. J. (2014). Tempo and mode of regulatory evolution in Drosophila. Genome Res. 24, 797–808. doi: 10.1101/gr.163014.113
Coyne, J. A., and Orr, H. A. (1989). Patterns of speciation in Drosophila. Evolution 43:362. doi: 10.2307/2409213
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., et al. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. doi: 10.1093/bioinformatics/bts635
Dobzhansky, T. (1937). “Genetics and the origin of species,” in Columbia Biological Series, (New York, NY: Columbia University Press).
Dufresnes, C., Majtyka, T., Baird, S. J. E., Gerchen, J. F., Borzee, A., Savary, R., et al. (2016). Empirical evidence for large X-effects in animals with undifferentiated sex chromosomes. Sci. Rep. 6:21029. doi: 10.1038/srep21029
Fontdevila, A. (2016). Hybrid incompatibility in D rosophila:an updated genetic and evolutionary analysis. eLS 1–16. doi: 10.1002/9780470015902.a0020896.pub2
Fraser, H. B. (2019). Improving estimates of compensatory cis–trans regulatory divergence. Trends Genet. 35, 3–5. doi: 10.1016/j.tig.2018.09.003
Gomes, S., and Civetta, A. (2014). Misregulation of spermatogenesis genes in Drosophila hybrids is lineage-specific and driven by the combined effects of sterility and fast male regulatory divergence. J. Evol. Biol. 27, 1775–1783. doi: 10.1111/jeb.12428
Gomes, S., and Civetta, A. (2015). Hybrid male sterility and genome-wide misexpression of male reproductive proteases. Sci. Rep. 5:11976. doi: 10.1038/srep11976
Haerty, W., and Singh, R. S. (2006). Gene regulation divergence is a major contributor to the evolution of Dobzhansky-Muller incompatibilities between species of Drosophila. Mol. Biol. Evol. 23, 1707–1714. doi: 10.1093/molbev/msl033
He, X., Zhou, S., St. Armour, G. E., Mackay, T. F. C., and Anholt, R. R. H. (2016). Epistatic partners of neurogenic genes modulate Drosophila olfactory behavior. Genes Brain Behav. 15, 280–290. doi: 10.1111/gbb.12279
Hodgins-Davis, A., Rice, D. P., Townsend, J. P., and Novembre, J. (2015). Gene expression evolves under a house-of-cards model of stabilizing selection. Mol. Biol. Evol. 32, 2130–2140. doi: 10.1093/molbev/msv094
Huang, D. W., Sherman, B. T., and Lempicki, R. A. (2009a). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13. doi: 10.1093/nar/gkn923
Huang, D. W., Sherman, B. T., and Lempicki, R. A. (2009b). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57. doi: 10.1038/nprot.2008.211
Hunt, B. G., Ometto, L., Keller, L., and Goodisman, M. A. D. (2013). Evolution at two levels in fire ants: the relationship between patterns of gene expression and protein sequence evolution. Mol. Biol. Evol. 30, 263–271. doi: 10.1093/molbev/mss234
Katoh, K., and Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. doi: 10.1093/molbev/mst010
Khaitovich, P., Hellmann, I., Enard, W., Nowick, K., Leinweber, M., Franz, H., et al. (2005). Evolution: parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science 309, 1850–1854. doi: 10.1126/science.1108296
King, M. C., and Wilson, A. C. (1975). Evolution at two levels in humans and chimpanzees. Science 188, 107–116. doi: 10.1126/science.1090005
Landry, C. R., Wittkopp, P. J., Taubes, C. H., Ranz, J. M., Clark, A. G., and Hartl, D. L. (2005). Compensatory cis-trans evolution and the dysregulation of gene expression in interspecific hybrids of Drosophila. Genetics 171, 1813–1822. doi: 10.1534/genetics.105.047449
Lemos, B., Meiklejohn, C. D., Cáceres, M., and Hartl, D. L. (2005). Rates of divergence in gene expression profiles of primates, mice, and flies: stabilizing selection and variability among functional categories. Evolution 59, 126–137. doi: 10.1111/j.0014-3820.2005.tb00900.x
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595. doi: 10.1093/bioinformatics/btp698
Liao, Y., Smyth, G. K., and Shi, W. (2014). FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930. doi: 10.1093/bioinformatics/btt656
Llopart, A. (2012). The rapid evolution of X-linked male-biased gene expression and the large-X effect in Drosophila yakuba, D. santomea, and their hybrids. Mol. Biol. Evol. 29, 3873–3886. doi: 10.1093/molbev/mss190
Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550. doi: 10.1186/s13059-014-0550-8
Mack, K. L., Campbell, P., and Nachman, M. W. (2016). Gene regulation and speciation in house mice. Genome Res. 26, 451–461. doi: 10.1101/gr.195743.115
Mack, K. L., and Nachman, M. W. (2017). Gene regulation and speciation. Trends Genet. 33, 68–80. doi: 10.1016/j.tig.2016.11.003
Mackay, T. F. C. (2014). Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat. Rev. Genet. 15, 22–33. doi: 10.1038/nrg3627
Mackay, T. F. C., and Moore, J. H. (2014). Why epistasis is important for tackling complex human disease genetics. Genome Med. 6:124. doi: 10.1186/gm561
Maheshwari, S., and Barbash, D. A. (2011). The genetics of hybrid incompatibilities. Annu. Rev. Genet. 45, 331–355. doi: 10.1146/annurev-genet-110410-132514
Masly, J. P., Jones, C. D., Noor, M. A. F., Locke, J., and Orr, H. A. (2006). Gene transposition as a cause of hybrid sterility in Drosophila. Science 313, 1448–1450. doi: 10.1126/science.1128721
Masly, J. P., and Presgraves, D. C. (2007). High-resolution genome-wide dissection of the two rules of speciation in Drosophila. PLoS Biol. 5:e243. doi: 10.1371/journal.pbio.0050243
McManus, C. J., Coolon, J. D., Duff, M. O., Eipper-Mains, J., Graveley, B. R., and Wittkopp, P. J. (2010). Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 20, 816–825. doi: 10.1101/gr.102491.109
Mihola, O., Trachtulec, Z., Vlcek, C., Schimenti, J. C., and Forejt, J. (2009). A mouse speciation gene encodes a meiotic histone H3 methyltransferase. Science 323, 373–375. doi: 10.1126/science.1163601
Moehring, A. J., Teeter, K. C., and Noor, M. A. F. (2007). Genome-wide patterns of expression in Drosophila pure species and hybrid males. II. Examination of multiple-species hybridizations, platforms, and life cycle stages. Mol. Biol. Evol. 24, 137–145. doi: 10.1093/molbev/msl142
Morán, T., and Fontdevila, A. (2014). Genome-wide dissection of hybrid sterility in Drosophila confirms a polygenic threshold architecture. J. Hered. 105, 381–396. doi: 10.1093/jhered/esu003
Nuzhdin, S. V., Wayne, M. L., Harmon, K. L., and McIntyre, L. M. (2004). Common pattern of evolution of gene expression level and protein sequence in Drosophila. Mol. Biol. Evol. 21, 1308–1317. doi: 10.1093/molbev/msh128
Okonechnikov, K., Golosova, O., Fursov, M., Varlamov, A., Vaskin, Y., Efremov, I., et al. (2012). Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics 28, 1166–1167. doi: 10.1093/bioinformatics/bts091
Ortíz-Barrientos, D., Counterman, B. A., and Noor, M. A. F. (2007). Gene expression divergence and the origin of hybrid dysysfunctions. Genetica 129, 71–81. doi: 10.1007/s10709-006-0034-1
Phadnis, N., and Allen Orr, H. (2009). A single gene causes both male sterility and segre. Science 323, 376–379. doi: 10.1126/science.1163934.A
Presgraves, D. C. (2008). Sex chromosomes and speciation in Drosophila. Trends Genet. 24, 336–343. doi: 10.1016/j.tig.2008.04.007
Presgraves, D. C., Balagopalan, L., Abmayr, S. M., and Orr, H. A. (2003). Adaptive evolution drives divergence of a hybrid inviability gene between two species of Drosophila. Nature 423, 715–719. doi: 10.1038/nature01679
Ranz, J. M., Namgyal, K., Gibson, G., and Hartl, D. L. (2004). Anomalies in the expression profile of interspecific hybrids of Drosophila melanogaster and Drosophila simulans. Genome Res. 14, 373–379. doi: 10.1101/gr.2019804
Renaut, S., Nolte, A. W., and Bernatchez, L. (2009). Gene expression divergence and hybrid misexpression between lake whitefish species pairs (Coregonus spp. Salmonidae). Mol. Biol. Evol. 26, 925–936. doi: 10.1093/molbev/msp017
Rifkin, S. A., Kim, J., and White, K. P. (2003). Evolution of gene expression in the Drosophila melanogaster subgroup. Nat. Genet. 33, 138–144. doi: 10.1038/ng1086
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2009). edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140. doi: 10.1093/bioinformatics/btp616
Robinson, M. D., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25. doi: 10.1186/gb-2010-11-3-r25
Schaeffer, S. W., and Miller, E. L. (1991). Nucleotide sequence analysis of Adh genes estimates the time of geographic isolation of the Bogota population of Drosophila pseudoobscura. Proc. Natl. Acad. Sci. U. S. A. 88, 6097–6101. doi: 10.1073/pnas.88.14.6097
Schurch, N. J., Schofield, P., Gierliński, M., Cole, C., Sherstnev, A., Singh, V., et al. (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22, 839–851. doi: 10.1261/rna.053959.115
Sundararajan, V., and Civetta, A. (2011). Male sex interspecies divergence and down regulation of expression of spermatogenesis genes in Drosophila sterile hybrids. J. Mol. Evol. 72, 80–89. doi: 10.1007/s00239-010-9404-5
Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., et al. (2019). STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613. doi: 10.1093/nar/gky1131
Ting, C. T., Tsaur, S. C., Wu, M. L., and Wu, C. I. (1998). A rapidly evolving homeobox at the site of a hybrid sterility gene. Science 282, 1501–1504. doi: 10.1126/science.282.5393.1501
Tirosh, I., Reikhav, S., Levy, A. A., and Barkai, N. (2009). A yeast hybrid provides insight into the evolution of gene expression regulation. Science 324, 659–662. doi: 10.1126/science.1169766
Turner, L. M., and Harr, B. (2014). Genome-wide mapping in a house mouse hybrid zone reveals hybrid sterility loci and Dobzhansky-Muller interactions. Elife 3:e02504. doi: 10.7554/eLife.02504
Turner, L. M., White, M. A., Tautz, D., and Payseur, B. A. (2014). Genomic networks of hybrid sterility. PLoS Genet. 10, 18–22. doi: 10.1371/journal.pgen.1004162
Victoria Cattani, M., and Presgraves, D. C. (2012). Incompatibility between X chromosome factor and pericentric heterochromatic region causes lethality in hybrids between Drosophila melanogaster and its sibling species. Genetics 191, 549–559. doi: 10.1534/genetics.112.139683
Vonesch, S. C., Lamparter, D., Mackay, T. F. C., Bergmann, S., and Hafen, E. (2016). Genome-wide analysis reveals novel regulators of growth in Drosophila melanogaster. PLoS Genet. 12:e1005616. doi: 10.1371/journal.pgen.1005616
Wang, R. L., Wakeley, J., and Hey, J. (1997). Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics 147, 1091–1106.
Wang, Z., Sun, X., Zhao, Y., Guo, X., Jiang, H., Li, H., et al. (2015). Evolution of gene regulation during transcription and translation. Genome Biol. Evol. 7, 1155–1167. doi: 10.1093/gbe/evv059
Whittle, C. A., Sun, Y., and Johannesson, H. (2014). Dynamics of transcriptome evolution in the model eukaryote Neurospora. J. Evol. Biol. 27, 1125–1135. doi: 10.1111/jeb.12386
Keywords: regulatory co-option, network interactions, speciation, selection, transgressive gene expression, compensatory evolution, gene expression divergence
Citation: Go AC and Civetta A (2020) Hybrid Incompatibilities and Transgressive Gene Expression Between Two Closely Related Subspecies of Drosophila. Front. Genet. 11:599292. doi: 10.3389/fgene.2020.599292
Received: 26 August 2020; Accepted: 12 November 2020;
Published: 10 December 2020.
Edited by:
Igor V. Sharakhov, Virginia Tech, United StatesReviewed by:
Daniel Barbash, Cornell University, United StatesAna Llopart, The University of Iowa, United States
Joseph Coolon, Wesleyan University, United States
Copyright © 2020 Go and Civetta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Alberto Civetta, YS5jaXZldHRhQHV3aW5uaXBlZy5jYQ==