Study on the Characteristic Codon Usage Pattern in Porcine Epidemic Diarrhea Virus Genomes and Its Host Adaptation Phenotype

Si, Fusheng; Jiang, Li; Yu, Ruisong; Wei, Wenqiang; Li, Zhen

doi:10.3389/fmicb.2021.738082

ORIGINAL RESEARCH article

Front. Microbiol., 18 October 2021

Sec. Virology

Volume 12 - 2021 | https://doi.org/10.3389/fmicb.2021.738082


1. Shanghai Key Laboratory of Agricultural Genetics and Breeding, Shanghai Engineering Research Center of Breeding Pig, Institute of Animal Science and Veterinary Medicine, Shanghai Academy of Agricultural Sciences, Shanghai, China
2. South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
3. Department of Microbiology, School of Basic Medical Sciences, Henan University, Kaifeng, China

Abstract

Porcine epidemic diarrhea virus (PEDV), classified in the genus Alphacoronavirus, family Coronaviridae, is one of the most important pathogens causing heavy economic losses in the pig industry. Although mutation and recombination in PEDV strains have been analyzed intensively, a systematic genome analysis is still needed to elucidate the evolutionary mechanism and codon usage adaptation profile of the pathogen. Here, we carried out a comprehensive investigation of the evolutionary processes shaping synonymous codon usage and of the host-adapted evolutionary phenotype of the PEDV genome. We found a low codon usage bias (CUB) in the PEDV genome, and found that nucleotide composition, natural selection, mutation pressure and geographical diversity shape the codon usage patterns of PEDV, with natural selection dominating over the other factors. Using relative codon deoptimization index (RCDI) and similarity index (SiD) analyses, we observed that genotype II PEDV strains showed a higher level of adaptation to Sus scrofa than the other divergent clade. To the best of our knowledge, this is the first comprehensive report on the codon usage and host adaptation of PEDV. The findings offer insight into the factors involved in PEDV evolution, adaptation and fitness toward its hosts.

Introduction

Porcine epidemic diarrhea virus (PEDV) causes vomiting, diarrhea, dehydration and high lethality in piglets. The pathogen was first identified in Belgium in 1978 (Pensaert and de Bouck, 1978) and has since been identified in other countries, including China, Japan, South Korea and Thailand in Asia. Wide use of an inactivated PEDV vaccine kept incidence in China at a minimum until 2010; thereafter, however, new variants emerged and caused more frequent outbreaks with more severe morbidity and mortality in suckling piglets (Li et al., 2012), and then spread to North America in 2013–2014 (Huang et al., 2013). The continuing worldwide outbreaks have had a huge impact on the pig industry and caused great economic losses (Jung and Saif, 2015).

PEDV belongs to the family Coronaviridae and is an enveloped, single-stranded, positive-sense RNA virus. Its genome is about 28 kb in length and encodes 16 non-structural proteins, 4 structural proteins and one accessory protein (ORF3). The structural proteins, spike (S), envelope (E), membrane (M), and nucleocapsid (N), are the main components of virus particles (Jung et al., 2020). The S protein is responsible for binding to the cell receptor and for virus entry into host cells. The E protein is an ion channel protein involved in virion morphogenesis (Wang et al., 2011). The M protein is the most abundant protein in the viral envelope and plays an important role in the packaging of virus particles. The N protein binds genomic RNA to form the ribonucleoprotein (RNP), which constitutes the core of the virus particle. ORF3, the only accessory protein, has ion channel activity (Wang et al., 2012) and has been found to benefit virus proliferation (Wang et al., 2012; Ye et al., 2015; Si et al., 2020). Phylogenetic analysis based on the complete S gene sequence divides PEDV into two genotypes (G1 and G2) and five subgenotypes (G1a, G1b, G2a, G2b, and G2c) (Fan et al., 2017; Hsueh et al., 2020; Wang H. et al., 2020; Wang X.W. et al., 2020).

Synonymous codons arise from the redundancy of the genetic code: in general, each amino acid can be encoded by one to six codons. However, codon usage patterns may differ between species. Within a given organism, some synonymous codons have been reported to be used in protein translation significantly more often than their counterparts (Grantham et al., 1980; Martin et al., 1989; Lloyd and Sharp, 1992). This phenomenon is called codon usage bias (CUB). CUB exists in the genomes of all species, including viruses; it is regarded as a driving force of genetic evolution (Deb et al., 2021; Huang et al., 2021; Nguyen et al., 2021) and is suggested to play an important role in the adaptation of various viruses to their hosts (Butt et al., 2016; Kumar et al., 2018; Tian et al., 2018; Khandia et al., 2019; Luo et al., 2020a). Viral codon usage is also suggested to matter beyond host adaptation, in biological processes including viral gene transcription, nuclear export of viral RNA, tolerance to translation errors and immune escape (Zhou et al., 2016; Kumar et al., 2021; Mordstein et al., 2021). In general, codon usage patterns are shaped by mutational pressure and natural selection, which act continuously on viral coding sequences, maintain codon usage preferences, and ultimately drive and optimize virus survival, fitness and continued evolution in host cells (Lauring et al., 2012; Luo et al., 2020b). For these reasons, a comprehensive understanding of the factors that influence viral codon usage patterns will benefit studies of the genetic evolution and pathogenic mechanisms of viruses.

Although CUB is associated with a variety of biological processes and phenomena, the related questions remain open for PEDV. Previous studies have reported the codon usage of PEDV based on its complete genome or on individual genes (S and ORF3) (Cao et al., 2011; Chen et al., 2014; Xu et al., 2019; Yu et al., 2021b). However, a more comprehensive analysis is still needed to determine whether the two phylogroups (G1 and G2) of PEDV differ in their codon usage patterns and whether other factors play an important role in shaping PEDV codon usage. More importantly, a major bottleneck limiting our understanding of the ongoing PEDV outbreaks is the lack of a comprehensive and systematic codon usage analysis of PEDV genome sequences, particularly of early epidemic strains and recent highly pathogenic variants, and of the possible relationship between them. It also remains unclear how the PEDV codon usage pattern benefits host adaptation and viral replicative fitness, and whether the codon usage pattern of the host exerts selection pressure on PEDV coding sequences or vice versa. In this study, we therefore used systems biology approaches to comprehensively analyze the codon usage pattern of PEDV and the factors influencing it. The results provide novel insight into the codon bias phenomenon and evolutionary mechanism of PEDV.

Materials and Methods

Sequence Data Retrieving and Processing

For this study, we retrieved 56 complete PEDV genomes (excluding recombinant sequences) from GenBank¹, selected according to isolation date, geographical distribution and phylogroup. For each genome, the open reading frames (ORFs) were concatenated in the order ORF1ab-Spike-ORF3-Envelope-Membrane-Nucleocapsid, and each concatenated coding sequence was retained for codon usage analysis. Detailed information on the 56 selected PEDV strains, including isolation date, strain name, accession number and place of isolation, is listed in Supplementary Table 1.
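A minimal Python sketch of the concatenation step follows, for illustration only and not the authors' actual pipeline. The helper name `concatenate_orfs` is hypothetical, coordinates are assumed to be 1-based inclusive (start, end) segments read from the GenBank annotation, and an ORF annotated as a join() of several segments (as ORF1ab typically is, because of its ribosomal frameshift) is assumed to be passed as a list of those segments.

```python
def concatenate_orfs(genome: str, orfs: list[list[tuple[int, int]]]) -> str:
    """Join ORFs, in the order given, into one coding sequence.

    Hypothetical helper: each ORF is a list of 1-based, inclusive
    (start, end) segments, so a GenBank join() feature becomes several
    segments that are joined before the ORF is appended.
    """
    parts = []
    for index, segments in enumerate(orfs):
        # Convert 1-based inclusive coordinates to Python slices.
        orf = "".join(genome[start - 1:end] for start, end in segments)
        if len(orf) % 3 != 0:
            raise ValueError(f"ORF {index} has length {len(orf)}, not a multiple of 3")
        parts.append(orf.upper())
    return "".join(parts)


# Toy genome: two ORFs separated by a two-base spacer.
genome = "ATGAAATAA" + "CC" + "ATGGGGTGA"
coding = concatenate_orfs(genome, [[(1, 9)], [(12, 20)]])
print(coding)  # ATGAAATAAATGGGGTGA
```

Stop codons at the ORF ends are kept in this sketch; they are excluded later, when codon statistics are computed.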

Phylogenetic Analysis

To show the evolutionary relationships among the selected strains, phylogenetic analysis was performed on the non-recombinant complete coding sequences of the 56 PEDV strains listed in Supplementary Table 1. Multiple sequence alignment of the PEDV coding regions was performed with the online tool MUSCLE.² The resulting alignment was used to build phylogenetic trees with the neighbor-joining (NJ) and maximum-likelihood (ML) algorithms. Specifically, MEGA-X (version 10.1.8) was used to construct the NJ tree under the Kimura 2-parameter model, with both transitions and transversions counted as substitutions. The reliability of the tree was evaluated by bootstrap analysis with 1,000 replicates. IQ-TREE (version 2.1.2) (Spielman and Wilke, 2015; Minh et al., 2020) was used to perform the ML analysis under TIM+F+R4, selected as the best-fit model (Kalyaanamoorthy et al., 2017), with the ultrafast bootstrap option and 1,000 replicates. The trees were visualized using iTOL³ (Letunic and Bork, 2021). Bootstrap support values are shown only where ≥ 50%.
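The Kimura 2-parameter (K2P) distance used for the NJ tree can be sketched as follows. This is an illustrative implementation of Kimura's formula, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences; it is not MEGA-X's code. The function name is hypothetical, and skipping gap or ambiguous sites pair by pair is an assumption (MEGA lets the user choose how gaps are treated).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    if len(s1) != len(s2):
        raise ValueError("sequences must come from the same alignment")
    transitions = transversions = compared = 0
    for x, y in zip(s1, s2):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One transition (A->G) in 10 compared sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```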

Analysis of Overall Nucleotide Composition

Following a recent study (He et al., 2021), we used CodonW (version 1.4.2) to compute the frequencies of A, U, G, and C at the third position of synonymous codons (A3s, U3s, G3s, C3s). GC contents at the first (GC1s), second (GC2s) and third (GC3s) codon positions, the mean of GC1s and GC2s (GC12s), and the frequencies of the mononucleotides A, C, U, and G were calculated in R (version 4.0.4).⁴ Five codons were excluded from the analysis: the termination codons UGA, UAG, and UAA, which do not encode amino acids, and AUG (methionine) and UGG (tryptophan), each the sole codon for its amino acid.
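The positional GC contents can be illustrated with a short Python sketch. It assumes, for simplicity, that the five excluded codons are dropped before GC1s and GC2s are computed as well as GC3s; the exact conventions of CodonW and of the authors' R code may differ. The function name `positional_gc` is hypothetical.

```python
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}  # Met, Trp and the three stop codons


def positional_gc(seq: str) -> dict[str, float]:
    """GC content at codon positions 1-3 after dropping the five excluded codons."""
    seq = seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Keep only complete, unambiguous codons outside the excluded set.
    kept = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGU")]
    if not kept:
        raise ValueError("no usable codons")
    gc1, gc2, gc3 = (sum(c[pos] in "GC" for c in kept) / len(kept) for pos in range(3))
    return {"GC1s": gc1, "GC2s": gc2, "GC3s": gc3, "GC12s": (gc1 + gc2) / 2}


# GCU and GCC are kept; AUG (Met) is dropped.
print(positional_gc("GCUAUGGCC"))
# {'GC1s': 1.0, 'GC2s': 1.0, 'GC3s': 0.5, 'GC12s': 1.0}
```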

Relative Synonymous Codon Usage Analysis

The RSCU value measures whether the synonymous codons of a given amino acid are used evenly, after normalizing for amino acid composition. RSCU values for the 59 sense codons (excluding UGA, UAG, UAA, UGG, and AUG) were computed using CodonW (version 1.4.2) to assess codon usage patterns. An RSCU value of 1 indicates no bias in codon usage, whereas codons with RSCU values < 0.6 and > 1.6 are considered under-represented and over-represented, respectively (Sharp and Li, 1986b). If a specific codon has the highest RSCU value in both the virus and the host, this is taken as evidence of a shared codon preference (Khandia et al., 2019; Roy et al., 2021). RSCU was calculated using the formula proposed by Sharp et al. (1986):

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}$$

where X_ij is the observed number of occurrences of the jth codon for the ith amino acid, and n_i is the number of synonymous codons encoding the ith amino acid.
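For illustration, the RSCU definition of Sharp et al. can be written in Python as below, using the standard genetic code (NCBI translation table 1, an assumption that holds for PEDV). This is a sketch rather than CodonW's implementation; `rscu` and the table constants are hypothetical names, and codons of amino acids absent from the sequence are simply omitted.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code (NCBI table 1), codons ordered UUU, UUC, UUA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Synonymous families: drop stops (*), Met (M) and Trp (W), leaving 59 codons.
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(seq: str) -> dict[str, float]:
    """RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij) for each codon of every family present."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for its codons
        expected = total / len(codons)  # count expected under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


print(rscu("GCUGCUGCUGCC"))
# {'GCU': 3.0, 'GCC': 1.0, 'GCA': 0.0, 'GCG': 0.0}
```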

Correspondence Analysis of Relative Synonymous Codon Usage

Correspondence analysis (COA) is a commonly used multivariate statistical method for studying the major trends in codon usage across viral coding sequences; it is closely related to principal component analysis (PCA) (Lara-Ramírez et al., 2014). The RSCU values of each strain were represented as a 59-dimensional vector corresponding to the 59 synonymous codons (excluding AUG and UGG, each the only codon for its amino acid, and the three stop codons UGA, UAG, and UAA), and these vectors were transformed into a set of uncorrelated axes (He et al., 2019). The axes that explain the most variation in RSCU values point to the main factors influencing CUB. COA was performed using CodonW (version 1.4.2), and the values on the first two axes, which account for the largest share of the codon usage variation among sequences, were used to build the plots.
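A compact sketch of correspondence analysis on a strains × codons RSCU table follows, using the textbook formulation (singular value decomposition of the standardized residuals). It assumes NumPy is available and uses random toy data rather than PEDV values; CodonW's COA may scale or orient the axes differently, so only the general procedure, not the exact coordinates, should be compared. The function name is hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Correspondence analysis of a strains x codons table (e.g. RSCU values).

    Returns the row (strain) principal coordinates and the fraction of
    total inertia explained by each axis, largest first.
    """
    n = np.asarray(table, dtype=float)
    n = n[:, n.sum(axis=0) > 0]          # drop codons never observed
    p = n / n.sum()                      # correspondence matrix
    r = p.sum(axis=1)                    # row masses
    c = p.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    residuals = (p - expected) / np.sqrt(expected)  # standardized residuals
    u, sv, _ = np.linalg.svd(residuals, full_matrices=False)
    inertia = sv ** 2                    # principal inertias, descending
    explained = inertia / inertia.sum()
    row_coords = (u * sv) / np.sqrt(r)[:, None]
    return row_coords, explained


rng = np.random.default_rng(0)
rscu_table = rng.uniform(0.2, 2.0, size=(8, 59))  # toy data: 8 strains x 59 codons
coords, explained = correspondence_analysis(rscu_table)
print(coords.shape, explained[:2].round(3))
```

The `explained` vector plays the role of the per-axis percentages of total variation reported for the first four axes in the Results.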

Relative Dinucleotide Abundance of Porcine Epidemic Diarrhea Virus

The relative abundance of dinucleotides was calculated following a previously described method (Karlin and Burge, 1995), using the odds ratio:

$$\rho_{xy} = \frac{f_{xy}}{f_x f_y}$$

where f_x and f_y are the frequencies of nucleotides X and Y, and f_xy is the observed frequency of dinucleotide XY. By convention, dinucleotide XY is over-represented if ρ_xy ≥ 1.25 and under-represented if ρ_xy ≤ 0.78; it is considered marginally high if 1.20 ≤ ρ_xy < 1.25 and marginally low if 0.78 < ρ_xy < 0.82; and values of 0.82 ≤ ρ_xy ≤ 1.19 are considered within the normal range (Kunec and Osterrieder, 2016; Khandia et al., 2019).
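The odds ratio and the classification thresholds can be sketched in Python as follows. Overlapping dinucleotides are counted across the whole coding sequence, which is an assumption (some studies count only within or between codons). Where the stated ranges touch or leave small gaps, the sketch resolves them in the order written in `classify`, also an assumption; both function names are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides (overlapping pairs)."""
    seq = seq.upper().replace("T", "U")
    mono = Counter(b for b in seq if b in ALPHABET)
    n_mono = sum(mono.values())
    pairs = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1)
        if seq[i] in ALPHABET and seq[i + 1] in ALPHABET
    )
    n_pairs = sum(pairs.values())
    if n_mono == 0 or n_pairs == 0:
        raise ValueError("sequence too short")
    rho = {}
    for x, y in product(ALPHABET, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        # NaN when a nucleotide is absent and the ratio is undefined.
        rho[x + y] = (pairs[x + y] / n_pairs) / (fx * fy) if fx * fy > 0 else float("nan")
    return rho


def classify(rho: float) -> str:
    """Apply the stated thresholds; values in the 1.19-1.20 gap fall to 'normal' here."""
    if rho >= 1.25:
        return "over-represented"
    if rho <= 0.78:
        return "under-represented"
    if rho >= 1.20:
        return "marginally high"
    if rho < 0.82:
        return "marginally low"
    return "normal"


rho = dinucleotide_odds_ratios("AACC")
print(round(rho["AA"], 3), rho["CA"])        # 1.333 0.0
print(classify(0.543), classify(1.211))      # under-represented marginally high
```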

Evaluation of the Influence of Selection Pressure on Codon Usage Bias

The effective number of codons (ENC) quantifies the extent of CUB in protein-coding sequences. ENC values range from 20 (only one synonymous codon used per amino acid, an extreme CUB) to 61 (all synonymous codons used equally, no preference). An ENC value below 35 is generally regarded as indicating strongly biased codon usage (Comeron and Aguadé, 1998; Yu et al., 2021a). ENC is inversely related to CUB: a higher ENC value means that more synonymous codons are used to encode the same amino acid and hence a lower CUB, and vice versa.

A plot of ENC against GC3s (the GC content at the third position of synonymous codons) can be used to identify the factors influencing CUB (Wright, 1990). To determine whether mutation pressure or natural selection is the main factor affecting PEDV codon usage, we plotted ENC values against GC3s values (ENC-plot). If the points lie on or near Wright's theoretical curve, the codon usage of a gene is attributed to mutation pressure alone; if they fall below the curve, natural selection is considered to play a leading role in shaping the codon usage pattern (Wong et al., 2010). The expected ENC value was calculated with the following equation (Kumar et al., 2016):

$$\mathrm{ENC}_{\text{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where s represents the frequency of G or C at the third position of synonymous codons (GC3s).
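Wright's ENC and the expected-ENC curve can be sketched in Python as below. The implementation follows Wright's (1990) homozygosity-based estimator as it is commonly described: an F value per amino acid, averaged within the 2-, 3-, 4- and 6-fold degeneracy classes of the standard genetic code, with the 3-fold average (isoleucine) replaced by the mean of the 2- and 4-fold averages when missing. It is not CodonW's code, and the function names are hypothetical.

```python
from collections import Counter

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)

# Amino acids per degeneracy class: {2: 9, 3: 1, 4: 5, 6: 3} for the standard code.
CLASS_SIZES = Counter(len(codons) for codons in FAMILIES.values())


def wright_enc(seq: str) -> float:
    """Effective number of codons (Wright, 1990), capped at 61."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_values: dict[int, list[float]] = {}
    for codons in FAMILIES.values():
        n = sum(counts[c] for c in codons)
        if n < 2:
            continue  # homozygosity F is undefined for fewer than two codons
        sum_sq = sum((counts[c] / n) ** 2 for c in codons)
        f_values.setdefault(len(codons), []).append((n * sum_sq - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's fallback when Ile is absent
    enc = 2.0  # Met and Trp contribute one codon each
    for k, n_aa in CLASS_SIZES.items():
        if mean_f.get(k, 0) <= 0:
            raise ValueError(f"no usable data for {k}-fold amino acids")
        enc += n_aa / mean_f[k]
    return min(enc, 61.0)


def expected_enc(gc3s: float) -> float:
    """Expected ENC under mutation pressure alone, as a function of GC3s."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


biased = "".join(codons[0] * 3 for codons in FAMILIES.values())           # one codon per amino acid
even = "".join(c * 1000 for codons in FAMILIES.values() for c in codons)  # all codons equally
print(wright_enc(biased), wright_enc(even), expected_enc(0.5))  # 20.0 61.0 60.5
```

The two toy sequences bracket the scale: one codon per amino acid gives the minimum of 20, and even use of all synonymous codons reaches the cap of 61.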

Neutrality Plot Analysis

The neutrality plot, also called neutral evolution analysis, quantifies the relative influence of mutation pressure and natural selection on the codon usage patterns of coding sequences (Sueoka, 1988). GC12s (the mean GC content at the first and second codon positions) was plotted against GC3s for the PEDV coding sequences, and a regression line was fitted using R (version 4.0.4; see text footnote 4). The slope of the regression line indicates the impact of mutational pressure (Nasrullah et al., 2015): a slope close to 1 (the diagonal) indicates that mutational pressure dominates CUB, whereas a slope close to 0 indicates that natural selection plays the key role (Deb et al., 2021).
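The regression underlying the neutrality plot reduces to an ordinary least-squares fit, sketched below in Python. `relative_forces` converts a slope into percentages using the convention applied in the Results (slope × 100 as the share of mutational pressure); it rejects slopes outside [0, 1], such as the negative G1b slope, for which that reading does not apply. Both names are hypothetical and the input values are toy numbers.

```python
def neutrality_fit(gc3s: list[float], gc12s: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of GC12s = slope * GC3s + intercept."""
    n = len(gc3s)
    if n != len(gc12s) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(gc3s) / n
    mean_y = sum(gc12s) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3s)
    if sxx == 0:
        raise ValueError("GC3s values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3s, gc12s))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_forces(slope: float) -> tuple[float, float]:
    """Percent mutation pressure and natural selection, read off the slope."""
    if not 0 <= slope <= 1:
        raise ValueError("this interpretation assumes 0 <= slope <= 1")
    return 100 * slope, 100 * (1 - slope)


# Toy points lying exactly on GC12s = 0.4 + 0.1 * GC3s.
slope, intercept = neutrality_fit([0.30, 0.35, 0.40], [0.43, 0.435, 0.44])
print(round(slope, 4), [round(v, 2) for v in relative_forces(0.1516)])  # 0.1 [15.16, 84.84]
```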

Parity Rule 2 Analysis

Parity rule 2 (PR2) plot analysis was used to investigate the effects of mutation pressure and natural selection on codon usage, based on the nucleotide composition at the third codon position (A3s, U3s, C3s, and G3s). In the PR2 plot, the AU bias [A3/(A3 + U3)] is the vertical coordinate and the GC bias [G3/(G3 + C3)] is the horizontal coordinate (Wu et al., 2015). The center of the plot is (0.5, 0.5), where A = U and G = C; if mutation pressure and natural selection have equal effects on codon usage, the points fall at this center.
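The two PR2 coordinates are simple ratios, sketched below. Feeding in the mean A3s, U3s, G3s, and C3s percentages reported in the Results approximately reproduces the mean PR2 values reported there (a ratio of means is not exactly a mean of ratios). The function names are hypothetical.

```python
def pr2_biases(a3: float, u3: float, g3: float, c3: float) -> tuple[float, float]:
    """Return (AU bias, GC bias) = (A3/(A3+U3), G3/(G3+C3)); counts or percentages."""
    if a3 + u3 == 0 or g3 + c3 == 0:
        raise ValueError("undefined bias: zero denominator")
    return a3 / (a3 + u3), g3 / (g3 + c3)


def describe(au_bias: float, gc_bias: float) -> str:
    """Values below 0.5 mean the pyrimidine (U or C) outnumbers the purine (A or G)."""
    au = "U over A" if au_bias < 0.5 else "A over U" if au_bias > 0.5 else "A = U"
    gc = "C over G" if gc_bias < 0.5 else "G over C" if gc_bias > 0.5 else "G = C"
    return f"{au}; {gc}"


# Mean A3s, U3s, G3s and C3s percentages from the Results.
au, gc = pr2_biases(a3=23.86, u3=54.26, g3=22.61, c3=22.98)
print(round(au, 4), round(gc, 4), describe(au, gc))  # 0.3054 0.4959 U over A; C over G
```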

Hydropathicity (GRAVY) and Aromaticity Indices Analysis

GRAVY (grand average of hydropathicity) and Aroma (aromaticity) values describe the hydropathicity and aromaticity of the protein encoded by a coding sequence. Both are considered factors that can influence translation and natural selection on a gene. In this study, the GRAVY and Aroma values of each strain were computed using CodonW (version 1.4.2); a higher GRAVY or Aroma value indicates a more hydrophobic or more aromatic protein product, respectively (Zang et al., 2017).
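GRAVY and aromaticity can be sketched as below, assuming the coding sequence has already been translated into a protein string. GRAVY is taken as the mean Kyte-Doolittle hydropathy per residue and aromaticity as the fraction of Phe, Tyr, and Trp residues; CodonW's handling of stop codons and unknown residues may differ, and the function names are hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = {"F", "Y", "W"}


def _residues(protein: str) -> list[str]:
    # Keep the 20 standard amino acids; drops stop symbols ('*') and unknowns.
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids")
    return residues


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle score per residue."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Fraction of residues that are Phe, Tyr or Trp."""
    residues = _residues(protein)
    return sum(aa in AROMATIC for aa in residues) / len(residues)


print(round(gravy("AIL"), 4), aromaticity("FYWA"))  # 3.3667 0.75
```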

Correlation Analysis

Spearman's rank correlation analysis (Ewens and Grant, 2006) was used to assess the relationships among nucleotide composition (A, U, C, G), third-position codon composition (A3s, U3s, G3s, C3s), GC3s, principal component factors, hydropathicity (GRAVY), and aromaticity (ARO) across the 56 complete PEDV coding sequences.
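Spearman's coefficient is the Pearson correlation of ranks, with tied values given their average rank. The pure-Python sketch below shows this; it does not compute p-values, for which a library routine such as `scipy.stats.spearmanr` (which returns both the coefficient and a p-value) would normally be used. The function names and toy input values are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


print(average_ranks([10, 20, 20, 30]))                     # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([0.20, 0.25, 0.30], [48.0, 48.3, 47.9]))  # -0.5
```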

Relative Codon Deoptimization Index

The RCDI, developed by Mueller et al. (2006), reflects the similarity between the codon usage of a gene and that of a reference genome, and has also been used as an indicator of how efficiently a viral gene may be translated in a host. An RCDI value close to 1 indicates codon usage similar to that of the host, implying that the virus is almost completely adapted to its host and follows the host's codon usage pattern (Butt et al., 2016), whereas an RCDI value above 1 indicates weaker adaptation, i.e., deoptimization of the viral codon usage pattern relative to that of the host (D'andrea et al., 2019). RCDI values of the complete coding sequences of the different PEDV lineages were computed using the vhcub R package (Anwar et al., 2019). The codon usage pattern of Sus scrofa, retrieved from the Codon Usage Database,⁵ was used as the reference.
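The RCDI can be sketched from Mueller et al.'s formula, RCDI = Σ (CiFa/CiFh) · CiN/N, where CiFa and CiFh are the relative frequencies of codon i within its synonymous family in the virus and the host, CiN is the count of codon i in the virus sequence, and N is the total codon count. This sketch sums over the 59 synonymous codons and takes N over the same codons, so identical usage gives exactly 1; that choice, the function names, and the uniform toy host are assumptions, and the vhcub package may differ in detail. Host input can be raw counts or per-thousand frequencies, since only within-family proportions are used.

```python
from collections import Counter

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)


def relative_frequencies(counts) -> dict[str, float]:
    """Frequency of each codon within its synonymous family (0 if the family is absent)."""
    rel = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            rel[c] = counts.get(c, 0) / total if total else 0.0
    return rel


def rcdi(virus_seq: str, host_counts) -> float:
    """RCDI = sum_i (CiFa / CiFh) * CiN / N over the 59 synonymous codons."""
    seq = virus_seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    virus_rel = relative_frequencies(counts)
    host_rel = relative_frequencies(host_counts)
    n_total = sum(counts[c] for codons in FAMILIES.values() for c in codons)
    if n_total == 0:
        raise ValueError("no synonymous codons in the sequence")
    score = 0.0
    for codons in FAMILIES.values():
        for c in codons:
            if counts[c] == 0:
                continue
            if host_rel[c] == 0:
                raise ValueError(f"host reference lacks codon {c}")
            score += (virus_rel[c] / host_rel[c]) * counts[c] / n_total
    return score


uniform_host = {codon: 1 for codon in CODE}  # hypothetical host using every codon equally
print(rcdi("GCUGCUGCUGCU", uniform_host))   # 4.0: Ala always GCU vs. the host's 1/4
```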

Similarity Index

The similarity index [SiD or D (A, B)] is used to estimate the influence of the host's overall codon usage pattern on that of a virus. SiD ranges from 0 to 1, and a higher value indicates a stronger impact of the host on the virus's codon usage. To further reveal the influence of the Sus scrofa codon usage pattern on that of PEDV, the similarity index was calculated as follows:

$$R(A,B) = \frac{\sum_{i=1}^{59} a_i b_i}{\sqrt{\sum_{i=1}^{59} a_i^2 \sum_{i=1}^{59} b_i^2}}, \qquad D(A,B) = \frac{1 - R(A,B)}{2}$$

where R (A, B) is the cosine of the angle between the spatial vectors A and B and indicates the similarity between the overall codon usage patterns of PEDV and the host; a_i is the RSCU value of a specific codon in the PEDV coding sequence, and b_i is the RSCU value of the same codon in the host. D (A, B) indicates the potential impact of the host's overall codon usage on that of PEDV, and its value ranges from 0.0 to 1.0 (Zhou et al., 2013).
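The similarity index reduces to a cosine between RSCU vectors, sketched below in Python following Zhou et al.'s definition D(A, B) = (1 − R(A, B))/2. The function name and the RSCU values are hypothetical; only codons present in both inputs are compared.

```python
import math


def similarity_index(virus_rscu: dict[str, float], host_rscu: dict[str, float]) -> float:
    """SiD: D(A, B) = (1 - R(A, B)) / 2, with R the cosine between RSCU vectors."""
    codons = sorted(set(virus_rscu) & set(host_rscu))  # compare only shared codons
    a = [virus_rscu[c] for c in codons]
    b = [host_rscu[c] for c in codons]
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        raise ValueError("RSCU vectors must not be all zero")
    r = sum(x * y for x, y in zip(a, b)) / norm
    return (1 - r) / 2


virus = {"GCU": 2.1, "GCC": 0.8, "GCA": 0.9, "GCG": 0.2}  # hypothetical RSCU values
host = {"GCU": 0.4, "GCC": 1.6, "GCA": 0.8, "GCG": 1.2}   # hypothetical RSCU values
print(similarity_index(virus, virus))              # essentially 0 for identical usage
print(round(similarity_index(virus, host), 3))     # 0.211
```

In practice, because RSCU values are non-negative, R lies between 0 and 1, so D falls between 0 and 0.5.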

Software and Statistical Analysis

CodonW (version 1.4.2) was used to calculate the overall nucleotide composition, relative synonymous codon usage (RSCU) values, correspondence analysis (COA), and the GRAVY and Aroma values of the coding sequences. Spearman's rank correlation and linear regression analyses were performed in R (version 4.0.4; see text footnote 4). Some graphs were drawn with R packages such as SeqinR and ggplot2 (Charif and Lobry, 2007; Anwar et al., 2019). The online tool CIMminer⁶ was used to perform cluster analysis (heat map) based on the RSCU values of each PEDV strain. A p-value < 0.01 (**) indicates a highly significant correlation, and 0.01 < p < 0.05 (*) indicates a significant correlation. Differences between group means were assessed by one-way ANOVA with Dunnett's test for multiple comparisons, using GraphPad Prism version 7.0 (GraphPad Software, San Diego, California, United States).

Results

Phylogenetic Analysis Based on Coding Sequences of Porcine Epidemic Diarrhea Virus

To determine the relationships among the PEDV strains selected in this study, we first carried out phylogenetic analysis of the complete PEDV coding sequences using the neighbor-joining (NJ) and maximum-likelihood (ML) methods. The NJ and ML tree topologies were highly congruent (Figures 1A,B). All 56 PEDV isolates were divided into two main phylogroups (Figure 1): Group I (including G1a and G1b) and Group II (including G2a, G2b, and G2c). This agrees with previous reports of two classical divergent clades (Hsueh et al., 2020; Wang X.W. et al., 2020).

FIGURE 1

Nucleotide Composition of the Porcine Epidemic Diarrhea Virus Genome

The nucleotide content of the 56 complete coding sequences was calculated to evaluate the potential impact of compositional constraints on the PEDV codon usage pattern. The A%, U%, G%, C%, and GC% values were 24.78 ± 0.004 (mean ± SD), 33.35 ± 0.000, 22.75 ± 0.000, 19.04 ± 0.000, and 41.79 ± 0.000, respectively. The base contents at the third position of synonymous codons were also calculated: A3s%, U3s%, G3s%, C3s%, and GC3s% were 23.86 ± 0.001 (mean ± SD), 54.26 ± 0.002, 22.61 ± 0.002, 22.98 ± 0.001, and 35.33 ± 0.002, respectively. U3s (54.26%) was the most frequent and differed very significantly (p < 0.0001) from A3s (23.86%), C3s (22.98%), and G3s (22.61%). In addition, the mean GC3s content was 35.33%, and the AU content (58.2%) was significantly higher than the GC content (41.79%) in the complete PEDV coding sequences (p < 0.0001) (Supplementary Table 2); a similar trend was observed in strains of all five subgenotypes (Supplementary Table 3). Taken together, these results indicate an AU-rich composition, with U as the most frequent mononucleotide in PEDV coding sequences.

Relative Synonymous Codon Usage Analysis

RSCU values are independent of amino acid composition and have been widely used to evaluate codon usage bias among genes. A higher RSCU value indicates more frequent use of a codon, i.e., a stronger preference for it. In general, the RSCU values of most codons fell between 0.6 and 1.6. To investigate the codon usage patterns and the extent of CUB in the complete PEDV coding sequences, RSCU values were calculated for each strain and compared with those of its natural host and of reference animal model hosts including Homo sapiens (Table 1). All 18 analyzed amino acids have multiple synonymous codons (Table 1 and Figure 2). Among the 59 synonymous codons, 26 were classified as preferred codons (RSCU > 1.0), 21 of which are A/U-ended. Among the preferred codons, 10 codons, GCU (Ala), UUG (Leu), ACU (Thr), CCU (Pro), UCU (Ser), AUU (Ile), CUU (Leu), GUU (Val), CGU (Arg), and GGU (Gly), had RSCU values > 1.6 and were considered over-represented. Conversely, 11 codons, GCG (Ala), CUA (Leu), GUA (Val), CGA (Arg), CCG (Pro), GGA (Gly), UCG (Ser), CCC (Pro), ACG (Thr), CGG (Arg), and GGG (Gly), had RSCU values < 0.6 and were regarded as under-represented; 7 of these 11 were G/C-ended. Notably, almost all over-represented codons were U-ended (9 of 10), and most under-represented codons were A/G-ended (10 of 11) (Table 1). No single codon was over-represented in both PEDV and all three model species. However, PEDV and Sus scrofa shared several preferred codons, and 6 codons, CUA (Leu), GUA (Val), UCG (Ser), CCG (Pro), ACG (Thr), and GCG (Ala), were under-represented both in PEDV and in the other three model species. In addition, 2 codons, GUU (Val) and CGU (Arg), were over-represented in PEDV but not in the reference host species (Table 1).
This result implies that PEDV has evolved a mixture of coincident and antagonistic codon usage patterns relative to its natural host, Sus scrofa. Heat-map clustering further showed that PEDV strains from distinct phylogroups and different geographical areas shared the preferred codons described above (Table 2 and Supplementary Figures 1A,B). These results indicate that CUB exists in PEDV genomes and that A/U-ended codons are preferred.

TABLE 1

The relative synonymous codon usage (RSCU) patterns of PEDV in comparison with the RSCU values of its natural host (Sus scrofa) as well as reference animal model hosts including Homo sapiens.

ᵃ Mean RSCU values of the 56 PEDV strains. Termination codons (TER) and the single codons encoding methionine (Met) and tryptophan (Trp) were excluded from the correspondence analysis. Blue, red, and green RSCU values represent the preferred (RSCU > 1), over-represented (RSCU > 1.6), and under-represented (RSCU < 0.6) codons, respectively. Coincident preferred codons of PEDV and Sus scrofa are shown in bold; the most preferred codons in PEDV that are antagonistic to Sus scrofa are shown in bold italic.

FIGURE 2

TABLE 2

RSCU values of the whole coding sequence in 56 PEDV strains.

Note: The values are presented according to genotypes. Blue, red, and green represent the preferred (RSCU > 1), over-represented (RSCU > 1.6), and under-represented (RSCU < 0.6) codons, respectively.

Correspondence Analysis

To explore whether phylogroup and geographic distribution affect the codon usage pattern, we applied COA based on RSCU values to PEDV isolates from different geographic areas. The first, second, third, and fourth axes accounted for 34.81, 12.98, 10.01, and 7.97% of the total variation, respectively. The first and second axes, together accounting for 47.79% of the variation, were therefore responsible for most of the variation in CUB and were plotted to examine the distribution of synonymous codon usage patterns. Axis 2 separated the G1 and G2 genotypes of PEDV (Figure 3A), consistent with the phylogenetic analysis. When geographic factors that may influence PEDV evolution were taken into consideration, a clear geographic pattern emerged. As shown in Figure 3B, isolates from different geographic areas were mostly distributed separately; the exception was the strains from the United States, which clustered toward the top of the ordinate axis. In addition, the PEDV strains from China were distributed across three separate areas, indicating that strains from the same geographic region can differ substantially in codon usage. These results suggest that geographic diversity may influence PEDV CUB.

FIGURE 3

Relationship Between Relative Dinucleotide Abundance and Codon Usage in Porcine Epidemic Diarrhea Virus Genome

The codon usage pattern in several organisms, including some DNA and RNA viruses, has been reported to be constrained by the relative abundance of dinucleotides (Karlin and Burge, 1995; Nasrullah et al., 2015; Wang et al., 2016; Cheng et al., 2020), which may reflect intrinsic characteristics of the virus or mutational pressure from the host. We therefore determined the relative abundance of the 16 dinucleotides in all PEDV sequences. Dinucleotide relative abundance was not randomly distributed in the PEDV coding region (Figure 4 and Supplementary Table 4). Specifically, the dinucleotides UU, GU, CU, and AA fell within the normal range (1.006 ± 0.002, 1.078 ± 0.004, 1.078 ± 0.004, and 1.059 ± 0.004, respectively); UG and CA were over-represented (ρ_xy ≥ 1.25; ρ_xy = 1.331 ± 0.003 and 1.351 ± 0.006, respectively); AC was marginally high, close to over-represented (ρ_xy = 1.211 ± 0.007); and CG was under-represented (ρ_xy ≤ 0.78; ρ_xy = 0.543 ± 0.005). These observations suggest that the PEDV genome has a distinctive dinucleotide usage pattern and that dinucleotide composition also plays a role in shaping synonymous codon usage in PEDV.

FIGURE 4

Overall Extent of Codon Usage Bias in Porcine Epidemic Diarrhea Virus

RNA viruses usually have high ENC values, which are thought to favor viral replication and adaptation to the host's preferred codons. The average ENC values of RNA viruses have been estimated to range between 38.9 and 58.3 (Jenkins and Holmes, 2003). In this study, the ENC values of all PEDV isolates ranged from 47.85 to 48.63 (Supplementary Table 2). ENC values did not differ significantly among phylogroups: 48.27 ± 0.270 in G1a, 48.09 ± 0.061 in G1b, 48.07 ± 0.162 in G2a, 48.03 ± 0.139 in G2b, 48.19 ± 0.132 in G2c, and 48.1 ± 0.166 across all 56 isolates (p > 0.05) (Figure 5A). When ENC values were compared by country, the epidemic strains from Thailand had significantly higher values than the US and Mexican strains (Figure 5B). These results indicate a low CUB in the PEDV genome.

FIGURE 5

To further examine the influence of mutational pressure on the CUB pattern of the PEDV genome, we evaluated the correlations among nucleotide composition (A%, U%, G%, C%, and GC%), third-position codon composition (A3s, U3s, G3s, C3s, and GC3s) and ENC values (Table 3). Most third-position compositions were significantly correlated with nucleotide composition. A3s was significantly negatively correlated with G and GC content, and positively correlated with U, GC2s and GC12s. U3s was significantly negatively correlated with G, C, GC, and ENC, and positively correlated with U. G3s was significantly negatively correlated with U, GC2s, and GC12s, and positively correlated with G, GC, and ENC. C3s was significantly negatively correlated with U, and positively correlated with G, C, GC, and ENC. GC3s was significantly negatively correlated with U, and positively correlated with G, C, GC, and ENC. These results indicate that the CUB of PEDV is influenced by nucleotide composition, supporting a contribution of mutational pressure to shaping the codon usage patterns of PEDV genomes.

TABLE 3

	A3s	U3s	G3s	C3s	GC3s
A	0.109	−0.069	0.038	−0.017	0.007
U	0.278*	0.906**	−0.471**	−0.880**	−0.806**
G	−0.929**	−0.698**	0.948**	0.447**	0.872**
C	0.192	−0.586**	−0.145	0.773**	0.348**
GC	−0.503**	−0.865**	0.610**	0.792**	0.844**
GC1s	0.257	0.108	−0.212	−0.087	−0.158
GC2s	0.306*	−0.111	−0.411**	0.258	−0.105
GC12s	0.376**	0.028	−0.396**	0.068	−0.187
ENC	0.03	−0.858**	0.268*	0.859**	0.666**

Correlation analysis between the codon compositions (A3s, U3s, G3s, C3s, and GC3s), the ENC values, nucleotide compositions (A%, U%, G%, C%, and GC%) of the 56 PEDV strains.

The numbers in each column are Spearman's rank correlation coefficients (r) from the corresponding correlation analysis.

“*” means 0.01 < P < 0.05; “**” means P < 0.01.

Effective Codon Number Plot Analysis

Given the relatively low CUB in the PEDV genome described above, we next sought to clarify which factors affect PEDV CUB by assessing the relationship between the ENC value and GC3s in PEDV genomes. In the ENC-GC3s plot, the curve represents the expected ENC values when codon usage is determined by mutation pressure alone, and the points represent the actual ENC values calculated in this study (Figures 5C,D). All points for the PEDV coding sequences lie below the expected ENC curve, indicating that the observed ENC values of all 56 complete coding sequences are lower than expected. This suggests that, in addition to mutation pressure, natural selection plays a role in shaping the codon usage pattern of the PEDV genome.
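As an illustrative back-of-the-envelope check using the reported means (not an additional analysis), inserting the mean GC3s of 35.33% (s = 0.3533) into Wright's expected-ENC function gives

$$\mathrm{ENC}_{\text{expected}} = 2 + 0.3533 + \frac{29}{0.3533^2 + 0.6467^2} \approx 2.353 + \frac{29}{0.5430} \approx 55.76$$

which exceeds the observed mean ENC of 48.1 by about 7.7, roughly 14% of the expected value. Because the curve is non-linear, evaluating it at the mean GC3s only approximates the per-strain comparison shown in Figures 5C,D.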

Neutrality Plot Analysis

Neutrality plot analysis quantifies the relative contributions of mutational pressure and natural selection, the two main evolutionary forces acting on a gene or genome. To determine which factor dominates the codon usage pattern of PEDV coding sequences, we plotted GC12s (vertical axis) against GC3s (horizontal axis) and fitted a linear regression line (Figure 6). Nucleotide changes at the third codon position that do not alter the encoded amino acid are generally attributed to mutational pressure, whereas changes that alter the amino acid are attributed to selection pressure. As shown in Figure 6A, GC12s was positively correlated with GC3s for the G1a and G2 PEDV strains, but not for the G1b strains. The regression slopes were 0.11, −0.4289, 0.1516, 0.0298, and 0.0333 for the G1a, G1b, G2a, G2b, and G2c coding sequences, respectively. Taking the slope as the relative contribution of mutational pressure, mutational pressure accounted for 11, 15.16, 2.98, and 3.33% of the evolutionary force acting on the G1a, G2a, G2b, and G2c coding sequences, whereas natural selection accounted for 89, 84.84, 97.02, and 96.67%, respectively; the negative G1b slope was not partitioned in this way. When strains were grouped by geographical area, natural selection accounted for 87.98% in Asian strains, 90.18% in North American strains, and 83.99% in European strains (Figure 6B). Hence, the neutrality analysis indicates that natural selection is the main force shaping the CUB of PEDV.
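Under the neutrality-plot reading used above, the regression slope of GC12s on GC3s is taken as the relative contribution of mutational pressure, and 1 − slope as that of natural selection. The sketch below fits the slope by ordinary least squares (an assumption; the fitting method is not stated here) and reproduces the percentage split from the slopes reported in the text. A slope outside [0, 1], such as the negative G1b slope, is rejected because it has no interpretation as a fraction.

```python
def ols_slope(x, y) -> float:
    """Ordinary least-squares slope of y (e.g. GC12s) regressed on x (e.g. GC3s)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partition_forces(slope: float) -> tuple:
    """Split a neutrality-plot slope into (% mutational pressure, % natural selection)."""
    if not 0.0 <= slope <= 1.0:
        # A negative slope (as for G1b) cannot be read as a mutational fraction.
        raise ValueError("slope outside [0, 1]; partition not interpretable")
    mutation = round(slope * 100, 2)
    return mutation, round(100 - mutation, 2)


# Slopes reported in the text for G1a, G2a, G2b, and G2c.
for name, s in [("G1a", 0.11), ("G2a", 0.1516), ("G2b", 0.0298), ("G2c", 0.0333)]:
    print(name, partition_forces(s))
```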

FIGURE 6

Parity Rule 2 Bias Plot Analysis

PR2 bias plot analysis is another tool for studying the effects of mutational pressure and natural selection on the CUB of genes. If mutational pressure alone shaped codon usage, complementary bases would be used in equal proportions at the third codon position (A3 = U3 and G3 = C3); an unbalanced distribution indicates that both mutational pressure and natural selection determine the CUB (Sueoka, 1995). We analyzed the relationship between purine (A and G) and pyrimidine (C and U) content by plotting [A3/(A3 + U3)] on the vertical axis against [G3/(G3 + C3)] on the horizontal axis (Figure 7). The mean AU bias [A3/(A3 + U3)] and GC bias [G3/(G3 + C3)] were 0.3054 ± 0.001 and 0.4959 ± 0.002, respectively (Supplementary Table 5). Because a bias value below 0.5 indicates a preference for the pyrimidine over the purine (Zhang et al., 2018), U is preferred over A, and C is slightly preferred over G, at the third codon position in PEDV, regardless of whether strains are grouped by phylogroup (Figure 7A) or by geographic location (Figure 7B). This result suggests that both mutational bias and natural selection have shaped the CUB of PEDV genomes.
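The two PR2 coordinates can be computed directly from third codon positions. The sketch below counts bases at every third position of an in-frame coding sequence; this is an assumption, since many PR2 studies restrict the count to fourfold degenerate codons and the exact codon set used here is not stated. It also encodes the reading that a value below 0.5 favors the pyrimidine (U or C). Function names are illustrative.

```python
from collections import Counter


def pr2_bias(seq: str) -> tuple:
    """Return (A3/(A3+U3), G3/(G3+C3)) from all third codon positions of a CDS."""
    seq = seq.upper().replace("T", "U")
    third = Counter(seq[i + 2] for i in range(0, len(seq) - 2, 3))
    au, gc = third["A"] + third["U"], third["G"] + third["C"]
    if au == 0 or gc == 0:
        raise ValueError("need at least one A/U and one G/C at third positions")
    return third["A"] / au, third["G"] / gc


def preferred(ratio: float, purine: str, pyrimidine: str) -> str:
    """Values below 0.5 favor the pyrimidine, values above 0.5 the purine."""
    if ratio == 0.5:
        return "no preference"
    return pyrimidine if ratio < 0.5 else purine
```

With the mean values reported above, `preferred(0.3054, "A", "U")` returns `"U"` and `preferred(0.4959, "G", "C")` returns `"C"`.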

FIGURE 7

Correlation Analysis Results

We also analyzed the correlations of the codon compositions with the first two principal axes and with the grand average of hydropathicity (GRAVY) and aromaticity (ARO) values. Most of the compositions (U, G, GC, A3s, U3s, G3s, and GC3s) were significantly correlated with the first axis, whereas none was significantly correlated with the second axis (Table 4). ENC values were negatively correlated with GRAVY (r = −0.353, p < 0.01), but were not significantly correlated with aromaticity (r = 0.023, p > 0.05) (Table 5). In the complete PEDV coding sequences, GRAVY was also significantly negatively correlated with C3s (r = −0.331, p < 0.05). ARO was positively correlated with G3s (r = 0.339, p < 0.05) but negatively correlated with A3s (r = −0.373, p < 0.01) and Axis1 (r = −0.445, p < 0.01) (Table 5). These results indicate that the hydropathicity and aromaticity of PEDV-encoded proteins are related to CUB, suggesting that natural selection pressure affects the codon usage pattern of PEDV. Altogether, these observations suggest that factors other than mutational pressure, such as natural selection and nucleotide composition, are also involved in shaping the CUB of PEDV.

TABLE 4

Base composition	Axis1	Axis2
A	−0.001	−0.173
U	0.389**	0.061
G	−0.892**	−0.033
C	0.199	−0.010
GC	−0.541**	0.007
A3s	0.828**	0.005
U3s	0.481**	0.184
G3s	−0.891**	−0.123
C3s	−0.228	−0.203
GC3s	−0.700**	−0.206
ENC	−0.117	−0.046

Correlation analysis between the nucleotide compositions and the two principal component axes in PEDV genomes.

Values are correlation coefficients (r) from the corresponding correlation analyses.

“**” means P < 0.01.

TABLE 5

Index	Statistic	Axis1	Axis2	ENC	GC	A3s	U3s	G3s	C3s	GC3s
GRAVY	r	−0.198	−0.158	−0.353**	−0.203	−0.180	0.137	0.242	−0.331*	−0.015
	p	0.143	0.243	0.008	0.133	0.185	0.313	0.073	0.013	0.915
ARO	r	−0.445**	0.273*	0.023	0.161	−0.373**	−0.123	0.339*	0.118	0.237
	p	0.001	0.042	0.866	0.236	0.005	0.367	0.011	0.386	0.079

Correlations of ARO and GRAVY with the first two axes, ENC, GC, and the third-position compositions (A3s, U3s, G3s, C3s, and GC3s) in the 56 PEDV strains.

Values are correlation coefficients (r) and their P values from the corresponding correlation analyses. “*” means 0.01 < P < 0.05; “**” means P < 0.01.
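GRAVY and aromaticity are usually computed from the translated protein: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and aromaticity as the fraction of Phe, Trp, and Tyr residues. The sketch below implements these standard definitions with a minimal translator. This section does not state which software the authors used (CodonW and Biopython's `ProteinAnalysis.gravy()` and `ProteinAnalysis.aromaticity()` compute the same indices), so the code is illustrative and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

# Kyte-Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon (unknown codons -> X)."""
    cds = cds.upper().replace("T", "U")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over standard residues."""
    values = [KYTE_DOOLITTLE[a] for a in protein if a in KYTE_DOOLITTLE]
    return sum(values) / len(values)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (Phe, Trp, Tyr) among standard residues."""
    residues = [a for a in protein if a in KYTE_DOOLITTLE]
    return sum(a in "FWY" for a in residues) / len(residues)
```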

Genotype I Porcine Epidemic Diarrhea Virus Strains Showed a Higher Host Adaptation Phenotype for Sus scrofa

The relative codon deoptimization index (RCDI) compares the codon usage of a virus with that of its host. An RCDI of 1 indicates that the virus follows the host's codon usage; the lower the RCDI, the better adapted the virus is to its host, whereas a high RCDI indicates poorer adaptation. In the case of PEDV, the mean RCDI of genotype I strains was significantly lower than that of genotype II strains (Figure 8), suggesting that genotype I PEDV strains are more highly adapted to Sus scrofa.
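RCDI is usually defined (Mueller et al., 2006) as RCDI = Σ (CiFa / CiFh) × (Ni / N) over the 59 synonymous codons, where CiFa and CiFh are the frequencies of codon i among the synonymous codons of its amino acid in the virus and in the host, Ni is the count of codon i in the virus, and N is the total codon count. The sketch below implements that formula, assuming N is counted over the same 59 codons so that a virus with exactly host-like usage scores 1.0; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1) in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
# The 59 codons used by RCDI: sense codons except AUG (Met) and UGG (Trp).
RCDI_CODONS = [c for c, aa in CODE.items() if aa not in "*MW"]


def codon_counts(seq: str) -> Counter:
    """Count in-frame codons of a coding sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def synonymous_freq(counts) -> dict:
    """Frequency of each codon among the synonymous codons of its amino acid."""
    totals = Counter()
    for c in RCDI_CODONS:
        totals[CODE[c]] += counts.get(c, 0)
    return {c: counts.get(c, 0) / totals[CODE[c]]
            for c in RCDI_CODONS if totals[CODE[c]] > 0}


def rcdi(virus_counts, host_counts) -> float:
    """RCDI = sum_i (CiFa / CiFh) * (Ni / N); 1.0 means host-like codon usage."""
    fa = synonymous_freq(virus_counts)
    fh = synonymous_freq(host_counts)
    n_total = sum(virus_counts.get(c, 0) for c in RCDI_CODONS)
    score = 0.0
    for c, freq_a in fa.items():
        n_i = virus_counts.get(c, 0)
        if n_i == 0:
            continue  # unused codons contribute nothing
        if fh.get(c, 0) == 0:
            raise ValueError(f"host table has no usage for codon {c}")
        score += (freq_a / fh[c]) * n_i / n_total
    return score
```

In practice, `virus_counts` would be built from a strain's complete coding sequence and `host_counts` from a Sus scrofa codon usage table, such as the Kazusa codon usage database listed in the footnotes; parsing such a table is not shown.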

FIGURE 8

Sus scrofa Exerted Significantly Stronger Selection Pressure on Porcine Epidemic Diarrhea Virus Genotype II Strains

SiD analysis directly measures the similarity of codon usage between a host and a virus. We used it to determine whether the host (Sus scrofa) has influenced the codon usage patterns of PEDV during evolution, and which phylogroup is most affected. The mean SiD value was higher in genotype II strains than in genotype I strains, in every genotype II subgenotype (Figure 9). Specifically, the SiD value of subgenotype 1a (0.1204 ± 0.0002) was significantly lower than those of subgenotypes 2a (0.1228 ± 0.0010), 2b (0.1232 ± 0.0010), and 2c (0.1221 ± 0.0010) (Figure 9). These relatively low D(A,B) values suggest that genotype I PEDV strains can replicate efficiently in Sus scrofa with less selective influence from host codon usage. Taken together, these results suggest that Sus scrofa exerts significantly stronger selection pressure on genotype II strains, implying that genotype I strains may be better adapted to their host (Sus scrofa) than genotype II strains.
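SiD compares RSCU vectors: R(A,B) = Σ a_i b_i / sqrt(Σ a_i² × Σ b_i²) and D(A,B) = (1 − R(A,B)) / 2, where a_i and b_i are the RSCU values of the 59 synonymous codons in the host and the virus. D ranges from 0 to 1, and, as interpreted above, larger values indicate a stronger host influence on viral codon usage. The sketch below follows this commonly used definition; setting RSCU to 0 for amino acids absent from the counts is an assumption of this sketch, and the function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1) in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
SYN_CODONS = [c for c, aa in CODE.items() if aa not in "*MW"]  # 59 codons


def rscu(counts) -> dict:
    """RSCU of the 59 synonymous codons; codons of absent amino acids get 0 (assumption)."""
    families = defaultdict(list)
    for c in SYN_CODONS:
        families[CODE[c]].append(c)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total if total else 0.0
    return values


def sid(host_counts, virus_counts) -> float:
    """Similarity index D(A,B) = (1 - R(A,B)) / 2, with R the cosine of the RSCU vectors."""
    a, b = rscu(host_counts), rscu(virus_counts)
    dot = sum(a[c] * b[c] for c in SYN_CODONS)
    norm = math.sqrt(sum(a[c] ** 2 for c in SYN_CODONS) *
                     sum(b[c] ** 2 for c in SYN_CODONS))
    return (1 - dot / norm) / 2
```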

FIGURE 9

Discussion

Codon usage bias (CUB) refers to the unequal use of synonymous codons encoding the same amino acid. Mutational pressure and natural selection are the two main factors affecting CUB in a genome (Sharp and Li, 1986a; Sharp et al., 1986; Karlin and Mrázek, 1996; Barbhuiya et al., 2020), but other factors, such as base composition, evolutionary pressure, and geographic distribution, may also contribute. Although nucleotide composition and CUB have been studied extensively in higher organisms such as Drosophila and mammals (Eyre-Walker, 1991; Machado et al., 2020), similar studies of pathogens are limited. Knowledge of codon usage patterns and their influencing factors is considered important for understanding the genetic evolution of pathogens such as bacteria and viruses (Cai et al., 2009; van Hemert et al., 2016; Zhou et al., 2019; Jin et al., 2020; MacLean et al., 2021). To fill this gap for PEDV, we used several analytical approaches to investigate its codon usage pattern and to identify the factors influencing PEDV codon bias.

Relative synonymous codon usage (RSCU) analysis is one of the most commonly used tools for studying the CUB of a gene. Codon usage patterns can be specific at the family, genus, and even species level. To analyze this specificity at the species level, we computed the RSCU values of PEDV and compared them with those of host species. PEDV showed CUB in its genome: of the 26 preferred codons (excluding UGG, AUG, and stop codons), 5 were A-ended, 16 were U-ended, 4 were G-ended, and 1 was C-ended. A and U are the most abundant nucleotides in the PEDV genome (mean contents of 24.78 and 33.35%, respectively), consistent with the preference of PEDV for A/U-ended codons and further supporting the existence of CUB in the PEDV genome. Among the RSCU values observed, almost all over-represented codons were U-ended, and most under-represented codons were A- or G-ended. Notably, six codons, CUA (Leu), GUA (Val), UCG (Ser), CCG (Pro), ACG (Thr), and GCG (Ala), were under-represented in PEDV and in all the reference host species. Furthermore, almost all under-represented codons with RSCU values below 0.5 had the NCG or CGN form, indicating strong CpG suppression (Table 1). Studies have shown that many viruses tend to reduce the CpG content of their genomes during evolution, which benefits immune escape or host adaptation. For example, the extremely low CpG dinucleotide content of influenza B virus is thought to have favored its adaptation to its human host (Greenbaum et al., 2008). CpG avoidance is also commonly observed in many other RNA viruses (Kunec and Osterrieder, 2016; Wang et al., 2016; Roy et al., 2021) and is regarded as another selective pressure contributing to CUB (Gómez et al., 2011; Kumar et al., 2016).
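RSCU divides each codon's count by the average count of all synonymous codons for its amino acid, so RSCU = 1 means no bias. The sketch below computes RSCU from a coding sequence, skipping AUG, UGG, stop codons, and amino acids absent from the sequence, and then lists under-represented codons of the NCG or CGN form using the 0.5 cutoff mentioned above. Function names are illustrative, and CpG steps that span two adjacent codons are not captured by this codon-level check.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1) in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def rscu_from_seq(seq: str) -> dict:
    """RSCU of synonymous codons (no Met, Trp, stops) for amino acids present in seq."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for c, aa in CODE.items():
        if aa not in "*MW":
            families[aa].append(c)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so it is skipped
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def cpg_underrepresented(values: dict, cutoff: float = 0.5) -> list:
    """Codons with RSCU below the cutoff that contain a CpG (NCG or CGN form)."""
    return sorted(c for c, v in values.items() if v < cutoff and "CG" in c)
```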
Unmethylated CpG motifs of pathogens can be recognized by Toll-like receptor 9 (TLR9) in host cells, triggering an immune response against the pathogen (Dorn and Kippenberger, 2008; Kunec and Osterrieder, 2016). CpG deficiency in PEDV coding sequences may therefore be an effective strategy for evading the host antiviral immune response, and this should be taken into account when formulating and implementing anti-PEDV strategies. Indeed, recent studies suggest that other human- and swine-derived coronaviruses, namely severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and transmissible gastroenteritis virus (TGEV), employ the same strategy (Cheng et al., 2020; Roy et al., 2021), and CpG deficiency in RNA viral genomes has been proposed as a selective pressure contributing to CUB (Vetsigian and Goldenfeld, 2009; Gómez et al., 2011). The CpG deficiency observed in PEDV, and perhaps in other RNA viruses, may thus help these viruses adapt to their hosts and escape host defenses. RNA viruses generally show low CUB, with reported ENC values of 47.62–57.23 (Hu et al., 2011; Lara-Ramírez et al., 2014; Cristina et al., 2015; Zhou et al., 2019; Nguyen et al., 2021), which may facilitate viral replication and adaptation to the host immune system (Chen et al., 2014; Khandia et al., 2019). To test whether PEDV follows this pattern, we calculated the ENC of each genome. All PEDV isolates showed low CUB, with a mean ENC of 48.1 (Supplementary Table 2); this value falls within the low-bias range (ENC values above 35 are commonly taken to indicate low bias) and is consistent with a previous study of PEDV (Chen et al., 2014). ENC values differed little between the G1 and G2 phylogroups.
The low CUB may be important for the efficient replication of PEDV in host cells and is consistent with the view that RNA virus genomes are prone to mutation during evolution.
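ENC follows Wright (1990): for each amino acid, the homozygosity F = (n Σp² − 1) / (n − 1) is computed from its n codons and their synonymous frequencies p, and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean F of the k-fold degenerate amino acids. The sketch below implements that formula for the standard genetic code, estimating a missing F3 (isoleucine) as the mean of F2 and F4 and capping the result at 61, as Wright proposed. Dedicated tools such as CodonW may treat edge cases differently, so this is a sketch rather than a reproduction of the study's pipeline; the function name is illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1) in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def enc(seq: str) -> float:
    """Wright's (1990) effective number of codons for one in-frame coding sequence."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    f_by_class = defaultdict(list)  # degeneracy class k -> homozygosity values F
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few codons to estimate F
        p_sq = sum((counts[c] / n) ** 2 for c in codons)
        f_by_class[k].append((n * p_sq - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile missing: Wright's interpolation
    if set(mean_f) != {2, 3, 4, 6} or min(mean_f.values()) <= 0:
        raise ValueError("sequence too short to estimate all degeneracy classes")
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)
```

Using a single codon per amino acid gives the minimum ENC of 20, while evenly used synonymous codons approach the maximum of 61.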

Codon usage patterns are thought to be shaped mainly by mutational pressure and natural selection, but their relative roles in PEDV were unknown. Following Wright's method (Wright, 1990), we used an ENC vs. GC3s plot to evaluate the selection pressure on the PEDV codon usage pattern. The data points representing the ENC values of the PEDV isolates lay below the expected curve and clustered together, suggesting that natural selection plays a role in the PEDV codon usage pattern. In addition, neutrality plot analysis, which quantifies the contributions of mutational pressure and natural selection, supported an important role for natural selection in shaping PEDV codon usage. Moreover, parity rule 2 (PR2) analysis showed that U and C were preferred over A and G at the third codon position, implying that PEDV CUB has been shaped by both mutational pressure and natural selection.

We also found strong, significant correlations between most overall nucleotide compositions and the third-position compositions (Table 3), which supports an effect of mutational pressure on CUB, in line with a previous study (Chen et al., 2014). In addition, ENC values were significantly correlated with most third-position compositions (Table 3), nucleotide compositions were significantly correlated with the first principal axis (Table 4), and GRAVY/ARO values were correlated with third-position compositions (Table 5), further supporting the respective roles of mutational pressure and natural selection in PEDV CUB. Notably, COA showed that the two PEDV phylogroups clustered separately on the first two axes, indicating that their codon usage patterns differ. Apart from these two forces, other factors such as geographic distribution (Chen et al., 2014; Nguyen et al., 2021) and relative dinucleotide frequencies (Cheng et al., 2020; Munjal et al., 2020) have been reported to drive codon usage patterns in many RNA viruses. In our study, the distinctive dinucleotide usage pattern and relative dinucleotide frequencies indicate that dinucleotide composition also contributes to the synonymous codon usage pattern of PEDV. Moreover, COA suggested that geographical diversity may influence overall PEDV codon usage, which to some extent reflects the evolutionary characteristics of the PEDV genome. CUB may therefore be one of the factors driving PEDV evolution.
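Relative dinucleotide abundance is commonly computed as ρXY = f(XY) / (f(X) × f(Y)) (Karlin and Burge, 1995), comparing the observed frequency of each dinucleotide with the frequency expected from its component bases. The sketch below computes ρ for all 16 dinucleotides of a sequence and labels values with cutoffs often used in the literature (ρ < 0.78 under-represented, ρ > 1.23 over-represented); those cutoffs are an assumption, not thresholds stated in this section, and the function names are illustrative.

```python
from collections import Counter


def dinucleotide_odds(seq: str) -> dict:
    """Relative abundance rho_XY = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides."""
    seq = "".join(b for b in seq.upper().replace("T", "U") if b in "ACGU")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x in "ACGU":
        for y in "ACGU":
            fx, fy = mono[x] / n, mono[y] / n
            # Undefined (NaN) when one of the bases is absent from the sequence.
            rho[x + y] = (di[x + y] / (n - 1)) / (fx * fy) if fx and fy else float("nan")
    return rho


def classify(value: float, low: float = 0.78, high: float = 1.23) -> str:
    """Label a rho value using commonly cited cutoffs (assumed, not from this study)."""
    if value < low:
        return "under-represented"
    if value > high:
        return "over-represented"
    return "normal"
```

A CpG-suppressed genome would show `classify(rho["CG"]) == "under-represented"`.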

The RCDI has been suggested as an effective index of the degree of viral adaptation to a host's cellular environment. Of the two classical divergent clades, genotype I strains had a significantly lower RCDI than genotype II strains (Figure 8), suggesting greater adaptation and fitness of genotype I strains in the swine cellular environment. Because no previous study has addressed this question, we were concerned that the RCDI alone would not be sufficiently convincing to support such an important conclusion. We therefore incorporated an additional codon usage index, SiD, to strengthen our findings and further evaluate the adaptation of PEDV to its host species. The mean SiD value of genotype II was significantly higher than that of genotype I (Figure 9), indicating that the host animal (Sus scrofa) exerted significantly stronger selection pressure on genotype II strains than on their genotype I counterparts. The RCDI and SiD analyses suggest that PEDV may be adapted to its natural host (Sus scrofa), which supports the view that Sus scrofa plays an important role as a PEDV reservoir (Lee et al., 2016; Turlewicz-Podbielska and Pomorska-Mól, 2021). These results also suggest that selection pressure from Sus scrofa may influence the codon usage pattern of PEDV and that PEDV may use the host translational machinery efficiently. Given that PEDV has now spread worldwide and that swine are its only natural host, PEDV may further improve its adaptive fitness to pig cells through complex adaptive evolution, which could increase the risk of global transmission and subsequent outbreaks. Stricter anti-PEDV measures are therefore urgently needed.

Continued study of viral codon usage has yielded a considerable understanding of the preferred and avoided codons and codon pairs in viral coding sequences, and this knowledge has been successfully applied to synthetic attenuated virus engineering and to the development of live-attenuated poliovirus vaccines (Burns et al., 2006; Coleman et al., 2008). Similarly, the preferred and under-represented codons identified in PEDV in this study might be exploited in designing new-generation PEDV vaccines and other prevention measures.

Collectively, our study shows that the PEDV genome has relatively low CUB, indicating that synonymous codons are used fairly evenly in the PEDV genome. In addition, the two PEDV phylogroups may evolve with subtle differences under mutational and selection pressures. Mutational pressure and natural selection are the two main forces influencing the codon usage pattern of PEDV, and dinucleotide composition and geographical distribution are additional potential factors. Furthermore, PEDV has evolved a mixture of coincident and antagonistic codon usage patterns relative to Sus scrofa, which may benefit its host adaptation and replicative fitness. This study provides a comprehensive investigation of the codon usage patterns of PEDV and deepens our understanding of the processes governing PEDV evolution.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions

FS: conceptualization, resources and data curation, writing—original draft preparation, and project administration. FS and LJ: methodology, software, visualization, formal analysis, and investigation. FS, LJ, and WW: validation. FS, WW, and ZL: writing—review and editing. FS and ZL: supervision. FS and RY: funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. 32072838) and National Key Research and Development Program of China (Grant No. 2016YFD0500101). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgments

We acknowledge and thank the editors and reviewers, who contributed immensely to improving the quality of this publication.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2021.738082/full#supplementary-material

Supplementary Figure 1

Cluster analysis (heat map) of RSCU values among different subgenotypes (A) or countries (B) of the PEDV isolates used in this study. RSCU values are divided into three ranges: < 1 (green), 1–1.6 (yellow), and > 1.6 (red). The heat map was generated using CIMminer. Each column represents a codon; higher RSCU values, indicating more frequent codon usage, are shown in red. Codon usage is highly biased toward A/U-ending codons. Euclidean distance and complete linkage were used for clustering.

Supplementary Table 1

The detailed information describing the 56 PEDV strains used in this study.

Supplementary Table 2

Overall occurrence of nucleotide composition, the third position nucleotide contents and ENC values of 56 PEDV strains.

Supplementary Table 3

Nucleotide composition and properties of the G1 and G2 complete PEDV coding sequences.

Supplementary Table 4

Relative dinucleotide abundance of the complete PEDV coding sequences used in this study.

Supplementary Table 5

AU bias [A3%/(A3% + U3%)] and GC bias [G3%/(G3%+ C3%)] for the 56 PEDV strains used in this study.

Abbreviations

PEDV
porcine epidemic diarrhea virus
CUB
codon usage bias
RNP
ribonucleoprotein
G1
genotype I
G2
genotype II
ORFs
open reading frames
NJ
neighbor-joining
ML
maximum-likelihood
GC1s
guanine plus cytosine content of first codon position
GC2s
guanine plus cytosine content of second codon position
GC3s
guanine plus cytosine content of third codon position
ENC
effective codon number
RSCU
relative synonymous codon usage
COA
correspondence analysis
PCA
principal component analysis
PR2
parity rule 2
GRAVY
grand average of hydropathicity
ARO
aromaticity
RCDI
relative codon deoptimization index
SiD
similarity index
SD
standard deviation
AA
amino acids
TLR9
Toll-like receptor 9
SARS-CoV-2
novel severe acute respiratory syndrome coronavirus 2
TGEV
transmissible gastroenteritis virus.

Footnotes

1.^http://www.ncbi.nlm.nih.gov/

2.^https://www.ebi.ac.uk/Tools/msa/muscle/

3.^http://itol.embl.de/

4.^https://www.r-project.org/

5.^https://www.kazusa.or.jp/codon/

6.^https://discover.nci.nih.gov/cimminer/

References

1. Anwar, A. M., Soudy, M., and Mohamed, R. (2019). vhcub: Virus-host codon usage co-adaptation analysis. F1000Research 8:2137. doi: 10.12688/f1000research.21763.1
2. Barbhuiya, P. A., Uddin, A., and Chakraborty, S. (2020). Codon usage pattern and evolutionary forces of mitochondrial ND genes among orders of class Amphibia. J. Cell. Physiol. 236, 2850–2868. doi: 10.1002/jcp.30050
3. Burns, C. C., Shaw, J., Campagnoli, R., Jorba, J., Vincent, A., Quay, J., et al. (2006). Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region. J. Virol. 80, 3259–3272. doi: 10.1128/jvi.80.7.3259-3272.2006
4. Butt, A. M., Nasrullah, I., Qamar, R., and Tong, Y. (2016). Evolution of codon usage in Zika virus genomes is host and vector specific. Emer. Microb. Infect. 5, 1–14. doi: 10.1038/emi.2016.106
5. Cai, M.-S., Cheng, A.-C., Wang, M.-S., Zhao, L.-C., Zhu, D.-K., Luo, Q.-H., et al. (2009). Characterization of synonymous codon usage bias in the duck plague virus UL35 gene. Intervirology 52, 266–278. doi: 10.1159/000231992
6. Cao, H., Zhang, H., and Li, D. (2011). Synonymous codon usage bias of spike genes of porcine epidemic diarrhea virus. Afr. J. Microbiol. Res. 5, 3784–3789.
7. Charif, D., and Lobry, J. R. (2007). "SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis," in Structural Approaches to Sequence Evolution, eds U. Bastolla, M. Porto, H. E. Roman, and M. Vendruscolo (Berlin: Springer), 207–232. doi: 10.1007/978-3-540-35306-5_10
8. Chen, Y., Shi, Y., Deng, H., Gu, T., Xu, J., Ou, J., et al. (2014). Characterization of the porcine epidemic diarrhea virus codon usage bias. Infect. Genet. Evol. 28, 95–100. doi: 10.1016/j.meegid.2014.09.004
9. Cheng, S., Wu, H., and Chen, Z. (2020). Evolution of Transmissible Gastroenteritis Virus (TGEV): a codon usage perspective. Int. J. Mol. Sci. 21:7898.
10. Coleman, J. R., Papamichail, D., Skiena, S., Futcher, B., Wimmer, E., and Mueller, S. (2008). Virus attenuation by genome-scale changes in codon pair bias. Science 320, 1784–1787. doi: 10.1126/science.1155761
11. Comeron, J. M., and Aguadé, M. (1998). An evaluation of measures of synonymous codon usage bias. J. Mol. Evol. 47, 268–274. doi: 10.1007/pl00006384
12. Cristina, J., Moreno, P., Moratorio, G., and Musto, H. (2015). Genome-wide analysis of codon usage bias in Ebolavirus. Virus Res. 196, 87–93. doi: 10.1016/j.virusres.2014.11.005
13. D'andrea, L., Pérez-Rodríguez, F.-J., De Castellarnau, M., Guix, S., Ribes, E., Quer, J., et al. (2019). The critical role of codon composition on the translation efficiency robustness of the hepatitis a virus capsid. Genome Biol. Evol. 11, 2439–2456. doi: 10.1093/gbe/evz146
14. Deb, B., Uddin, A., and Chakraborty, S. (2021). Genome-wide analysis of codon usage pattern in herpesviruses and its relation to evolution. Virus Res. 292:198248. doi: 10.1016/j.virusres.2020.198248
15. Dorn, A., and Kippenberger, S. (2008). Clinical application of CpG-, non-CpG-, and antisense oligodeoxynucleotides as immunomodulators. Curr. Opin. Mol. Therap. 10, 10–20.
16. Ewens, W. J., and Grant, G. R. (2006). Statistical Methods in Bioinformatics: an Introduction, 2nd Edn. New York, NY: Springer Science & Business Media, 111–151.
17. Eyre-Walker, A. C. (1991). An analysis of codon usage in mammals: selection or mutation bias? J. Mol. Evol. 33, 442–449. doi: 10.1007/bf02103136
18. Fan, B., Jiao, D., Zhao, X., Pang, F., Xiao, Q., Yu, Z., et al. (2017). Characterization of Chinese porcine epidemic diarrhea virus with novel insertions and deletions in genome. Sci. Rep. 7:44209.
19. Gómez, M. M., Tort, L. F. L., De Mello Volotao, E., Recarey, R., Moratorio, G., Musto, H., et al. (2011). Analysis of human P[4]G2 rotavirus strains isolated in Brazil reveals codon usage bias and strong compositional constraints. Infect. Genet. Evol. 11, 580–586. doi: 10.1016/j.meegid.2011.01.006
20. Grantham, R., Gautier, C., Gouy, M., Mercier, R., and Pave, A. (1980). Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:197. doi: 10.1093/nar/8.1.197-c
21. Greenbaum, B. D., Levine, A. J., Bhanot, G., and Rabadan, R. (2008). Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog. 4:e1000079. doi: 10.1371/journal.ppat.1000079
22. He, Z., Dong, Z., Qin, L., and Gan, H. (2021). Phylodynamics and codon usage pattern analysis of broad bean wilt virus 2. Viruses 13:198. doi: 10.3390/v13020198
23. He, Z., Gan, H., and Liang, X. (2019). Analysis of synonymous codon usage bias in potato virus M and its adaption to hosts. Viruses 11:752. doi: 10.3390/v11080752
24. Hsueh, F. C., Lin, C. N., Chiou, H. Y., Chia, M. Y., Chiou, M. T., Haga, T., et al. (2020). Updated phylogenetic analysis of the spike gene and identification of a novel recombinant porcine epidemic diarrhoea virus strain in Taiwan. Transbound. Emer. Dis. 67, 417–430. doi: 10.1111/tbed.13365
25. Hu, J.-S., Wang, Q.-Q., Zhang, J., Chen, H.-T., Xu, Z.-W., Zhu, L., et al. (2011). The characteristic of codon usage pattern and its evolution of hepatitis C virus. Infect. Genet. Evol. 11, 2098–2102.
26. Huang, W., Guo, Y., Li, N., Feng, Y., and Xiao, L. (2021). Codon usage analysis of zoonotic coronaviruses reveals lower adaptation to humans by SARS-CoV-2. Infect. Genet. Evol. 89:104736. doi: 10.1016/j.meegid.2021.104736
27. Huang, Y. W., Dickerman, A. W., Pineyro, P., Li, L., Fang, L., Kiehne, R., et al. (2013). Origin, evolution, and genotyping of emergent porcine epidemic diarrhea virus strains in the United States. MBio 4:e00737-13.
28. Jenkins, G. M., and Holmes, E. C. (2003). The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res. 92, 1–7. doi: 10.1016/s0168-1702(02)00309-x
29. Jin, L., Gao, H., Cao, X., Han, S., Xu, L., Ma, Z., et al. (2020). Significance and roles of synonymous codon usage in the evolutionary process of Proteus. J. Basic Microbiol. 60, 424–434. doi: 10.1002/jobm.201900647
30. Jung, K., and Saif, L. J. (2015). Porcine epidemic diarrhea virus infection: Etiology, epidemiology, pathogenesis and immunoprophylaxis. Vet. J. 204, 134–143. doi: 10.1016/j.tvjl.2015.02.017
31. Jung, K., Saif, L. J., and Wang, Q. (2020). Porcine epidemic diarrhea virus (PEDV): An update on etiology, transmission, pathogenesis, and prevention and control. Virus Res. 286:198045. doi: 10.1016/j.virusres.2020.198045
32. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., Von Haeseler, A., and Jermiin, L. S. (2017). ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589. doi: 10.1038/nmeth.4285
33. Karlin, S., and Burge, C. (1995). Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 11, 283–290. doi: 10.1016/s0168-9525(00)89076-9
34
KarlinS.MrázekJ. (1996). What drives codon choices in human genes?J. Mol. Biol.262459–472. 10.1006/jmbi.1996.0528
35
KhandiaR.SinghalS.KumarU.AnsariA.TiwariR.DhamaK.et al (2019). Analysis of Nipah virus codon usage and adaptation to hosts.Front. Microbiol.10:886.
- Google Scholar
36
KumarN.BeraB. C.GreenbaumB. D.BhatiaS.SoodR.SelvarajP.et al (2016). Revelation of influencing factors in overall codon usage bias of equine influenza viruses.PLoS One11:e0154376. 10.1371/journal.pone.0154376
37
KumarN.KaushikR.TennakoonC.UverskyV. N.LonghiS.ZhangK. Y.et al (2021). Insights into the evolutionary forces that shape the codon usage in the viral genome segments encoding intrinsically disordered protein regions. Brief. Bioinform.22:bbab145. 10.1093/bib/bbab145
38
KumarN.KulkarniD. D.LeeB.KaushikR.BhatiaS.SoodR.et al (2018). Evolution of codon usage bias in Henipaviruses is governed by natural selection and is host-specific. Viruses10:604. 10.3390/v10110604
39
KunecD.OsterriederN. (2016). Codon pair bias is a direct consequence of dinucleotide bias.Cell Rep.1455–67. 10.1016/j.celrep.2015.12.011
40
Lara-RamírezE. E.SalazarM. I.López-LópezM. D. J.Salas-BenitoJ. S.Sánchez-VarelaA.GuoX. (2014). Large-scale genomic analysis of codon usage in dengue virus and evaluation of its phylogenetic dependence.BioMed Res. Int.2014:851425.
- Google Scholar
41
LauringA. S.AcevedoA.CooperS. B.AndinoR. (2012). Codon usage determines the mutational robustness, evolutionary capacity, and virulence of an RNA virus.Cell Host Microbe12623–632. 10.1016/j.chom.2012.10.008
42
LeeD. U.KwonT.SangH. J.YooS. J.SeoS. W.SunwooS. Y.et al (2016). Wild boars harboring porcine epidemic diarrhea virus (PEDV) may play an important role as a PEDV reservoir.Vet. Microbiol.19290–94. 10.1016/j.vetmic.2016.07.003
43
LetunicI.BorkP. (2021). Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation.Nucleic Acids Res.49W293–W296.
- Google Scholar
44
LiW.LiH.LiuY.PanY.DengF.SongY.et al (2012). New variants of porcine epidemic diarrhea virus, China, 2011.Emer. Infect. Dis.181350–1353. 10.3201/eid1808.120002
45
LloydA. T.SharpP. M. (1992). Evolution of codon usage patterns: the extent and nature of divergence between Candida albicans and Saccharomyces cerevisiae.Nucleic Acids Res.205289–5295. 10.1093/nar/20.20.5289
46
LuoW.RoyA.GuoF.IrwinD. M.ShenX.PanJ.et al (2020a). Host adaptation and evolutionary analysis of Zaire ebolavirus: insights from codon usage based investigations.Front. Microbiol.11:2823.
- Google Scholar
47
LuoW.TianL.GanY.ChenE.ShenX.PanJ.et al (2020b). The fit of codon usage of human-isolated avian influenza A viruses to human.Infect. Genet. Evol.81:104181. 10.1016/j.meegid.2020.104181
48
MachadoH. E.LawrieD. S.PetrovD. A. (2020). Pervasive strong selection at the level of codon usage bias in drosophila melanogaster.Genetics214511–528. 10.1534/genetics.119.302542
49
MacLeanO. A.LytrasS.WeaverS.SingerJ. B.BoniM. F.LemeyP.et al (2021). Natural selection in the evolution of SARS-CoV-2 in bats created a generalist virus and highly capable human pathogen.PLoS Biol.19:e3001115. 10.1371/journal.pbio.3001115
50
MartinA.BertranpetitJ.OliverJ.MedinaJ. (1989). Variation in G+ C-content and codon choice: differences among synonymous codon groups in vertebrate genes.Nucleic Acids Res.176181–6189. 10.1093/nar/17.15.6181
51
MinhB. Q.SchmidtH. A.ChernomorO.SchrempfD.WoodhamsM. D.Von HaeselerA.et al (2020). IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era.Mol. Biol. Evol.371530–1534. 10.1093/molbev/msaa015
Mordstein, C., Cano, L., Morales, A. C., Young, B., Ho, A. T., Rice, A. M., et al. (2021). Transcription, mRNA export and immune evasion shape the codon usage of viruses. Genome Biol. Evol. 13:evab106.
Mueller, S., Papamichail, D., Coleman, J. R., Skiena, S., and Wimmer, E. (2006). Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. J. Virol. 80, 9687–9696. doi: 10.1128/jvi.00738-06
Munjal, A., Khandia, R., Shende, K. K., and Das, J. (2020). Mycobacterium lepromatosis genome exhibits unusually high CpG dinucleotide content and selection is key force in shaping codon usage. Infect. Genet. Evol. 84:104399. doi: 10.1016/j.meegid.2020.104399
Nasrullah, I., Butt, A. M., Tahir, S., Idrees, M., and Tong, Y. (2015). Genomic analysis of codon usage shows influence of mutation pressure, natural selection, and host features on Marburg virus evolution. BMC Evol. Biol. 15:174.
Nguyen, T. H., Wang, D., Rahman, S. U., Bai, H., Yao, X., Chen, D., et al. (2021). Analysis of codon usage patterns and influencing factors in rice tungro bacilliform virus. Infect. Genet. Evol. 90:104750. doi: 10.1016/j.meegid.2021.104750
Pensaert, M. B., and de Bouck, P. (1978). A new coronavirus-like particle associated with diarrhea in swine. Arch. Virol. 58, 243–247. doi: 10.1007/bf01317606
Roy, A., Guo, F., Singh, B., Gupta, S., Paul, K., Chen, X., et al. (2021). Base composition and host adaptation of the SARS-CoV-2: insight from the codon usage perspective. Front. Microbiol. 12:548275.
Sharp, P. M., and Li, W.-H. (1986b). An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38. doi: 10.1007/bf02099948
Sharp, P. M., and Li, W.-H. (1986a). Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucleic Acids Res. 14, 7737–7749. doi: 10.1093/nar/14.19.7737
Sharp, P. M., Tuohy, T. M. F., and Mosurski, K. R. (1986). Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143. doi: 10.1093/nar/14.13.5125
Si, F., Hu, X., Wang, C., Chen, B., Wang, R., Dong, S., et al. (2020). Porcine Epidemic Diarrhea Virus (PEDV) ORF3 enhances viral proliferation by inhibiting apoptosis of infected cells. Viruses 12:214. doi: 10.3390/v12020214
Spielman, S. J., and Wilke, C. O. (2015). Pyvolve: a flexible python module for simulating sequences along phylogenies. PLoS One 10:e0139047. doi: 10.1371/journal.pone.0139047
Sueoka, N. (1988). Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U.S.A. 85, 2653–2657. doi: 10.1073/pnas.85.8.2653
Sueoka, N. (1995). Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. 40, 318–325. doi: 10.1007/bf00163236
Tian, L., Shen, X., Murphy, R. W., and Shen, Y. (2018). The adaptation of codon usage of +ssRNA viruses to their hosts. Infect. Genet. Evol. 63, 175–179. doi: 10.1016/j.meegid.2018.05.034
Turlewicz-Podbielska, H., and Pomorska-Mól, M. (2021). Porcine coronaviruses: overview of the state of the art. Virol. Sin. 1–19. doi: 10.1007/s12250-021-00364-0
van Hemert, F., Van Der Kuyl, A. C., and Berkhout, B. (2016). Impact of the biased nucleotide composition of viral RNA genomes on RNA structure and codon usage. J. Gen. Virol. 97, 2608–2619. doi: 10.1099/jgv.0.000579
Vetsigian, K., and Goldenfeld, N. (2009). Genome rhetoric and the emergence of compositional bias. Proc. Natl. Acad. Sci. U.S.A. 106, 215–220. doi: 10.1073/pnas.0810122106
Wang, H., Liu, S., Zhang, B., and Wei, W. (2016). Analysis of synonymous codon usage bias of Zika virus and its adaption to the hosts. PLoS One 11:e0166260. doi: 10.1371/journal.pone.0166260
Wang, H., Zhang, L., Shang, Y., Tan, R., Ji, M., Yue, X., et al. (2020). Emergence and evolution of highly pathogenic porcine epidemic diarrhea virus by natural recombination of a low pathogenic vaccine isolate and a highly pathogenic strain in the spike gene. Virus Evol. 6:veaa049.
Wang, K., Lu, W., Chen, J., Xie, S., Shi, H., Hsu, H., et al. (2012). PEDV ORF3 encodes an ion channel protein and regulates virus production. FEBS Lett. 586, 384–391. doi: 10.1016/j.febslet.2012.01.005
Wang, K., Xie, S., and Sun, B. (2011). Viral proteins function as ion channels. Biochim. Biophys. Acta (BBA) Biomembranes 1808, 510–515. doi: 10.1016/j.bbamem.2010.05.006
Wang, X.-W., Wang, M., Zhan, J., Liu, Q.-Y., Fang, L.-L., Zhao, C.-Y., et al. (2020). Pathogenicity and immunogenicity of a new strain of porcine epidemic diarrhea virus containing a novel deletion in the N gene. Vet. Microbiol. 240:108511. doi: 10.1016/j.vetmic.2019.108511
Wong, E. H., Smith, D. K., Rabadan, R., Peiris, M., and Poon, L. L. (2010). Codon usage bias and the evolution of influenza A viruses. Codon usage biases of influenza virus. BMC Evol. Biol. 10:253.
Wright, F. (1990). The ‘effective number of codons’ used in a gene. Gene 87, 23–29. doi: 10.1016/0378-1119(90)90491-9
Wu, Y., Zhao, D., and Tao, J. (2015). Analysis of codon usage patterns in herbaceous peony (Paeonia lactiflora Pall.) based on transcriptome data. Genes 6, 1125–1139. doi: 10.3390/genes6041125
Xu, X., Li, P., Zhang, Y., Wang, X., Xu, J., Wu, X., et al. (2019). Comprehensive analysis of synonymous codon usage patterns in orf3 gene of porcine epidemic diarrhea virus in China. Res. Vet. Sci. 127, 42–46. doi: 10.1016/j.rvsc.2019.09.012
Ye, S., Li, Z., Chen, F., Li, W., Guo, X., Hu, H., et al. (2015). Porcine epidemic diarrhea virus ORF3 gene prolongs S-phase, facilitates formation of vesicles and promotes the proliferation of attenuated PEDV. Virus Genes 51, 385–392. doi: 10.1007/s11262-015-1257-y
Yu, X., Liu, J., Li, H., Liu, B., Zhao, B., and Ning, Z. (2021b). Comprehensive analysis of synonymous codon usage patterns and influencing factors of porcine epidemic diarrhea virus. Arch. Virol. 166, 157–165. doi: 10.1007/s00705-020-04857-3
Yu, X., Liu, J., Li, H., Liu, B., Zhao, B., and Ning, Z. (2021a). Comprehensive analysis of synonymous codon usage bias for complete genomes and E2 gene of atypical porcine pestivirus. Biochem. Genet. 59, 799–812. doi: 10.1007/s10528-021-10037-y
Zang, M., He, W., Du, F., Wu, G., Wu, B., and Zhou, Z. (2017). Analysis of the codon usage of the ORF2 gene of feline calicivirus. Infect. Genet. Evol. 54, 54–59. doi: 10.1016/j.meegid.2017.06.013
Zhang, L.-P., Cai, Y.-Y., Yu, D.-N., Storey, K. B., and Zhang, J.-Y. (2018). Gene characteristics of the complete mitochondrial genomes of Paratoxodera polyacantha and Toxodera hauseri (Mantodea: Toxoderidae). PeerJ 6:e4595. doi: 10.7717/peerj.4595
Zhou, J.-H., Li, X.-R., Lan, X., Han, S.-Y., Wang, Y.-N., Hu, Y., et al. (2019). The genetic divergences of codon usage shed new lights on transmission of hepatitis E virus from swine to human. Infect. Genet. Evol. 68, 23–29. doi: 10.1016/j.meegid.2018.11.024
Zhou, J.-H., Zhang, J., Sun, D.-J., Ma, Q., Chen, H.-T., Ma, L.-N., et al. (2013). The distribution of synonymous codon choice in the translation initiation region of dengue virus. PLoS One 8:e77239.
Zhou, Z., Dang, Y., Zhou, M., Li, L., Yu, C.-H., Fu, J., et al. (2016). Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. U.S.A. 113, E6117–E6125.

Summary

Keywords

coronavirus, PEDV, codon usage, selection pressure, host adaptation, viral evolution

Citation

Si F, Jiang L, Yu R, Wei W and Li Z (2021) Study on the Characteristic Codon Usage Pattern in Porcine Epidemic Diarrhea Virus Genomes and Its Host Adaptation Phenotype. Front. Microbiol. 12:738082. doi: 10.3389/fmicb.2021.738082

Received

08 July 2021

Accepted

20 September 2021

Published

18 October 2021

Volume

12 - 2021

Edited by

Robert Czajkowski, University of Gdańsk, Poland

Reviewed by

Naveen Kumar, ICAR-National Institute of High Security Animal Diseases (ICAR-NIHSAD), India; Siddiq Ur Rahman, Khushal Khan Khattak University, Pakistan; Arif Uddin, Assam University, India

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Fusheng Si, mr.fusheng@163.com; Wenqiang Wei, weiwq168@163.com

†These authors have contributed equally to this work

This article was submitted to Virology, a section of the journal Frontiers in Microbiology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

