Strong Selectional Forces Fine-Tune CpG Content in Genes Involved in Neurological Disorders as Revealed by Codon Usage Patterns

Khandia, Rekha; Sharma, Anushri; Alqahtani, Taha; Alqahtani, Ali M.; Asiri, Yahya I.; Alqahtani, Saud; Alharbi, Ahmed M.; Kamal, Mohammad Amjad

doi:10.3389/fnins.2022.887929

ORIGINAL RESEARCH article

Front. Neurosci., 10 June 2022

Sec. Translational Neuroscience

Volume 16 - 2022 | https://doi.org/10.3389/fnins.2022.887929

This article is part of the Research TopicTranslational Neuroscience and Reverse Translational Neuroscience: What’s Giving Us Hope?View all 6 articles

Strong Selectional Forces Fine-Tune CpG Content in Genes Involved in Neurological Disorders as Revealed by Codon Usage Patterns

Yahya I. Asiri²

Saud Alqahtani²

Ahmed M. Alharbi³

Mohammad Amjad Kamal^4,5,6,7

¹Department of Biochemistry and Genetics, Barkatullah University, Bhopal, India
²Department of Pharmacology, College of Pharmacy, King Khalid University, Abha, Saudi Arabia
³Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
⁴Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
⁵King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
⁶Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
⁷Enzymoics, Novel Global Community Educational Foundation, Hebersham, NSW, Australia

Neurodegenerative disorders cause irreversible damage to the neurons and adversely affect the quality of life. Protein misfolding and their aggregation in specific parts of the brain, mitochondrial dysfunction, calcium load, proteolytic stress, and oxidative stress are among the causes of neurodegenerative disorders. In addition, altered metabolism has been associated with neurodegeneration as evidenced by reductions in glutamine and alanine in transient global amnesia patients, higher homocysteine-cysteine disulfide, and lower methionine decline in serum urea have been observed in Alzheimer’s disease patients. Neurodegeneration thus appears to be a culmination of altered metabolism. The study’s objective is to analyze various attributes like composition, physical properties of the protein, and factors like selectional and mutational forces, influencing codon usage preferences in a panel of genes involved directly or indirectly in metabolism and contributing to neurodegeneration. Various parameters, including gene composition, dinucleotide analysis, Relative synonymous codon usage (RSCU), Codon adaptation index (CAI), neutrality and parity plots, and different protein indices, were computed and analyzed to determine the codon usage pattern and factors affecting it. The correlation of intrinsic protein properties such as the grand average of hydropathicity index (GRAVY), isoelectric point, hydrophobicity, and acidic, basic, and neutral amino acid content has been found to influence codon usage. In genes up to 800 amino acids long, the GC3 content was highly variable, while GC12 content was relatively constant. An optimum CpG content is present in genes to maintain a high expression level as required for genes involved in metabolism. Also observed was a low codon usage bias with a higher protein expression level. Compositional parameters and nucleotides at the second position of codons played essential roles in explaining the extent of bias. Overall analysis indicated that the dominance of selection pressure and compositional constraints and mutational forces shape codon usage.

Introduction

Neurodegenerative disorders are incurable and debilitating pathological conditions resulting in progressive degeneration and possible death of nerve cells. Such diseases pose a major threat to human health due to deterioration in the quality of life and premature mortality. Economic impacts also have been associated with long-term in-home caregiving. Many neurodegenerative disorders have shown an association with misfolding of proteins and their aggregation in specific brain regions (Soto, 2003). The most common neurodegenerative disorders associated with misfolded protein aggregation are Alzheimer’s disease (AD) and Parkinson’s disease (PD). Multiple lines of evidence have connected the link of Aβ and tau in AD and α-syn proteins in PD. However, it is still unclear whether the presence of abnormal proteins is the consequence of disease or its cause (Bourdenx et al., 2017). The presence of misfolded proteins and their aggregation might be attributed to the genetic mutations in genes related to the disease. Other shared pathologies between neurodegenerative diseases are mitochondrial dysfunction, glutamate toxicity, calcium load, proteolytic stress, and oxidative stress (Muddapu et al., 2020). The changes in disease-specific proteins are associated with enhanced oxidative stress, initiation of inflammatory processes, and neuronal damage (Ballard et al., 2011).

A comparative study of genome-wide gene expression data of 93 brain tissue samples obtained from patients with AD, PD, Huntington’s disease (HD), acute myeloid leukemia (AML), and multiple sclerosis revealed a high number of dysregulated genes is associated with each disorder. Still, no gene was shared across all conditions (Durrenberger et al., 2015). This finding indicates that no single shared mechanism is involved in neurodegenerative disorders. However, the results of Durrenberger et al. (2015) did not include an assessment of protein expression and post-translational modifications, which may result in misleading conclusions.

In neurodegenerative diseases, specific neuronal clusters have been found more likely to serve as the primary site for the spread of neuronal pathology (Fu et al., 2018). This vulnerable population exhibits specific morphological features, including long-range neuronal projections and extensive synaptic connections, making them selectively vulnerable due to the higher metabolic requirements for structural integrity maintenance (Pacelli et al., 2015). Muddapu et al. (2020) proposed that the pathological markers of neurodegenerative diseases, including protein misfolding, oxidative stress, and mitochondrial dysfunction, are the direct consequences of metabolic anomalies. For instance, insulin plays a role in cholesterol metabolism essential to myelination and the regulation of amyloid protein degrading enzymes (Wang et al., 2014). Insulin resistance causes an imbalance of glucose metabolism and results in hyperglycemia and oxidative stress, leading to inflammatory response and neuronal damage. Alerted levels of amino acid in the brain and serum of AD patients have been documented. Since glutamate and its metabolite gamma-aminobutyric acid (GABA) are excitatory and inhibitory neurotransmitters, respectively, we can speculate that the alterations in glutamate may adversely affect neural functioning (Esposito et al., 2013). Glutamine and alanine levels are also reduced in the blood of patients with transient global amnesia (Sancesario et al., 2013). Higher homocysteine-cysteine disulfide and lower methionine levels have been documented in the serum of AD patients. In the normal human brain, the activity of the enzyme ornithine transcarbamoylase is very low, thereby preventing the urea cycle (Bensemain et al., 2009). AD patients experience a 44% decline of urea in serum (González-Domínguez et al., 2015). All this evidence suggests the central role of metabolism malfunctioning in neurodegenerative disorders. After observing a potential connection between metabolic disturbances and neurodegeneration, we were tempted to study those metabolism-associated genes that contribute to neurodegeneration if malfunctioning. In case of clinical features associated with neurological consequences like cerebral edema, cerebellar ataxia, coma, seizures, stroke and intellectual disability along with hyperammonemia, protein avoidance, low plasma citrulline and hypoargininemia, commercially, the genetic diagnosis is available, and information regarding the genes those are involved may be obtained. Therefore, we used the information available through commercial sources and assessed a panel of 60 genes associated with neurodegeneration that are directly or indirectly involved in the metabolism or transport of metabolites in brain cells. These genes are associated with several neurodegeneration symptoms, including neurocognitive deficiencies, attention-deficit/hyperactivity disorder, developmental delays, seizures, learning disabilities, lethargy, somnolence, refusal to feed, vomiting, tachypnea, respiratory alkalosis, fatal neonatal encephalopathy with hypotonia, and many others.

All proteins are made up of amino acids, and 61 codons encode 20 amino acids. All amino acids are coded by two or more synonymous codons, excluding methionine and tryptophan. This usage of synonymous codons is not equal, and often, some of the synonymous codons are used preferably over others. This phenomenon is called codon usage bias (CUB), which is attributed to various factors, including overall compositional constraints (Deka and Chakraborty, 2014), selectional or mutational forces (Hershberg and Petrov, 2008), gene expression levels (Zhou et al., 2016), and the tRNA pool (Quax et al., 2015). The gene expression is affected by codon usage choice. The genes with higher expression levels exhibit a higher codon adaptation index (CAI), and the most abundant proteins have higher CAI values (Henry and Sharp, 2007).

Bioinformatics and biomedical research have permitted an expanded understanding of the pathobiology of neurodegenerative disorders. Thus far, little research has been conducted on the genes involved in neurodegeneration from the metabolism perspective. In the present study, the codon usage pattern of 60 relevant genes is studied to elucidate various forces (such as mutational, selectional, or compositional) acting upon them. In the present study, we calculated various indices, including parity and nucleotide skews, to determine the compositional disproportion. Neutrality, parity, ENc-GC3 curve and regression analysis between nucleotide compositions were carried out to reveal the impact of evolutionary forces. In addition, the CAI and relative synonymous codon usage (RSCU) were determined to evaluate the codon preferences. Various statistical methods have been employed to see the association between various molecular features. The analyses helped determine various molecular signatures, evolutionary forces acting on genes and codon usage patterns related to the genes involved in metabolism and neurodegeneration. The results of this study will provide insight into the factors affecting codon choices along with the expression level information of these genes.

Materials and Methods

Data Collection

The genes analyzed for neurodegenerative disorders were obtained from the NCBI Genetic Testing Registry NGS Neurodegenerative disorders Multi-Gene Panel. For neurodegenerative symptoms with evidence of disturbed metabolism (cerebral edema, cerebellar ataxia, coma, seizures, stroke, and intellectual disability along with hyperammonemia, protein avoidance, low plasma citrulline and hypoargininemia), next-generation sequencing is recommended. Laboratory of genome diagnostics, LGD-AUMC Academic Medical Center, University of Amsterdam, offers NGS for a Multi-Gene Panel for diagnostic purposes. The gene panel offered by them was taken in the present study. A total of 183 transcripts belonging to 60 genes were studied, shown in Table 1 with the function of each gene and the number of its transcripts utilized. To be utilized in the study, a transcript/coding sequence (CDS) must be in a reading frame and contain no nucleotides other than A, T, G, or C (for example, R, Y, or B representing A/G, C/T, and C/G/T respectively). In addition, the sequences were devoid of UAA, UAG, or UGA stop codons within sequences. A total of 264096 nucleotides and 88032 codons were studied.

TABLE 1

Table 1. The list of genes involved in neurodegenerative disorders with the location on the human chromosome, the disease involved, and the number of transcripts.

Nucleobase Compositional Analysis

The nucleobase composition was calculated for all the 183 CDSs. The number of A, T, C, G nucleotides present, the % of the nucleotides, and % composition at the first, second, and third codon position (A1, T1, C1, G1, A2, T2, C2, G2, A3, T3, C3, G3) were determined. Total AT% and GC%, along with AT3% and GC3%, were calculated. Calculations of %GC at the first and second place (GC12) and GC% content at the third place were also included. The %GC at different codon positions (%GC1,%GC2, and %GC3) helps decipher the relationship between the codon usage and compositional, selectional, and mutational forces (Mazumder et al., 2014). The above calculations were performed using informatics software developed by Puigbò et al. (2008) and available at http://genomes.urv.es/CAIcal/ (Supplementary Table 1).

Dinucleotide Abundance

Sixteen dinucleotides obtained from combining four nucleotides were subjected to odds ratio analysis. The odds ratios, the results of dividing the observed frequencies by expected frequencies, were calculated and presented in Table 2. The calculations were performed using DNASTAR Lasergene Inc.¹ software. The dinucleotides with an odds ratio less than 0.78 are considered underrepresented, and greater than 1.25 are considered overrepresented (Kunec and Osterrieder, 2016).

TABLE 2

Table 2. Dinucleotide analysis showing odds ratio of genes.

Relative Synonymous Codon Usage Analysis

RSCU, an index representing codon bias, is the ratio of the observed to the expected frequency of a codon coding for a particular amino acid among all synonymous codons (Deb et al., 2021). The RSCU values were obtained using informatics software developed by Puigbò et al. (2008) and available at http://genomes.urv.es/CAIcal/ (Puigbò et al., 2008). The length of the gene or amino acid composition does not affect RSCU values. Values above 1.6 and below 0.6 are considered overrepresented and underrepresented codons, respectively (Yu et al., 2021).

The Parity Rule 2 Plot Analysis

Parity analysis shows the bias between AT and GC at the 3rd codon position. AT bias (A3%/[A3% + T3%]) and GC bias (G3%/[G3% + C3%]) are plotted on the ordinate and abscissa, respectively. Under ideal conditions, as per the rule of parity, in a strand of DNA, A = T and G = C provided there is no bias among mutation and selection (Sueoka, 1995). Therefore, at the center of the plot, where the value is 0.5, the selection and mutational forces are equal (Sueoka, 1999).

Neutral Evolution Analysis

Neutrality plots are useful in quantifying mutational and other forces such as selectional forces (Khandia et al., 2019). Neutrality is derived by plotting %GC12 vs.%GC3. Here, the regression coefficient is considered the point of equilibrium between mutation and selection pressure (Sueoka, 1988). A slope value approximating 1 shows the dominance of mutational forces (Yu et al., 2021).

Codon Adaptation Index

The CAI value expresses the adaptability and expression of any gene within the organism (Munjal et al., 2020). The values of CAI range between 0 and 1. Values approaching 1 indicate the gene has higher expressivity, while CAI values near zero represent lower expressivity (Sharp and Li, 1987). The calculations were performed using informatics software developed by Puigbò et al. (2008) and available at http://genomes.urv.es/CAIcal/. The codon usage table for Homo sapiens was obtained from the codon usage database² encompassing 93487 coding sequences belonging to 40662582 codons.

Intrinsic Codon Bias Index

The intrinsic codon bias index (ICDI) is an analogous index to the Nc value and is independent of optimal codons. The values of ICDI span between 0 and 1. A value of 1 indicates extremely high bias, while a value 0 indicates equal usage of codons. Values lower than 0.3 indicate comparatively low bias (Freire-Picos et al., 1994). The ICDI values were obtained using the formula provided by Freire-Picos et al. (1994).

Nc Determination and Plotting Nc-GC3% Curve

The Nc-GC3 curve was plotted to evaluate the role of compositional constraints. Nc values, which reveal bias in codon usage, were determined using the CodonW 1.4.4 program. The lowest and the highest Nc values are 20 and 61, respectively (Munjal et al., 2020). An Nc value of 61 results from a situation when all the codons are used equally for coding amino acids, so no bias is detected. In contrast, a value of 20 results from only one codon being used among various synonymous codons, indicating the highest bias. In general, an Nc value less than 35 indicates a higher codon preference, and greater than 50 indicates random codon usage (Wang et al., 2018).

Principal Component Analysis

In principal component analysis (PCA), a multivariate statistical approach in codon usage analysis, two major axes (axis 1 and axis 2) represent the two components contributing the most variation to the data. In the PCA, the RSCU values of each sequence were distributed into a 59-dimensional vector corresponding to the 59 synonymous codons. The stop, initiation (AUG), and tryptophan (UGG) codons were excluded.

Protein Indices

Indices related to proteins were calculated using appropriate formulas available in the literature or software programs. The hydropathicity index GRAVY, with combined features of hydrophobicity or hydrophilicity, was calculated using the formula of Kyte and Doolittle (1982). Its value ranges between –2 and +2 for most proteins, with positive values indicating more hydrophobic proteins and vice versa. The AROMA value represents the distribution of aromatic amino acids (tryptophan, tyrosine, and phenylalanine) in the protein. Variation in these indices indicates the selection pressures. The PI or isoelectric point of any protein is the pH at which a protein has no net electrical charge. Hydrophobicity values were also determined as they have a role in determining protein-protein interactions (Young et al., 1994). This notion is further strengthened because few of the most extensive hydrophobic surfaces are found within the membrane proteins (Hagemans et al., 2015). The aliphatic index measures the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) and was calculated by the formula given by Ikai (1980). The aliphatic index is positively correlated with the thermal stability of the protein. The % acidic, basic, and neutral amino acids were also calculated. All the indices were calculated using ExpasyProtparam tool (Gasteiger et al., 2005) or Peptide 2.0 tool available at https://www.peptide2.com/.

Statistical Analysis

Correlation, regression, and correspondence analyses were performed and plotted in PAST4 software. Basic calculations, including sums and averages, were done in Microsoft Windows Excel.

Results

Compositional Analysis

The genes involved in neurodegeneration exhibited a widely variable compositional pattern. The % nucleotide composition of all four nucleotides is given in Figure 1. The %A ranged from 13.74% (IDUA) to 31.12% (DBT), %C ranged from 16.62 (DLD) to 38.98% (IDUA),%T ranged between 14.80% (IDUA) to 33.92%, and %G ranged between 19.57% (LMBRD1) and 33.91% (CBS). The nucleotide component C demonstrated the greatest range of variability (22.35%), while nucleotide G had the least (14.34%). Stem graph (Figure 1) showing the overall% of all the 4 nucleotides in different genes. Overall, %GC varied between 36.93% (LMBRD1) and 71.45% (CBS), while GC3 varied between 26.82% (LMBRD1) and 89.14% (CBS).

FIGURE 1

Figure 1. Stem graph for nucleotide composition of genes envisaged in present study.

GC Content Correlation With Protein Length

GC components (%GC12 and %GC3) were analyzed to determine their relationships with protein length. It was observed that, on average, %GC12 content was relatively constant and contributed to 40%-60% composition, while %GC3 content largely fluctuated towards both low and high values along the length of protein (Figure 2).

FIGURE 2

Figure 2. Relation of GC content GC12 and GC3 with the length of the protein. Red dots are GC12 composition and blue dots are GC3 position.

Relationship of Compositional Properties and Codon Bias

A significant positive correlation was observed between Nc and different nucleotides. C1, A2, and G3 (r = –0.197, p < 0.05, r = –0.243 p < 0.01, r = –0.198 p < 0.01 respectively) were negatively correlated with Nc, while T, T1, T2, and C2 were positively correlated (r = 0.0.165, p < 0.05, r = 0.156 p < 0.05, r = 0.237, p < 0.01, r = 0.286 p < 0.001 respectively).

Relationship Between Codon Usage Bias and Nucleotides at the Third Codon Position

Regression coefficients 0.547, 0.555, –0.515, and –0.482 were obtained for Nc-A3, Nc-T3, Nc-G3, and Nc-C3, respectively. The negative regression coefficient between Nc-G3 and Nc-C3 suggests a positive influence of C3 and G3 on CUB (Figure 3).

FIGURE 3

Figure 3. Regression analysis between the Nc and third position of codon.

Relationship Between Overall Composition and Composition at the Third Position of the Codon

Regression analysis between the overall nucleotide content (%A, %T, %C, %G) and their respective 3rd position of the codon shows the effects of compositional properties on mutational force (Figure 4). The nucleotides C and A contributed almost equally, 49.3 and 48.7%, while G contributed the least (37.6%) and toward mutational pressure.

FIGURE 4

Figure 4. Regression analysis for overall nucleotide content and nucleotide content at third position.

Effects of Dinucleotide Content

The frequency of dinucleotide occurrence is of great interest since some of the dinucleotide combinations significantly deviate from the expected value. The measure of this deviation is calculated as an odds ratio (frequency observed/expected). The CpG and TpA dinucleotide combinations are commonly underrepresented dinucleotides (Venter et al., 2001). Notably, we found that 23.33% of genes displayed unbiased TpA usage with an odds ratio of more than 0.78. In fact, in the IUDA gene, TpA was overrepresented (odds ratio 1.54). Nucleotide composition is also reported to play an important role in deciding the TpA or CpG content (Munjal et al., 2020); however, we report an exception to this as, despite lower GC content, the CpG dinucleotide exhibited its presence unbiased in GCDH and MAN2B1 genes. Similarly, despite low AT content, the TpA dinucleotide was present in an unbiased manner LMBRD1, MMAA, MOCS2 genes.

Relative Synonymous Codon Usage Analysis

The RSCU value indicates the relative frequency of the codon. The codons with RSCU values above 1.6 and below 0.6 are overrepresented and underrepresented, respectively (Paul et al., 2018). Across the genes, GC ending codons had an RSCU value below 0.6 (TCG, CCG, ACG, and GCG codons are unrepresented with RSCU values below 0.6 in 81.66, 70, 68, and 72% of sequences, respectively). TA ending codons TTA, CTA, ATA, and GTA were also underrepresented with RSCU values below 0.6 in 71.67, 76.67, 71.67, and 68.33% of sequences, respectively. Among all codons, CTG and GTG had maximum RSCU values (average of 2.55 and 2.01, respectively) with the highest RSCU values of 4.73 and 3.27 for CTG and GTG, respectively. Table 3 lists the RSCU values of the genes analyzed in this study, highlighting the codons with the highest RSCUs for each amino acid.

TABLE 3

Table 3. RSCU values of genes highlighting the codons with highest RSCU values.

Analysis of Rare Codons

The codons with a frequency of occurrence below 1% were considered rare codons. Codons TTA, CTA, ATA, GTA, TCG, CCG, ACG, GCG, CGT, CGC, CGA, and TGT were rare codons (Figure 5). The codon usage is tissue-specific and in the brain, the codon usage might be different from other tissues. We adapted per million frequencies of codon bias and per million frequencies of the codon counts multiplied by the respective expression; the value would be called codonome bias hereafter in the human brain dataset from the works of Piovesan et al. (2013). A very high statistically significant positive correlation (r = 0.665, p < 0.001) between codon bias for genes envisaged and codonome bias for brain-specific genes was observed, revealing the presence of tissue and cell-specific selective pressure on gene-related to neurodegeneration with metabolic consequences in present study. Changes in the expression profile of isoacceptor tRNAs might result from adaptation and selection to changes in transcriptome codon usage (Dittmar et al., 2006). Also, tRNA-Arg-TCT enrichment is present and it is suggestive of a tissue-specific role of tRNA in translation (Torres et al., 2019).

FIGURE 5

Figure 5. Rare codons for the neurodegeneration associated gene transcripts. The “rare codon” was defined by calculating the frequency of occurrence of all codons in coding sequences (threshold selected <1% viz. less than 10 in 1,000).

The Parity Rule 2 Plot Analysis

As per the parity rule, in the absence of mutational force or selection force on a gene’s codon usage, the base content follows Chargaff’s rule: A = T and G = C. The A3%, T3%, C3%, and G3% (nucleotide content at the third position of the codon) were calculated to determine the A3/(A3 + U3) serving as AT bias and G3/(G3 + C3) serving as GC bias. When A3/(A3 + U3) is plotted against G3/(G3 + C3) on the abscissa and the ordinate, respectively, if the values are 0.5, then all the data will be located in the center (Young et al., 1994). However, the codon usage is generally governed by either of these or both the selection and mutation pressure and other forces such as compositional pressure. In the present study, the results (Figure 6) show that the average position of x = 0.449 ± 0.152 (AT bias) and y = 0.511 ± 0.054 (GC bias). Hence T is preferred over A, and G is preferred over C.

FIGURE 6

Figure 6. Parity plot generated using A3/(A3 + U3) as abscissa and G3/(G3 + C3) as the ordinate. The plot exhibit the preference of T over A and G over C.

Degree of Codon Bias

The Nc value determines the degree of bias. The higher the Nc, the lower the bias. A gene with an Nc value less than 35 generally has a strong codon bias (Sheikh et al., 2020), while the gene with an Nc value of greater than 50 has a random choice of codon, indicating the least codon bias. Nc values between 35 and 50 demonstrate moderate bias (Wang et al., 2018). In the present study, the Nc value ranged between 33.9 and 59.9, indicating that the genes exhibited a wider range of codon bias. An Nc < 35 was observed for 2.17% of transcripts, representing a very high bias. Furthermore, 40.21% had moderate bias (Nc 35-50), and 56.60% of transcripts displayed low bias (Nc > 50).

The Effect of Compositional Constraint in Shaping Codon Usage

The Nc-GC3 curve was plotted to elucidate further the effects of mutational, selectional, or compositional constraints on codon usage. If the codon usage was solely driven by %GC content present at the third position of the codon, then all the data points will lie on the GC3 curve. If a gene is subjected to translational selection, the data points will lie well below the expected curve (Sablok et al., 2011). In the present case, all low Nc points were well below the curve, suggesting forces other than compositional constraints like selectional forces affected codon usage (Figure 7).

FIGURE 7

Figure 7. The relationship between compositional constraint and codon usage bias. ENc-GC3 curve indicates the presence of selection and mutational forces on codon bias of genes. Data points far from the standard curve indicate action of forces other than compositional constraints acting on codon usage choices.

Neutrality Analysis for Quantitation of Mutation and Selection Pressure

Although the Nc-GC3 plot can demonstrate the key factors responsible for shaping codon bias, it cannot quantitate the directional pressure and selection pressure. For the same neutrality plot is required, constructed using the %GC3 and %GC12 content of genes. The GC3 content of the genes varied from 27.7 to 89.6%, while GC1 and GC2 varied from 41.7–89% to 30.1–76.8%, respectively. The linear regression model of GC12% on GC3% indicated (GC12%) = 0.1732 (GC3%) + 39.18 with R² = 0.315. It suggests that a 31.5% variance in GC12 is introduced by GC3. The regression coefficient is 17.25%, indicating that mutational forces contribute 17.32%, while selection and other factors contribute 82.68%. The results indicate that selection pressures are dominant over mutational forces (Figure 8).

FIGURE 8

Figure 8. Neutrality plot analysis revealed 17.25% mutational forces and 82.68% selection forces acting on 60 genes related to neurodegenerative disorders.

Effect of Mutational Forces on Gene Expression

The CAI is an index for gene expressivity. The higher the CAI value, the greater the expressivity of the gene (Kumar et al., 2021). CAI and %A3, %T3, %C3, and %G3 were analyzed by regression analysis to determine the effect of mutational forces on gene expression. The results indicate that for all four nucleotides, an almost straight line is obtained as a regression curve (regression coefficients of –0.0047, –0.0045, 0.004, and 0.004 for A3, T3, C3, and G3, respectively). These results indicate that substantially fewer mutational forces were at work, and selectional forces majorly influence gene expression.

Intrinsic Codon Bias Index

An ICDI estimates codon bias where optimal codons are unknown (Rodríguez-Belmonte et al., 1996). The average ICDI value was 0.182, and vales were well correlated with Nc values. According to Freire-Picos et al. (1994), if the ICDI value is greater than 0.5, there is a high level of bias. We found generally low levels of bias. The lowest ICDI value was 0.05 for 5-methyltetrahydrofolate-homocysteine methyltransferase (MTR) transcript variant 2, while the highest was 0.501 for solute carrier family 6 member 19 (SLC6A19). However, the average ICDI was 0.182 ± 0.008, indicating a generally low bias.

The Interrelationship Between Codon Usage Bias and Properties of Protein

Protein indices including protein length, GRAVY, AROMO, PI, hydrophobicity index, aliphatic index, instability index, and percent acidic, basic, and neutral amino acids were estimated and subjected to correlation analysis (Barbhuiya et al., 2020). The correlation analysis between the codon usage bias and various protein indices revealed that Nc was positively associated with PI (r = 0.176; p < 0.05), acidic (r = 0.216; p < 0.01), and basic (r = 0.347; p < 0.0001) protein residues. The Nc was negatively correlated with GRAVY (r = –0.173; p < 0.05), hydropathicity (r = –0.220; p < 0.01) and neutral amino acids (r = 0.146; p < 0.05). Nc and protein length did not correlate to codon usage bias.

Principal Component Analysis

The biplot arrows indicate the preferred codons from each sequence. The farthest vector (codon) has the maximum effect on PC. The eclipse enclosed the sequences with 95% confidence based on PCA on biplots (Figure 9). The dots are the PCA score of the sequences. PCBD1 and PTS genes were present outside of 95% confidence eclipse. The scree plot revealed that PC1 captured 51.56% of the variation, and PC2 captured 6.13%.

FIGURE 9

Figure 9. A biplot depiction of PCA. Dots are representing the sequences, whilst arrows are showing codons. Eclipse showing a 95% confidence limit.

Discussion

In the present study, the codon usage pattern was analyzed for 183 transcripts belonging to 60 genes involved in neurodegeneration with metabolic disturbances. Initial overall nucleotide composition analysis revealed a random pattern for A, T, C, and G nucleotide usage in the sequences. Variation was observed at total nucleotide composition as well as GC composition. The nucleotide component C had a maximum range of variability of 22.35%, while nucleotide G had the least (14.34%). The GC3 component showed the most variation of all positions, possibly due to the degeneracy at the third codon position.

Dinucleotide frequency significantly influences codon usage bias and can be considered a genetic signature for a species (De Amicis and Marchetti, 2000). Underrepresentation of the TpA dinucleotide has been reported in multiple studies. As TpA is more susceptible to degradation by cellular RNases, owing to its mRNA destabilizing effect, contribution to stop codons (TAA and TAG) (Khandia et al., 2019), and selectional forces that tend to keep the TpA content low. CpG dinucleotides are also predisposed to mutations by deamination of 5-methylcytosine at CpG sites resulting in C?T changes. The dinucleotide CpG is approximately 42 times more mutable than predicted from random mutation; however, the exact proportion cannot be estimated due to the variable degree of methylation of cytosine in vertebrates (Cooper and Youssoufian, 1988). For this reason, the expression constructs for protein expression for protein production and gene therapy are designed in a way to avoid CpG (Bauer et al., 2010).

Despite selection forces acting against the dinucleotide pair TpA and CpG, our study found an unbiased representation of the TpA and CpG dinucleotides in 18.33 and 23.33% of sequences, respectively. In a study related to a humanized green fluorescent protein, consideration was given to 60 CpG residues within the coding region. Expression of a detectable amount of protein was decreased with decreasing number of CpG and was independent of the promoter used. A similar experiment reported that CpG depleted mRNA was decreased fivefold in the nucleus and eightfold in the cytoplasm. A decrease in the GFP reporter expression associated with CpG depletion was more related to a decline in the copy number of mRNA than translational efficiency, and the effect was gene-independent (Bauer et al., 2010). This result indicates that intragenic CpG influences de novo transcriptional activity. Such experimental evidence implies that although CpG tends to mutate faster than other nucleotide combinations and makes the gene vulnerable to loss of function, it is still essential for optimal gene expression; hence, a fine-tuned balance is needed to achieve optimal protein expression. Apart from de novo transcription, CpG has a role in the stability of RNA transcripts, and with an increasing proportion of CpG, mRNA stability and subsequent protein expression also increase (Duan and Antezana, 2003).

It is evident that all the genes related to metabolism or metabolite transport need to be highly expressed in the cells to meet metabolism and metabolite transport demands. Despite their tendency towards mutation and loss of function, the genes tended to retain CpG at an adequate level, explaining well the unbiased usage of CpG in our study and underscoring the selectional forces that keep the CpG at a certain level to maintain high expression.

A BRCA1 or BRCA2 mutant chicken DT40 cell line model for spontaneous mutation exhibited an 11 times higher likelihood of NCG to NTG mutation relative to the mean mutation rate (Zámborszky et al., 2017). In our study, we found strikingly high RSCU for CTG (RSCU > 1.6; highest 4.73) and GTG (RSCU > 1.6; highest 3.28) codons in some genes with over-representation of CTG and GTG in 78.33 and 68.33% of genes, respectively. This finding correlated well with the transition of CpG dinucleotide to TpG. It is further strengthened by the fact that the predecessors of CTG and GTG (codons CCG and GCG) were over exhibited only in 3.26 and 4.34% of coding sequences but underrepresented for both CCG and GCG in 85.86 and 80.97% of genes. Thus, underrepresentation can be attributed to the conversion of CCG and GCG to CTG and GTG, culminating in CTG and GTG overrepresentation.

Several factors affect the biased codon choices, including genetic drift, mutation pressure, natural selection, composition, secondary protein motifs, protein’s physical properties, transcriptional factors, and external environment, tRNA abundance etc. (Ikemura, 1981). However, natural selection, mutation pressure with genetic drift (Chen et al., 2014; LaBella et al., 2019), and compositional constraints (Jia et al., 2015) are major factors. In addition, various analyses like parity, neutrality, ENc-GC3 analysis, and abundance of specific codons and dinucleotides suggest the presence of selection as a significant force and mutation force. Investigation of the role of compositional properties on codon bias revealed that three out of four nucleotides at the second position of the codon significantly impact the bias (A2, C2, and T2). The A2 nucleotides negatively correlated with Nc, while C2 and T2 were positively correlated. This result could be explained by Saier (2019) work, who explained that the second nucleotide position is the most important in determining the nature of the genetic code. Overall based on our analyses, it can be inferred that selection, mutation and composition are the forces that might be responsible for shaping codon usage. In living organisms, the GC content ranges from approximately 20% GC to 80% GC. Upon plotting the GC content variation at three codon positions against overall GC content, there appeared a positive correlation between GC content in a codon with a total GC content of the genome; however, the steepness of the slope differed with a rank of third, first, and second codon positions (Muto and Osawa, 1987). Since the mutations are random, the advantageous one will be selected. The constraints affecting the mutation are highest on codon position two, while least on codon position three. This observation can be attributed to the fact that the second position of the codon specifies the type of amino acid, while the first one specifies a specific amino acid. The third position is redundant since several bases specify an amino acid. How position two of the codon specifies the type of amino acid can be understood by the example of when T, A, and C are present at the second position: all resulting amino acids are hydrophobic, hydrophilic, and semipolar. The only exception is G; when it is present at the second position, similar to C at the second position, it results in semipolar amino acids with two exceptions (arginine, a strongly hydrophilic amino acid, and UGA, a stop codon).

A regression analysis between the Nc with the nucleotide content present at the third position of the codon revealed a positive association with %A3 and %T3 and a negative correlation with %C3 and %G3. The %T3 had the highest regression coefficient and a positive correlation. Parity plot analysis revealed that T was preferred over A, and G is preferred over C. The disproportionate usage of these nucleotides suggests the natural selection of codon usage bias of genes (Uddin and Chakraborty, 2016) linked to neurodegeneration. Neutrality analysis indicates the dominance of selection and other forces, such as compositional, in shaping codon usage. Mutational force only contributed 17.32 and 31.5% variance in GC12 was attributed to GC3. Similar to Nc, ICDI is also a parameter to evaluate the codon usage bias, and its value ranges between 0 and 1. Higher ICDI values (toward 1) indicate the highest codon usage bias. In the present study, the average ICDI value was 0.182 ± 0.008, indicating a generally low bias.

Nc analysis revealed a relatively low bias in codon usage. This finding was coupled with the fact that these genes were highly expressed with high CAI values. The CAI value quantifies the synonymous codon usage bias for a DNA or RNA sequence (and the codon usage similarities between the gene and a reference set). High CAI suggests a very high selectional force on a gene to selectively use a codon contributing to high-level protein expression (Sharp and Li, 1987; Puigbò et al., 2007). The genes with higher CAI values tend to utilize more optimal codons. The CAI value varied between 0.885 and 0.71. In E. coli, the highest CAI (0.84) has been reported for the rplL gene encoding ribosomal protein L7/12, one of the most abundant proteins present in the species (DiRienzo and Inouye, 1979). In the present study, all the genes had high CAI values, indicating higher expression of genes and the importance of these genes in physiological functions. Upon regression the CAI values to the nucleotide composition present at the third position of the codon, a very low regression coefficient indicated that mutational forces were not affected by the gene expression and expression was driven mainly by selectional forces.

Conclusion

The present study explored the codon usage pattern and various forces applied on 183 transcripts of 60 genes involved in neurodegeneration associated with metabolic ailments. Analyses revealed a random pattern of the overall composition of the four standard nucleotides, and nucleotide C had a maximum range of variability of 22.35% in terms of total nucleotide components. Across the protein length, up to 800 amino acids, with increasing length, GC12 remained constant while GC3 fluctuated widely. The overall codon usage bias was low with higher Nc values and low ICDI. An investigation into the effects of compositional parameters on codon usage bias revealed that the second position of a codon is critical, as determined by a significant correlation of A2, C2, and T2 with Nc (p > 0.001). The genes were highly expressed, evidenced by their very high CAI values. This higher expression shows their involvement in critical physiological processes. Other parameters such as neutrality analysis, parity plot, and Nc-GC3 curve indicated the dominance of selection pressure along with the presence of compositional and mutational constraints. The transcripts exhibited under-representation of dinucleotides TpA and CpG due to selectional pressure. However, unbiased representation (odds ratio > 0.78) of TpA and CpG dinucleotides was observed in 18.33 and 23.33% of genes. These dinucleotides are important as part of regulatory elements (TATA box, stop codons, polyadenylation signal) in the thermodynamic stability of mRNA and missense mutations. The unbiased representation of these dinucleotides suggests selectional forces finely tune the CpG content level to obtain the optimum rate of protein expression for the high demand of these metabolism and metabolite transfer-related genes. The loss of this fine-tuning leads to neurological ailments. Notably, we observed very high RSCU values for CTG and GTG codons resulting from the transition of C to T. This observation indicates the mutational forces move forward to eliminate the CpG and selection pressure in the reverse direction to maintain high protein expression and this critical balance fine tune the CpG content in genes associated with neurodegeneration.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

Ethics Statement

Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements.

Author Contributions

RK: conceptualization. RK and AS: methodology and writing—original draft preparation. RK, TA, AA, YA, SA, and AMA: validation. RK, TA, and AA: formal analysis. RK, AS, TA, AA, YA, SA, AMA, and MK: writing—review, editing, and visualization. MK: final validation. All authors have read and agreed to the published version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We thank their respective universities for providing necessary facilities to conduct the work. We are grateful to the Deanship of Scientific Research at King Khalid University for funding this study through the Large Research Group Project, under grant number RGP. 2/100/43.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2022.887929/full#supplementary-material

Supplementary Table 1 | Nucleotide composition and composition at various codon positions for genes involved in neurodegeneration.

Footnotes

References

Ballard, C., Gauthier, S., Corbett, A., Brayne, C., Aarsland, D., and Jones, E. (2011). Alzheimer’s disease. Lancet 377, 1019–1031. doi: 10.1016/S0140-6736(10)61349-9

CrossRef Full Text | Google Scholar

Barbhuiya, P. A., Uddin, A., and Chakraborty, S. (2020). Analysis of compositional properties and codon usage bias of mitochondrial CYB gene in anura, urodela and gymnophiona. Gene 751:144762. doi: 10.1016/j.gene.2020.144762

PubMed Abstract | CrossRef Full Text | Google Scholar

Bauer, A. P., Leikam, D., Krinner, S., Notka, F., Ludwig, C., Längst, G., et al. (2010). The impact of intragenic CpG content on gene expression. Nucleic Acids Res. 38, 3891–3908. doi: 10.1093/nar/gkq115

PubMed Abstract | CrossRef Full Text | Google Scholar

Bensemain, F., Hot, D., Ferreira, S., Dumont, J., Bombois, S., Maurage, C.-A., et al. (2009). Evidence for induction of the ornithine transcarbamylase expression in Alzheimer’s disease. Mol. Psychiatry 14, 106–116. doi: 10.1038/sj.mp.4002089

PubMed Abstract | CrossRef Full Text | Google Scholar

Bourdenx, M., Koulakiotis, N. S., Sanoudou, D., Bezard, E., Dehay, B., and Tsarbopoulos, A. (2017). Protein aggregation and neurodegeneration in prototypical neurodegenerative diseases: examples of amyloidopathies, tauopathies and synucleinopathies. Prog. Neurobiol. 155, 171–193. doi: 10.1016/j.pneurobio.2015.07.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, H., Sun, S., Norenburg, J. L., and Sundberg, P. (2014). Mutation and selection cause codon usage and bias in mitochondrial genomes of ribbon worms (Nemertea). PLoS One 9:e85631. doi: 10.1371/journal.pone.0085631

PubMed Abstract | CrossRef Full Text | Google Scholar

Cooper, D. N., and Youssoufian, H. (1988). The CpG dinucleotide and human genetic disease. Hum. Genet. 78, 151–155. doi: 10.1007/BF00278187

PubMed Abstract | CrossRef Full Text | Google Scholar

De Amicis, F., and Marchetti, S. (2000). Intercodon dinucleotides affect codon choice in plant genes. Nucleic Acids Res. 28, 3339–3345. doi: 10.1093/nar/28.17.3339

PubMed Abstract | CrossRef Full Text | Google Scholar

Deb, B., Uddin, A., and Chakraborty, S. (2021). Analysis of codon usage of horseshoe bat hepatitis B virus and its host. Virology 561, 69–79. doi: 10.1016/j.virol.2021.05.008

PubMed Abstract | CrossRef Full Text | Google Scholar

Deka, H., and Chakraborty, S. (2014). Compositional constraint is the key force in shaping codon usage bias in hemagglutinin gene in H1N1 subtype of Influenza A Virus. Int. J. Genomics 2014:349139. doi: 10.1155/2014/349139

PubMed Abstract | CrossRef Full Text | Google Scholar

DiRienzo, J. M., and Inouye, M. (1979). Lipid fluidity-dependent biosynthesis and assembly of the outer membrane proteins of E. coli. Cell 17, 155–161. doi: 10.1016/0092-8674(79)90303-9

CrossRef Full Text | Google Scholar

Dittmar, K. A., Goodenbour, J. M., and Pan, T. (2006). Tissue-specific differences in human transfer RNA expression. PLoS Genet. 2:e221. doi: 10.1371/journal.pgen.0020221

PubMed Abstract | CrossRef Full Text | Google Scholar

Duan, J., and Antezana, M. A. (2003). Mammalian mutation pressure, synonymous codon choice, and mRNA degradation. J. Mol. Evol. 57, 694–701. doi: 10.1007/s00239-003-2519-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Durrenberger, P. F., Fernando, F. S., Kashefi, S. N., Bonnert, T. P., Seilhean, D., Nait-Oumesmar, B., et al. (2015). Common mechanisms in neurodegeneration and neuroinflammation: a brainnet Europe gene expression microarray study. J. Neural. Transm. 122, 1055–1068. doi: 10.1007/s00702-014-1293-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Esposito, Z., Belli, L., Toniolo, S., Sancesario, G., Bianconi, C., and Martorana, A. (2013). Amyloid β, glutamate, excitotoxicity in Alzheimer’s disease: are we on the right track? CNS Neurosci. Ther. 19, 549–555. doi: 10.1111/cns.12095

PubMed Abstract | CrossRef Full Text | Google Scholar

Freire-Picos, M. A., González-Siso, M. I., Rodríguez-Belmonte, E., Rodríguez-Torres, A. M., Ramil, E., and Cerdán, M. E. (1994). Codon usage in Kluyveromyces lactis and in yeast cytochrome c-encoding genes. Gene 139, 43–49. doi: 10.1016/0378-1119(94)90521-5

CrossRef Full Text | Google Scholar

Fu, H., Hardy, J., and Duff, K. E. (2018). Selective vulnerability in neurodegenerative diseases. Nat. Neurosci. 21, 1350–1358. doi: 10.1038/s41593-018-0221-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M. R., Appel, R. D., et al. (2005). “Protein identification and analysis tools on the ExPASy server,” in The Proteomics Protocols Handbook, ed. J. M. Walker (Totowa, NJ: Humana Press). 571–607. doi: 10.1385/1-59259-890-0:571

CrossRef Full Text | Google Scholar

González-Domínguez, R., García-Barrera, T., and Gómez-Ariza, J. L. (2015). Metabolite profiling for the identification of altered metabolic pathways in Alzheimer’s disease. J. Pharm. Biomed. Anal. 107, 75–81. doi: 10.1016/j.jpba.2014.10.010

PubMed Abstract | CrossRef Full Text | Google Scholar

Hagemans, D., van Belzen, I. A. E. M., Morán Luengo, T., and Rüdiger, S. G. D. (2015). A script to highlight hydrophobicity and charge on protein surfaces. Front. Mol. Biosci. 2:56. doi: 10.3389/fmolb.2015.00056

PubMed Abstract | CrossRef Full Text | Google Scholar

Henry, I., and Sharp, P. M. (2007). Predicting gene expression level from codon usage bias. Mol. Biol. Evol. 24, 10–12. doi: 10.1093/molbev/msl148

PubMed Abstract | CrossRef Full Text | Google Scholar

Hershberg, R., and Petrov, D. A. (2008). Selection on codon bias. Annu. Rev. Genet. 42, 287–299. doi: 10.1146/annurev.genet.42.110807.091442

PubMed Abstract | CrossRef Full Text | Google Scholar

Ikai, A. (1980). Thermostability and aliphatic index of globular proteins. J. Biochem. 88, 1895–1898.

PubMed Abstract | Google Scholar

Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389–409. doi: 10.1016/0022-2836(81)90003-6

CrossRef Full Text | Google Scholar

Jia, X., Liu, S., Zheng, H., Li, B., Qi, Q., Wei, L., et al. (2015). Non-uniqueness of factors constraint on the codon usage in Bombyx mori. BMC Genomics 16:356. doi: 10.1186/s12864-015-1596-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Khandia, R., Singhal, S., Kumar, U., Ansari, A., Tiwari, R., Dhama, K., et al. (2019). Analysis of Nipah virus codon usage and adaptation to hosts. Front. Microbiol. 10:886. doi: 10.3389/fmicb.2019.00886

PubMed Abstract | CrossRef Full Text | Google Scholar

Kumar, U., Khandia, R., Singhal, S., Puranik, N., Tripathi, M., Pateriya, A. K., et al. (2021). Insight into codon utilization pattern of tumor suppressor gene EPB41L3 from different mammalian species indicates dominant role of selection force. Cancers 13:2739. doi: 10.3390/cancers13112739

PubMed Abstract | CrossRef Full Text | Google Scholar

Kunec, D., and Osterrieder, N. (2016). Codon pair bias is a direct consequence of dinucleotide bias. Cell Rep. 14, 55–67. doi: 10.1016/j.celrep.2015.12.011

PubMed Abstract | CrossRef Full Text | Google Scholar

Kyte, J., and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132. doi: 10.1016/0022-2836(82)90515-0

CrossRef Full Text | Google Scholar

LaBella, A. L., Opulente, D. A., Steenwyk, J. L., Hittinger, C. T., and Rokas, A. (2019). Variation and selection on codon usage bias across an entire subphylum. PLoS Genet. 15:e1008304. doi: 10.1371/journal.pgen.1008304

PubMed Abstract | CrossRef Full Text | Google Scholar

Mazumder, T. H., Chakraborty, S., and Paul, P. (2014). A cross talk between codon usage bias in human oncogenes. Bioinformation 10, 256–262. doi: 10.6026/97320630010256

PubMed Abstract | CrossRef Full Text | Google Scholar

Muddapu, V. R., Dharshini, S. A. P., Chakravarthy, V. S., and Gromiha, M. M. (2020). Neurodegenerative diseases - is metabolic deficiency the root cause? Front. Neurosci. 14:213. doi: 10.3389/fnins.2020.00213

PubMed Abstract | CrossRef Full Text | Google Scholar

Munjal, A., Khandia, R., Shende, K. K., and Das, J. (2020). Mycobacterium lepromatosis genome exhibits unusually high CpG dinucleotide content and selection is key force in shaping codon usage. Infect. Genet. Evol. 84:104399. doi: 10.1016/j.meegid.2020.104399

PubMed Abstract | CrossRef Full Text | Google Scholar

Muto, A., and Osawa, S. (1987). The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. U.S.A 84, 166–169. doi: 10.1073/pnas.84.1.166

PubMed Abstract | CrossRef Full Text | Google Scholar

Pacelli, C., Giguère, N., Bourque, M.-J., Lévesque, M., Slack, R. S., and Trudeau, L. -É (2015). Elevated mitochondrial bioenergetics and axonal arborization size are key contributors to the vulnerability of dopamine neurons. Curr. Biol. 25, 2349–2360. doi: 10.1016/j.cub.2015.07.050

PubMed Abstract | CrossRef Full Text | Google Scholar

Paul, P., Malakar, A. K., and Chakraborty, S. (2018). Codon usage and amino acid usage influence genes expression level. Genetica 146, 53–63. doi: 10.1007/s10709-017-9996-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Piovesan, A., Vitale, L., Pelleri, M. C., and Strippoli, P. (2013). Universal tight correlation of codon bias and pool of RNA codons (codonome): the genome is optimized to allow any distribution of gene expression values in the transcriptome from bacteria to humans. Genomics 101, 282–289. doi: 10.1016/j.ygeno.2013.02.009

PubMed Abstract | CrossRef Full Text | Google Scholar

Puigbò, P., Bravo, I. G., and Garcia-Vallve, S. (2008). CAIcal: a combined set of tools to assess codon usage adaptation. Biol. Direct 3:38. doi: 10.1186/1745-6150-3-38

PubMed Abstract | CrossRef Full Text | Google Scholar

Puigbò, P., Guzmán, E., Romeu, A., and Garcia-Vallvé, S. (2007). OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res. 35, W126–W131. doi: 10.1093/nar/gkm219

PubMed Abstract | CrossRef Full Text | Google Scholar

Quax, T. E. F., Claassens, N. J., Söll, D., and van der Oost, J. (2015). Codon bias as a means to fine-tune gene expression. Mol. Cell 59, 149–161. doi: 10.1016/j.molcel.2015.05.035

PubMed Abstract | CrossRef Full Text | Google Scholar

Rodríguez-Belmonte, E., Freire-Picos, M. A., Rodríguez-Torres, A. M., González-Siso, M. I., Cerdán, M. E., and Rodríguez-Seijo, J. M. (1996). PICDI, a simple program for codon bias calculation. Mol. Biotechnol. 5, 191–195. doi: 10.1007/BF02900357

PubMed Abstract | CrossRef Full Text | Google Scholar

Sablok, G., Nayak, K. C., Vazquez, F., and Tatarinova, T. V. (2011). Synonymous codon usage, GC(3), and evolutionary patterns across plastomes of three pooid model species: emerging grass genome models for monocots. Mol. Biotechnol. 49, 116–128. doi: 10.1007/s12033-011-9383-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Sancesario, G., Esposito, Z., Mozzi, A. F., Sancesario, G. M., Martorana, A., Giordano, A., et al. (2013). Transient global amnesia: linked to a systemic disorder of amino acid catabolism? J. Neurol. 260, 1429–1432. doi: 10.1007/s00415-013-6927-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Sharp, P. M., and Li, W. H. (1987). The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295. doi: 10.1093/nar/15.3.1281

PubMed Abstract | CrossRef Full Text | Google Scholar

Sheikh, A., Al-Taher, A., Al-Nazawi, M., Al-Mubarak, A. I., and Kandeel, M. (2020). Analysis of preferred codon usage in the coronavirus N genes and their implications for genome evolution and vaccine design. J. Virol. Methods 277:113806. doi: 10.1016/j.jviromet.2019.113806

PubMed Abstract | CrossRef Full Text | Google Scholar

Soto, C. (2003). Unfolding the role of protein misfolding in neurodegenerative diseases. Nat. Rev. Neurosci. 4, 49–60. doi: 10.1038/nrn1007

PubMed Abstract | CrossRef Full Text | Google Scholar

Sueoka, N. (1988). Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. U.S.A 85, 2653–2657. doi: 10.1073/pnas.85.8.2653

PubMed Abstract | CrossRef Full Text | Google Scholar

Sueoka, N. (1995). Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. 40, 318–325. doi: 10.1007/BF00163236

PubMed Abstract | CrossRef Full Text | Google Scholar

Sueoka, N. (1999). Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238, 53–58. doi: 10.1016/s0378-1119(99)00320-0

CrossRef Full Text | Google Scholar

Torres, A. G., Reina, O., Stephan-Otto Attolini, C., and Ribas de Pouplana, L. (2019). Differential expression of human tRNA genes drives the abundance of tRNA-derived fragments. Proc. Natl. Acad. Sci. U.S.A 116, 8451–8456. doi: 10.1073/pnas.1821120116

PubMed Abstract | CrossRef Full Text | Google Scholar

Uddin, A., and Chakraborty, S. (2016). Codon usage trend in mitochondrial CYB gene. Gene 586, 105–114. doi: 10.1016/j.gene.2016.04.005

PubMed Abstract | CrossRef Full Text | Google Scholar

Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., et al. (2001). The sequence of the human genome. Science 291, 1304–1351. doi: 10.1126/science.1058040

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, L., Xing, H., Yuan, Y., Wang, X., Saeed, M., Tao, J., et al. (2018). Genome-wide analysis of codon usage bias in four sequenced cotton species. PLoS One 13:e0194372. doi: 10.1371/journal.pone.0194372

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, X., Yu, S., Gao, S.-J., Hu, J.-P., Wang, Y., and Liu, H.-X. (2014). Insulin inhibits Abeta production through modulation of APP processing in a cellular model of Alzheimer’s disease. Neuro Endocrinol. Lett. 35, 224–229.

PubMed Abstract | Google Scholar

Young, L., Jernigan, R. L., and Covell, D. G. (1994). A role for surface hydrophobicity in protein-protein recognition. Protein Sci. 3, 717–729. doi: 10.1002/pro.5560030501

PubMed Abstract | CrossRef Full Text | Google Scholar

Yu, X., Liu, J., Li, H., Liu, B., Zhao, B., and Ning, Z. (2021). Comprehensive analysis of synonymous codon usage patterns and influencing factors of porcine epidemic diarrhea virus. Arch. Virol. 166, 157–165. doi: 10.1007/s00705-020-04857-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Zámborszky, J., Szikriszt, B., Gervai, J. Z., Pipek, O., Póti, Á, Krzystanek, M., et al. (2017). Loss of BRCA1 or BRCA2 markedly increases the rate of base substitution mutagenesis and has distinct effects on genomic deletions. Oncogene 36, 746–755. doi: 10.1038/onc.2016.243

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, Z., Dang, Y., Zhou, M., Li, L., Yu, C.-H., Fu, J., et al. (2016). Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. U.S.A 113, E6117–E6125. doi: 10.1073/pnas.1606724113

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: neurodegeneration, metabolism-related genes, codon usage, dinucleotide ratio, RSCU, fine tuning of CpG dinucleotide

Citation: Khandia R, Sharma A, Alqahtani T, Alqahtani AM, Asiri YI, Alqahtani S, Alharbi AM and Kamal MA (2022) Strong Selectional Forces Fine-Tune CpG Content in Genes Involved in Neurological Disorders as Revealed by Codon Usage Patterns. Front. Neurosci. 16:887929. doi: 10.3389/fnins.2022.887929

Received: 02 March 2022; Accepted: 04 April 2022;
Published: 10 June 2022.

Edited by:

Wael M. Y. Mohamed, International Islamic University Malaysia, Malaysia

Reviewed by:

Yehuda Ben-Shahar, Washington University in St. Louis, United States
Tarikul Huda Mazumder, EduCare Academy, India

Copyright © 2022 Khandia, Sharma, Alqahtani, Alqahtani, Asiri, Alqahtani, Alharbi and Kamal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rekha Khandia, cmVraGEua2hhbmRpYUBidWJob3BhbC5hYy5pbg==; YnUucmVraGEua2hhbmRpYUBnbWFpbC5jb20=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.