Comparative Genomics and Characterization of SARS-CoV-2 P.1 (Gamma) Variant of Concern From Amazonas, Brazil

Zimerman, Ricardo Ariel; Ferrareze, Patrícia Aline Gröhs; Cadegiani, Flavio Adsuara; Wambier, Carlos Gustavo; Fonseca, Daniel do Nascimento; de Souza, Andrea Roberto; Goren, Andy; Rotta, Liane Nanci; Ren, Zhihua; Thompson, Claudia Elizabeth

doi:10.3389/fmed.2022.806611

ORIGINAL RESEARCH article

Front. Med., 15 February 2022

Sec. Infectious Diseases – Surveillance, Prevention and Treatment

Volume 9 - 2022 | https://doi.org/10.3389/fmed.2022.806611

This article is part of the Research TopicEmerging SARS-COV-2 Variants: Genomic Variations, Transmission, Pathogenesis, Clinical Impact and InterventionsView all 65 articles

Comparative Genomics and Characterization of SARS-CoV-2 P.1 (Gamma) Variant of Concern From Amazonas, Brazil

Updated

A correction has been applied to this article in:

Expression of concern on: Comparative genomics and characterization of SARS-CoV-2 P.1 (Gamma) variant of concern from Amazonas, Brazil
1. Read correction

Ricardo Ariel Zimerman¹

Patrícia Aline Gröhs Ferrareze²

Flavio Adsuara Cadegiani³

Carlos Gustavo Wambier⁴

Daniel do Nascimento Fonseca⁵

Andrea Roberto de Souza⁵

Andy Goren⁶

Liane Nanci Rotta²

Zhihua Ren⁷^*^†

Claudia Elizabeth Thompson^2,8^*^†

¹Hospital da Brigada Militar, Porto Alegre, Brazil
²Graduate Program in Health Sciences, Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Brazil
³Corpometria Institute, Brasília, Brazil
⁴Department of Dermatology, Warren Alpert Medical School of Brown University, Providence, RI, United States
⁵Samel and Oscar Nicolau Hospitals, Manaus, Brazil
⁶Applied Biology, Inc., Irvine, CA, United States
⁷Suzhou Kintor Pharmaceutical, Inc., Suzhou, China
⁸Department of Pharmacosciences, Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, Brazil

Background: P.1 lineage (Gamma) was first described in the State of Amazonas, northern Brazil, in the end of 2020, and has emerged as a very important variant of concern (VOC) of SARS-CoV-2 worldwide. P.1 has been linked to increased infectivity, higher mortality, and immune evasion, leading to reinfections and potentially reduced efficacy of vaccines and neutralizing antibodies.

Methods: The samples of 276 patients from the State of Amazonas were sent to a central referral laboratory for sequencing by gold standard techniques, through Illumina MiSeq platform. Both global and regional phylogenetic analyses of the successfully sequenced genomes were conducted through maximum likelihood method. Multiple alignments were obtained including previously obtained unique human SARS-CoV-2 sequences. The evolutionary histories of spike and non-structural proteins from ORF1a of northern genomes were described and their molecular evolution was analyzed for detection of positive (FUBAR, FEL, and MEME) and negative (FEL and SLAC) selective pressures. To further evaluate the possible pathways of evolution leading to the emergence of P.1, we performed specific analysis for copy-choice recombination events. A global phylogenomic analysis with subsampled P.1 and B.1.1.28 genomes was applied to evaluate the relationship among samples.

Results: Forty-four samples from the State of Amazonas were successfully sequenced and confirmed as P.1 (Gamma) lineage. In addition to previously described P.1 characteristic mutations, we find evidence of continuous diversification of SARS-CoV-2, as rare and previously unseen P.1 mutations were detected in spike and non-structural protein from ORF1a. No evidence of recombination was found. Several sites were demonstrated to be under positive and negative selection, with various mutations identified mostly in P.1 lineage. According to the Pango assignment, phylogenomic analyses indicate all samples as belonging to the P.1 lineage.

Conclusion: P.1 has shown continuous evolution after its emergence. The lack of clear evidence for recombination and the positive selection demonstrated for several sites suggest that this lineage emergence resulted mainly from strong evolutionary forces and progressive accumulation of a favorable signature set of mutations.

Introduction

The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) P.1 (Gamma) lineage has accounted for the majority of the genomes sequenced in Brazil during the second wave of the COVID-19 pandemic (1). It is considered one of the most relevant “Variants Of Concern” (VOC) worldwide. It carries 10 non-synonymous mutations in the spike protein, including three located at the Receptor Binding Domain (RBD: K417T, N501Y, and E484K). Remarkably, N501Y has demonstrated enhanced affinity for ACE-2 binding, potentially resulting in higher viral loads (2). The E484K substitution was previously demonstrated by our group to have arisen and maintained in four genetically diverse lineages in Brazil (3). It can induce immune escape with consequent reinfection and is associated with convalescent plasma therapy failure (4). We have previously shown that both substitutions are under strong positive selection (5). Since these mutations are present in other VOCs around the world, their emergence is evidence of convergent SARS-CoV-2 evolution (5, 6). K417T occurs in an important footprint for class I anti-RBD antibodies and has been associated to decrease neutralization activity for some monoclonal antibodies (mAb). Despite the mutations in spike predominate, P.1 also presents substitutions at other sites potentially relevant. Mutations in non-structural proteins may alter their function related to transcription, duplication, and immune modulation. In fact, it was suggested that at least emerging lineages might induce weaker antiviral interferon activity (7). The association of higher infectivity related to spike mutations with possible innate immune evasion could be associated with higher viral load. An earlier study describing P.1 in northern Brazil has linked this lineage to lower Cycle Threshold (C_t) values (8). This may be of major significance, since persistent higher viral loads after 1 week of symptom initiation were independently associated with worse outcomes, as demonstrated in a testing mAb study (9).

P.1 has been tied to higher case fatality rate (8) and this phenotypic trait may be related to its genetic background. The origin of the unique set of ~35 amino acid substitutions that characterize P.1 remain largely unknown. Homologous recombination of entire segments of genomes by “copy-choice” recombinations in viruses with non-segmented genomes are well-described among coronaviruses (10). Alternatively, the high seroprevalence of anti-SARS-CoV-2 antibodies in northern Brazil citizens could indicate that strong selective pressure was responsible for the new lineage origin. Accordingly, B.1.351 (Beta) has also emerged in a heavily previously exposed population at Mandela's Bay, South Africa (11). Antigenic drift “by chance” appears less likely, because intense ongoing transmission relieves evolutionary bottlenecks required for this phenomenon. Due to intense viral turnover, we expected to find signs of diversification of the original P.1 in clinical samples obtained during the highs of the second wave in north Brazil.

In this study, we describe the full-length SARS-CoV-2 genomes of 44 clinical samples from Amazonas, Brazil, sequenced and analyzed by our group, and compare them to previously described P.1 from Brazil and to worldwide. In order to better understand the evolutionary forces driving the P.1 emergence and evolution in the North region from Brazil, we conducted tests for recombination, phylogenomics, phylogenetic analyses of spike and non-structural proteins from ORF1a, and the detection of selective pressures acting on these sequences. New or uncommon Single Nucleotide Polymorphisms (SNPs) were also described and their potential relevance discussed.

Materials and Methods

In total, 276 nasopharyngeal swab samples of patients from Manaus, Parintins, and Itacoatiara (Amazonas, Brazil) collected between February 18, and March 10, 2021, for SARS-CoV-2 Quantitative Reverse Transcription Polymerase Reaction (RT-qPCR), which would be discarded by the laboratory, were initially included in this study. Patient SARS-CoV-2 status was determined by RT-qPCR testing following the Cobas SARS-CoV-2 RT-qPCR kit test protocol (Roche, USA).

SARS-CoV-2 Genome Sequencing and Assembly

For the genome sequencing, confirmed SARS-CoV-2 patients had their clinical samples submitted to a second RT-qPCR, which was performed by BiomeHub (Florianópolis, Santa Catarina, Brazil), with a charite-berlin protocol (12). Samples with quantification cycle (C_q) up to 30 for at least one primer were selected for SARS-CoV-2 genome sequencing and assembly by the BiomeHub laboratory.

The total RNAs were prepared according to reference protocol (dx.doi.org/10.17504/protocols.io.befyjbpw), with the cDNA synthesized with SuperScript IV (Invitrogen) and the DNA amplified with Platinum Taq High Fidelity (Invitrogen). The library preparation was performed with Nextera Flex (Illumina) and quantification was performed with Picogreen and Collibri Library Quantification Kit (Invitrogen). The genome sequencing was generated on Illumina MiSeq Platform by 150 × 150 runs with 500xSARS-CoV-2 coverage (50–100 mil reads/per sample).

For the genome assembly (BiomeHub in-house script), the adapters removal and read trimming for 150 nt read sequences were performed by fastqtools.py. The alignment of the sequenced reads to the reference SARS-CoV-2 genome (GenBank ID: NC_045512.2) was performed by Bowtie v2.4.2 (13) and additional parameters as end-to-end and very-sensitive. The analysis of the sequencing coverage and depth was generated by samtools v1.11 (14) with minimum base quality per base (Q) ≥30. Finally, the consensus sequence for each SARS-CoV-2 genome was generated by a bcftools pipeline (15), including the commands mpileup (parameters: Q ≥30; depth (d) ≤1,000), filter (parameters: DP >50) and consensus.

Genomic Analyses

The assignment of the SARS-CoV-2 lineages (Pango lineages assignment for September 2021) for each assembled genome was obtained with the Pangolin v2.3.8 web server (https://github.com/cov-lineages/pangolin). The clade assignment, mutation calling, and sequence quality check were performed by the Nextclade web server (https://clades.nextstrain.org/). The Snippy variant calling and core genome alignment pipeline v4.6.0 (https://github.com/tseemann/snippy) identified Single Nucleotide Polymorphisms (SNPs) and insertions/deletions from each genome sequence. For the genome map and SNP histogram, the genome alignment was performed by the MAFFT v7.475 (16) web server followed by the running of the msastats.py script, and the plotAlignment and plotSNPHist functions in RStudio (17). The SARS-CoV-2 genome with GenBank ID NC_045512.2 was used as reference.

Recombination Analysis of Samples From Northern Brazil

Initially, 2,485 complete and high covered genomes from the northern states in Brazil (57 genomes from Acre, 298 from Amapá, 1,511 from Amazonas, 394 from Pará, 32 from Rondônia, 38 from Roraima, and 155 from Tocantins) were retrieved from the GISAID database (submission up to September 12, 2021) and added to our 44 sequenced genomes. The genome alignment was performed using the Wuhan genome NC_045512.2 as a reference sequence in the MAFFT web server (16) with 1PAM/κ = 2 scoring matrix. The resulting alignment was trimmed in 266 positions for the 5′ end and 264 positions in the 3′ end with UGENE (18) v39.0, followed by the deletion of sequence duplicates, maintaining 1,931 unique genomes in the sequence set.

For the recombination analysis, the aligned sequence set was tested by the Recombination Detection Program (RDP) software v4.101 (19). The faster methods RDP (20), GENECONV (21), Chimaera (22), MaxChi (23), and 3Seq (24) were applied for the primary scan, followed by the verification of the detected recombination signals by the lower methods BootScan (25) and SiScan (26). After the detection of genomic breakpoints and possible recombination events, the results were re-evaluated with the available methods RDP, GENECONV, BootScan, Chimaera, MaxChi, SiScan, 3Seq, LARD (27), and Phylpro (28) if the detected recombination evidence was accepted by <4 methods. Recombination events supported by at least four methods after Bonferroni correction at a p-value ≤ 0.05 were accepted (29, 30). All analyses were performed with default parameters. The genetic distance plot comparison of the 44 sequenced genomes from this study with the NC_045512.2 reference was performed with the Python package recan using a window of 200 nt and a shift of 50 nt as parameters (31).

Phylogenomic Analysis of Samples From Northern Brazil

For the phylogenetic analysis of the northern genomes, the aligned set of 1,931 unique sequences also used in the recombination analysis was selected. The best evolutionary model was inferred as GTR + R4 by ModelTest-ng (32). The phylogenetic tree was built by the Maximum Likelihood method using the IQTREE software (33) with a Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-aLRT) of 1,000 replicates added to 1,000 replicated of an ultrafast bootstrap, 2,000 iterations and the optimization of the UFBoot trees by NNI on bootstrap alignment. The tree visualization and editing were generated by the FigTree software (http://tree.bio.ed.ac.uk/software/figtree/).

Phylogenetic Analyses of Spike and Non-structural Proteins From ORF1a of Samples From Northern Brazil

Considering that spike and ORF1a were the genomic regions in the northern Brazilian samples with higher number of mutations identified in our genomic analyses, we performed phylogenetic analyses of those regions to better understand the specific patterns of molecular evolution. The nucleotide sequence alignments for the spike and non-structural proteins from the ORF1a sequence were recovered from the alignment of 1,931 northern Brazilian genome samples, according to the CDS coordinates of the reference genome (NC_045512.2). In total, eleven alignment sets (spike, NSP1, NSP2, NSP3, NSP4, NSP5, NSP6, NSP7, NSP8, NSP9, and NSP10) had their duplicated sequences removed and followed the same Modeltest-ng/IQTREE/FigTree workflow method previously described.

Molecular Evolution of Spike and Non-structural Proteins From ORF1a of Samples From Northern Brazil

Selection tests were performed with HyPhy v2.5.32 (34) using the nucleotide sequence alignment and the maximum likelihood tree (previously described) for spike and non-structural proteins from ORF1a. The FUBAR (35), FEL (36), and MEME (37) methods were applied with default parameters for the analysis of positive selection, while FEL (36) and SLAC (36) methods were used with default parameters to evaluate if those regions were submitted to negative selection.

Global Phylogenomic Analyses

A set of 54,214 P.1 and 2,081 B.1.1.28 completed, dated and high-covered genomes, available on GISAID, with collection date from January 01, 2020 and submission date until September 12, 2021, were date-ordered and had their duplicated sequences removed, keeping the first occurrence of each sequence. Subsequently, the software Augur (38), implemented a probabilistic subsampling based on the sample location (country) in order to select 5,000 P.1 and 100 B.1.1.28 genomes. The multiple sequence alignment with the P.1, B.1.1.28, the 44 sequenced genomes from this study, and the SARS-CoV-2 reference genome NC_045512.2 was generated by the MAFFT (16) web server (1PAM/κ = 2 scoring matrix), while the alignment trimming (deletion of 265 and 259 nucleotides at 5' and 3' ends, respectively) was performed with UGENE (18). A second round of sequence duplicates exclusion was applied to keep the first occurrence of each unique sequence. In case of sequence identity between a genome from this study and genomes downloaded from GISAID, the date was disregarded, and our genomes were kept as representative sequences.

The global phylogenetic analysis was then performed with the forty-four genomes sequenced in this study added to 4,953 GISAID data, totalizing 4,997 unique genomes. The 4,953 GISAID genomes were assigned to the B (SARS-CoV-2 reference genome), P.1 (4,856 genomes), and B.1.1.28 (96 genomes) lineages. The best evolutionary model was inferred as GTR + R4 by ModelTest-ng (32). The phylogenetic tree was built by the Maximum Likelihood method using the IQTREE software (33) with Shimodaira-Hasegawa-like approximate likelihood ratio test of 1,000 replicates added to 1,000 replicated of an ultrafast bootstrap, 2,000 iterations and the optimization of the UFBoot trees by NNI on bootstrap alignment. The FigTree software (http://tree.bio.ed.ac.uk/software/figtree/) generated the tree visualization and editing.

In all phylogenetic analyses, including the genome samples from northern Brazil, spike, non-structural proteins from ORF1a and the global phylogenomics, a group was considered monophyletic when satisfied the criteria of ≥80% for SH-aLRT (39) and ≥95% for ultrafast bootstrap branch support values (40). However, we also always described all cases whose branch support values for both statistical tests were ≥80%, considering the previously cited criteria seem too conservative.

Results

The flowchart of SARS-CoV-2 genome sequencing is presented in Figure 1. The nasopharyngeal swab samples (n = 276) were firstly analyzed to detect SARS-CoV-2 using RT-qPCR in a private lab and, subsequently, sent to the reference sequencing center. Twenty-five of them were excluded from this study due to their poor conditions, such as low volume. After the first RT-qPCR, 120 samples that tested positive for SARS-CoV-2 were submitted to a second RT-qPCR for C_q values verification. Only 45 samples that had C_q below 30 for at least one primer were submitted to sequencing. In total, 44 patients had their SARS-CoV-2 positive samples successfully genotyped.

FIGURE 1

Figure 1. Flowchart of SARS-CoV-2 RT-qPCR and genome sequencing.

Genomic Analysis

Forty-four successfully sequenced SARS-CoV-2 genomes were obtained from samples taken from February, 18 to March, 10, 2021, representing patients from three different cities: Manaus (n = 42), Parintins (n = 1), and Itatiacoara (n = 1). The sequence coverage for the forty-four assemblies ranged between 95.26 up and 99.79% of the 29,903 bp of NC_045512.2 reference genome (average mean of 99.50%). The mean sequencing depth was calculated to 771.79x, with a range variation between 499 and 1,060x (Supplementary File 1). At least 83.21% of the sequence positions achieved a coverage depth ≥51x (max = 99.64%, mean = 96.04%). According to the Pango Lineages assignment, the forty-four samples were characterized as belonging to the P.1 lineage (subclade 20J, Gamma, V3). Figure 2 shows the frequency of SNPs per SARS-CoV-2 genome position along the 44 sequenced genomes. In summary, in this study, the sequenced genomes from Amazonas showed between 21 and 26 amino acid mutations and two up to five deletions. A set of 12 genomes present a four nucleotides insertion (AACA) after the genomic position 28,269.

FIGURE 2

Figure 2. Mutations of the SARS-CoV-2 genomes from the Amazonas state, Northern Brazil sampled in February and March 2021. (A) Genome map for the 44 genomes sequenced. Nucleotide substitutions are colored in red and blank regions represent low sequencing coverage. (B) Frequency of SNPs per SARS-CoV-2 genome position along the 44 genomes. These mutations are corresponding to the red lines in (A) and only missense substitutions represented by >10 sequences have their respective amino acid changes indicated above the bars. Main Open Reading Frames (ORFs) and SARS-CoV-2 proteins are indicated at the bottom to allow a rapid visualization of the viral proteins affected.

Beyond the constellation of mutations expected from the P.1 lineage, we have found some unusual and/or uncertain deletions and substitutions. After automatic alignment using MAFFT v7.475 and visual inspection, we identified six nucleotide deletions of amino acid residues N188 and L189 from the spike protein in three genomic sequences followed by the R190S substitution in two of them. This mutational pattern is being firstly considered since the AGT codon for serine claims for two nucleotide substitutions in N188 residue (AAT > AGT) but only one in R190 (AGG > AGT), besides the substitution R190S being already known for P.1 genomes. However, by observing the alignment data is not possible to completely discard the occurrence of a N188S substitution followed by deletion of the residues L189 and R190, since in both cases (N188S/R190S) the codons are being mutated to a serine residue, which is the pattern commonly seen on GISAID. The finding of 15 P.1 genomes from Brazil and Taiwan presenting N188S/L189-R190del does not exclude the possibility of generalized alignment errors in these positions due to the mistaken AGT codon mapping, which is probably the case. It is important to note that there is no previous evidence on the GISAID database of a coupled N188-L189 deletion with the conservation of the neighbor residues. In 22 genome sequences (P.1.10 and B.1.36, mostly) available on GISAID from Belgium, Egypt, India, Germany, Greece, Suriname, and USA, there is a deletion of the adjacent positions (e.g., 186, 187, and 190), in addition to N188-L189 deletion.

The frequency of spike mutations identified in the P.1 sequenced genomes (this study) compared with sequences from Amazonas, Brazil, and World (up to September 12, 2021) available on GISAID is shown in Table 1. Interestingly, the P209H mutation at spike NTD was not previously detected in the P.1 samples sequenced around the world. Actually, this mutation was identified in several lineages, including B.1.1.7 (Alpha; England, Italy, and USA), B.1.351 (Beta; South Africa and Sweden), and B.1.617.2 (Delta; England and Indonesia). The deletions S: A243 and S: L244 in the spike protein were simultaneously found in 43 P.1 genomes, excluding those sequenced in our study, from Brazil (Amazonas, Distrito Federal, Rio de Janeiro, Sergipe, and São Paulo), Belgium, Canada, Chile, French Guiana, Malta, Peru, and USA. Another example is the substitution S: T1066A, which was only found in genomes from the lineages AY.4, AY.7.1, B.1.1.7, and B.1.258.10 in USA and European countries. This substitution has been identified for the first time in a P.1 genome sequenced in this study. The S: D215G mutation was described in other eight P.1 Amazonas genomes after our sample collection, as well as in P.1 samples from the states of Goiás (first occurrence in Brazil), Paraná, Santa Catarina, São Paulo and also in Chile, Mexico, Netherlands, and USA. Therefore, one genome sequenced in this study and one additional genome were the first sequences identified carrying this mutation in P.1 lineage in Amazonas. The spike mutations L18F, P26S, E484K, N501Y, D614G, and V1176 were identified in all 44 genomes sequenced in this study, followed by the recognition of T20N, D138Y, H655Y, K417T, and T1027I in 43 of them. Eight spike mutations were observed in three sequences or less (N188-L189del, P209H, D215G, A243-L244del, A688V, and T1066A).

TABLE 1

Table 1. Frequency of spike mutations identified in the P.1 sequenced genomes (this study) compared with sequences from Amazonas, Brazil and World (up to September 12, 2021) available on GISAID.

The frequency of mutations in N, non-structural proteins (NSPs) and accessory proteins identified in the P.1 sequenced genomes (this study) compared with sequences from Amazonas, Brazil and World (up to September 12, 2021) available on GISAID is shown in Table 2 (ORF1a) and Supplementary File 2 (complete table of mutations). The N protein sequence exhibited the substitutions P80R, R203K, and G204R in all 44 genomes, which are highly frequent mutations in the genomes sequenced in Brazil and World, and two of them also presented—separately—the mutations P199L and T135I. A high number of mutations was identified for ORF1a, totalizing 29 possible amino acid substitutions and three deletions. The most frequent ones include NSP3:K977Q, NSP3:T133I, NSP5:A70V, NSP5:V86I, NSP6:A51V, NSP6:G107S, NSP6:F108L, and NSP6:S106/G107/F108del. The NSP3:S370L substitution was identified in 43 genomes, exception for one sequence presenting the combination NSP2:K456R, NSP3:T186P, NSP3:K977Q, NSP3:T1189I, and NSP6:V149A. The NSP2:N9S and NSP4:V30A mutations were identified for the first time in P.1 genomes sequenced in this study. ORF1b is majoritarily represented by the substitutions NSP12:P323L and NSP13:E341D, which were described in 43 genomes each. Other nine amino acid substitutions were also found. The NS3:S253P mutation in ORF3a occurred in 44 sequences, occasionally followed by other substitutions such as P25L, P1045S, S117I, W131C, and E181V. Finally, the mutations NS8:E92K and ORF9b:Q77E (accessory protein, alternative ORF from nucleocapsid—N—gene) were assigned to 43 and 44 genome sequences, respectively. Other substitutions include E59D in ORF8, as well as NS7a:E22D and NS7b:S31L.

TABLE 2

Table 2. Frequency of ORF1a mutations identified in the P.1 sequenced genomes (this study) compared with sequences from Amazonas, Brazil and World (up to September 12, 2021) available on GISAID.

According to the collection date registered on GISAID, the first occurrence for each spike mutation identified in all sequenced genomes in this study is shown in Table 3. Five mutations have their first occurrence in Brazilian P.1 genomes sequenced in this study: A243/L244del, N188del, P209H, and T1066A. Specifically, the A243/L244 double deletion identified in our samples is the first occurrence of this combination in P.1 genomes worldwide. Before February 22, 2021, these deletions were mostly found in B.1.351 genomes (n = 6,461 of 8,009). The case of N188del is not clear, since several genomes may contain alignment errors and, consequently, not being possible to identify this mutation on the GISAID data. The P209H and T1066A substitutions, which are present in lineages such as B.1.1.7 and AY.4 (B.1.617.2 derivative), are only represented by our Amazonas genomes in P.1 data. Moreover, the genome sequences in this study represent the first occurrence of mutations A688V, D215G, and L189 del in P.1 genomes from Amazonas.

TABLE 3

Table 3. First occurrence (by collection date) registered on GISAID for spike mutations identified in the sequenced genomes from this study.

Following the collection dates on Supplementary File 3, it is possible to observe that the first occurrence for the mutations N:P199L, NSP3:T1365A, NSP3:S1437F, NSP3:S1670F, NSP4:S481L, NSP5:A70V, NSP6:A51V, NSP8:E155G, NSP13:M274I, NSP15:D39Y, NSP16:K160R, and NS3:P104S in P.1 sequences from Brazil is related to the sequenced genomes from this study. Except for N:P199L and NSP16:K160R, previously reported in the USA and Italy, all others have their first appearance for P.1 genomes in the world with the genomes from this study. Before February 22, 2021, N:P199L is found in more than 100,000 genomes worldwide, mostly in lineages such as B.1.2, B.1.221, and B.1.596, despite being present in several other lineages. Other substitutions, such as NSP1:T170I, NSP14:P158H, NSP15:E170D, have their only P.1 occurrence in Brazil identified in the genomes from this study, despite other countries posteriorly reporting their presence. A third group was formed by the mutations NSP2:N9S, NSP4:V30A, and NS3:E181V, which were not found in any other place from Brazil or the World. For 28 mutations, our genomes represent their first occurrence in P.1 Amazonas samples.

Recombination Analysis of Samples From Northern Brazil

The analysis performed by the RDP software did not find any statistically supported recombination event in the set of 1,931 northern genomes from Brazil. Despite the 3Seq method detected the P.2 genome EPI_ISL_2245087 (from Pará) as potential recombinant (begin breakpoint: 25,729, end breakpoint: 27,459, p-value = 0.007) in the first scan, no other method supports this evidence, suggesting a possible false-positive. Similarly, the genetic distance plot inferred by the recan package did not suggest any potential breakpoints in the evaluated sequences. The indicated peaks of genetic distance are related to the presence of undefined genomic regions in the sequenced samples (Figure 3).

FIGURE 3

Figure 3. Genetic distance plot of the 44 sequenced genomes in comparison with the NC_045512.2 reference genome (Wuhan). (A) Genomic sequence identity. (B) Sequence identity for the spike gene region.

Phylogenomic Analysis of Samples From Northern Brazil

The phylogenetic analysis of the 1,931 genomes obtained by the maximum likelihood method showed the formation of statistically supported clades for the lineages P.1 and P.2 as derivatives of the B.1.1.28 lineage (Figure 4). All samples from this study were included in the P.1 monophyletic group. Inside the P.1 clade was identified a group of P.1.7 genomes. Both P.1.4 and P.1.6 groups did not reach the minimal branch support cutoff of 80% in both statistical tests. Two P.1.7 genomes were found outside the P.1.7 group, clustering with P.1 sequences. The analysis of these two genomes showed the absence of the P.1.7 lineage defining mutation P681H for the spike sequence.

FIGURE 4

Figure 4. Maximum Likelihood tree of 1,887 unique genomes from the North region—Brazil, added to the 44 sequenced samples of this study. Pangolin lineages represented by <five genomes are labeled as “Other.”

As expected, B.1.1.33 and N.9 sequences were clustered, despite the absence of statistical validation when considering both methods together. The N.9 subgroup achieved branch support to be validated as a clade. Sequences from the lineages B.1.195, B.1.212, B.1.617.2, and B.1.111 formed their own groups in the B.1 related most external branch (>75% of statistical support by all tests). Two B.1 sequences were found inside the B.1.195 and B.1.212 groups (one in each cluster).

Phylogeny of Spike and Non-structural Proteins of Samples From Northern Brazil

Since the spike and ORF1a regions accumulated a higher number of mutations in the sequenced genomes, phylogenetic analyses were performed for these sequences, allowing to compare the clusterization of the North samples according to specific protein profiles. The best evolutionary model was predicted for each sequence alignment, as follows: spike – GTR + G4, NSP1 - TrNef, NSP2 - TIM1, NSP3 – GTR + G4, NSP4 - TrN, NSP5 - TrN, NSP6 - TIM1 + G4, NSP7 - HKY, NSP8 - TrN, NSP9 – TrN + I, and NSP10 - TIM1. In fact, the exclusion of sequence duplicates reduced the sample data set, reflecting the lack of sequence variability for some ORF1a regions. The evident difference in sequence conservation among NSPs as well as the inference of different evolutionary models denote the various selective pressures possible acting on these non-structural proteins.

Interestingly, in the spike phylogeny, the sequences are clustered according to the assigned genome lineages (Figure 5A; Supplementary File 4A). According to the spike mutational pattern, which is reflected in the phylogenetic tree of 584 unique sequences, P.1, P.1.4, P.1.6, and P.1.7 formed a monophyletic group inside the P.2 cluster. The group including P.1, P.2, and B.1.1.28 genomes showed no branch support by SH-aLRT and ultrafast bootstrap tests, which indicates it does not form a clade.

FIGURE 5

Figure 5. Phylogenetic analyses of spike and non-structural proteins from ORF1a including northern Brazilian SARS-CoV-2 genomes. (A) spike, (B) NSP1, (C) NSP3, (D) NSP5, (E) NSP6, and (F) NSP9.

For the non-structural protein 1 (NSP1 - 86 unique sequences), the P.1 sequences and their derivatives (P.1.4, P.1.6, and P.1.7) were clustered together, without statistical validation. A few random samples from different lineages such as B.1.1.28, B.1.1.33, P.2, N.9, and B.1.195 were found in this cluster (Figure 5B; Supplementary File 4B). In NSP3 phylogeny (747 unique sequences), P.1, P.1.4, P.1.5, P.1.6, and P.1.7 sequences also formed a statistically non-validated group (Figure 5C; Supplementary File 4D). B.1.195 formed a cluster not statistically well-supported. NSP5 (116 unique sequences) presented two non-statistically supported groups of P.1.6 and P.2 sequences (Figure 5D; Supplementary File 4F). For NSP6 phylogeny (140 unique sequences), P.1 sequences (and their derivatives) are mostly clustered together. However, the cluster is not statistically supported (Figure 5E; Supplementary File 4G). The same pattern was observed for P.2 sequences. A similar model was obtained for NSP9 (58 unique sequences), where most of all P.1 sequences and P.2 sequences clustered without statistical support (Figure 5F; Supplementary File 4J).

In all these phylogenies, it was observed the dispersion of lineages along the tree by the formation of some statistically well-supported groups including different lineages, which may suggest a low genetic divergence among them in the ORF1a. In this way, it was not possible to identify a sequence clustering based on the expected similarity intralineage for the non-structural proteins 2 (329 unique sequences), 4 (172 unique sequences), 7 (29 unique sequences), 8 (58 unique sequences), and 10 (38 unique sequences) (Supplementary File 4). Considering the SH-aLRT and ultrafast bootstrap cutoff values (≥80% for SH-aLRT and ≥95% for ultrafast bootstrap), the formation of a monophyletic group was only supported for P.1 lineage and their derivatives (P.1.4, P.1.6, and P.1.7) in the spike phylogenetic tree.

Molecular Evolution of Spike and Non-structural Proteins of Samples From Northern Brazil

The selection tests performed by the FUBAR method using the spike sequence alignment detected 23 sites potentially under pervasive diversifying selective pressure (Table 4). Of these, five (5, 138, 417, 681, and 1,264) were also indicated by the FEL method. Interestingly, the sites 138 and 417 were observed in MEME results for episodic diversifying selection. A number of 78 sites were found to be under purifying selective pressure by FEL, of which 18 were also predicted by SLAC. The full set of results, including FEL, MEME, and SLAC methods, is available on the Supplementary File 5.

TABLE 4

Table 4. Fast Unbiased AppRoximate Bayesian (FUBAR) analysis of pervasive diversifying selection on spike protein.

About the FUBAR identified sites, seven are largely found in P.1 genomes (as lineage-defining mutations), including in those 44 sequenced in this study. This mutation set, which comprised substitutions in sites 20, 138, 417, 484, 501, 655, and 1,176, is found in more than 90% of the P.1 genomes in the world (n = 62,863), as well in other 9,282 genomes currently assigned in lineages such as B.1, B.1.1, B.1.628, N.9, and P.1-derivatives (e.g., P.1.2, P.1.1.6, P.1.10). Mutations such as L5F are mostly found in the lineages B.1.526 and B.1.1.7 worldwide. In Brazil, its presence is majoritarily identified in P.1 lineage. Substitutions such as T29I, H49Y/L, A67S/T, T95I, E96D/G, A222V/S, G257D/S, Q613H/E, A653S/V, Q677H, S689I, G769V/R, A1174V/S, and V1264L were found in a wide range of Brazilian lineages (average mean of seven lineages per mutation, 50% with at least five lineages), despite their low frequency (average mean of 68 sequences per mutation, 50% of them with at least 13 sequences), including VOC and VOI lineages. However, the amino acid substitutions P681H/R/T present a higher frequency. The P681R mutation is found in more than 2,800 genomes and 35 lineages. The variation P681H exceeds 2,000 genomes in 21 different lineages.

In the constellation of possible substitutions occurring in the selected sites, it was observed that the mutation H49L is only found in one P.1 genome from Amazonas. About 40.30% of sequences with the prevalent mutation H49Y were identified in genomes from Amazonas and Tocantins, comprising the P.1, B.1.1.28, and B.1.195 lineages. Similarly, P681T has its unique appearance in a P.1.4 genome from Amazonas. Of the eight genomes presenting the A1174V substitution, seven are from the North region (Amapá and Pará) including P.1 and P.1.7 lineages.

In relation to the mutations identified in ORF1a by the selection methods applied, NSP1 (three sites), NSP3 (six sites), and NSP9 (two sites) achieved the higher number of sites suggested to be under adaptive pressure (Table 5). For NSP1, the sites V5A/I, R43C/H, and R77Q/L presented evidence of pervasive diversifying selection by the FUBAR analysis. Specifically, site 5 was also identified by the FEL method. Interestingly, 136 genomes from 16 lineages in the world, including P.1 and B.1.617.2, carry the V5A mutation. In Brazil, this substitution was previously detected in only one genome (in Amazonas) from the P.1.4 lineage. The V5I substitution, identified in 224 sequences in the world (two in Brazil), was detected in a set of 27 lineages; the Amapá sample, specifically, belongs to P.1. In turn, R43C is part of 348 genomes from 53 lineages. In Brazil, three genomes (two of them from Amazonas) belonging to the lineages P.1, B.1.1, and B.1.1.33 present this mutation. The variant R43H is found in 279 sequences in the world, comprising 34 lineages, of which P.1 is represented by a sample from Pará. Finally, R77Q, which was found in 280 genomes in the world and 29 lineages, has in Pará its Brazilian representative for the P.1.7 lineage. The least prevalent of the NSP1 substitutions is R77L, with 33 sequences belonging to 15 lineages. The sample from Amapá (P.1) is the only one in Brazilian territory. No sites in NSP1 were found to be under episodic selection by MEME analysis. By FEL method, additional four sites in NSP1 were also detected with evidence of purifying selection: 36, 42, 74, and 156. Of these, site 156 was identified by SLAC as well.

TABLE 5

Table 5. Fast Unbiased AppRoximate Bayesian (FUBAR) analysis of pervasive diversifying selection on non-structural proteins (NSPs) from ORF1a.

The codon 339 in the NSP2 CDS, presented two possible amino acid substitutions in our analyzed data. G339S was found in 4.0% (13 of 329 unique NSP2 sequences) of our aligned data, while G339D was identified in only one genome. Similarly, the GISAID database shows that 25,389 genomes in the world present the G339S mutation, including several lineages. In Brazil, a total of 247 sequences carry the G339S substitution in 13 lineages. By the FEL method, other three sites were suggested to be under pervasive adaptive pressure in NSP2: 136, 165, and 447. Moreover, MEME has detected episodic diversifying selection in the 165, 339, 447, and 458. All sites identified as purifying selection targets are described in the Supplementary File 5.

A total of six sites were identified to be under positive selection in NSP3. About 5.9% (44 of 746 unique sequences) of the evaluated North genomes presented the amino acid substitution T133I in the NSP3 sequence alignment. In the world, 3,306 sequences are sampled with this substitution, of which 431 are from Brazil and include lineages such as B.1.1.33, P.1, P.1.5, P.1.9, and P.7. A231V corresponds to 97 Brazilian samples (of 5,281 in the world pool) from nine different lineages in GISAID, while its frequency is about 0.9% of the unique NSP2 sequences in our sequenced data. Despite the lower occurrence, a variant A231T is also present in the analyzed genomes. Another example is K1211N/E, which is mutated in 0.4% of the unique NSP3 sequences from North genomes. The K1211N substitution is found in 50 samples from Brazil, occurring in eight lineages. In the world, 3,253 sequences carry this mutation. The P1228L/S mutations are part of 2.0% of the unique NSP3 sequences (1.2 and 0.8%, respectively). Considering 2,815 deposited genomes in Brazil (and genomes in the world), the P1228L mutation comprises 27 lineages, mostly from the AY group of B.1.617.2 derivatives. Accountable for 1% of the unique sequences, the mutations N1610K/S (0.4/0.1%) and S1670F (0.5%) are found in low numbers (n = 3/3/43) in Brazil (S1670F has a higher number at world level), despite the diversified set of lineages (2, 2, and 9, respectively). In Brazil, N1610K is only found in Amapá (n = 2) and Roraima (n = 1). The sites 231, 618, 727, 1,211, 1,303, and 1,440 were also detected by FEL/MEME methods. A number of 88 sites were suggested to be under negative selection pressure by FEL, 18 of them were also suggested by SLAC inference.

The substitutions NSP5:T93I/A (1.7/0.9%), NSP6:V149F/A/I (2.1/1.4/0.7%), NSP8:E155G/D (1.7/1.7%), NSP9:D47F (1.7%), NSP9:T77I/A (3.4/1.7%), and NSP10:T49I (5.3%) are found in one up to three unique sequences of their alignment sets. The NSP5 T93 substitution to an isoleucine residue is found in 47 genomes from Brazil (one sequence from Rondônia), including six lineages. In the world, this event is found in 1,690 GISAID genomes. However, the variant T93A only occurs in one Amapá sample from P.2, besides other 54 sequences in the world, in a set of 19 lineages. By the FEL results, the NSP5 sequence presents 15 sites under purifying selection, while the SLAC method only detected site 151. The NSP6:V149F amino acid change was identified in 16,214 genomes, of which 1,090 are from Brazil (16 lineages). The variants V149A/I are associated with a lower number of sequences and lineages when compared to the others. However, this site was also detected by the FEL method. For NSP6, the MEME method also detected evidence of episodic diversifying selection in the sites 106 and 194. Six sites (118, 120, 200, 262, 270, 284, and 286) were identified in the FEL/SLAC analysis for negative selection.

NSP7:L71F (27.6% of the unique sequences from North genomes) reaches 10,569 world samples on the GISAID database; of these, 4,317 (22 lineages) are Brazilian. A low frequency pattern can be observed in the mutated site E155G/D (NSP8). The NSP8 sequence analysis by FEL indicated seven sites under negative selection: 18, 91, 111, 121, 137, 162, and 183. For NSP9, the substitution D47F is only found in one sample from Roraima (B.1.1.33). The substitution NSP9:T77I can be located in 783 genomes, 10 of them from Brazilian samples (six lineages). In NSP9, FEL detected evidence of pervasive selection in site 33, while the MEME method detected in site 47. The sites 6, 28, 31, 42, 68, 93, 103, and 112 were inferred by FEL as submitted to negative selection, while the SLAC method detected the sites 31 and 95. In relation to NSP10:T49I, it is found in 521 genomes in the world, of which 11 (in five lineages) are from Brazil. Six sites NSP10 (30, 59, 64, 65, 66, and 83) were identified to be under purifying selective pressure by FEL.

It is important to notice that nearly all substitutions potentially maintained by adaptive pressure are found in VOC lineages such as P.1, B.1.1.7, and B.1.617.2, besides their derivatives and other lineages of interest and/or monitoring. Despite their low prevalence, the high number of lineages carrying these mutations may also indicate a positive selective pressure acting on these sites, maintaining mutational events in ORF1a.

Global Phylogenomic Analyses

The phylogenetic analysis of 4,952 worldwide P.1 and B.1.1.28 genomes with the 44 genome sequences of this study (and the NC_045512.2 SARS-CoV-2 reference genome) showed the formation of a P.1 group with the B.1.1.28 sequences in the basal branches. One additional B.1.1.28 sequence from Turkey was found inside the P.1 group. However, the analysis of the genomic mutations suggest that this sequence probably belongs to the P.1 lineage due to the presence of substitutions such as L18F, T20N, P26S, E484K, N501Y, and H655Y. Interestingly, this genome is one of the two B.1.1.28 genomes from GISAID presenting the mutation N440K in the spike protein. The other one, also from Turkey, presents this same mutation set and is located at the basis of the P.1 group, separately clustering with a P.1 genome from South Korea also presenting this mutation (Figure 6).

FIGURE 6

Figure 6. Collapsed tree representing the assigned lineages of the 4,997 subsampled genomes from global phylogeny.

Despite only P.1 and B.1.1.28 sequences have been selected on GISAID (at September 12, 2021), six genomes from Turkey were posteriorly reassigned (at September 28, 2021) as B.1, corroborating the formation of a basal clade (Figure 6). The large monophyletic group formed by B.1.1.28 and P.1 genomes is validated in both tests, evidencing the ancestor-descendent relationship between B.1.1.28 (ancestor) and P.1 (descendant) lineages. As expected, the 44 sequenced genomes from the Amazonas were in the P.1 cluster (Figure 7), despite their location in different subgroups along the tree.

FIGURE 7

Figure 7. Global phylogeny of 4,997 subsampled genomes from P.1 and B.1.1.28 lineages.

Eighteen sequences (38,966, 39,448, 39,454, 39,432, 39,443, 39,438, 39,428, 39,436, 39,452, 39,440, 39,465, 39,437, 38,967, 39,444, 39,456, 39,433, 39,449, and 39,450) grouped with Brazilian genomes. Eight of them (39,432, 39,443, 39,436, 39,452, 39,431, 39,433, 39,449, and 39,450) only with Amazonas genomes (Figure 8). Only four sequences (39,431, 39,433, 39,449, and 39,450) were found in statistically validated clades with the subsampled genomes. The genome 39,431 (presenting the double deletion of the spike A243–L244 residues and the absence of the lineage-defining mutations ORF1a:S1188L, ORF1b:E1264D, S:T20N, R190S, S:H655Y, and S:T1027I) formed a fully supported clade with a genome from Amazonas. The monophyletic group with the sequences 39,433, 39,449, and 39,450 (that share mutations such as ORF1a:T951I, A3333V, V3349I, and A3620V) and Brazilian genomes from Amazonas was supported by the SH-aLRT and ultrafast bootstrap tests. The sequences 38,969 and 39,459 (sharing two ORF1b substitutions: I539V and E2221D) form their own clade.

FIGURE 8

Figure 8. Representative subtree of the 44 sequenced samples of this study and their nearest grouped neighbors inside the P.1 clade. The branch support values are represented by the SH-aLRT and ultrafast bootstrap percentages ≥70.

Other clusters were also found despite the absence of statistical evidence as monophyletic groups by both tests (only the SH-aLRT test achieved values above the cutoff of 80%). The closest neighbors of the genomes 38,969 and 39,459 are Brazilian genomes from Rondônia and Santa Catarina, as well as a genome from Ireland. A group between the genomes 39,448, 39,454 (that share a L1504F substitution in ORF1b) and a Brazilian sequence from Rio de Janeiro is found. The sequence 39,443 was in a group with two Amazonas genomes. The genome 39,438 clustered with a Brazilian sequence from Rio Grande do Sul. The sequence 39,436 formed a group with a Brazilian genome from Amazonas. The sequence 39,452 was also grouped with an Amazonas genome. The sequence 39,447 (presenting the substitutions T2183I, H3076Y, G3676S, and F3677L in ORF1a) grouped with genomes from Brazil (Pará, Pernambuco, and Amazonas), Argentina, Chile, and Germany. Inside the P.1 group, the sequence 39,442 formed a separate cluster in relation to the previously described sequences, clustering with genomes from Brazil (Amazonas, Acre, Rio de Janeiro, Minas Gerais, and Pará), Switzerland, Scotland, and Philippines.

The genomes 39,457, 39,446, 39,435, 39,441, 39,430, 39,464, 39,439, 39,467, and 39,462 grouped as external branches, unrelated to other subsampled genomes. The genomes 39,451, 39,432, 39,463, 39,445 (including the residue alterations ORF1a:F3677L and ORF1b:P1682H), 39,434, 39,458, and 39,460 (carrying the mutations S:A688V and ORF3a:P104S), 39,440 (presenting all the P.1 lineage-defining mutations added to the substitutions ORF1a:S2255F and F3677L), 39,465, 38,968 and 39,461 (sharing the mutations ORF1a:T951I and ORF3a:P25L), 39,437, 39,466, 39,429 and 39,453 formed clusters with branch support values below 80% for both tests.

Discussion

P.1 (Gamma) lineage is arguably the most concerning VOC emerged thus far. Originally described on January 6, Japan, from four returning travelers from Manaus, the lineage exhibited accelerated dissemination from December 2020. Thirty-five SNPs, including ten located at the spike, were initially characterized (41). We have demonstrated signs of diversification from the original P1. More specifically, we have found mutations that were either rare in this lineage (such as deletions at sites 188 and 189 coupled with substitutions at site 190 and deletions at 243 and 244) or previously not described at all (P209H), all located at the spike NTD. Despite this site has not been the target of much attention as RBD, it is in fact a hotspot for mutations, including “indels” and other missense substitutions, which may affect the activity of neutralizing antibodies and/or lead fitness recovery activity (42, 43). Deletions at amino acids 243 and 244 are common in B.1.351 (Beta), but are infrequent in worldwide P.1 (0.06%). These mutations occur at the exposed antigenic supersite from NTD and abolish the activity of some anti-NTD neutralizing antibodies (e.g., 4A8) (44). Besides being rare to date, to our knowledge, our sequences are the first to describe the presence of 243–244 del in P.1 lineage. The P209H substitution replaces a proline, which has a rigid impact on peptide structure, for a positively charged, imidazole-containing, histidine. We project that this could have an effect on the resulting NTD structure as well. Since ~10–50% of neutralizing antibodies are directed toward NTD, it is noteworthy that we have found additional mutations at this site (44). We hypothesized that they could have had a critical role in allowing the emergence of a new lineage (P.1) despite a populational background of 75% previous seroprevalence. Similarly, S: T1066A is a low frequent mutation between the heptad repeat (HR1 and HR2) regions of the spike subunit 2, which was previously found in lineages such as B.1.258.10 and B.1.1.7, being firstly detected in the Brazilian territory in our P.1 genome. However, we were not able to demonstrate that any of these newly described mutations were being actively selected at this moment. On the other hand, canonical substitutions present in the P.1 lineage, such as E484K, K417T, N501Y, and H655Y were shown to be positively selected. E484K is important in immune evasion, particularly against class II anti-RBD antibodies, while K417T may be strongly related to immune evasion against class I anti-RBD antibodies (44). N501Y has minimal impact on antigenic evasion but results in an important increase in binding affinity to hACE-2 (4, 45). Substitutions near the Furin Binding Site (FCS), such as H655Y, may augment FCS activity, further increasing infectivity or may be involved in dynamics S1 changes through increased hydrophobicity in the Gear-like Domain (46). The combination of the known phenotypic effects of these mutations with our positive selection results strongly support the idea that P.1 emergence is the result of intense selective pressure, which also explains the appearance of mutations at these sites in different lineages (convergent evolution).

Substitutions in genome regions outside the S gene may be critical for the infection severity, with potential impact on the modulation of innate immune response. For example, ORF9b, which is an alternative open reading frame within the nucleocapsid (N) gene, may have a important role in SARS-CoV-2-human interactions and has been demonstrated to significantly inhibit the IFN-I production as a result of targeting mitochondria. Moreover, the analyses of the sera of convalescent SARS-CoV or SARS-CoV-2 patients showed antibodies against ORF9b (47, 48). It is interesting to note that mutations at this site were found in most of our sequences.

In relation to the non-structural proteins from ORF1a, only NSP4 did not present evidence of sites under diversifying selection pressure. In NSP1, none of the sites involved in the translation inhibition (first and second C-terminal helix residues 153–160/166–178) (49) or host mRNA cleavage (R124 and K125) (50) were detected by the positive selection tests. However, site 156 was found to be under purifying selection pressure by FEL and SLAC methods. The NSP3 residue 1,031, belonging to the canonical cysteine protease catalytic triad (related to the Papain-like viral protease site D286) on the active site (51) was also detected by FEL and SLAC, which indicates it is submitted to negative selection. Coronaviral proteases as Papain-like protease (PLpro) are essential for viral replication and modulate host immune response to viral infection (51), being known as an IFN antagonist (52). All positively selected sites identified by the FUBAR method are associated with NSP3 interdomain regions. Site 618, detected as positively selected by FEL and MEME, belongs to the Single-stranded poly (A) binding domain (SUD-M), and is involved in the recognition of Guanine-rich non-canonical nucleic acid structures (G-quadruplexes or G4), being essential for SARS-CoV replication (53); whereas site 727, also positively selected, is part of the Coronavirus polyprotein cleavage domain (Nsp3_PL2pro). At NSP5 C-terminal autocleavage sequence (S301–Q306) (54), the residue T304 is under purifying selection. Both NSP7 and NSP8 have essential functions for the RNA synthesis machinery orchestrated by the NSP12 RNA-dependent RNA polymerase. In NSP7, evidence of positive selection was found for site 71, located at the interface between helix α1 and α4 in the NSP7–NSP8 complex (55). The NSP12 residues contact the NSP8 N-terminal region (77–126) where the primase activity lies. Of these, sites 91, 111, and 121 were identified to be under purifying selection. Specifically, site 91 from NSP8 helix α1 is involved in the NSP7–NSP8 complex formation (54, 56). NSP9 sites 6, 31, 103 detected under negative selection interact with peptides by hydrogen and Van der Waals bonds at the hydrophobic cavity in a region close to the dimer interface, with potential impact on the juxtapositioning of the monomers within the homodimer (57). In NSP10, site 83 in the zinc binding site located between the helices α2 and α3 (58) is under purifying selection, possibly due to its importance to stabilize the structural conformation.

All sites indicated as negatively selected in our study probably have an important role for the viral infection mechanisms, consequently being conserved among SARS-CoV-2 genomes. However, our results also indicated that positive selection is leading to an increase in amino acid variability in some viral sites, which results in an amplified potential to viral adaptation and evolutionary success. While a first and important mechanism for viral evolution is the action of positive selection, another possible mechanism for rapid acquisition of new substitutions would be the occurrence of “copy choice recombinations,” a well-described mechanism of recombination for coronavirus, which displays discontinuous replication process, favoring frame shifting. In fact, recombination between B.1.1.7 (Alpha) and other lineages have been characterized (59), but the evidence of the process of selecting new lineages is scarce. However, we were not able to find consistent evidence for copy choice recombination between full genomes from northern Brazil. This process could have explained the fast acquisition of a plethora of distinctive mutations and the apparent lack of recombination in our study strengthens the hypothesis of intense and continuous evolution under strong selective forces. Alternatively, additional substitutions found in our P.1 sequences may be simply the consequence of a matter of chance related, for instance, to “bottlenecks” or other evolutionary constraints acting as “drifters.”

In the phylogenomic analysis of north samples, it was possible to observe the formation of monophyletic groups for P.1 and P.2 genomes (E484K-presenting lineages derivative from B.1.1.28), as well as for the N.9 lineage (E484K presenting lineage derivative from B.1.1.3). Similarly, the phylogenetic analysis of spike sequences showed that P.1 genomes formed a clade. In the global phylogenomics, B.1.1.28 and P.1 were recognized as a clade, exposing their ancestor-descendent relationship. The clade formation for P.1 genomes in the phylogenomic analysis of north samples, as well as in the spike sequences, indicates the importance of the spike protein for SARS-CoV-2 evolution and lineage definition of these genomes. Sequences of non-structural proteins from ORF1a accumulated multiple substitutions without evidence of diversifying selection pressure in these sites. Considering the limitations related to a subsampled global phylogeny analysis, it is not possible to know if the absence of well-supported monophyletic groups is due to the missing branches and nodes representing the most related sequences and their ancestors or some technical limitation inherent to the application of the branch support statistical tests to the SARS-CoV-2 genomic sequences, since bootstrapping approaches require multiple sites supporting a clade to infer strong support value in near-perfect trees (60). In fact, SARS-CoV-2 genomes present a low number of informative sites, which may generate topology with low statistical support and ambiguous clustering of large data sets (61). Additionally, both SH-aLRT and ultrafast bootstrap methods rely on bootstrap resampling (60). Considering all aspects discussed above and the divergence among sequences, it is expected that the P.1 genomes (especially in the spike sequence analysis) cluster together and phylogenetic trees including multiple lineages present better statistical support values inter-lineages in comparison with intra-lineages.

In summary, the association of diversification of P.1 sequences, the known phenotypic consequences of some signature mutations, the confirmation of positive selection acting on some sites and the absence of evidence for recombinations, all suggest that the main driving force in the evolution of P.1 viruses was selective pressure.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.gisaid.org/, EPI_ISL_3149974 up to EPI_ISL_3150017.

Ethics Statement

The studies involving human participants were reviewed and approved by Comitê de Ética em Pesquisa em Seres Humanos da Universidade Federal de Ciências da Saúde de Porto Alegre (CEP - UFCSPA) under process number CAAE 35083220.2.0000.5345. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

CT and RZ: conceived the study. CT and PF: designed and performed the formal bioinformatics analyses. CT, PF, and RZ: interpretation of genomic data. CT and LR: resources, supervision, project administration related to the molecular, and bioinformatics analyses. AS and DF: resources related to the clinical samples. AG, CW, FC, RZ, and ZR: funding acquisition. CT, PF, and RZ: wrote the original draft. All authors contributed to the final reviewed manuscript, contributed to the article, and approved the submitted version.

Funding

The genome sequencing was funded by Kintor Pharmaceuticals, Ltd. PF fellowship is supplied by the Universidade Federal de Ciências da Saúde de Porto Alegre (UFCSPA).

Conflict of Interest

ZR is an employee of Kintor Pharmaceuticals, Ltd. AG is an employee of Applied Biology, Inc. FC has served as a clinical director for Applied Biology, Inc. CW has served as an advisor to Applied Biology, Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We thank the administrators of the GISAID database and research groups across the world for supporting the rapid and transparent sharing of genomic data during the COVID-19 pandemic.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2022.806611/full#supplementary-material

Supplementary File 1. Sequencing depth and coverage plots for the forty-four sequenced genomes from this study.

Supplementary File 2. Frequency of mutations identified in the P.1 sequenced genomes (this study) compared with sequences from Amazonas, Brazil and World (up to September 12, 2021) available on GISAID.

Supplementary File 3. Collection dates (first occurrence) on GISAID database for SARS-CoV-2 mutations identified in the forty-four sequenced genomes from this study.

Supplementary File 4. Phylogenetic analysis of spike and non-structural proteins from ORF1a. (A) Spike, (B) NSP1, (C) NSP2, (D) NSP3, (E) NSP4, (F) NSP5, (G) NSP6, (H) NSP7, (I) NSP8, (J) NSP9, and (K) NSP10. Branch support values below 70 for both SH-aLRT and ultrafast bootstrap tests were not labeled in the trees.

Supplementary File 5. Sites under adaptive or purifying pressure according to the HyPhy tests FUBAR, FEL, MEME, and SLAC.

Supplementary File 6. Authors acknowledgement lists for GISAID genomes used in the genomic and phylogenomic analyses.

References

1. Franceschi VB, Ferrareze PAG, Zimerman RA, Cybis GB, Thompson CE. Mutation hotspots, geographical and temporal distribution of SARS-CoV-2 lineages in Brazil, February 2020 to February 2021. Virus Res. (2021) 304:198532. doi: 10.1016/j.virusres.2021.198532

PubMed Abstract | CrossRef Full Text | Google Scholar

2. Golubchik T, Lythgoe KA, Hall M, Ferretti L, Fryer HR, MacIntyre-Cockett G . Early Analysis of a Potential Link Between Viral Load the N501Y Mutation in the SARS-COV-2 Spike Protein. Cold Spring Harbor Laboratory. (2021). Available online atat: http://dx.doi.org/10.1101/2021.01.12.20249080 (accessed July 19, 2021).

Google Scholar

3. Ferrareze PAG, Franceschi VB, Mayer A de M, Caldana GD, Zimerman RA, Thompson CE. E484K as an innovative phylogenetic event for viral evolution: genomic analysis of the E484K spike mutation in SARS-CoV-2 lineages from Brazil. Infect Genet Evol. (2021) 93:104941. doi: 10.1016/j.meegid.2021.104941

PubMed Abstract | CrossRef Full Text | Google Scholar

4. Nelson G, Buzko O, Spilman P, Niazi K, Rabizadeh S, Soon-Shiong P. Molecular Dynamic Simulation Reveals E484K Mutation Enhances Spike RBD-ACE2 Affinity the Combination of E484K, K417N N501Y Mutations (501Y.V2 variant) Induces Conformational Change Greater Than N501Y Mutant Alone, Potentially Resulting in an Escape Mutant. Cold Spring Harbor Laboratory. (2021). Available online at: http://dx.doi.org/10.1101/2021.01.13.426558 (accessed July 19, 2021).

Google Scholar

5. Ferrareze PAG, Zimerman RA, Franceschi VB, Thompson CE. Molecular evolution and structural analyses of the spike glycoprotein from Brazilian SARS-CoV-2 genomes: the impact of the fixation of selected mutations. Cold Spring Harbor Lab. (2021) 2021.07.16.452571. doi: 10.1101/2021.07.16.452571

CrossRef Full Text | Google Scholar

6. Fourati S, Decousser J-W, Khouider S, N'Debi M, Demontant V, Trawinski E, et al. Novel SARS-CoV-2 Variant Derived from Clade 19B, France. Emerging Infectious Diseases Journal. Available online at: https://wwwnc.cdc.gov/eid/article/27/5/21-0324_article#suggestedcitation (accessed July 15, 2021).

Google Scholar

7. Guo K, Barrett BS, Mickens KL, Hasenkrug KJ, Santiago ML. Interferon Resistance of Emerging sars-cov-2 Variants. Cold Spring Harbor Laboratory. (2021). Available online at: http://dx.doi.org/10.1101/2021.03.20.436257 (accessed October 29, 2021).

Google Scholar

8. Naveca FG, Nascimento V, de Souza VC, Corado A de L, Nascimento F, Silva G, et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nat Med. (2021) 27:1230–8. doi: 10.1038/s41591-021-01378-7

PubMed Abstract | CrossRef Full Text | Google Scholar

9. Chen P, Nirula A, Heller B, Gottlieb RL, Boscia J, Morris J, et al. SARS-CoV-2 neutralizing antibody LY-CoV555 in outpatients with Covid-19. N Engl J Med. (2021) 384:229–37. doi: 10.1056/NEJMoa2029849

PubMed Abstract | CrossRef Full Text | Google Scholar

10. Vakulenko Y, Deviatkin A, Drexler JF, Lukashev A. Modular evolution of coronavirus genomes. Viruses. (2021) 13:1270. doi: 10.3390/v13071270

PubMed Abstract | CrossRef Full Text | Google Scholar

11. Tegally H, Wilkinson E, Giovanetti M, Iranzadeh A, Fonseca V, Giandhari J, et al. Detection of a SARS-CoV-2 variant of concern in South Africa. Nature. (2021) 592:438–43. doi: 10.1038/s41586-021-03402-9

PubMed Abstract | CrossRef Full Text | Google Scholar

12. Corman VM, Landt O, Kaiser M, Molenkamp R, Meijer A, Chu DK, et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance. (2020) 25:2000045. doi: 10.2807/1560-7917.ES.2020.25.3.2000045

PubMed Abstract | CrossRef Full Text | Google Scholar

13. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. (2012) 9:357–9. doi: 10.1038/nmeth.1923

PubMed Abstract | CrossRef Full Text | Google Scholar

14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. (2009) 25:2078–9. doi: 10.1093/bioinformatics/btp352

PubMed Abstract | CrossRef Full Text | Google Scholar

15. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. (2011) 27:2987–93. doi: 10.1093/bioinformatics/btr509

PubMed Abstract | CrossRef Full Text | Google Scholar

16. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. (2013) 30:772–80. doi: 10.1093/molbev/mst010

PubMed Abstract | CrossRef Full Text | Google Scholar

17. Louis. Laduplessis/SARS-CoV-2_Guangdong_Genomic_Epidemiology: Initial Release Zenodo (2020). doi: 10.5281/zenodo.3922606

CrossRef Full Text | Google Scholar

18. Okonechnikov K, Golosova O, Fursov M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. (2012) 28:1166–7. doi: 10.1093/bioinformatics/bts091

PubMed Abstract | CrossRef Full Text | Google Scholar

19. Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. RDP4: detection and analysis of recombination patterns in virus genomes. Virus Evol. (2015) 1:vev003. doi: 10.1093/ve/vev003

PubMed Abstract | CrossRef Full Text | Google Scholar

20. Martin D, Rybicki E. RDP: detection of recombination amongst aligned sequences. Bioinformatics. (2000) 16:562–3. doi: 10.1093/bioinformatics/16.6.562

PubMed Abstract | CrossRef Full Text | Google Scholar

21. Padidam M, Sawyer S, Fauquet CM. Possible emergence of new geminiviruses by frequent recombination. Virology. (1999) 265:218–25. doi: 10.1006/viro.1999.0056

PubMed Abstract | CrossRef Full Text | Google Scholar

22. Posada D, Crandall KA. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci. (2001) 98:13757–62. doi: 10.1073/pnas.241370698

PubMed Abstract | CrossRef Full Text | Google Scholar

23. Maynard Smith J. Analyzing the mosaic structure of genes. J Mol Evol. (1992) 34:126–9. doi: 10.1007/BF00182389

PubMed Abstract | CrossRef Full Text | Google Scholar

24. Lam HM, Ratmann O, Boni MF. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Mol Biol Evol. (2018) 35:247–51. doi: 10.1093/molbev/msx263

PubMed Abstract | CrossRef Full Text | Google Scholar

25. Martin DP, Posada D, Crandall KA, Williamson C. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retrov. (2005) 21:98–102. doi: 10.1089/aid.2005.21.98

PubMed Abstract | CrossRef Full Text | Google Scholar

26. Gibbs MJ, Armstrong JS, Gibbs AJ. Sister-Scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics. (2000) 16:573–82. doi: 10.1093/bioinformatics/16.7.573

PubMed Abstract | CrossRef Full Text | Google Scholar

27. Holmes EC, Worobey M, Rambaut A. Phylogenetic evidence for recombination in dengue virus. Mol Biol and Evol. (1999) 16:405–9. doi: 10.1093/oxfordjournals.molbev.a026121

PubMed Abstract | CrossRef Full Text | Google Scholar

28. Weiller GF. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol Biol Evol. (1998) 15:326–35. doi: 10.1093/oxfordjournals.molbev.a025929

PubMed Abstract | CrossRef Full Text | Google Scholar

29. Jia L, Hu F, Li H, Li L, Tang X, Liu Y, et al. Characterization of small genomic regions of the hepatitis B virus should be performed with more caution. Virol J. (2018) 15:1–8. doi: 10.1186/s12985-018-1100-x

PubMed Abstract | CrossRef Full Text | Google Scholar

30. Fei D, Guo Y, Fan Q, Wang H, Wu J, Li M, et al. Phylogenetic and recombination analyses of two deformed wing virus strains from different honeybee species in China. PeerJ. (2019) 7:e7214. doi: 10.7717/peerj.7214

PubMed Abstract | CrossRef Full Text | Google Scholar

31. Babin Y. Recan: python tool for analysis of recombination events in viral genomes. J Open Source Softw. (2020) 5:2014. doi: 10.21105/joss.02014

CrossRef Full Text | Google Scholar

32. Darriba D, Posada D, Kozlov AM, Stamatakis A, Morel B, Flouri T. ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol Biol Evol. (2019) 37:291–4. doi: 10.1093/molbev/msz189

PubMed Abstract | CrossRef Full Text | Google Scholar

33. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. (2014) 32:268–74. doi: 10.1093/molbev/msu300

PubMed Abstract | CrossRef Full Text | Google Scholar

34. Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics. (2005) 21:676–9. doi: 10.1093/bioinformatics/bti079

PubMed Abstract | CrossRef Full Text | Google Scholar

35. Murrell B, Moola S, Mabona A, Weighill T, Sheward D, Kosakovsky Pond SL, et al. FUBAR: a fast, unconstrained bayesian approximation for inferring selection. Mol Biol Evol. (2013) 30:1196–205. doi: 10.1093/molbev/mst030

PubMed Abstract | CrossRef Full Text | Google Scholar

36. Kosakovsky Pond SL, Frost SDW. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. (2005) 22:1208–22. doi: 10.1093/molbev/msi105

PubMed Abstract | CrossRef Full Text | Google Scholar

37. Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Kosakovsky Pond SL. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. (2012) 8:e1002764. doi: 10.1371/journal.pgen.1002764

PubMed Abstract | CrossRef Full Text | Google Scholar

38. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. (2018) 34:4121–3. doi: 10.1093/bioinformatics/bty407

PubMed Abstract | CrossRef Full Text | Google Scholar

39. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. (2010) 59:307–21. doi: 10.1093/sysbio/syq010

PubMed Abstract | CrossRef Full Text | Google Scholar

40. Minh BQ, Nguyen MAT, von Haeseler A. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol. (2013) 30:1188–95. doi: 10.1093/molbev/mst024

PubMed Abstract | CrossRef Full Text | Google Scholar

41. Faria NR, Mellan TA, Whittaker C, Claro IM, Candido D da S, Mishra S, et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science. (2021) 372:815–21. doi: 10.1126/science.abh2644

PubMed Abstract | CrossRef Full Text | Google Scholar

42. McCallum M, De Marco A, Lempp FA, Tortorici MA, Pinto D, Walls AC, et al. N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. Cell. (2021) 184:2332–47.e16. doi: 10.1016/j.cell.2021.03.028

PubMed Abstract | CrossRef Full Text | Google Scholar

43. Kemp SA, Meng B, Ferriera IA, Datir R, Harvey WT, Papa G . Recurrent Emergence Transmission of a SARS-CoV-2 Spike Deletion H69/V70. Cold Spring Harbor Laboratory. (2020). Available online at: http://dx.doi.org/10.1101/2020.12.14.422555 (accessed October 29, 2021).

Google Scholar

44. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. (2021) 19:409–24. doi: 10.1038/s41579-021-00573-0

PubMed Abstract | CrossRef Full Text | Google Scholar

45. Tian F, Tong B, Sun L, Shi S, Zheng B, Wang Z, et al. N501Y Mutation of Spike Protein in SARS-CoV-2 Strengthens Its Binding to Receptor ACE2. eLife Sciences Publications, Ltd (2021) 10:e69091. doi: 10.7554/eLife.69091

PubMed Abstract | CrossRef Full Text | Google Scholar

46. González-Puelma J, Aldridge J, Montes de Oca M, Pinto M, Uribe-Paredes R, Fernández-Goycoolea J, et al. Mutation in a SARS-CoV-2 haplotype from sub-antarctic chile reveals new insights into the spike's dynamics. Viruses. (2021) 13:883. doi: 10.3390/v13050883

PubMed Abstract | CrossRef Full Text | Google Scholar

47. Jiang H-W, Zhang H-N, Meng Q-F, Xie J, Li Y, Chen H, et al. SARS-CoV-2 Orf9b suppresses type I interferon responses by targeting TOM70. Cell Mol Immunol. (2020) 17:998–1000. doi: 10.1038/s41423-020-0514-8

PubMed Abstract | CrossRef Full Text | Google Scholar

48. Gao X, Zhu K, Qin B, Olieric V, Wang M, Cui S. Crystal structure of SARS-CoV-2 Orf9b in complex with human TOM70 suggests unusual virus-host interactions. Nat Commun. (2021) 12:1–9. doi: 10.1038/s41467-021-23118-8

PubMed Abstract | CrossRef Full Text | Google Scholar

49. Schubert K, Karousis ED, Jomaa A, Scaiola A, Echeverria B, Gurzeler L-A, et al. SARS-CoV-2 Nsp1 binds the ribosomal mRNA channel to inhibit translation. Nat Struct Mol Biol. (2020) 27:959–66. doi: 10.1038/s41594-020-0511-8

PubMed Abstract | CrossRef Full Text | Google Scholar

50. Min Y-Q, Mo Q, Wang J, Deng F, Wang H, Ning Y-J. SARS-CoV-2 nsp1: bioinformatics, potential structural and functional features, and implications for drug/vaccine designs. Front Microbiol. (2020) 11:587317. doi: 10.3389/fmicb.2020.587317

PubMed Abstract | CrossRef Full Text | Google Scholar

51. Osipiuk J, Azizi S-A, Dvorkin S, Endres M, Jedrzejczak R, Jones KA, et al. Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors. Nat Commun. (2021) 12:1–9. doi: 10.1038/s41467-021-21060-3

PubMed Abstract | CrossRef Full Text | Google Scholar

52. Shin D, Mukherjee R, Grewe D, Bojkova D, Baek K, Bhattacharya A, et al. Papain-like protease regulates SARS-CoV-2 viral spread and innate immunity. Nature. (2020) 587:657–62. doi: 10.1038/s41586-020-2601-5

PubMed Abstract | CrossRef Full Text | Google Scholar

53. Lavigne M, Helynck O, Rigolet P, Boudria-Souilah R, Nowakowski M, Baron B, et al. SARS-CoV-2 Nsp3 unique domain SUD interacts with guanine quadruplexes and G4-ligands inhibit this interaction. Nucleic Acids Res. (2021) 49:7695–712. doi: 10.1093/nar/gkab571

PubMed Abstract | CrossRef Full Text | Google Scholar

54. Lee J, Worrall LJ, Vuckovic M, Rosell FI, Gentile F, Ton A-T, et al. Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site. Nat Commun. (2020) 11:1–9. doi: 10.1038/s41467-020-19662-4

PubMed Abstract | CrossRef Full Text | Google Scholar

55. Biswal M, Diggs S, Xu D, Khudaverdyan N, Lu J, Fang J, et al. Two conserved oligomer interfaces of NSP7 and NSP8 underpin the dynamic assembly of SARS-CoV-2 RdRP. Nucleic Acids Res. (2021) 49:5956–66. doi: 10.1093/nar/gkab370

PubMed Abstract | CrossRef Full Text | Google Scholar

56. Kirchdoerfer RN, Ward AB. Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat Commun. (2019) 10:1–9. doi: 10.1038/s41467-019-10280-3

PubMed Abstract | CrossRef Full Text | Google Scholar

57. Littler DR, Gully BS, Colson RN, Rossjohn J. Crystal structure of the SARS-CoV-2 non-structural protein 9, Nsp9. iScience. (2020) 23:101258. doi: 10.1016/j.isci.2020.101258

PubMed Abstract | CrossRef Full Text | Google Scholar

58. Krafcikova P, Silhan J, Nencka R, Boura E. Structural analysis of the SARS-CoV-2 methyltransferase complex involved in RNA cap creation bound to sinefungin. Nat Commun. (2020) 11:1–7. doi: 10.1038/s41467-020-17495-9

PubMed Abstract | CrossRef Full Text | Google Scholar

59. Rambaut A. Recombinant SARS-CoV-2 genomes involving lineage B.1.1.7 in the UK. Virological. (2021). Available online at: https://virological.org/t/recombinant-sars-cov-2-genomes-involving-lineage-b-1-1-7-in-the-uk/658 (accessed September 30, 2021).

Google Scholar

60. Wertheim JO, Steel M, Sanderson MJ. Accuracy in near-perfect virus phylogenies. Syst Biol. (2021) syab069. doi: 10.1101/2021.05.06.442951

PubMed Abstract | CrossRef Full Text | Google Scholar

61. Bukin YS, Bondaryuk AN, Kulakova NV, Balakhonov SV, Dzhioev YP, Zlobin VI. Phylogenetic reconstruction of the initial stages of the spread of the SARS-CoV-2 virus in the Eurasian and American continents by analyzing genomic data. Virus Res. (2021) 305:198551. doi: 10.1016/j.virusres.2021.198551

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: COVID-19, SARS-CoV-2, P.1, VOC, genomic epidemiology, phylogenomics, spike, NSP

Citation: Zimerman RA, Ferrareze PAG, Cadegiani FA, Wambier CG, Fonseca DdN, de Souza AR, Goren A, Rotta LN, Ren Z and Thompson CE (2022) Comparative Genomics and Characterization of SARS-CoV-2 P.1 (Gamma) Variant of Concern From Amazonas, Brazil. Front. Med. 9:806611. doi: 10.3389/fmed.2022.806611

Received: 01 November 2021; Accepted: 12 January 2022;
Published: 15 February 2022.

Edited by:

Sanjay Kumar, Armed Forces Medical College Pune, India

Reviewed by:

Teemu Smura, University of Helsinki, Finland
Stephen S.-T. Yau, Tsinghua University, China

Copyright © 2022 Zimerman, Ferrareze, Cadegiani, Wambier, Fonseca, de Souza, Goren, Rotta, Ren and Thompson. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zhihua Ren, emhyZW5Aa2ludG9yLmNvbS5jbg==; Claudia Elizabeth Thompson, Y3Rob21wc29uQHVmY3NwYS5lZHUuYnI=; dGhvbXBzb24udWZjc3BhQGdtYWlsLmNvbQ==

^†These authors share senior authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.