AUTHOR=Lin Yan-Jun , Ding Xiao-Ya , Huang Yi-Wei , Lu Lu TITLE=First De Novo genome assembly and characterization of Gaultheria prostrata JOURNAL=Frontiers in Plant Science VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1456102 DOI=10.3389/fpls.2024.1456102 ISSN=1664-462X ABSTRACT=

Gaultheria Kalm ex L. (Ericaceae), a type of evergreen shrub, known as a natural source of methyl salicylate, possesses rich germplasm resources, strong habitat adaptability, significant ornamental value, and noteworthy pharmacological activities. However, due to the paucity of whole genomic information, genetically deep research in these areas remains limited. Consequently, we intend to obtain genome data through high-throughput sequencing, gene annotation, flow cytometry, transcription factors prediction and genetic marker analysis for a representative species of this genus, with Gaultheria prostrata selected for our study. In this study, we preliminarily obtained the genome of G. prostrata through next-generation sequencing methods. Utilizing 47.94 Gb of high-quality sequence data (108.95× coverage), assembled into 114,436 scaffolds, with an N50 length of 33,667 bp. The genome size assembled by SOAPdenovo, approximately 417 Mb, corresponded closely to predictions by flow cytometry (440 Mb) and k-mer analysis (447 Mb). The genome integrity was evaluated using BUSCO with 91%. The heterozygosity ratio was 0.159%, the GC content was 38.85%, and the repetitive regions encompassed over 34.6% of the genome. A total of 26,497 protein-coding genes have been predicted and annotated across Nr, Swissprot, GO, KEGG, and Pfam databases. Among these, 14,377 and 2,387 genes received functional annotation in Nr and Swissprot, respectively; 21,895, 24,424, and 22,330 genes were similarly annotated in GO, KEGG, and Pfam. Moreover, A total of 279,785 SSRs were identified and 345,270 primers for these SSRs were designed. Within the various nucleotide types of SSRs, AG/CT and AAG/CTT constituted the predominant dinucleotide and trinucleotide repeat types in G. prostrata. In addition, 1,395 transcription factors (TFs) from 75 TF families, 462 transcription regulators (TRs) from 33 TR families and 840 protein kinase (PKs) from 118 PK families were identified in this genome. We also performed phylogenetic analyses of G. prostrata and related species, including estimation of divergence times and expansion and contraction analyses, followed by positive selection analyses of orthologous gene pairs of G. prostrata and its close relative Vaccinium corymbosum. These results provide a reference for in-depth study of genus Gaultheria, contributing to future functional and comparative genomics analyses and providing supporting data for the development of molecular markers.