AUTHOR=Couto Evellyn G. O. , Chaves Saulo F. S. , Dias Kaio Olimpio G. , Morales-Marroquín Jonathan A. , Alves-Pereira Alessandro , Motoike Sérgio Yoshimitsu , Colombo Carlos Augusto , Zucchi Maria Imaculada TITLE=Training set optimization is a feasible alternative for perennial orphan crop domestication and germplasm management: an Acrocomia aculeata example JOURNAL=Frontiers in Plant Science VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1441683 DOI=10.3389/fpls.2024.1441683 ISSN=1664-462X ABSTRACT=

Orphan perennial native species are gaining importance as sustainability in agriculture becomes crucial to mitigating climate change. Nevertheless, their undomesticated status and the lack of improved germplasm impede the development of formal agricultural initiatives. Acrocomia aculeata, a neotropical palm with potential for oil production, is an example. Breeding efforts can help the species reach its full potential and increase its market competitiveness. Here, we present genomic information and training set optimization as alternatives to boost the breeding of orphan perennial native species, using A. aculeata as an example. Furthermore, we compare three SNP calling methods and, for the first time, present the prediction accuracies of three yield-related traits. We collected data for two years from 201 wild individuals. These trees were genotyped, and three references were used for SNP calling: the oil palm genome, de novo sequencing, and the A. aculeata transcriptome. The traits analyzed were fruit dry mass (FDM), pulp dry mass (PDM), and pulp oil content (OC). We compared the predictive ability of GBLUP and BayesB models in cross-validation and real validation procedures. Afterwards, we tested several optimization criteria for consistency and for their ability to provide an optimized training set that carried less risk in both targeted and untargeted scenarios. Combining the oil palm genome as a reference with GBLUP models gave better results for the genomic prediction of FDM, OC, and PDM (prediction accuracies of 0.46, 0.45, and 0.39, respectively). The PEV and r-score criteria and the core collection methodology provide risk-averse decisions. Training set optimization is an alternative to improve decision-making while leveraging genomic information as a cost-saving tool to accelerate plant domestication and breeding.
The optimized training set can be used as a reference for the characterization of native species populations, aiding decisions on germplasm collection and the construction of breeding populations.
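
To make the idea of "predictive ability of GBLUP in cross-validation" concrete, the following is a minimal sketch in Python, not the authors' pipeline. Everything in it is an assumption made for illustration: the genotypes and phenotypes are simulated (only the sample size of 201 echoes the abstract), the genomic relationship matrix follows the common VanRaden formulation, the shrinkage parameter is derived from a fixed assumed heritability of 0.5 rather than estimated variance components, and the function names are hypothetical.

```python
import numpy as np

def vanraden_g(markers):
    """Genomic relationship matrix from a 0/1/2 marker matrix (VanRaden-style)."""
    p = markers.mean(axis=0) / 2.0            # allele frequency per marker
    z = markers - 2.0 * p                     # center each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(g, y, train, test, h2=0.5):
    """Predict phenotypes of `test` individuals from `train` individuals.

    Assumption: the variance ratio lambda = (1 - h2) / h2 is fixed in advance
    instead of being estimated (e.g. by REML), to keep the sketch short.
    """
    lam = (1.0 - h2) / h2
    mu = y[train].mean()
    g_tt = g[np.ix_(train, train)]
    alpha = np.linalg.solve(g_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + g[np.ix_(test, train)] @ alpha

def cv_predictive_ability(g, y, k=5, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)       # all individuals outside the fold
        pred = gblup_predict(g, y, train, fold)
        cors.append(np.corrcoef(pred, y[fold])[0, 1])
    return float(np.mean(cors))

# Simulated example data (hypothetical; not the study's genotypes or traits).
rng = np.random.default_rng(42)
n_ind, n_snp = 201, 500
freqs = rng.uniform(0.1, 0.9, size=n_snp)
M = rng.binomial(2, freqs, size=(n_ind, n_snp)).astype(float)
effects = rng.normal(0.0, 1.0, size=n_snp)
genetic = (M - M.mean(axis=0)) @ effects
y = genetic + rng.normal(0.0, genetic.std(), size=n_ind)   # roughly h2 = 0.5

G = vanraden_g(M)
ability = cv_predictive_ability(G, y, k=5)
print(f"Mean predictive ability over 5 folds: {ability:.2f}")
```

In a targeted training set optimization, the random fold assignment would be replaced by a training set chosen through a criterion such as PEV or r-score; that selection step is outside the scope of this sketch.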