AUTHOR=Jones Darcy , Fornarelli Roberta , Derbyshire Mark , Gibberd Mark , Barker Kathryn , Hane James TITLE=The pursuit of genetic gain in agricultural crops through the application of machine-learning to genomic prediction JOURNAL=Frontiers in Genetics VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2023.1186782 DOI=10.3389/fgene.2023.1186782 ISSN=1664-8021 ABSTRACT=

Current practice in agriculture applies genomic prediction to assist crop breeding in the analysis of genetic marker data. Genomic selection methods typically use linear mixed models, but using machine-learning may provide further potential for improved selection accuracy, or may provide additional information. Here we describe SelectML, an automated pipeline for testing and comparing the performance of a range of linear mixed model and machine-learning-based genomic selection methods. We demonstrate the use of SelectML on an in silico-generated marker dataset which simulated a randomly-sampled (mixed) and an unevenly-sampled (unbalanced) population, comparing the relative performance of various methods included in SelectML on the two datasets. Although machine-learning based methods performed similarly overall to linear mixed models, they performed worse on the mixed dataset and marginally better on the unbalanced dataset, being more affected than linear mixed models by the imposed sampling bias. SelectML can assist in the training, comparison, and selection of genomic selection models, and is available from https://github.com/darcyabjones/selectml.