AUTHOR=Novielli Pierfrancesco , Romano Donato , Pavan Stefano , Losciale Pasquale , Stellacci Anna Maria , Diacono Domenico , Bellotti Roberto , Tangaro Sabina 

TITLE=Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection

JOURNAL=Frontiers in Plant Science

VOLUME=Volume 15 - 2024

YEAR=2024

URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1434229

DOI=10.3389/fpls.2024.1434229

ISSN=1664-462X

ABSTRACT=Background Advances in DNA sequencing revolutionized plant genomics and significantly contributed to the study of genetic diversity. However, predicting phenotypes from genomic data remains a challenge, particularly in the context of plant breeding. Explainable Artificial Intelligence (XAI) techniques offer promising avenues for deciphering genotype-phenotype relationships. Despite significant progress, accurately predicting phenotypes from high-dimensional genomic data remains a challenge, particularly in identifying the key genetic factors influencing these predictions. This study aims to bridge this gap by integrating explainable artificial intelligence (XAI) techniques with advanced machine learning models. This approach is intended to enhance both the predictive accuracy and interpretability of genotype-to-phenotype models, thereby improving their reliability and supporting more informed breeding decisions.This study compares several ML mehods for genotype-to-phenotype prediction, using data available from an almond germplasm collection. After preprocessing and feature selection, regression models are employed to predict almond shelling fraction. Best predictions were obtained by the Random Forest method (correlation = 0.727 ± 0.020, an R 2 = 0.511 ± 0.025, and an RMSE = 7.746 ± 0.199). Notably, the application of the SHAP (SHapley Additive exPlanations) values algorithm to explain the results highlighted several genomic regions associated with the trait, including one, having the highest feature importance, located in a gene potentially involved in seed development.Conclusions Employing explainable artificial intelligence algorithms enhances model interpretability, identifying identifying genetic polymorphisms associated with the shelling percentage. These findings underscore XAI's efficacy in predicting phenotypic traits from genomic data, highlighting its significance in optimizing crop production for sustainable agriculture.