AUTHOR=Novielli Pierfrancesco , Romano Donato , Pavan Stefano , Losciale Pasquale , Stellacci Anna Maria , Diacono Domenico , Bellotti Roberto , Tangaro Sabina TITLE=Explainable artificial intelligence for genotype-to-phenotype prediction in plant breeding: a case study with a dataset from an almond germplasm collection JOURNAL=Frontiers in Plant Science VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1434229 DOI=10.3389/fpls.2024.1434229 ISSN=1664-462X ABSTRACT=Background

Advances in DNA sequencing revolutionized plant genomics and significantly contributed to the study of genetic diversity. However, predicting phenotypes from genomic data remains a challenge, particularly in the context of plant breeding. Despite significant progress, accurately predicting phenotypes from high-dimensional genomic data remains a challenge, particularly in identifying the key genetic factors influencing these predictions. This study aims to bridge this gap by integrating explainable artificial intelligence (XAI) techniques with advanced machine learning models. This approach is intended to enhance both the predictive accuracy and interpretability of genotype-to-phenotype models, thereby improving their reliability and supporting more informed breeding decisions.

Results

This study compares several ML methods for genotype-to-phenotype prediction, using data available from an almond germplasm collection. After preprocessing and feature selection, regression models are employed to predict almond shelling fraction. Best predictions were obtained by the Random Forest method (correlation = 0.727 ± 0.020, an R2 = 0.511 ± 0.025, and an RMSE = 7.746 ± 0.199). Notably, the application of the SHAP (SHapley Additive exPlanations) values algorithm to explain the results highlighted several genomic regions associated with the trait, including one, having the highest feature importance, located in a gene potentially involved in seed development.

Conclusions

Employing explainable artificial intelligence algorithms enhances model interpretability, identifying genetic polymorphisms associated with the shelling percentage. These findings underscore XAI’s efficacy in predicting phenotypic traits from genomic data, highlighting its significance in optimizing crop production for sustainable agriculture.