ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. AI in Food, Agriculture and Water
Volume 7 - 2024 | doi: 10.3389/frai.2024.1312115

A HYBRID DEEP LEARNING-BASED APPROACH FOR OPTIMAL GENOTYPE BY ENVIRONMENT SELECTION

Provisionally accepted

The final, formatted version of the article will be published soon.

    The ability to accurately predict crop yields based on genotype and weather variability is essential for developing climate-resilient cultivars. Genotype-environment interactions introduce significant variability in crop-climate responses, posing challenges for breeding programs. Data-driven approaches, particularly machine learning, offer a solution by incorporating genotype-environment interactions into yield predictions. We used a comprehensive dataset of 93,028 records from soybean hybrids across 159 locations, 28 states, and 13 years, which includes 5,838 unique genotypes and daily weather data over a 214-day growing season. With this dataset, we developed two models based on convolutional neural networks (CNNs): a CNN model that combines convolutional and fully-connected layers, and a CNN-LSTM model that adds a long short-term memory (LSTM) layer after the convolutional layers. To enhance prediction accuracy, we used the Generalized Ensemble Method (GEM) to integrate these CNN-based models and optimize their weights. Because the dataset provides genotype-specific information, we could explore genotype suitability across diverse weather conditions. Using the GEM model, we predicted yields for different genotypes in each environmental setting and identified the optimal genotypes for each. We evaluated GEM’s performance on unseen genotype-location combinations, simulating real-world conditions in which new genotypes are introduced. The GEM ensemble achieved higher predictive accuracy than the CNN-LSTM model alone and slightly better performance than the CNN model, as measured by root mean square error (RMSE) and mean absolute error (MAE) on the validation and test sets. This method shows potential for genotype selection in situations with limited historical data. We also examined the effect of incorporating state-level soil data alongside weather, location, genotype, and year variables. Because the data lack latitude and longitude details, we used the same soil variables for all locations within a state, which restricts spatial specificity to the state level. Adding state-level soil information did not significantly improve model performance. Feature importance analysis based on changes in RMSE identified location as the most critical factor, followed by genotype and year. Among weather variables, maximum direct normal irradiance (MDNI) and average precipitation (AP) produced the largest RMSE changes, highlighting their importance in yield prediction.
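
    The ensemble step described in the abstract can be illustrated with a minimal sketch. It assumes the classical closed-form GEM solution, in which the weights sum to one and minimize the mean squared error of the combined prediction on held-out data. The abstract does not state how the authors optimized the weights, so the function name `gem_weights`, the synthetic yields, and the noise levels below are all hypothetical stand-ins rather than the paper's implementation or data.

```python
import numpy as np


def gem_weights(preds: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Closed-form GEM weights (summing to 1) for stacked model predictions.

    preds has shape (n_models, n_samples); y has shape (n_samples,).
    Assumes the error matrix C is positive definite.
    """
    errors = preds - y                            # per-model residuals
    C = errors @ errors.T / errors.shape[1]       # C[i, j] = mean(e_i * e_j)
    z = np.linalg.solve(C, np.ones(C.shape[0]))   # C^{-1} 1
    return z / z.sum()                            # normalize: weights sum to 1


def rmse(pred: np.ndarray, y: np.ndarray) -> float:
    """Root mean square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - y) ** 2)))


# Synthetic stand-ins for validation yields and two models' predictions
# (hypothetical values, not the paper's data).
rng = np.random.default_rng(0)
y_val = rng.normal(50.0, 8.0, size=500)
pred_cnn = y_val + rng.normal(0.0, 3.0, size=500)
pred_cnn_lstm = y_val + rng.normal(0.0, 4.0, size=500)

preds = np.vstack([pred_cnn, pred_cnn_lstm])
w = gem_weights(preds, y_val)      # one weight per model
pred_gem = w @ preds               # weighted ensemble prediction

print("weights:", w)
print("RMSE CNN:", rmse(pred_cnn, y_val))
print("RMSE CNN-LSTM:", rmse(pred_cnn_lstm, y_val))
print("RMSE GEM:", rmse(pred_gem, y_val))
```

    On the data used to fit the weights, this closed form cannot do worse than either individual model, because putting all the weight on a single model is itself a feasible choice. The same guarantee does not carry over to unseen data, which is why performance on separate test sets matters. Closed-form weights can also be negative; a constrained optimizer would be needed if non-negative weights are required.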

    Keywords: Convolutional neural network, Genotype selection, Crop yield prediction, Generalized ensemble method, Genotype-environment interaction, Feature importance analysis

    Received: 10 Oct 2023; Accepted: 11 Nov 2024.

    Copyright: © 2024 Khalilzadeh, Kashanian, Khaki and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Zahra Khalilzadeh, Iowa State University, Ames, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.