AUTHOR=Aviles Toledo Claudia , Crawford Melba M. , Tuinstra Mitchell R. TITLE=Integrating multi-modal remote sensing, deep learning, and attention mechanisms for yield prediction in plant breeding experiments JOURNAL=Frontiers in Plant Science VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1408047 DOI=10.3389/fpls.2024.1408047 ISSN=1664-462X ABSTRACT=

In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and enabling the provision of actionable insights. The primary objective of this research is to explore and evaluate the potential contributions of deep learning network architectures that employ stacked LSTM for end-of-season maize grain yield prediction. A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modality properties of remote sensing data. In this study, a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, is proposed to forecast maize crop yields. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features that, reflect the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek to both improve predictions and attribute crop yield outcomes to genetic and environmental variables. This approach also contributes to increased interpretability of the model's predictions. The temporal attention weight distributions highlighted relevant factors and critical growth stages that contribute to the predictions. The results of this study affirm that the attention weights are consistent with recognized biological growth stages, thereby substantiating the network's capability to learn biologically interpretable features. Accuracies of the model's predictions of yield ranged from 0.82-0.93 R2ref in this genetics-focused study, further highlighting the potential of attention-based models. Further, this research facilitates understanding of how multi-modality remote sensing aligns with the physiological stages of maize. The proposed architecture shows promise in improving predictions and offering interpretable insights into the factors affecting maize crop yields, while demonstrating the impact of data collection by different modalities through the growing season. By identifying relevant factors and critical growth stages, the model's attention weights provide valuable information that can be used in both plant breeding and crop management. The consistency of attention weights with biological growth stages reinforces the potential of deep learning networks in agricultural applications, particularly in leveraging remote sensing data for yield prediction. To the best of our knowledge, this is the first study that investigates the use of hyperspectral and LiDAR UAV time series data for explaining/interpreting plant growth stages within deep learning networks and forecasting plot-level maize grain yield using late fusion modalities with attention mechanisms.