ORIGINAL RESEARCH article

Front. Plant Sci.
Sec. Technical Advances in Plant Science
Volume 15 - 2024 | doi: 10.3389/fpls.2024.1408047
This article is part of the Research Topic UAVs for Crop Protection: Remote Sensing, Prescription Mapping and Precision Spraying

Integrating Multi-Modal Remote Sensing, Deep Learning, and Attention Mechanisms for Yield Prediction in Plant Breeding Experiments

Provisionally accepted
  • Purdue University, West Lafayette, United States

The final, formatted version of the article will be published soon.

    This study proposes a multi-modal deep learning architecture that assimilates inputs from heterogeneous data streams, including high-resolution hyperspectral imagery, LiDAR point clouds, and environmental data, to forecast maize grain yield. The architecture includes attention mechanisms that assign varying levels of importance to different modalities and temporal features, reflecting the dynamics of plant growth and environmental interactions. The interpretability of the attention weights is investigated in multi-modal networks that seek both to improve predictions and to attribute crop yield outcomes to genetic and environmental variables, thereby increasing the interpretability of the model's predictions. The temporal attention weight distributions were examined to identify relevant factors and critical growth stages that contribute to the predictions. The results affirm that the attention weights are consistent with recognized biological growth stages, substantiating the network's capability to learn biologically interpretable features. The model's yield predictions achieved R² values ranging from 0.82 to 0.93 in the genetics-focused study, further highlighting the potential of attention-based models. The primary objective of this research is to explore and evaluate the potential contributions of deep learning architectures that employ stacked LSTMs for end-of-season maize grain yield prediction. A secondary aim is to expand the capabilities of these networks by adapting them to better accommodate and leverage the multi-modal properties of remote sensing data. Further, this research facilitates understanding of how multi-modal remote sensing aligns with the physiological stages of maize. In both plant breeding and crop management, interpretability plays a crucial role in instilling trust in AI-driven approaches and enabling actionable insights.
To the best of our knowledge, this is the first study to investigate the use of hyperspectral and LiDAR UAV time series data for interpreting plant growth stages within deep learning networks and for forecasting plot-level maize grain yield using late fusion of modalities with attention mechanisms.
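The abstract does not specify the network implementation, but the core ideas it names, per-date temporal attention within each sensing modality followed by late fusion of the modality summaries into a single yield estimate, can be sketched minimally. The sketch below is an illustrative NumPy toy, not the authors' architecture: the LSTM encoders are omitted (random arrays stand in for per-date embeddings), and the dimensions, weight vectors, and function names (`temporal_attention`, `late_fusion`) are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the chosen axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(features, w):
    # features: (T, d) per-flight-date embeddings for one modality
    # w: (d,) learned scoring vector (random here for illustration)
    scores = features @ w          # one relevance score per date
    alpha = softmax(scores)        # attention weights over the season
    context = alpha @ features     # (d,) attention-weighted summary
    return context, alpha

def late_fusion(contexts, fusion_w):
    # concatenate modality summaries, then a linear head predicts yield
    fused = np.concatenate(contexts)
    return fused @ fusion_w

rng = np.random.default_rng(0)
T, d = 6, 4                                  # 6 UAV flight dates, 4-dim embeddings
hsi = rng.normal(size=(T, d))                # hyperspectral branch output (stand-in)
lidar = rng.normal(size=(T, d))              # LiDAR branch output (stand-in)
env = rng.normal(size=(T, d))                # environmental branch output (stand-in)
w = rng.normal(size=d)

hsi_ctx, hsi_alpha = temporal_attention(hsi, w)
lidar_ctx, _ = temporal_attention(lidar, w)
env_ctx, _ = temporal_attention(env, w)
yield_hat = late_fusion([hsi_ctx, lidar_ctx, env_ctx], rng.normal(size=3 * d))

# hsi_alpha sums to 1; inspecting which dates receive large weights is the
# interpretability step the abstract ties to maize growth stages.
```

In a trained model, the per-date weights `alpha` are what the study inspects: dates aligned with critical growth stages should receive larger attention mass.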

    Keywords: hyperspectral, LiDAR, stacked LSTM, attention mechanisms, multi-modal networks, yield prediction, precision agriculture

    Received: 27 Mar 2024; Accepted: 04 Jul 2024.

    Copyright: © 2024 Aviles Toledo, Crawford and Tuinstra. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Claudia E. Aviles Toledo, Purdue University, West Lafayette, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.