AUTHOR=Li Zongpeng , Zhou Xinguo , Cheng Qian , Zhai Weiguang , Mao Bohan , Li Yafeng , Chen Zhen TITLE=An integrated feature selection approach to high water stress yield prediction JOURNAL=Frontiers in Plant Science VOLUME=14 YEAR=2023 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1289692 DOI=10.3389/fpls.2023.1289692 ISSN=1664-462X ABSTRACT=

The timely and precise prediction of winter wheat yield plays a critical role in understanding food supply dynamics and ensuring global food security. In recent years, the application of unmanned aerial remote sensing has significantly advanced agricultural yield prediction research. This has led to the emergence of numerous vegetation indices that are sensitive to yield variations. However, not all of these vegetation indices are universally suitable for predicting yields across different environments and crop types. Consequently, the process of feature selection for vegetation index sets becomes essential to enhance the performance of yield prediction models. This study aims to develop an integrated feature selection method known as PCRF-RFE, with a focus on vegetation index feature selection. Initially, building upon prior research, we acquired multispectral images during the flowering and grain filling stages and identified 35 yield-sensitive multispectral indices. We then applied the Pearson correlation coefficient (PC) and random forest importance (RF) methods to select relevant features for the vegetation index set. Feature filtering thresholds were set at 0.53 and 1.9 for the respective methods. The union set of features selected by both methods was used for recursive feature elimination (RFE), ultimately yielding the optimal subset of features for constructing Cubist and Recurrent Neural Network (RNN) yield prediction models. The results of this study demonstrate that the Cubist model, constructed using the optimal subset of features obtained through the integrated feature selection method (PCRF-RFE), consistently outperformed the RNN model. It exhibited the highest accuracy during both the flowering and grain filling stages, surpassing models constructed using all features or subsets derived from a single feature selection method. This confirms the efficacy of the PCRF-RFE method and offers valuable insights and references for future research in the realms of feature selection and yield prediction studies.