Introduction

AUTHOR=Wang Fenghua, Jiang Jin, Chen Yu, Sun Zhexing, Tang Yuan, Lai Qinghui, Zhu Hailong

TITLE=Rapid detection of Yunnan Xiaomila based on lightweight YOLOv7 algorithm

JOURNAL=Frontiers in Plant Science

VOLUME=Volume 14 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1200144

DOI=10.3389/fpls.2023.1200144

ISSN=1664-462X

ABSTRACT=<sec><title>Introduction</title><p>Real-time fruit detection is a prerequisite for deploying a Xiaomila pepper harvesting robot.</p></sec><sec><title>Methods</title><p>To reduce the computational cost of the model and improve its accuracy in detecting densely distributed and occluded Xiaomila fruits, this paper adopts YOLOv7-tiny as the transfer learning model for field detection of Xiaomila, collects images of immature and mature Xiaomila fruits under different lighting conditions, and proposes an effective model called YOLOv7-PD. First, deformable convolution is fused into the main feature extraction network by replacing the traditional convolution module in the YOLOv7-tiny main network and the ELAN module with deformable convolution, which reduces the number of network parameters while improving the detection accuracy of multi-scale Xiaomila targets. Second, the SE (Squeeze-and-Excitation) attention mechanism is introduced into the reconstructed main feature extraction network to improve its ability to extract key features of Xiaomila in complex environments, enabling multi-scale Xiaomila fruit detection. The effectiveness of the proposed method is verified through ablation experiments under different lighting conditions and through model comparison experiments.</p></sec><sec><title>Results</title><p>The experimental results indicate that YOLOv7-PD achieves higher detection performance than other single-stage detection models.
With these improvements, YOLOv7-PD achieves a mAP (mean Average Precision) of 90.3%, which is 2.2%, 3.6%, and 5.5% higher than that of the original YOLOv7-tiny, YOLOv5s, and MobileNetV3 models, respectively. The model size is reduced from 12.7 MB to 12.1 MB, and the model's computation is reduced from 13.1 GFLOPs to 10.3 GFLOPs.</p></sec><sec><title>Discussion</title><p>The results show that, compared with existing models, this model detects Xiaomila fruits in images more effectively and has lower computational complexity.</p></sec>
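
The SE (Squeeze-and-Excitation) attention mechanism named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `se_block`, the toy feature map, the weights, and the reduction to a single hidden unit are all assumptions chosen for illustration. It is written in plain Python so that it runs without dependencies; a real detector would express the same three steps with a deep learning framework.

```python
import math

def se_block(x, w1, b1, w2, b2):
    """Hypothetical Squeeze-and-Excitation step on one feature map.

    x  : list of C channels, each an H x W list of lists of floats
    w1 : (C/r) x C weight matrix of the bottleneck layer, b1 its bias
    w2 : C x (C/r) weight matrix of the expansion layer, b2 its bias
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
         for row, b in zip(w1, b1)]
    s = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]

    # Scale: multiply every value in a channel by that channel's weight.
    y = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(x, s)]
    return y, s

# Toy input (assumed values): 2 channels of size 2 x 2, one hidden unit.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [0.0, 8.0]]]
w1, b1 = [[0.5, -0.25]], [0.0]
w2, b2 = [[2.0], [-2.0]], [0.0, 0.0]

y, s = se_block(x, w1, b1, w2, b2)
print("channel weights:", s)
```

Each channel weight lies strictly between 0 and 1, so the block can only attenuate or preserve a channel relative to the others. This reweighting is what allows a network to emphasise channels that carry key features in cluttered scenes.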