ORIGINAL RESEARCH article

Front. Plant Sci.
Sec. Technical Advances in Plant Science
Volume 15 - 2024 | doi: 10.3389/fpls.2024.1452821
This article is part of the Research Topic: Machine-Learning Implementation to Analyze Plant-Associated Microbiomes and Their Contribution to Improving Crop Production

An Improved YOLOv7 Model Based on Swin Transformer and Trident Pyramid Networks for Accurate Tomato Detection

Provisionally accepted
Guoxu Liu 1, Yonghui Zhang 1, Jun Liu 2, Deyong Liu 3, Chunlei Chen 1, Yujie Li 1, Xiujie Zhang 1, Philippe Lyonel Touko Mbouembe 4*
  • 1 School of Computer Engineering, Weifang University, Weifang, China
  • 2 Shandong Provincial Hospital, Jinan, China
  • 3 School of Computer Science, Weifang University of Science and Technology, Weifang, China
  • 4 Department of Electronics Engineering, Pusan National University, Busan, Republic of Korea

The final, formatted version of the article will be published soon.

    Accurate fruit detection is crucial for automated fruit picking. In real-world scenarios, however, complex environmental factors such as illumination variations, occlusion, and overlap make accurate detection difficult, which in turn hinders the commercialization of fruit-harvesting robots. To address these challenges, a tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing. TPN incorporates multiple self-processing (SP) modules within existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss so that the loss function can adjust its focus based on the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results show that its detection recall, precision, F1 score, and average precision (AP) reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively. These represent improvements of 1.64%, 0.92%, 1.28%, and 0.88% over the original YOLOv7 model. Compared with other state-of-the-art detection methods, this approach achieves superior accuracy while maintaining comparable detection speed. In addition, the proposed model exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential in tomato detection.
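
The abstract states only that Focaler-IoU reconstructs the IoU loss to shift focus between easy and difficult samples; it does not give the formula. As a minimal sketch, the code below assumes the piecewise-linear remapping usually associated with Focaler-IoU: the IoU is clamped to 0 below a lower threshold `d`, clamped to 1 above an upper threshold `u`, and rescaled linearly in between, with the loss taken as one minus the remapped value. The function names, the box format `(x1, y1, x2, y2)`, and the default thresholds are illustrative assumptions and are not taken from this article.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou_loss(iou, d=0.0, u=0.95):
    """Loss from a linearly remapped IoU (assumed Focaler-IoU form).

    d and u are hypothetical thresholds with 0 <= d < u <= 1:
    IoU values below d map to 0, values above u map to 1.
    """
    if not 0.0 <= d < u <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= d < u <= 1")
    remapped = min(1.0, max(0.0, (iou - d) / (u - d)))
    return 1.0 - remapped


if __name__ == "__main__":
    pred = (0.0, 0.0, 2.0, 2.0)
    target = (1.0, 1.0, 3.0, 3.0)
    iou = box_iou(pred, target)
    print("IoU:", iou)
    print("loss:", focaler_iou_loss(iou, d=0.0, u=0.95))
```

Under this assumed form, raising `d` zeroes out the remapped IoU for poorly overlapping (difficult) boxes so that they all receive the maximum loss, while lowering `u` treats boxes that already overlap well as fully matched. That is one way a loss can be tuned toward the easy or the difficult part of the sample distribution.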

    Keywords: tomato detection, YOLOv7, Swin Transformer, Trident Pyramid Network, Focaler-IoU

    Received: 21 Jun 2024; Accepted: 26 Aug 2024.

    Copyright: © 2024 Liu, Zhang, Liu, Liu, Chen, Li, Zhang and Touko Mbouembe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Philippe Lyonel Touko Mbouembe, Department of Electronics Engineering, Pusan National University, Busan, Republic of Korea

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.