ORIGINAL RESEARCH article

Front. Neurorobot.
Volume 18 - 2024 | doi: 10.3389/fnbot.2024.1518878
This article is part of the Research Topic Multi-modal Learning with Large-scale Models

A Scalable Multi-modal Learning Fruit Detection Algorithm for Dynamic Environments

Provisionally accepted
  • 1 Shenzhen Polytechnic, Shenzhen, China
  • 2 University of Science and Technology Liaoning, Anshan, Liaoning, China

The final, formatted version of the article will be published soon.

    This paper proposes a method for improving the detection of litchi fruits in natural scenes, addressing challenges such as dense occlusion and small-target identification. The approach is built on the YOLOv5s network model. First, the Neck of YOLOv5s is simplified by changing its FPN+PAN structure to an FPN structure, and the number of detection heads is increased from 3 to 5. The detection heads with resolutions of 80 × 80 pixels and 160 × 160 pixels are then replaced by TSCD detection heads to enhance the model's capability for detecting small targets. Next, the positioning loss function is replaced with the EIoU loss function, and the confidence loss is replaced with VFLoss, to further improve the accuracy of the detection bounding boxes and reduce the missed-detection rate for occluded targets. Finally, a sliding slice method is employed to predict image targets, thereby reducing the miss rate for small targets. Experimental results demonstrate that the proposed model improves accuracy, recall, and mean average precision (mAP) by 9.5, 0.9, and 12.3 percentage points, respectively, compared to the original YOLOv5s model. When benchmarked against other models such as YOLOX, YOLOv6, and YOLOv8, the proposed model's AP value is higher by 4.0, 6.3, and 3.7 percentage points, respectively. The method therefore significantly enhances the detection accuracy of mature litchi fruits and effectively addresses the challenges of dense occlusion and small-target detection, providing crucial technical support for subsequent litchi yield estimation.
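
Two of the steps named in the abstract, the EIoU positioning loss and the sliding slice prediction, can be illustrated with a minimal pure-Python sketch. It uses the commonly published EIoU formulation (an IoU term plus centre-distance, width, and height penalties normalised by the smallest enclosing box) and a generic overlapping-tile scheme. The function names, the 640-pixel tile size, and the 128-pixel overlap are assumptions made for illustration; the abstract does not give the authors' actual settings or implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def eiou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """EIoU loss for one box pair: 1 - IoU + centre, width and height penalties."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centres.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + width_term + height_term


def slice_origins(length: int, tile: int, overlap_px: int) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, length)."""
    if length <= tile:
        return [0]
    stride = max(1, tile - overlap_px)
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # final tile is flush with the border
    return starts


def sliding_slices(width: int, height: int, tile: int = 640,
                   overlap_px: int = 128) -> List[Tuple[int, int, int, int]]:
    """Tile rectangles (x1, y1, x2, y2) in full-image pixel coordinates."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_origins(height, tile, overlap_px)
        for x in slice_origins(width, tile, overlap_px)
    ]


def to_image_coords(box: Box, tile: Tuple[int, int, int, int]) -> Box:
    """Shift a box predicted inside a tile back into full-image coordinates."""
    ox, oy = tile[0], tile[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)
```

In use, a detector would be run on each tile returned by `sliding_slices`, each predicted box would be shifted with `to_image_coords`, and duplicates in the overlap regions would be merged, for example with non-maximum suppression. The merging step is left out here because the abstract does not say how the authors handle it.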

    Keywords: multi-modal learning, machine learning, fruit recognition, deep learning, object detection

    Received: 29 Oct 2024; Accepted: 29 Nov 2024.

    Copyright: © 2024 Mao, Guo, Liu, Li, Wang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Linlin Wang, Shenzhen Polytechnic, Shenzhen, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.