The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Neurorobot.
Volume 18 - 2024
doi: 10.3389/fnbot.2024.1484276
This article is part of the Research Topic: Advancing Autonomous Robots: Challenges and Innovations in Open-World Scene Understanding
Improved object detection method for autonomous driving based on DETR
Provisionally accepted
- 1 School of Information and Electronics Technology, Jiamusi University, Jiamusi, China
- 2 School of Materials Science and Engineering, Jiamusi University, Jiamusi, China
Object detection is a critical component of autonomous driving technology and has demonstrated significant growth potential. To address the limitations of current techniques, this paper presents an improved object detection method for autonomous driving based on the Detection Transformer (DETR). First, we introduce a multi-scale feature and location information extraction method, which addresses the model's weakness in localizing and detecting objects at multiple scales. Second, we develop a Transformer encoder based on a group axial attention mechanism, which controls the attention range efficiently in the horizontal and vertical directions while reducing computation, ultimately increasing inference speed. Third, we propose a novel dynamic hyperparameter tuning training method based on Pareto efficiency, which coordinates the training state of the loss functions through dynamic weights; this avoids the problems of manually set fixed weights and improves both convergence speed and accuracy. Experimental results show that the proposed method outperforms the compared methods, with improvements of 3.3%, 4.5% and 3% in Average Precision on the COCO, PASCAL VOC and KITTI datasets, respectively, and an 84% increase in FPS.
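The computational saving that the abstract attributes to axial attention can be illustrated by counting query-key pairs. The following is a minimal sketch, assuming plain (ungrouped) axial attention with one row-wise and one column-wise pass over an H × W feature map. It is not the authors' group axial attention, and the function names and the 25 × 34 feature-map size are hypothetical choices made only for illustration.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every position attends to every position."""
    tokens = height * width
    return tokens * tokens


def axial_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs for one row-wise pass plus one column-wise pass."""
    tokens = height * width
    row_pass = tokens * width   # each position attends only within its row
    col_pass = tokens * height  # each position attends only within its column
    return row_pass + col_pass


if __name__ == "__main__":
    # Hypothetical feature-map size, not taken from the article.
    h, w = 25, 34
    full = full_attention_pairs(h, w)
    axial = axial_attention_pairs(h, w)
    print(f"full: {full} pairs, axial: {axial} pairs, ratio: {full / axial:.1f}x")
```

Under these assumptions the cost falls from (HW)^2 pairs to HW(H + W) pairs, which is why restricting attention to the horizontal and vertical axes can reduce computation on large feature maps.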
Keywords: object detection, feature extraction, transformer encoder, loss function, parameter tuning
Received: 21 Aug 2024; Accepted: 30 Dec 2024.
Copyright: © 2024 Zhao, Zhang, Peng, Lu and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Huaqi Zhao, School of Information and Electronics Technology, Jiamusi University, Jiamusi, China
Zhengguang Lu, School of Information and Electronics Technology, Jiamusi University, Jiamusi, China
Guojing Li, School of Materials Science and Engineering, Jiamusi University, Jiamusi, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.