
ORIGINAL RESEARCH article

Front. Robot. AI
Sec. Robot Vision and Artificial Perception
Volume 11 - 2024 | doi: 10.3389/frobt.2024.1469588
This article is part of the Research Topic Perceiving the World, Planning the Future: Advanced Perception and Planning Technology in Robotics.

Enhanced Outdoor Visual Localization using Py-Net Voting Segmentation Approach

Provisionally accepted
Jing Wang*, Cheng Guo, Shaoyi Hu*, Yibo Wang*, Xuhui Fan*
  • Xi'an University of Science and Technology, Xi'an, China

The final, formatted version of the article will be published soon.

    Camera relocalization determines the position and orientation of a camera in 3D space. Although methods based on scene coordinate regression yield highly accurate results in indoor scenes, they perform poorly outdoors owing to the large scale and increased complexity of such scenes. A visual localization method, Py-Net, is therefore proposed herein. Py-Net is based on voting segmentation and comprises a main encoder containing Py-layers and two branch decoders. The Py-layer combines pyramid convolution with 1 × 1 convolution kernels to extract features across multiple levels with fewer parameters, enhancing the model's ability to capture scene information. Coordinate attention is added at the end of the encoder for feature correction, improving the model's robustness to interference. To prevent the feature loss caused by repetitive structures and low-texture images in the scene, deep over-parameterized convolution modules are incorporated into the segmentation (seg) and voting (vote) decoders. Landmark segmentation and voting maps are used to establish the relation between images and landmarks in 3D space, reducing anomalies and achieving high precision with a small number of landmarks. Experimental results show that, in multiple outdoor scenes, Py-Net achieves lower distance and angle errors than existing methods. In addition, compared with VS-Net, which also uses a voting segmentation structure, Py-Net reduces the number of parameters by 31.85% and decreases the model size from 236 MB to 170 MB.
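
    For illustration only, below is a minimal PyTorch sketch of a pyramid-convolution block in the spirit of the Py-layer described above, assuming that pyramid convolution here means parallel convolutions with different kernel sizes whose outputs are concatenated and fused by a 1 × 1 convolution. The class name PyLayer, the kernel sizes, and the channel split are hypothetical and are not taken from the article.

# Hypothetical sketch of a pyramid-convolution ("Py-layer"-style) block.
# Assumption: multi-scale parallel convolutions fused by a 1x1 convolution;
# the actual Py-Net architecture may differ from this illustration.
import torch
import torch.nn as nn


class PyLayer(nn.Module):
    """Illustrative pyramid-convolution block: multi-scale feature
    extraction followed by a 1x1 fusion convolution."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_out = out_channels // len(kernel_sizes)
        # One branch per kernel size; padding keeps the spatial resolution.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, branch_out, kernel_size=k,
                      padding=k // 2, bias=False)
            for k in kernel_sizes
        ])
        # 1x1 convolution fuses the concatenated multi-scale features.
        self.fuse = nn.Conv2d(branch_out * len(kernel_sizes), out_channels,
                              kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.bn(self.fuse(feats)))


if __name__ == "__main__":
    layer = PyLayer(in_channels=64, out_channels=128)
    y = layer(torch.randn(1, 64, 120, 160))
    print(y.shape)  # torch.Size([1, 128, 120, 160])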

    Keywords: camera relocalization, coordinate attention, pyramid convolution, landmark segmentation map, landmark voting map

    Received: 24 Jul 2024; Accepted: 10 Sep 2024.

    Copyright: © 2024 Wang, Guo, Hu, Wang and Fan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Jing Wang, Xi'an University of Science and Technology, Xi'an, China
    Shaoyi Hu, Xi'an University of Science and Technology, Xi'an, China
    Yibo Wang, Xi'an University of Science and Technology, Xi'an, China
    Xuhui Fan, Xi'an University of Science and Technology, Xi'an, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.