ORIGINAL RESEARCH article

Front. Mar. Sci.

Sec. Ocean Observation

Volume 12 - 2025 | doi: 10.3389/fmars.2025.1522160

Semantic segmentation of underwater images based on the improved SegFormer

Provisionally accepted
  • 1 Harbin Engineering University, Harbin, China
  • 2 Qingdao Innovation and Development Base, Harbin Engineering University, Qingdao, Shandong Province, China
  • 3 Jilin Agricultural Science and Technology College, Jilin, Jilin, China
  • 4 Jilin Agriculture University, Changchun, Jilin Province, China
  • 5 College of Agriculture, Yanbian University, Yanji, China

The final, formatted version of the article will be published soon.

    Underwater image segmentation is essential for tasks such as underwater exploration, marine environmental monitoring, and resource development. Nevertheless, given the complexity and variability of the underwater environment, improving model accuracy remains a key challenge in underwater image segmentation tasks. To address these issues, this study presents a high-performance semantic segmentation approach for underwater images based on the standard SegFormer model. First, the Mix Transformer backbone in SegFormer is replaced with a Swin Transformer to enhance feature extraction and facilitate efficient acquisition of global context information. Next, the Efficient Multi-scale Attention (EMA) mechanism is introduced in the backbone's downsampling stages and the decoder to better capture multi-scale features, further improving segmentation accuracy. Furthermore, a Feature Pyramid Network (FPN) structure is incorporated into the decoder to combine feature maps at multiple resolutions, allowing the model to integrate contextual information effectively and enhancing robustness in complex underwater environments. Testing on the SUIM underwater image dataset shows that the proposed model achieves high performance across multiple metrics: mean Intersection over Union (MIoU) of 77.00%, mean Recall (mRecall) of 85.04%, mean Precision (mPrecision) of 89.03%, and mean F1-score (mF1-score) of 86.63%. Compared with the standard SegFormer, it improves MIoU by 3.73%, mRecall by 1.98%, mPrecision by 3.38%, and mF1-score by 2.44%, at the cost of an additional 9.89M parameters. These results show that the proposed method achieves superior segmentation accuracy with minimal additional computation, demonstrating strong performance in underwater image segmentation.
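    The FPN-style fusion in the decoder described above can be sketched as a standard top-down merge: starting from the coarsest feature map, each level is upsampled and added into the next finer one. The channel count, spatial sizes, and nearest-neighbor upsampling below are illustrative assumptions (a real FPN also applies lateral 1x1 convolutions to align channels), not the authors' exact implementation:

    ```python
    import numpy as np

    def upsample2x(x):
        # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
        return x.repeat(2, axis=1).repeat(2, axis=2)

    def fpn_fuse(features):
        # Top-down FPN fusion: begin with the coarsest level and repeatedly
        # upsample-and-add into the next finer level. Assumes all levels
        # already share the same channel count C.
        fused = features[-1]
        for finer in reversed(features[:-1]):
            fused = finer + upsample2x(fused)
        return fused

    # Four pyramid levels, finest to coarsest, with C = 4 channels each.
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((4, 32 // 2**i, 32 // 2**i)) for i in range(4)]
    out = fpn_fuse(feats)
    print(out.shape)  # (4, 32, 32): fused map at the finest resolution
    ```

    The fused map carries context from every pyramid level at the finest resolution, which is the property the abstract credits for improved robustness in cluttered underwater scenes.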

    Keywords: Underwater Images, Semantic segmentation, attention mechanism, Feature fusion, SegFormer

    Received: 04 Nov 2024; Accepted: 21 Feb 2025.

    Copyright: © 2025 Chen, Zhao, Zhang, Li, Qi and Tang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Mingyang Qi, Jilin Agricultural Science and Technology College, Jilin, Jilin, China
    You Tang, Jilin Agricultural Science and Technology College, Jilin, Jilin, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
