ORIGINAL RESEARCH article

Front. Comput. Neurosci.
Volume 18 - 2024 | doi: 10.3389/fncom.2024.1404623
This article is part of the Research Topic "The mutual promotion of Control Science and Neuroscience".

MULTI-LABEL REMOTE SENSING CLASSIFICATION WITH SELF-SUPERVISED GATED MULTI-MODAL TRANSFORMERS

Provisionally accepted
Na Liu 1, Ye Yuan 1, Guodong Wu 2*, Sai Zhang 2*, Jie Leng 1*, Lihong Wan 2*
  • 1 University of Shanghai for Science and Technology, Shanghai, China
  • 2 Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, Henan Province, China

The final, formatted version of the article will be published soon.

    With the great success of Transformers in machine learning, they are gradually attracting widespread interest in remote sensing (RS) as well. However, research in RS has been hampered by the lack of large labeled datasets and by the inconsistency of data modalities caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to pay attention to the "pre-training and fine-tuning" paradigm in RS. However, there is little research on multi-modal data fusion in the RS field: most existing work either uses only one modality or simply concatenates multiple modalities. To study a more efficient multi-modal fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). In this paper, we pre-train a ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and we propose intra-modal and inter-modal gated fusion units for feature learning that combine multispectral (MS) and synthetic aperture radar (SAR) data. Our method effectively combines data from different modalities to extract key feature information. In fine-tuning and comparison experiments, our method outperforms state-of-the-art algorithms on all downstream classification tasks, verifying the validity of the proposed approach.
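    To make the gating idea concrete, the following is a minimal sketch of an inter-modal gated fusion step, in which a learned gate decides, per feature dimension, how much of each modality's features to keep. This is an illustrative example only, not the authors' actual MGSViT architecture: the function name `gated_fusion`, the weight shapes, and the specific gate form (a sigmoid over a linear combination of both modalities) are assumptions for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(x_ms, x_sar, w_ms, w_sar, b):
    """Illustrative inter-modal gated fusion.

    A gate in (0, 1) is computed from both modalities; the fused
    feature is a per-dimension convex combination of the two inputs.
    """
    gate = sigmoid(x_ms @ w_ms + x_sar @ w_sar + b)
    return gate * x_ms + (1.0 - gate) * x_sar

rng = np.random.default_rng(0)
d = 8                                 # feature dimension (illustrative)
x_ms = rng.standard_normal((2, d))    # multispectral token features
x_sar = rng.standard_normal((2, d))   # SAR token features
w_ms = rng.standard_normal((d, d)) * 0.1
w_sar = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)

fused = gated_fusion(x_ms, x_sar, w_ms, w_sar, b)
```

    Because the gate is strictly between 0 and 1, each fused value lies between the corresponding MS and SAR feature values, so neither modality is ever discarded entirely; in the paper's setting the gate parameters would be learned jointly with the ViT backbone.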

    Keywords: self-supervised learning, pre-training, vision Transformer, multi-modal, gated units

    Received: 21 Mar 2024; Accepted: 03 Sep 2024.

    Copyright: © 2024 Liu, Yuan, Wu, Zhang, Leng and Wan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Guodong Wu, Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, Henan Province, China
    Sai Zhang, Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, Henan Province, China
    Jie Leng, University of Shanghai for Science and Technology, Shanghai, China
    Lihong Wan, Origin Dynamics Intelligent Robot Co., Ltd., Zhengzhou, Henan Province, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.