ORIGINAL RESEARCH article

Front. Environ. Sci.
Sec. Environmental Informatics and Remote Sensing
Volume 12 - 2024 | doi: 10.3389/fenvs.2024.1395337

Efficient Greenhouse Segmentation with Visual Foundation Models: Achieving More with Fewer Samples

Provisionally accepted
Yuxiang Lu 1,2, Jiahe Wang 1,2, Dan Wang 3, Tang Liu 1,4*
  • 1 State Key Laboratory of Resources and Environmental Information Systems, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (CAS), Beijing, China
  • 2 University of Chinese Academy of Sciences, Beijing, China
  • 3 Provincial Geomatics Center of Jiangsu, Nanjing, China
  • 4 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (CAS), Beijing, China

The final, formatted version of the article will be published soon.

    The Vision Transformer (ViT), trained with self-supervised learning, has achieved outstanding performance in natural image segmentation, demonstrating its broad promise for visual tasks. Its performance degrades in remote sensing, however, owing to the varying perspectives of remote sensing imagery and the distinctive optical properties of certain land-cover types, such as the translucency of greenhouses. Moreover, the high cost of training a visual foundation model (VFM) makes it impractical to build one from scratch for a specific scene. This study explores the feasibility of rapidly deploying a VFM on new tasks by using the embedding vectors it generates as prior knowledge to enhance the performance of traditional segmentation models. We found that, with the same number of trainable parameters, these embedding vectors help the downstream segmentation model converge rapidly while significantly improving segmentation accuracy and robustness. Furthermore, our comparative experiments showed that using only about 40% of the annotated samples matches or even exceeds the performance of traditional segmentation models trained on all samples, an important result for reducing reliance on manual annotation. For greenhouse detection and management in particular, our method substantially improves the accuracy of greenhouse segmentation and reduces dependence on samples, helping the model adapt more quickly to different lighting conditions and enabling more precise monitoring of agricultural resources. This study not only demonstrates the potential of VFMs in remote sensing tasks but also opens new avenues for extending them to a large and diverse set of downstream tasks.
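    The abstract does not specify how the VFM embeddings are fused with the traditional segmentation model, so the following is only an illustrative sketch under assumed shapes and a simple fusion choice (channel concatenation followed by a 1x1 projection); the actual architecture, dimensions, and fusion mechanism in the paper may differ.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical shapes: a frozen VFM encoder yields a patch-embedding map,
    # which is combined with features from a trainable segmentation branch.
    H, W = 16, 16          # spatial grid of patch embeddings (illustrative)
    C_VFM, C_SEG = 64, 32  # channel dims for each branch (illustrative)

    vfm_embed = rng.standard_normal((C_VFM, H, W))  # frozen prior knowledge
    seg_feats = rng.standard_normal((C_SEG, H, W))  # trainable-branch features

    # Fuse by channel concatenation, then project with a 1x1-conv-equivalent
    # linear map to per-pixel class logits (greenhouse vs. background).
    fused = np.concatenate([vfm_embed, seg_feats], axis=0)  # (96, H, W)
    w = rng.standard_normal((2, C_VFM + C_SEG)) * 0.01      # 2 output classes
    logits = np.einsum('kc,chw->khw', w, fused)             # (2, H, W)
    mask = logits.argmax(axis=0)                            # (H, W) label map
    ```

    In this reading, only the projection (and the segmentation branch) would be trained, which is consistent with the abstract's claim of improved convergence at an unchanged trainable-parameter count.
    
    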

    Keywords: visual foundation model, remote sensing downstream tasks, greenhouse, deep learning, remote sensing foundation model

    Received: 03 Mar 2024; Accepted: 12 Jul 2024.

    Copyright: © 2024 Lu, Wang, Wang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Tang Liu, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (CAS), Beijing, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.