ORIGINAL RESEARCH article

Front. Neurorobot.
Volume 18 - 2024 | doi: 10.3389/fnbot.2024.1457843

LS-VIT: Vision Transformer for Action Recognition Based on Long and Short-term Temporal Difference

Provisionally accepted
Dong Chen 1,2, Peisong Wu 1, Mingdong Chen 1*, Mengtao Wu 1*, Tao Zhang 1*, Chuanqi Li 1,2*
  • 1 Nanning Normal University, Nanning, China
  • 2 Guangxi Normal University, Guilin, Guangxi Zhuang Autonomous Region, China

The final, formatted version of the article will be published soon.

    Over the past few years, a growing number of researchers have focused their efforts on temporal modeling. The advent of transformer-based methods has notably advanced the field of 2D image-based vision tasks. However, for 3D video tasks such as action recognition, applying transformer-based temporal modeling directly to video data significantly increases both computational and memory demands. This surge in resource consumption stems from the multiplication of data patches and the added complexity of self-attention computations. Accordingly, building efficient and precise 3D self-attention models for video content represents a major challenge for transformers. In our research, we introduce a Long and Short-term Temporal Difference Vision Transformer (LS-VIT) that adeptly captures spatio-temporal Self-Attention (SA) features.

    Keywords: Action recognition, Motion extraction, Temporal crossing fusion, Vision Transformer, Deep learning
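
    This page carries only the abstract, so the concrete design of LS-VIT's long- and short-term temporal difference modules is not shown. As a rough, hypothetical sketch of the general idea the abstract names (differencing frame features over short and long temporal gaps to extract motion cues before a ViT-style encoder consumes them), consider the PyTorch snippet below. The module names, gap sizes, and 1x1-convolution fusion are illustrative assumptions, not the authors' implementation.

        # Hypothetical sketch of long/short-term temporal differencing feeding a
        # ViT-style backbone. Module names, gap sizes, and the fusion step are
        # illustrative assumptions, not details taken from the LS-VIT paper.
        import torch
        import torch.nn as nn

        class TemporalDifference(nn.Module):
            """Motion cue: difference of frame features across a temporal gap."""
            def __init__(self, gap: int):
                super().__init__()
                self.gap = gap  # gap=1 ~ short-term motion; larger gap ~ long-term

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # x: (batch, time, channels, height, width)
                shifted = torch.roll(x, shifts=-self.gap, dims=1)  # x[t + gap]
                diff = shifted - x
                diff[:, -self.gap:] = 0  # wrapped-around frames carry no valid difference
                return diff

        class LongShortTermStem(nn.Module):
            """Fuse per-frame appearance with short- and long-term differences."""
            def __init__(self, channels: int = 3, short_gap: int = 1, long_gap: int = 4):
                super().__init__()
                self.short = TemporalDifference(short_gap)
                self.long = TemporalDifference(long_gap)
                # A 1x1 conv mixes RGB plus the two difference maps back down to
                # `channels`, so a standard 2D ViT patch embedding can consume it.
                self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

            def forward(self, video: torch.Tensor) -> torch.Tensor:
                b, t, c, h, w = video.shape
                feats = torch.cat([video, self.short(video), self.long(video)], dim=2)
                fused = self.fuse(feats.reshape(b * t, 3 * c, h, w))
                return fused.reshape(b, t, c, h, w)

        if __name__ == "__main__":
            clip = torch.randn(2, 8, 3, 224, 224)  # (batch, frames, C, H, W)
            print(LongShortTermStem()(clip).shape)  # torch.Size([2, 8, 3, 224, 224])

    Injecting motion as per-frame difference maps keeps the token count at that of a 2D ViT per frame rather than attending over all space-time patches, which is the efficiency concern the abstract raises.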

    Received: 01 Jul 2024; Accepted: 16 Oct 2024.

    Copyright: © 2024 Chen, Wu, Chen, Wu, Zhang and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Mingdong Chen, Nanning Normal University, Nanning, China
    Mengtao Wu, Nanning Normal University, Nanning, China
    Tao Zhang, Nanning Normal University, Nanning, China
    Chuanqi Li, Nanning Normal University, Nanning, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.