ORIGINAL RESEARCH article
Front. Comput. Neurosci.
Volume 18 - 2024
doi: 10.3389/fncom.2024.1508297
This article is part of the Research Topic Deep Spiking Neural Networks: Models and Learning Algorithms.
Spike-HAR++: An Energy-efficient and Lightweight Parallel Spiking Transformer for Event-based Human Action Recognition
Provisionally accepted
1 Tsinghua University, Beijing, China
2 Fudan University, Shanghai, China
Event-based cameras are well suited to human action recognition (HAR), providing movement perception with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to the asynchronous, sparse data produced by event cameras owing to their spike-based, event-driven paradigm, and they consume less power than artificial neural networks. In this paper, we propose two end-to-end SNNs, Spike-HAR and Spike-HAR++, which introduce the spiking transformer to event-based HAR. Spike-HAR includes two novel blocks: a spike attention branch, which enables the model to focus on regions with high spike rates, reducing the impact of noise and improving accuracy; and a parallel spike transformer block with a simplified spiking self-attention mechanism, increasing computational efficiency. To better extract crucial information from high-level features, we modify the architecture of the spike attention branch in Spike-HAR and extend it to a higher dimension, yielding Spike-HAR++, which further enhances classification performance. Comprehensive experiments were conducted on four HAR datasets: SL-Animals-DVS, N-LSA64, DVS128 Gesture and DailyAction-DVS, demonstrating the superior performance of our proposed models. Additionally, Spike-HAR and Spike-HAR++ require only 0.03 mJ and 0.06 mJ, respectively, to process a sequence of event frames, with model sizes of only 0.7M and 1.8M parameters. This efficiency positions them as promising new SNN baselines for the HAR community.
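To make the abstract's central idea concrete, the following is a minimal PyTorch sketch of a generic simplified spiking self-attention block of the kind the abstract alludes to: queries, keys and values are binarized into spikes, so the attention product needs no softmax, and the output is re-spiked with a straight-through surrogate gradient. Everything here (the class name SpikingSelfAttention, the _spike helper, the identity surrogate, the tensor layout) is an illustrative assumption, not the authors' Spike-HAR/Spike-HAR++ implementation; the actual models additionally include the spike attention branch and parallel block structure described above.

import torch
import torch.nn as nn

class SpikingSelfAttention(nn.Module):
    # A sketch of softmax-free spiking self-attention (illustrative only).
    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    @staticmethod
    def _spike(x, threshold=1.0):
        # Heaviside spike in the forward pass; identity straight-through
        # gradient in the backward pass (a common surrogate-gradient trick).
        return ((x >= threshold).float() - x).detach() + x

    def forward(self, x):
        # x: (T, B, N, D) = time steps, batch, tokens, embedding dim
        T, B, N, D = x.shape
        h, d = self.heads, D // self.heads
        q = self._spike(self.q_proj(x)).view(T, B, N, h, d).transpose(2, 3)
        k = self._spike(self.k_proj(x)).view(T, B, N, h, d).transpose(2, 3)
        v = self._spike(self.v_proj(x)).view(T, B, N, h, d).transpose(2, 3)
        # Binary Q/K/V keep the products sparse and softmax-free, which is
        # where spiking transformers save energy relative to ANN attention.
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (T, B, h, N, N)
        out = self._spike(attn @ v)                     # (T, B, h, N, d)
        out = out.transpose(2, 3).reshape(T, B, N, D)
        return self.out_proj(out)

# Example usage: 4 time steps, batch 2, 16 tokens, 64-dim embedding.
# x = torch.rand(4, 2, 16, 64).round()  # pseudo spike input in {0, 1}
# y = SpikingSelfAttention(dim=64)(x)   # -> shape (4, 2, 16, 64)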
Keywords: spiking neural network, human action recognition, transformer, attention branch, event-based vision
Received: 09 Oct 2024; Accepted: 04 Nov 2024.
Copyright: © 2024 Lin, Liu and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Hong Chen, Tsinghua University, Beijing, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.