ORIGINAL RESEARCH article
Front. Neuroinform.
Volume 19 - 2025
doi: 10.3389/fninf.2025.1526259
This article is part of the Research Topic Advanced Neural Network Modeling in Neuroscience and Brain Research.
An action decoding framework combined with Deep Neural Network for predicting the semantics of human actions in videos from evoked brain activities
Provisionally accepted
University of Science and Technology Beijing, Beijing, China
Recently, numerous studies have focused on the semantic decoding of perceived images based on functional magnetic resonance imaging (fMRI) activities. However, it remains unclear whether relationships can be established between brain activities and the semantic features of human actions in video stimuli. Here we construct a framework for decoding action semantics by establishing relationships between brain activities and semantic features of human actions. To make effective use of the small amount of available brain activity data, the proposed method employs a pre-trained action recognition network based on the expanding three-dimensional (X3D) deep neural network (DNN) framework. To apply brain activities to the action recognition network, we train regression models that learn the relationship between brain activities and deep-layer image features. To improve decoding accuracy, we add a non-local attention module to the X3D model to capture long-range temporal and spatial dependencies, propose a multilayer perceptron (MLP) module with a multi-task loss constraint to build a more accurate regression mapping, and perform data augmentation through linear interpolation to expand the amount of data and reduce the impact of the small sample size. Our findings indicate that the features in the X3D-DNN are biologically relevant and capture information useful for perception. The proposed method enriches the semantic decoding model. We have also conducted several experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information for human actions is widespread across the entire visual cortex.
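The central decoding step described above maps fMRI voxel responses onto the deep-layer features of the X3D network. The abstract does not specify the regression family, so the following is a minimal sketch assuming ridge regression and random stand-in data; all array shapes and variable names are illustrative, not taken from the paper.

```python
# Minimal sketch of the voxel-to-feature regression step.
# Ridge regression is an assumption -- the abstract does not name the model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_voxels, n_features = 200, 4000, 512     # illustrative sizes

X = rng.standard_normal((n_samples, n_voxels))       # fMRI voxel responses
Y = rng.standard_normal((n_samples, n_features))     # X3D deep-layer features

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

reg = Ridge(alpha=100.0)        # one regularized linear map per feature dim
reg.fit(X_tr, Y_tr)
Y_pred = reg.predict(X_te)      # predicted features feed the recognition network

# Feature-wise Pearson correlation as a simple decoding-accuracy proxy
corr = [np.corrcoef(Y_te[:, i], Y_pred[:, i])[0, 1] for i in range(n_features)]
print(f"mean feature correlation: {np.mean(corr):.3f}")
```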
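The non-local attention module added to X3D is, in its general published form, a self-attention block over all spatio-temporal positions. Below is a compact PyTorch sketch of an embedded-Gaussian non-local block with a residual connection; the bottleneck width and the block's placement inside X3D are assumptions, not details from the paper.

```python
# PyTorch sketch of a non-local (self-attention) block for 3D video features,
# of the kind used to capture long-range temporal and spatial dependencies.
import torch
import torch.nn as nn

class NonLocalBlock3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.inter = channels // 2                       # assumed bottleneck width
        self.theta = nn.Conv3d(channels, self.inter, 1)  # query projection
        self.phi   = nn.Conv3d(channels, self.inter, 1)  # key projection
        self.g     = nn.Conv3d(channels, self.inter, 1)  # value projection
        self.out   = nn.Conv3d(self.inter, channels, 1)  # restore channel count

    def forward(self, x):                                # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)     # (B, THW, C')
        k = self.phi(x).flatten(2)                       # (B, C', THW)
        v = self.g(x).flatten(2).transpose(1, 2)         # (B, THW, C')
        attn = torch.softmax(q @ k, dim=-1)              # all-pairs affinities
        y = (attn @ v).transpose(1, 2).reshape(b, self.inter, t, h, w)
        return x + self.out(y)                           # residual connection

x = torch.randn(2, 64, 4, 14, 14)
print(NonLocalBlock3D(64)(x).shape)                      # (2, 64, 4, 14, 14)
```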
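The abstract also names an MLP regression module trained under a multi-task loss constraint and data augmentation by linear interpolation, without giving details. The sketch below assumes one plausible reading: an MSE feature-regression term plus an auxiliary action-classification term, and augmentation by mixing neighbouring samples. Every concrete choice here (loss weight, layer sizes, mixing coefficient, random stand-in labels) is hypothetical.

```python
# Sketch of an MLP regression head with an assumed multi-task loss, plus
# linear-interpolation augmentation; details are not given in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

mlp = nn.Sequential(                     # voxel -> feature regression head
    nn.Linear(4000, 1024), nn.ReLU(),
    nn.Linear(1024, 512),
)

def interpolate_pairs(x, y, alpha=0.5):
    """Augment by linearly mixing neighbouring samples (assumed scheme)."""
    x_new = alpha * x[:-1] + (1 - alpha) * x[1:]
    y_new = alpha * y[:-1] + (1 - alpha) * y[1:]
    return torch.cat([x, x_new]), torch.cat([y, y_new])

def multi_task_loss(pred, target, labels, classifier, w=0.5):
    """Assumed combination: feature regression (MSE) + action classification."""
    return F.mse_loss(pred, target) + w * F.cross_entropy(classifier(pred), labels)

x = torch.randn(32, 4000)                # stand-in fMRI responses
y = torch.randn(32, 512)                 # stand-in target X3D features
classifier = nn.Linear(512, 10)          # auxiliary action-class head

x_aug, y_aug = interpolate_pairs(x, y)               # 32 -> 63 samples
labels = torch.randint(0, 10, (x_aug.shape[0],))     # random labels, illustration only
loss = multi_task_loss(mlp(x_aug), y_aug, labels, classifier)
loss.backward()
print(f"combined loss: {loss.item():.3f}")
```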
Keywords: functional magnetic resonance imaging, decoding, action semantics, three-dimensional convolutional neural network, multi-subject model
Received: 11 Nov 2024; Accepted: 30 Jan 2025.
Copyright: © 2025 Zhang, Tian and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Baolin Liu, University of Science and Technology Beijing, Beijing, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.