Deep Feature Mining via the Attention-Based Bidirectional Long Short Term Memory Graph Convolutional Neural Network for Human Motor Imagery Recognition

Hou, Yimin; Jia, Shuyue; Lun, Xiangmin; Zhang, Shu; Chen, Tao; Wang, Fang; Lv, Jinglei

doi:10.3389/fbioe.2021.706229

ORIGINAL RESEARCH article

Front. Bioeng. Biotechnol., 11 February 2022

Sec. Bionics and Biomimetics

Volume 9 - 2021 | https://doi.org/10.3389/fbioe.2021.706229


Yimin Hou¹

Shuyue Jia²*

Xiangmin Lun¹,³

Shu Zhang⁴

Tao Chen¹

Fang Wang¹

Jinglei Lv⁵

¹School of Automation Engineering, Northeast Electric Power University, Jilin, China
²School of Computer Science, Northeast Electric Power University, Jilin, China
³College of Mechanical and Electric Engineering, Changchun University of Science and Technology, Changchun, China
⁴School of Computer Science, Northwestern Polytechnical University, Xi’an, China
⁵School of Biomedical Engineering and Brain and Mind Center, University of Sydney, Sydney, NSW, Australia

Recognition accuracy and response time are both essential for building a practical electroencephalography (EEG)-based brain–computer interface (BCI). However, recent approaches have compromised either classification accuracy or response time. This paper presents a novel deep learning approach designed for both accurate and responsive motor imagery (MI) recognition based on scalp EEG. Bidirectional long short-term memory (BiLSTM) with the attention mechanism is employed to extract features, and the graph convolutional neural network (GCN) improves the decoding performance by exploiting the topological structure of the features, which is estimated from the overall data. In particular, the method is trained and tested on short EEG recordings of only 0.4 s in length, and the results show effective and efficient prediction based on individual and groupwise training, with 98.81% and 94.64% accuracy, respectively, which outperformed the state-of-the-art studies. The introduced deep feature mining approach can precisely recognize human motion intents from raw and almost-instant EEG signals, which paves the road to translating EEG-based MI recognition to practical BCI systems.

1 Introduction

Recently, the brain–computer interface (BCI) has played a promising role in assisting and rehabilitating patients with paralysis, epilepsy, and brain injuries by interpreting neural activities to control peripherals (Bouton et al., 2016; Schwemmer et al., 2018). Among the noninvasive brain activity acquisition systems, the electroencephalography (EEG)-based BCI has gained extensive attention given its higher temporal resolution and portability. Hence, it has been widely employed to assist the recovery of patients with motor impairments, e.g., amyotrophic lateral sclerosis (ALS), spinal cord injury (SCI), or stroke (Daly and Wolpaw, 2008; Pereira et al., 2018). Specifically, researchers have focused on recognizing motor imagery (MI) from EEG and translating brain activities into specific motor intentions. In this way, users can manipulate external devices or exchange information with their surroundings (Pereira et al., 2018). Although researchers have developed several MI-based prototype applications, there is still room for improvement before practical clinical translation can be promoted (Schwemmer et al., 2018; Mahmood et al., 2019). In practice, effective and efficient control via MI alone requires both precise EEG decoding and quick response. However, few existing studies achieve both. In this study, we explore the possibility of a deep learning framework to tackle the challenge.

1.1 Related Work

Lately, deep learning (DL) has attracted increasing attention in many disciplines because of its promising performance in classification tasks (LeCun et al., 2015). A growing number of works have shown that DL will play a pivotal role in the precise decoding of brain activities (Schwemmer et al., 2018). In particular, recent works have been carried out on EEG motion intention detection. A primary current focus is to implement DL-based approaches to decode EEG MI tasks, which have attained promising results (Lotte et al., 2018). Owing to the high temporal resolution of EEG signals, methods related to the recurrent neural network (RNN) (Rumelhart et al., 1986), which can analyze time-series data, have been extensively applied to filter and classify EEG sequences, i.e., time points (Güler et al., 2005; Wang P et al., 2018; Luo et al., 2018; Zhang T et al., 2018; Zhang X et al., 2018). Zhang T et al. (2018) put forward a novel RNN framework with spatial and temporal filtering to classify EEG signals for emotion recognition and achieved 95.4% accuracy for three classes with a 9-s segment as a sample. Yang et al. also proposed an emotion recognition method using long short-term memory (LSTM) (Yang J et al., 2020). Wang et al. and Luo et al. applied LSTM (Hochreiter and Schmidhuber, 1997) to handle signals of time slices and achieved 77.30% and 82.75% accuracy, respectively (Wang P et al., 2018; Luo et al., 2018). Zhang X et al. (2018) presented an attention-based RNN for EEG-based person identification, which attained 99.89% accuracy for eight participants at the subject level with 4-s signals as a sample. LSTM has also been employed in medical applications, such as seizure detection (Hu et al., 2020), with recorded EEG signals. In these studies, however, signals spanning the whole experimental duration were used as samples, which resulted in a slow prediction response.

Apart from the RNN, the convolutional neural network (CNN) (Fukushima, 1980; LeCun et al., 1998) has been applied to decode EEG signals as well (Dose et al., 2018; Hou et al., 2020). Hou et al. proposed an approach based on ESI and CNN and achieved competitive results, i.e., 94.50% and 96.00% accuracy at the group and subject levels, respectively, for four-class classification. What is more, by combining the CNN with graph theory, the graph convolutional neural network (GCN) (Bruna et al., 2014; Henaff et al., 2015; Duvenaud et al., 2015; Niepert et al., 2016; Defferrard et al., 2016) approach was presented lately, taking into consideration the functional topological relationship of EEG electrodes (Wang XH et al., 2018; Song et al., 2018; Zhang T et al., 2019; Wang et al., 2019). In Wang XH et al. (2018) and Zhang T et al. (2019), a GCN with a broad learning approach was proposed and attained 93.66% and 94.24% accuracy, respectively, for EEG emotion recognition. Song et al. and Wang et al. introduced a dynamical GCN (90.40% accuracy) and a phase-locking value-based GCN (84.35% accuracy) to recognize different emotions (Song et al., 2018; Wang et al., 2019). Although highly accurate prediction has been accomplished via the GCN model, few researchers have investigated the approach in the area of EEG MI decoding.

1.2 Contribution of This Paper

Toward accurate and fast MI recognition, an attention-based BiLSTM–GCN was introduced to mine the effective features from raw EEG signals. The main contributions were summarized as follows:

i) As far as we know, this work was the first that combined BiLSTM with the GCN to decode EEG tasks.

ii) The attention-based BiLSTM successfully derived relevant features from raw EEG signals. The subsequent GCN model enhanced the decoding performance by considering the internal topological structure of the features.

iii) The proposed feature mining approach decoded EEG MI signals with stable and reproducible results, showing robustness and adaptability to the considerable intertrial and intersubject variability.

1.3 Organization of This Paper

The rest of this paper was organized as follows. The preliminary knowledge of the BiLSTM, attention mechanism, and GCN was introduced in the Methodology section, which was the foundation of the presented approach. In the Results and Discussion section, experimental details and numerical results were presented, followed by the conclusion in the Conclusion section.

2 Methodology

2.1 Pipeline Overview

The framework of the proposed method is presented in Figure 1.

i) The 64-channel raw EEG signals were acquired via the BCI 2000, and then the 4-s (experimental duration) signals were sliced into 0.4-s segments over time, where the dimension of each segment was 64 channels × 64 time steps.

ii) The attention-based BiLSTM was put forward to filter 64-channel (spatial information) and 0.4-s (temporal information) raw EEG data and derived features from the fully connected neurons.

iii) The Pearson, adjacency, and Laplacian matrices of the overall features were computed sequentially to represent the topological structure of the features, i.e., as a graph. With the features and their corresponding graph representation as the input, the GCN model was applied to classify the four-class MI tasks.
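Step i) can be illustrated with a minimal NumPy sketch. It assumes non-overlapping windows and uses synthetic data in place of a real recording; the function name `slice_trial` is hypothetical and not taken from the authors' code. At the 160 Hz sampling rate of the dataset, a 0.4-s segment holds 0.4 × 160 = 64 time steps, so a 4-s trial yields 10 segments of 64 channels × 64 time steps.

```python
import numpy as np

FS = 160           # sampling rate in Hz, as stated for the dataset
N_CHANNELS = 64    # number of EEG electrodes
SEG_LEN = 64       # 0.4 s * 160 Hz = 64 time steps


def slice_trial(trial: np.ndarray, seg_len: int = SEG_LEN) -> np.ndarray:
    """Split one (channels, time) trial into non-overlapping segments.

    Non-overlapping windows are an assumption of this sketch.
    Returns an array of shape (n_segments, channels, seg_len); trailing
    samples that do not fill a whole segment are dropped.
    """
    n_channels, n_samples = trial.shape
    n_segments = n_samples // seg_len
    trimmed = trial[:, : n_segments * seg_len]
    # (channels, n_segments, seg_len) -> (n_segments, channels, seg_len)
    return trimmed.reshape(n_channels, n_segments, seg_len).transpose(1, 0, 2)


# Synthetic stand-in for one 4-s trial: 64 channels x 640 samples.
rng = np.random.default_rng(0)
trial = rng.standard_normal((N_CHANNELS, 4 * FS))
segments = slice_trial(trial)
print(segments.shape)  # (10, 64, 64)
```

Each of the 10 segments is then treated as an independent sample, which is what allows a prediction from 0.4 s of signal rather than from the full trial.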


FIGURE 1. Schematic overview of the proposed method, consisting of the 64-channel raw electroencephalography (EEG) signal acquisition, the bidirectional long short-term memory (BiLSTM) with the attention model for feature extraction, and the graph convolutional neural network (GCN) model for classification.

2.2 Bidirectional Long Short Term Memory With Attention

2.2.1 Bidirectional Long Short Term Memory Model

RNN-based approaches have been extensively applied to analyze EEG time-series signals. An RNN cell, though similar to a feedforward neural network, has connections pointing backward, which send its output back to itself. The learned features of an RNN cell at time step t are influenced not only by the input signals x_(t) but also by the output (state) at time step t − 1. This design means that RNN-based methods can handle sequential data, e.g., time point signals, by unrolling the network through time. The LSTM and the gated recurrent unit (GRU) (Cho et al., 2014) are the most popular variants of the RNN-based approaches. In the Proposed Approach section, the paper compared the performance of these popular models experimentally, and the BiLSTM with attention displayed in Figure 2 outperformed the others owing to better detection of the long-term dependencies of raw EEG signals.

i_{(t)} = σ (W_{x i}^{T} \cdot x_{(t)} + W_{h i}^{T} \cdot h_{(t - 1)} + b_{i}) (1)

f_{(t)} = σ (W_{x f}^{T} \cdot x_{(t)} + W_{h f}^{T} \cdot h_{(t - 1)} + b_{f}) (2)

o_{(t)} = σ (W_{x o}^{T} \cdot x_{(t)} + W_{h o}^{T} \cdot h_{(t - 1)} + b_{o}) (3)

g_{(t)} = \tanh (W_{x g}^{T} \cdot x_{(t)} + W_{h g}^{T} \cdot h_{(t - 1)} + b_{g}) (4)

c_{(t)} = f_{(t)} \otimes c_{(t - 1)} + i_{(t)} \otimes g_{(t)} (5)

y_{(t)} = h_{(t)} = o_{(t)} \otimes \tanh (c_{(t)}) (6)


FIGURE 2. Presented BiLSTM with the attention mechanism for feature extraction.

As illustrated in Figure 2, three kinds of gates manipulate and control the memories of EEG signals, namely, the input gate, forget gate, and output gate. The input gate, controlled by i_(t), partially stores the information of x_(t) and decides which part of it should be added to the long-term state c_(t). The forget gate, controlled by f_(t), decides which part of the previous long-term state c_(t − 1) should be discarded. The output gate, controlled by o_(t), decides which part of the information from c_(t) should be output as y_(t), also known as the short-term state h_(t). Manipulated by the above gates, two kinds of states are stored. The long-term state travels through the cell from left to right, dropping some memories at the forget gate and adding new ones from the input gate. After that, the information passes through a nonlinear activation function, usually tanh, and is then filtered by the output gate. In this way, the short-term state h_(t) is produced.

Eqs. 1–6 describe the procedure of an LSTM cell, where W and b are the weights and biases of the different gates, which store the memory and learn a generalized model, ⊗ denotes element-wise multiplication, and σ is a nonlinear activation function, i.e., the sigmoid function used in the experiments. For the bidirectional LSTM (BiLSTM), the signals x_(t) are input from left to right into the forward LSTM cell. In addition, they are reversed and input into another LSTM cell, the backward LSTM. Thus, there are two output vectors, which together store more comprehensive information than the output of a single LSTM cell. They are then concatenated as the final output of the cell.
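To make Eqs. 1–6 concrete, the following NumPy sketch runs one forward and one backward LSTM over a sequence and concatenates their outputs. It is an illustration under stated assumptions, not the authors' TensorFlow implementation: the helper names (`lstm_step`, `init_params`, `run_lstm`, `run_bilstm`), the random initialization, and the small hidden size of 8 are all hypothetical choices made for the demo.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. 1-6; p maps names to weights/biases."""
    i = sigmoid(p["W_xi"].T @ x_t + p["W_hi"].T @ h_prev + p["b_i"])  # Eq. 1
    f = sigmoid(p["W_xf"].T @ x_t + p["W_hf"].T @ h_prev + p["b_f"])  # Eq. 2
    o = sigmoid(p["W_xo"].T @ x_t + p["W_ho"].T @ h_prev + p["b_o"])  # Eq. 3
    g = np.tanh(p["W_xg"].T @ x_t + p["W_hg"].T @ h_prev + p["b_g"])  # Eq. 4
    c = f * c_prev + i * g                                            # Eq. 5
    h = o * np.tanh(c)                                                # Eq. 6
    return h, c


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases (demo initialization only)."""
    p = {}
    for gate in "ifog":
        p[f"W_x{gate}"] = 0.1 * rng.standard_normal((n_in, n_hidden))
        p[f"W_h{gate}"] = 0.1 * rng.standard_normal((n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p


def run_lstm(x_seq, p, n_hidden):
    """Run over a (time, n_in) sequence; return (time, n_hidden) outputs."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x_t in x_seq:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return np.stack(outputs)


def run_bilstm(x_seq, p_fwd, p_bwd, n_hidden):
    """Concatenate forward and backward LSTM outputs per time step."""
    y_fwd = run_lstm(x_seq, p_fwd, n_hidden)
    # The backward cell reads the reversed sequence; reverse its outputs
    # again so that both directions are aligned on the same time steps.
    y_bwd = run_lstm(x_seq[::-1], p_bwd, n_hidden)[::-1]
    return np.concatenate([y_fwd, y_bwd], axis=1)


rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 64, 8, 64   # hidden size 8 is only for the demo
x_seq = rng.standard_normal((n_steps, n_in))
y = run_bilstm(x_seq, init_params(n_in, n_hidden, rng),
               init_params(n_in, n_hidden, rng), n_hidden)
print(y.shape)  # (64, 16)
```

The output width is twice the hidden size because the two directions are concatenated, which is the sense in which the BiLSTM output carries more information than a single LSTM cell.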

2.2.2 Attention Mechanism

The attention mechanism, inspired by human vision, plays a vital part in the fields of computer vision (CV), natural language processing (NLP), and automatic speech recognition (ASR) (Bahdanau et al., 2014; Chorowski et al., 2015; Xu et al., 2015; Yang et al., 2016). Not all the signals contribute equally toward the classification. Hence, an attention mechanism is jointly trained with the BiLSTM, and its output s_(t) is a weighted sum of the BiLSTM outputs y_(t) based on the attention weights α_(t).

u_{(t)} = \tanh (W_{w} y_{(t)} + b_{w}) (7)

α_{(t)} = \frac{\exp (u_{(t)}^{⊤} u_{w})}{\sum_{t} \exp (u_{(t)}^{⊤} u_{w})} (8)

s_{(t)} = \sum_{t} α_{(t)} y_{(t)} (9)

u_(t) is the output of a fully connected (FC) layer for learning features of y_(t), followed by a softmax layer that outputs a probability distribution α_(t). W_w and u_w denote trainable weights, and b_w denotes a trainable bias. The mechanism selects and extracts the most significant temporal and spatial information from y_(t) by weighting it with α_(t) according to its contribution to the decoding tasks.
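Eqs. 7–9 can be sketched in a few lines of NumPy. The sketch stores the time steps as rows, so Eq. 7 appears as `y @ W_w` rather than `W_w y`. The function name `attention_pool` and the random demo weights are hypothetical, and the feature width of 512 assumes 256 cells in each direction of the BiLSTM; the attention size of 8 is the value reported in the Proposed Approach section.

```python
import numpy as np


def attention_pool(y, W_w, b_w, u_w):
    """Attention over time steps.

    y: (time, d) BiLSTM outputs; W_w: (d, a); b_w: (a,); u_w: (a,).
    Returns the pooled vector s of shape (d,) and the weights alpha (time,).
    """
    u = np.tanh(y @ W_w + b_w)                      # Eq. 7
    scores = u @ u_w
    scores = scores - scores.max()                  # numerical stability only
    alpha = np.exp(scores) / np.exp(scores).sum()   # Eq. 8 (softmax over time)
    s = alpha @ y                                   # Eq. 9 (weighted sum)
    return s, alpha


rng = np.random.default_rng(0)
n_steps, d, a = 64, 512, 8    # d = 2 x 256 is an assumption of this demo
y = rng.standard_normal((n_steps, d))
W_w = 0.05 * rng.standard_normal((d, a))
b_w = np.zeros(a)
u_w = rng.standard_normal(a)
s, alpha = attention_pool(y, W_w, b_w, u_w)
print(s.shape, alpha.shape)  # (512,) (64,)
```

Subtracting the maximum score before exponentiation does not change α_(t); it only avoids overflow in the softmax.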

2.3 Graph Convolutional Neural Network

2.3.1 Graph Convolution

In graph theory, a graph can be represented by its graph Laplacian L. The combinatorial Laplacian is computed as the degree matrix D minus the adjacency matrix A, i.e., L = D − A. In this work, Pearson’s matrix P was utilized to measure the inner correlations among features.

P_{X, Y} = \frac{E ((X - μ_{X}) (Y - μ_{Y}))}{σ_{X} σ_{Y}} (10)

where X and Y are two variables regarding different features, P_X,Y is their correlation, σ_X and σ_Y are the standard deviations, and μ_X and μ_Y are the expectations. Besides, the adjacency matrix A is defined as:

A = | P | - I (11)

where |P| is the element-wise absolute value of Pearson’s matrix P, and $I \in R^{N \times N}$ is an identity matrix. In addition, the degree matrix D of the graph is computed as follows:

D_{i i} = \sum_{j = 1}^{N} A_{i j} (12)

Then the normalized graph Laplacian is computed as:

L = D^{- 1 / 2} (D - A) D^{- 1 / 2} = I_{N} - D^{- 1 / 2} A D^{- 1 / 2} (13)

The Laplacian is then decomposed by the Fourier basis $U = [u_{0}, \dots, u_{N - 1}] \in R^{N \times N}$. The graph Laplacian is described as L = UΛU^T, where $Λ = diag ([λ_{0}, \dots, λ_{N - 1}]) \in R^{N \times N}$ is the diagonal matrix of the eigenvalues of L. The graph convolution is defined as:

y = g_{θ} (L) x = g_{θ} (U Λ U^{T}) x = U g_{θ} (Λ) U^{T} x (14)

in which g_θ is a nonparametric filter. Specifically, the operation is as follows:

y_{:, j}^{k} = σ (\sum_{i = 1}^{f_{k - 1}} U g_{θ} (Λ) U^{T} x_{:, i}^{k - 1}) (15)

in which $x^{k - 1} \in R^{N \times f_{k - 1}}$ denotes the signals, N is the number of vertices of the graph, f_{k−1} and f_k are the numbers of input and output channels, respectively, and σ denotes a nonlinear activation function. What is more, because the nonparametric filter g_θ is not localized in space and is very time-consuming to compute, it is approximated by Chebyshev polynomials (Hammond et al., 2011). The Chebyshev polynomials are defined by the recurrence T_k(x) = 2xT_{k−1}(x) − T_{k−2}(x), with T₀ = 1 and T₁ = x. The filter can be presented as $g_{θ} (Λ) = \sum_{k = 0}^{K - 1} θ_{k} T_{k} (\tilde{Λ})$, in which $θ \in R^{K}$ is a set of coefficients, $T_{k} (\tilde{Λ}) \in R^{N \times N}$ is the kth-order polynomial evaluated at $\tilde{Λ} = 2 Λ / λ_{max} - I_{N}$, and $\tilde{Λ}$ is a diagonal matrix of the scaled eigenvalues, which lie in [−1, 1]. The convolution can be rewritten as:

y = \sum_{k = 0}^{K - 1} θ_{k} T_{k} (\tilde{L}) x, \quad \tilde{L} = 2 L / λ_{max} - I_{N} (16)
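The practical benefit of the Chebyshev approximation is that the filter g_θ(Λ) = Σ θ_k T_k(Λ̃) can be applied without an eigendecomposition inside the layer, using the recurrence on the rescaled Laplacian 2L/λ_max − I. The NumPy sketch below shows this for a single input channel. The function name `chebyshev_conv`, the 4-node path graph, and the coefficient values are illustrative assumptions; in a trained model the coefficients θ would be learned.

```python
import numpy as np


def chebyshev_conv(L, x, theta):
    """Apply sum_k theta[k] * T_k(L_tilde) to x, with L_tilde = 2L/lmax - I.

    L: (N, N) symmetric Laplacian; x: (N,) or (N, f) signal; theta: (K,).
    """
    n = L.shape[0]
    lmax = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lmax - np.eye(n)
    t_prev = x                       # T_0(L_tilde) x = x
    y = theta[0] * t_prev
    if len(theta) > 1:
        t_curr = L_tilde @ x         # T_1(L_tilde) x = L_tilde x
        y = y + theta[1] * t_curr
        for k in range(2, len(theta)):
            # T_k = 2 * L_tilde * T_{k-1} - T_{k-2}
            t_next = 2.0 * (L_tilde @ t_curr) - t_prev
            y = y + theta[k] * t_next
            t_prev, t_curr = t_curr, t_next
    return y


# Demo on a 4-node path graph with unit edge weights (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, 0.0, 0.0, 0.0])    # impulse on node 0
theta = np.array([0.5, 0.3, 0.2])     # K = 3, arbitrary coefficients
y = chebyshev_conv(L, x, theta)
print(y)
```

With K = 3 the filter reaches at most two hops, so the impulse placed on node 0 leaves node 3 untouched; this is the spatial localization that the nonparametric filter lacks.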

2.3.2 Graph Pooling

The graph pooling operation can be achieved via the Graclus multilevel clustering algorithm, which consists of node clustering and one-dimensional pooling (Dhillon et al., 2007). A greedy algorithm was implemented to compute successively coarser versions of a graph while minimizing the clustering objective, for which the normalized cut was chosen (Shi and Malik, 2000). In this way, meaningful neighborhoods on graphs were acquired. Defferrard et al. (2016) proposed using a balanced binary tree to store the neighborhoods, after which a one-dimensional pooling was applied for precise dimensionality reduction.
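One coarsening level of this scheme can be sketched as follows. This is a simplified illustration, not the Graclus implementation: it visits nodes in index order, pairs each unmarked node with the unmarked neighbor that maximizes the local normalized cut A_ij (1/d_i + 1/d_j), and then takes the maximum over each pair. The balanced binary tree used to rearrange vertices is omitted, and the function names `greedy_pairing` and `max_pool_by_cluster` are hypothetical.

```python
import numpy as np


def greedy_pairing(A):
    """One coarsening level: returns a cluster id for every node.

    Each unmarked node i is matched with the unmarked neighbour j that
    maximizes A[i, j] * (1/d_i + 1/d_j); nodes without a free neighbour
    stay as singletons.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    cluster = -np.ones(n, dtype=int)
    next_id = 0
    for i in range(n):
        if cluster[i] >= 0:
            continue
        best_j, best_w = -1, 0.0
        for j in range(n):
            if j != i and cluster[j] < 0 and A[i, j] > 0:
                w = A[i, j] * (1.0 / deg[i] + 1.0 / deg[j])
                if w > best_w:
                    best_j, best_w = j, w
        cluster[i] = next_id
        if best_j >= 0:
            cluster[best_j] = next_id
        next_id += 1
    return cluster


def max_pool_by_cluster(x, cluster):
    """Max over the nodes that share a cluster id; x: (N,) -> (n_clusters,)."""
    out = np.full(cluster.max() + 1, -np.inf)
    np.maximum.at(out, cluster, x)
    return out


# Toy graph: strong edges 0-1 and 2-3, weak edge 1-2 (illustrative only).
A = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
x = np.array([1.0, 5.0, 2.0, -3.0])
cluster = greedy_pairing(A)
pooled = max_pool_by_cluster(x, cluster)
print(cluster, pooled)  # [0 0 1 1] [5. 2.]
```

The strongly connected pairs are merged first, so pooling halves the number of vertices while keeping neighboring features together.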

2.4 Proposed Approach

The presented approach was a combination of the attention-based BiLSTM and the GCN, as illustrated in Figure 1. The BiLSTM with the attention mechanism was used to derive relevant features from raw EEG signals. During the procedure, features were obtained from neurons at the FC layer. In Figure 3, we demonstrate the topological connections of subject nine’s features via the Pearson matrix, absolute Pearson matrix, adjacency matrix, and Laplacian matrix. The GCN was then applied to classify the extracted features. It was the combination of the two models that enhanced the decoding performance by a significant margin compared with existing studies. Details are provided in the following.


FIGURE 3. The Pearson, absolute Pearson, adjacency, and Laplacian matrices for subject nine. (A) Pearson matrix for subject nine. (B) Absolute Pearson matrix for subject nine. (C) Adjacency matrix for subject nine. (D) Laplacian matrix for subject nine.
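The four matrices shown in Figure 3 can be derived from a feature matrix in a few lines, following Eqs. 10–13. The NumPy sketch below uses synthetic random features in place of the real FC-layer outputs, and the function name `build_graph` is hypothetical. It computes the normalized form I_N − D^{−1/2} A D^{−1/2} of the Laplacian and assumes that every feature has a nonzero degree, so that D^{−1/2} exists.

```python
import numpy as np


def build_graph(features):
    """features: (n_samples, n_features). Returns P, A, D, L."""
    P = np.corrcoef(features, rowvar=False)       # Pearson matrix, Eq. 10
    n = P.shape[0]
    A = np.abs(P) - np.eye(n)                     # adjacency matrix, Eq. 11
    deg = A.sum(axis=1)                           # degrees, Eq. 12
    D = np.diag(deg)
    d_inv_sqrt = 1.0 / np.sqrt(deg)               # assumes deg > 0 everywhere
    # Normalized Laplacian I - D^{-1/2} A D^{-1/2}, Eq. 13
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return P, A, D, L


rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples of 64 features (the FC-layer size used here).
features = rng.standard_normal((200, 64))
P, A, D, L = build_graph(features)
print(P.shape, A.shape, D.shape, L.shape)  # (64, 64) (64, 64) (64, 64) (64, 64)
```

Subtracting the identity in Eq. 11 removes the self-correlations on the diagonal of |P|, so the graph has no self-loops, and the eigenvalues of the resulting normalized Laplacian lie between 0 and 2.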

First of all, an optimal RNN-based model was explored to obtain relevant features from raw EEG signals. As shown in Figure 4, the BiLSTM with attention performed best in this work, achieving 77.86% global average accuracy (GAA). The input size of x_(t) was 64, denoting the 64 channels (electrodes) of raw EEG signals. The maximum time step t was chosen as 64, which corresponds to a 0.4-s segment. According to Figures 4A, B, higher accuracy was obtained by increasing the number of cells of the BiLSTM model. It should, however, be noted in Figure 4F that when there were more than 256 cells, the loss showed an upward trend, which indicated a risk of overfitting due to the increased model complexity. As a result, 256 LSTM cells (76.67% GAA) were chosen to generalize the model. Meanwhile, Figure 4C shows that most choices of the linear size of the attention weights made little difference. Thus, eight neurons, with 79.40% GAA, were applied empirically during the experiments. Comparing Figures 4D, H showed that a compromise should be made between performance and the input size of the GCN. As a result, a linear size of 64 (76.73% GAA) was utilized at the FC layer.


FIGURE 4. Comparison of models and hyperparameters w.r.t. the recurrent neural network (RNN)-based methods for feature extraction. (A) Global average accuracy (GAA) w.r.t. RNN-based models. (B) GAA w.r.t. BiLSTM cell size. (C) GAA w.r.t. attention size of the BiLSTM. (D) GAA w.r.t. the number of the extracted features. (E) Loss w.r.t. RNN-based models. (F) Loss w.r.t. BiLSTM cell size. (G) Loss w.r.t. attention size of the BiLSTM. (H) Loss w.r.t. the number of the extracted features.

Besides, to prevent overfitting, a 25% dropout (Srivastava et al., 2014) was implemented for the BiLSTM and FC layer. Batch normalization (BN) (Ioffe and Szegedy, 2015) was carried out for the FC layer, which was activated by the softplus function (Hahnloser et al., 2000). The Euclidean distance was used as the loss function, with an L2 norm penalty with a coefficient of 1 × 10⁻⁷. A batch size of 1,024 was used to maximize the usage of GPU resources. A learning rate of 1 × 10⁻⁴ was applied to the Adam optimizer (Kingma and Ba, 2014).

Furthermore, the second-order Chebyshev polynomial was applied to approximate convolutional filters in the experiments. The GCN consisted of six graph convolutional layers with 16, 32, 64, 128, 256, and 512 filters, respectively, each followed by a graph max-pooling layer, and a softmax layer derived the final prediction.

In addition, for the GCN model, BN was utilized before the nonlinear softplus activation function at all of the layers except the final softmax. An L2 norm penalty with a coefficient of 1 × 10⁻⁷ was added to the loss function, which was a cross-entropy loss. The model was trained by stochastic gradient descent (Zhang, 2004) with a batch size of 16, using the Adam optimizer (1 × 10⁻⁷ learning rate).
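The softplus activation and the regularized loss described for the GCN can be written out as a short NumPy sketch. It assumes that the L2 term is the plain sum of squared weights scaled by the 1 × 10⁻⁷ coefficient; the exact form used by the authors (for example, a factor of one half) is not stated, and the function names are hypothetical.

```python
import numpy as np


def softplus(z):
    """softplus(z) = log(1 + exp(z)), computed in a numerically stable way."""
    return np.logaddexp(0.0, z)


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def regularized_cross_entropy(logits, labels, weights, l2_coef=1e-7):
    """Mean cross-entropy plus l2_coef times the sum of squared weights.

    logits: (batch, n_classes); labels: (batch,) integer class ids;
    weights: list of weight arrays to penalize (assumed penalty form).
    """
    probs = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(probs[np.arange(n), labels]).mean()
    l2 = sum(np.sum(w ** 2) for w in weights)
    return ce + l2_coef * l2


# Uniform predictions over the four MI classes give a cross-entropy of log(4).
logits = np.zeros((16, 4))                 # batch size 16, as used for the GCN
labels = np.arange(16) % 4
weights = [np.ones((2, 2))]                # toy weight matrix, squared sum = 4
loss = regularized_cross_entropy(logits, labels, weights)
print(loss)
```

With a coefficient this small, the penalty only matters when the weights grow very large, so the cross-entropy term dominates the loss during normal training.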

All the experiments above were implemented in Google TensorFlow (Abadi et al., 2016) 1.14.0 and run on an NVIDIA RTX 2080 Ti with CUDA 10.0.

3 Results and Discussion

3.1 Description of the Dataset

The EEG Motor Movement/Imagery Dataset (Goldberger et al., 2000) was employed in this study. EEG trials were acquired from 109 participants performing four MI tasks, i.e., imagining the left fist (L), the right fist (R), both fists (B), and both feet (F) (21 trials per task). Each trial has a 4-s experimental duration (160 Hz sampling rate) with one single task (Hou et al., 2020). In this work, a 0.4-s temporal segment of 64-channel signals, i.e., 64 channels × 64 time points, was regarded as a sample. In the Groupwise Prediction section, we used the data of a group of 20 subjects (S₁ − S₂₀) to train and validate our method, and 10-fold cross-validation was carried out. Further, 50 subjects (S₁ − S₅₀) were selected to verify the repeatability and stability of our approach. In the Subject-Specific Adaptation section, the data of individual subjects (S₁ − S₁₀) were utilized to perform subject-level adaptation. For all the experiments, the dataset was randomly divided into 10 parts, where 90% was the training set and the remaining 10% was the test set. In the Groupwise Prediction section, the above procedure was carried out 10 times, which yielded the 10 results of 10-fold cross-validation.
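The 10-part random split can be sketched with NumPy index arrays. The sample count is an estimate under two assumptions that the text does not state explicitly: every trial of the 20 subjects is used, and each 4-s trial gives 10 non-overlapping 0.4-s segments, so that 20 subjects × 4 tasks × 21 trials × 10 segments = 16,800 samples. The generator name `ten_fold_indices` and the seed are hypothetical.

```python
import numpy as np


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    folds = np.array_split(perm, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train_idx, test_idx


# Estimated sample count for the 20-subject group (see assumptions above).
n_samples = 20 * 4 * 21 * 10
splits = list(ten_fold_indices(n_samples))
train_idx, test_idx = splits[0]
print(n_samples, len(train_idx), len(test_idx))  # 16800 15120 1680
```

Each fold holds out a different 10% of the samples, so every sample is used for testing exactly once across the 10 folds.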

3.2 Groupwise Prediction

It has been suggested that intersubject variability remains one of the concerns for interpreting EEG signals (Tanaka, 2020). First, a small group size (20 subjects) was adopted for groupwise prediction. In Figure 4A, 63.57% GAA was achieved by the BiLSTM model. Applying the attention mechanism enhanced the decoding performance to 77.86% GAA (an improvement of 14.29 percentage points). Further, we employed the attention-based BiLSTM–GCN model in this work. It attained 94.64% maximum GAA (Hou et al., 2020) (an improvement of 31.07 percentage points over the BiLSTM model) and 93.04% median accuracy from 10-fold cross-validation. Our method improved the classification capability under intersubject variability by taking the topological relationship of relevant features into consideration. Meanwhile, as illustrated in Figure 5A, the median values of GAA, kappa, precision, recall, and F1 score were 93.04%, 90.71%, 93.02%, 93.01%, and 92.99%, respectively. To the knowledge of the authors, the proposed method has achieved state-of-the-art performance in group-level prediction. Besides, the results of 10-fold cross-validation have verified the repeatability and stability. Furthermore, the confusion matrix of test one (94.64% GAA) is given in Figure 5B. Accuracies of 91.69%, 92.11%, 94.48%, and 100% were obtained for the four tasks. These results indicate that our method was effective and efficient in detecting human motion intents from raw EEG signals.
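The reported metrics can all be derived from a confusion matrix, as the following NumPy sketch shows. The matrix used here is invented purely for illustration and is not the one in Figure 5B. Macro-averaging of precision, recall, and F1 over the four classes is an assumption, since the averaging scheme is not stated, and the sketch assumes that every class is predicted at least once. The function name `metrics_from_confusion` is hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    accuracy = tp.sum() / total                  # global average accuracy
    recall = tp / cm.sum(axis=1)                 # per class
    precision = tp / cm.sum(axis=0)              # per class
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what the class marginals predict.
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "gaa": accuracy,
        "kappa": kappa,
        "precision": precision.mean(),           # macro average (assumed)
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Invented 4-class confusion matrix (L, R, B, F), 50 samples per class.
cm = [[45, 5, 0, 0],
      [5, 45, 0, 0],
      [0, 0, 50, 0],
      [0, 0, 0, 50]]
m = metrics_from_confusion(cm)
print(m)
```

For this toy matrix the accuracy is 190/200 = 0.95 and the chance agreement is 0.25, so kappa = (0.95 − 0.25)/(1 − 0.25) ≈ 0.933, which shows why kappa is always somewhat lower than the accuracy for a balanced four-class problem.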


FIGURE 5. Box plot and confusion matrix for 10-fold cross-validation. (A) Box plot for repetitive experiments. (B) Confusion matrix for test one.

By grouping signals from an additional 30 subjects (50 subjects in total), the robustness of the method was validated, as shown in Figure 6.


FIGURE 6. GAA and receiver operating characteristic curve (ROC curve) for 20 and 50 subjects, separately. (A) GAA w.r.t. groupwise prediction. (B) ROC curve w.r.t. groupwise prediction.

Toward practical EEG-based BCI applications, it is essential to develop a robust model to counter serious individual variability (Tanaka, 2020). Figure 6A illustrates the GAA of our method through iterations. As shown in Figure 6B, 94.64% and 91.40% GAA were obtained for the groups of 20 and 50 subjects, respectively. The areas under the curve (AUCs) were 0.964 and 0.943, respectively. These results indicate that the presented approach can successfully filter the distinctions of signals even when the dataset is extended. In other words, the robustness and effectiveness of the method were evaluated under increased intersubject variability.

The groupwise evaluation was compared with related work, measured by the maximum GAA (Hou et al., 2020) during experiments (Ma et al., 2018; Hou et al., 2020). The performance of several state-of-the-art methods is compared in Table 1.


TABLE 1. Comparison on groupwise evaluation.

Table 1 lists the performance of related methods. Hou et al. achieved competitive results. However, our method obtained higher performance (an accuracy improvement of 0.14 percentage points) even with double the number of subjects. Our approach outperformed the compared methods, giving the highest accuracy in decoding EEG MI signals.

3.3 Subject-Specific Adaptation

The performance of individual adaptation has improved markedly in recent studies (Dose et al., 2018; Amin et al., 2019; Zhang R et al., 2019; Ji et al., 2019; Ortiz-Echeverri et al., 2019; Sadiq et al., 2019; Taran and Bajaj, 2019; Hou et al., 2020). The results of our method on subject-level adaptation are reported in Table 2, and they are compared with those of current studies in Table 3.


TABLE 2. Subject-level evaluation.


TABLE 3. Comparison of current studies on subject-level prediction.

Results are given in Table 2, in which the highest GAA was 98.81%, achieved by subjects S₇ and S₉, and the lowest was 90.48%, by S₄. On average, the presented approach can handle the challenge of subject-specific adaptation. It achieved competitive results, with an average accuracy of 95.48%. Moreover, Cohen’s kappa coefficient (kappa), precision, recall, and F1 score were 93.94%, 95.50%, 95.61%, and 95.35%, respectively. The promising results above indicated that the introduced method filtered raw EEG signals and succeeded in classifying MI tasks.

As can be seen from Figure 7A, the model has been shown to converge for the subject-specific adaptation. The receiver operating characteristic curve (ROC curve) with its corresponding AUC is visible in Figure 7B.


FIGURE 7. Loss and ROC curve for subject-level evaluation. (A) Loss w.r.t. subject-level validation. (B) ROC curve w.r.t. subject-level validation.

The subject-level prediction of the presented approach was compared with that of competitive models (Dose et al., 2018; Amin et al., 2019; Zhang R et al., 2019; Ji et al., 2019; Ortiz-Echeverri et al., 2019; Sadiq et al., 2019; Taran and Bajaj, 2019; Hou et al., 2020). The attention-based BiLSTM–GCN approach achieved highly accurate results, which suggested robustness and effectiveness in EEG signal processing, as shown in Table 3.

The presented approach has improved classification accuracy and obtained state-of-the-art results. The reason for this performance was that the attention-based BiLSTM model managed to extract relevant features from raw EEG signals. The subsequent GCN model then classified the features by exploiting the topological relationship of the overall features.

4 Conclusion

To address the challenge of intertrial and intersubject variability in EEG signals, an innovative attention-based BiLSTM–GCN approach was proposed to accurately classify four-class EEG MI tasks, i.e., imagining the left fist, the right fist, both fists, and both feet. First of all, the BiLSTM with the attention model succeeded in extracting relevant features from raw EEG signals. The subsequent GCN model improved the decoding performance by exploiting the internal topological relationship of relevant features, which was estimated from Pearson’s matrix of the overall features. Besides, the results provided evidence that the method converged for both subject-level and groupwise predictions and achieved state-of-the-art performance, i.e., 98.81% and 94.64% accuracy, respectively, in handling individual variability, which was ahead of existing studies. The 0.4-s sample size proved effective and efficient in prediction compared with the traditional 4-s trial length, which means that our proposed framework can provide a time-resolved solution toward fast response. Results on a group of 20 subjects were derived by 10-fold cross-validation, indicating repeatability and stability. The proposed method is expected to advance the clinical translation of EEG MI-based BCI technology to meet diverse demands, such as those of paralyzed patients. In summary, the highest accuracy and time-resolved prediction were achieved via the introduced feature mining approach.

In addition, the method proposed in this paper could potentially be applied in relevant practical directions, such as digital neuromorphic computing to assist patients with movement disorders (Yang et al., 2018; Yang et al., 2019; Yang S et al., 2020; Yang et al., 2021).

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.physionet.org/content/eegmmidb/1.0.0/.

Author Contributions

YH conceived and designed the research. SJ and XL collected the data and conducted the research. SZ, TC, FW, and JL interpreted the results. All authors contributed to the writing and revisions.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank the Brain Team at Google for developing TensorFlow. We further acknowledge PhysioNet for open-sourcing the EEG Motor Movement/Imagery Dataset to promote the research.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). “Tensorflow: A System for Large-Scale Machine Learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI-16), Savannah, GA, November 2–4, 2016, 265–283.


Amin, S. U., Alsulaiman, M., Muhammad, G., Bencherif, M. A., and Hossain, M. S. (2019). Multilevel Weighted Feature Fusion Using Convolutional Neural Networks for EEG Motor Imagery Classification. IEEE Access 7, 18940–18950. doi:10.1109/access.2019.2895688
