ORIGINAL RESEARCH article

Front. Aging Neurosci., 18 July 2022
Sec. Neurocognitive Aging and Behavior
This article is part of the Research Topic Functional and Structural Brain Network Construction, Representation and Application

A Deep Spatiotemporal Attention Network for Mild Cognitive Impairment Identification

Quan Feng1, Yongjie Huang2, Yun Long3, Le Gao2* and Xin Gao4*
  • 1State Key Laboratory of Public Big Data, GuiZhou University, Guizhou, China
  • 2Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, China
  • 3Nanjing Huayin Medical Laboratory Co., Ltd., Nanjing, China
  • 4Department of PET/MR, Universal Medical Imaging Diagnostic Center, Shanghai, China

Mild cognitive impairment (MCI) is a nervous system disease, and its clinical status can be used as an early warning of Alzheimer's disease (AD). Because the changes in brain structure that distinguish patients with MCI from normal controls (NCs) are subtle and slow, effective diagnostic methods are lacking, and the identification of MCI remains a challenging task. Functional brain network (FBN) analysis, a method that has emerged in recent years, provides sensitive and effective biomarkers for the diagnosis of neurological diseases. To address this challenge, we propose a novel Deep Spatiotemporal Attention Network (DSTAN) framework for MCI recognition based on functional brain networks. Specifically, we first extract spatiotemporal features from brain functional signals and FBNs by designing a spatiotemporal convolution strategy (ST-CONV). On this basis, we then introduce a learned attention mechanism to further capture the brain nodes strongly correlated with MCI. Finally, we fuse the spatiotemporal features for MCI recognition. The entire network is trained in an end-to-end fashion. Extensive experiments show that our proposed method significantly outperforms current baselines and state-of-the-art methods, with a classification accuracy of 84.21%.

1. Introduction

Alzheimer's disease (AD) is an irreversible degenerative brain disease and one of the most common forms of dementia (Raju et al., 2020). AD usually occurs in the later years of life. According to statistics released by the Global Health Organization, the global prevalence of AD reached 26.6 million in 2006, and this figure will double every 20 years (Brookmeyer et al., 2007). By 2046, 1.2% of the global population will be at risk of developing AD. Recent studies have shown that the prediction of mild cognitive impairment (MCI) is helpful for the early diagnosis of AD (Morris et al., 2001; Association, 2019). Because MCI is an intermediate clinical state between the normal population and patients with AD, patients with MCI have a high probability of developing AD (Kang et al., 2020). In addition, individuals with features of MCI almost always have neuropathological features of AD (Morris et al., 2001). Clinically, MCI is a nervous system disease whose main symptoms are mild memory impairment and mild executive function impairment, with additional visuospatial deficits (Gauthier et al., 2006). These symptoms are usually not life-threatening and can be detected by the patient or his or her family members. Current research suggests that MCI may not be a disease in its own right but an early manifestation of AD (Ithapu et al., 2015). Therefore, patients with MCI are often regarded as ideal clinical subjects for predicting AD. With a deeper understanding of MCI, people can anticipate their risk of AD earlier and take preventive and therapeutic measures, such as taking oral medications to improve cognition (Roberson and Mucke, 2006) and changing their daily routine (Zubatiy et al., 2021). However, because the differences between patients with MCI and the normal population are very subtle and develop slowly (Association, 2019), the prediction of MCI is a challenging task.
After years of research, a large number of machine learning-based diagnostic methods have been developed for MCI identification (Li et al., 2017, 2019), which can be briefly classified into the following two types:

1) Traditional machine learning methods, which mainly use traditional machine learning techniques to model MCI identification as a binary classification problem. For example, in Zhang et al. (2011), the authors captured and combined biomedical pattern features from different modalities with the help of multi-kernel support vector machines to predict patients with MCI. In Liu et al. (2013b), the authors adjusted the distribution of MCI-specific classes for MCI identification by a graph partitioning algorithm. In Liu et al. (2013a), the authors embedded high-dimensional neuroimaging data into a low-dimensional space and exploited local sparse code gradients to further enhance the classification of MCI. Because they rely strongly on prior knowledge, these methods have strict dataset requirements, which makes them difficult to generalize in practical applications.

2) Deep learning-based methods, which mainly use deep convolutional neural networks (CNNs) to extract hidden features from neuroimaging data for MCI identification. For example, in Amoroso et al. (2017), the authors designed a multiplexed neural network to model structural brain connectivity atrophy for the classification of MCI and normal controls (NC). In Yue et al. (2018), the authors designed a 2DCNN framework to capture the most useful features in the gray matter of sMRI for MCI identification. In Puranik et al. (2018), the authors designed a deep 2DCNN framework and utilized the transfer method for AD, MCI, and NC classification. However, since these methods seldom consider the temporal information contained in the data, their classification accuracy may be suboptimal.

Despite the success of these methods, the identification of MCI is still a difficult problem. Encouragingly, the functional brain network (FBN) has become an important method for modeling brain neural time courses, and it provides an effective imaging biomarker for the diagnosis of MCI (Bray et al., 2021). A large number of medical experiments have found that the functional connections between brain regions, voxels, and ROIs in the FBN are highly correlated with neurological diseases such as MCI (Greicius, 2008; McKhann et al., 2011). Therefore, learning the FBN based on time series correlation can provide more accurate and stable test results for MCI identification (Li et al., 2020b, 2021). In this article, the nodes of the FBN are defined as brain regions, and the edge between two regions is determined by the relationship between their blood-oxygen-level dependent (BOLD) time series recorded by fMRI. In recent years, with the rise of deep graph convolutional networks, state-of-the-art performance has been achieved in different fields, such as social networks (Dowlagar and Mamidi, 2021; Liu et al., 2022), computer vision (Han et al., 2021; Zou and Tang, 2021), and gene prediction (Yu et al., 2021; Peng et al., 2022). Deep graph convolutional networks have also achieved satisfactory success in disease prediction tasks (Tang et al., 2021; Yu et al., 2021). Motivated by these results, we propose a Deep Spatiotemporal Attention Network (DSTAN) for MCI identification based on the FBN. Specifically, as shown in Figure 1, we first design a spatiotemporal convolution strategy (ST-CONV) to extract time-series features and structural features from brain functional signals and brain nodes. Then, we introduce an attention mechanism to further capture the brain nodes that are more strongly correlated with MCI. Finally, we fuse the time series features and structural features (i.e., spatiotemporal features) for MCI identification. The whole network is trained in an end-to-end manner.
Extensive experiments demonstrate that our proposed method is highly competitive with the current baselines and state-of-the-art methods. We summarize our main contributions as follows:

• A deep learning framework for MCI identification is proposed, which provides a new way for MCI identification.

• A new fusion mechanism is designed, which extracts the spatiotemporal features of brain functional signals and FBN, and applies its fusion to MCI identification.

• We used DSTAN to distinguish MCI from NC and achieved a classification accuracy of 84.21%, which is superior to the baselines and state-of-the-art methods.

The rest of this article is organized as follows: Section 2 introduces the materials. Section 3 describes the DSTAN network. Section 4 reports the experimental results, and Section 5 discusses the findings, limitations, and future directions.

Figure 1. Deep Spatiotemporal Attention Network (DSTAN) structure illustration. ST-CONV represents the spatiotemporal convolution strategy, Node-ATT Module represents the attention module of brain functional nodes, Attention represents the brain node attention map, and FC represents the fully connected layer. n represents the number of brain nodes, and T and T′ represent the numbers of time points of functional brain signals. c′ is the number of channels.

2. Dataset

2.1. Data Acquisition

In this article, we use the same data set as Qiao et al. (2016). The data set consists of 45 patients with MCI and 46 NC subjects, and the data are static functional magnetic resonance imaging (fMRI) images. The data set can be obtained from the MCI database (https://www.nitrc.org/projects/modularbrain/). Table 1 summarizes the demographic information of the subjects.

Table 1. Demographic information of subjects.

2.2. Data Pre-Processing

In this section, we use fMRI images acquired with a standard echo planar imaging sequence on a 3T scanner (TRIO, Siemens). The imaging parameters are as follows: voxel size 2.97 × 2.97 × 3 mm³, 45 slices, acquisition matrix 74 × 74, and TR/TE = 3,000/30 ms, with 180 volumes. We use Statistical Parametric Mapping (SPM) and DPARSFA (version 2.2) for image pre-processing (Yan and Zang, 2010). During pre-processing, we uniformly discard the first 10 fMRI volumes of each subject to avoid signal jitter. We then process the remaining fMRI images in the following steps. In step 1, we correct for slice acquisition timing and head motion. In step 2, we remove the low- and high-frequency artifacts from the corrected images and regress out nuisance signals based on Friston et al. (1996). In step 3, we discard the time points with frame-wise displacement >0.5 to reduce the influence of micro head movements on functional connectivity. On this basis, we divide the preprocessed BOLD time series signals into 90 ROIs according to the automated anatomical labeling (AAL) atlas. In step 4, we store the resulting time series of length 80 in a matrix $X \in \mathbb{R}^{80 \times 90}$.

3. Method

In this section, we design the DSTAN network, which captures spatiotemporal features by fusing the temporal and spatial features of functional brain signals and uses an attention mechanism to capture the brain nodes related to MCI. Specifically, Section 3.1 formalizes the problem definition. Section 3.2 describes how to extract temporal and structural features from functional brain signals and functional brain networks. In Section 3.3, the attention mechanism is used to capture MCI-related brain nodes. In Section 3.4, the spatiotemporal features are fused. The objective function of DSTAN is defined in Section 3.5.

3.1. Problem Definition

Suppose the data set is $D=\{f(x,t)_h, y_h\}_{h=1}^{N}$, where $N$ denotes the number of samples and $f(x,t)_h$ denotes the feature vector of the $h$-th sample, with $t_h \in \{t_1, t_2, \ldots, t_T\}$ and $x_h \in \{x_1, x_2, \ldots, x_n\}$; $y_h \in \{0, 1\}$ is the corresponding label, where 0 and 1 represent “normal” and “MCI,” respectively. We assume that the FBN has $n$ brain nodes corresponding to brain regions, $G = \{V, E\}$, where $V$ denotes the brain regions and $E$ denotes the edges, i.e., the functional connectivity between two brain regions. The DSTAN network has $L$ convolution layers. The numbers of input and output channels in the $l$-th convolution layer are $c_1$ and $c_2$, respectively. For the $l$-th convolution layer, $f^{l}=\{f_i^{l}(x,t)\}_{i=1}^{c_1} \in \mathbb{R}^{n \times T \times c_1}$ is the input of the convolution, and $\hat{f}^{l+1}=\{\hat{f}_j^{l+1}(x,t)\}_{j=1}^{c_2} \in \mathbb{R}^{n \times \frac{T-w+1}{s} \times c_2}$ is the input of the brain node attention module, where $T$ denotes the number of time points of the functional brain signals, $w$ denotes the size of the convolution kernel, and $s$ denotes the size of the average pooling. The purpose of the DSTAN network design is to capture spatiotemporal features by fusing the spatial and temporal features of functional brain signals and to use the attention mechanism to focus on the brain nodes related to MCI.

3.2. Spatiotemporal Convolution Strategy

The transmission of functional brain signals is based on the underlying functional connections between brain regions (Huang et al., 2018), and these signals contain rich temporal information. Therefore, we extract node features (i.e., spatial features) and temporal features of functional signals from the functional networks and the time series, respectively. To this end, we design an ST-CONV strategy, as shown in Figure 2. We first perform a convolution operation on the time series of the input functional signals $f_i^{l}(x,t)$ to extract their temporal features as follows:

$$f_j^{l+1}\left(x, \tfrac{T-w+1}{s}\right) = \mathrm{tpool}\left(\sigma\left(\sum_{i=1}^{c_1} k_j(t,i)\, f_i^{l}(x,t)\right)\right) \tag{1}$$

where $f_j^{l+1}$ denotes the output, $\mathrm{tpool}(\cdot)$ denotes temporal average pooling with a window size of $(1, s)$, $\sigma(\cdot)$ denotes the activation function, and $k_j(t, i)$ denotes the convolution kernel with the size of $(1, w)$ in the $i$-th channel. Then, we capture the spatial features of functional signals in the functional network through the graph convolution operation:

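$$\hat{f}_j^{l+1}(x,t) = \sigma\left(\sum_{i=1}^{c_1}\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} f_j^{l+1}(x,t)\, W\right)\right) \tag{2}$$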

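where $\hat{f}_j^{l+1}$, $j = 1, 2, \ldots, c_2$, denotes the output of the graph convolution, $\hat{A} = I + V$, $\hat{D}$ denotes the diagonal matrix with $\hat{D}_{nn} = \sum_{m} \hat{A}_{nm}$ and all other elements equal to 0, $I$ denotes the identity matrix, $W$ is the parameter corresponding to $f_j^{l+1}(x,t)$, and $f^{l+1}=\{f_j^{l+1}(x,t)\}_{j=1}^{c_2} \in \mathbb{R}^{n \times \frac{T-w+1}{s} \times c_2}$ denotes the input.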
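To make the tensor shapes in Equations (1) and (2) concrete, the following is a minimal NumPy sketch of one ST-CONV step. It is an illustration under stated assumptions, not the authors' implementation: the activation σ is taken to be ReLU, the sparsified connectivity matrix is treated as a non-negative adjacency matrix `A` to which self-loops are added, and the function names, the random inputs, and the values of `w` and `s` are hypothetical.

```python
import numpy as np

def temporal_conv_pool(f, k, s):
    """Eq. (1) sketch. f: (n, T, c1) signals; k: (c2, c1, w) kernels; s: pooling size.
    Returns an array of shape (n, (T - w + 1) // s, c2)."""
    n, T, c1 = f.shape
    c2, _, w = k.shape
    T_conv = T - w + 1
    out = np.zeros((n, T_conv, c2))
    for t in range(T_conv):
        window = f[:, t:t + w, :]                      # (n, w, c1)
        # Sum over kernel taps and input channels (cross-correlation form).
        out[:, t, :] = np.einsum("nwc,jcw->nj", window, k)
    out = np.maximum(out, 0.0)                         # activation, assumed ReLU
    T_pool = T_conv // s
    # Temporal average pooling with a non-overlapping window of size s.
    return out[:, :T_pool * s, :].reshape(n, T_pool, s, c2).mean(axis=2)

def graph_conv(f, A, W):
    """Eq. (2) sketch. f: (n, T', c2); A: (n, n) non-negative adjacency; W: (c2, c2)."""
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d = A_hat.sum(axis=1)                              # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt           # symmetric normalization
    out = np.einsum("nm,mtc,cd->ntd", A_norm, f, W)    # mix nodes, then channels
    return np.maximum(out, 0.0)                        # activation, assumed ReLU

rng = np.random.default_rng(0)
n, T, c1, c2, w, s = 90, 80, 1, 8, 3, 2               # w and s are illustrative values
f = rng.standard_normal((n, T, c1))
k = rng.standard_normal((c2, c1, w))
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)
W = rng.standard_normal((c2, c2))

f_temporal = temporal_conv_pool(f, k, s)               # temporal features
f_spatial = graph_conv(f_temporal, A, W)               # spatial features
print(f_temporal.shape, f_spatial.shape)
```

With these illustrative values, the temporal axis shrinks from 80 to (80 − 3 + 1) // 2 = 39 points, while the number of brain nodes stays at 90.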

Figure 2. Spatiotemporal convolution strategy structure illustration. Temporal-Conv represents temporal convolution operation, Temporal-AvgPool represents temporal average pooling, Grap-Conv represents graph convolution operation, Node-Att Module represents the brain node attention module, and Attention represents the brain node attention map.

3.3. Brain Node Attention Module

There are a large number of brain nodes in the functional brain network, and the brain regions corresponding to different brain nodes reflect different diseases (Ries et al., 2008). In order to capture the features of brain nodes related to MCI, we introduce an attention mechanism, as shown in Figure 3. We first integrate the channel and time information of each brain node into a scalar and identify the brain nodes related to MCI as follows:

$$M_j = \sigma\left(h_t * \hat{f}_j^{l+1}(x,t)\right) \tag{3}$$

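where $M=\{M_j\}_{j=1}^{c_2} \in \mathbb{R}^{n \times 1 \times c_2}$, $M_j$ denotes the output of the $j$-th channel after the convolution operation, $h_t$ denotes the convolution kernel set with the size of $\left(1, \frac{T-w+1}{s}\right)$, and $\hat{f}^{l+1}=\{\hat{f}_j^{l+1}(x,t)\}_{j=1}^{c_2} \in \mathbb{R}^{n \times \frac{T-w+1}{s} \times c_2}$ denotes the input of the brain node attention module.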

Figure 3. Illustration of attention mechanism. Temporal-Conv represents temporal convolution operation, Channel-AvgPool represents channel average pooling operation, and r represents the down-sampling rate.

Then, we maintain the feature invariance in the functional signal through the average down-sampling based on the channel dimension, and suppress the noise generated when collecting the functional signal to make it better for training, which can be expressed as:

$$C_{avg}M = \frac{1}{c_2}\sum_{j=1}^{c_2} M_j \in \mathbb{R}^{n \times 1 \times 1} \tag{4}$$

where $C_{avg}M$ denotes the output.

In order to further capture the MCI-related brain nodes, we project the brain node features of $C_{avg}M$ into the MCI-related feature space. We design $K$ base layers in the attention module and perform the following operations: 1) Perform down-sampling on the $k$-th base layer:

$$F_k = \mathrm{relu}\left(\mathrm{fc}\left(C_{avg}M, \tfrac{n}{r}\right)\right) \tag{5}$$

where $F_k$ denotes the output of the $k$-th base layer, $\frac{n}{r}$ denotes the number of brain nodes after down-sampling ($r$ being the down-sampling rate), $\mathrm{relu}(\cdot)$ denotes the activation function, and $\mathrm{fc}(\cdot)$ denotes the fully connected operation. 2) Perform up-sampling on the $(k+1)$-th base layer:

$$F_{k+1} = \mathrm{relu}\left(\mathrm{fc}\left(F_k, n\right)\right) \tag{6}$$

where $F_{k+1}$ denotes the output of the $(k+1)$-th base layer, and $n$ denotes the number of brain nodes after up-sampling. In this way, we further obtain the attention map of brain nodes as follows:

$$Z(x) = \mathrm{Sigmoid}(F_{k+1}) \tag{7}$$

where $Z(x) \in \mathbb{R}^{n \times 1 \times 1}$ denotes the brain node attention map. In the detailed process, the $k$-th base layer performs down-sampling from $n$ brain nodes to $\frac{n}{r}$ brain nodes; the $(k+1)$-th base layer performs up-sampling from $\frac{n}{r}$ brain nodes to $n$ brain nodes. We use this nonlinear transformation to capture the dependency between brain nodes and MCI.

Finally, in order to further focus on the brain nodes with strong correlation, we multiply the brain node attention map with the functional signal:

$$\tilde{f}^{l+1} = Z(x) \times \hat{f}^{l+1} \tag{8}$$

where $\tilde{f}^{l+1}$ denotes the output.
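The chain from Equation (3) to Equation (8) can be sketched in NumPy as follows. This is a simplified illustration rather than the published code: biases are omitted, the activation in Equation (3) is assumed to be a sigmoid, the temporal kernel `h_t` is assumed to hold one weight vector per channel, and because 90 nodes are not divisible by a down-sampling rate of 16, the bottleneck width is taken as the integer quotient `n // r`. All variable names and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_attention(f_hat, h_t, W_down, W_up):
    """f_hat: (n, T', c2) features; h_t: (T', c2) temporal kernel per channel;
    W_down: (n // r, n) and W_up: (n, n // r) fully connected weights."""
    # Eq. (3): collapse the time axis of each channel with a full-length kernel.
    M = sigmoid(np.einsum("ntc,tc->nc", f_hat, h_t))   # (n, c2)
    # Eq. (4): average over channels, one scalar per brain node.
    c_avg = M.mean(axis=1)                             # (n,)
    # Eq. (5): down-sample from n nodes to n // r hidden units.
    F_k = np.maximum(W_down @ c_avg, 0.0)
    # Eq. (6): up-sample back to n nodes.
    F_k1 = np.maximum(W_up @ F_k, 0.0)
    # Eq. (7): attention map with one weight per brain node.
    Z = sigmoid(F_k1)                                  # (n,)
    # Eq. (8): rescale every node's features by its attention weight.
    return Z, Z[:, None, None] * f_hat

rng = np.random.default_rng(0)
n, T_p, c2, r = 90, 39, 8, 16
hidden = n // r                                        # assumed rounding of n / r
f_hat = rng.standard_normal((n, T_p, c2))
h_t = 0.1 * rng.standard_normal((T_p, c2))
W_down = 0.1 * rng.standard_normal((hidden, n))
W_up = 0.1 * rng.standard_normal((n, hidden))

Z, f_tilde = node_attention(f_hat, h_t, W_down, W_up)
print(Z.shape, f_tilde.shape)
```

The attention map keeps one value per brain node, so it can be broadcast over the time and channel axes when it is multiplied with the features.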

As discussed above, different brain regions have different effects on MCI. Therefore, we separate brain nodes with different correlations by maximizing the variance of the brain node attention map. At the same time, the high value of highly correlated brain regions in the brain node attention map will lead to excessive attention loss. In this regard, we control attention loss by minimizing their mean values as follows:

$$L_{att} = \sum_{x=1}^{n} \mathrm{mean}(Z(x)) - \mathrm{var}(Z(x)) \tag{9}$$

where $L_{att}$ denotes the attention loss, $\mathrm{mean}(\cdot)$ denotes the mean operation, and $\mathrm{var}(\cdot)$ denotes the variance operation.
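As a small worked example of the attention loss, the sketch below assumes one possible reading of Equation (9): the mean and the variance are both taken over the n entries of the attention map of a single sample. The function name and the two toy attention maps are hypothetical.

```python
import numpy as np

def attention_loss(Z):
    """Mean of the attention map minus its variance (assumed reading of Eq. 9)."""
    Z = np.asarray(Z, dtype=float)
    return float(Z.mean() - Z.var())

# Two toy attention maps with the same mean but different spread.
flat_map = [0.5, 0.5, 0.5, 0.5]     # every node gets equal attention
sharp_map = [0.0, 0.0, 1.0, 1.0]    # nodes are clearly separated

loss_flat = attention_loss(flat_map)     # 0.5 - 0.0  = 0.5
loss_sharp = attention_loss(sharp_map)   # 0.5 - 0.25 = 0.25
print(loss_flat, loss_sharp)
```

Under this reading, a map that separates nodes more sharply has a lower loss at the same mean, which matches the stated goal of maximizing the variance while keeping the mean small.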

3.4. Spatiotemporal Feature Fusion

In order to further explore the impact of temporal and spatial features of functional signals on MCI identification, we fuse spatial and temporal features. Specifically, we realize the spatiotemporal feature fusion by summing temporal features fjl+1 and spatial features f^jl+1:

$$h_j^{l+1} = f_j^{l+1} + \hat{f}_j^{l+1} \tag{10}$$

where $h_j^{l+1} \in \mathbb{R}^{n \times \frac{T-w+1}{s} \times c_2}$ denotes the output of spatiotemporal feature fusion.

Finally, in the L-th convolutional layer, we further extract spatiotemporal features by convolution operations and compress them into a scalar as the input of the fully connected layer as follows:

$$S = u_j^{L} * h_j^{L} \tag{11}$$

where $S \in \mathbb{R}^{1 \times 1 \times c_2}$ denotes the output of the convolution operation, and $u_j^{L}$ denotes the convolution kernel set corresponding to $h_j^{L}$.
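Equations (10) and (11) can be illustrated with a short NumPy sketch. It assumes that the kernel in Equation (11) spans the full node and time extent of each channel, so that every channel collapses to one scalar; this is an interpretation of the stated output size, and the array names and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T_p, c2 = 90, 39, 8                       # illustrative sizes

f_temporal = rng.standard_normal((n, T_p, c2))   # temporal features
f_spatial = rng.standard_normal((n, T_p, c2))    # spatial features

# Eq. (10): element-wise summation fuses the two feature sets.
h = f_temporal + f_spatial                   # (n, T', c2)

# Eq. (11): one full-size kernel per channel reduces each channel to a scalar.
u = rng.standard_normal((n, T_p, c2))        # assumed kernel shape
S = np.einsum("ntc,ntc->c", u, h)            # (c2,), i.e. 1 x 1 x c2

print(h.shape, S.shape)
```

The resulting vector of length c2 is what would be passed to the fully connected classification layer.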

3.5. Objective Function

In DSTAN, the features of functional signals are mapped to the corresponding label space through fully connected layers. In the training process, the objective function of the DSTAN network is defined as:

$$L_{total} = \sum_{h=1}^{N}\left[L_{ce}\left(f(x,t)_h, y_h\right) + L_{att}^{h}(x)\right] \tag{12}$$

where $L_{ce}(\cdot)$ denotes the cross-entropy loss, and $L_{att}^{h}$ denotes the attention loss of the $h$-th sample.
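The objective in Equation (12) can be sketched as follows, assuming that the network outputs two logits per sample, that both loss terms are summed per sample, and that the attention loss is the mean minus the variance of each sample's attention map. The function names and toy inputs are hypothetical.

```python
import math
import numpy as np

def cross_entropy(logits, labels):
    """Per-sample cross-entropy. logits: (N, 2); labels: (N,) with values 0 or 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def total_loss(logits, labels, Z_batch):
    """Sum over samples of cross-entropy plus attention loss. Z_batch: (N, n)."""
    ce = cross_entropy(logits, labels)
    att = Z_batch.mean(axis=1) - Z_batch.var(axis=1)
    return float((ce + att).sum())

# Toy batch: uninformative logits and a constant attention map.
logits = np.zeros((2, 2))
labels = np.array([0, 1])
Z_batch = np.full((2, 90), 0.5)

loss = total_loss(logits, labels, Z_batch)
expected = 2 * (math.log(2) + 0.5)   # each sample: ln 2 (CE) + 0.5 (attention)
print(loss, expected)
```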

4. Experiments

In the MCI identification experiments, we utilize fMRI data to train a deep neural network framework for MCI identification. Since the framework needs functional connections between brain nodes to extract the spatial features of functional brain signals, we use the Pearson correlation coefficient to construct functional brain networks and obtain the functional connectivity matrix of the brain nodes.

4.1. Experimental Setting

In this section, our experimental setup is divided into the following steps:

Step 1: In order to obtain the connection matrix of brain nodes, we first use the Pearson correlation coefficient to measure the correlation between brain nodes, so as to obtain a functional connectivity matrix P. Then, we sparsify the connectivity matrix, where λ = 0.1, 0.2, …, 1 denotes the sparsity. Finally, the spatial features of the functional signals are extracted by the graph convolution operation with the aid of the connectivity matrix.

Step 2: We use the following settings for each module in DSTAN: 1) In the ST-CONV module, we use convolution kernels to extract time series features and, at the same time, perform an average pooling operation on the time series. The number of output channels is set to (8, 16, 32). 2) In the Node-ATT module, we perform down-sampling on the base layers and set the sampling rate to 1/16.
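Step 1 above can be sketched with NumPy as follows. The text does not spell out how the sparsity λ is applied, so this sketch assumes that λ is the fraction of the weakest connections (by absolute correlation) that are set to zero, which is consistent with the later observation that λ = 0.9 keeps only highly correlated connections. The function names and the random time series are hypothetical.

```python
import numpy as np

def pearson_connectivity(X):
    """X: (T, n) BOLD time series, one column per ROI. Returns an (n, n) matrix P."""
    return np.corrcoef(X, rowvar=False)

def sparsify(P, lam):
    """Zero out the weakest fraction lam of connections, ranked by |r| (assumed rule)."""
    A = P.copy()
    np.fill_diagonal(A, 0.0)                       # ignore self-connections
    upper = np.triu_indices(A.shape[0], k=1)
    threshold = np.quantile(np.abs(A[upper]), lam)
    A[np.abs(A) < threshold] = 0.0
    return A

def count_edges(A):
    """Number of non-zero connections in the upper triangle."""
    upper = np.triu_indices(A.shape[0], k=1)
    return int(np.count_nonzero(A[upper]))

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 90))                  # 80 time points, 90 ROIs
P = pearson_connectivity(X)

edges_low = count_edges(sparsify(P, 0.1))          # dense graph, many weak edges
edges_high = count_edges(sparsify(P, 0.9))         # sparse graph, strongest edges only
print(P.shape, edges_low, edges_high)
```

Other sparsification rules, such as thresholding the correlation value directly, would fit the description equally well; the quantile rule is used here only because it makes λ directly interpretable as a fraction of removed edges.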

4.2. Implementation

All experiments are implemented in the PyTorch 1.9 framework with Python 3.8 and trained on one GeForce RTX 3090 GPU. We use SGD as the optimizer for training, with a momentum of 0.1, a weight decay of 1e-4, 90 iterations, an initial learning rate of 0.1 that decays by 50% every 30 iterations, and a batch size of 32. Note that we randomly divided the preprocessed fMRI data obtained in Section 2.2 into a training set and a test set in a ratio of 8:2 for the following experiments.
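The learning rate schedule described above is a step decay, which can be written out in plain Python as a worked example. The function name is hypothetical; in PyTorch, the same behavior would typically be obtained with `torch.optim.lr_scheduler.StepLR` using `step_size=30` and `gamma=0.5`, although the text does not state which scheduler was used.

```python
def step_decay_lr(iteration, base_lr=0.1, drop=0.5, every=30):
    """Learning rate after halving the base rate once every `every` iterations."""
    return base_lr * drop ** (iteration // every)

# Learning rate at the start and end of each 30-iteration stage.
checkpoints = (0, 29, 30, 59, 60, 89)
schedule = {i: step_decay_lr(i) for i in checkpoints}
for i, lr in schedule.items():
    print(i, lr)
```

Over the 90 iterations, the rate therefore takes three values: 0.1 for iterations 0 to 29, 0.05 for iterations 30 to 59, and 0.025 for iterations 60 to 89.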

4.3. Evaluation Standard

We use the following indicators for quantitative measurements, which include accuracy, sensitivity, and specificity. All methods are tested with these metrics, which are as follows:

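$$\mathrm{Accuracy} = \frac{\mathrm{TruePositive} + \mathrm{TrueNegative}}{\mathrm{TruePositive} + \mathrm{FalsePositive} + \mathrm{TrueNegative} + \mathrm{FalseNegative}} \tag{13}$$
$$\mathrm{Sensitivity} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} \tag{14}$$
$$\mathrm{Specificity} = \frac{\mathrm{TrueNegative}}{\mathrm{TrueNegative} + \mathrm{FalsePositive}} \tag{15}$$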

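where TruePositive represents the number of patients with MCI who are correctly classified as MCI, TrueNegative represents the number of NC subjects who are correctly classified as NC, FalsePositive represents the number of NC subjects who are misclassified as MCI, and FalseNegative represents the number of patients with MCI who are misclassified as NC.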
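Equations (13) to (15) translate directly into code. The sketch below uses hypothetical confusion-matrix counts chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)      # fraction of MCI patients detected
    specificity = tn / (tn + fp)      # fraction of NC subjects correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical counts: 50 MCI patients (40 detected) and 50 NC subjects (45 cleared).
accuracy, sensitivity, specificity = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, sensitivity, specificity)   # 0.85 0.8 0.9
```

In this example the accuracy (0.85) hides the fact that one in five patients is missed, which is why sensitivity and specificity are reported alongside it.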

4.4. Experimental Results and Analysis

4.4.1. Visualization of Brain Node Functional Connectivity Matrix

In this section, we report the influence of the sparsity λ and the functional connectivity between brain nodes on MCI identification. We sparsify the functional connectivity matrix P to different degrees. From Figure 4, we can observe that: 1) In the first row of images, when λ = 0.1, the connectivity matrix P retains more brain node connections with weak correlations, which makes it difficult to extract effective spatial features from functional signals and thus negatively affects MCI identification. 2) In the middle row of images, when λ = 0.5, the connectivity matrix P removes the connections of weakly correlated brain nodes and retains the connections of moderately correlated brain nodes, which reduces the factors that hinder MCI identification. 3) In the last row of images, when λ = 0.9, the connectivity matrix P retains only the highly correlated brain node connections, so that the graph convolution operation can extract more effective spatial features, which further improves the accuracy of the model in identifying MCI. These results show that the choice of the sparsity λ strongly determines which functional connections between brain nodes are retained, and that a higher sparsity benefits spatial feature extraction and MCI identification.

Figure 4. Visual illustration of brain node connections.

4.4.2. Classification Performance of Different Sparsity

In order to further explore the influence of the sparsity λ on MCI identification, we conducted experiments with different sparsity values. Figure 5 shows a histogram of the classification accuracy for sparsity λ in the range of 0.1–0.9. From this figure, we make the following observations: 1) When λ = 0.9, the classification accuracy of DSTAN is significantly better than at the other sparsity values. 2) The functional connections of brain nodes affect the classification performance of MCI, which leads to large differences in the classification results of connection matrices with different sparsity. 3) As the sparsity increases, the interference of weakly correlated brain nodes gradually decreases, and the classification accuracy improves. Therefore, removing weak functional connections between brain nodes in DSTAN can improve MCI identification performance. These results confirm that a higher sparsity helps graph convolution capture more effective spatial features and further improves MCI classification accuracy.

Figure 5. Classification accuracy of different sparsity λ.

4.4.3. MCI Identification

We performed MCI vs. NC experiments on the MCI dataset. We compare the following methods: traditional machine learning methods, namely Support Vector Machine (Song et al., 2017) and Random Forest (Fredo et al., 2018), and deep learning methods, including Multi-Layer Perceptron (Shanmuganathan, 2016; Almuqhim and Saeed, 2021; Gao et al., 2021; Yin et al., 2021). Table 2 reports the test performance of all methods on the MCI dataset. The following observations are made: 1) In MCI identification, deep learning methods are significantly better than traditional machine learning methods. 2) The DSTAN method significantly outperforms the other methods in accuracy, sensitivity, and specificity. In addition, this method is effective for MCI identification based on FBN. In conclusion, DSTAN can identify patients with MCI well, and the probability of misdiagnosing NC subjects is low.

Table 2. Performance of all methods on MCI identification.

4.4.4. Visualization of Brain Node Attention Map

Figure 6 shows the brain node visualization obtained from the attention map in Section 3.3. Specifically, the abscissa values correspond to the brain regions of different brain nodes, and the ordinate values represent the correlation intensity. The higher the value of the ordinate, the stronger the correlation between the corresponding brain region and MCI. The colors of the values in all brain regions are randomly generated. From this figure, we find the following: 1) The values of most brain regions are 0, e.g., the inferior frontal gyrus, triangular part (IFGtriang), the gyrus rectus (REC), and the middle occipital gyrus (MOG). This result indicates that these brain regions are not related to MCI identification, which is consistent with the conclusion of Wee et al. (2012) on the relationship between brain regions and MCI. 2) The figure shows a total of 34 brain regions that have a strong correlation with MCI and thus affect MCI identification. 3) The figure shows that brain regions such as the middle temporal gyrus (MTG), the superior frontal gyrus, medial orbital (ORBsupmed), the inferior parietal (IPL), the supramarginal gyrus (SMG), and the precuneus (PCUN) have a strong correlation with MCI identification, which is consistent with previous MCI imaging biomarker reports and pathological studies (Greicius, 2008; Albert et al., 2011).

Figure 6. Visualization of brain node attention map.

4.5. Ablation Studies

To verify the effectiveness of each component of the proposed DSTAN, we perform ablation studies. Table 3 reports the performance comparison between DSTAN and a variant with the attention mechanism removed (No-Att for short). From this table, we observe that: 1) The gap in accuracy between DSTAN and No-Att is small, but the sensitivity and specificity of No-Att are much lower than those of DSTAN. This finding suggests that, without the attention mechanism, the model is more likely to misdiagnose both patients with MCI and NC subjects. 2) DSTAN is superior to the No-Att method in all evaluation indicators. These results demonstrate that the attention mechanism in the DSTAN framework eliminates the interference of redundant brain nodes on MCI identification and thereby improves the performance of MCI classification.

Table 3. Comparing the classification performance of the DSTAN and the No-Att methods.

5. Discussion

In this study, a reliable functional brain network (FBN) is constructed from functional magnetic resonance imaging (fMRI) data to assist in the identification of mild cognitive impairment. Different from previous studies, we propose a novel DSTAN framework, which fuses functional brain signals and spatiotemporal features of FBN for MCI identification. Specifically, we first capture spatiotemporal features through the ST-CONV strategy and graph convolution. Then, we capture the brain node features associated with MCI through an attention mechanism. Finally, we fuse these features for DSTAN network training. Our main experimental findings are as follows: 1) A sparser functional connectivity matrix helps graph convolution obtain more effective spatial features from functional brain signals. 2) The attention mechanism effectively improves MCI identification performance and captures 34 brain regions with strong correlations with MCI. 3) We obtain an encouraging classification accuracy of 84.21% on MCI identification.

5.1. Spatiotemporal Feature Fusion in MCI Identification

Functional magnetic resonance imaging (fMRI) is a widely used neuroimaging modality. This modality performs imaging by measuring the blood-oxygen-level dependent (BOLD) signal of each brain region (Khosla et al., 2019). fMRI data are rich in temporal and spatial features (Ma et al., 2016). Previous studies have examined the spatial features of fMRI, e.g., using matrix decomposition (Du and Zhang, 2021), sparse Pearson correlation (Smith et al., 2013), and sparse representation (Lee et al., 2011) to construct the FBN and extract its structural features. There are also studies on the temporal features of fMRI, e.g., using RNN (Dvornek et al., 2018) and LSTM (Yan et al., 2018) to extract temporal features from the time series of fMRI data. Some recent studies have investigated the spatiotemporal features of fMRI data. For example, in Gadgil et al. (2020), the authors divided the fMRI data into multiple short sequences according to the length of the time series, then quantified the connectivity between brain regions in the short sequences, and used graph convolution to extract the spatial features of the short sequences. In Li et al. (2020a), the authors used a convolution operation to extract spatial features from fMRI data and took the resulting features as the input of an LSTM network to capture the temporal information contained in the data. The above methods utilize spatiotemporal features in fMRI data but do not deeply consider the relationship between temporal and spatial features. In DSTAN, considering that both temporal and spatial features of fMRI data have positive effects on MCI identification, we further fuse temporal and spatial features. Specifically, we use a convolution operation to extract temporal features and a graph convolution operation to extract spatial features. Then, we achieve spatiotemporal feature fusion by element-wise summation.
Extensive experimental results are compared with current state-of-the-art methods to verify the effectiveness of spatiotemporal feature fusion. We speculate as follows: 1) Each brain region corresponds to a set of time series and contains temporal information. 2) The corresponding temporal features of the brain regions related to MCI have a positive role in promoting MCI identification, and their corresponding spatial features have a key role in MCI identification. Therefore, the accumulation of these two positive-acting features can improve the performance of MCI identification.

5.2. Brain Node Attention Mechanism in MCI Identification

In the brain node attention module, we set up multiple base layers to capture the brain regions related to MCI. The experiments in this article found that 34 of the brain regions are closely related to MCI, including the middle temporal gyrus, which controls semantic cognition (Davey et al., 2016); the superior frontal gyrus, medial orbital, which is implicated in schizophrenia and delusions (Gao et al., 2015); the inferior parietal region, which affects sensory memory function (Chen et al., 2021); the supramarginal gyrus, which affects auditory memory function (DES, 2014); and the precuneus, which affects cognitive function (Nagano-Saito et al., 2021). These brain nodes are correlated with MCI, which is consistent with the experimental results of previous studies (Greicius, 2008; Albert et al., 2011).

5.3. Limitations and Future Directions

We build an MCI identification framework based on spatiotemporal feature fusion and an attention mechanism and achieve excellent experimental results. However, several limitations still need to be considered. First, training and validating the model depends on a large number of data samples and on data from different sources. In future studies, we need to further validate the robustness of the proposed method with large samples and heterogeneous multi-source data. Second, this study involves little research on the interpretability of MCI identification; an interpretable analysis combined with clinical knowledge is needed. Third, MCI is an early stage of AD, and MCI should be analyzed together with other related nervous system diseases. Many studies have shown that the FBN can exhibit further structures or attributes, such as classification, hierarchy, centrality, synchronization, and scale-free topology. Therefore, we will further explore the relationship between brain regions and MCI, and use correlation knowledge sharing in multi-task learning for MCI identification and interpretability research, providing a new method for the prevention and treatment of MCI.

6. Conclusion

In the present study, we propose DSTAN, a network that uses spatiotemporal feature fusion and an attention mechanism for MCI identification and obtains excellent classification performance (accuracy = 84.21%). In addition, spatiotemporal feature fusion increases the diversity of effective training samples by accumulating temporal and spatial features, and the brain node attention mechanism strengthens the model's attention to brain regions related to MCI. Our findings demonstrate that the combined use of spatiotemporal feature fusion and an attention mechanism can better distinguish MCI from NC. Combining the FBN with graph convolution for better MCI identification is helpful for the early clinical diagnosis of AD.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.nitrc.org/projects/modularbrain/.

Ethics Statement

The studies involving human participants were reviewed and approved by the Department of PET/MR, Universal Medical Imaging Diagnostic Center, Shanghai, China. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

QF and YH designed the experiments and wrote the manuscript. YL drew the figures and wrote the manuscript. LG was involved in funding acquisition and wrote the manuscript. XG provided and processed data. All authors contributed to the article and approved the submitted version.

Funding

This project is supported by the Wuyi University-Hong Kong-Macau Joint Fund: 2019WGALH23, Teaching Reform Project of Guangdong Province: GDJX2020009, Scientific Research Subjects of Shanghai Universal Medical Imaging Technology Limited Company: UV2020Z02 and UV2021Z01, and Shanghai Municipal Commission of Health and Family Planning Science and Research Subjects: 202140464.

Conflict of Interest

YL was employed by Nanjing Huayin Medical Laboratory Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., et al. (2011). The diagnosis of mild cognitive impairment due to alzheimer's disease: recommendations from the national institute on aging-Alzheimer's association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 7, 270–279. doi: 10.1016/j.jalz.2011.03.008

Almuqhim, F., and Saeed, F. (2021). Asd-saenet: a sparse autoencoder, and deep-neural network model for detecting autism spectrum disorder (asd) using fmri data. Front. Comput. Neurosci. 15, 654315. doi: 10.3389/fncom.2021.654315

Amoroso, N., La Rocca, M., Bruno, S., Maggipinto, T., Monaco, A., Bellotti, R., et al. (2017). Brain structural connectivity atrophy in alzheimer's disease. arXiv preprint arXiv:1709.02369. doi: 10.48550/arXiv.1709.02369

Association, A. (2019). 2019 Alzheimer's disease facts and figures. Alzheimers Dement. 15, 321–387. doi: 10.1016/j.jalz.2019.01.010

Bray, N. W., Pieruccini-Faria, F., Bartha, R., Doherty, T. J., Nagamatsu, L. S., and Montero-Odasso, M. (2021). The effect of physical exercise on functional brain network connectivity in older adults with and without cognitive impairment. a systematic review. Mech. Ageing Dev. 196, 111493. doi: 10.1016/j.mad.2021.111493

Brookmeyer, R., Johnson, E., Ziegler-Graham, K., and Arrighi, H. M. (2007). Forecasting the global burden of Alzheimer's disease. Alzheimers Dement. 3, 186–191. doi: 10.1016/j.jalz.2007.04.381

Chen, P.-Y., Hsu, H.-Y., Chao, Y.-P., Nouchi, R., Wang, P.-N., and Cheng, C.-H. (2021). Altered mismatch response of inferior parietal lobule in amnestic mild cognitive impairment: a magnetoencephalographic study. CNS Neurosci. Therapeut. 27, 1136–1145. doi: 10.1111/cns.13691

Davey, J., Thompson, H. E., Hallam, G., Karapanagiotidis, T., Murphy, C., Caso, I. D., et al. (2016). Exploring the role of the posterior middle temporal gyrus in semantic cognition: integration of anterior temporal lobe with executive processes. Neuroimage 137, 165–177. doi: 10.1016/j.neuroimage.2016.05.051

DES (2014). On the role of the supramarginal gyrus in phonological processing and verbal working memory: evidence from rtms studies. Neuropsychologia 53, 39–46. doi: 10.1016/j.neuropsychologia.2013.10.015

Dowlagar, S., and Mamidi, R. (2021). “Graph convolutional networks with multi-headed attention for code-mixed sentiment analysis,” in Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages (Pittsburgh, PA), 65–72.

Du, Y., and Zhang, L. (2021). Estimating functional brain network with low-rank structure via matrix factorization for mci/asd identification. J. Appl. Math. Phys. 9, 1946–1963. doi: 10.4236/jamp.2021.98127

Dvornek, N. C., Ventola, P., and Duncan, J. S. (2018). “Combining phenotypic and resting-state fmri data for autism classification with recurrent neural networks,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) (Washington, DC: IEEE), 725–728.

Fredo, J., Jahedi, A., Reiter, M., and Müller, R.-A. (2018). “Diagnostic classification of autism using resting-state fmri data and conditional random forest,” in Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 88 (California: IEEE Engineering in Medicine and Biology Society), 1148–1151.

Friston, K. J., Williams, S., Howard, R., Frackowiak, R. S., and Turner, R. (1996). Movement-related effects in fmri time-series. Mag. Reson. Med. 35, 346–355. doi: 10.1002/mrm.1910350312

Gadgil, S., Zhao, Q., Pfefferbaum, A., Sullivan, E. V., Adeli, E., and Pohl, K. M. (2020). “Spatio-temporal graph convolution for resting-state fmri analysis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Lima: Springer), 528–538.

Gao, B., Wang, Y., Liu, W., Chen, Z., and Zang, Y. (2015). Spontaneous activity associated with delusions of schizophrenia in the left medial superior frontal gyrus: a resting-state fmri study. PLoS ONE 10, e0133766. doi: 10.1371/journal.pone.0133766

Gao, J., Chen, M., Li, Y., Gao, Y., Li, Y., Cai, S., et al. (2021). Multisite autism spectrum disorder classification using convolutional neural network classifier and individual morphological brain networks. Front. Neurosci. 14, 629630. doi: 10.3389/fnins.2020.629630

Gauthier, S., Reisberg, B., Zaudig, M., Petersen, R. C., Ritchie, K., Broich, K., et al. (2006). Mild cognitive impairment. Lancet 367, 1262–1270. doi: 10.1016/S0140-6736(06)68542-5

Greicius, M. (2008). Resting-state functional connectivity in neuropsychiatric disorders. Curr. Opin. Neurol. 21, 424–430. doi: 10.1097/WCO.0b013e328306f2c5

Han, G., He, Y., Huang, S., Ma, J., and Chang, S.-F. (2021). “Query adaptive few-shot object detection with heterogeneous graph convolutional networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (Montreal, QC), 3263–3272.

Huang, W., Bolton, T. A. W., Medaglia, J. D., Bassett, D. S., Ribeiro, A., and Van De Ville, D. (2018). A graph signal processing perspective on functional brain imaging. Proc. IEEE 106, 868–885. doi: 10.1109/JPROC.2018.2798928

Ithapu, V. K., Singh, V., Okonkwo, O. C., Chappell, R. J., Dowling, N. M., Johnson, S. C., et al. (2015). Imaging-based enrichment criteria using deep learning algorithms for efficient clinical trials in mild cognitive impairment. Alzheimers Dement. 11, 1489–1499. doi: 10.1016/j.jalz.2015.01.010

Kang, L., Jiang, J., Huang, J., and Zhang, T. (2020). Identifying early mild cognitive impairment by multi-modality mri-based deep learning. Front. Aging Neurosci. 12, 206. doi: 10.3389/fnagi.2020.00206

Khosla, M., Jamison, K., Ngo, G. H., Kuceyeski, A., and Sabuncu, M. R. (2019). Machine learning in resting-state fmri analysis. Magn. Reson. Imaging 61, 101–121. doi: 10.1016/j.mri.2019.05.031

Lee, H., Lee, D. S., Kang, H., Kim, B.-N., and Chung, M. K. (2011). Sparse brain network recovery under compressed sensing. IEEE Trans. Med. Imaging 30, 1154–1165. doi: 10.1109/TMI.2011.2140380

Li, W., Lin, X., and Chen, X. (2020a). Detecting alzheimer's disease based on 4d fmri: an exploration under deep learning framework. Neurocomputing 382, 280–287. doi: 10.1016/j.neucom.2020.01.053

Li, W., Wang, Z., Zhang, L., Qiao, L., and Shen, D. (2017). Remodeling pearson's correlation for functional brain network estimation and autism spectrum disorder identification. Front. Neuroinform. 15, 55. doi: 10.3389/fninf.2017.00055

Li, W., Xu, X., Jiang, W., Wang, P., and Gao, X. (2020b). Functional connectivity network estimation with an inter-similarity prior for mild cognitive impairment classification. Aging 12, 17328. doi: 10.18632/aging.103719

Li, W., Xu, X., Wang, Z., Peng, L., Wang, P., and Gao, X. (2021). Multiple connection pattern combination from single-mode data for mild cognitive impairment identification. Front. Cell Dev. Biol. 7, 782727. doi: 10.3389/fcell.2021.782727

Li, W., Zhang, L., Qiao, L., and Shen, D. (2019). Toward a better estimation of functional brain network for mild cognitive impairment identification: a transfer learning view. IEEE J. Biomed. Health Inform. 24, 1160–1168. doi: 10.1101/684779

Liu, S., Cai, W., Song, Y., Pujol, S., Kikinis, R., Wen, L., et al. (2013a). Localized sparse code gradient in alzheimer's disease staging. Annu. Int. Conf. IEEE. Eng. Med. Biol. Soc. 2013, 5398–5401. doi: 10.1109/EMBC.2013.6610769

Liu, S., Cai, W., Wen, L., and Feng, D. (2013b). “Neuroimaging biomarker based prediction of alzheimer's disease severity with optimized graph construction,” in 2013 IEEE 10th International Symposium on Biomedical Imaging (San Francisco, CA: IEEE), 1336–1339.

Liu, X., Tang, T., and Ding, N. (2022). Social network sentiment classification method combined chinese text syntax with graph convolutional neural network. Egypt. Inform. J. 23, 1–12. doi: 10.1016/j.eij.2021.04.003

Ma, G., He, L., Lu, C.-T., Yu, P. S., Shen, L., and Ragin, A. B. (2016). “Spatio-temporal tensor analysis for whole-brain fmri classification,” in Proceedings of the 2016 SIAM International Conference on Data Mining (Miami, FL: SIAM), 819–827.

McKhann, G. M., Knopman, D. S., Chertkow, H., Hyman, B. T., Jack Jr, C. R., Kawas, C. H., et al. (2011). The diagnosis of dementia due to alzheimer's disease: recommendations from the national institute on aging-alzheimer's association workgroups on diagnostic guidelines for alzheimer's disease. Alzheimers Dement. 7, 263–269. doi: 10.1016/j.jalz.2011.03.005

Morris, J. C., Storandt, M., Miller, J. P., McKeel, D. W., Price, J. L., Rubin, E. H., et al. (2001). Mild cognitive impairment represents early-stage Alzheimer disease. Arch. Neurol. 58, 397–405. doi: 10.1001/archneur.58.3.397

Nagano-Saito, A., Houde, J. C., Bedetti, C., Descoteaux, M., and Monchi, O. (2021). Reorganisation of diffusion microstructure in the precuneus is associated with preserved cognitive function in Parkinson's disease. Res Squ. [Preprint]. Available online at: https://www.researchsquare.com/article/rs-147275/v1

Peng, W., Tang, Q., Dai, W., and Chen, T. (2022). Improving cancer driver gene identification using multi-task learning on graph convolutional network. Brief. Bioinform. 23, bbab432. doi: 10.1093/bib/bbab432

Puranik, M., Shah, H., Shah, K., and Bagul, S. (2018). “Intelligent Alzheimer's detector using deep learning,” in 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS) (Madurai: IEEE), 318–323.

Qiao, L., Zhang, H., Kim, M., Teng, S., Zhang, L., and Shen, D. (2016). Estimating functional brain networks by incorporating a modularity prior. Neuroimage 143, 399–407. doi: 10.1016/j.neuroimage.2016.07.058

Raju, M., Sudila, T., Gopi, V. P., and Anitha, V. (2020). “Classification of mild cognitive impairment and Alzheimer's disease from magnetic resonance images using deep learning,” in 2020 International Conference on Recent Trends on Electronics, Information, Communication and Technology (RTEICT) (Bangalore: IEEE), 52–57.

Ries, M. L., Carlsson, C. M., Rowley, H. A., Sager, M. A., Gleason, C. E., Asthana, S., et al. (2008). Magnetic resonance imaging characterization of brain structure and function in mild cognitive impairment: a review. J Am Geriatr Soc. 56, 920–934. doi: 10.1111/j.1532-5415.2008.01684.x

Roberson, E. D., and Mucke, L. (2006). 100 years and counting: prospects for defeating Alzheimer's disease. Science 314, 781–784. doi: 10.1126/science.1132813

Shanmuganathan, S. (2016). Artificial Neural Network Modelling: An Introduction. Cham: Springer. p. 1–14.

Smith, S. M., Vidaurre, D., Beckmann, C. F., Glasser, M. F., Jenkinson, M., Miller, K. L., et al. (2013). Functional connectomics from resting-state fmri. Trends Cogn. Sci. 17, 666–682. doi: 10.1016/j.tics.2013.09.016

Song, H., Chen, L., Gao, R., Bogdan, I. I. M., Yang, J., Wang, S., et al. (2017). Automatic schizophrenic discrimination on fnirs by using complex brain network analysis and svm. BMC Med. Inform. Decis. Mak. 17, 1–9. doi: 10.1186/s12911-017-0559-5

Tang, X., Luo, J., Shen, C., and Lai, Z. (2021). Multi-view multichannel attention graph convolutional network for mirna-disease association prediction. Brief. Bioinform. 22, bbab174. doi: 10.1093/bib/bbab174

Wee, C.-Y., Yap, P.-T., Zhang, D., Wang, L., and Shen, D. (2012). “Constrained sparse functional connectivity networks for mci classification,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Nice: Springer), 212–219.

Yan, C., and Zang, Y. (2010). Dparsf: a matlab toolbox for "pipeline" data analysis of resting-state fmri. Front. Syst. Neurosci. 4, 13. doi: 10.3389/fnsys.2010.00013

Yan, W., Zhang, H., Sui, J., and Shen, D. (2018). “Deep chronnectome learning via full bidirectional long short-term memory networks for mci diagnosis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Granada: Springer), 249–257.

Yin, W., Mostafa, S., and Wu, F.-X. (2021). Diagnosis of autism spectrum disorder based on functional brain networks with deep learning. J. Comput. Biol. 28, 146–165. doi: 10.1089/cmb.2020.0252

Yu, Z., Huang, F., Zhao, X., Xiao, W., and Zhang, W. (2021). Predicting drug-disease associations through layer attention graph convolutional network. Brief. Bioinform. 22, bbaa243. doi: 10.1093/bib/bbaa243

Yue, L., Gong, X., Chen, K., Mao, M., Li, J., Nandi, A. K., et al. (2018). “Auto-detection of alzheimer's disease using deep convolutional neural networks,” in 2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (Huangshan: IEEE), 228–234.

Zhang, D., Wang, Y., Zhou, L., Yuan, H., Shen, D., and Alzheimer's Disease Neuroimaging Initiative (2011). Multimodal classification of Alzheimer's disease and mild cognitive impairment. Neuroimage 55, 856–867. doi: 10.1016/j.neuroimage.2011.01.008

Zou, Z., and Tang, W. (2021). “Modulated graph convolutional network for 3d human pose estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (Montreal, QC: IEEE), 11477–11487.

Zubatiy, T., Vickers, K. L., Mathur, N., and Mynatt, E. D. (2021). “Empowering dyads of older adults with mild cognitive impairment and their care partners using conversational agents,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama), 1–15.

Keywords: functional brain network (FBN), mild cognitive impairment (MCI), graph convolution, attention, spatiotemporal features

Citation: Feng Q, Huang Y, Long Y, Gao L and Gao X (2022) A Deep Spatiotemporal Attention Network for Mild Cognitive Impairment Identification. Front. Aging Neurosci. 14:925468. doi: 10.3389/fnagi.2022.925468

Received: 21 April 2022; Accepted: 23 June 2022;
Published: 18 July 2022.

Edited by:

Shuo Hu, Central South University, China

Reviewed by:

Kai Ma, Nanjing University of Aeronautics and Astronautics, China
Li Zhang, Nanjing Forestry University, China

Copyright © 2022 Feng, Huang, Long, Gao and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Le Gao, le.gao@nscc-gz.cn; Xin Gao, george.ssmu@163.com

These authors have contributed equally to this work and share first authorship
