Diagnostic model optimization method for ADHD based on brain network analysis of resting-state fMRI images and transfer learning neural network

Meng, Xiaojing; Zhuo, Wenjie; Ge, Peng; Zou, Bin; Zhu, Yao; Liu, Weidong; Li, Xuzhou

doi:10.3389/fnhum.2022.1005425

ORIGINAL RESEARCH article

Front. Hum. Neurosci., 14 October 2022

Sec. Brain Health and Clinical Neuroscience

Volume 16 - 2022 | https://doi.org/10.3389/fnhum.2022.1005425

This article is part of the Research Topic Effective Connectivity Analysis in Neuropsychiatric Disorders.


Xiaojing Meng¹†

Wenjie Zhuo²†

Peng Ge³

Bin Zou⁴

Yao Zhu⁵

Weidong Liu³*

Xuzhou Li⁶*

¹XuZhou Medical University, Xuzhou, China
²Collaborative Innovation Center of Artificial Intelligence, Zhejiang University, Hangzhou, China
³China University of Mining and Technology, Xuzhou, China
⁴Mental Health Counseling Center, Zhejiang Financial College, Hangzhou, China
⁵The School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
⁶Faculty of Education, Yunnan Normal University, Kunming, China

Introduction: Attention deficit and hyperactivity disorder (ADHD) is a common inherited disease of the nervous system whose cause(s) and pathogenesis remain unclear. Currently, ADHD is diagnosed mainly on the basis of clinical experience and guideline-based diagnostic criteria. Our study aimed to apply a learning-based classification method to assist ADHD diagnosis using high-dimensional resting-state fMRI data.

Methods: Our study used the resting-state fMRI data of the ADHD-200 Peking dataset, which includes an ADHD patient group (n = 142) and a typically developing control (TDC) group (n = 102). We first used Pearson and partial correlation coefficients to perform functional connectivity (FC) analysis between regions of interest (ROIs). Then, the Pearson and partial correlation coefficient matrices were concatenated into a dual-channel feature that served as input to the transfer learning neural network (TLNN) architecture. Finally, we transferred a model pretrained in the auxiliary domain to our target domain and fine-tuned it.

Results: Based on the Pearson correlation coefficient, between-group differences in FC were detected in 22 brain regions, including the fusiform gyrus, superior frontal gyrus, posterior superior temporal sulcus, inferior parietal lobule, anterior cingulate cortex, and parahippocampal gyrus. Based on the partial correlation coefficient, we found FC differences in the salience network, default network, sensorimotor network, dorsal attention network, and cerebellar network. The TLNN architecture alleviated the problem of insufficient training data and improved the sensitivity of the classification method. With the VGG model (fine-tuned transfer strategy and a fully connected layer with 1,024 neurons), TLNN classification accuracy reached 82%.

Conclusion: Our study suggests that transferring prior knowledge from an auxiliary domain and then completing training in the target domain is an effective way to address classification on small-sample datasets. Building on prior knowledge from FC analysis, TLNN classification may offer a new way to assist ADHD diagnosis.

Introduction

Attention deficit and hyperactivity disorder (ADHD) is a common inherited disease of the nervous system. If not treated in time, ADHD can negatively affect the patient’s schooling and daily life, disrupt family harmony, and even harm society (Dupaul et al., 1998; Graham et al., 2011; Cortese et al., 2013; Kooij et al., 2019). Previous studies collectively suggest that there is no clear evidence of brain damage in ADHD, but rather hypofunctional dopamine systems that give rise to neurochemical imbalances (Sagvolden and Sergeant, 1998). This explains why the diagnostic criteria shifted from brain damage to behavioral manifestations, as reflected in DSM-IV (Bell, 1994). Such criteria, based on behavioral observation, lack an objective basis and may lead to misdiagnosis (Wolraich, 1999). Our goal is to develop an objective and accurate ADHD diagnostic method, which is an important application of brain imaging research.

At present, research on the neural mechanisms of ADHD pathogenesis mainly compares fMRI data between large numbers of ADHD patients and typically developing controls (TDC). In children, hypoactivation in ADHD relative to comparison subjects was observed mostly in systems involved in executive function (frontoparietal network) and attention (ventral attentional network), whereas significant hyperactivation was observed predominantly in the default, ventral attention, and somatomotor networks (Cortese et al., 2012). In adult ADHD patients, hypoactivation is found mainly in the frontoparietal system, and hyperactivation in the visual, dorsal attention, and default networks (Cortese et al., 2012). Another meta-analysis of fMRI studies of ADHD patients during inhibition and attention tasks found abnormalities in right-hemisphere basal ganglia networks, including the inferior frontal cortex, supplementary motor area, anterior cingulate cortex, dorsolateral prefrontal cortex, and parietal and cerebellar regions (Hart et al., 2013). In working memory fMRI tasks, patients with ADHD showed decreased activity in bilateral frontal and frontoparietal regions and the insula (Wu et al., 2017). One study selected five subcortical regions, the amygdala, caudate, putamen, globus pallidus, and hippocampus, as regions of interest and measured their resting-state functional connectivity at the whole-brain voxel level. It examined the fundamental roles of these subcortical structures in ADHD pathogenesis and neurodevelopment, providing new evidence to bridge the gap between neural function and clinical manifestations in ADHD (Damiani et al., 2021).

Using regional homogeneity (ReHo) analysis, Cao et al. found abnormalities in the frontal-striatal-cerebellar circuits of ADHD patients, and these results were confirmed by Zang et al.’s amplitude of low-frequency fluctuation (ALFF) study. Together, these findings suggest that changes in spontaneous neuronal activity in these regions may be related to the pathophysiology of ADHD in children (Cao et al., 2006; Zang et al., 2007). Resting-state fMRI therefore provides a new direction for studying brain connectivity and the pathophysiology of ADHD with learning-based classification methods (Cao et al., 2006; Zang et al., 2007).

Building on many previous studies of the neural mechanisms of ADHD and on artificial intelligence algorithms, advanced and convenient ADHD diagnostic models have been developed. The combination of resting-state fMRI analysis and machine learning algorithms has shown considerable promise in revealing pathological functional connectivity (FC) patterns (Cox and Savoy, 2003; Mourão-Miranda et al., 2005; Fan et al., 2007; Pereira et al., 2009; Anderson et al., 2011; Zhang and Shen, 2012; Uddin et al., 2013; Plitt et al., 2014). Using 3D low-level features extracted from functional and structural images, researchers constructed a 3D CNN model to evaluate local spatial patterns of MRI features and reached an accuracy of 69.15% (Zou et al., 2017). However, traditional machine learning algorithms can extract only shallow features and have limited ability to integrate high-dimensional fMRI data (Kim et al., 2016; Suk et al., 2017). Moreover, existing deep learning algorithms for ADHD classification are mostly trained on small datasets (Kuang et al., 2014; Kim et al., 2016; Guo et al., 2017; Heinsfeld et al., 2017), which limits their reproducibility and generalizability.

To address the restrictions caused by limited data, there is a critical need for a more robust training methodology (Li et al., 2018). Inspired by the way humans reuse knowledge learned in one setting when learning another, transfer learning (Pan and Yang, 2010) focuses on transferring knowledge between domains. Transfer learning has gradually been applied to the diagnosis of mental disorders. In a study using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, prior knowledge obtained from 10,000 natural images was applied to the classification of Alzheimer’s disease (AD), achieving highly competitive performance compared with other approaches (Gupta et al., 2013). Another study proposed robust multilabel transfer feature learning for the early diagnosis of AD, which effectively improved diagnostic accuracy (Cheng et al., 2019). Transfer learning has thus shown great potential when sample sizes are small. However, to our knowledge, transfer learning has not yet been used to diagnose ADHD.

In addition, most previous automatic ADHD diagnosis models did not consider the topological characteristics of the brain network. They stopped at the individual level and did not conduct a modular analysis of the brain network to find differences between ADHD patients and typically developing individuals. Therefore, we proposed an integrated model that combines functional connectivity analysis with a transfer learning architecture to reduce the high dimensionality of resting-state fMRI data and learn a common set of features across different domains.

Materials and Methods

Datasets

Our dataset is part of the publicly released international ADHD-200 database¹. ADHD-200 includes eight datasets: New York University Child Study Center (NYU), Brown University, University of Pittsburgh, Washington University, NeuroImage, Kennedy Krieger Institute (KKI), Oregon Health and Science University (OHSU), and Peking University Child Study Center (Peking; ADHD-200 Consortium, 2012). To eliminate the influence of between-site differences on the experimental results, we used only the Peking dataset, which includes an ADHD patient group and a TDC group. We further excluded subjects according to the following criteria to reduce demographic confounds: (1) left-handedness or mixed handedness; (2) resting-state fMRI images with a low signal-to-noise ratio, or insufficient phenotypic data; (3) an intelligence score below 80; and (4) comorbid diseases. Finally, 244 subjects (142 ADHD and 102 TDC) were enrolled.

Functional connectivity analysis of ADHD

Data preprocessing

We ran the Data Processing Assistant for Resting-State fMRI (DPARSFA) on MATLAB (R2016a) for data preprocessing: (1) slice timing correction, so that all slices in each volume correspond to the same acquisition time point; (2) head motion realignment, after which subjects with more than 2 mm translation along the x, y, or z axis or more than 2° rotation were excluded; (3) spatial normalization; and (4) spatial smoothing with a Gaussian kernel of 8 × 8 × 8 mm full width at half maximum (FWHM) to reduce noise and improve the signal-to-noise ratio (Chao-Gan and Yu-Feng, 2010; Yan et al., 2016; Sun et al., 2021).
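The motion-exclusion rule and the smoothing kernel above can be made concrete with a minimal sketch. It assumes SPM-style realignment parameters (one row per volume, three translations in millimetres followed by three rotations in radians); the function names are hypothetical, and this illustrates the stated thresholds rather than DPARSFA's own code. The FWHM of a Gaussian relates to its standard deviation by σ = FWHM / (2√(2 ln 2)), so the 8 mm kernel corresponds to σ ≈ 3.40 mm.

```python
import numpy as np

def exceeds_motion_threshold(rp, max_trans_mm=2.0, max_rot_deg=2.0):
    """Return True if a subject should be excluded for head motion.

    rp: (n_volumes, 6) realignment parameters, assumed SPM-style:
    columns 0-2 are translations in mm, columns 3-5 rotations in radians.
    """
    rp = np.asarray(rp, dtype=float)
    max_trans = np.abs(rp[:, :3]).max()             # largest translation on any axis
    max_rot = np.degrees(np.abs(rp[:, 3:6])).max()  # largest rotation, in degrees
    return bool(max_trans > max_trans_mm or max_rot > max_rot_deg)

def fwhm_to_sigma(fwhm_mm):
    """Gaussian standard deviation for a given FWHM: sigma = FWHM / (2 * sqrt(2 ln 2))."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

print(round(fwhm_to_sigma(8.0), 2))  # 3.4
```

Converting rotations to degrees before comparing keeps both thresholds in the units used in the text.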

Pearson correlation coefficient

We used the Brainnetome Atlas developed by the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (Fan et al., 2016), and extracted the mean resting-state fMRI time series from its 246 ROIs for all subjects (Sun et al., 2021). We then calculated the Pearson correlation coefficient (Benesty et al., 2009) between each pair of ROIs with the CONN toolbox² (RRID:SCR_009550) and converted the coefficients to z values with the Fisher transform, yielding a 246 × 246 connectivity matrix for each subject. To compare FC between the TDC and ADHD groups, we performed two-sample t-tests with FDR correction (p-FDR < 0.05). Finally, we observed and recorded the statistically significant brain regions, along with their connection strengths and scores.
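A minimal NumPy/SciPy sketch of this pipeline follows: Pearson correlation between ROI time series, Fisher's r-to-z transform (z = arctanh r), edge-wise two-sample t-tests, and FDR control. The study itself used the CONN toolbox; this sketch does not reproduce CONN, it assumes the Benjamini-Hochberg procedure for FDR, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def fisher_z_fc(ts):
    """ts: (n_timepoints, n_rois) mean ROI time series -> (n_rois, n_rois) Fisher-z FC matrix."""
    r = np.corrcoef(ts, rowvar=False)
    r = np.clip(r, -0.999999, 0.999999)  # keep arctanh finite
    np.fill_diagonal(r, 0.0)             # self-connections are not analysed
    return np.arctanh(r)

def bh_fdr(p, q=0.05):
    """Benjamini-Hochberg: boolean mask of p-values that survive FDR level q."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank under the BH line
        mask[order[: k + 1]] = True      # reject all hypotheses up to that rank
    return mask

def group_fc_test(z_adhd, z_tdc, q=0.05):
    """z_adhd, z_tdc: (n_subjects, n_rois, n_rois). Edge-wise two-sample t-tests
    on the upper triangle, followed by FDR correction across edges."""
    iu = np.triu_indices(z_adhd.shape[1], k=1)
    t, p = stats.ttest_ind(z_adhd[:, iu[0], iu[1]], z_tdc[:, iu[0], iu[1]], axis=0)
    return iu, t, p, bh_fdr(p, q)
```

Only the upper triangle is tested because the FC matrix is symmetric; with 246 ROIs this gives 246 × 245 / 2 = 30,135 edges entering the FDR correction.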

Partial correlation coefficient

We estimated a sparse inverse covariance (precision) matrix for each subject with the graphical LASSO and identified brain regions with significant between-group differences by statistical analysis (Friedman et al., 2008). The graphical LASSO is an algorithm for fast estimation of the inverse covariance matrix. It applies an ℓ₁ penalty to encourage sparsity in the inverse covariance and uses fast coordinate descent to solve the LASSO subproblem at each step. This makes the estimation tractable when the data are high-dimensional.

Our experiment used the Graphical LASSO estimator in the scikit-learn library and the Network template (32 ROIs) in the Python-based Nilearn library to calculate the inverse covariance matrix. To build each subject's binary network, we thresholded the absolute values of that subject's partial correlation coefficients at 0.1, yielding one binary connection matrix per subject.
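The graphical LASSO returns a precision matrix Θ rather than partial correlations; the standard conversion is ρ_ij = −Θ_ij / √(Θ_ii Θ_jj). The sketch below fits scikit-learn's `GraphicalLasso` to one subject's standardized ROI time series, converts the result to partial correlations, and binarizes at |ρ| > 0.1. The regularization strength `alpha = 0.05` is an assumption (the paper does not report it), and the helper names are hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def partial_corr_graphlasso(ts, alpha=0.05):
    """ts: (n_timepoints, n_rois). Partial correlations from a graphical-LASSO precision matrix."""
    x = (ts - ts.mean(axis=0)) / ts.std(axis=0)  # z-score each ROI time series
    theta = GraphicalLasso(alpha=alpha, max_iter=200).fit(x).precision_
    theta = (theta + theta.T) / 2.0              # guard against tiny asymmetries
    d = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(d, d)              # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(pcorr, 0.0)
    return pcorr

def binarize(pcorr, threshold=0.1):
    """Binary adjacency: 1 where |partial correlation| exceeds the threshold."""
    return (np.abs(pcorr) > threshold).astype(int)
```

Because the ℓ₁ penalty already drives many precision entries to exactly zero, the 0.1 threshold mainly removes small nonzero partial correlations that survive the penalty.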

We then defined the score of the i-th edge as:

S = \frac{L_{T}}{N_{T}} - \frac{L_{a}}{N_{a}} (1)

Here, L_T and L_a are the numbers of subjects in the ADHD group and TDC group, respectively, in whom a connection between the two brain regions is present, and N_T and N_a are the numbers of subjects in the ADHD group and TDC group, respectively. The score is therefore the difference between the proportion of ADHD subjects and the proportion of TDC subjects in whom the edge exists. We calculated the score of every edge in the same way. Next, we pooled the binary connection matrices of all subjects, randomly shuffled them into two groups of 142 and 102 subjects, and recalculated the score S′ of every edge; this was repeated 10⁵ times. For each edge, the null hypothesis is that the presence of the edge does not differ between the two groups. Under this hypothesis, the probability P is computed from the permutation distribution as:

P = \begin{cases} p(S^{'} = 0), & S = 0 \\ p(S^{'} \leq S), & S < 0 \\ p(S^{'} \geq S), & S > 0 \end{cases} (2)

P is the permutation p-value of the edge under the null hypothesis: the smaller P is, the stronger the evidence that the edge differs between the two groups. Finally, we recorded the brain regions connected by statistically significant edges (P < 0.001), along with their connection strengths and scores.
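The edge score defined above and the label-permutation procedure can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it uses the one-sided tail in the direction of the observed score (S′ ≥ S for S > 0, S′ ≤ S for S < 0) and the proportion of permutations with S′ = 0 when S = 0. Group labels are shuffled while group sizes stay fixed; the study used 10⁵ permutations, while the default here is smaller for speed. Function names are hypothetical.

```python
import numpy as np

def edge_scores(binary, is_adhd):
    """binary: (n_subjects, n_edges) 0/1 matrix; is_adhd: boolean (n_subjects,).
    Score = fraction of ADHD subjects with the edge - fraction of TDC subjects with it."""
    return binary[is_adhd].mean(axis=0) - binary[~is_adhd].mean(axis=0)

def permutation_pvalues(binary, is_adhd, n_perm=10_000, seed=0):
    """Observed scores and permutation p-values (one tail, in the observed direction)."""
    rng = np.random.default_rng(seed)
    binary = np.asarray(binary, dtype=float)
    is_adhd = np.asarray(is_adhd, dtype=bool)
    s_obs = edge_scores(binary, is_adhd)
    s_perm = np.empty((n_perm, binary.shape[1]))
    for k in range(n_perm):
        # shuffling the labels keeps both group sizes (e.g. 142 and 102) fixed
        s_perm[k] = edge_scores(binary, rng.permutation(is_adhd))
    p_pos = (s_perm >= s_obs).mean(axis=0)  # used where S > 0
    p_neg = (s_perm <= s_obs).mean(axis=0)  # used where S < 0
    p_zero = (s_perm == 0).mean(axis=0)     # used where S == 0
    p = np.where(s_obs > 0, p_pos, np.where(s_obs < 0, p_neg, p_zero))
    return s_obs, p
```

With 10⁵ permutations the smallest nonzero p-value is 10⁻⁵, which leaves ample resolution below the P < 0.001 cutoff.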

ADHD classification model based on transfer learning

To compare the effects of different backbone models on TLNN, we used the Visual Geometry Group Network (VGG; Simonyan and Zisserman, 2015) and the Residual Neural Network (ResNet; He et al., 2016). The TLNN model consists of two main parts (Figure 1). We first augmented the data and then concatenated the Pearson correlation coefficient (Benesty et al., 2009) matrix and the partial correlation matrix into a dual-channel feature to eliminate the impact of irrelevant areas. Next, we transferred the parameters of two CNN models pretrained on natural images to our model and fine-tuned them to jointly train classifiers in the target domain (fMRI data; Etzel et al., 2009; Tompson et al., 2014; Zhang et al., 2018). Experiments were run on the Windows 10 operating system with the Anaconda 4.8.3 development platform and Python 3.7, and the neural network classification framework was implemented in TensorFlow-GPU 1.14.
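A minimal sketch of stacking the two FC matrices into a dual-channel, channels-last array (the layout TensorFlow/Keras convolution layers use by default). It assumes both matrices already share the same shape, for example after the effect-size selection described below. The text does not state how the 246-ROI Pearson matrix and the 32-ROI partial correlation matrix were aligned, nor how two channels were adapted to ImageNet-pretrained backbones that expect three-channel inputs, so neither step is shown. The function name is hypothetical.

```python
import numpy as np

def dual_channel_input(pearson_fc, partial_fc):
    """Stack two equally sized (n, n) FC matrices into one (n, n, 2) channels-last array."""
    pearson_fc = np.asarray(pearson_fc, dtype=np.float32)
    partial_fc = np.asarray(partial_fc, dtype=np.float32)
    if pearson_fc.shape != partial_fc.shape:
        raise ValueError("both FC matrices must have the same shape")
    return np.stack([pearson_fc, partial_fc], axis=-1)  # channel 0: Pearson, channel 1: partial

# A batch for the network: one dual-channel array per subject.
rng = np.random.default_rng(0)
batch = np.stack([dual_channel_input(rng.normal(size=(22, 22)), rng.normal(size=(22, 22)))
                  for _ in range(4)])
print(batch.shape)  # (4, 22, 22, 2)
```

Keeping each correlation type in its own channel lets the first convolution layer learn separate weights for the full and partial correlation views of the same ROI pair.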

FIGURE 1

Figure 1. ADHD classification model based on TLNN. The model training process includes: (1) loading the pretrained model and transferring its parameters to the target domain (fMRI images); (2) fine-tuning the parameters learned from natural images; (3) training the VGGNet or ResNet50 models on the large ImageNet dataset; (4) transferring the trained weight parameters to the fMRI image classification task; (5) using the middle and lower layers of the pretrained model as the feature extractor for the target task; (6) nonlinearly mapping the extracted features through the fully connected layers; and (7) obtaining the final classification result. Conv denotes the number of convolution kernels; FCLs denotes fully connected layers.

To examine the effects of different strategies on TLNN, we designed two training methods. The first froze all convolutional layers, so that the lower layers did not participate in training and only the re-initialized fully connected layers were trained. The second fine-tuned all convolutional layers, so that all convolutional and fully connected layers of the pretrained model participated in training. In addition, we set up four fully connected layer (FCL) configurations to analyze the impact of different transfer learning strategies: (1) a softmax classifier only (Wolfe et al., 2017), denoted FCLs₀; (2) a fully connected layer with 128 neurons followed by a softmax classifier, denoted FCLs₁₂₈; (3) a fully connected layer with 512 neurons followed by a softmax classifier, denoted FCLs₅₁₂; and (4) a fully connected layer with 1,024 neurons followed by a softmax classifier, denoted FCLs₁₀₂₄. We mainly studied the influence of three hyperparameters on classification performance: the optimizer, the mini-batch size, and the number of epochs.

We used the Peking dataset selected as described above (142 ADHD patients and 102 TDC subjects) and calculated the partial correlation and Pearson correlation coefficient matrices for the two groups separately. These FC matrices were the input to the model. First, we used effect size (a standardized mean difference) as the feature-selection criterion to eliminate the impact of irrelevant features. Here, Cohen’s method was applied:

ES_{i} = \left| \frac{\bar{x}_{i,1} - \bar{x}_{i,2}}{S_{i}} \right| (3)

S_{i} = \sqrt{\frac{(n_{1} - 1) S_{i,1}^{2} + (n_{2} - 1) S_{i,2}^{2}}{n_{1} + n_{2} - 2}} (4)

Here, $\bar{x}_{i,1}$ and $\bar{x}_{i,2}$ are the means of the i-th feature in the ADHD and TDC groups, respectively; $S_{i,1}^{2}$ and $S_{i,2}^{2}$ are the variances of the i-th feature in the two groups; and $n_{1}$ and $n_{2}$ are the group sizes. Second, by setting the threshold to 22 × 22 = 484, we retained the features with large between-group differences and removed irrelevant features. Finally, the 22 correlation coefficients with the largest effect sizes were selected as the model input.
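The effect-size selection can be sketched as below: Eq. (3) with the usual pooled standard deviation of Cohen's d (sample variances, denominator n₁ + n₂ − 2), followed by ranking features by effect size. How the retained features were arranged into the 22 × 22 input is not described, so the sketch only returns feature indices; the function names and the choice of `k` are illustrative assumptions.

```python
import numpy as np

def cohens_d(x_adhd, x_tdc):
    """x_adhd: (n1, n_features), x_tdc: (n2, n_features). Absolute Cohen's d per feature."""
    n1, n2 = len(x_adhd), len(x_tdc)
    v1 = x_adhd.var(axis=0, ddof=1)  # sample variance, ADHD group
    v2 = x_tdc.var(axis=0, ddof=1)   # sample variance, TDC group
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return np.abs(x_adhd.mean(axis=0) - x_tdc.mean(axis=0)) / pooled_sd

def top_k_features(x_adhd, x_tdc, k):
    """Indices of the k features with the largest effect size, largest first."""
    es = cohens_d(x_adhd, x_tdc)
    return np.argsort(es)[::-1][:k]
```

With the Peking sample, `x_adhd` would have 142 rows and `x_tdc` 102, and `k = 484` would correspond to the 22 × 22 threshold mentioned above.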

Results

Demographics and results of the participants

Data from 244 participants (age range: 10–13 years; 180 boys and 64 girls) with usable resting-state fMRI data were used in this study. None of the 244 participants’ fMRI images had a low signal-to-noise ratio or insufficient phenotypic data, and the participants did not differ statistically significantly from the full dataset on key variables, including sex and age; consistent with the exclusion criteria, no participant had an IQ below 80 or other diseases. Demographic information on age, sex, inattention and hyperactivity/impulsivity scores, IQ, verbal IQ, and performance IQ is presented in Table 1.

TABLE 1

Table 1. Demographic and clinical characteristics of the ADHD and TDC groups.