- 1 The Mind Research Network, Albuquerque, NM, USA
- 2 Department of ECE, University of New Mexico, Albuquerque, NM, USA
- 3 Department of Psychology and Neuroscience, University of New Mexico, Albuquerque, NM, USA
- 4 Olin Neuropsychiatry Research Center, Hartford, CT, USA
- 5 Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
There is growing interest in automatic classification of mental disorders based on neuroimaging data. Small training data sets (few subjects) combined with very large amounts of high-dimensional data make it challenging to design robust and accurate classifiers for heterogeneous disorders such as schizophrenia. Most previous studies considered structural MRI, diffusion tensor imaging and task-based fMRI for this purpose. However, resting-state data have rarely been used to discriminate schizophrenia patients from healthy controls. Resting-state data are of great interest, since they are relatively easy to collect and are not confounded by behavioral performance on a task. Several linear and non-linear classification methods were trained using a training dataset and evaluated with a separate testing dataset. The results show that high classification accuracy is achievable using simple non-linear discriminative methods such as k-nearest neighbors (KNN), which is very promising. We compare and report detailed results for each classifier, as well as a statistical analysis and evaluation of each individual feature. To our knowledge, our efforts represent the first use of resting-state functional network connectivity (FNC) features to classify schizophrenia.
Introduction
Population studies show that lifetime prevalence of all psychotic disorders is as high as 4% (http://www.nimh.nih.gov/statistics/SMI_AASR.shtml). These disorders can impair normal life significantly and impose huge societal cost (Rice, 1999). Clinically, the patient's self-reported experiences and observed behavior over the longitudinal course of the illness constitute the basis for diagnosis. The overlapping symptoms of mental disorders and the absence of standard biologically-based clinical tests make differential diagnosis a challenging task. Early diagnosis of these diseases can significantly improve treatment response and reduce associated costs (McGlashan, 1998).
Advances in neuroimaging technologies in the past two decades have opened a new window into the structure and function of the healthy human brain as well as illuminating many brain disorders such as schizophrenia. Schizophrenia is among the most prevalent mental disorders affecting about 1% of the population worldwide (Wyatt et al., 1995; Bhugra, 2005). This devastating, chronic heterogeneous disease is usually characterized by disintegration in perception of reality, cognitive problems and chronic course with lasting impairment (Heinrichs and Zakzanis, 1998). Multiple structural and functional brain abnormalities are widely reported in patients with schizophrenia (Shenton et al., 2001; Calhoun et al., 2009a; Karlsgodt et al., 2010). Most neuroimaging-based studies of schizophrenia focus on showing aberrations of some features (structural or functional) in a patient group by comparing them to a control group. While many of these findings are statistically significant in the average sense, discrimination ability of those features is under question for classification purposes on a case-by-case basis. Since classification provides information for each individual subject, it is considered a much harder task than reporting group differences. In the case of classifying schizophrenia patients, a small number of training samples (subjects) and high dimensional data make it a challenging task to design an accurate, robust classifier for such a heterogeneous brain disorder.
Recently, there has been growing interest in designing objective prognostic/diagnostic tools based on neuroimaging and other data that display high accuracy and robustness. The relatively small body of research on MRI-based classification of schizophrenia patients can be divided into three categories based on the type of discriminating features used: structural-based (Csernansky et al., 2004; Nakamura et al., 2004; Davatzikos et al., 2005; Fan et al., 2005, 2007b; Caan et al., 2006; Pardo et al., 2006; Kawasaki et al., 2007; Yoon et al., 2007; Caprihan et al., 2008; Sun et al., 2009; Takayanagi et al., 2010, 2011; Ardekani et al., 2011), functional-based (Georgopoulos et al., 2007; Calhoun et al., 2008b; Demirci et al., 2008a; Michael et al., 2008; Arribas et al., 2010; Shen et al., 2010; Castro et al., 2011) or a combination of structural and functional features (Fan et al., 2007a; Ford et al., 2002).
In recent years, spontaneous modulation of the blood oxygenation level-dependent (BOLD) signal during the resting condition has found fruitful clinical applications (Fox and Greicius, 2010). Resting-state fMRI (rfMRI) experiments are less prone to multi-site variability, allow a wider range of patients to be scanned and make it possible to study multiple cortical systems from one dataset (Fox and Greicius, 2010). Moreover, more accurate connectivity maps can be detected using rfMRI data compared to task-based fMRI data (Xiong et al., 1999). With a considerable literature on rfMRI group comparisons, researchers have started tackling the more challenging task of using the identified abnormalities, or so-called biomarkers, to discriminate patients from healthy controls. The main target of these studies has been Alzheimer's disease (Li et al., 2002; Greicius et al., 2004; Wang et al., 2006; Supekar et al., 2008). However, rfMRI data have rarely been used for discrimination of schizophrenia (Cecchi et al., 2009; Shen et al., 2010; Du et al., 2012). Shen et al. (2010) used an atlas-based method to extract mean time-courses of 116 brain regions in the resting-state for both healthy controls and schizophrenia subjects. The correlations between these time-courses formed the feature vector for each subject. By using feature selection and dimensionality reduction techniques, they reduced the dimensionality to three, where they discriminated patients from controls with high accuracy (93% for patients and 75% for healthy controls).
The main purpose of this study is to use resting-state functional network connectivity (FNC) features for classification of schizophrenia patients. Using functional connectivity (FC) methods, researchers have shown disrupted functional integration in schizophrenia patients (Friston and Frith, 1995; Frith et al., 1995; Josin and Liddle, 2001; Bokde et al., 2006; Mikula and Niebur, 2006; Salvador et al., 2010). Liang et al. reported decreased FC among the insula, prefrontal lobe and temporal lobe and increased connectivity between the cerebellum and several other brain regions. Meyer-Lindenberg et al. (2001) reported abnormal FC in fronto-temporal interactions in schizophrenia in selected regions of interest (ROIs) using positron emission tomography (PET) brain scans during a working memory task. Salvador et al. (2010) reported hyper-connectivity within medial and orbital structures of the frontal lobe and hyper-connectivity between these regions and several cortical and sub-cortical structures in schizophrenia patients. FC is defined as correlation (or other kinds of statistical dependency) among spatially remote brain regions (Friston, 2002). FC analysis documents interactions among brain regions during a task as well as during rest. Two widely used FC approaches are: (a) seed-based analysis (Biswal et al., 1995, 1997; Lowe et al., 1998; Cordes et al., 2000, 2002; Stein et al., 2000; Ford et al., 2005) and (b) spatial independent component analysis (ICA) (McKeown et al., 1998; Calhoun et al., 2001a; van de Ven et al., 2004; Esposito et al., 2005; Garrity et al., 2007). In the seed-based approach, individual seed voxels from predefined ROIs are chosen and the cross correlation of other voxels' time courses (TCs) with the selected seeds is then computed to derive a correlation map. This map can then be thresholded to identify voxels showing significant FC with the seed voxels.
An alternative approach is based on ICA, a multivariate data-driven method which, as a blind source separation method, can recover a set of signals from their linear mixtures and has yielded fruitful results with fMRI data (Calhoun et al., 2009b; Calhoun and Adali, 2012). ICA estimates maximally independent components using independence measures based on higher-order statistics. Compared to general linear model approaches, ICA requires no specific temporal model (task-based design matrix), making it ideal for analyzing resting-state data (Kiviniemi et al., 2003). Depending on how the data matrix is formed, one can perform either temporal or spatial ICA (sICA) on fMRI data. sICA is the predominant ICA approach used for fMRI data (McKeown et al., 1998; Calhoun et al., 2001a,b). sICA decomposes fMRI data into a set of maximally spatially independent maps and their corresponding time-courses. Each thresholded sICA map may consist of several remote brain regions forming a brain functional network. sICA generates consistent spatial maps (SMs) while modeling complex fMRI data collected during a task or in the resting-state (Turner and Twieg, 2005), although the task can result in a subtle modulation of the spatial patterns (Calhoun et al., 2008a). The dynamics of the BOLD signal within a single component are described by that component's TC. Regions contributing significantly within a given component are strongly functionally connected to each other.
There is growing interest in studying FC among brain functional networks. This type of connectivity, which can be considered as a higher level of FC, is termed FNC (Jafri et al., 2008) and measures the statistical dependencies among brain functional networks. Each functional network may consist of multiple remote brain regions. Spatial components resulting from sICA are maximally spatially independent but their corresponding time-courses can show a considerable amount of temporal dependency. This property of sICA makes it an excellent choice for studying FNC, which can be studied by analyzing these weaker dependencies among sICA TCs. These dependencies can be analyzed by correlation methods (Jafri et al., 2008) or algorithms such as dynamic causal modeling (Stevens et al., 2007) or Granger causality (Stevens et al., 2009; Havlicek et al., 2010).
It has been shown that there are significant FNC differences between schizophrenia patients and the control group in the resting-state, possibly reflecting deficiencies in brain functional processing in the patients (Jafri et al., 2008; Calhoun et al., 2009a, 2011). Jafri et al. (2008) reported increased FNC among frontal, temporal, visual and default-mode networks and decreased FNC between temporal and parietal networks. We hypothesized that disrupted functional integration in schizophrenia patients, as captured by FNC analysis, entails valuable information that can be used to discriminate patients automatically. To test our hypothesis, we conducted a feasibility study of using FNC features for classification of schizophrenia patients, to our knowledge for the first time. In order to show that our method can provide significant results regardless of the type of machine learning algorithm, we report the results for several linear and non-linear classification methods such as the minimum least square linear classifier, Fisher's linear discriminant classifier (LDC), quadratic classifier, binary decision tree, support vector machine (SVM), k-nearest neighbor (KNN), artificial neural networks (ANN), naïve Bayes, logistic linear classifier (LLC) and dissimilarity-based classifier. Careful consideration was given to avoiding common pitfalls in automatic classification studies, such as using a very small cohort, using testing dataset information in the training phase and incomplete reporting of the results (Demirci et al., 2008b). The results show that the proposed method can classify schizophrenia patients with very high specificity and sensitivity.
Materials and Methods
Participants and Paradigm Description
One session of resting-state fMRI data was collected from 28 healthy controls and 28 schizophrenia patients. Participants gave written, informed, Hartford Hospital and Yale IRB approved consent at the Institute of Living and were compensated for their participation. Schizophrenia was diagnosed according to the DSM-IV TR criteria on the basis of a structured clinical interview (SCID) (First et al., 1995) administered by a research nurse and review of the medical file. Exclusion criteria included auditory or visual impairment, mental retardation (full scale IQ < 70), traumatic brain injury with loss of consciousness greater than 15 min, presence or history of any central neurological illness, or a positive urine pregnancy test. Participants were also excluded if they met criteria for alcohol or drug dependence within the past 6 months or produced a positive urine toxicology screen on the day of scanning. Although patients were slightly older than controls (SZ age = 39.7 ± 10.1; HC age = 36.5 ± 11.3), the difference was not statistically significant (two sample t-test p-value: 0.27). All but three patients and one control were right handed. Healthy participants were free of any DSM-IV TR Axis I disorder (SCID) or psychotropic medication and had no first-degree relatives with a psychotic illness.
Image Acquisition
Scans were acquired at the Olin Neuropsychiatry Research Center at the Institute of Living/Hartford Hospital on a Siemens Allegra 3T dedicated head scanner equipped with 40 mT/m gradients and a standard quadrature head coil. The transaxial functional scans were acquired using gradient-echo echo-planar-imaging with the following parameters: repeat time (TR) = 1.50 s, echo time (TE) = 27 ms, field of view = 24 cm, acquisition matrix = 64 × 64, flip angle = 70°, voxel size = 3.75 × 3.75 × 4 mm³, slice thickness = 4 mm, gap = 1 mm, 29 slices, ascending acquisition. Six “dummy” scans were performed at the beginning to allow for longitudinal equilibrium, after which the paradigm was automatically triggered to start by the scanner. The resting state scan consisted of one 5 min run.
Proposed Approach
The block diagram in Figure 1 shows our approach. We randomly divided the data into separate training (16 healthy subjects + 16 patients) and testing (12 healthy subjects + 12 patients) sets. The raw fMRI data were first preprocessed. Then the training data were analyzed with group ICA. Subject-specific SMs and time-courses were computed using back reconstruction. Next, FNC analysis was performed on the subject-specific ICA time-courses. FNC was calculated between each pair of selected components.
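The random, class-balanced division into training and testing sets can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function name `stratified_split`, the label coding (0 = control, 1 = patient) and the fixed seed are assumptions made for the example.

```python
import random

def stratified_split(labels, n_train_per_class, seed=0):
    """Randomly pick n_train_per_class subjects of each class for training;
    the remaining subjects of that class go to the testing set."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        train.extend(idx[:n_train_per_class])
        test.extend(idx[n_train_per_class:])
    return sorted(train), sorted(test)

# 28 controls (label 0) and 28 patients (label 1), matching the cohort size
labels = [0] * 28 + [1] * 28
train_idx, test_idx = stratified_split(labels, n_train_per_class=16)
print(len(train_idx), len(test_idx))  # 32 24
```

Splitting per class keeps both sets balanced (16 + 16 and 12 + 12), which matches the equal class priors used later in the classification stage.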
Figure 1. The proposed approach. The pink blocks on the top show the feature extraction steps. The statistical analysis box (green) is not part of the classification approach. The light green blocks describe the classification stage. Orange clouds indicate the corresponding figures and tables in the Results section.
Several classifiers were trained using the training data and were evaluated using the testing data. Leave-one-out cross validation (LOOCV) inside the training set was used to select the hyperparameters for the classifiers. The optimum parameters for the relevant classifiers were selected based on the validation error averaged over the 32 validation iterations. In the testing phase, a separate ICA was performed on the testing dataset and the extracted brain networks were matched with those of the training ICA based on the maximum Pearson correlation coefficient. Finally, the performance of the trained classifiers was evaluated using the testing features.
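Matching testing-set components to training-set components by maximum Pearson correlation might look like the sketch below. It is illustrative only: spatial maps are represented as flat lists of voxel values, the helper names are hypothetical, and the simple per-component argmax (which does not enforce a one-to-one assignment and uses the signed correlation) is an assumption rather than the procedure reported in the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def match_components(train_maps, test_maps):
    """For each training map, return the index of the most correlated test map."""
    matches = []
    for t in train_maps:
        scores = [pearson(t, s) for s in test_maps]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches

# Toy "spatial maps" with four voxels each (made-up values)
train_maps = [[1.0, 0.0, 0.0, 2.0], [0.0, 3.0, 1.0, 0.0]]
test_maps = [[0.1, 2.9, 1.2, 0.0], [0.9, 0.1, 0.0, 2.2]]
matched = match_components(train_maps, test_maps)
print(matched)  # [1, 0]
```

Because ICA components are recovered in arbitrary order, some matching step of this kind is needed before features computed on the testing set can be fed to classifiers trained on the training set.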
As a supplementary study, the FNC features were statistically analyzed on the training dataset, within each group of subjects using one-sample t-tests and between groups using two-sample t-tests. The within-group tests evaluate the null hypothesis that each feature has a mean of zero. Features surviving this test have a statistically significant non-zero mean, which indicates a significant correlation between the pair of components. The two-sample t-tests between groups evaluate the null hypothesis that corresponding FNC features in the two groups (controls and patients) have the same mean. Features surviving this test differ significantly between the control and patient groups, indicating that the correlation between the pair of components is greater in one group than in the other. Note that these results are presented for descriptive purposes and were not used for feature selection or anywhere else in the classification process. Each of the blocks in Figure 1 is described in more detail in the following subsections.
Preprocessing
Data were preprocessed using SPM5 software (http://fil.ion.ucl.ac.uk), motion corrected, spatially normalized into standard MNI space and slightly subsampled to a voxel size of 3 × 3 × 3 mm³, resulting in 53 × 63 × 46 voxels. Next, spatial smoothing with a 10 × 10 × 10 mm³ FWHM Gaussian kernel was performed.
Group ICA and back reconstruction
Prior to the ICA, data dimensionality was reduced at two levels using principal component analysis (PCA). First, at the subject level, dimensionality was reduced to 80. Then the reduced data from all subjects and all sessions were concatenated and put through another reduction step. The number of components for the second-level reduction was estimated to be 20 by the minimum description length (MDL) criterion (Li et al., 2007). This is also the number of independent components. Note that MDL is a data-driven approach, so it does not depend on whether data are collected at rest or during a task.
Infomax group sICA (Calhoun et al., 2001a) was conducted to decompose the aggregated data into components using GIFT software (http://icatb.sourceforge.net/). sICA applied to fMRI data identifies temporally-coherent networks (TCNs) by estimating maximally independent spatial sources, referred to as SMs, and their corresponding TCs.
In order to validate the number of ICA components chosen by MDL and also measure the robustness of each of them, ICA was repeated 10 times using ICASSO (http://www.cis.hut.fi/projects/ica/icasso). Each time, the ICA algorithm was started from a different initial point and the resulting components were clustered to estimate the reliability of the decomposition (Himberg et al., 2004). Robustness and reliability of components were well validated by ICASSO results showing compact clusters.
In order to estimate subject-specific SMs and TCs, a back-reconstruction approach based on PCA compression and projection was used (Calhoun et al., 2001b; Erhardt et al., 2010). Subject-specific TCs were reconstructed separately for patients and controls.
Component selection
SMs were reconstructed and converted to Z values for each of the subjects. All of the components were visually inspected and the non-artifactual components were selected. Non-artifactual components are expected to have peak activation in the gray matter and have low spatial overlap with known ventricles, vascular, motion and susceptibility artifacts.
Functional network connectivity
The FNC toolbox (http://mialab.mrn.org/software/#fnc) was used for the FNC analysis. As mentioned before, significant temporal correlation can exist among the sICA TCs. The FNC toolbox computes the maximum lagged correlation among the components, as in Jafri et al. (2008). First, the TCs of the ICA components were interpolated to allow detection of delays shorter than the TR of the scanner (Calhoun et al., 2000; Ford et al., 2005). Let ρ denote the Pearson correlation coefficient between two TCs, $X$ and $Y$, each of dimension T × 1, where T is the number of time points in the TCs. The starting reference point of the TCs is denoted $i_0$ and Δi represents the non-integer change in time. $\rho_{\Delta i}$ represents the Pearson correlation between $X_{i_0}$, which is the vector $X$ at the reference time point $i_0$, and $Y_{i_0+\Delta i}$, which is the vector $Y$ shifted Δi from the reference time point. This correlation between the overlapping points of $X_{i_0}$ and $Y_{i_0+\Delta i}$ can be computed as follows:

$$\rho_{\Delta i} = \frac{X_{i_0}^{T}\, Y_{i_0+\Delta i}}{\sqrt{X_{i_0}^{T} X_{i_0}}\;\sqrt{Y_{i_0+\Delta i}^{T}\, Y_{i_0+\Delta i}}}$$
The ρΔi vector is calculated for each pair of TCs, with one of the TCs shifted by Δi over the range −3 to +3 s (i.e., ±2 TR). The maximum correlation and the corresponding lag are calculated and saved for each subject. Allowing a lag between signals is important to account for variations in hemodynamic response shapes among brain regions as well as among subjects. Although the lag can give an idea of the temporal order of fMRI TCs, the source of the lag is not completely understood and could be due to a mixture of functional and physiological effects. For these reasons, we do not report any analysis of the lag parameter in this paper. We verified that the lags corresponding to the maximum correlation were distributed within the ±3 s interval and were often away from its extremes.
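The maximum lagged correlation can be illustrated with a short standard-library sketch. It is not the FNC toolbox implementation: linear interpolation, the upsampling factor of 5, and selecting the lag with the largest absolute correlation are all assumptions made for this example, and the function names are hypothetical.

```python
import math

def upsample(tc, factor):
    """Linearly interpolate a time course so each TR is split into `factor` steps."""
    out = []
    for a, b in zip(tc[:-1], tc[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(tc[-1])
    return out

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def max_lagged_correlation(x, y, tr=1.5, max_lag_s=3.0, factor=5):
    """Return (correlation, lag in seconds) at the lag with the largest |correlation|.
    A positive lag means y follows x."""
    xs, ys = upsample(x, factor), upsample(y, factor)
    step = tr / factor                       # time resolution after interpolation
    max_shift = int(round(max_lag_s / step))
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Keep only the overlapping samples of the two shifted time courses
        if shift >= 0:
            a, b = xs[:len(xs) - shift], ys[shift:]
        else:
            a, b = xs[-shift:], ys[:len(ys) + shift]
        r = pearson(a, b)
        if best is None or abs(r) > abs(best[0]):
            best = (r, shift * step)
    return best

# Toy example: y is x delayed by exactly one TR (1.5 s)
x = [math.sin(2 * math.pi * t / 20) for t in range(100)]
y = [math.sin(2 * math.pi * (t - 1) / 20) for t in range(100)]
r, lag = max_lagged_correlation(x, y)
```

In this toy case the search recovers a lag of 1.5 s with a correlation of essentially 1, since the delayed copy lines up exactly with the original after shifting.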
Prior to computing correlations, ICA TCs were filtered. There are reports showing that task-related and other interesting information resides in lower frequencies, while noise and artifacts contribute mostly to the higher-frequency content of the TCs (Cordes et al., 2001). We applied a bandpass Butterworth filter with cut-off frequencies at 0.017 Hz and 0.15 Hz to the ICA TCs. We also regressed out the motion parameters from the FNC values to remove any movement bias from the analysis.
Statistical analysis
For all FNC analyses, correlations were transformed to z-scores using Fisher's transformation [z = arctanh(r)]. Then, the robustness of the maximum lagged correlation between each pair of TCs was tested separately for controls and patients using one-sample t-tests. Finally, to determine significant differences between controls and patients, two-sample t-tests were conducted on the two groups. The cut-off p-value for all of the tests was set at p < 0.05 and was corrected for multiple comparisons using the false discovery rate (FDR) method (Genovese et al., 2002).
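Two of the steps above, Fisher's transformation and FDR correction, are easy to sketch in code. The example below assumes the Benjamini-Hochberg step-up procedure for FDR control; the function names and the p-values are made up for illustration and are not results from this study.

```python
import math

def fisher_z(r):
    """Fisher's transformation of a correlation coefficient: z = arctanh(r)."""
    return math.atanh(r)

def fdr_bh(pvalues, q=0.05):
    """Benjamini-Hochberg procedure: return one boolean per test, True if the
    corresponding null hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank                     # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

print(round(fisher_z(0.5), 4))               # 0.5493

pvals = [0.001, 0.045, 0.028, 0.2, 0.012]    # hypothetical p-values
significant = fdr_bh(pvals)
print(significant)                           # [True, False, True, False, True]
```

Note that 0.045 is below the uncorrected 0.05 threshold but does not survive correction here, which is the kind of borderline feature the FDR step is meant to filter out.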
Classification
We evaluated the performance of several well-known linear and non-linear classifiers. This gives us a better view of the complexity of the features. If simpler classifiers (such as linear classifiers) classify the data successfully, the features have a simple structure (the classes are almost linearly separable). However, if only complicated non-linear classifiers classify the data successfully, this indicates that the data have a more complex structure. The decision boundary of a linear classifier is a hyperplane, while in a non-linear classifier the boundary can take any shape. From another point of view, classifiers can be divided into generative and discriminative. In generative classifiers, the probability density functions (pdfs) of all classes are modeled and Bayes' theorem gives the posterior probabilities. Discriminative classifiers, on the other hand, either estimate the posterior probability directly or skip the challenging step of pdf estimation and determine the decision boundary from the observed data (discriminant methods). Generative methods are often simpler and more computationally efficient but require estimation of the pdfs, which requires a substantial amount of data. For complex data sets with few training samples, discriminative methods yield better performance. It should be noted that in this study we computed the prior probabilities for the two classes from the data (they are equal), since the class distribution in the data is very different from the real prevalence of schizophrenia (around 1%). All classifiers were implemented using Matlab (MathWorks, Inc.). Naïve Bayes, logistic linear and quadratic classifiers along with decision trees (DT) were implemented using PRTools (http://www.prtools.org), which is a Matlab-based pattern recognition toolbox (Duin et al., 2007). In this section, these methods are briefly reviewed.
Linear methods.
Linear Bayes normal classifier. This simple classifier assumes Gaussian pdf for both classes with equal covariance matrices but different means. The joint covariance matrix is the weighted average of class covariance matrices (weighted by prior probabilities). Using the Bayes rule, these assumptions lead to a linear decision boundary. This classifier is also called LDC (Duda et al., 2001).
Fisher linear classifier (FLC). Fisher's linear discriminant views classification as a dimensionality reduction task. The Fisher formulation tries to maximize class mean separation while minimizing class overlap during linear dimension reduction. The resulting projection direction can be used as a linear classifier in a two-class problem. Fisher's linear classifier is a special case of the minimum least square linear classifier (Bishop, 2006).
Logistic linear classifier (LLC). Logistic regression is a method of learning functions of the form f : X → Y. X = [X1 X2 … Xn] is the training vector with n variables and Y is the target value (class). Logistic regression assumes a parametric form for the distribution P(Y|X). The parameters are estimated from the training data. Assuming that Y is binary (two-class problem), logistic regression can be formulated as below:

$$P(Y=1 \mid X) = \frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^{n} w_i X_i\right)\right)}, \qquad P(Y=0 \mid X) = 1 - P(Y=1 \mid X)$$

One of the nice properties of logistic regression is its ability to provide a linear discriminant between the two classes. Each new object is assigned to the class that has the larger probability for that object. Simplifying this rule results in the classification rule:

$$\text{assign } Y=1 \text{ if } w_0+\sum_{i=1}^{n} w_i X_i > 0, \text{ and } Y=0 \text{ otherwise.}$$
LLC also provides the weight for each feature so it can be used to rank the features.
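A small sketch of the logistic decision rule and weight-based feature ranking follows. It assumes the common convention P(Y=1|X) = 1 / (1 + exp(−(w0 + Σ wi Xi))), so that the class boundary is the hyperplane w0 + Σ wi Xi = 0. The weights are hypothetical, and ranking by absolute weight is only meaningful when the features are on comparable scales.

```python
import math

def linear_score(x, w0, w):
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def predict_proba(x, w0, w):
    """P(Y=1 | x) under the logistic model (sign convention is an assumption)."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, w0, w)))

def predict(x, w0, w):
    """Assign class 1 when the linear score is positive, i.e. P(Y=1 | x) > 0.5."""
    return 1 if linear_score(x, w0, w) > 0 else 0

def rank_features(w):
    """Feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)

w0, w = -0.5, [2.0, -0.1, 0.7]   # hypothetical fitted parameters
x_new = [1.0, 0.0, 0.0]
print(predict(x_new, w0, w))     # 1
print(rank_features(w))          # [0, 2, 1]
```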
Linear perceptron classifier. This classic linear discriminant tries to minimize an error function based on the misclassified samples. This classifier can be considered a simple feed-forward ANN (Rosenblatt, 1958). First the input vector is transformed using a non-linear transformation to give a feature vector. The algorithm then iteratively changes the weight vector of the network using the stochastic gradient descent algorithm to minimize the error. At each iteration, the weight vector of the network is updated by the perceptron learning rule. The perceptron convergence theorem guarantees that the perceptron learning algorithm finds a solution in a finite number of steps if such a solution exists, i.e., if the data are linearly separable (Block et al., 1962).
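The perceptron learning rule can be sketched in a few lines. This is a generic textbook version, not the implementation used in the study; the targets are coded as −1/+1, the features are used directly without a non-linear transformation, and the learning rate and toy data are assumptions.

```python
def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning rule with targets in {-1, +1}. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, ti in zip(X, y):
            activation = b + sum(wj * xj for wj, xj in zip(w, xi))
            if ti * activation <= 0:           # sample is misclassified
                w = [wj + lr * ti * xj for wj, xj in zip(w, xi)]
                b += lr * ti
                errors += 1
        if errors == 0:                        # converged: every sample is correct
            break
    return w, b

# Linearly separable toy data
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
preds = [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
print(preds)  # [1, 1, -1, -1]
```

On separable data such as this toy set the loop stops after a finite number of updates, which is the behavior the convergence theorem describes. On non-separable data it simply runs until `epochs` is exhausted.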
Linear support vector machine (SVM). Over the last 15 years, following the work of Cortes and Vapnik (1995), SVM has proven useful in many machine learning and pattern recognition problems. Moreover, when data classes are heterogeneous with few training samples, SVMs appear to be especially beneficial (Melgani and Bruzzone, 2004). This binary classifier aims at finding a hyperplane that maximizes the margin between the two classes. The training samples closest to the decision boundary are called support vectors. By using a soft margin that allows for misclassification of some noisy samples, SVMs reduce the risk of overfitting.
Non-linear methods.
K-nearest neighbor. KNN is a method of classifying objects based on proximity to the training samples (Cover and Hart, 1967). This instance-based learning method is among the simplest machine learning approaches. Each object is classified by majority voting of the training samples in its neighborhood: the most common class among the k nearest neighbors is determined and assigned to the object (Bremner et al., 2005). KNN can result in complex decision boundaries. The optimum k is determined by cross validation. Different distance metrics such as Euclidean, city block, cosine and correlation can be used to measure the proximity of the samples. KNN is fast and simple, and its error rate is guaranteed to be no worse than twice the Bayes error as the amount of data approaches infinity. We used only the Euclidean distance metric in our analysis.
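A compact sketch of KNN with Euclidean distance, with k chosen by leave-one-out cross validation, is given below. The toy data, the candidate values of k and the function names are illustrative assumptions. Restricting the candidates to odd k avoids voting ties in a two-class problem.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loocv_error(X, y, k):
    """Leave-one-out error: each sample is predicted from all the other samples."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        errors += knn_predict(rest_X, rest_y, X[i], k) != y[i]
    return errors / len(X)

def select_k(X, y, candidates):
    """Return the candidate k with the lowest leave-one-out error (first on ties)."""
    return min(candidates, key=lambda k: loocv_error(X, y, k))

# Two well-separated toy clusters
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best_k = select_k(X, y, candidates=[1, 3, 5, 7])
print(best_k, loocv_error(X, y, 7))  # 1 1.0
```

With k = 7 every held-out sample is outvoted by the four members of the other cluster, which shows why k has to be tuned rather than fixed arbitrarily on small data sets.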
Naïve Bayes classifier (NBC). The naïve Bayes classifier is a simple generative classifier based on Bayes' theorem. The naïve assumption of NBC is independence among the features. Although this over-simplified assumption is violated in most machine learning problems, the approach has worked very well for many complex problems even when the independence assumption is not valid (Domingos and Pazzani, 1997; Rish, 2001). One of the main advantages of NBC is that it requires only a small amount of data to estimate the pdf parameters for each feature. Since the features are assumed to be independent, the joint pdf of the features is simply the product of the individual pdfs of each feature. When dealing with continuous data, a Gaussian distribution is typically assumed for each feature. The pdf parameters are estimated from the training data. NBC works quite well in anti-spam filtering problems (Seewald, 2007).
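A Gaussian naïve Bayes classifier along the lines described above can be sketched as follows. It is a generic illustration rather than the PRTools implementation; the small variance floor of 1e-9 and the toy data are assumptions.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate, per class, the log prior and a mean and variance for every feature."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Small floor (assumption) keeps variances strictly positive
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[cls] = (math.log(n / len(y)), means, variances)
    return model

def predict_nb(model, x):
    """Pick the class with the largest log posterior. The independence assumption
    turns the joint log-likelihood into a sum of per-feature Gaussian terms."""
    def log_posterior(cls):
        log_prior, means, variances = model[cls]
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                      for xi, m, v in zip(x, means, variances))
        return log_prior + log_lik
    return max(model, key=log_posterior)

X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
nb_model = fit_gaussian_nb(X, y)
print(predict_nb(nb_model, [0.4, 0.6]), predict_nb(nb_model, [5.5, 5.2]))  # 0 1
```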
Quadratic Bayes normal classifier. Quadratic discriminant analysis (QDC) is closely related to linear discriminant analysis. It assumes that the data is normally distributed with different mean and covariance matrices. This results in a quadratic decision boundary (Duda et al., 2001).
Binary decision tree. DTs are used in a wide range of applications. A DT partitions the input space into cuboid regions. In classification, a class label is assigned to each region of the input space. Each decision is the result of a sequence of binary decisions. In order to learn a model from the training samples, the structure of the tree and the threshold value for each node must be determined. There are many variations of DT, but most of them rely on a top–down greedy search in the space of possible trees, as in the ID3 algorithm (Quinlan, 1987) and its successor C4.5 (Quinlan, 1993). Selecting the optimal tree structure is usually infeasible due to the large number of possibilities. Usually the tree is started with a single root node and then at each step one node is added to the tree. This is called the greedy strategy for growing the tree. At each node, an attribute (feature) to be tested is selected. There are several criteria to measure the worth of each feature, such as information gain, diversity index, Fisher's criterion (the same used in Fisher discriminant analysis) and gain ratio. The threshold values and the structure of the tree are chosen so that the classification error is minimized. A criterion for stopping the growth of the tree should also be devised. Often the tree is fully grown and then pruned back to find the best tree. The graphical representation and human interpretability of DTs make them very popular, especially in medical diagnosis (Bishop, 2006). Moreover, they show the importance of each feature for classification in a graphical illustration. However, since the edges of the decision regions are aligned with the axes of the feature space, DTs can be very suboptimal (Bishop, 2006).
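One greedy step of tree growing, choosing the feature and threshold with the highest information gain, can be sketched as below. This is a simplified, hypothetical illustration of the splitting criterion only. It does not build or prune a full tree and is not the PRTools implementation.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def best_split(X, y):
    """Return (feature index, threshold, information gain) of the best binary
    split of the form x[feature] <= threshold."""
    base, n = entropy(y), len(y)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values[:-1], values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = (base - len(left) / n * entropy(left)
                    - len(right) / n * entropy(right))
            if gain > best[2]:
                best = (f, t, gain)
    return best

# Feature 0 separates the classes perfectly, feature 1 does not
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
feature, threshold, gain = best_split(X, y)
print(feature, threshold, gain)  # 0 0.5 1.0
```

Applying the same search recursively to the samples on each side of the chosen threshold grows the tree one node at a time, and the axis-aligned thresholds are what produce the cuboid decision regions.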
Artificial neural networks. The multilayer ANN is an extension of the linear perceptron classifier. These networks can produce complex non-linear decision boundaries. A well-known architecture is the three-layer structure: input layer, hidden layer and output layer. Each neuron in each layer has connections to neurons of the subsequent layer. The non-linear transfer function of the neurons in the hidden layer can take different forms, such as the sigmoid. The weights of the nodes are changed using a technique called backpropagation (Werbos, 1990). At each iteration, the output of the network is compared to the correct answers and, based on a predefined error function, an error value is computed. This error is fed back through the network and the weights of each node are adjusted to minimize it. This can be done by gradient descent if the activation function is differentiable. Another method of minimizing the error is the Levenberg–Marquardt algorithm (Levenberg, 1944).
Another class of ANN uses a radial basis activation function in the hidden layer (Chen et al., 1991). This kind of network usually requires more neurons than a standard feed-forward back-propagation network but can be trained much faster. The topology of the ANNs used in this study can be found in the Results section.
Non-linear support vector machine. By using the kernel trick, SVM can map data that are not linearly separable into a higher dimensional space where the samples are hopefully linearly separable. Computing this mapping explicitly is difficult, but since the SVM formulation depends only on the inner product of each training sample with the support vectors, the kernel is defined as this inner product in the higher dimensional space and the problem is solved in the same fashion as the linear case. There are many kernel functions, but the most widely used are the Gaussian radial basis function (RBF) and the polynomial kernel. Every kernel except the linear one has at least one parameter, which should be optimized along with the soft margin, usually by grid search over reasonable values of that parameter. RBF and polynomial kernels are defined as below:
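In their standard textbook form, the two kernels can be written as follows. This is a sketch of the usual definitions using the symbols x, x_i, σ and p; the factor of 2 in the RBF exponent and the additive constant 1 in the polynomial kernel are assumed conventions and may differ from the ones used in this study.

```latex
K_{\mathrm{RBF}}(x, x_i) = \exp\!\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right),
\qquad
K_{\mathrm{poly}}(x, x_i) = \left(x^{\top} x_i + 1\right)^{p}
```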
In the above equations, support vectors are denoted by xi and each training point is denoted by x. σ is a parameter proportional to the width of the RBF kernel. p is the degree of the polynomial kernel. A detailed mathematical formulation of SVM can be found in Burges (1998).
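As a numerical illustration, the two kernels can be evaluated directly on feature vectors. The sketch below assumes the common conventions of a factor of 2 in the RBF exponent and an additive constant of 1 in the polynomial kernel (exposed here as the hypothetical argument `c`); the function names are illustrative only.

```python
import math


def rbf_kernel(x, xi, sigma):
    """Gaussian RBF kernel between a point x and a support vector xi."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, xi, p, c=1.0):
    """Polynomial kernel of degree p; the offset c is an assumed convention."""
    dot = sum(a * b for a, b in zip(x, xi))
    return (dot + c) ** p


# Identical points give the maximum RBF value; distant points decay toward 0.
same = rbf_kernel((0.0, 0.0), (0.0, 0.0), sigma=1.0)
near = rbf_kernel((1.0, 0.0), (0.0, 0.0), sigma=1.0)
far = rbf_kernel((3.0, 0.0), (0.0, 0.0), sigma=1.0)
poly = polynomial_kernel((1.0, 2.0), (3.0, 4.0), p=2)
print(same, near, far, poly)
```

A larger σ makes the RBF kernel decay more slowly with distance, which yields smoother decision boundaries; this is why σ is tuned together with the soft margin.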
Parameter selection
The parameters for each classifier were selected by grid search, since there is no exact theoretical solution for the optimum value of most of the parameters. For each classifier, the parameter values with the lowest average validation error were selected.
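The selection procedure can be sketched with a k-nearest-neighbor classifier, where the grid is a list of candidate values of k and each candidate is scored by its average validation error. The fold assignment (round-robin, here reduced to leave-one-out), the grid and the toy data are assumptions made for the example; the study's actual grids are not reproduced here.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def validation_error(xs, ys, k, n_folds):
    """Average validation error of k-NN over n_folds round-robin folds."""
    fold_errors = []
    for fold in range(n_folds):
        val = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        fold_x = [xs[i] for i in train]
        fold_y = [ys[i] for i in train]
        wrong = sum(knn_predict(fold_x, fold_y, xs[i], k) != ys[i] for i in val)
        fold_errors.append(wrong / len(val))
    return sum(fold_errors) / n_folds


def grid_search(xs, ys, grid, n_folds):
    """Return the grid value with the lowest average validation error."""
    scores = {k: validation_error(xs, ys, k, n_folds) for k in grid}
    best = min(grid, key=lambda k: scores[k])  # ties go to the earlier grid value
    return best, scores


# Invented one-feature training set with two noisy labels.
train_x = [(0.00,), (0.10,), (0.22,), (0.35,), (0.78,),
           (0.30,), (0.70,), (0.80,), (0.91,), (1.00,)]
train_y = ["HC", "HC", "HC", "HC", "HC", "SZ", "SZ", "SZ", "SZ", "SZ"]
best_k, scores = grid_search(train_x, train_y, grid=[1, 3, 5], n_folds=len(train_x))
print(best_k, scores)
```

Only training data enter the search, so the testing dataset plays no role in choosing the parameters.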
Effect of Medication
One limitation of this study is that the patients are medicated. It is highly desirable to evaluate the performance of the proposed method on diagnosed but not yet medicated schizophrenia patients. It has been shown that antipsychotic medications have a normalizing effect on brain function in schizophrenia patients (Davis et al., 2005). Moreover, prior fMRI and EEG studies on unmedicated schizophrenia patients have reported altered FC (Omori et al., 1995; Meyer-Lindenberg et al., 2005).
It has been shown that the main targets of antipsychotic treatments in schizophrenia patients are cortical and subcortical motor networks (Wenz et al., 1994; Muller et al., 2003; Rogowska et al., 2004; Abbott et al., 2011). Recently the effect of antipsychotic treatment on resting-state FNC was studied (Lui et al., 2010), and it was shown that after treatment patients showed three connectivity changes compared to healthy controls. Of these three changes, only one (FNC between the temporal and parietal network) was present in this study. To further reduce the effect of medication on the classification results, we repeated the classification with all described methods on a reduced set of features from which the motor network related features and the temporal-parietal FNC feature were excluded.
Results
From the 20 ICA components, nine were selected as non-artifactual, relevant networks. Since we were interested in the connectivity between each pair of these nine networks, we ended up with 36 FNC features for each subject. Figure 2 illustrates the SMs of the selected components. These networks are: auditory network (IC #2), frontal-parietal networks (IC #6 and 9), default-mode networks (IC #12, 13, and 19), visual networks (IC #15 and 20) and motor network (IC #18). Detailed information on each spatial map, such as regions of activation, Brodmann area, volume, and peak activation t-value and coordinates, is provided in Table 1.
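The feature count follows from enumerating the unordered pairs of the nine selected components. The sketch below also derives the reduced feature set described in section Effect of Medication, using the component numbers given in this section (all pairs involving the motor network IC #18, plus the IC #2 and IC #15 pair reported as the temporal-parietal feature); the variable names are illustrative.

```python
from itertools import combinations

selected_ics = [2, 6, 9, 12, 13, 15, 18, 19, 20]  # the nine retained components
fnc_pairs = list(combinations(selected_ics, 2))     # one FNC feature per unordered pair

motor_ic = 18
motor_pairs = [pair for pair in fnc_pairs if motor_ic in pair]

# Reduced set: drop every motor-related pair and the IC2-IC15 pair.
excluded = set(motor_pairs) | {(2, 15)}
reduced_pairs = [pair for pair in fnc_pairs if pair not in excluded]

print(len(fnc_pairs), len(motor_pairs), len(reduced_pairs))
```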
Table 1. Brain regions, corresponding Brodmann areas, volumes, maximum t-values and spatial coordinates of each component in Talairach space.
The maximum lagged correlation was computed for each of the subjects in each group. For each correlation pair, a Student's t-test was conducted with an FDR-corrected p-value threshold of 0.05 to identify significant correlations. Figure 3 shows the average correlation and the corresponding t-values. The black circles mark the correlation pairs that survived the t-test. There are more significant correlation pairs in the control group (12) than in the patient group (10). Interestingly, the mean correlation of the auditory network (IC #2) with each of the visual networks (IC #15 and 20) and with the motor network (IC #18) is significant for the healthy group but not for the patients. To determine which correlation pairs differ significantly between the two groups, two-sample t-tests were conducted with an FDR-corrected p-value threshold of 0.05. The mean correlation difference between the two groups (control minus patient) was also computed for each correlation pair. These results are shown in Figure 4. Starred pairs indicate the features that survived the two-sample t-test.
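A maximum lagged correlation between two component time-courses can be sketched as below. This is a simplified illustration: it shifts one series by whole samples within a hypothetical window `max_lag` and keeps the correlation with the largest magnitude, whereas the lag range and any interpolation used in this study are not specified here. The two time-courses are invented.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def max_lagged_correlation(x, y, max_lag):
    """Return (r, lag) with the largest |r|; lag L pairs x[t + L] with y[t]."""
    best_r, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        r = pearson(a, b)
        if abs(r) > abs(best_r):
            best_r, best_lag = r, lag
    return best_r, best_lag


# Invented time-courses: tc_b is tc_a delayed by two samples.
tc_a = [0, 1, 3, 2, 5, 4, 6, 3, 1, 2, 0, 1]
tc_b = [0, 0] + tc_a[:-2]
r, lag = max_lagged_correlation(tc_a, tc_b, max_lag=3)
print(r, lag)
```

In this toy case the maximum is found at a lag of minus two samples, meaning the second time-course follows the first by two samples.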
Figure 3. Left: Mean of correlation pairs for controls and patients. Right: T-value of each correlation pair resulting from Student's t-test with a p-value threshold of 0.05 corrected for FDR. Black circles indicate the pairs surviving the t-test.
Figure 4. Left: Mean correlation difference between control subjects and patients (control-patient). Right: T-value resulting from the two-sample t-test with a p-value threshold of 0.05 corrected for FDR. Stars show pairs that survived the two-sample t-test.
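The FDR correction applied to the 36 per-pair tests can be illustrated with the Benjamini-Hochberg step-up procedure. That this specific procedure was the one used is an assumption of this sketch, and the p-values below are invented.

```python
def fdr_cutoff(p_values, q=0.05):
    """Largest sorted p-value p_(k) with p_(k) <= (k / m) * q, or None if none qualifies."""
    m = len(p_values)
    cutoff = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * q:
            cutoff = p  # keep the largest qualifying p-value (step-up rule)
    return cutoff


def surviving_tests(p_values, q=0.05):
    """Boolean mask of the tests declared significant at FDR level q."""
    cutoff = fdr_cutoff(p_values, q)
    return [cutoff is not None and p <= cutoff for p in p_values]


# Invented p-values for ten correlation pairs.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
mask = surviving_tests(p_values, q=0.05)
print(fdr_cutoff(p_values), mask)
```

Note that three of the invented p-values lie below 0.05 yet do not survive, which shows how the correction is stricter than an uncorrected threshold.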
The classification results on the testing dataset for the described classification methods (section Classification) are summarized in Table 2. For each method, overall classification accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are provided. Moreover, we report Wilson's binomial confidence interval (Wilson, 1927) for each classifier. For relevant methods, the parameters selected during the training phase and the topology of the ANNs are also included in Table 2. As discussed in section Effect of Medication, to reduce the effect of medication on the classification results we repeated the analysis on a reduced set of features. Of the 36 features, the 9 that were more susceptible to medication were excluded and the whole classification was repeated on the remaining 27 features. The excluded features are the 8 motor related features (all FNC features involving IC18) and one temporal-parietal feature (FNC between IC2 and IC15). The results are summarized in Table 3.
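The reported measures can all be derived from the four cells of a confusion matrix, with patients taken as the positive class. The sketch below uses invented counts and assumes a 95% confidence level (z = 1.96) for the Wilson score interval; neither is taken from Table 2.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # patients correctly identified
        "specificity": tn / (tn + fp),   # controls correctly identified
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 assumed, about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Invented test-set counts: 20 patients and 20 controls.
metrics = diagnostic_metrics(tp=18, fn=2, tn=16, fp=4)
lower, upper = wilson_interval(successes=18 + 16, n=40)
above_chance = lower > 0.5
print(metrics, lower, upper, above_chance)
```

A classifier is treated as performing above chance when the lower bound of the interval exceeds 50%.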
One of the main advantages of using DT is the graphical representation. One can represent decision alternatives and possible outcomes schematically. The visual approach is particularly helpful in comprehending sequential decisions and outcome dependencies. DT for both the Fisher's and information gain criteria are illustrated in Figures 5, 6, respectively.
Figure 5. Fisher's decision tree using full set of features. This tree includes 8 features in 10 nodes.
Figure 6. Information gain decision tree using full set of features. This tree includes six features in six nodes.
Discussion and Conclusions
We investigated whether resting-state FC features are able to discriminate between schizophrenia patients and healthy controls. Using group ICA, the training dataset was decomposed into independent spatial components and their corresponding TCs. Then, FNC was computed between each pair of functional networks on the back-reconstructed data using the maximum lagged correlation method. Several linear and non-linear classifiers were trained using the training data and were evaluated using the testing data. One of the common pitfalls in classification of mental diseases is using cross-validation to measure the generalization error (Wood et al., 2007; Demirci et al., 2008b). Another pitfall is selecting the parameters/model in a way that maximizes the performance of the final classifier on the testing dataset (Demirci et al., 2008b). To avoid these pitfalls, we used separate training and testing datasets. Separate ICAs were performed on the training and testing datasets. Cross-validation was used in the training phase only for parameter/model selection. ICA successfully extracted similar non-artifactual networks from both training and testing datasets. This is not surprising, since it has been shown that there are several consistent functional networks across subjects in the resting state (Damoiseaux et al., 2006; Smith et al., 2009; Allen et al., 2011).
The high accuracy of different classifiers in this study supports the disconnection hypothesis in schizophrenia patients (Friston and Frith, 1995; Frith et al., 1995; Josin and Liddle, 2001; Bokde et al., 2006; Mikula and Niebur, 2006; Salvador et al., 2010). Using FC methods, researchers have shown disrupted connectivity patterns in schizophrenia patients during rest and task in several brain regions (Meyer-Lindenberg et al., 2001; Boksman et al., 2005; Honey et al., 2005; Liang et al., 2006; Jafri et al., 2008). In our experiment, connectivity between two DMN nodes (IC #12 and 13) was found to be significantly lower in schizophrenia patients compared to healthy controls (Figure 4). This reduced within-DMN connectivity is interesting and in line with recent findings (Camchong et al., 2011; Mingoia et al., 2012; Orliac et al., 2013). One explanation can be gray matter thinning and greater psychopathology in patients (Goghari et al., 2007; Jang et al., 2011). Some recent DTI studies have shown anatomical disconnection in several brain regions in the temporal and frontal lobes in schizophrenia patients (Buchsbaum et al., 2006). Moreover, some studies have associated anatomical damage and FC disconnection in patients by analyzing DTI and functional data together (Zhou et al., 2008). This anatomical-functional association may be the reason for successful automatic diagnosis studies using DTI (Caprihan et al., 2008; Ardekani et al., 2011) and fMRI (Georgopoulos et al., 2007; Calhoun et al., 2008b; Demirci et al., 2008a; Michael et al., 2008; Arribas et al., 2010; Shen et al., 2010). While anatomical studies using either DTI or structural MRI are popular in classification of schizophrenia patients, functional studies are limited mostly to task-based studies. Resting-state studies on classification of schizophrenia are rare and have only recently begun (Shen et al., 2010; Venkataraman et al., 2012).
Most of the connectivity fMRI studies (resting-state or task-based) have used FC features, which means that the features are temporal statistical dependencies among brain regions. FC methods have some limitations, such as the choice of seed-voxel in each region (which may be different for patients and controls) and the very high number of extracted features. Shen et al. (2010) extracted average time-courses from 116 brain regions, which means 6670 features for each subject. A high number of features requires additional steps such as feature selection and reduction to avoid the curse of dimensionality. Moreover, most of the features obtained in that fashion are not discriminative. FNC, on the other hand, doesn't require seed-voxel selection. Moreover, the number of extracted features is much lower than in FC methods (36 features in our experiment based on 9 functional networks). Based on our experiments, it can be inferred that the FNC methodology is a concise abstraction of the connectivity pattern in the brain that can successfully capture the differences between schizophrenia patients and healthy controls.
We have reported detailed classification results (sensitivity, specificity, positive predictive value and negative predictive value) as well as Wilson's binomial confidence interval for each classification method. The classification results in Table 2 show that non-linear methods outperform linear methods, which was expected. Among the linear methods, LDC, Perceptron and linear SVM performed above chance (the lower bound of Wilson's binomial confidence interval is greater than 50%). All linear methods show higher specificity than sensitivity. Except for the quadratic classifier, all non-linear methods performed above chance. Overall, discriminative approaches outperformed generative methods. As a general rule in this study, the fewer assumptions made about the data, the better the performance. Simple classifiers such as KNN and decision tree performed very well on this specific machine learning problem. Non-linear SVM also performed very well, with only one misclassified sample. Despite its oversimplified assumptions and the little training data available in this study, naïve Bayes performed marginally above chance (79.17% overall accuracy). A poor classification was achieved using the quadratic classifier. It can be hypothesized that either the assumption of this classifier that the two classes are normally distributed with different means and covariance matrices is not valid, or the small amount of data is not sufficient to accurately estimate the mean and covariance matrix of each class. It should be noted that conclusions regarding the performance of different classifiers are limited to this specific problem using one dataset. The performance of each machine learning algorithm depends on the dataset, and comparison among different classifiers has been heavily investigated in the machine learning literature.
Since our main goal is not comparing classifiers, we didn't conduct statistical tests to compare their performances and just reported Wilson's binomial score interval for each classifier.
Table 3 shows the result of classification on the reduced set of features. Surprisingly, the overall error was reduced for all the linear methods except for the linear perceptron. The main reason for this phenomenon may be the curse of dimensionality (Pearlson, 2009), since we have only 32 samples for training and 36 features. Using the reduced feature set (27 features), most of the linear methods could estimate a more accurate hyperplane. Linear SVM performs robustly and equally on both the full and the reduced set of features. Most non-linear classifiers still perform above chance, with lower overall performance compared to the full feature set. KNN still classifies with high accuracy. Again, QDC performed very poorly. Overall, the reduction of features didn't greatly affect the results and very high performances were still achievable. This suggests that medication didn't bias the classification.
DT don't transform the data from the original feature space. Moreover, they classify the data based on thresholds placed on each of the features. This makes it possible for the investigator to inspect the decision tree and analyze it. One can see how features are distributed across the levels of the decision tree and which thresholds on which features discriminate the classes. This property is of particular interest in medical diagnosis, since a decision tree provides a classification structure that includes thresholds on the symptoms. This discriminative information about each feature is very valuable in medical problems.
In our problem the symptoms are FNC features. One can observe how each feature discriminates the two groups. This information may reflect FNC abnormalities in schizophrenia patients. First of all, the decision tree identifies the important features, of which there are eight and six in Figures 5, 6, respectively. The top-node features are among the most important features and are also among the features identified by the two-sample t-test. Also, the decision tree can identify the type of abnormality that is discriminative between the two groups. For example, it is seen from Figure 5 that subjects with temporal-motor FNC lower than 0.34 and temporal-visual FNC higher than 0.25 are patients. Or from Figure 6 it is evident that all subjects with temporal-visual FNC lower than 0.57 are healthy controls. In other words, all patients have higher temporal-visual FNC (as do some of the healthy controls).
Figures 5, 6 illustrate Fisher's and information gain DT, respectively (full set of features). Fisher's decision tree includes eight features in nine nodes. The information gain decision tree includes six features in six nodes. It is interesting that DT perform well in classification using only a small subset of features. Fisher's decision tree outperforms the information gain tree when the full set of features is used, but it is more complicated. However, both trees can be considered very simple. Using the reduced set of features, the information gain tree outperforms Fisher's tree.
Prior studies mentioned in the Introduction section reported accuracies ranging from 79 to 98%. Several limitations and considerations make it very hard to compare different approaches to automatic classification of mental disorders. For example, study size, MRI scanner parameters, nature of extracted features, type of classifier, medication and disease severity in the patient group vary among the different studies. In the absence of standard training and testing datasets, comparison of different approaches based only on the classification rate is ambiguous.
One of the issues in the current study was that the patients were slightly older than the healthy controls. We looked at the misclassified subjects in each of the classification experiments and couldn't find any systematic age pattern. Note also that schizophrenia patients have been shown to have stronger FNC (Jafri et al., 2008), whereas older subjects have reduced FNC (Allen et al., 2011). This potential confound would therefore likely have a canceling effect, making the diagnosis even harder. Regardless, based on the above observation we do not believe age is a factor in our classification results. To avoid any bias, we also repeated the classification after age was regressed out from the FNC features, and exactly the same performance was achieved.
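Regressing age out of a feature amounts to fitting a straight line of the feature against age and keeping the residuals. The sketch below does this for a single feature with ordinary least squares; how the regression was actually fitted in this study (for example, on which subjects) is not specified here, and the ages and FNC values are invented.

```python
def regress_out(feature, covariate):
    """Residuals of the least-squares fit feature ~ intercept + slope * covariate."""
    n = len(feature)
    mean_x = sum(covariate) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, feature))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]


# Invented data: one FNC feature that decreases with age, plus small deviations.
ages = [20, 30, 40, 50]
deviations = [0.02, -0.02, -0.02, 0.02]
fnc = [0.8 - 0.01 * a + d for a, d in zip(ages, deviations)]

residuals = regress_out(fnc, ages)
print(residuals)
```

Applying `regress_out` to each of the 36 features in turn yields an age-adjusted feature set that can be passed to the same classifiers.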
In this study, we separated the data into training and testing datasets. One may wonder how our method works in a clinical situation when we have only one new subject. We assume that we have trained our model using enough training data. In this situation there are two options: (1) we can use the group ICA components of the training data as regressors and calculate the subject-specific time-courses. (2) For a more accurate estimation, another ICA can be done on an extended dataset containing the training data and the new subject's data. Note that we won't use the information from this new ICA analysis for training the classifiers/models but just to extract IC networks/time-series for the new test subject. This approach is more accurate but slower, especially when the training dataset is large. Since the main goal of this paper is to study the feasibility of using FNC features, we didn't investigate these methods.
In this study we showed that resting-state FNC features can be successfully exploited to automatically discriminate schizophrenia patients. To the best of our knowledge, this is the first study using resting-state FNC features to classify schizophrenia patients. Acquiring scans from schizophrenia patients is more feasible in the resting state due to the short acquisition time and the avoidance of cognitive task-related impairment confounds. Moreover, the data are less prone to multi-site variability (Pearlson and Calhoun, 2009). It was demonstrated that just 5 min of resting-state data can be used to classify patients reliably and accurately using FNC features and simple classifiers such as KNN. Moreover, the performance of several linear and non-linear methods was evaluated and compared.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
This study was supported by NIH/NIBIB 2R01 EB000840 and NCRR/NIGMS 5P20RR021938/P20GM103472.
References
Abbott, C., Juarez, M., White, T., Gollub, R. L., Pearlson, G. D., Bustillo, J., et al. (2011). Antipsychotic dose and diminished neural modulation: a multi-site fMRI study. Prog. Neuro-Psychopharmacol. Biol. Psychiatry 35, 473–482. doi: 10.1016/j.pnpbp.2010.12.001
Allen, E. A., Erhardt, E. B., Damaraju, E., Gruner, W., Segall, J. M., Silva, R. F., et al. (2011). A baseline for the multivariate comparison of resting-state networks. Front. Syst. Neurosci. 5:2. doi: 10.3389/fnsys.2011.00002
Ardekani, B. A., Tabesh, A., Sevy, S., Robinson, D. G., Bilder, R. M., and Szeszko, P. R. (2011). Diffusion tensor imaging reliably differentiates patients with schizophrenia from healthy volunteers. Hum. Brain Mapp. 32, 1–9. doi: 10.1002/hbm.20995
Arribas, J. I., Calhoun, V. D., and Adali, T. (2010). Automatic Bayesian classification of healthy controls, bipolar disorder, and schizophrenia using intrinsic connectivity maps from FMRI data. IEEE Trans. Biomed. Eng. 57, 2850–2860. doi: 10.1109/TBME.2010.2080679
Bhugra, D. (2005). The global prevalence of schizophrenia. PLoS Med. 2:e151; quiz e175. doi: 10.1371/journal.pmed.0020151
Biswal, B., Yetkin, F. Z., Haughton, V. M., and Hyde, J. S. (1995). Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34, 537–541. doi: 10.1002/mrm.1910340409
Biswal, B. B., Van Kylen, J., and Hyde, J. S. (1997). Simultaneous assessment of flow and BOLD signals in resting-state functional connectivity maps. NMR Biomed. 10, 165–170.
Block, H. D., Knight, B. W., and Rosenblatt, F. (1962). Analysis of a 4-layer series-coupled perceptron 2. Rev. Mod. Phys. 34:135. doi: 10.1103/RevModPhys.34.135
Bokde, A. L., Lopez-Bayo, P., Meindl, T., Pechler, S., Born, C., Faltraco, F., et al. (2006). Functional connectivity of the fusiform gyrus during a face-matching task in subjects with mild cognitive impairment. Brain J. Neurol. 129, 1113–1124. doi: 10.1093/brain/awl051
Boksman, K., Theberge, J., Williamson, P., Drost, D. J., Malla, A., Densmore, M., et al. (2005). A 4.0-T fMRI study of brain connectivity during word fluency in first-episode schizophrenia. Schizophr. Res. 75, 247–263. doi: 10.1016/j.schres.2004.09.025
Bremner, D., Demaine, E., Erickson, J., Iacono, J., Langerman, S., Morin, P., et al. (2005). Output-sensitive algorithms for computing nearest-neighbour decision boundaries. Disc. Comput. Geom. 33, 593–604. doi: 10.1007/s00454-004-1152-0
Buchsbaum, M. S., Friedman, J., Buchsbaum, B. R., Chu, K. W., Hazlett, E. A., Newmark, R., et al. (2006). Diffusion tensor imaging in schizophrenia. Biol. Psychiatry 60, 1181–1187. doi: 10.1016/j.biopsych.2005.11.028
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2, 121–167. doi: 10.1023/A:1009715923555
Caan, M. W. A., Vermeer, K. A., van Vliet, L. J., Majoie, C. B. L. M., Peters, B. D., den Heeten, G. J., et al. (2006). Shaving diffusion tensor images in discriminant analysis: a study into schizophrenia. Med. Image Anal. 10, 841–849. doi: 10.1016/j.media.2006.07.006
Calhoun, V., Adali, T., Kraut, M., and Pearlson, G. (2000). A weighted least-squares algorithm for estimation and visualization of relative latencies in event-related functional MRI. Magn. Reson. Med. 44, 947–954.
Calhoun, V. D., and Adali, T. (2012). Multisubject independent component analysis of fMRI: a decade of intrinsic networks, default mode, and neurodiagnostic discovery. IEEE Rev. Biomed. Eng. 5, 60–73. doi: 10.1109/RBME.2012.2211076
Calhoun, V. D., Adali, T., Pearlson, G. D., and Pekar, J. J. (2001a). A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14, 140–151. doi: 10.1002/hbm.1048
Calhoun, V. D., Adali, T., Pearlson, G. D., and Pekar, J. J. (2001b). Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Hum. Brain Mapp. 13, 43–53. doi: 10.1002/hbm.1024
Calhoun, V. D., Eichele, T., and Pearlson, G. (2009a). Functional brain networks in schizophrenia: a review. Front. Hum. Neurosci. 3:17. doi: 10.3389/neuro.09.017.2009
Calhoun, V. D., Liu, J., and Adali, T. (2009b). A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage 45, S163–S172. doi: 10.1016/j.neuroimage.2008.10.057
Calhoun, V. D., Kiehl, K. A., and Pearlson, G. D. (2008a). Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks. Hum. Brain Mapp. 29, 828–838. doi: 10.1002/hbm.20581
Calhoun, V. D., Maciejewski, P. K., Pearlson, G. D., and Kiehl, K. A. (2008b). Temporal lobe and “default” hemodynamic brain modes discriminate between schizophrenia and bipolar disorder. Hum. Brain Mapp. 29, 1265–1275. doi: 10.1002/hbm.20463
Calhoun, V. D., Sui, J., Kiehl, K., Turner, J., Allen, E., and Pearlson, G. (2011). Exploring the psychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder. Front. Psychiatry 2:75. doi: 10.3389/fpsyt.2011.00075
Camchong, J., MacDonald, A. W. 3rd., Bell, C., Mueller, B. A., and Lim, K. O. (2011). Altered functional and anatomical connectivity in schizophrenia. Schizophr. Bull. 37, 640–650. doi: 10.1093/schbul/sbp131
Caprihan, A., Pearlson, G. D., and Calhoun, V. D. (2008). Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements. Neuroimage 42, 675–682. doi: 10.1016/j.neuroimage.2008.04.255
Castro, E., Martinez-Ramon, M., Pearlson, G., Sui, J., and Calhoun, V. D. (2011). Characterization of groups using composite kernels and multi-source fMRI analysis data: application to schizophrenia. Neuroimage 58, 526–536. doi: 10.1016/j.neuroimage.2011.06.044
Cecchi, G., Rish, I., Thyreau, B., Thirion, B., Plaze, M., Martinot, J.-L., et al. (2009). “Discriminative network models of Schizophrenia,” in Neural Information Processing Systems (Vancouver, Canada).
Chen, S., Cowan, C. F. N., and Grant, P. M. (1991). Orthogonal least-squares learning algorithm for radial basis function networks. IEEE Trans. Neural Netw. 2, 302–309. doi: 10.1109/72.80341
Cordes, D., Haughton, V., Carew, J. D., Arfanakis, K., and Maravilla, K. (2002). Hierarchical clustering to measure connectivity in fMRI resting-state data. Magn. Reson. Imag. 20, 305–317. doi: 10.1016/S0730-725X(02)00503-9
Cordes, D., Haughton, V. M., Arfanakis, K., Carew, J. D., Turski, P. A., Moritz, C. H., et al. (2001). Frequencies contributing to functional connectivity in the cerebral cortex in “resting-state” data. AJNR. Am. J. Neuroradiol. 22, 1326–1333.
Cordes, D., Haughton, V. M., Arfanakis, K., Wendt, G. J., Turski, P. A., Moritz, C. H., et al. (2000). Mapping functionally related regions of brain with functional connectivity MR imaging. AJNR. Am. J. Neuroradiol. 21, 1636–1644.
Cortes, C., and Vapnik, V. (1995). Support-vector networks. Mach. Learn. 20, 273–297. doi: 10.1007/BF00994018
Cover, T. M., and Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Trans. Inform. Theory 13, 21–27. doi: 10.1109/TIT.1967.1053964
Csernansky, J. G., Schindler, M. K., Splinter, N. R., Wang, L., Gado, M., Selemon, L. D., et al. (2004). Abnormalities of thalamic volume and shape in schizophrenia. Am. J. Psychiatry 161, 896–902. doi: 10.1176/appi.ajp.161.5.896
Damoiseaux, J. S., Rombouts, S. A., Barkhof, F., Scheltens, P., Stam, C. J., Smith, S. M., et al. (2006). Consistent resting-state networks across healthy subjects. Proc. Natl. Acad. Sci. U.S.A. 103, 13848–13853. doi: 10.1073/pnas.0601417103
Davatzikos, C., Shen, D., Gur, R. C., Wu, X., Liu, D., Fan, Y., et al. (2005). Whole-brain morphometric study of schizophrenia revealing a spatially complex set of focal abnormalities. Arch. Gen. Psychiatry 62, 1218–1227. doi: 10.1001/archpsyc.62.11.1218
Davis, C. E., Jeste, D. V., and Eyler, L. T. (2005). Review of longitudinal functional neuroimaging studies of drug treatments in patients with schizophrenia. Schizophr. Res. 78, 45–60. doi: 10.1016/j.schres.2005.05.009
Demirci, O., Clark, V. P., and Calhoun, V. D. (2008a). A projection pursuit algorithm to classify individuals using fMRI data: application to schizophrenia. Neuroimage 39, 1774–1782. doi: 10.1016/j.neuroimage.2007.10.012
Demirci, O., Clark, V. P., Magnotta, V. A., Andreasen, N. C., Lauriello, J., Kiehl, K. A., et al. (2008b). A review of challenges in the use of fMRI for disease classification/characterization and a projection pursuit application from multi-site fMRI schizophrenia study. Brain Imag. Behav. 2, 147–226. doi: 10.1007/s11682-008-9028-1
Domingos, P., and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–130. doi: 10.1023/A:1007413511361
Du, W., Calhoun, V. D., Li, H., Ma, S., Eichele, T., Kiehl, K. A., et al. (2012). High classification accuracy for schizophrenia with rest and task fMRI data. Front. Hum. Neurosci. 6:145. doi: 10.3389/fnhum.2012.00145
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification. 2nd Edn., New York, NY: Wiley.
Duin, R. P. W., Juszczak, P., de Ridder, D., Paclik, P., Pekalska, E., Tax, D. M. J., et al. (2007). PRTools, a Matlab Toolbox for Pattern Recognition. Delft University of technology.
Erhardt, E. B., Rachakonda, S., Bedrick, E. J., Allen, E. A., Adali, T., and Calhoun, V. D. (2010). Comparison of multi-subject ICA methods for analysis of fMRI data. Hum. Brain Mapp.
Esposito, F., Scarabino, T., Hyvarinen, A., Himberg, J., Formisano, E., Comani, S., et al. (2005). Independent component analysis of fMRI group studies by self-organizing clustering. Neuroimage 25, 193–205. doi: 10.1016/j.neuroimage.2004.10.042
Fan, Y., Rao, H., Hurt, H., Giannetta, J., Korczykowski, M., Shera, D., et al. (2007a). Multivariate examination of brain abnormality using both structural and functional MRI. Neuroimage 36, 1189–1199. doi: 10.1016/j.neuroimage.2007.04.009
Fan, Y., Shen, D., Gur, R. C., Gur, R. E., and Davatzikos, C. (2007b). COMPARE: classification of morphological patterns using adaptive regional elements. IEEE Trans. Med. Imag. 26, 93–105. doi: 10.1109/TMI.2006.886812
Fan, Y., Shen, D., and Davatzikos, C. (2005). Classification of structural images via high-dimensional image warping, robust feature extraction, and SVM. MICCAI 8, 1–8. doi: 10.1007/11566465_1
First, M. B., Spitzer, R. L., Gibbon, M., and Williams, J. B. W. (1995). The structured clinical interview for Dsm-Iii-R personality-disorders (Scid-Ii) 1. Description. J. Personal. Dis. 9, 83–91. doi: 10.1521/pedi.1995.9.2.83
Ford, J., Shen, L., Makedon, F., Flashman, L. A., and Saykin, A. J. (2002). “A combined structural-functional classification of schizophrenia using hippocampal volume plus fMRI activation,” in Second Joint EMBS/BMES Conference (Houston, TX).
Ford, J. M., Johnson, M. B., Whitfield, S. L., Faustman, W. O., and Mathalon, D. H. (2005). Delayed hemodynamic responses in schizophrenia. Neuroimage 26, 922–931. doi: 10.1016/j.neuroimage.2005.03.001
Fox, M. D., and Greicius, M. (2010). Clinical applications of resting state functional connectivity. Front. Syst. Neurosci. 4:19. doi: 10.3389/fnsys.2010.00019
Friston, K. (2002). Beyond phrenology: what can neuroimaging tell us about distributed circuitry? Annu. Rev. Neurosci. 25, 221–250. doi: 10.1146/annurev.neuro.25.112701.142846
Friston, K. J., and Frith, C. D. (1995). Schizophrenia: a disconnection syndrome? Clin. Neurosci. 3, 89–97.
Frith, C. D., Friston, K. J., Herold, S., Silbersweig, D., Fletcher, P., Cahill, C., et al. (1995). Regional brain activity in chronic schizophrenic patients during the performance of a verbal fluency task. Br. J. Psychiatry 167, 343–349. doi: 10.1192/bjp.167.3.343
Garrity, A. G., Pearlson, G. D., McKiernan, K., Lloyd, D., Kiehl, K. A., and Calhoun, V. D. (2007). Aberrant “default mode” functional connectivity in schizophrenia. Am. J. Psychiatry 164, 450–457. doi: 10.1176/appi.ajp.164.3.450
Genovese, C. R., Lazar, N. A., and Nichols, T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage 15, 870–878. doi: 10.1006/nimg.2001.1037
Georgopoulos, A. P., Karageorgiou, E., Leuthold, A. C., Lewis, S. M., Lynch, J. K., Alonso, A. A., et al. (2007). Synchronous neural interactions assessed by magnetoencephalography: a functional biomarker for brain disorders. J. Neural Eng. 4, 349–355. doi: 10.1088/1741-2560/4/4/001
Goghari, V. M., Rehm, K., Carter, C. S., and MacDonald, A. W. 3rd. (2007). Regionally specific cortical thinning and gray matter abnormalities in the healthy relatives of schizophrenia patients. Cereb. Cortex 17, 415–424. doi: 10.1093/cercor/bhj158
Greicius, M. D., Srivastava, G., Reiss, A. L., and Menon, V. (2004). Default-mode network activity distinguishes Alzheimer's disease from healthy aging: evidence from functional MRI. Proc. Natl. Acad. Sci. U.S.A. 101, 4637–4642. doi: 10.1073/pnas.0308627101
Havlicek, M., Jan, J., Brazdil, M., and Calhoun, V. D. (2010). Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. Neuroimage 53, 65–77. doi: 10.1016/j.neuroimage.2010.05.063
Heinrichs, R. W., and Zakzanis, K. K. (1998). Neurocognitive deficit in schizophrenia: a quantitative review of the evidence. Neuropsychology 12, 426–445. doi: 10.1037/0894-4105.12.3.426
Himberg, J., Hyvarinen, A., and Esposito, F. (2004). Validating the independent components of neuroimaging time series via clustering and visualization. Neuroimage 22, 1214–1222. doi: 10.1016/j.neuroimage.2004.03.027
Honey, G. D., Pomarol-Clotet, E., Corlett, P. R., Honey, R. A., McKenna, P. J., Bullmore, E. T., et al. (2005). Functional dysconnectivity in schizophrenia associated with attentional modulation of motor function. Brain 128, 2597–2611. doi: 10.1093/brain/awh632
Jafri, M. J., Pearlson, G. D., Stevens, M., and Calhoun, V. D. (2008). A method for functional network connectivity among spatially independent resting-state components in schizophrenia. Neuroimage 39, 1666–1681. doi: 10.1016/j.neuroimage.2007.11.001
Jang, J. H., Jung, W. H., Choi, J. S., Choi, C. H., Kang, D. H., Shin, N. Y., et al. (2011). Reduced prefrontal functional connectivity in the default mode network is related to greater psychopathology in subjects with high genetic loading for schizophrenia. Schizophr. Res. 127, 58–65. doi: 10.1016/j.schres.2010.12.022
Josin, G. M., and Liddle, P. F. (2001). Neural network analysis of the pattern of functional connectivity between cerebral areas in schizophrenia. Biol. Cybern. 84, 117–122. doi: 10.1007/s004220000197
Karlsgodt, K. H., Sun, D. Q., and Cannon, T. D. (2010). Structural and functional brain abnormalities in schizophrenia. Curr. Direct. Psychol. Sci. 19, 226–231. doi: 10.1177/0963721410377601
Kawasaki, Y., Suzuki, M., Kherif, F., Takahashi, T., Zhou, S. Y., Nakamura, K., et al. (2007). Multivariate voxel-based morphometry successfully differentiates schizophrenia patients from healthy controls. Neuroimage 34, 235–242. doi: 10.1016/j.neuroimage.2006.08.018
Kiviniemi, V., Kantola, J. H., Jauhiainen, J., Hyvarinen, A., and Tervonen, O. (2003). Independent component analysis of nondeterministic fMRI signal sources. Neuroimage 19, 253–260. doi: 10.1016/S1053-8119(03)00097-1
Levenberg, K. (1944). A method for the solution of certain problems in least squares. Quart. Appl. Math. 2, 164–168.
Li, S. J., Li, Z., Wu, G. H., Zhang, M. J., Franczak, M., and Antuono, P. G. (2002). Alzheimer disease: evaluation of a functional MR imaging index as a marker. Radiology 225, 253–259. doi: 10.1148/radiol.2251011301
Li, Y. O., Adali, T., and Calhoun, V. D. (2007). Estimating the number of independent components for functional magnetic resonance imaging data. Hum. Brain Mapp. 28, 1251–1266. doi: 10.1002/hbm.20359
Liang, M., Zhou, Y., Jiang, T., Liu, Z., Tian, L., Liu, H., et al. (2006). Widespread functional disconnectivity in schizophrenia with resting-state functional magnetic resonance imaging. Neuroreport 17, 209–213. doi: 10.1097/01.wnr.0000198434.06518.b8
Lowe, M. J., Mock, B. J., and Sorenson, J. A. (1998). Functional connectivity in single and multislice echoplanar imaging using resting-state fluctuations. Neuroimage 7, 119–132. doi: 10.1006/nimg.1997.0315
Lui, S., Li, T., Deng, W., Jiang, L., Wu, Q., Tang, H., et al. (2010). Short-term effects of antipsychotic treatment on cerebral function in drug-naive first-episode schizophrenia revealed by “resting state” functional magnetic resonance imaging. Arch. Gen. Psychiatry 67, 783–792. doi: 10.1001/archgenpsychiatry.2010.84
McGlashan, T. H. (1998). Early detection and intervention of schizophrenia: rationale and research. Br. J. Psychiatry Suppl. 172, 3–6.
McKeown, M. J., Makeig, S., Brown, G. G., Jung, T. P., Kindermann, S. S., Bell, A. J., et al. (1998). Analysis of fMRI data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188.
Melgani, F., and Bruzzone, L. (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 42, 1778–1790. doi: 10.1109/TGRS.2004.831865
Meyer-Lindenberg, A. S., Olsen, R. K., Kohn, P. D., Brown, T., Egan, M. F., Weinberger, D. R., et al. (2005). Regionally specific disturbance of dorsolateral prefrontal-hippocampal functional connectivity in schizophrenia. Arch. Gen. Psychiatry 62, 379–386. doi: 10.1001/archpsyc.62.4.379
Meyer-Lindenberg, A., Poline, J. B., Kohn, P. D., Holt, J. L., Egan, M. F., Weinberger, D. R., et al. (2001). Evidence for abnormal cortical functional connectivity during working memory in schizophrenia. Am. J. Psychiatry 158, 1809–1817. doi: 10.1176/appi.ajp.158.11.1809
Michael, A. M., Calhoun, V. D., Andreasen, N. C., and Baum, S. A. (2008). A method to classify schizophrenia using inter-task spatial correlations of functional brain images. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2008, 5510–5513. doi: 10.1109/IEMBS.2008.4650462
Mikula, S., and Niebur, E. (2006). A novel method for visualizing functional connectivity using principal component analysis. Int. J. Neurosci. 116, 419–429. doi: 10.1080/00207450500505761
Mingoia, G., Wagner, G., Langbein, K., Maitra, R., Smesny, S., Dietzek, M., et al. (2012). Default mode network activity in schizophrenia studied at resting state using probabilistic ICA. Schizophr. Res. 138, 143–149. doi: 10.1016/j.schres.2012.01.036
Muller, J. L., Deuticke, C., Putzhammer, A., Roder, C. H., Hajak, G., and Winkler, J. (2003). Schizophrenia and Parkinson's disease lead to equal motor-related changes in cortical and subcortical brain activation: an fMRI fingertapping study. Psychiatry Clin. Neurosci. 57, 562–568.
Nakamura, K., Kawasaki, Y., Suzuki, M., Hagino, H., Kurokawa, K., Takahashi, T., et al. (2004). Multiple structural brain measures obtained by three-dimensional magnetic resonance imaging to distinguish between schizophrenia patients and normal subjects. Schizophr. Bull. 30, 393–404. doi: 10.1093/oxfordjournals.schbul.a007087
Omori, M., Koshino, Y., Murata, T., Murata, I., Nishio, M., Sakamoto, K., et al. (1995). Quantitative EEG in never-treated schizophrenic patients. Biol. Psychiatry 38, 305–309. doi: 10.1016/0006-3223(95)00300-6
Orliac, F., Naveau, M., Joliot, M., Delcroix, N., Razafimandimby, A., Brazo, P., et al. (2013). Links among resting-state default-mode network, salience network, and symptomatology in schizophrenia. Schizophr. Res. doi: 10.1016/j.schres.2013.05.007. [Epub ahead of print].
Pardo, P. J., Georgopoulos, A. P., Kenny, J. T., Stuve, T. A., Findling, R. L., and Schulz, S. C. (2006). Classification of adolescent psychotic disorders using linear discriminant analysis. Schizophr. Res. 87, 297–306. doi: 10.1016/j.schres.2006.05.007
Pearlson, G. (2009). Multisite collaborations and large databases in psychiatric neuroimaging: advantages, problems, and challenges. Schizophr. Bull. 35, 1–2. doi: 10.1093/schbul/sbn166
Pearlson, G. D., and Calhoun, V. D. (2009). Convergent approaches for defining functional imaging endophenotypes in schizophrenia. Front. Hum. Neurosci. 3:37. doi: 10.3389/neuro.09.037.2009
Quinlan, J. R. (1987). Simplifying decision trees. Int. J. Man Mach. Stud. 27, 221–234. doi: 10.1016/S0020-7373(87)80053-6
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.
Rice, D. P. (1999). The economic impact of schizophrenia. J. Clin. Psychiatry 60(Suppl. 1), 4–6; discussion 28–30.
Rish, I. (2001). “An empirical study of the naïve Bayes classifier,” in Proceedings of IJCAI-01 Workshop on Empirical Methods in AI (Sicily), 41–46.
Rogowska, J., Gruber, S. A., and Yurgelun-Todd, D. A. (2004). Functional magnetic resonance imaging in schizophrenia: cortical response to motor stimulation. Psychiatry Res. 130, 227–243. doi: 10.1016/j.pscychresns.2003.12.004
Rosenblatt, F. (1958). The perceptron – a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408. doi: 10.1037/h0042519
Salvador, R., Sarro, S., Gomar, J. J., Ortiz-Gil, J., Vila, F., Capdevila, A., et al. (2010). Overall brain connectivity maps show cortico-subcortical abnormalities in schizophrenia. Hum. Brain Mapp. 31, 2003–2014. doi: 10.1002/hbm.20993
Seewald, A. K. (2007). An evaluation of Naive Bayes variants in content-based learning for spam filtering. Intell. Data Anal. 11, 497–524.
Shen, H., Wang, L., Liu, Y., and Hu, D. (2010). Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fMRI. Neuroimage 49, 3110–3121. doi: 10.1016/j.neuroimage.2009.11.011
Shenton, M. E., Dickey, C. C., Frumin, M., and McCarley, R. W. (2001). A review of MRI findings in schizophrenia. Schizophr. Res. 49, 1–52. doi: 10.1016/S0920-9964(01)00163-3
Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. E., et al. (2009). Correspondence of the brain's functional architecture during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106, 13040–13045. doi: 10.1073/pnas.0905267106
Stein, T., Moritz, C., Quigley, M., Cordes, D., Haughton, V., and Meyerand, E. (2000). Functional connectivity in the thalamus and hippocampus studied with functional MR imaging. AJNR. Am. J. Neuroradiol. 21, 1397–1401.
Stevens, M. C., Kiehl, K. A., Pearlson, G. D., and Calhoun, V. D. (2007). Functional neural networks underlying response inhibition in adolescents and adults. Behav. Brain Res. 181, 12–22. doi: 10.1016/j.bbr.2007.03.023
Stevens, M. C., Kiehl, K. A., Pearlson, G. D., and Calhoun, V. D. (2009). Brain network dynamics during error commission. Hum. Brain Mapp. 30, 24–37. doi: 10.1002/hbm.20478
Sun, D., van Erp, T. G., Thompson, P. M., Bearden, C. E., Daley, M., Kushan, L., et al. (2009). Elucidating a magnetic resonance imaging-based neuroanatomic biomarker for psychosis: classification analysis using probabilistic brain atlas and machine learning algorithms. Biol. Psychiatry 66, 1055–1060. doi: 10.1016/j.biopsych.2009.07.019
Supekar, K., Menon, V., Rubin, D., Musen, M., and Greicius, M. D. (2008). Network analysis of intrinsic functional brain connectivity in Alzheimer's disease. PLoS Comput. Biol. 4:e1000100. doi: 10.1371/journal.pcbi.1000100
Takayanagi, Y., Kawasaki, Y., Nakamura, K., Takahashi, T., Orikabe, L., Toyoda, E., et al. (2010). Differentiation of first-episode schizophrenia patients from healthy controls using ROI-based multiple structural brain variables. Prog. Neuro Psychopharmacol. Biol. Psychiatry 34, 10–17. doi: 10.1016/j.pnpbp.2009.09.004
Takayanagi, Y., Takahashi, T., Orikabe, L., Mozue, Y., Kawasaki, Y., Nakamura, K., et al. (2011). Classification of first-episode schizophrenia patients and healthy subjects by automated MRI measures of regional brain volume and cortical thickness. PLoS ONE 6:e21047. doi: 10.1371/journal.pone.0021047
Turner, G. H., and Twieg, D. B. (2005). Study of temporal stationarity and spatial consistency of fMRI noise using independent component analysis. IEEE Trans. Med. Imag. 24, 712–718. doi: 10.1109/TMI.2005.846852
van de Ven, V. G., Formisano, E., Prvulovic, D., Roeder, C. H., and Linden, D. E. (2004). Functional connectivity as revealed by spatial independent component analysis of fMRI measurements during rest. Hum. Brain Mapp. 22, 165–178. doi: 10.1002/hbm.20022
Venkataraman, A., Whitford, T. J., Westin, C. F., Golland, P., and Kubicki, M. (2012). Whole brain resting state functional connectivity abnormalities in schizophrenia. Schizophr. Res. 139, 7–12. doi: 10.1016/j.schres.2012.04.021
Wang, K., Jiang, T. Z., Liang, M., Wang, L., Tian, L. X., Zhang, X. Q., et al. (2006). Discriminative analysis of early Alzheimer's disease based on two intrinsically anti-correlated networks with resting-state fMRI. Med. Image Comput. Comput. Assist. Interv. MICCAI 2006 (Pt 2, 4191), 340–347. doi: 10.1007/11866763_42
Wenz, F., Schad, L. R., Knopp, M. V., Baudendistel, K. T., Flomer, F., Schroder, J., et al. (1994). Functional magnetic resonance imaging at 1.5 T: activation pattern in schizophrenic patients receiving neuroleptic medication. Magn. Reson. Imag. 12, 975–982. doi: 10.1016/0730-725X(94)91227-N
Werbos, P. J. (1990). Backpropagation through time – What it does and how to do it. Proc. IEEE 78, 1550–1560. doi: 10.1109/5.58337
Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 22, 209–212. doi: 10.1080/01621459.1927.10502953
Wood, I. A., Visscher, P. M., and Mengersen, K. L. (2007). Classification based upon gene expression data: bias and precision of error rates. Bioinformatics 23, 1363–1370. doi: 10.1093/bioinformatics/btm117
Wyatt, R. J., Henter, I., Leary, M. C., and Taylor, E. (1995). An economic evaluation of schizophrenia – 1991. Soc. Psychiatry Psychiatr. Epidemiol. 30, 196–205.
Xiong, J., Parsons, L. M., Gao, J. H., and Fox, P. T. (1999). Interregional connectivity to primary motor cortex revealed using MRI resting state images. Hum. Brain Mapp. 8, 151–156.
Yoon, U., Lee, J. M., Im, K., Shin, Y. W., Cho, B. H., Kim, I. Y., et al. (2007). Pattern classification using principal components of cortical thickness and its discriminative pattern in schizophrenia. Neuroimage 34, 1405–1415. doi: 10.1016/j.neuroimage.2006.11.021
Keywords: functional network connectivity, independent component analysis (ICA), classification, schizophrenia, resting-state fMRI
Citation: Arbabshirani MR, Kiehl KA, Pearlson GD and Calhoun VD (2013) Classification of schizophrenia patients based on resting-state functional network connectivity. Front. Neurosci. 7:133. doi: 10.3389/fnins.2013.00133
Received: 28 April 2013; Accepted: 10 July 2013; Published online: 30 July 2013.
Edited by:
John Ashburner, UCL Institute of Neurology, UK
Copyright © 2013 Arbabshirani, Kiehl, Pearlson and Calhoun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Vince D. Calhoun, The Mind Research Network, 1101 Yale Blvd NE, Albuquerque, NM 87106, USA. e-mail: vcalhoun@mrn.org