- 1Laureate Institute for Brain Research, Tulsa, OK, United States
- 2Department of Computer Science, University of Tulsa, Tulsa, OK, United States
- 3Department of Community Medicine, Oxley College of Health Sciences, University of Tulsa, Tulsa, OK, United States
- 4Department of Psychiatry, University of California, San Diego, San Diego, CA, United States
Introduction
Independent Component Analysis (ICA) is a widely used, unsupervised, exploratory machine learning method (Comon, 1994) that is often applied to resting-state fMRI (rsfMRI) data (Nickerson et al., 2017). Although its usefulness is clear, the most common applications of ICA involve substantial subjectivity. For example, the spatial components extracted through individual- or group-level ICA are never identical across studies and are often labeled by visual inspection and expert opinion. The goal of this study was to establish an approach in which spatial components are taken from well-documented atlases derived from large-scale investigations. These components can then be used in place of components tailored to each individual study's dataset.
The utility of ICA for extracting meaningful functional connectivity patterns without prior knowledge has been established by its application to large-scale studies such as the Human Connectome Project (HCP) and the United Kingdom (UK) Biobank cohort (Miller et al., 2016; Smitha et al., 2017). Although ICA has been applied successfully to a wide range of rsfMRI problems, there have long been concerns about the reproducibility and subjectivity of ICA results (Friston, 1998). ICA is also computationally demanding, which may be a barrier for researchers with limited resources or exceptionally large datasets. We present an approach that has the potential to substantially reduce both the subjectivity of ICA and its computational burden.
A Critique of ICA
Performing an ICA-based rsfMRI study involves data preprocessing and clean-up (sometimes through subject-level ICA, or SICA), group-level ICA (GICA) on the entire dataset (usually with temporal concatenation), separating signal from noise among the resulting independent components (ICs), network labeling, and extraction of time series and spatial maps for all subjects based on the selected ICs (Nickerson et al., 2017). Because ICA is time-consuming and computationally demanding, a substantial reduction in runtime may be worthwhile, especially in large-scale studies. Runtime aside, our main objective in improving this pipeline is to produce objective, reproducible science. A central concern lies in network labeling, where ICs representing potential resting-state networks (RSNs) of interest must be inspected by experienced neuroanatomists to guard against misidentification. This step may undermine the reproducibility of results because it can be quite arbitrary (Storti et al., 2013; Salimi-Khorshidi et al., 2014; Pruim et al., 2015). In addition, running GICA on different datasets will not yield exactly the same components, and its output may not closely match the results of other analyses (e.g., a single network may be split into two or three components, depending on the idiosyncrasies of the datasets used) (Wang and Li, 2015).
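As a rough illustration of the GICA step, the sketch below concatenates subjects in time, reduces the data with PCA, and runs a symmetric FastICA (log-cosh contrast) to estimate group spatial maps. It is a toy stand-in written in NumPy, not MELODIC or any other published implementation: it omits brain masking, variance normalization, automatic dimensionality estimation and subject-level reduction, and the function name `group_spatial_ica` and its parameters are hypothetical.

```python
import numpy as np


def group_spatial_ica(subject_data, n_components, max_iter=200, tol=1e-6, seed=0):
    """Toy group spatial ICA via temporal concatenation (hypothetical helper).

    subject_data: list of (T_i, V) arrays in a common space (voxel v is the
    same location for every subject). Returns (n_components, V) group maps.
    """
    # Remove each voxel's temporal mean per subject, then stack subjects in time.
    X = np.vstack([d - d.mean(axis=0, keepdims=True) for d in subject_data])
    # Remove each time point's spatial mean so the ICA samples (voxels) are centered.
    X = X - X.mean(axis=1, keepdims=True)
    n_vox = X.shape[1]

    # PCA reduction + whitening: leading right singular vectors are orthonormal
    # and zero-mean over voxels; scaling by sqrt(V) gives unit variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    Z = vt[:n_components] * np.sqrt(n_vox)

    # Symmetric FastICA with g = tanh (log-cosh contrast), voxels as samples.
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        g = np.tanh(W @ Z)
        W_new = g @ Z.T / n_vox - (1.0 - g**2).mean(axis=1)[:, None] * W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W.
        u, _, wt = np.linalg.svd(W_new)
        W_new = u @ wt
        # Converged when each unmixing row is (anti)parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z
```

Because ICA recovers components only up to order and sign, running this with a different seed or a different set of subjects can return the same networks in a different order or with flipped signs, and with real data networks can merge or split as the data or model order change. This is one source of the labeling burden described above.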
Current Remedies
One approach to this problem is to use machine learning and deep learning to compare GICA results with established reference RSNs (Kozák et al., 2017; Zhao et al., 2018). Although this classification approach categorizes ICs objectively based on the provided template, its replicability and stability have yet to be benchmarked in large-scale studies, and it relies on large amounts of training data. Deep learning approaches raise additional concerns, such as slow convergence and overfitting, especially for MRI modalities (Srivastava et al., 2017).
Another method is “semi-blind ICA”, which uses prior knowledge in the form of a “template” entered at the beginning of each GICA run to guide and improve the estimation of network-related components (Lin et al., 2010). This method also requires well-established prior knowledge of the expected activation patterns in the fMRI data, especially in task-based fMRI studies.
The alternative approach presented here may make ICA pipelines more stable, faster, and more reproducible when extracting the time series of networks of interest from subjects' data, while requiring fewer computational resources. This is especially relevant for time-constrained, resource-limited studies in which access to experts who can interpret GICA results may be a challenge.
A Solution
We propose using the ICs from prior large-scale studies, such as the UK Biobank and HCP, in an “atlas-like” manner. Because such ICs are already published (Miller et al., 2016), they can be studied in depth and agreed upon by experts across the field. Once agreement is reached, the ICs could serve as a reference, much like an atlas, for extracting the time series of subjects in matched groups, and for interpreting the ICs from other studies more objectively through automated, semi-automated, or conventional manual approaches.
Results of GICA have the potential to be reused in other studies (Bijsterbosch et al., 2017). The idea of using a reference in analyses is not novel: the use of references such as the Montreal Neurological Institute (MNI) standard space or the Harvard-Oxford cortical and subcortical structural atlases in preprocessing and analyzing imaging data rests on the same idea of bringing data into a common frame of reference. This approach would benefit ICA pipelines as well. This solution is depicted in Figure 1. To illustrate the proposal, we follow the pipeline recommended by the widely used FMRIB Software Library (FSL) package from the Oxford Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB) (Smith et al., 2004; Woolrich et al., 2009; Jenkinson et al., 2012).
After preprocessing, the cleaned data are commonly entered into GICA to detect ICs, and the results are then inspected to label and select the RSNs (ICs) of interest. Using dual regression, the selected ICs are then mapped onto each subject's functional data to extract subject-specific time series and spatial maps of the desired RSNs for use in subsequent analyses (Beckmann et al., 2009; Nickerson et al., 2017).
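To make the two regression stages concrete, the following is a minimal NumPy sketch of dual regression for a single subject. It is not FSL's implementation and works on plain arrays without a brain mask; the function name `dual_regression` and its arguments are illustrative, and the optional unit-variance scaling of the time courses only loosely mirrors FSL's des_norm option.

```python
import numpy as np


def dual_regression(group_maps, subject_data, normalize_timecourses=True):
    """Sketch of dual regression stages 1 and 2 for one subject.

    group_maps:   (K, V) group-level or atlas spatial maps.
    subject_data: (T, V) preprocessed subject data in the same space.
    Returns (timecourses (T, K), subject_maps (K, V)).
    """
    # Stage 1 (spatial regression): each volume is the response and the K maps
    # are the regressors; demeaning over voxels removes the intercept.
    G = group_maps - group_maps.mean(axis=1, keepdims=True)
    Y = subject_data - subject_data.mean(axis=1, keepdims=True)
    timecourses = Y @ np.linalg.pinv(G)        # (T, V) @ (V, K) -> (T, K)

    # Stage 2 (temporal regression): regress the stage-1 time courses against
    # every voxel's time series to get subject-specific spatial maps.
    A = timecourses - timecourses.mean(axis=0, keepdims=True)
    if normalize_timecourses:
        A = A / A.std(axis=0, keepdims=True)   # unit-variance regressors
    Yt = subject_data - subject_data.mean(axis=0, keepdims=True)
    subject_maps = np.linalg.pinv(A) @ Yt      # (K, T) @ (T, V) -> (K, V)
    return timecourses, subject_maps
```

In the atlas-based variant proposed here, `group_maps` would simply be the published atlas components instead of the output of a study-specific GICA; the regression itself is unchanged.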
To alleviate the issues mentioned above, a more efficient approach would be to use previously labeled ICs from a large-scale study as an atlas of ICs in dual regression, rather than performing GICA. This strategy replaces study-specific GICA with a regression step. As one application of this approach, the set of ICs obtained from GICA in one study could be used to extract the time series of matched subjects in another, demographically comparable study in a cross-site investigation. Investigating a subset of the original dataset would also be more efficient, because GICA would not need to be re-run on the subset. Although re-running GICA in this scenario may give a better model fit to the data, this may not outweigh the benefit of a more objective way of extracting the spatial maps and time series. GICA can also identify study-specific noise components, which would not be possible with a standard atlas. This may not be a major concern, however, because sufficient preprocessing and noise removal can be performed at the individual-subject level.
Atlas-based ICA would also benefit smaller studies, because the pipeline would be more stable and its results more readily comparable to those of other studies. In a conventional pipeline, the whole dataset must be preprocessed and ready before GICA is run; analyzing a subset before data collection is complete is therefore not feasible, and doing so will not produce the same components as analyzing the entire dataset. In addition, if a participant's data were excluded, GICA would have to be re-run, rather than simply removing those records from the group analysis. The proposed approach would address these issues as well.
We tested our approach on an rsfMRI dataset from a recent within-subjects study (Le et al., 2018) in which 20 individuals completed three sessions of functional scans (60 scans in total). Following preprocessing, we compared the conventional ICA pipeline with our proposed pipeline, in which we applied a GICA atlas derived from a subset of the Tulsa 1000 study cohort (Victor et al., 2018), using a MacBook Pro with a 2.9 GHz Intel Core i7 processor and 16 GB of memory. In the proposed pipeline, the IC time series for each subject (Stage 1 output of dual regression) were ready to use in 57 min. In contrast, performing GICA followed by dual regression to extract the time series for each subject took 9 h 33 min. The additional runtime required for GICA is acceptable in this case, but would be of greater concern with a larger dataset. In addition, GICA on the 60 scans produced a substantially different set of components, which makes it difficult or impossible to recover the same components as those in the atlas.
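Expressed as a ratio, the two runtimes reported above correspond to roughly a tenfold difference on this machine and dataset (a back-of-the-envelope figure for one test, not a general benchmark):

$$\frac{9\ \text{h}\ 33\ \text{min}}{57\ \text{min}} = \frac{(9 \times 60 + 33)\ \text{min}}{57\ \text{min}} = \frac{573}{57} \approx 10.05$$

If, as an assumption, the dual regression step took about the same time in both pipelines, GICA itself would account for roughly 573 − 57 = 516 min (8 h 36 min).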
Published reference atlases of well-studied resting-state networks are currently available (Yeo et al., 2011). These atlases are useful for labeling networks, but they are binary network masks and lack the voxel-wise component weights needed to extract subject-specific time courses. To the best of our knowledge, no atlas has been published with this information.
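A toy simulation illustrates why voxel-wise weights matter. Below, two hypothetical networks with smoothly varying weights overlap in space. Averaging the signal inside a binarized mask of network 1 mixes in network 2 wherever they overlap, whereas a joint spatial regression on the weighted maps (as in dual regression stage 1, here without demeaning) separates them. The map shapes, the 0.05 threshold and both helper names are illustrative assumptions, not values from any published atlas.

```python
import numpy as np


def mask_mean_timecourses(data, maps, threshold=0.05):
    """Average the (T, V) data inside a binarized version of each (K, V) map."""
    masks = maps > threshold
    return np.stack([data[:, m].mean(axis=1) for m in masks], axis=1)


def spatial_regression_timecourses(data, maps):
    """Least-squares fit of all weighted maps at once: (T, V) @ (V, K) -> (T, K)."""
    return data @ np.linalg.pinv(maps)


# Two overlapping Gaussian-shaped "networks" along a 1-D strip of voxels.
rng = np.random.default_rng(0)
V, T = 1000, 300
idx = np.arange(V)
maps = np.stack([np.exp(-((idx - 400) / 150.0) ** 2),
                 np.exp(-((idx - 600) / 150.0) ** 2)])
true_tc = rng.standard_normal((T, 2))
data = true_tc @ maps + 0.1 * rng.standard_normal((T, V))

mask_tc = mask_mean_timecourses(data, maps)
reg_tc = spatial_regression_timecourses(data, maps)
r_mask = np.corrcoef(mask_tc[:, 0], true_tc[:, 0])[0, 1]
r_reg = np.corrcoef(reg_tc[:, 0], true_tc[:, 0])[0, 1]
print(f"mask average r = {r_mask:.2f}, weighted regression r = {r_reg:.3f}")
```

In this setup the weighted regression recovers network 1's time course almost exactly, while the mask average correlates noticeably less with it because of leakage from the overlapping network.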
The proposed approach has some limitations. As in other studies that use a common atlas, the study population should be reasonably representative of the participants recruited in the reference studies. Because most large public datasets come from healthy populations, one might question applying components derived from them to patient populations. This is a valid concern, which could be addressed by comparing GICA results from the target population with the standard atlas components. Similarly, systematic differences may be introduced by scanner type and scanning parameters, which warrants thorough investigation, although it would be surprising if such factors substantially altered the large-scale functional organization captured by the components. Moreover, there is no standard preprocessing procedure across the field, so care must be taken when applying an IC atlas: the preprocessing procedures must be compatible with those used to build the atlas (Wetherill et al., 2018), because different preprocessing methods may lead to different outcomes. In addition, the data must be preprocessed into a form that matches the atlas before the two are compared. For example, if an atlas uses maps at 2 mm resolution, applying it to data processed under different conventions would require auxiliary processing steps, such as resampling.
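One simple way to carry out the comparison suggested above is to pair each atlas component with one study-specific GICA component by spatial correlation, using the Hungarian algorithm on the absolute correlations (ICA signs are arbitrary). The sketch below assumes both sets of maps are already in the same space and voxel grid, that the study has at least as many components as the atlas, and that SciPy is available; `match_components` is a hypothetical helper, not part of FSL.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_components(atlas_maps, study_maps):
    """Pair each atlas component with one study component by spatial correlation.

    atlas_maps: (K_atlas, V); study_maps: (K_study, V), with K_atlas <= K_study.
    Returns (matched study index, sign, |r|), each of length K_atlas.
    """
    k = atlas_maps.shape[0]
    r = np.corrcoef(atlas_maps, study_maps)[:k, k:]   # (K_atlas, K_study)
    # Optimal one-to-one assignment that maximizes total |r|.
    rows, cols = linear_sum_assignment(np.abs(r), maximize=True)
    return cols, np.sign(r[rows, cols]), np.abs(r[rows, cols])
```

A low |r| for an atlas component flags a network that the target population's GICA does not reproduce well, for example because it was split across several components; such cases would merit closer inspection before relying on the atlas.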
Several sets of ICA results have been published and are available in the literature (Beckmann et al., 2005; Smith et al., 2009; Laird et al., 2011), including results from the UK Biobank cohort (Miller et al., 2016). Such results have the potential to be applied to other studies as “atlases of ICs”, but only if they are extensively documented. Publishing group-level ICA results and documenting them in an atlas-like manner would also allow researchers to keep their (testing) data separate from the training data used to build the models, a consideration that has recently garnered increased attention (Scheinost et al., 2019), along with the need for more open, externally validated science.
Author Contributions
RK proposed the research idea and revised the manuscript. MM performed the research and authored the manuscript. HE developed the idea and revised the manuscript. TV provided the datasets and revised the manuscript. MP supervised the project, provided feedback on the research and the manuscript, and revised the manuscript.
Funding
MP was supported by grants from the National Institute of Mental Health (R01 MH101453), the National Institute on Drug Abuse (U01 DA041089), and the National Institute of General Medical Sciences (P20GM121312). This research was funded by the William K. Warren Foundation.
Conflict of Interest
MP is an advisor to Spring Care, Inc., a behavioral health startup, and has received royalties for an article about methamphetamine in UpToDate.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The authors appreciate the helpful feedback received from Dr. Janine Bijsterbosch at Washington University in St. Louis.
References
Beckmann, C., Mackay, C., Filippini, N., and Smith, S. (2009). Group comparison of resting-state FMRI data using multi-subject ICA and dual regression. Neuroimage 47:S148. doi: 10.1016/S1053-8119(09)71511-3
Beckmann, C. F., DeLuca, M., Devlin, J. T., and Smith, S. M. (2005). Investigations into resting-state connectivity using independent component analysis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1001–1013. doi: 10.1098/rstb.2005.1634
Bijsterbosch, J., Smith, S. M., and Beckmann, C. F. (2017). Introduction to Resting State FMRI Functional Connectivity. Oxford, UK: Oxford University Press.
Comon, P. (1994). Independent component analysis, a new concept? Signal Proc. 36, 287–314. doi: 10.1016/0165-1684(94)90029-9
Friston, K. J. (1998). Modes or models: a critique on independent component analysis for fMRI. Trends Cogn. Sci. 2, 373–375. doi: 10.1016/S1364-6613(98)01227-3
Jenkinson, M., Beckmann, C. F., Behrens, T. E. J., Woolrich, M. W., and Smith, S. M. (2012). FSL. Neuroimage 62, 782–790. doi: 10.1016/j.neuroimage.2011.09.015
Kozák, L. R., van Graan, L. A., Chaudhary, U. J., Szabó, Á. G., and Lemieux, L. (2017). ICN_Atlas: automated description and quantification of functional MRI activation patterns in the framework of intrinsic connectivity networks. Neuroimage 163, 319–341. doi: 10.1016/j.neuroimage.2017.09.014
Laird, A. R., Fox, P. M., Eickhoff, S. B., Turner, J. A., Ray, K. L., McKay, D. R., et al. (2011). Behavioral interpretations of intrinsic connectivity networks. J. Cogn. Neurosci. 23, 4022–4037. doi: 10.1162/jocn_a_00077
Le, T. T., Kuplicki, R., Yeh, H.-W., Aupperle, R. L., Khalsa, S. S., Simmons, W. K., et al. (2018). Effect of Ibuprofen on BrainAGE: a randomized, placebo-controlled, dose-response exploratory study. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 3, 836–843. doi: 10.1016/j.bpsc.2018.05.002
Lin, Q.-H., Liu, J., Zheng, Y.-R., Liang, H., and Calhoun, V. D. (2010). Semiblind spatial ICA of fMRI using spatial constraints. Hum. Brain Mapp. 31, 1076–1088. doi: 10.1002/hbm.20919
Miller, K. L., Alfaro-Almagro, F., Bangerter, N. K., Thomas, D. L., Yacoub, E., Xu, J., et al. (2016). Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 19, 1523–1526. doi: 10.1038/nn.4393
Nickerson, L. D., Smith, S. M., Öngür, D., and Beckmann, C. F. (2017). Using dual regression to investigate network shape and amplitude in functional connectivity analyses. Front. Neurosci. 11:115. doi: 10.3389/fnins.2017.00115
Pruim, R. H. R., Mennes, M., van Rooij, D., Llera, A., Buitelaar, J. K., and Beckmann, C. F. (2015). ICA-AROMA: a robust ICA-based strategy for removing motion artifacts from fMRI data. Neuroimage 112, 267–277. doi: 10.1016/j.neuroimage.2015.02.064
Salimi-Khorshidi, G., Douaud, G., Beckmann, C. F., Glasser, M. F., Griffanti, L., and Smith, S. M. (2014). Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers. Neuroimage 90, 449–468. doi: 10.1016/j.neuroimage.2013.11.046
Scheinost, D., Noble, S., Horien, C., Greene, A. S., Lake, E. M., Salehi, M., et al. (2019). Ten simple rules for predictive modeling of individual differences in neuroimaging. Neuroimage 193, 35–45. doi: 10.1016/j.neuroimage.2019.02.057
Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. E., et al. (2009). Correspondence of the brain's functional architecture during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106, 13040–13045. doi: 10.1073/pnas.0905267106
Smith, S. M., Jenkinson, M., Woolrich, M. W., Beckmann, C. F., Behrens, T. E. J., Johansen-Berg, H., et al. (2004). Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23, S208–S219. doi: 10.1016/j.neuroimage.2004.07.051
Smitha, K. A., Akhil Raja, K., Arun, K. M., Rajesh, P. G., Thomas, B., Kapilamoorthy, T. R., et al. (2017). Resting state fMRI: a review on methods in resting state connectivity analysis and resting state networks. Neuroradiol. J. 30, 305–317. doi: 10.1177/1971400917697342
Srivastava, S., Soman, S., Rai, A., and Srivastava, P. K. (2017). “Deep learning for health informatics: recent trends and future directions,” in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (Karnataka). doi: 10.1109/ICACCI.2017.8126082
Storti, S. F., Formaggio, E., Nordio, R., Manganotti, P., Fiaschi, A., Bertoldo, A., et al. (2013). Automatic selection of resting-state networks with functional magnetic resonance imaging. Front. Neurosci. 7:72. doi: 10.3389/fnins.2013.00072
Victor, T. A., Khalsa, S. S., Simmons, W. K., Feinstein, J. S., Savitz, J., Aupperle, R. L., et al. (2018). Tulsa 1000: a naturalistic study protocol for multilevel assessment and outcome prediction in a large psychiatric sample. BMJ Open 8:e016620. doi: 10.1136/bmjopen-2017-016620
Wang, Y., and Li, T.-Q. (2015). Dimensionality of ICA in resting-state fMRI investigated by feature optimized classification of independent components with SVM. Front. Hum. Neurosci. 9:259. doi: 10.3389/fnhum.2015.00259
Wetherill, R. R., Rao, H., Hager, N., Wang, J., Franklin, T. R., and Fan, Y. (2018). Classifying and characterizing nicotine use disorder with high accuracy using machine learning and resting-state fMRI: machine learning and nicotine. Addict. Biol. 24, 811–821. doi: 10.1111/adb.12644
Woolrich, M. W., Jbabdi, S., Patenaude, B., Chappell, M., Makni, S., Behrens, T., et al. (2009). Bayesian analysis of neuroimaging data in FSL. Neuroimage 45, S173–S186. doi: 10.1016/j.neuroimage.2008.10.055
Yeo, B. T. T., Krienen, F. M., Sepulcre, J., Sabuncu, M. R., Lashkari, D., Hollinshead, M., et al. (2011). The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165. doi: 10.1152/jn.00338.2011
Keywords: independent component analysis, image-derived phenotype, machine learning, rsfMRI, group-level analysis
Citation: Moradi M, Ekhtiari H, Victor TA, Paulus M and Kuplicki R (2020) Image-Derived Phenotyping Informed by Independent Component Analysis—An Atlas-Based Approach. Front. Neurosci. 14:118. doi: 10.3389/fnins.2020.00118
Received: 17 July 2019; Accepted: 30 January 2020;
Published: 21 February 2020.
Edited by:
Baojuan Li, Fourth Military Medical University, China
Reviewed by:
Xintao Hu, Northwestern Polytechnical University, China
Copyright © 2020 Moradi, Ekhtiari, Victor, Paulus and Kuplicki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mahdi Moradi, mmoradi@laureateinstitute.org