- 1Department of Vascular Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- 2Faculty of Medicine, Jianghan University, Wuhan, China
- 3Central Laboratory, Renmin Hospital of Wuhan University, Wuhan, China
- 4Department of Clinical Laboratory, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
- 5Department of Radiology, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
- 6School of Health, Wuhan University, Wuhan, China
Abstract: Alzheimer’s disease (AD) is a severe neurodegenerative disease for which there is currently no effective treatment. Mild cognitive impairment (MCI) is an early disease that may progress to AD. The effective diagnosis of AD and MCI in the early stage has important clinical significance.
Methods: To this end, this paper proposed a hypergraph-based netNMF (HG-netNMF) algorithm for integrating structural magnetic resonance imaging (sMRI) of AD and MCI with corresponding gene expression profiles.
Results: Hypergraph regularization assumes that regions of interest (ROIs) and genes were located on a non-linear low-dimensional manifold and can capture the inherent prevalence of two modalities of data and mined high-order correlation features of the two data. Further, this paper used the HG-netNMF algorithm to construct a brain structure connection network and a protein interaction network (PPI) with potential role relationships, mine the risk (ROI) and key genes of both, and conduct a series of bioinformatics analyses.
Conclusion: Finally, this paper used the risk ROI and key genes of the AD and MCI groups to construct diagnostic models. The AUC of the AD group and MCI group were 0.8 and 0.797, respectively.
1. Introduction
Alzheimer’s disease (AD) is a neurodegenerative disease with insidious onset and progressive development. The most common early symptom is difficulty remembering recent events (Scheltens et al., 2021). As the disease progresses, patients gradually lose their ability to care for themselves and eventually die from complications such as infection (Scheltens et al., 2021). The exact cause of AD is still unknown, but it has a long-standing preclinical feature: mild cognitive impairment (MCI). The typical symptoms of MCI patients are memory loss and there may be damage to one or more cognitive areas (Carlson and Prusiner, 2021). Still, it is not enough to affect the patient’s daily life, and the diagnostic criteria for AD have not yet been met. Previous studies show that structural and functional abnormalities of the brain and phenotypic or molecular abnormalities associated with AD are associated (Simrén et al., 2021). However, the source or cause of these abnormalities is unclear.
In recent years, imaging genetic analysis related to brain diseases has attracted much attention. Imaging genetics explores the effect of genetic variation on brain structure, metabolism, and function through association analysis of radionics data (such as sMRI, PET, fMRI; Jiang et al., 2018) and genetic data (such as DNA methylation, SNP, gene expression; Du et al., 2020).
The association mechanism of imaging genetics is gradually revealed through robust association analysis algorithms. By improving sparse canonical correlation analysis (SCCA) with various strategies, Du et al. (2020) proposed various innovative algorithms based on SCCA to analyze brain imaging genetics data. For example, they proposed a dirty multi-task sparse canonical correlation analysis (dirty MT-SCCA) to study imaging genetic problems involving multimodal brain imaging. This method used multi-task learning and parameter decomposition to simultaneously identify pattern-consistent and pattern-specific brain regions and SNP loci. Furthermore, they proposed a parametric decomposition-based sparse multi-view canonical correlation analysis (PDSMCCA) method to identify modality-sharing and specific information from multimodal data to gain insight into the complex pathology of brain diseases (Zhang et al., 2022). However, the above two algorithms only supported the analysis of multimodal brain imaging and single-modal genetic data. The sparse multi-view sparse canonical correlation analysis (SMCCA) algorithm can simultaneously perform correlation analysis on data from multiple modalities. However, the direct fusion of multiple SCCA objectives can cause gradient domination problems, resulting in SMCCA being a suboptimal model. Therefore, they proposed two adaptive multi-view canonical correlation analysis algorithms to solve the problem of gradient domination, integrating the underlying relationships between protein expression, SNPs, and neuroimaging (Du et al., 2021).
On the other hand, multi-objective optimization algorithms are also emerging in the field of imaging genetics. Bi et al. (2022) integrated the fMRI and SNP data of Parkinson’s patients through multiple optimization methods for multimodal analysis. They used a correlation analysis method to construct fused features from the sequence information and SNPs of regions of interest (ROIs). Then, a weighted evolution strategy was introduced into ensemble learning, and a new weighted evolutionary random forest (WERF) model was constructed to eliminate inefficient features. In addition, they also proposed a clustered evolutionary random forest (CERF) method to detect discriminative genes and brain regions and found some interesting associations between brain regions and genes in Parkinson’s patients (Bi et al., 2021).
However, the above algorithms cannot interpret the results from the perspective of network regulation. Therefore, this paper used a hypergraph-based netNMF (HG-netNMF) algorithm to explore the relationship between genes and sMRI from both gene network and brain network construction. The algorithm exploited hypergraph regularization to mine higher-order features of both modal data. Further, the key genes and ROIs were explored from the critical network modules. Bioinformatics analysis and diagnostic model construction were performed to provide new insights into the imaging genetic association mechanism of AD and MCI.
2. Methods
2.1. Nonnegative matrix factorization
Nonnegative matrix factorization (NMF) algorithm is a classic dimensionality reduction algorithm that were widely used in image processing, audio processing, biological data analysis, and other fields (Kim and Tidor, 2003; Zhang et al., 2012; Yang et al., 2021). It decomposes the matrix to obtain the basis matrix and the coefficient matrix , and its objective function is as follows:
2.2. Network non-negative matrix factorization
The traditional NMF algorithm can only decompose single-modal data and cannot analyze the network control module composed of multi-modal data. When is a symmetric similarity matrix, Chen and Zhang (2018) proposed the netNMF algorithm to decompose multiple matrices simultaneously. It is worth noting that the algorithm can be used to analyze the network modules of various data. For example, their respective correlation networks and mutual correlation networks can be calculated for analysis for the two expression matrices , . The objective function of the netNMF algorithm is as follows:
Among them, , are the respective association networks of and . is the cross-correlation network of and . , , , are non-negative factor matrices. The algorithm can be analyzed from the network module level constructed by the data, and is used to identify the one-to-one relationship between the two types of modules.
In fact, , are the symmetric similarity matrices corresponding to the two types of features, respectively, and is the nonnegative similarity matrix between the two types of data. and represent the matrix describing the similarity of the two networks obtained from the decomposition of and , respectively. Off-diagonal elements in and indicate the importance of relationships between rois and between genes.
2.3. Hypergraph learning
In graph learning, vertices and edges can describe the relationship between multiple objects. However, simple graphs may not depict complex relationships in practical situations. Therefore, hypergraph theory is widely used to mine higher-order correlations between complex things (Xiao et al., 2020). A hypergraph can connect more than two vertices through hyperedges. It is an extension of simple graphs where each edge can connect multiple vertices, known as hyperedges. Let represents a hypergraph, and represent vertices, hyperedges, and the weights of vertices, respectively. is the set of vertices contained in one of the hyperedges is the set of hyperedges. corresponds to the weight of each hyperedge. In addition, the relationship between hyperedges and vertices can be represented by an association matrix shown on the right side of Figure 1. Among them, the element of matrix indicates whether the hyperedge contains the vertex. The element in the row and column of matrix is defined as follows:
Among them, . Furthermore, the degrees and of the vertex and the hyperedge are defined, respectively.
Figure 1. Line graph of the relationship between the number of neighbors and the value of the relative error when building a hypergraph in the two groups. (A,B) Are the cases of AD group and MCI group, respectively.
Further, define degree diagonal matrices and . Among them, ,. Furthermore, let represent the diagonal matrix of hyperedge weights, . Then, the similarity matrix that defines the hypergraph is as follows.
Similar to the Laplacian matrix definition for simple graphs, the Laplacian matrix for hypergraphs is defined as follows.
2.4. Hypergraph-based network non-negative matrix factorization
The previously introduced netNMF algorithm does not consider the correlation within different networks, while the hypergraph Laplacian matrix can integrate high-order correlation information of different modal data. It helps the algorithm identify more biologically meaningful network modules and improves its performance to a certain extent. This paper defines hypergraph regularization as follows.
and are the hypergraph Laplacian matrices of and , respectively. Further, let , and be Lagrangian multipliers, and (9) can be arranged as a Lagrangian function.
The following formula can be obtained using (10) to calculate the partial derivatives of , , and , respectively.
Further, through the Karush-Kuhn-Tucher (KKT) condition, the iteration rules of , , and can be obtained, as shown below.
2.5. Network module selection method
Two types of modules, and , can be identified from the ROI matrix and gene network matrix constructed in this paper. Specifically, zscore is introduced into this paper to calculate the weights of genes and ROIs, and genes and ROIs with weights above the threshold are treated as members of the corresponding community.
Among them, , . In this paper, the threshold was set to 1.
2.6. Evaluation indicators for regression analysis and diagnostic model construction
In this paper, regression analysis and diagnostic model construction were carried out on the Top elements in the network module obtained by the HG-netNMF algorithm. For regression analysis, this paper introduced mean absolute error (MAE) and root mean square error (RMSE) to evaluate the regression performance, and their definitions are as follows:
where represents the predicted value, and represents the true value. In addition, when constructing the diagnostic model, this paper introduced the Receiver Operating Characteristic (ROC) curve to evaluate the classification performance of the algorithm, the abscissa is the false positive rate (FPR), and the ordinate is the true positive rate (TPR). The AUC value is the area covered by the ROC curve. AUC can measure the classification effect of the classifier.
3. Results
3.1. Data acquisition and preprocessing
The real data used in this paper are all from the ADNI database. We collected sMRI imaging and gene expression data from 306 subjects in ADNI. Table 1 gives specific information of the samples.
For sMRI data, this paper realized the segmentation of sMRI based on the CAT toolkit of MATLAB software. Specifically, the CAT toolkit provided a voxel-based morphometric measurement (VBM) function (Veres et al., 2009). Finally, the gray matter volumes of 140 ROIs were extracted as imaging features. Differential expression analysis was performed using the Limma package for gene expression data. Specifically, we used the AD group and the MCI group as the diseased group and the HC group as the control group to experiment. Differentially expressed genes with values less than 0.01 in the AD and MCI groups were retained (510 genes were retained in the AD group, and 314 genes were retained in the MCI group).
In addition, this paper firstly divided the AD group and the MCI group into the training set and the test set according to the ratio of 8:2 and used the HG-netNMF algorithm to perform network module association analysis on the training set of the two groups of data, and then validate on the test set, and use Top 10 genes to regress Top 10 ROIs, build diagnostic models, etc.
3.2. The influence of neighbor size
In this paper, the K-nearest neighbors (KNN) method was used to construct the hyperedge, and the neighbor size of the KNN algorithm needs to be selected. The size of the neighbors was selected based on the size of the objective function value. Based on the selection experience of previous papers, we divided the parameters of the two groups. Figures 1A,B below give the line graphs of the relationship between the neighbor size of the AD group and the MCI group and the objective function value, respectively.
The AD group’s smallest relative error was 0.4496 (corresponding to 31 and 5). The MCI group’s smallest relative error was 0.4848 (corresponding to 5 and 47).
3.3. Parameter selection
Two parameters need to be selected in this paper, namely and . The objective function value was selected as the selection standard, and parameters were selected from the range of [0.0001, 0.001, 0.01, 0.1]. The schematic diagram of parameter selection was shown in Figure 2.
Figure 2. The line graph of parameter selection and objective function value of AD group and MCI group. (A,B) Are the cases of AD group and MCI group, respectively.
In addition, according to the selection experience of previous literature, the value of generally does not exceed one-tenth of the minimum number of samples or features for the number of genes-ROIs network modules . Therefore, the value in this paper was set to 7 in the AD group and 22 in the MCI group.
3.4. Algorithm comparison
To confirm the superiority of the algorithm, the algorithm was compared on simulated data and the real data. First, on simulated data, this paper compared the objective function values of the two algorithms at different noise levels. The generation method of simulated data was similar to Kim et al. (2019). Specifically, this paper simulated the sMRI numerical matrix and the gene numerical matrix of 300 random samples. By defining a normal distribution , generate an ROI weight vector with 200 elements and a gene weight vector with 2000 elements. In addition, this paper created noise , which comes from a normal distribution . Next, correlated and uncorrelated ROI and gene variables were generated similarly: and and and . The respective and mutual Pearson correlation coefficients were then calculated for the two variables as inputs to the algorithm (, , and ). Next, the objective function values of the proposed method and the netNMF algorithm were compared under the same experimental conditions and noise levels (Table 2). In addition, we simulated the anti-noise performance of the two algorithms when the sample size is large. Specifically, we set the total number of samples to 1,000, and the remaining parameters are consistent with the above to obtain the objective function values of the two algorithms under different noise levels (Table 3).
Table 2. The objective function values of the two algorithms were compared on a simulation dataset with a small sample size.
Table 3. The objective function values of the two algorithms were compared on a simulation dataset with a large sample size.
In real data, this paper compared the results of the proposed HG-netNMF algorithm with the results obtained by the netNMF algorithm for the AD group and the MCI group. Specifically, we calculated the Pearson correlation coefficients between the reconstructed matrices , , and and the three original matrices , , and , as shown in Table 4. In addition, we give the formulas for , and in the following formula.
Table 4. Pearson correlation coefficients between the three original matrices and the three reconstructed matrices obtained by the two algorithms in the AD group and the MCI group.
represents the Pearson correlation coefficient between and . It can be seen from Table 3 that the proposed algorithm outperforms the netNMF algorithm in the reconstruction of and in the AD group. In the MCI group, the proposed algorithm outperformed the netNMF algorithm for reconstructing the three original matrices. This confirmed that hypergraph regularization contributes to the improvement of algorithm reconstruction performance. In addition, this paper also drew the scatter plots of the two algorithms between the three original matrices and the three reconstructed matrices in the AD group and the MCI group, as shown in Figure 3.
Figure 3. Correlation scatter plots of , , and of the two algorithms and their reconstruction matrices , , and in the AD and MCI groups. (A,E,I) Are three sets of scatter plots obtained by the netNMF algorithm in the AD group. (B,F,J) Are three groups of scatter plots obtained by the HG-netNMF algorithm in the AD group. (C,G,K) Are three sets of scatter plots obtained by the netNMF algorithm in the MCI group. (D,H,L) Are three groups of scatter plots obtained by the HG-netNMF algorithm in the MCI group.
As seen from Figure 3, the two algorithms had comparable reconstruction performance when reconstructing and . However, the proposed algorithm performed better when reconstructing in the MCI group.
3.5. Network module selection
Fourteen network modules were obtained in the AD group, of which module 1 and 10 contained less than five genes, so they were eliminated. In the MCI group, 14 modules were obtained, of which modules 1, 3, 6, 9, and 11 contained less than five ROIs, and modules 2, 3, 4, 7, and 9 contained less than five genes. Therefore, the above modules were eliminated. Figure 4 below shows the number of elements retained in the AD and the MCI group and the respective reconstruction errors and total reconstruction errors of the two modal features.
Figure 4. Histogram of network module selection in the AD group and MCI group. (A,B) Are the element numbers of ROIs and genes in the retained modules in the AD and MCI groups, respectively. (C,D) Are the ROIs, genes, and relative errors of the two in the retained modules in AD and MCI groups, respectively.
As shown in Figure 4, module 3 of the AD group and module 10 of the MCI group had minor relative errors, and subsequent analysis of these two modules will be performed later.
3.6. Significant network module analysis
First, this paper constructed a PPI network using the ROIs and genes in network module 3 in the AD group, respectively. Specifically, this paper selected elements in module 3 from and to form PCC pairs with a one-to-one correspondence between two components. The AD group’s mean of the PCC of gene–gene and ROI-ROI was 0.902. Therefore, in this paper, the PCC threshold of ROI and genes was set to 0.9, and uses the relationship pair of PCC > 0.9 to construct the ROI-ROI interaction network and the gene–gene interaction network. Furthermore, the interaction network model was visualized using Cytoscape (version 3.9.1; Shannon et al., 2003). The Matthews Correlation Coefficient metric (MCC) algorithm (Boughorbel et al., 2017) was widely used as a performance metric in bioinformatics. This paper used the MCC algorithm to mine the scores of ROIs and genes related to other network nodes and then arranges the top 10 ROIs and genes as the risk ROI and key genes. The MCI group’s mean of the PCC of gene–gene and ROI-ROI was 0.604. Therefore, in this paper, the PCC threshold of ROI and genes was set to 0.60, and used the relationship pair of PCC > 0.6 to construct the ROI-ROI interaction network and the gene–gene interaction network. The PPI networks of the two groups are shown in Figure 5.
Figure 5. Visualization of ROI-ROI and gene–gene interaction networks in the AD and MCI groups. (A,B) are the network diagrams of the ROIs and genes of the AD group, respectively. (C,D) Are the network maps of the ROIs and genes of the MCI group, respectively.
3.7. Key regions of interests and genes selection
In this paper, the MCC algorithm was used to obtain the scores of the Top nodes in the four network graphs of the AD and the MCI groups, as shown in Tables 5, 6. A higher score in the table represents a more critical role for this ROIs/gene in MCI/AD. So we discussed these ROIs/genes in detail in the subsequent discussion section. The top brain regions in both groups were also visualized (Figure 6, Figure 7). We also drew Venn diagrams for ROI and gene nodes in the AD group and MCI group network (Figure 8).
Figure 8. ROI and gene intersection of Top 10 in AD and MCI groups. (A) Is the intersection of ROI selected by AD group and MCI group. (B) Is the intersection of ROI selected by AD group and MCI group.
4. Discussion
4.1. Analysis of the biological significance of key regions of interests and genes
We first analyzed the biological significance of the Top 10 ROIs in the AD group. A pilot study showed that in AD, Aβ deposition in the inferior temporal gyrus was strongly associated with gray matter atrophy in brain region BF-227 (Maeno, 2019). The left lingual gyrus is associated with changes in functional connectivity at the local network level in AD (Chang et al., 2020). The left angular gyrus is associated with increased brain metabolism in AD patients (Weissberger et al., 2017). Studies on rs-fMRI have shown that the left angular gyrus is the brain functional connectivity region showing the most significant discrimination (Wang et al., 2021). The left inferior frontal gyrus is associated with genetic variants of cortical atrophy in AD, contributing to further understanding AD’s genetic basis (Kim et al., 2021). Maeno (2019) found that Aβ deposition in the anterior cuneiform of AD patients was associated with atrophy of the right occipitotemporal region. The left medial orbital age has a strong negative correlation, which is valid in the age adjustment of AD subjects (Wang et al., 2021).
We also analyzed the Top ROIs in the MCI group’s biological significance. The left entorhinal cortex decreases in volume and cognitive and motor dysfunction in older adults with mild cognitive impairment (Sakurai et al., 2019). Cerebral blood flow (rCBF) in the left lateral orbital gyrus was decreased in the dizzy MCI group compared with the non-dizzy MCI group (Na et al., 2020). In an ALFF-based fMRI study, researchers found significantly increased ALFF values in the right lingual gyrus of MCI patients compared with HC (Lai et al., 2022). Occipital gray matter volume correlates with neuropsychological performance in patients with amnestic MCI or mild AD (Arlt et al., 2013). The left superior parietal lobule is associated with modulating the amplitude of low-frequency fluctuations (ALFF) in patients with mild cognitive impairment (MCI) (Zhuang et al., 2020). Chen et al. (2020) found atrophy of the Para hippocampus gyrus structure in MCI. In addition, it can be found from Tables 4, 5 that the left inferior frontal gyrus and left inferior frontal gyrus are the intersection ROIs of the two groups. Here we found that these two brain regions were selected in both the AD and MCI groups. Using the left inferior frontal gyrus as a seed, Pistono et al. (2021) found that the functional connectivity of the language network could better discriminate the MCI and AD participants than the executive control network, revealing an increase in connectivity during the MCI phase. To study the relationship between MCI patients and apathy, the researchers divided MCI patients into those with and without “SPECT images suggestive of AD.” They found that apathy negatively correlated with regional cerebral blood flow in the bilateral fusiform gyrus (Kazui et al., 2016).
Next, this paper also performed the same analysis on the Top 10 genes. Two significant features of AD are transcriptome dysregulation and altered RNA-binding protein (RBP) function (Chen et al., 2020). TNS1-related Gene Ontology annotations included RNA binding and actin binding. Variations in actin-involved endocytosis pathways are significant contributors to the overall regulation of genetic risk for AD (Pistono et al., 2021). TRIM58 gene induces E3 ubiquitin ligase in late erythropoiesis (Kazui et al., 2016). Experiments have confirmed that α-synuclein in red blood cells may help differentiate AD from HC. Erythropoiesis is associated with FECH. The protein encoded by FECH is localized to mitochondria (Kazui et al., 2016). GLRX5 encodes a mitochondrial protein (Rybak-Wolf and Plass, 2021). E2F2 is involved in the control of the cell cycle, and Zhou et al. (2021) identified vital cell cycle regulators, helping to develop potential pathways for optimal AD treatment. BCL2L1 was identified as a core target of tau pathogenesis (Tesi et al., 2020).
TXN is associated with cellular senescence, and Gene Ontology annotations associated with this gene include RNA binding and oxidoreductase activity. Chico et al. (2013) found that superoxide dismutase activity was reduced in MCI compared to controls. SLC15A3 is involved in the innate immune response, and Fiala et al. (2017) found that the innate immune system in MCI patients is highly increased or decreased through the transcription of inflammatory genes. LST1 may regulate immune responses, and peripheral innate immune responses had the highest activation level in the MCI group compared with the subjective memory complaints (SMC) and AD groups (Munawara et al., 2021). The CCDC9 gene is a possible component of the exon junction complex (EJC), which is involved in mRNA translation, one of the common pathways of MCI (Fernández-Martínez et al., 2020). LRP3 is involved in regulating gene expression, and analysis of gene expression data from blood may help differentiate MCI from AD (Bottero and Potashkin, 2019).
4.2. Analysis of MMSE based on top features
In this paper, the Top 10 ROIs and 10 genes of the AD and the MCI groups were used to predict MMSE (stands for root mean square error) to confirm the correlation between the Top features and the clinical score. Specifically, this paper first used support vector regression (SVR), random forest (RF), and K-Nearest Neighbors (KNN) algorithms to regress the MMSE of the two groups on the training sets of the two groups, respectively. The regression effect was evaluated using MAE and RMSE, as shown in Figure 9.
Figure 9. Histograms obtained by using three regression methods (RF, SVM and KNN) to regress MMSE and using two regression evaluation indicators (MAE and MSE). (A,B) Are the regression results using the Top ROIs and genes of the AD group, respectively. (C,D) Are the regression results of Top ROIs genes using the MCI group, respectively.
It can be found from Figures 9A,B that using KNN regression algorithm in AD group can get smaller MAE and RMSE in most cases. From Figures 9C,D, it can be found that using the RF algorithm in AD group can get smaller MAE and RMSE in most cases.
4.3. Correlation analysis between top regions of interests and genes
This paper drew a correlation heat map for the Top 10 ROIs and genes on the test set in the two groups obtained above, as shown in Figure 10.
Figure 10. Heatmap of the correlation of Top ROIs and genes on the test set. (A,B) Are the correlation heatmaps of AD group and healthy control group, respectively. (C,D) Are the correlation heatmaps of the MCI group and the healthy control group, respectively.
As seen in Figure 10, there is a strong correlation between the Top features of the AD group data. This paper takes the absolute value of PCC and then calculates the average correlation. Among them, the average correlation for the AD group was 0.5411, and the correlation of the control group was 0.1045. The mean correlation was 0.1039 in the MCI group and 0.2619 in the control group.
4.4. Regression analysis between top features
In this section, this paper still used three regression algorithms: SVR, RF, and KNN. Specifically, this paper uses the Top 10 genes selected by the HG-netNMF algorithm to regress to the Top 10 ROIs, respectively, trains three models on the training set, and uses MAE and RMSE to evaluate the regression effect on the test set, as shown in Figure 11.
Figure 11. Histograms of errors (MAE and RMSE) obtained by regression prediction of Top 10 ROIs using Top10 genes in the AD and MCI groups, respectively. (A,B) Are the regression results of two groups, respectively.
It can be found from Figure 11 that using different regression algorithms to predict the left lingual gyrus of the AD group can achieve the smallest root mean square error. Using different regression algorithms to predict the right frontal operculum of the MCI group can achieve the smallest root mean square error.
4.5. Classification analysis
This paper’s ROC curves were drawn using the Top 10 ROIs and genes of the AD and MCI groups, respectively. In addition, this paper used Logistic regression in IBM SPSS Statistics software to build a joint diagnostic model, as shown in Figure 12. This paper also counted the specific information of the four diagnostic models, as shown in Table 7.
Figure 12. ROC curves obtained from the diagnostic models of AD and MCI constructed using Top ROIs and genes of AD and MCI groups, respectively. (A,B) Are the diagnostic models of AD constructed using the Top features of the AD group, respectively. (C,D) Are the diagnostic models of MCI constructed using the Top features of the MCI group, respectively.
Next, to determine whether the ROIs and genes involved in the construction of the diagnostic model also have diagnostic significance for AD and MCI, this paper drew ROC curves for the ROIs and genes involved in the construction of the diagnostic model in Figure 9 to verify one by one, as shown in Figure 10. In addition, the ROC curve information of a single ROI and gene in the two groups was calculated separately, as shown in Tables 8, 9.
It can be seen from Figure 13 that most of the AUCs of ROI/gene involved in the construction of the diagnostic model are more significant than 0.5, which has diagnostic significance for AD and MCI, further confirming the effectiveness of the algorithm.
Figure 13. Single validation of Top 10 ROIs and genes for the two groups. (A–D) Are the ROC curves of the first five and last five ROIs of the AD group, respectively. (E–H) Are the ROC curves of the first five and last five ROIs of the MCI group.
5. Conclusion
This paper proposed a hypergraph-based NetNMF method to integrate sMRI and genetic data of AD and MCI patients and mined their respective regulatory networks and interaction information, aiming to explore the risk ROIs and genes associated with AD and MCI. Finally, robust diagnostic models for AD and MCI were constructed, respectively. In future work, we will integrate more data types to analyze the disease-related regulatory mechanisms more comprehensively.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found at: https://adni.loni.usc.edu.
Author contributions
JT contributed to the conception of the study. JZ and JT performed the experiment. XX and TL contributed significantly to analysis and manuscript preparation. ZC and RC performed the data analyses and wrote the manuscript. JC and XL helped to perform the analysis with constructive discussions. All authors have read and approved the manuscript.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Arlt, S., Buchert, R., Spies, L., Eichenlaub, M., Lehmbeck, J. T., and Jahn, H. (2013). Association between fully automated MRI-based volumetry of different brain regions and neuropsychological test performance in patients with amnestic mild cognitive impairment and Alzheimer's disease. Eur. Arch. Psychiatry Clin. Neurosci. 263, 335–344. doi: 10.1007/s00406-012-0350-7
Bi, X.-A., Wu, H., Xie, Y., Zhang, L., Luo, X., Fu, Y., et al. (2021). The exploration of Parkinson's disease: a multi-modal data analysis of resting functional magnetic resonance imaging and gene data. Brain Imaging Behav. 15, 1986–1996. doi: 10.1007/s11682-020-00392-6
Bi, X.-a., Xing, Z.-X., Zhou, W., Li, L., and Xu, L. (2022). Pathogeny detection for mild cognitive impairment via weighted evolutionary random forest with brain imaging and genetic data. IEEE J. Biomed. Health Inf. 26, 3068–3079. doi: 10.1109/JBHI.2022.3151084
Bottero, V., and Potashkin, J. A. (2019). Meta-analysis of gene expression changes in the blood of patients with mild cognitive impairment and Alzheimer's disease dementia. Int. J. Mol. Sci. 20:5403. doi: 10.3390/ijms20215403
Boughorbel, S., Jarray, F., and el-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS One 12:e0177678. doi: 10.1371/journal.pone.0177678
Carlson, G. A., and Prusiner, S. B. (2021). How an infection of sheep revealed prion mechanisms in Alzheimer's disease and other neurodegenerative disorders. Int. J. Mol. Sci. 22:4861. doi: 10.3390/ijms22094861
Chang, Y. T., Hsu, J. L., Huang, S. H., Hsu, S. W., Lee, C. C., and Chang, C. C. (2020). Functional connectome and neuropsychiatric symptom clusters of Alzheimer's disease? J. Affect. Disord. 273, 48–54. doi: 10.1016/j.jad.2020.04.054
Chen, S., Xu, W., Xue, C., Hu, G., Ma, W., Qi, W., et al. (2020). Voxelwise meta-analysis of gray matter abnormalities in mild cognitive impairment and subjective cognitive decline using activation likelihood estimation. J. Alzheimers Dis. 77, 1495–1512. doi: 10.3233/JAD-200659
Chen, J., and Zhang, S. (2018). Discovery of two-level modular organization from matched genomic data via joint matrix tri-factorization. Nucleic Acids Res. 46, 5967–5976. doi: 10.1093/nar/gky440
Chico, L., Simoncini, C., Lo Gerfo, A., Rocchi, A., Petrozzi, L., Carlesi, C., et al. (2013). Oxidative stress and APO E polymorphisms in Alzheimer's disease and in mild cognitive impairment. Free Radic. Res. 47, 569–576. doi: 10.3109/10715762.2013.804622
Du, L., Liu, F., Liu, K., Yao, X., Risacher, S. L., Han, J., et al. (2020). Associating multi-modal brain imaging phenotypes and genetic risk factors via a dirty multi-task learning method. IEEE Trans. Med. Imaging 39, 3416–3428. doi: 10.1109/TMI.2020.2995510
Du, L., Zhang, J., Liu, F., Wang, H., Guo, L., Han, J., et al. (2021). Identifying associations among genomic, proteomic and imaging biomarkers via adaptive sparse multi-view canonical correlation analysis. Med. Image Anal. 70:102003:102003. doi: 10.1016/j.media.2021.102003
Fernández-Martínez, J. L., Álvarez-Machancoses, Ó., de Andrés-Galiana, E. J., Bea, G., and Kloczkowski, A. (2020). Robust sampling of defective pathways in Alzheimer's disease. Implications in drug repositioning. Int. J. Mol. Sci. 21:3594. doi: 10.3390/ijms21103594
Fiala, M., Kooij, G., Wagner, K., Hammock, B., and Pellegrini, M. (2017). Modulation of innate immunity of patients with Alzheimer's disease by omega-3 fatty acids. FASEB J. 31, 3229–3239. doi: 10.1096/fj.201700065R
Jiang, J., Sun, Y., Zhou, H., Li, S., Huang, Z., Wu, P., et al. (2018). Study of the influence of age in (18)F-FDG PET images using a data-driven approach and its evaluation in Alzheimer's disease. Contrast Media Mol. Imaging 2018:3786083. doi: 10.1155/2018/3786083
Kazui, H., Takahashi, R., Yamamoto, Y., Yoshiyama, K., Kanemoto, H., Suzuki, Y., et al. (2016). Neural basis of apathy in patients with amnestic mild cognitive impairment. J. Alzheimers Dis. 55, 1403–1416. doi: 10.3233/JAD-160223
Kim, M., Ji, H. W., Youn, J., and Park, H. (2019). Joint-connectivity-based sparse canonical correlation analysis of imaging genetics for detecting biomarkers of parkinson's disease. IEEE Trans. Med. Imaging 99:1. doi: 10.1109/TMI.2019.2918839
Kim, B. H., Nho, K., and Lee, J. M. (2021). Genome-wide association study identifies susceptibility loci of brain atrophy to NFIA and ST18 in Alzheimer's disease. Neurobiol. Aging 102, 200.e1–200.e11. doi: 10.1016/j.neurobiolaging.2021.01.021
Kim, P. M., and Tidor, B. (2003). Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Res. 13, 1706–1718. doi: 10.1101/gr.903503
Lai, Z., Zhang, Q., Liang, L., Wei, Y., Duan, G., Mai, W., et al. (2022). Efficacy and mechanism of Moxibustion treatment on mild cognitive impairment patients: an fMRI study using ALFF. Front. Mol. Neurosci. 15:852882. doi: 10.3389/fnmol.2022.852882
Maeno, N. (2019). Correlation between β-amyloid deposits revealed by BF-227-PET imaging and brain atrophy detected by voxel-based morphometry-MR imaging: a pilot study. Nucl. Med. Commun. 40, 905–912. doi: 10.1097/MNM.0000000000001042
Munawara, U., Catanzaro, M., Xu, W., Tan, C., Hirokawa, K., Bosco, N., et al. (2021). Hyperactivation of monocytes and macrophages in MCI patients contributes to the progression of Alzheimer's disease. Immun. Ageing 18:29. doi: 10.1186/s12979-021-00236-x
Na, S., Im, J. J., Jeong, H., Lee, E. S., Lee, T. K., Chung, Y. A., et al. (2020). Altered regional cerebral blood perfusion in mild cognitive impairment patients with dizziness. Diagnostics (Basel) 10:777. doi: 10.3390/diagnostics10100777
Pistono, A., Senoussi, M., Guerrier, L., Rafiq, M., Giméno, M., Péran, P., et al. (2021). Language network connectivity increases in early Alzheimer's disease. J. Alzheimers Dis. 82, 447–460. doi: 10.3233/JAD-201584
Rybak-Wolf, A., and Plass, M. (2021). RNA dynamics in Alzheimer's disease. Molecules 26:5113. doi: 10.3390/molecules26175113
Sakurai, R., Bartha, R., and Montero-Odasso, M. (2019). Entorhinal cortex volume is associated with dual-task gait cost among older adults with MCI: results from the gait and brain study. J. Gerontol. A Biol. Sci. Med. Sci. 74, 698–704. doi: 10.1093/gerona/gly084
Scheltens, P., De Strooper, B., Kivipelto, M., Holstege, H., Chételat, G., Teunissen, C. E., et al. (2021). Alzheimer's disease. Lancet 397, 1577–1590. doi: 10.1016/S0140-6736(20)32205-4
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504. doi: 10.1101/gr.1239303
Simrén, J., Leuzy, A., Karikari, T. K., Hye, A., Benedet, A. L., Lantero-Rodriguez, J., et al. (2021). The diagnostic and prognostic capabilities of plasma biomarkers in Alzheimer's disease. Alzheimers Dement. 17, 1145–1156. doi: 10.1002/alz.12283
Tesi, N., van der Lee, S. J., Hulsman, M., Jansen, I. E., Stringa, N., van Schoor, N. M., et al. (2020). Immune response and endocytosis pathways are associated with the resilience against Alzheimer's disease. Transl. Psychiatry 10:332. doi: 10.1038/s41398-020-01018-7
Veres, S M, Molnar, L, and Lincoln, N K. The Cognitive Agents Toolbox (CAT)–Programmingautonomous Vehicles. South Yorkshire: Sysbrain Ltd. (2009).
Wang, S. M., Kim, N. Y., Kang, D. W., Um, Y. H., Na, H. R., Woo, Y. S., et al. (2021). A comparative study on the predictive value of different resting-state functional magnetic resonance imaging parameters in preclinical Alzheimer's disease. Front. Psych. 12:626332. doi: 10.3389/fpsyt.2021.626332
Weissberger, G. H., Melrose, R. J., Fanale, C. M., Veliz, J. V., and Sultzer, D. L. (2017). Cortical metabolic and cognitive correlates of disorientation in Alzheimer's disease. J. Alzheimers Dis. 60, 707–719. doi: 10.3233/JAD-170420
Xiao, L., Wang, J., Kassani, P. H., Zhang, Y., Bai, Y., Stephen, J. M., et al. (2020). Multi-Hypergraph learning-based brain functional connectivity analysis in fMRI data. IEEE Trans. Med. Imaging 39, 1746–1758. doi: 10.1109/TMI.2019.2957097
Yang, B., Zhang, X., Nie, F., Wang, F., Yu, W., and Wang, R. (2021). Fast multi-view clustering via nonnegative and orthogonal factorization. IEEE Trans. Image Process. 30, 2575–2586. doi: 10.1109/TIP.2020.3045631
Zhang, S., Liu, C.-C., Li, W., Shen, H., Laird, P. W., and Zhou, X. J. (2012). Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 40, 9379–9391. doi: 10.1093/nar/gks725
Zhang, J., Wang, H., Zhao, Y., Guo, L., and Du, L., Alzheimer’s Disease Neuroimaging Initiative (2022). Identification of multimodal brain imaging association via a parameter decomposition based sparse multi-view canonical correlation analysis method. BMC Bioinformatics 23:128:128. doi: 10.1186/s12859-022-04669-z
Zhou, Z., Bai, J., Zhong, S., Zhang, R., Kang, K., Zhang, X., et al. (2021). Integrative functional genomic analysis of molecular signatures and mechanistic pathways in the cell cycle underlying Alzheimer's disease. Oxidative Med. Cell. Longev. 2021:5552623. doi: 10.1155/2021/5552623
Keywords: non-negative matrix factorization, Alzheimer’s disease, mild cognitive impairment, hypergraph learning, biomarkers
Citation: Zhuang J, Tian J, Xiong X, Li T, Chen Z, Chen R, Chen J and Li X (2023) Associating brain imaging phenotypes and genetic risk factors via a hypergraph based netNMF method. Front. Aging Neurosci. 15:1052783. doi: 10.3389/fnagi.2023.1052783
Edited by:
Balaji Krishnan, University of Texas Medical Branch at Galveston, United StatesReviewed by:
Xianglian Meng, Changzhou Institute of Technology, ChinaChao Huang, Florida State University, United States
Copyright © 2023 Zhuang, Tian, Xiong, Li, Chen, Chen, Chen and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaoxing Xiong, eGlhb3hpbmd4aW9uZ0B3aHUuZWR1LmNu; Taihan Li, bGl0aDE3QGx6dS5lZHUuY24=
†Authors note