Novel Collaborative Weighted Non-negative Matrix Factorization Improves Prediction of Disease-Associated Human Microbes

Xu, Da; Xu, Hanxiao; Zhang, Yusen; Gao, Rui

doi:10.3389/fmicb.2022.834982

ORIGINAL RESEARCH article

Front. Microbiol., 10 March 2022

Sec. Systems Microbiology

Volume 13 - 2022 | https://doi.org/10.3389/fmicb.2022.834982

Da Xu¹

Hanxiao Xu¹

Yusen Zhang¹*

Rui Gao²*

¹School of Mathematics and Statistics, Shandong University, Weihai, China
²School of Control Science and Engineering, Shandong University, Jinan, China

Extensive clinical and biomedical studies have shown that the microbiome plays a prominent role in human health. Identifying potential microbe–disease associations (MDAs) can help reveal the pathological mechanisms of human diseases and support their prevention, diagnosis, and treatment. It is therefore necessary to develop effective computational models that reduce the cost and time of biological experiments. Here, we developed a novel machine learning-based joint framework called CWNMF-GLapRLS for human MDA prediction, using the proposed collaborative weighted non-negative matrix factorization (CWNMF) technique and graph Laplacian regularized least squares. Specifically, to fuse more similarity information, we calculated the functional similarity of microbes. To deal with missing values and overcome the data sparsity problem, we proposed a collaborative weighted NMF technique to reconstruct the original association matrix. In addition, we developed a graph Laplacian regularized least-squares method for prediction. Fivefold and leave-one-out cross-validation showed that our method outperformed 5 state-of-the-art methods on the benchmark dataset. Case studies further showed that the proposed method is an effective tool for predicting potential MDAs and can assist biomedical researchers.

Introduction

Extensive clinical and biomedical studies have shown that microbiome has a prominent role in human health and disease. More than 100 trillion (10¹⁴) microbes inhabit the human gut and constitute a nutrient-rich environment where symbiotic relationships are of benefit to the host (Ley et al., 2006; Lozupone et al., 2012). Therefore, gut flora is often referred to as the “forgotten organ” (O’Hara and Shanahan, 2006). Once the balance is broken or the symbiotic relationship is disturbed, this close relationship will carry risks for the development of the disease, including cardiovascular disease (Wang et al., 2011), neurological disease (Tremlett et al., 2017), cancer (Schwabe and Jobin, 2013), inflammatory bowel disease (IBD) (Hossen et al., 2020), and so on. To better understand the medical and biological significance of the human microbiome, some large projects have been launched and made substantial progress, such as the project of metagenomics of the human intestinal tract (Ehrlich, 2011; Cho and Blaser, 2012) and the Human Microbiome Project (HMP) (Turnbaugh et al., 2007).

Studies of the microbiome have demonstrated a critical role for microbes in human health and disease. Given the complexity and diversity of the microbial community, it remains a challenge to fully understand the interaction mechanisms between microorganisms and human diseases, as well as the healthy composition and functional states of the human microbiome. Because the known disease-related microbes are insufficient, effective computational methods are needed to reduce the cost and time of biological experiments. Recently, many computational methods have been proposed and successfully applied in bioinformatics, such as miRNA–disease (Peng et al., 2018a; Chen et al., 2019) or drug–target (Chen et al., 2016) association prediction, and lncRNA–miRNA (Zhang et al., 2021), protein–protein (Xu et al., 2020a), or lncRNA–protein (Peng et al., 2021; Zhou et al., 2021) interaction prediction.

Fortunately, in 2016, a human microbe–disease association database was constructed by Ma et al. (2017). It provided a foundation for identifying potential MDAs through computational methods. Most of the methods developed since rely on the basic assumption that microbes with similar functions share similar interaction patterns with phenotype diseases (Zhao et al., 2020). Chen et al. (2017) proposed the first computational model, called KATZHMDA, for MDA prediction using the KATZ measure. With the rapid development of artificial intelligence and machine learning (Camacho et al., 2018; Xu et al., 2020b), several machine learning-based models were proposed. For instance, Wang et al. (2017) developed the LRLSHMDA method using Laplacian regularized least squares. In 2021, Xu et al. (2021b) developed a prediction model named MDAKRLS using multisimilarity and Kronecker regularized least squares and achieved better performance. Shi et al. (2018) designed a prediction model based on binary matrix completion.

In addition, there are some network-based computational methods. For example, Zou et al. (2017) and Luo and Long (2020) developed BiRWHMDA and NTSHMDA by random walk for prediction only using the Gaussian interaction profile (GIP) kernel similarity, respectively. Recently, several integrated model methods have also been proposed. For example, Huang et al. (2017) built a computational model by combining two single computational methods (graph-based and neighbor-based models). Qu et al. (2019) constructed an integrated model based on label propagation and matrix decomposition. Peng et al. (2020) developed a reliable negative sample selection method based on the random walk with restart and positive unlabeled learning, then used the logistic matrix factorization with neighborhood regularization for prediction. Yin et al. (2020) also designed an integrated method using label propagation and network consistency projection. Some matrix factorization-based computational methods have been proposed to solve microbe–disease association prediction tasks or similar questions. For example, He et al. (2018) designed a graph regularized non-negative matrix factorization (NMF) framework for prediction. In 2020, Gao et al. (2021) developed multilabel fusion collaborative matrix factorization to solve lncRNA–disease association prediction task. In 2021, Xu et al. (2021a) developed regularized NMF and obtained better prediction results in the lncRNA–protein interaction prediction. However, these models may not achieve better prediction results if the dataset is very sparse.

Some existing methods inevitably have certain limitations. For example, some methods used a single similarity that may cause these methods to be biased toward the fully studied diseases or microbes. Besides, constructions of some algorithms contain many artificial parameters, and it is not easy to select the best parameters for a new dataset, which may reduce the robustness of the model. The imbalance problem of the contribution of microbes and diseases needs to be considered since their numbers are different. The benchmark microbe–disease dataset is very sparse; it is essential to weaken the effect caused by the sparse dataset and let known observed data provide more effective information. Effective methods are still scarce since most MDAs remain unknown (Fan et al., 2019; Long et al., 2021). It is necessary to overcome or weaken these limitations and develop new computational methods to improve prediction performance.

In general, from the algebraic view, biological problems of association prediction could be transformed into matrix completion problems. With the rapid development of machine learning, matrix factorization is a useful tool that has been widely used for matrix completion and solving recommendation system problems. In addition, graph regularization-based methods have been successfully applied to semisupervised learning. Considering some limitations of the previous computation-based methods, to improve the prediction performance, we designed a novel method called CWNMF-GLapRLS for MDA prediction. It used the proposed collaborative weighted NMF technique to recover the sparse association matrix and used the developed graph Laplacian regularized least squares for prediction. The experimental results showed our method achieved superior performance. It is an effective tool to predict potential MDAs and can provide more help for biomedical researchers.

Materials and Methods

Dataset

In this study, a widely used benchmark dataset (HMDAD) was used in our experiments. It can be downloaded from http://www.cuilab.cn/hmdad, which was collected by Ma et al. (2017). It contains 292 human microbes, 39 diseases, and 483 experimentally confirmed associations. After filtering out repetitive associations, we obtained 450 associations for prediction. The summary of the microbe–disease association dataset is tabulated in Table 1.

TABLE 1

Table 1. Summary of microbe–disease association dataset.

Overview of the Proposed Method

To predict potential MDAs, we proposed a novel machine learning-based joint framework named CWNMF-GLapRLS based on collaborative weighted non-negative matrix factorization (CWNMF) and graph Laplacian regularized least squares (GLapRLS). Figure 1 illustrates the flowchart of the prediction method, which can be decomposed into the following main steps. First, we calculate the functional similarity of microbes from the microbe–disease association network and the symptom-based disease similarity. Second, we obtain the GIP kernel similarities of diseases and microbes from the topological structure of the known association matrix. Third, we calculate the integrated similarities by similarity fusion. Fourth, the proposed CWNMF technique is used to reconstruct the association matrix. Finally, we use the designed GLapRLS to score the microbe–disease pairs.

FIGURE 1

Figure 1. The flowchart of CWNMF-GLapRLS framework for prediction.

Similarity Measures

For convenience, we define two sets D = {d₁, d₂, …, d_i, …, d_nd} and M = {m₁, m₂, …, m_j, …, m_nm}, which represent all diseases and all microbes, where nd is the number of diseases and nm is the number of microbes. We constructed a binary matrix X ∈ R^nd×nm to represent the microbe–disease association network:

X (i, j) = \begin{cases} 1, & \text{if disease } d_{i} \text{ is associated with microbe } m_{j} \\ 0, & \text{otherwise} \end{cases} (1)

For disease d_i, its interaction profile is represented by IP(d_i) ∈ {0, 1}^1×nm, which denotes the ith row of the binary matrix X. For microbe m_p, its interaction profile is denoted by IP(m_p) ∈ {0, 1}^nd×1, which represents the pth column of the binary matrix X.
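As an illustration of Eq. (1) and of the interaction profiles, the following minimal Python sketch builds X from a list of (disease, microbe) pairs. The function name `build_association_matrix` and the toy labels are hypothetical and are not taken from the original study; repeated pairs simply map to the same entry of X.

```python
import numpy as np

def build_association_matrix(pairs, diseases, microbes):
    """Return the nd x nm binary association matrix X of Eq. (1)."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(microbes)}
    X = np.zeros((len(diseases), len(microbes)), dtype=int)
    for d, m in pairs:
        X[d_index[d], m_index[m]] = 1  # repeated pairs collapse to a single 1
    return X

# Toy data (hypothetical labels, for illustration only).
diseases = ["d1", "d2"]
microbes = ["m1", "m2", "m3"]
pairs = [("d1", "m1"), ("d1", "m3"), ("d2", "m2"), ("d2", "m3"), ("d1", "m1")]

X_toy = build_association_matrix(pairs, diseases, microbes)
ip_d1 = X_toy[0, :]  # interaction profile of disease d1 (a row of X)
ip_m3 = X_toy[:, 2]  # interaction profile of microbe m3 (a column of X)
```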

Symptom-Based Disease Similarity

Several methods for calculating disease similarity have been proposed using different kinds of disease information. Symptom-based disease similarity has increasingly been shown to provide effective information for MDA prediction (Peng et al., 2018b; Zou et al., 2018). In this work, we also introduced symptom-based disease similarity and used $S_{d}^{S} \in R^{n d \times n d}$ to represent the similarity matrix, where $S_{d}^{S} (d_{i}, d_{j})$ is the similarity between diseases d_i and d_j. More details of the calculation can be found in a previous study (Zhou et al., 2014), which represented every disease by a vector of symptoms and used cosine similarity with the term frequency-inverse document frequency (TF-IDF) technique to calculate the similarity of diseases.

Microbe Functional Similarity

In this section, inspired by previous work (Zhang et al., 2018; Li et al., 2019) and the basic assumption that microbes will have similar interaction patterns with phenotype diseases that have similar symptoms, we proposed a method to calculate the functional similarity of microbes through the symptom-based disease similarity and association network.

First, suppose that microbes m_i and m_j are associated with M and N diseases, respectively. Let D_i = {d_i1, d_i2, …, d_ip, …, d_iM} and D_j = {d_j1, d_j2, …, d_jq, …, d_jN} denote the subsets of diseases in the database that are related to microbe m_i and microbe m_j, respectively. We then define the microbe functional similarity as follows:

S_{m}^{F} (m_{i}, m_{j}) = \frac{\sum_{p = 1}^{M} \max_{1 \leq q \leq N} S_{d}^{S} (d_{i p}, d_{j q}) + \sum_{q = 1}^{N} \max_{1 \leq p \leq M} S_{d}^{S} (d_{j q}, d_{i p})}{M + N} (2)

where $S_{d}^{S}$ denotes the symptom-based disease similarity matrix; $max_{1 \leq q \leq N} S_{d}^{S} (d_{i p}, d_{j q})$ represents the maximum similarity score between disease d_ip and all diseases of subset D_j; $S_{m}^{F}$ is defined as the microbe functional similarity matrix.
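The computation in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name and the toy matrices are hypothetical, and leaving the similarity at 0 for microbes without any known disease is an assumption made here.

```python
import numpy as np

def microbe_functional_similarity(X, S_d):
    """Eq. (2). X: (nd, nm) binary associations; S_d: (nd, nd) disease similarity."""
    X = np.asarray(X)
    S_d = np.asarray(S_d, dtype=float)
    nm = X.shape[1]
    # Disease subsets D_i: indices of diseases associated with each microbe.
    disease_sets = [np.flatnonzero(X[:, j]) for j in range(nm)]
    S_f = np.zeros((nm, nm))
    for i in range(nm):
        for j in range(i, nm):
            Di, Dj = disease_sets[i], disease_sets[j]
            if Di.size == 0 or Dj.size == 0:
                continue  # assumption: similarity stays 0 without known diseases
            first = S_d[np.ix_(Di, Dj)].max(axis=1).sum()   # sum_p max_q S(d_ip, d_jq)
            second = S_d[np.ix_(Dj, Di)].max(axis=1).sum()  # sum_q max_p S(d_jq, d_ip)
            S_f[i, j] = S_f[j, i] = (first + second) / (Di.size + Dj.size)
    return S_f

# Toy example (hypothetical values): 2 diseases, 3 microbes.
X_demo = np.array([[1, 0, 1],
                   [0, 1, 1]])
S_d_demo = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
S_f_demo = microbe_functional_similarity(X_demo, S_d_demo)
```

In the toy example, microbe 1 is linked only to the first disease and microbe 2 only to the second, so their functional similarity reduces to the similarity of those two diseases.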

Gaussian Interaction Profile Kernel Similarity

In this work, symptom-based disease similarity matrix $S_{d}^{S}$ and microbe functional similarity matrix $S_{m}^{F}$ are both sparse. To integrate more effective information and mine the topology information of known association networks as much as possible, we further introduced popular GIP kernel similarity to calculate the similarity of diseases and microbes (van Laarhoven et al., 2011; Xu et al., 2021b). First, IP(d_i) of disease d_i and IP(d_j) of disease d_j were extracted from the training microbe–disease association matrix. Then, we measure the GIP kernel similarity between disease pairs as follows:

S_{d}^{G} (d_{i}, d_{j}) = \exp (- σ_{d} {\| IP (d_{i}) - IP (d_{j}) \|}^{2}) (3)

σ_{d} = σ_{d}^{'} / (\frac{1}{nd} \sum_{k = 1}^{nd} {\| IP (d_{k}) \|}^{2}) (4)

where σ_d is a normalized kernel bandwidth computed through Eq. (4); $σ_{d}^{'}$ is an adjustment coefficient and was set to 1; $S_{d}^{G}$ denotes the GIP kernel similarity matrix of diseases.

Similarly, we can calculate the GIP kernel similarity of microbes:

S_{m}^{G} (m_{p}, m_{q}) = \exp (- σ_{m} {\| IP (m_{p}) - IP (m_{q}) \|}^{2}) (5)

σ_{m} = σ_{m}^{'} / (\frac{1}{nm} \sum_{k = 1}^{nm} {\| IP (m_{k}) \|}^{2}) (6)

where $σ_{m}^{'}$ is an adjustment coefficient and was set to 1; σ_m is a normalized kernel bandwidth computed through Eq. (6); $S_{m}^{G}$ represents the microbe GIP kernel similarity matrix.
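Equations (3) to (6) share one form, so a single helper can serve both diseases and microbes. The sketch below is a minimal NumPy illustration; the function name `gip_kernel` and the toy matrix are hypothetical, and raising an error for an all-zero association matrix is a choice made here, not something stated in the text.

```python
import numpy as np

def gip_kernel(profiles, sigma_prime=1.0):
    """GIP kernel similarity; `profiles` holds one interaction profile per row."""
    P = np.asarray(profiles, dtype=float)
    sq_norms = (P ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero; bandwidth undefined")
    sigma = sigma_prime / mean_sq_norm  # normalized bandwidth, Eqs. (4) and (6)
    # Pairwise squared Euclidean distances between profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sigma * sq_dist)     # Eqs. (3) and (5)

# Toy association matrix (hypothetical): rows are diseases, columns are microbes.
X_demo = np.array([[1, 0, 1],
                   [0, 1, 1]])
S_d_gip = gip_kernel(X_demo)    # disease profiles are the rows of X
S_m_gip = gip_kernel(X_demo.T)  # microbe profiles are the columns of X
```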

Integrated Similarities

Multisimilarity fusion is an effective technique that can fuse different feature information and improve performance. However, the microbe functional similarity matrix is sparse; not every microbe has a functional similarity. It may be unreasonable if the integrated similarity is calculated as a mean of functional similarity and GIP kernel similarity. This approach will dilute the GIP kernel similarity of the integrated similarity. To supplement and integrate more effective biological information for microbes, we defined an integrated similarity for microbes. The calculation of similarity between microbes m_p and m_q is defined as follows:

S_{m} (m_{p}, m_{q}) = \begin{cases} \frac{S_{m}^{F} (m_{p}, m_{q}) + S_{m}^{G} (m_{p}, m_{q})}{2}, & \text{if } S_{m}^{F} (m_{p}, m_{q}) \neq 0 \\ S_{m}^{G} (m_{p}, m_{q}), & \text{otherwise} \end{cases} (7)

where S_m ∈ R^nm×nm denotes the integrated microbe similarity matrix. Specifically, the final similarity is calculated as a mean if the microbe pair has a functional similarity. Otherwise, the GIP kernel similarity is assigned to the integrated similarity.

Similarly, the integrated similarity calculation method of diseases d_i and d_j is defined as follows:

S_{d} (d_{i}, d_{j}) = \begin{cases} \frac{S_{d}^{S} (d_{i}, d_{j}) + S_{d}^{G} (d_{i}, d_{j})}{2}, & \text{if } S_{d}^{S} (d_{i}, d_{j}) \neq 0 \\ S_{d}^{G} (d_{i}, d_{j}), & \text{otherwise} \end{cases} (8)

where S_d ∈ R^nd×nd denotes the integrated disease similarity matrix.
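The piecewise rule of Eqs. (7) and (8) maps directly onto an element-wise selection. A minimal sketch follows; the function name and the toy values are hypothetical.

```python
import numpy as np

def integrate_similarity(S_primary, S_gip):
    """Eqs. (7)/(8): average where the primary similarity is non-zero, else use GIP."""
    S_primary = np.asarray(S_primary, dtype=float)
    S_gip = np.asarray(S_gip, dtype=float)
    return np.where(S_primary != 0, (S_primary + S_gip) / 2.0, S_gip)

# Toy matrices (hypothetical values); the third item has no primary similarity.
S_primary_demo = np.array([[1.0, 0.2, 0.0],
                           [0.2, 1.0, 0.0],
                           [0.0, 0.0, 0.0]])
S_gip_demo = np.array([[1.0, 0.6, 0.5],
                       [0.6, 1.0, 0.3],
                       [0.5, 0.3, 1.0]])
S_integrated_demo = integrate_similarity(S_primary_demo, S_gip_demo)
```

Entries without a primary (functional or symptom-based) similarity keep their GIP value unchanged, which avoids the dilution that a plain mean would cause.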

Collaborative Weighted Non-negative Matrix Factorization

In general, to recover the association matrix, we could transform this biological problem into a recommendation task. NMF enforced non-negativity constraints on factor matrixes for a low-rank approximation of the non-negative matrix (Lee and Seung, 1999), which could ensure that every element can be represented as an additive linear combination of canonical coordinates. Microbe–disease binary association data X is a non-negative matrix. We could use the NMF for matrix completion or association prediction.

In this work, microbe–disease association data X is incomplete and sparse. To deal with missing values and effectively overcome the data sparsity problem, we introduced weighted non-negative matrix factorization (WNMF), which slightly changed classical NMF by introducing a weighting term. WNMF was first proposed to cope with missing values in large-scale networks for predicting and representing distances (Mao and Saul, 2004) and has been used for recommendation systems (Gu et al., 2010) to solve the incomplete data problem. The biological problem can be translated into minimizing the following objective:

\begin{matrix} J = \sum_{i = 1}^{nd} \sum_{j = 1}^{nm} Y_{i j} {(X_{i j} - {(W H^{T})}_{i j})}^{2} \\ \text{s.t. } W \geq 0, H \geq 0 \end{matrix} (9)

where X ∈ R^nd×nm is the training association data; the product of the non-negative matrices W ∈ R^nd×k and H ∈ R^nm×k approximates X, with k ≪ min{nd, nm}. Microbes and diseases are mapped into a shared latent space of low dimensionality k. Y is a non-negative weight matrix used to reduce the influence of missing values on matrix factorization, where Y_ij = 0 indicates that X_ij is a missing value and Y_ij = 1 indicates that X_ij is an observed value. The objective function degenerates into the standard NMF when all weights of matrix Y are equal to one.

Lee and Seung (2001) showed that the iterative update algorithm ensures convergence of the NMF objective function and is easy to implement. An iterative multiplicative updating algorithm has also been used to solve WNMF (Zhang et al., 2006). The objective function leads to the following update formulas:

w_{i k} = w_{i k} \frac{{((Y ⊙ X) H)}_{i k}}{{((Y ⊙ (W H^{T})) H)}_{i k}} (10)

h_{j k} = h_{j k} \frac{{({(Y ⊙ X)}^{T} W)}_{j k}}{{({(Y ⊙ (W H^{T}))}^{T} W)}_{j k}} (11)

where ⊙ is the Hadamard product. These update rules are computationally efficient.
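The multiplicative updates of Eqs. (10) and (11) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the small constant `eps` added to the denominators to avoid division by zero, the random initialization, and the toy data are all assumptions made here.

```python
import numpy as np

def wnmf(X, Y, k, n_iter=200, eps=1e-10, seed=0):
    """Weighted NMF via the multiplicative updates of Eqs. (10) and (11)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))  # nd x k, non-negative
    H = rng.random((X.shape[1], k))  # nm x k, non-negative
    YX = Y * X                       # Hadamard product Y ⊙ X
    for _ in range(n_iter):
        W *= (YX @ H) / ((Y * (W @ H.T)) @ H + eps)      # Eq. (10)
        H *= (YX.T @ W) / ((Y * (W @ H.T)).T @ W + eps)  # Eq. (11)
    return W, H

def wnmf_objective(X, Y, W, H):
    """Weighted reconstruction error of Eq. (9)."""
    return float((Y * (X - W @ H.T) ** 2).sum())

# Toy data (hypothetical): one entry is marked as missing through Y.
X_demo = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 1, 0, 0],
                   [0, 0, 0, 1, 1],
                   [1, 0, 0, 0, 1]], dtype=float)
Y_demo = np.ones_like(X_demo)
Y_demo[0, 1] = 0.0  # treat X[0, 1] as unobserved

W0, H0 = wnmf(X_demo, Y_demo, k=2, n_iter=0)  # initial factors only
W1, H1 = wnmf(X_demo, Y_demo, k=2, n_iter=200)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, and entries with Y_ij = 0 contribute nothing to either the numerator or the denominator.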

In 2021, Xu et al. (2021a) developed regularized NMF and obtained better prediction results in lncRNA–protein interaction prediction. That study showed that collaborative factorization of the similarity matrix can effectively guide matrix factorization and improve prediction performance. To introduce more effective similarity information into the factorization, we incorporated two collaborative regularization terms into the WNMF framework to fuse similarity information and constrain the two low-dimensional representations. The problem then becomes a constrained optimization problem, formulated as a joint matrix factorization of the association data and the similarity data, with the following objective function:

\begin{matrix} J = \sum_{i = 1}^{nd} \sum_{j = 1}^{nm} Y_{i j} {(X_{i j} - {(W H^{T})}_{i j})}^{2} \\ + λ_{1} {∥ S_{d} - W W^{T} ∥}_{F}^{2} + λ_{2} {∥ S_{m} - H H^{T} ∥}_{F}^{2} \\ \text{s.t. } W \geq 0, H \geq 0 \end{matrix} (12)

where ||⋅||_F is the Frobenius norm; λ₁ and λ₂ are non-negative regularization parameters balancing two collaborative regularization terms and the reconstruction error. The objective function will degenerate into WNMF if λ₁ and λ₂ are equal to zero.

To prevent overfitting and adjust the smoothness of W and H, we introduced the Tikhonov (L₂) regularization terms (Xiao et al., 2018) into the objective function and obtained the final collaborative weighted non-negative matrix factorization (CWNMF) objective function as follows:

\begin{matrix} J = \sum_{i = 1}^{nd} \sum_{j = 1}^{nm} Y_{i j} {(X_{i j} - {(W H^{T})}_{i j})}^{2} \\ + λ_{1} {∥ S_{d} - W W^{T} ∥}_{F}^{2} + λ_{2} {∥ S_{m} - H H^{T} ∥}_{F}^{2} \\ + α ({∥ W ∥}_{F}^{2} + {∥ H ∥}_{F}^{2}) \\ \text{s.t. } W \geq 0, H \geq 0 \end{matrix} (13)

where α is used to adjust the Tikhonov regularization terms, which is a regularization coefficient. To improve the robustness of the model, we set the same value for the same Tikhonov regularization terms, and α was set to 1 for the dataset.

Since the objective function is not convex in both variables W and H, the iterative update algorithm was used to search the local minimum. Here, we used the Lagrange multipliers method and Karush–Kuhn–Tucker (KKT) conditions to optimize the objective function. Eventually, we obtained the following multiplicative updates:

w_{i k} = w_{i k} \frac{{((Y ⊙ X) H + 2 λ_{1} S_{d} W)}_{i k}}{{((Y ⊙ (W H^{T})) H + α W + 2 λ_{1} W W^{T} W)}_{i k}} (14)

h_{j k} = h_{j k} \frac{{({(Y ⊙ X)}^{T} W + 2 λ_{2} S_{m} H)}_{j k}}{{({(Y ⊙ (W H^{T}))}^{T} W + α H + 2 λ_{2} H H^{T} H)}_{j k}} (15)

Then, we can obtain the reconstructed association matrix X* = WH^T. The dimensionality k of the latent representation was set to 35 in the process of prediction.
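A minimal NumPy sketch of the CWNMF updates in Eqs. (14) and (15) is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name, the `eps` safeguard in the denominators, the random initialization, and the toy inputs (including identity matrices standing in for the integrated similarities) are hypothetical. The sketch checks only that the factors remain finite and non-negative; it does not verify monotone decrease of Eq. (13).

```python
import numpy as np

def cwnmf(X, Y, S_d, S_m, k, lam1, lam2, alpha=1.0, n_iter=300, eps=1e-10, seed=0):
    """Collaborative weighted NMF via the multiplicative updates of Eqs. (14)-(15)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))  # disease factors, nd x k
    H = rng.random((X.shape[1], k))  # microbe factors, nm x k
    YX = Y * X
    for _ in range(n_iter):
        num_W = YX @ H + 2.0 * lam1 * (S_d @ W)
        den_W = (Y * (W @ H.T)) @ H + alpha * W + 2.0 * lam1 * (W @ (W.T @ W)) + eps
        W *= num_W / den_W           # Eq. (14)
        num_H = YX.T @ W + 2.0 * lam2 * (S_m @ H)
        den_H = (Y * (W @ H.T)).T @ W + alpha * H + 2.0 * lam2 * (H @ (H.T @ H)) + eps
        H *= num_H / den_H           # Eq. (15)
    return W, H

# Toy inputs (hypothetical); identity matrices are placeholders for S_d and S_m.
X_demo = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 1, 0, 0],
                   [0, 0, 0, 1, 1],
                   [1, 0, 0, 0, 1]], dtype=float)
Y_demo = np.ones_like(X_demo)
W_demo, H_demo = cwnmf(X_demo, Y_demo, np.eye(4), np.eye(5), k=2,
                       lam1=0.02, lam2=0.04, alpha=1.0, n_iter=300)
X_star_demo = W_demo @ H_demo.T  # reconstructed association matrix X*
```

Setting `lam1 = lam2 = 0` and `alpha = 0` reduces these updates to the WNMF rules of Eqs. (10) and (11).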

Graph Laplacian Regularized Least Squares

In this section, to improve the prediction performance, we developed a semisupervised learning method named graph Laplacian regularized least squares based on the reconstructed association matrix X*. Graph regularization is used to exploit the geometric structure of the data for semisupervised learning. Specifically, in the prediction space of microbes, the graph Laplacian regularization term built from the integrated microbe similarity matrix S_m defined above was incorporated into the least-squares framework to enhance learning performance. The optimization problem can be formulated as follows:

\min_{F_{m}} {∥ X^{*} - F_{m}^{T} ∥}_{F}^{2} + β_{m} \frac{1}{2} \sum_{i, j = 1}^{nm} {∥ F_{m i} - F_{m j} ∥}^{2} {(S_{m})}_{i j} (16)

where X* ∈ R^nd×nm is the reconstructed association matrix obtained by the CWNMF method; β_m is the regularization coefficient; F_m ∈ R^nm×nd is the prediction score matrix based on the microbes; and F_mi and F_mj denote the ith and jth rows of F_m. The graph Laplacian regularization term (Xiao et al., 2018; Cai et al., 2020) can be transformed into a matrix form by some algebraic manipulations:

\frac{1}{2} \sum_{i, j = 1}^{nm} {∥ F_{m i} - F_{m j} ∥}^{2} {(S_{m})}_{i j} = Tr (F_{m}^{T} L_{m} F_{m}) (17)

where Tr(·) denotes the trace of a matrix; L_m = D_m − S_m is the graph Laplacian matrix of S_m, and D_m is the diagonal matrix whose entries are the column sums of S_m. Therefore, Eq. (16) can be transformed into the following equation:

\min_{F_{m}} {∥ X^{*} - F_{m}^{T} ∥}_{F}^{2} + β_{m} Tr (F_{m}^{T} L_{m} F_{m}) (18)

where F_m = S_mα_m and α_m ∈ R^nm×nd is a coefficient matrix (Xia et al., 2010). To improve the robustness of the model, and following the choice made in previous similar work (van Laarhoven et al., 2011), β_m was set to 1. Solving the optimization problem gives $α_{m}^{*} = {(S_{m} + L_{m} S_{m})}^{- 1} X^{* T}$. Then, in the microbe prediction space, the prediction score matrix can be calculated as follows:

F_{m} = S_{m} {(S_{m} + L_{m} S_{m})}^{- 1} X^{* T} (19)
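For readers who want the intermediate steps, the closed form in Eq. (19) can be recovered as sketched below. The derivation assumes that S_m is symmetric (so that L_m is symmetric as well) and that S_m + β_m L_m S_m is invertible; neither assumption is stated explicitly in the text. It gives one stationary point of Eq. (18) after substituting F_m = S_m α_m, and setting β_m = 1 yields Eq. (19).

```latex
% Assumptions: S_m symmetric; S_m + \beta_m L_m S_m invertible.
J(\alpha_m) = \| X^{*T} - S_m \alpha_m \|_F^2
            + \beta_m \, \mathrm{Tr}\!\left( \alpha_m^T S_m L_m S_m \alpha_m \right)

\frac{\partial J}{\partial \alpha_m}
  = -2 S_m \left( X^{*T} - S_m \alpha_m \right) + 2 \beta_m S_m L_m S_m \alpha_m = 0

\Rightarrow \; S_m \left[ \left( S_m + \beta_m L_m S_m \right) \alpha_m - X^{*T} \right] = 0

\Rightarrow \; \alpha_m^{*} = \left( S_m + \beta_m L_m S_m \right)^{-1} X^{*T},
\qquad F_m = S_m \alpha_m^{*}
```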

Similarly, for the disease prediction space, the optimization problem can be formulated as the following equation:

\min_{F_{d}} {∥ X^{*} - F_{d} ∥}_{F}^{2} + β_{d} Tr (F_{d}^{T} L_{d} F_{d}) (20)

where β_d was also set to 1. The prediction score matrix in the disease prediction space is then obtained as follows:

F_{d} = S_{d} {(S_{d} + L_{d} S_{d})}^{- 1} X^{*} (21)

Finally, the predicted microbe–disease association matrix is calculated as $F^{*} = η F_{m}^{T} + (1 - η) F_{d}$, where η is a tradeoff parameter describing the relative importance of the microbe and disease spaces. The microbe-related diseases can be prioritized by the size of the prediction scores in matrix F*. The steps of the CWNMF-GLapRLS procedure are given in Algorithm 1.

Algorithm 1. CWNMF-GLapRLS Algorithm.

Input: Matrices X ∈ R^nd×nm, S_d ∈ R^nd×nd, and S_m ∈ R^nm×nm; non-negative weight matrix Y ∈ R^nd×nm; regularization coefficients λ₁, λ₂, and α; latent dimensionality k; tradeoff parameter η.

Output: Predicted score matrix F*.

Randomly initialize two non-negative matrices W ∈ R^nd×k and H ∈ R^nm×k.

Repeat

Update W and H by the following rules:

$w_{i k} = w_{i k} \frac{{((Y ⊙ X) H + 2 λ_{1} S_{d} W)}_{i k}}{{((Y ⊙ (W H^{T})) H + α W + 2 λ_{1} W W^{T} W)}_{i k}}$

$h_{j k} = h_{j k} \frac{{({(Y ⊙ X)}^{T} W + 2 λ_{2} S_{m} H)}_{j k}}{{({(Y ⊙ (W H^{T}))}^{T} W + α H + 2 λ_{2} H H^{T} H)}_{j k}}$

Until convergence

Reconstruct association matrix X* = WH^T.

Calculate diagonal matrix D_m;

L_m = D_m − S_m;

F_m = S_m(S_m + L_mS_m)⁻¹X*^T // calculate the score matrix F_m based on the microbe prediction space.

Calculate diagonal matrix D_d;

L_d = D_d − S_d;

F_d = S_d(S_d + L_dS_d)⁻¹X* // calculate the score matrix F_d based on the disease prediction space.

Return $F^{*} = η F_{m}^{T} + (1 - η) F_{d}$.
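The closed-form scoring steps at the end of Algorithm 1 (Eqs. 19 and 21 and the final combination) can be sketched as follows. The function names and toy inputs are hypothetical. Since S + βLS = (I + βL)S, the linear system is singular whenever S is; the sketch uses `np.linalg.solve` and therefore assumes a non-singular similarity matrix.

```python
import numpy as np

def glaprls_scores(S, target, beta=1.0):
    """Return F = S (S + beta * L S)^{-1} target, with L = D - S (Eqs. 19 and 21)."""
    S = np.asarray(S, dtype=float)
    D = np.diag(S.sum(axis=0))  # diagonal matrix of column sums
    L = D - S                   # graph Laplacian of S
    A = S + beta * (L @ S)
    coeffs = np.linalg.solve(A, np.asarray(target, dtype=float))  # assumes A is non-singular
    return S @ coeffs

def combine_scores(X_star, S_m, S_d, eta):
    """F* = eta * F_m^T + (1 - eta) * F_d."""
    F_m = glaprls_scores(S_m, X_star.T)  # nm x nd, microbe prediction space
    F_d = glaprls_scores(S_d, X_star)    # nd x nm, disease prediction space
    return eta * F_m.T + (1.0 - eta) * F_d

# Toy inputs (hypothetical values).
X_star_demo = np.array([[0.9, 0.1, 0.7],
                        [0.2, 0.8, 0.6]])
S_d_demo = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
S_m_demo = np.array([[1.0, 0.2, 0.4],
                     [0.2, 1.0, 0.3],
                     [0.4, 0.3, 1.0]])
F_star_demo = combine_scores(X_star_demo, S_m_demo, S_d_demo, eta=0.15)

# Sanity check: identity similarities give L = 0, so the scores equal X*.
F_identity = combine_scores(X_star_demo, np.eye(3), np.eye(2), eta=0.15)
```

Solving the linear system avoids forming the explicit inverse that appears in Eqs. (19) and (21), which is usually preferable numerically.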

Results

Evaluation Metrics

To ensure the reliability of the experimental results, we implemented the global leave-one-out cross-validation (LOOCV) framework to validate the performance of the models (Bao et al., 2017). In each round of LOOCV, the integrated similarities of diseases and microbes are recalculated, which guarantees independence between the validation set and the training set. Specifically, under this framework, each known microbe–disease pair is in turn regarded as the test sample, the remaining known pairs are treated as the training set, and all pairs without an observed association are used as candidate samples. We calculated the predicted microbe–disease score matrix by running the model. The prediction score of the test sample is then compared with the scores of all candidate samples to obtain its rank. The test sample is regarded as a successful prediction if its rank is higher than a given threshold. We used the receiver operating characteristic (ROC) curve to describe the performance of the model by calculating sensitivity (true positive rate) and 1-specificity (false positive rate) at different thresholds. In addition, we calculated the area under the curve (AUC) to summarize the performance. Fivefold cross-validation (CV) was also applied to evaluate the effectiveness of the models. The experiment was repeated 10 times to reduce potential bias caused by the random partition of the dataset, and the ROC curves and average AUC values were obtained under the fivefold CV framework.
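The ranking step of global LOOCV can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: `score_fn` is a hypothetical callable that retrains the whole pipeline (including the similarity recalculation) on the masked matrix and returns a score matrix, and the AUC is estimated here as the mean fraction of candidate samples ranked below each held-out pair (ties counted as one half), which may differ in detail from the thresholding procedure used in the study.

```python
import numpy as np

def loocv_auc(X, score_fn):
    """Rank-based AUC estimate under global leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    candidates = (X == 0)                 # pairs without an observed association
    fractions = []
    for i, j in np.argwhere(X == 1):      # each known pair is held out once
        X_train = X.copy()
        X_train[i, j] = 0.0               # hide the test association
        F = score_fn(X_train)             # hypothetical: retrain and score all pairs
        cand_scores = F[candidates]
        below = (cand_scores < F[i, j]).sum() + 0.5 * (cand_scores == F[i, j]).sum()
        fractions.append(below / cand_scores.size)
    return float(np.mean(fractions))

# Toy association matrix (hypothetical).
X_demo = np.array([[1, 0, 1],
                   [0, 1, 0]], dtype=float)

# An uninformative scorer ties every pair; an "oracle" that sees the full X ranks
# every held-out pair above all candidates. Both are illustrative only.
auc_constant = loocv_auc(X_demo, lambda X_train: np.ones_like(X_train))
auc_oracle = loocv_auc(X_demo, lambda X_train: X_demo)
```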

Parameter Sensitivity and Model Setting

It is necessary to evaluate the influence of model parameters on the prediction performance of CWNMF-GLapRLS. We studied the influence of the two regularization parameters λ₁ and λ₂, using grid search to find suitable values. In the experiments, we first varied both parameters from 0 to 0.5 in steps of 0.01. The proposed method was then run to find the optimal parameter values based on the AUC values on the 50×50 grid. Figure 2A shows the relationship between the AUC value and the parameter pair (λ₁, λ₂) under the fivefold CV framework. We selected (0.02, 0.04) as the optimal value of (λ₁, λ₂) based on the grid search results under the two evaluation frameworks. We then fixed this parameter pair and adjusted the parameter η. The relationship between η and the AUC value is shown in Figure 2B. Finally, η was set to 0.15 for the following analysis.

FIGURE 2

Figure 2. (A) The illustration of determining the optimal values of parameter pair (λ₁, λ₂) under grid search. (B) The effects between parameter η and AUC value.

The iterative update algorithm ensures that the objective function converges to a local optimum. Figure 3 shows the convergence curve of the CWNMF objective function. From the figure, we can see that convergence is fast and that the objective function value decreases as the iterations proceed. In practice, convergence is usually reached within a small number of iterations (fewer than 100). Thus, the proposed method can scale to larger datasets. Finally, the number of iterations was set to 300 in the process of prediction.

FIGURE 3

Figure 3. Convergence behavior of the CWNMF objective function.

Performance Analysis

Here, we compared five variants of the method (the full proposed method, and the proposed method without microbe functional similarity, without the weighting term, without GLapRLS, and without CWNMF) to analyze the contribution of each component. To improve the prediction performance and fuse more similarity information, we calculated the microbe functional similarity. To deal with missing values and overcome the data sparsity problem, we introduced WNMF, which modifies classical NMF by adding a weighting term, and proposed the CWNMF technique for recovering the association matrix. The proposed method is a joint framework: the CWNMF technique is first used to recover the original matrix, and the GLapRLS method is then used for prediction. Figure 4 shows the performance comparison of the different variants on the HMDAD dataset. The full proposed method performs better than the other four variants. From the figure, we can see that the combination of CWNMF and GLapRLS significantly improves the prediction performance. The comparison also indicates that the microbe functional similarity and the weighting term are effective in improving prediction performance.

FIGURE 4

Figure 4. The performance comparison of different methods.

Comparison With State-of-the-Art Prediction Methods

In this section, to evaluate the effectiveness of the proposed method, we compared it with 5 state-of-the-art methods, including graph regularized non-negative matrix factorization (GRNMFHMDA) (He et al., 2018), KATZ measure (KATZHMDA) (Chen et al., 2017), bi-random walk (BiRWHMDA) (Zou et al., 2017), Laplacian regularized least squares (LRLSHMDA) (Wang et al., 2017), and network topological similarity (NTSHMDA) (Luo and Long, 2020) for human MDA prediction methods. Optimal parameter combinations for 5 comparison methods are listed in Supplementary Table 1.

First, the ROC curves and AUC values of the six methods under the LOOCV framework are shown in Figure 5. The proposed method outperforms the other methods with an AUC of 0.9362 under the LOOCV framework, while GRNMFHMDA, KATZHMDA, LRLSHMDA, BiRWHMDA, and NTSHMDA obtained AUC values of 0.8719, 0.8382, 0.8916, 0.8964, and 0.9040, respectively. In addition, the ROC curves and average AUC values of the six methods under the fivefold CV framework are shown in Figure 6. The proposed method again outperforms the other methods with an AUC of 0.9161 under the fivefold CV framework, while GRNMFHMDA, KATZHMDA, LRLSHMDA, BiRWHMDA, and NTSHMDA obtained AUC values of 0.8555, 0.8324, 0.8809, 0.8839, and 0.8918, respectively. These experimental results indicate that our method is effective and reliable, and may be a useful tool for seeking potential disease-related microbes.

FIGURE 5

Figure 5. The ROC curves and AUC values of six methods under LOOCV framework.

FIGURE 6

Figure 6. The ROC curves and average AUC values of six methods under fivefold CV framework.

Case Studies

Accumulating evidence has shown that the development and occurrence of human disease are closely related to the imbalance of the microbial community. To infer potential associations, case studies were carried out on two common human diseases (asthma and IBD). For each disease, we used the number of literature-validated microbes among the top 15 predictions to further measure predictive capability. We assumed that a microbe is related to a disease if its genus is related to that disease; this assumption has been widely used in related studies (Niu et al., 2019; Wang et al., 2019). Specifically, for a given disease, all pairs without an observed association were regarded as candidate samples. We calculated the association scores for all microbes based on the joint framework, and all candidate microbes were prioritized by their scores.

Asthma is a common chronic inflammatory disease that affects the daily lives of 300 million people worldwide (Lambrecht and Hammad, 2015). The prediction results for asthma-associated microbes are listed in Table 2. Of the top 15 candidate microbes, 13 are supported as asthma-associated by previously published medical or biological literature, which indicates that our method performs well in this case. Increasing evidence has shown that the occurrence and development of asthma are closely related to imbalance of the microbial community. For example, clinical evidence has shown that asthmatic patients have lower proportions of Actinobacteria, Firmicutes, and Bacteroides (Björkstén et al., 1999; Marri et al., 2013). Colonization by Clostridium coccoides subcluster XIVa species at age 3 weeks may serve as an early indicator of possible asthma (Vael et al., 2011). In addition, colonization by Clostridium difficile at age 1 month was closely associated with asthma at 6–7 years of age (Van Nimwegen et al., 2011). One study showed that early asymptomatic colonization by Streptococcus increases the risk of asthma (Teo et al., 2015). Lactobacillus has been shown to be beneficial to asthmatic children (Huang et al., 2018).

Table 2. Prediction results of the top 15 asthma-associated microbes.

IBD is a collective term for a wide range of intestinal diseases that begin with inflammation, and it is a worldwide healthcare problem (Hossen et al., 2020). IBD has become one of the most studied human diseases linked to gut microbiota (Kostic et al., 2014). We list the top 15 IBD-associated microbes in Table 3. Of these, 14 have been validated as IBD-associated based on published literature. Emerging evidence shows that many microbes are closely related to IBD. For example, Clostridium difficile infection is a significant clinical challenge for IBD patients and can result in morbidity and mortality (Hashash and Binion, 2014). Some studies have shown that Bacteroidetes, Bacteroides, Firmicutes, and Prevotella are associated with the development of IBD (Juste et al., 2014; Walters et al., 2014). In IBD patients, Prevotella, Veillonella, and Haemophilus were found to contribute largely to dysbiosis, which is associated with inflammatory responses (Said et al., 2014). One study found that Helicobacter pylori was inversely associated with IBD (Sonnenberg and Genta, 2012). In addition, Veillonella and Bifidobacterium decreased, whereas the proportion of Lactobacillus increased, in the feces of IBD patients (Takaishi et al., 2008). These case studies indicate that our method is of practical use for predicting potential associations.

Table 3. Prediction results of the top 15 IBD-associated microbes.
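
For reference, 13 of 15 validated predictions corresponds to a hit rate of about 86.7% for asthma, and 14 of 15 to about 93.3% for IBD. The genus-level validation rule used in the case studies can be sketched as below. This is a simplified illustration: taking the first word of a name as its genus is an assumption made here, and the microbe names in the toy list (including the made-up "Examplebacter novus") are hypothetical rather than entries from Table 2 or Table 3.

```python
from typing import Iterable, Sequence


def genus_of(microbe: str) -> str:
    """Take the first word of a name as its genus (a simplifying assumption)."""
    return microbe.split()[0]


def count_validated(predicted: Sequence[str], supported_genera: Iterable[str]) -> int:
    """Count predictions whose genus appears in the literature-supported set."""
    supported = {g.lower() for g in supported_genera}
    return sum(1 for m in predicted if genus_of(m).lower() in supported)


# Hypothetical toy list; the real predictions are those in Tables 2 and 3.
toy_predicted = ["Clostridium difficile", "Lactobacillus", "Examplebacter novus"]
toy_supported = ["Clostridium", "Lactobacillus"]
toy_hits = count_validated(toy_predicted, toy_supported)
toy_rate = toy_hits / len(toy_predicted)

# Hit rates for the counts reported in the case studies.
asthma_rate = 13 / 15
ibd_rate = 14 / 15
```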

Conclusion and Discussion

Microbiome studies have demonstrated a critical role for microbes in human health and disease. Identifying potential disease-related microbes is essential for understanding the mechanisms of host–microbe interactions and revealing the pathological mechanisms of human diseases. Here, we designed a joint framework for association prediction based on the proposed CWNMF and graph Laplacian regularized least squares. The experimental results showed that our method outperformed five state-of-the-art models. Case studies of asthma and IBD further demonstrated that the proposed method is a useful tool for inferring potential associations. Together, these results indicate that the proposed method has reliable and effective prediction performance.

Several key factors contribute to the performance of the proposed method. First, in contrast to graph regularized NMF and collaborative matrix factorization, we introduced a weighting term and modified the NMF used for prediction to deal with missing values and weaken the effect of data sparsity. Second, we calculated the functional similarity of microbes and introduced symptom-based disease similarity to fuse more similarity information. Third, to reconstruct the sparse association matrix, two collaborative regularization terms were incorporated into the framework to fuse similarity information and constrain the two low-dimensional representations, guiding the matrix factorization process. We solved the matrix factorization objective function with an iterative update algorithm, which is easy to use and implement. In addition, semisupervised learning provides more effective information during prediction. We hope that the proposed method can help biomedical researchers conduct follow-up research and that a growing number of potential disease-related microbes can be verified through biological or clinical experiments.
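
To illustrate the idea of a weighting term combined with iterative updates, the sketch below implements plain weighted NMF with standard multiplicative update rules. It is not the CWNMF objective: the two collaborative regularization terms are omitted, and the update rules, the function name `weighted_nmf`, the weight of 0.3 for unobserved pairs, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def weighted_nmf(Y, W, rank=2, n_iter=200, eps=1e-9, seed=0):
    """Factor Y ~ U @ V.T with U, V >= 0, minimizing sum(W * (Y - U @ V.T)**2).

    Y: non-negative association matrix (microbes x diseases).
    W: non-negative weights of the same shape; 0 marks an entry to ignore.
    Returns U, V, and the loss recorded after every iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.random((n, rank)) + eps
    V = rng.random((m, rank)) + eps
    WY = W * Y
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative by construction.
        U *= (WY @ V) / ((W * (U @ V.T)) @ V + eps)
        V *= (WY.T @ U) / ((W * (U @ V.T)).T @ U + eps)
        losses.append(float(np.sum(W * (Y - U @ V.T) ** 2)))
    return U, V, losses


# Toy 4 x 3 association matrix (hypothetical, not the benchmark dataset).
Y_toy = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]], dtype=float)
# Example weighting: trust observed associations fully, unobserved pairs less.
W_toy = np.where(Y_toy > 0, 1.0, 0.3)
U_toy, V_toy, loss_toy = weighted_nmf(Y_toy, W_toy)
Y_reconstructed = U_toy @ V_toy.T
```

Down-weighting the zero entries reflects that an unobserved pair may be a missing value rather than a true negative, so the reconstructed matrix can assign nonzero scores to such pairs.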

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Author Contributions

DX: methodology, software, formal analysis, writing – original draft, and writing – review and editing. HX: data curation, methodology, software, and writing – original draft. YZ: supervision, funding acquisition, and writing – review and editing. RG: formal analysis, supervision, and funding acquisition. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61877064, U1806202, and 61533011).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb.2022.834982/full#supplementary-material

References

Bao, W., Jiang, Z., and Huang, D. S. (2017). Novel human microbe-disease association prediction using network consistency projection. BMC Bioinform. 18:543. doi: 10.1186/s12859-017-1968-2

Björkstén, B., Naaber, P., Sepp, E., and Mikelsaar, M. (1999). The intestinal microflora in allergic Estonian and Swedish 2-year-old children. Clin. Exp. Allergy 29, 342–346. doi: 10.1046/j.1365-2222.1999.00560.x

Cai, D., He, X., Han, J., and Huang, T. S. (2020). Graph Regularized Nonnegative Matrix Factorization for Data Representation. Appl. Intell. 50, 438–447. doi: 10.1007/s10489-019-01539-9

Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C., and Collins, J. J. (2018). Next-Generation Machine Learning for Biological Networks. Cell 173, 1581–1592. doi: 10.1016/j.cell.2018.05.015

Chen, X., Huang, Y. A., You, Z. H., Yan, G. Y., and Wang, X. S. (2017). A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics 33, 733–739. doi: 10.1093/bioinformatics/btw715

Chen, X., Xie, D., Zhao, Q., and You, Z. H. (2019). MicroRNAs and complex diseases: From experimental results to computational models. Brief. Bioinform. 20, 515–539. doi: 10.1093/bib/bbx130

Chen, X., Yan, C. C., Zhang, X., Zhang, X., Dai, F., Yin, J., et al. (2016). Drug-target interaction prediction: Databases, web servers and computational models. Brief. Bioinform. 17, 696–712. doi: 10.1093/bib/bbv066

Cho, I., and Blaser, M. J. (2012). The human microbiome: At the interface of health and disease. Nat. Rev. Genet. 13, 260–270. doi: 10.1038/nrg3182

Ehrlich, S. D. (2011). “MetaHIT: The European Union Project on Metagenomics of the Human Intestinal”. In: Nelson K. (eds) Metagenomics of the Human Body. (New York, NY: Springer)

Fan, C., Lei, X., Guo, L., and Zhang, A. (2019). Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomputing 323, 76–85. doi: 10.1016/j.neucom.2018.09.054

Gao, M. M., Cui, Z., Gao, Y. L., Wang, J., and Liu, J. X. (2021). Multi-Label Fusion Collaborative Matrix Factorization for Predicting LncRNA-Disease Associations. IEEE J. Biomed. Heal. Informatics 25, 881–890. doi: 10.1109/JBHI.2020.2988720

Gu, Q., Zhou, J., and Ding, C. (2010). Collaborative filtering: Weighted nonnegative matrix factorization incorporating user and item graphs. Proc. 10th SIAM Int. Conf. Data Mining SDM 2010, 199–210. doi: 10.1137/1.9781611972801.18

Hashash, J. G., and Binion, D. G. (2014). Managing Clostridium difficile in Inflammatory Bowel Disease (IBD). Curr. Gastroenterol. Rep. 16, 14–19. doi: 10.1007/s11894-014-0393-1

He, B. S., Peng, L. H., and Li, Z. (2018). Human microbe-disease association prediction with graph regularized non-negative matrix factorization. Front. Microbiol. 9:2560. doi: 10.3389/fmicb.2018.02560

Hossen, I., Hua, W., Ting, L., Mehmood, A., Jingyi, S., Duoxia, X., et al. (2020). Phytochemicals and inflammatory bowel disease: a review. Crit. Rev. Food Sci. Nutr. 60, 1321–1345. doi: 10.1080/10408398.2019.1570913

Huang, C. F., Chie, W. C., and Wang, I. J. (2018). Efficacy of Lactobacillus administration in school-age children with asthma: A randomized, placebo-controlled trial. Nutrients 10:1678. doi: 10.3390/nu10111678

Huang, Y. A., You, Z. H., Chen, X., Huang, Z. A., Zhang, S., and Yan, G. Y. (2017). Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model. J. Transl. Med. 15:209. doi: 10.1186/s12967-017-1304-7

Juste, C., Kreil, D. P., Beauvallet, C., Guillot, A., Vaca, S., Carapito, C., et al. (2014). Bacterial protein signals are associated with Crohn’s disease. Gut 63, 1566–1577. doi: 10.1136/gutjnl-2012-303786

Kostic, A. D., Xavier, R. J., and Gevers, D. (2014). The microbiome in inflammatory bowel disease: Current status and the future ahead. Gastroenterology 146, 1489–1499. doi: 10.1053/j.gastro.2014.02.009

Lambrecht, B. N., and Hammad, H. (2015). The immunology of asthma. Nat. Immunol. 16, 45–56. doi: 10.1038/ni.3049

Lee, D. D., and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791. doi: 10.1038/44565

Lee, D. D., and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 13, 1–7.

Ley, R. E., Peterson, D. A., and Gordon, J. I. (2006). Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 124, 837–848. doi: 10.1016/j.cell.2006.02.017

Li, G., Luo, J., Liang, C., Xiao, Q., Ding, P., and Zhang, Y. (2019). Prediction of LncRNA-Disease Associations Based on Network Consistency Projection. IEEE Access 7, 58849–58856. doi: 10.1109/ACCESS.2019.2914533

Long, Y., Luo, J., Zhang, Y., and Xia, Y. (2021). Predicting human microbe-disease associations via graph attention networks with inductive matrix completion. Brief. Bioinform. 22, 1–13. doi: 10.1093/bib/bbaa146

Lozupone, C. A., Stombaugh, J. I., Gordon, J. I., Jansson, J. K., and Knight, R. (2012). Diversity, stability and resilience of the human gut microbiota. Nature 489, 220–230. doi: 10.1038/nature11550

Luo, J., and Long, Y. (2020). NTSHMDA: Prediction of Human Microbe-Disease Association Based on Random Walk by Integrating Network Topological Similarity. IEEE/ACM Trans. Comput. Biol. Bioinforma. 17, 1341–1351. doi: 10.1109/TCBB.2018.2883041

Ma, W., Zhang, L., Zeng, P., Huang, C., Li, J., Geng, B., et al. (2017). An analysis of human microbe-disease associations. Brief. Bioinform. 18, 85–97. doi: 10.1093/bib/bbw005

Mao, Y., and Saul, L. K. (2004). Modeling distances in large-scale networks by matrix factorization. Proc. 2004 ACM SIGCOMM Internet Meas. Conf. IMC 2004, 278–287. doi: 10.1145/1028788.1028827

Marri, P. R., Stern, D. A., Wright, A. L., Billheimer, D., and Martinez, F. D. (2013). Asthma-associated differences in microbial composition of induced sputum. J. Allergy Clin. Immunol. 131, 346.e–352.e. doi: 10.1016/j.jaci.2012.11.013

Niu, Y. W., Qu, C. Q., Wang, G. H., and Yan, G. Y. (2019). RWHMDA: Random walk on hypergraph for microbe-disease association prediction. Front. Microbiol. 10, 102–300. doi: 10.3389/fmicb.2019.01578

O’Hara, A. M., and Shanahan, F. (2006). The gut flora as a forgotten organ. EMBO Rep. 7, 688–693. doi: 10.1038/sj.embor.7400731

Peng, L., Shen, L., Liao, L., Liu, G., and Zhou, L. (2020). RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization. Front. Microbiol. 11:592430. doi: 10.3389/fmicb.2020.592430

Peng, L., Wang, C., Tian, X., Zhou, L., and Li, K. (2021). Finding lncRNA-protein Interactions Based on Deep Learning with Dual-net Neural Architecture. IEEE/ACM Trans. Comput. Biol. Bioinforma. doi: 10.1109/TCBB.2021.3116232 [Epub ahead of print].

Peng, L. H., Sun, C. N., Guan, N. N., Li, J. Q., and Chen, X. (2018a). HNMDA: heterogeneous network-based miRNA–disease association prediction. Mol. Genet. Genomics 293, 983–995. doi: 10.1007/s00438-018-1438-1

Peng, L. H., Yin, J., Zhou, L., Liu, M. X., and Zhao, Y. (2018b). Human microbe-disease association prediction based on adaptive boosting. Front. Microbiol. 9:2440. doi: 10.3389/fmicb.2018.02440

Qu, J., Zhao, Y., and Yin, J. (2019). Identification and analysis of human microbe-disease associations by matrix decomposition and label propagation. Front. Microbiol. 10:291. doi: 10.3389/fmicb.2019.00291

Said, H. S., Suda, W., Nakagome, S., Chinen, H., Oshima, K., Kim, S., et al. (2014). Dysbiosis of salivary microbiota in inflammatory bowel disease and its association with oral immunological biomarkers. DNA Res. 21, 15–25. doi: 10.1093/dnares/dst037

Schwabe, R. F., and Jobin, C. (2013). The microbiome and cancer. Nat. Rev. Cancer 13, 800–812. doi: 10.1038/nrc3610

Shi, J. Y., Huang, H., Zhang, Y. N., Cao, J. B., and Yiu, S. M. (2018). BMCMDA: A novel model for predicting human microbe-disease associations via binary matrix completion. BMC Bioinform. 19:281. doi: 10.1186/s12859-018-2274-3

Sonnenberg, A., and Genta, R. M. (2012). Low prevalence of Helicobacter pylori infection among patients with inflammatory bowel disease. Aliment. Pharmacol. Ther. 35, 469–476. doi: 10.1111/j.1365-2036.2011.04969.x

Takaishi, H., Matsuki, T., Nakazawa, A., Takada, T., Kado, S., Asahara, T., et al. (2008). Imbalance in intestinal microflora constitution could be involved in the pathogenesis of inflammatory bowel disease. Int. J. Med. Microbiol. 298, 463–472. doi: 10.1016/j.ijmm.2007.07.016

Teo, S. M., Mok, D., Pham, K., Kusel, M., Serralha, M., Troy, N., et al. (2015). The infant nasopharyngeal microbiome impacts severity of lower respiratory infection and risk of asthma development. Cell Host Microbe 17, 704–715. doi: 10.1016/j.chom.2015.03.008

Tremlett, H., Bauer, K. C., Appel-Cresswell, S., Finlay, B. B., and Waubant, E. (2017). The gut microbiome in human neurological disease: A review. Ann. Neurol. 81, 369–382. doi: 10.1002/ana.24901

Turnbaugh, P. J., Ley, R. E., Hamady, M., Fraser-Liggett, C. M., Knight, R., and Gordon, J. I. (2007). The Human Microbiome Project. Nature 449, 804–810. doi: 10.1038/nature06244

Vael, C., Vanheirstraeten, L., Desager, K. N., and Goossens, H. (2011). Denaturing gradient gel electrophoresis of neonatal intestinal microbiota in relation to the development of asthma. BMC Microbiol. 11:68. doi: 10.1186/1471-2180-11-68

van Laarhoven, T., Nabuurs, S. B., and Marchiori, E. (2011). Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics 27, 3036–3043. doi: 10.1093/bioinformatics/btr500

Van Nimwegen, F. A., Penders, J., Stobberingh, E. E., Postma, D. S., Koppelman, G. H., Kerkhof, M., et al. (2011). Mode and place of delivery, gastrointestinal microbiota, and their influence on asthma and atopy. J. Allergy Clin. Immunol. 128, 948.e–955.e. doi: 10.1016/j.jaci.2011.07.027

Walters, W. A., Xu, Z., and Knight, R. (2014). Meta-analyses of human gut microbes associated with obesity and IBD. FEBS Lett. 588, 4223–4233. doi: 10.1016/j.febslet.2014.09.039

Wang, F., Huang, Z. A., Chen, X., Zhu, Z., Wen, Z., Zhao, J., et al. (2017). LRLSHMDA: Laplacian regularized least squares for human microbe-disease association prediction. Sci. Rep. 7:7601. doi: 10.1038/s41598-017-08127-2

Wang, L., Wang, Y., Li, H., Feng, X., Yuan, D., and Yang, J. (2019). A bidirectional label propagation based computational model for potential microbe-disease association prediction. Front. Microbiol. 10:684. doi: 10.3389/fmicb.2019.00684

Wang, Z., Klipfell, E., Bennett, B. J., Koeth, R., Levison, B. S., Dugar, B., et al. (2011). Gut flora metabolism of phosphatidylcholine promotes cardiovascular disease. Nature 472, 57–65. doi: 10.1038/nature09922

Xia, Z., Wu, L. Y., Zhou, X., and Wong, S. T. C. (2010). Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst. Biol. 4:S6. doi: 10.1186/1752-0509-4-6

Xiao, Q., Luo, J., Liang, C., Cai, J., and Ding, P. (2018). A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 34, 239–248. doi: 10.1093/bioinformatics/btx545

Xu, D., Xu, H., Zhang, Y., Chen, W., and Gao, R. (2020a). Protein-Protein Interactions Prediction Based on Graph Energy and Protein Sequence Information. Molecules 25:1841.

Xu, D., Zhang, J., Xu, H., Zhang, Y., Chen, W., Gao, R., et al. (2020b). Multi-scale supervised clustering-based feature selection for tumor classification and identification of biomarkers and targets on genomic data. BMC Genomics 21:650. doi: 10.1186/s12864-020-07038-3

Xu, D., Xu, H., Zhang, Y., Chen, W., and Gao, R. (2021a). LncRNA-protein interaction prediction based on regularized nonnegative matrix factorization and sequence information. Match 85, 555–574.

Xu, D., Xu, H., Zhang, Y., Wang, M., Chen, W., and Gao, R. (2021b). MDAKRLS: Predicting human microbe-disease association based on Kronecker regularized least squares and similarities. J. Transl. Med. 19:66. doi: 10.1186/s12967-021-02732-6

Yin, M.-M., Liu, J.-X., Gao, Y.-L., Kong, X.-Z., and Zheng, C.-H. (2020). NCPLP: A Novel Approach for Predicting Microbe-Associated Diseases With Network Consistency Projection and Label Propagation. IEEE Trans. Cybern. doi: 10.1109/tcyb.2020.3026652 [Epub ahead of print].

Zhang, L., Yang, P., Feng, H., Zhao, Q., and Liu, H. (2021). Using Network Distance Analysis to Predict lncRNA–miRNA Interactions. Interdiscip. Sci. Comput. Life Sci. 13, 535–545. doi: 10.1007/s12539-021-00458-z

Zhang, S., Wang, W., Ford, J., and Makedon, F. (2006). Learning from incomplete ratings using non-negative matrix factorization. Proc. Sixth SIAM Int. Conf. Data Min. 2006, 549–553. doi: 10.1137/1.9781611972764.58

Zhang, W., Yang, W., Lu, X., Huang, F., and Luo, F. (2018). The bi-direction similarity integration method for predicting microbe-disease associations. IEEE Access 6, 38052–38061. doi: 10.1109/ACCESS.2018.2851751

Zhao, Y., Wang, C.-C., and Chen, X. (2020). Microbes and complex diseases: from experimental results to computational models. Brief. Bioinform. 22:bbaa158. doi: 10.1093/bib/bbaa158

Zhou, L., Wang, Z., Tian, X., and Peng, L. (2021). LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification. BMC Bioinform. 22:479. doi: 10.1186/s12859-021-04399-8

Zhou, X., Menche, J., Barabási, A. L., and Sharma, A. (2014). Human symptoms-disease network. Nat. Commun. 5:4212. doi: 10.1038/ncomms5212

Zou, S., Zhang, J., and Zhang, Z. (2017). A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network. PLoS One 12:e0184394. doi: 10.1371/journal.pone.0184394

Zou, S., Zhang, J., and Zhang, Z. (2018). Novel human microbe-disease associations inference based on network consistency projection. Sci. Rep. 8:8034. doi: 10.1038/s41598-018-26448-8

Keywords: microbe, disease, association prediction, collaborative weighted non-negative matrix factorization, graph Laplacian regularized least squares

Citation: Xu D, Xu H, Zhang Y and Gao R (2022) Novel Collaborative Weighted Non-negative Matrix Factorization Improves Prediction of Disease-Associated Human Microbes. Front. Microbiol. 13:834982. doi: 10.3389/fmicb.2022.834982

Received: 14 December 2021; Accepted: 19 January 2022;
Published: 10 March 2022.

Edited by:

Qi Zhao, University of Science and Technology Liaoning, China

Reviewed by:

Wen Zhang, Huazhong Agricultural University, China
Lihong Peng, Hunan University of Technology, China

Copyright © 2022 Xu, Xu, Zhang and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yusen Zhang, zhangys@sdu.edu.cn; Rui Gao, gaorui@sdu.edu.cn
