SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA–disease associations

Lin, Lieqing; Chen, Ruibin; Zhu, Yinting; Xie, Weijie; Jing, Huaiguo; Chen, Langcheng; Zou, Minqing

doi:10.3389/fmicb.2022.1093615

ORIGINAL RESEARCH article

Front. Microbiol., 11 January 2023

Sec. Systems Microbiology

Volume 13 - 2022 | https://doi.org/10.3389/fmicb.2022.1093615

SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA–disease associations

Lieqing Lin¹

Ruibin Chen²

Yinting Zhu²

Weijie Xie²

Huaiguo Jing³^*

Langcheng Chen¹^*

Minqing Zou⁴

¹Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
²School of Computer, Guangdong University of Technology, Guangzhou, China
³Sports Department, Guangdong University of Technology, Guangzhou, China
⁴Department of Experiment Teaching, Guangdong University of Technology, Guangzhou, China

Accumulating evidence has demonstrated various associations of long non-coding RNAs (lncRNAs) with human diseases, such as abnormal expression due to microbial influences that cause disease. Gaining a deeper understanding of lncRNA–disease associations is essential for disease diagnosis, treatment, and prevention. In recent years, many matrix decomposition methods have also been used to predict potential lncRNA-disease associations. However, these methods do not consider the use of microbe-disease association information to enrich disease similarity, and also do not make more use of similarity information in the decomposition process. To address these issues, we here propose a correction-based similarity-constrained probability matrix decomposition method (SCCPMD) to predict lncRNA–disease associations. The microbe-disease associations are first used to enrich the disease semantic similarity matrix, and then the logistic function is used to correct the lncRNA and disease similarity matrix, and then these two corrected similarity matrices are added to the probability matrix decomposition as constraints to finally predict the potential lncRNA–disease associations. The experimental results show that SCCPMD outperforms the five advanced comparison algorithms. In addition, SCCPMD demonstrated excellent prediction performance in a case study for breast cancer, lung cancer, and renal cell carcinoma, with prediction accuracy reaching 80, 100, and 100%, respectively. Therefore, SCCPMD shows excellent predictive performance in identifying unknown lncRNA–disease associations.

Introduction

Non-coding RNAs such as microRNAs (miRNAs), Circular RNA (circRNA) and long non-coding RNAs (lncRNAs) play crucial roles in controlling the biological processes of plants and animals (Zhang et al., 2020b; Wang et al., 2021a, 2022). Owing to their roles as genetic regulators in the development of complex disorders such as cancer, miRNAs have the potential to serve as diagnostic markers and therapeutic targets (Chen et al., 2019b; Hill and Tran, 2021; Huang et al., 2022a,b). Several algorithmic models have also been developed for the exploration of miRNA–disease and miRNA-disease associations (Chen et al., 2019c, 2021a; Zhang et al., 2021a,b). However, as medicine advances, more and more studies have also shown that lncRNAs play an important role in many different diseases (Cao et al., 2019). LncRNAs are RNA molecules with transcriptional lengths above 200 nucleotides that lack protein-coding capabilities (Xing et al., 2021). For example, HOXA-AS2 was identified as a novel cancer-associated lncRNA, which exhibits aberrant expression in a variety of malignancies, including breast, gastric, gallbladder, hepatocellular, and pancreatic cancers (Wang et al., 2018a). With increasing recognition of the importance of lncRNAs, more in-depth research has focused on the relationship between lncRNAs and diseases. However, traditional biological validation experiments are time-consuming and costly; thus, there is an urgent need to develop accurate and effective computational methods to determine possible lncRNA–disease associations. Many computational models have recently been developed to successfully predict possible lncRNA–disease associations, which can be classified into three main categories.

The first category is characterized by machine-learning methods (Zhang et al., 2020a; Lan et al., 2022). Chen and Yan (2013) proposed the first such approach to predict lncRNA–disease associations using Laplace regularized least squares in a semi-supervised learning framework. Subsequently, by combining genomic, glomerular, and transcriptomic data, Zhao et al. (2015) devised a computational method based on a simple Bayesian classifier approach, which led to the discovery of 707 potential cancer-associated lncRNAs. Zhu et al. (2021) predicted lncRNA–disease associations by integrating several similarity matrices and combining incremental principal component analysis and random forest techniques. However, supervised learning-based models such as support vector machine and plain Bayesian classifiers rely heavily on difficult-to-obtain negative sample (Chen et al., 2017).

The second category is based on building biological networks to predict lncRNA–disease associations (Zhang et al., 2019a, 2020c). Sun et al. (2014) proposed RWRlncD, a global network computational strategy that applies restart random wandering (RWR) on lncRNA functional similarity networks to infer potential associations between human lncRNAs and disease. Zhang et al. (2019b) integrated known topological interactions of lncRNA–disease, lncRNA–miRNA, and miRNA–disease to construct a linked tripartite network, and used the topology of the obtained network to calculate the similarity of disease pairs and lncRNA pairs. Finally, rule-based inference methods were used to predict new lncRNA–disease associations. Zhou et al. (2021) employed a rotating forest classifier to train prediction models after creating a heterogeneous network by combining relationships among miRNAs, lncRNAs, proteins, drugs, and diseases. However, the heterogeneous networks constructed by these network-based approaches relying on the relationships of lncRNAs themselves, miRNAs, proteins, and drugs to lncRNAs (diseases) can result in failure in reliable predictions of new diseases and/or new lncRNAs.

The third category includes matrix decomposition methods (Chen et al., 2018a,b, 2021b; Xie et al., 2021). To effectively predict probable relationships, Fu et al. (2018) employed matrix triple decomposition to split a data matrix from heterogeneous data sources into low-rank matrices and reconstruct the lncRNA–disease association matrix. Based on probabilistic matrix decomposition, Xuan et al. (2019) deduced probable lncRNA–disease associations by assuming that low-rank matrices are positively distributed with Gaussian noise. To enhance the potential association between lncRNAs and diseases, Gao et al. (2021) optimized the lncRNA and disease space by multi-labeling and fusing these labels. Finally, co-matrix decomposition was used to predict lncRNA–disease correlations. Wang et al. (2021b) treated the discovery of disease-associated lncRNA as a recommender system problem, and predicted the relationships between lncRNA and diseases using a graph-regularized non-negative matrix decomposition approach. (Liu et al., 2021) proposed an lncRNA–disease association prediction approach based on double sparse collaborative matrix decomposition. To boost the sparsity, the L2,1-norm was introduced to the conventional co-matrix decomposition method. However, none of the algorithms presented above use similar information of lncRNA and disease as constraints to optimize the matrix decomposition algorithm. Thus, there is still some room for improvement in the prediction performance.

Traditional probabilistic matrix decomposition only uses probabilistic linear models with Gaussian noise to model the interaction of lncRNAs with diseases. Based on the assumption that similar lncRNAs/diseases are usually interrelated with the corresponding disease/lncRNA, we here propose a correction-based similarity-constrained probability matrix decomposition (SCCPMD) method for predicting lncRNA–disease associations. Considering the noise effect of the similarity matrix of lncRNAs and diseases, the noise is reduced by correcting the similarity matrix using a logistic function to highlight strong correlations within the similarity range [0,1] while diluting weak correlations. The lncRNA and disease similarity are then used as constraints in the probability matrix decomposition process, resulting in two low-rank matrices to predict the potential lncRNA–disease association. Leave-one-out cross-validation (LOOCV) and five-fold cross-validation (5-fold CV) were performed to validate the predictive performance of SCCPMD using known lncRNA–disease association datasets. The final area under the curve (AUC) values of SCCPMD reached 0.9787 and 0.9528 ± 0.0036 with LOOCV and 5-fold CV, respectively, which were both better than the prediction performances obtained with existing advanced algorithms. In addition, we confirmed the effectiveness of SCCPMD in application to three test cases of human diseases: breast cancer, lung cancer, and renal cell carcinoma (RCC).

Materials and methods

Datasets

We used the LncRNADisease database (Bao et al., 2019), which provides a dataset of lncRNA–disease associations. After removing duplicate lncRNAs and diseases as well as non-human data, 1,690 unique experimentally validated lncRNA–disease associations were obtained, including 447 unique lncRNAs and 218 unique diseases. The lncRNA–disease associations were described by building a disease–lncRNA adjacency matrix, $Y \in R^{n l \times n d}$ , where $n l$ and $n d$ represent the number of lncRNAs and diseases, respectively. The matrix $Y$ is defined as follows:

Y (i, j) = {\begin{array}{l} 0 lncRNA l (i) has no association with disease d (j) \\ 1 lncRNA l (i) is associated with disease d (j) \end{array} (1)

In other words, if an lncRNA $l_{i}$ is confirmed to be associated with a disease $d_{j}$ , then $Y (i, j)$ is set to 1; otherwise, $Y (i, j)$ is 0.

Semantic similarity of disease

We built a directed acyclic graph (DAG) based on the descriptor data from the Medical Subject Headings (MeSH) of the National Library of Medicine¹ to determine the semantic similarity among diseases. A disease $d$ is described by $D A G (d) = (d, V (d), E (d))$ , where $V (d)$ and $E (d)$ are the vertex set and edge set of the $D A G$ , respectively. Based on the $D A G$ layer structure of disease $d$ , we can calculate the semantic value ( $S$ ) of disease $m$ to disease $d$ as follows:

T_{d} (m) = {\begin{matrix} 1, i f m = d \\ \max {{0.5}^{*} T_{d} (m^{'}) | m^{'} \in c h i l d r e n o f m, i f m \neq d \end{matrix} (2)

According to the $D A G$ of a disease, the semantic value of a disease is defined as the sum of the ancestral nodes of the disease and the semantic contribution value of the disease to itself, expressed by the following equation:

\begin{array}{l} T_{d} = \sum_{m \in V (d)} T_{d} (m) & (3) \end{array}

Based on the above steps, we can construct the semantic similarity matrix $S S$ to represent the semantic similarity between disease $d_{i}$ and disease $d_{j}$ :

\begin{array}{l} S S (d_{i}, d_{j}) = \frac{\sum_{m \in V (d_{i}) \cap V (d_{j})} (T_{d_{i}} (m) + T_{d_{j}} (m))}{T_{d_{i}} + T_{d_{j}}} & (4) \end{array}

Gaussian interaction profile kernel similarity for diseases

To address the sparsity of the semantic similarity matrix of diseases and integrate more information on disease similarity, we used microbe-disease associations to calculate Gaussian similarity of diseases. We downloaded human microbe-disease associations from the Human Microbe-Disease Association Database (HMDAD). Microbe-disease associations were described by creating a microbe-disease adjacency matrix, $A \in R^{m \times n}$ , where $m$ and $d$ represent the number of microbes and diseases, respectively. As a measure of disease similarity, we constructed Gaussian interaction spectral kernel similarity using radial basis functions. We calculated the Gaussian interaction distribution based on the adjacency matrix A. The Gaussian interaction spectral kernel similarity between disease d_i and disease d_j can be calculated by the following equation:

\begin{array}{l} G D (d_{i}, d_{j}) = \exp (- γ_{d} {| | A (:, i) - A (:, j) | |}^{2}) & (5) \end{array}

\begin{array}{l} γ_{d} = γ / (\frac{1}{n} \sum_{i = 1}^{n} {‖ A (:, i) ‖}^{2}) & (6) \end{array}

Integrated similarity for diseases

We combine the disease semantic similarity $S S$ with the disease Gaussian similarity $G D$ to construct the final disease similarity matrix $S D$ . as follows, for disease $d_{i}$ and disease $d_{j}$ , $S D (d_{i}, d_{j}) = G D (d_{i}, d_{j})$ if $S S = 0$ and $S D (d_{i}, d_{j}) = S S (d_{i}, d_{j})$ otherwise.

D S (d_{i}, d_{j}) = {\begin{array}{l} G D (d_{i}, d_{j}) i f S S (d_{i}, d_{j}) = 0 \\ S S (d_{i}, d_{j}) o t h e r w i s e \end{array} (7)

Expression similarity of LncRNAs

LncRNA expression profiles can be utilized to reflect the similarity between lncRNAs, since related lncRNAs exhibit co-expression characteristics in various tissues (Chen et al., 2019a). For this purpose, we used RNA-sequencing data retrieved from the ArrayExpress database to create lncRNA expression profiles. The Spearman correlation coefficient between the expression profiles of two lncRNAs was then used to determine the degree of similarity in their expression patterns, defined as $E S$ , where $E S (l_{i}, l_{j}) \in [0, 1]$ denotes the expression similarity of lncRNAs $l_{i}$ and $l_{j}$ .

SCCPMD method

Overview

SCCPMD involves the following five steps, which are schematically outlined in Figure 1: (i) constructing lncRNA–disease association networks, (ii) constructing DAGs based on MeSH information to calculate the disease semantic similarity $S S$ and calculating disease Gaussian similarity $G D$ based on microbe-disease associations, (iii) integration of disease semantic similarity and disease gaussian similarity to obtain disease similarity $S D$ , (iv) calculating lncRNA expression similarity $E S$ based on Spearman correlation coefficients, (v) performing logistic function transformation for similarity correction of disease similarity and lncRNA expression similarity to reduce the noise introduced by the similarity matrix during matrix decomposition, and (vi) using the proposed constrained probability matrix decomposition method to help predict potential lncRNA–disease associations.

FIGURE 1

Figure 1. Flow chart of the SCCPMD approach.

Similarity correction

To reduce the noise that lncRNA and disease similarity matrices introduce during matrix decomposition, similarity correction techniques were used. The noise present in the similarity matrix is reduced by the logistic function so as to enhance the strong correlations in the similarity range [0,1] while diluting the weak correlations. This approach has previously been used in the study of disease-related genes (Vanunu et al., 2010). The logistic function is defined as follows:

\begin{array}{l} L (x) = \frac{1}{1 + e^{a x + b}} & (8) \end{array}

$L (x) \approx 0$ when $x \in [0, 0.3]$ and $L (x) \approx 1$ when $x \in [0.6, 1]$ . This means that weakly similar coefficients in the range of [0,0.3] are lost information, whereas strong similar coefficient values in the range of [0.6,1] usually exhibit significant co-expression of the relationship. Accordingly $L (0)$ needs to be close to 0; therefore, we set $L (0) = 0.0001$ to obtain $b = \log (9999)$ . In addition, $a$ is a correction degree coefficient that is used for parameter adjustment of the model. The corrected lncRNA expression similarity $L E$ and the disease similarity $L D$ are thus obtained as follows:

\begin{array}{l} L E (i, j) = \frac{1}{1 + e^{a \times E S (i, j) + b}}, i, j \in [1, n l] & (9) \end{array}

\begin{array}{l} L D (i, j) = \frac{1}{1 + e^{a \times D S (i, j) + b}}, i, j \in [1, n d] & (10) \end{array}

Constraint probability matrix decomposition

Following the similarity correction steps outlined above, we can obtain the association matrix $Y$ representing the relationship between lncRNA and disease from the corrected lncRNA–lncRNA expression similarity $L E$ and the corrected disease–disease similarity $L D$ . The values of $L E$ and $L D$ fall in the [0,1] interval. Let $W \in R^{k \times n l}$ and $D \in R^{k \times n d}$ be the lncRNA and disease latent feature matrices, where $k \in \min (n l, n d)$ . The latent feature vectors specific to lncRNAs and diseases are represented by the column vectors $W_{i}$ and $D_{j}$ , respectively. The goal is then to find lncRNA and disease latent models ( $W \in R^{k \times n l}$ and $D \in R^{k \times n d}$ ) whose product ( $W^{T} D$ ) can reconstruct the interaction matrix $Y$ . From a probabilistic point of view, the conditional distribution of the observed interactions $Y \in {0, 1}$ is expressed as:

\begin{array}{l} P (Y | W, D, σ^{2}) = \prod_{i = 1}^{n l} \prod_{j = 1}^{n d} {[f (Y_{i j} | W_{i}^{T} D_{j}, σ^{2})]}^{I_{i j}} & (11) \end{array}

where $f (x |, μ |, σ 2)$ is the probability density function of the Gaussian normal distribution with mean 𝜇 and variance $σ^{2}$ , and $I_{i j}$ is the indicator function that is equal to 1 if the lncRNA $l_{i}$ is related with disease $d_{j}$ and is 0 otherwise. A probabilistic representation of the association matrix $Y$ is then given by $P (Y | W, D, σ^{2}$ ). We use the following zero-mean spherical Gaussian priors on the lncRNA and disease eigenvectors as a generative model for the lncRNA and disease latent models:

\begin{array}{l} P (W | σ_{W}^{2}) = \prod_{i = 1}^{n l} f (W_{i} | 0, σ_{W}^{2} I) & (12) \end{array}

\begin{array}{l} P (D | σ_{D}^{2}) = \prod_{i = 1}^{n d} f (D_{j} | 0, σ_{D}^{2} I) & (13) \end{array}

where $I$ is a $k$ -dimensional identity diagonal matrix. Then, the posterior distribution of lncRNA and disease characteristics is derived as:

\begin{matrix} P (W, D | Y, σ^{2}, σ_{W}^{2}, σ_{D}^{2}) = \frac{P (W, D, Y, σ^{2}, σ_{W}^{2}, σ_{D}^{2})}{P (Y, σ^{2}, σ_{W}^{2}, σ_{D}^{2})} \\ = \frac{P (Y | W, D, Y, σ^{2}) \times P (W, D | σ_{W}^{2}, σ_{D}^{2})}{P (Y, σ^{2}, σ_{W}^{2}, σ_{D}^{2})} \\ ~ P (Y | W, D, σ^{2}) \times P (W, D | σ_{W}^{2}, σ_{D}^{2}) \\ = P (Y | W, D, σ^{2}) \times P (W | {(σ_{W})}^{2}) \times P (D | (σ_{D}^{2})) \\ = \prod_{i = 1}^{n l} \prod_{j = 1}^{n d} {[f (Y_{i j} | W_{i}^{T} D_{j}, σ^{2})]}^{I_{i j}} \\ \times \prod_{i = 1}^{n l} f (W_{i} | 0, σ_{W}^{2} I) \times \prod_{i = 1}^{n d} f (D_{j} | 0, σ_{D}^{2} I) \end{matrix} (14)

Taking the logarithm of equation (11), the distribution is transformed to:

\begin{matrix} \ln P (W, D | Y, σ^{2}, σ_{W}^{2}, σ_{D}^{2}) = \frac{1}{2 σ^{2}} \sum_{i = 1}^{n l} \sum_{j = 1}^{n d} I_{i j} {(Y_{i j} - W_{i}^{T} D_{j})}^{2} \\ - \frac{1}{2 σ^{2}} \sum_{i = 1}^{n l} W_{i}^{T} W_{i} - \frac{1}{2 σ^{2}} \sum_{j = 1}^{n d} D_{j}^{T} D_{j} \\ - \frac{1}{2} (\begin{array}{l} (\sum_{i = 1}^{n l} \sum_{j = 1}^{n d} I_{i j}) \ln σ^{2} \\ + (n l) k \ln σ_{W}^{2} + (n d) k \ln σ_{D}^{2} \end{array}) + c \end{matrix} (15)

where $c$ is a constant. With the hyperparameters held constant, maximization of the log posterior for lncRNA and disease characteristics is identical to minimization of the sum of squared errors with a quadratic regularization term objective function:

\begin{array}{l} \min \frac{1}{2} \sum_{i = 1}^{n l} \sum_{j = 1}^{n d} I_{i j} {(Y_{i j} - W_{i}^{T} D_{j})}^{2} + \frac{λ_{W}}{2} \sum_{i = 1}^{n l} {‖ W_{i} ‖}_{F r o}^{2} + \frac{λ_{D}}{2} \sum_{j = 1}^{n d} {‖ D_{j} ‖}_{F r o}^{2} & (16) \end{array}

where $λ_{W} = σ^{2} / σ_{W}^{2}$ ， $λ_{D} = σ^{2} / σ_{D}^{2}$ , ${‖ \cdot ‖}_{F r o}^{2}$ represents the Frobenius norm. However, the conventional probabilistic matrix decomposition model only uses a probabilistic linear model with Gaussian noise to depict the interaction between lncRNAs and diseases, leaving room for improvement. Based on the assumption that similar lncRNAs are usually interrelated with corresponding diseases and vice versa, CPMD takes more biological information (such as the similarity of lncRNAs and diseases) into account for the prediction. Accordingly, we suggest the following as a new objective function for CPMD:

\begin{array}{l} \min \frac{1}{2} \sum_{i = 1}^{n l} \sum_{j = 1}^{n d} I_{i j} {(Y_{i j} - W_{i}^{T} D_{j})}^{2} + \frac{λ_{W}}{2} \sum_{i = 1}^{n l} {‖ W_{i} ‖}_{F r o}^{2} + \frac{λ_{D}}{2} \sum_{j = 1}^{n d} {‖ D_{j} ‖}_{F r o}^{2} \\ + \frac{λ_{1}}{2} {‖ W^{T} W - L D ‖}_{F r o}^{2} + \frac{λ_{2}}{2} {‖ D^{T} D - L E ‖}_{F r o}^{2} \end{array} (17)

where $W_{i}$ represents the $k$ -dimensional potential feature vector of lncRNAs, $W^{T} W$ is the lncRNA weighted similarity matrix, and $D^{T} D$ is the disease weighted similarity matrix. Here, we use the gradient descent algorithm to solve the optimization problem in equation (14). First, the corresponding Lagrangian function $Γ_{f}$ of equation (14) is defined as:

\begin{matrix} Γ_{f} = \frac{1}{2} T r (I \times (Y Y^{T} - Y D^{T} W - W^{T} D Y^{T} + W^{T} D D^{T} W)) \\ + \frac{λ_{W}}{2} T r (W W^{T}) + \frac{λ_{D}}{2} T r (D D^{T}) \\ + \frac{λ_{1}}{2} T r (L D {(L D)}^{T} - L D W^{T} W - W^{T} W (L D) + W^{T} W W^{T} W) \\ + \frac{λ_{2}}{2} T r (L E {(L E)}^{T} - L E D^{T} D - D^{T} D (L E) + D^{T} D D^{T} D) \\ + T r (Φ W^{T} + T r (Ψ D^{T})) \end{matrix} (18)

where 𝑇𝑟(∙) denotes the trace of the matrix, and 𝛷=[ $φ_{i k}$ ] and 𝛹=[ $ψ_{j k}$ ] are the constraints 𝑊_𝑖𝑘≥0 and $D$ _𝑗𝑘≥0 for Lagrange multipliers. The partial derivatives of $W$ and $D$ are:

\begin{matrix} \frac{\partial Γ_{f}}{\partial W} = I \times (- D Y^{T} + D D^{T} W) \\ + λ_{W} W + 2 λ_{1} (- W (L D) + W W^{T} W) + Φ \end{matrix} (19)

\begin{matrix} \frac{\partial Γ_{f}}{\partial D} = I \times (- W Y^{T} + W W^{T} W) + λ_{D} D \\ + 2 λ_{2} (- D (L E) + D D^{T} D) + Ψ \end{matrix} (20)

Using the Karush-Kuhn-Tucker conditions $φ_{i k} W_{i k}$ =0 and $ψ_{j k} D_{j k}$ =0，the following equations for $W_{i k}$ and $D_{j k}$ can be obtained:

\begin{array}{l} {(I \times (- D Y^{T} + D D^{T} W))}_{i k} W_{i k} + {(λ_{W} W)}_{i k} W_{i k} \\ + {(2 λ_{1} (- W (L D) + W W^{T} W))}_{i k} W_{i k} = 0 \end{array} (21)

\begin{array}{l} {(I \times (- W Y + W W^{T} D))}_{j k} D_{j k} + {(λ_{D} D)}_{j k} D_{j k} \\ + {(2 λ_{2} (- D (L E) + D D^{T} D))}_{j k} D_{j k} = 0 \end{array} (22)

Thus, we can obtain the following update rule:

\begin{array}{l} W_{i k} \times \frac{{(I \times (D Y^{T}) + 2 λ_{1} (W (L D)))}_{i k}}{{(I \times (D D^{T} W))}_{i k} + {(λ_{W} W)}_{i k} + {(2 λ_{1} (W W^{T} W))}_{i k}} \to W_{i k}^{n e w} & (23) \end{array}

\begin{array}{l} D_{j k} \times \frac{{(I \times (W Y) + 2 λ_{2} (D (L E)))}_{j k}}{{(I \times (W W^{T} D))}_{j k} + {(λ_{D} D)}_{j k} + {(2 λ_{2} (D D^{T} D))}_{j k}} \to D_{j k}^{n e w} & (24) \end{array}

In accordance with equations (20) and (21), the matrices $W$ and $D$ are continuously updated until reaching the objective function’s local minimum. Finally, the predicted lncRNA–disease interaction matrix is calculated using the formula $Y^{*} = W^{T} D$ . In general, the 𝑗th column of $Y^{*}$ indicates the interaction score between disease $d_{j}$ and the lncRNA, with a higher score indicating a more significant interaction.

Results and discussion

Assessment indicators

Both LOOCV and 5-fold CV methods were utilized to assess the SCCPMD model’s efficacy in predicting potential lncRNA–disease associations (Huang et al., 2022c; Sun et al., 2022). Each proven lncRNA–disease association is listed as a test sample in the LOOCV framework, whereas the other unidentified relationship pairings are listed as training samples. All confirmed lncRNA–disease associations are separated into five groups in the 5-fold CV framework, and in each experiment, one group is chosen as the test group and the other as the training group. Using this method, we ran the experiment 100 times and computed the mean of all outcomes. Since the lncRNA–disease dataset only contains a small number of known lncRNA–disease associations and the AUC is known to be insensitive to a skewed class distribution, we used the AUC of the receiver operating characteristic curve to evaluate the performance of SCCPMD (Zhao et al., 2022).

Optimal parameter selection

There are six parameters in SCCPMD: $a, k, λ_{W}, λ_{D}, λ_{1}$ , and $λ_{2}$ . To tease out the effect of these five parameter choices on the model, we performed 100 experiments in the 5-fold CV framework and calculated the average AUC values. First, there is a similarity correction component for parameter $a$ . We searched for the optimal parameter in the range of −1 to −10. Figure 2 clearly shows that the highest AUC value was reached when $a = - 4$ .

FIGURE 2

Figure 2. The impact of different $α$ values under 5-fold cross-validation.

The parameter $k$ represents the number of lncRNA and disease latent feature matrix row vectors, which determines the size of the latent feature matrix. As shown in Figure 3, we restricted the range of $k$ from 10 to 100. The highest AUC value was achieved for SCCPMD when $k = 20$ .

FIGURE 3

Figure 3. The impact of different $k$ values under 5-fold cross-validation.

Parameters $λ_{W}, λ_{D}, λ_{1},$ and $λ_{2}$ exist in the constrained probability matrix decomposition part, which controls the influence of each part in the final update rule of the lncRNA and disease characteristic matrix. As shown in Figures 4, 5, we set the range of all four parameters to be from 0.1 to 1.

FIGURE 4

Figure 4. The impact of different $λ_{W}$ and $λ_{D}$ values under 5-fold cross-validation.

FIGURE 5

Figure 5. The impact of different $λ_{1}$ and $λ_{2}$ values under 5-fold cross-validation.

Based on the above experiments, the best values of these five parameters were finally determined as $a = - 4, k = 20, λ_{W} = 0.8, λ_{D} = 0.6, λ_{1} = 0.6$ , and $λ_{2} = 0.8$ .

Algorithm comparison

To evaluate the predictive performance of the SCCPMD model, SCCPMD was compared with five existing advanced methods: dual sparse collaborative matrix factorization (DSCMF; Liu et al., 2021), geometric matrix completion lncRNA–disease association (GMCLDA; Lu et al., 2020), local random walk-based prediction of human lncRNA and disease associations (Li et al., 2021), probabilistic matrix factorization method for identifying lncRNA–disease associations (PMFILDA; Xuan et al., 2019), and bi-random walks for predicting lncRNA–disease associations (BRWLDA; Yu et al., 2017). As shown in Figure 6, the AUC value of the SCCPMD curve in the LOOCV framework was 0.9787, which was larger than that obtained with the other prediction methods (DSCMF, AUC = 0.9101; GMCLDA, AUC = 0.9086; LRWHNLDA, AUC = 0.9083; PMFILDA, AUC = 0.8850; and BRWLDA, AUC = 0.8376), indicating that the performance of SCCPMD is better than that of existing calculation methods. To further validate the prediction performance of SCCPMD, the 5-fold CV framework was used for validation. As shown in Figure 7, SCCPMD obtained a reliable AUC of 0.9528 ± 0.0036, which was much higher than the AUC values of 0.8946 ± 0.0038, 0.8804 ± 0.0009, 0.8844 ± 0.0014, 0.8705 ± 0.0047, and 0.8172 ± 0.0014 for the comparison methods DSCMF, GMCLDA, LRWHNLDA, PMFILDA, and BRWLDA, respectively. The computational methods we compared were only for lncRNA-disease association pairs, predicting potential associations based on the similarity between lncRNA and disease. The SCCPMD model uses microbe-disease associations to enrich disease similarities, while correcting the similarity matrix to highlight strong similarities and reduce noise in the original similarities. Therefore, SCCPMD shows better performance than these five methods and would be more favorable for the prediction of lncRNA–disease associations.

FIGURE 6

Figure 6. Area under the receiver operating characteristic curve (AUC) values of leave-one-out cross-validation (LOOCV) between SCCPMD and the other five comparison models.

FIGURE 7

Figure 7. Area under the receiver operating characteristic curve (AUC) values of 5-fold cross-validation between SCCPMD and the other five comparison models.

Case study

Malignancy, as a general term to refer to cancer, has a significant negative impact on human health. With a global annual mortality rate of more than 10 million, cancer remains one of the main contributors to mortality (Zaimy et al., 2017). To validate the actual predictive performance of SCCPMD for lncRNA–disease associations, three cancer types with high hazard were selected as disease case studies: breast cancer, lung cancer, and RCC. The predicted correlations were validated in three lncRNA–disease association databases: the lncRNA disease database, Lnc2cancer database, and MNDR database.

Table 1 shows the top 10 lncRNAs that were predicted to be associated with breast cancer using our model, nine of which have previously been reported to be associated with breast cancer. Breast epithelial cells can become cancerous when they proliferate uncontrollably in response to several oncogenic stimuli (Fahad, 2019). Four lncRNAs, including LINC00667, were identified by analysis of gene expression data from 768 breast cancer patients in The Cancer Genome Atlas database, suggesting potential predictive biomarkers for breast cancer with clinical value (Zhu et al., 2020). Among these markers, PVT1 has been reported to affect mature adipogenic mediators by regulating p21 expression in triple-negative breast cancer cells (Wang et al., 2018b). Functional studies showed that the proliferation, migration, and invasion of breast cancer cells overexpressing LINC01089 were significantly reduced and that epidermal growth factor reversed these effects (Yuan et al., 2019). TSIX is an lncRNA that has been explored as a stable non-invasive breast cancer immunological biomarker, which plays a role in X chromosome inactivation and breast cancer (Salama et al., 2020).

TABLE 1

Table 1. Top 10 lncRNAs predicted by SCCPMD to be connected to breast cancer.

Table 2 shows the top 10 lncRNAs that were predicted to be associated with lung cancer using our model, all of which have been reported to play roles in lung cancer. Despite improvements in our knowledge of lung cancer risk, progression, immunologic control, and treatment choices, lung cancer—a malignancy that starts in the bronchial mucosa or glands of the lungs—remains the most common cause of cancer-related death (Bade and Cruz, 2020). Amplification of PVT1 in lung cancer patients was associated with a poor prognosis for survival. PVT1 levels are increased in lung cancer cells, which promotes their growth and metastasis both in vivo and in vitro (Pan et al., 2020). The expression of SNHG1 in non-small cell lung cancer (NSCLC) tissues and cells is high. Silencing SNHG1 could suppress the migration and invasion of NSCLC cells, which also promoted apoptosis and decreased the cell proliferation rate (Li and Zheng, 2020). Considerable upregulation of the lncRNA CDKN2 B-AS1 has been detected in both lung cancer tissues and cell lines (Wang et al., 2020). In vitro studies demonstrated that blocking NEAT1 with short hairpin RNA prevented lung cancer cells from surviving and migrating or invading (Ma et al., 2020). Table 3 shows the top 10 lncRNAs that were predicted to be associated with RCC with our model, all of which have been associated with RCC in previous studies. RCC comprises a group of malignant tumors originating from the renal cortical epithelium, most commonly in the upper pole of the kidney (Pullen Jr, 2021). By inhibiting cell cycle progression and reversing the epithelial-to-mesenchymal transition (EMT) phenotype, NEAT1 knockdown could reduce the rate of RCC cell proliferation and suppressed RCC migration and invasion (Liu et al., 2017). By controlling EMT-related genes, loss-of-function and gain-of-function pathways demonstrated that CRNDE promotes the migration and invasion of clear cell RCC cells (Ding et al., 2018). MEG3 has been proposed to induce apoptosis in RCC cells by activating the mitochondrial pathway (Wang et al., 2015). Functional assays revealed that MIAT knockdown prevented kidney cancer cells from proliferating and metastasizing both in vitro and in vivo (Qu et al., 2018).

TABLE 2

Table 2. Top 10 lncRNAs predicted by SCCPMD to be connected to lung cancer.

TABLE 3

Table 3. Top 10 lncRNAs predicted by SCCPMD to be connected to renal cell carcinoma.

Conclusion

An increasing number of studies have shown that exploration of potential lncRNA–disease associations can be expedited and more effectively performed by developing computational models. Recent results have also showed that matrix decomposition is a reliable method for predicting lncRNA-disease associations. We here propose a novel method to predict unknown lncRNA–disease associations based on corrected similarity added as a constraint to the probability matrix decomposition (SCCPMD). We confirmed the excellent performance of SCCPMD, demonstrating superiority in prediction to existing advanced algorithms, which is attributed to the following three factors: (1) the disease Gaussian similarity obtained by fusing microbe-disease associations calculation can solve the original problem of sparse disease semantic similarity, (2) the corrected similarity performance highlights the effects of strong correlations while reducing the effects of weak correlations, thus reducing the overall noise in the matrix; and (3) introducing lncRNA and disease similarity constraints in the traditional probability matrix decomposition makes better use of this biological information to improve the prediction performance. The AUC values of SCCPMD in the LOOCV and 5-fold CV frameworks reached up to 0.9787 and 0.9528 ± 0.0036, respectively, which were much higher than those obtained with the comparative algorithms. Additionally, we chose three complex diseases as case studies, demonstrating that SCCPMD performs well with real-world clinical data.

Although SCCPMD enriches disease similarity using microbe-disease associations, prediction results are also affected by microbe-disease associations. In addition, relying on a single lncRNA expression similarity can also make the model limited. Integration of more similarity information is expected to make the proposed model more robust. Therefore, in future work we will try to combine more bioinformatic datasets and fuse multiple lncRNA similarities to improve the robustness and predictive performance of the model.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Ethics statement

Ethical review and approval were not required for the study of human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/ participants OR patients/participants legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions

LL, HJ, and LC: conceptualization. LL: data curation and resources. LL, RC, YZ, WX, HJ, LC, and MZ: formal analysis and writing—review and editing. LL, RC, and YZ: investigation. LL, RC, YZ, WX, HJ, and LC: methodology and supervision. LL and MZ: project administration. RC, YZ, and WX: validation and writing draft. RC: visualization. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China (72001202 and 62002070), the Opening Project of Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2021013), the Science and Technology Plan Project of Guangzhou City (202102021236), and the Philosophy and Social Science Co-Construction Project of Guangzhou City (2020GZGJ115).

Acknowledgments

We thank reviewers for valuable suggestions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^https://www.nlm.nih.gov/mesh/meshhome.html

References

Bade, B. C., and Cruz, C. S. D. (2020). Lung cancer 2020: epidemiology, etiology, and prevention[J]. Clin. Chest Med. 41, 1–24. doi: 10.1016/j.ccm.2019.10.001

CrossRef Full Text | Google Scholar

Bao, Z., Yang, Z., Huang, Z., Zhou, Y., Cui, Q., and Dong, D. (2019). LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases[J]. Nucleic Acids Res. 47, D1034–D1037. doi: 10.1093/nar/gky905

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, H. L., Liu, Z. J., Huang, P. L., Yue, Y. L., and Xi, J. N. (2019). lncRNA-RMRP promotes proliferation, migration and invasion of bladder cancer via miR-206[J]. Eur. Rev. Med. Pharmacol. Sci. 23, 1012–1021. doi: 10.26355/eurrev_201902_16988

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Li, T. H., Zhao, Y., Wang, C. C., and Zhu, C. C. (2021a). Deep-belief network for predicting potential miRNA-disease associations[J]. Brief. Bioinform. 22:bbaa186. doi: 10.1093/bib/bbaa186

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Sun, Y. Z., Guan, N. N., Qu, J., Huang, Z. A., Zhu, Z. X., et al. (2019a). Computational models for lncRNA function prediction and functional similarity calculation[J]. Brief. Funct. Genomics 18, 58–82. doi: 10.1093/bfgp/ely031

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Sun, L. G., and Zhao, Y. (2021b). NCMCMDA: miRNA–disease association prediction through neighborhood constraint matrix completion[J]. Brief. Bioinform. 22, 485–496. doi: 10.1093/bib/bbz159

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Wang, L., Qu, J., Guan, N. N., and Li, J. Q. (2018a). Predicting miRNA–disease association based on inductive matrix completion[J]. Bioinformatics 34, 4256–4265. doi: 10.1093/bioinformatics/bty503

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Xie, D., Zhao, Q., and You, Z. H. (2019b). MicroRNAs and complex diseases: from experimental results to computational models[J]. Brief. Bioinform. 20, 515–539. doi: 10.1093/bib/bbx130

CrossRef Full Text | Google Scholar

Chen, X., and Yan, G. Y. (2013). Novel human lncRNA–disease association inference based on lncRNA expression profiles[J]. Bioinformatics 29, 2617–2624. doi: 10.1093/bioinformatics/btt426

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Yan, C. C., Zhang, X., and You, Z. H. (2017). Long non-coding RNAs and complex diseases: from experimental results to computational models[J]. Brief. Bioinform. 18, 558–576. doi: 10.1093/bib/bbw060

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Yin, J., Qu, J., and Huang, L. (2018b). MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction[J]. PLoS Comput. Biol. 14:e1006418. doi: 10.1371/journal.pcbi.1006418

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Zhu, C. C., and Yin, J. (2019c). Ensemble of decision tree reveals potential miRNA-disease associations[J]. PLoS Comput. Biol. 15:e1007209. doi: 10.1371/journal.pcbi.1007209

PubMed Abstract | CrossRef Full Text | Google Scholar

Ding, C., Han, F., Xiang, H., Xia, X., Wang, Y., Dou, M., et al. (2018). LncRNA CRNDE is a biomarker for clinical progression and poor prognosis in clear cell renal cell carcinoma[J]. J. Cell. Biochem. 119, 10406–10414. doi: 10.1002/jcb.27389

PubMed Abstract | CrossRef Full Text | Google Scholar

Fahad, U. M. (2019). Breast cancer: current perspectives on the disease status[J]. Breast Cancer Metastasis and Drug Resistance. 1152, 51–64. doi: 10.1007/978-3-030-20301-6_4

CrossRef Full Text | Google Scholar

Fu, G., Wang, J., Domeniconi, C., and Yu, G. (2018). Matrix factorization-based data fusion for the prediction of lncRNA–disease associations[J]. Bioinformatics 34, 1529–1537. doi: 10.1093/bioinformatics/btx794

PubMed Abstract | CrossRef Full Text | Google Scholar

Gao, M. M., Cui, Z., Gao, Y. L., Wang, J., and Liu, J. X. (2021). Multi-label fusion collaborative matrix factorization for predicting LncRNA-disease associations[J]. IEEE J. Biomed. Health Inform. 25, 881–890. doi: 10.1109/JBHI.2020.2988720

PubMed Abstract | CrossRef Full Text | Google Scholar

Hill, M., and Tran, N. (2021). miRNA interplay: mechanisms and consequences in cancer[J]. Dis. Model. Mech. 14:dmm047662. doi: 10.1242/dmm.047662

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, L., Zhang, L., and Chen, X. (2022a). Updated review of advances in microRNAs and complex diseases: experimental results, databases, webservers and data fusion[J]. Brief. Bioinform. 23:bbac397. doi: 10.1093/bib/bbac397

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, L., Zhang, L., and Chen, X. (2022b). Updated review of advances in microRNAs and complex diseases: taxonomy, trends and challenges of computational models[J]. Brief. Bioinform. 23:bbac358. doi: 10.1093/bib/bbac358

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, L., Zhang, L., and Chen, X. (2022c). Updated review of advances in microRNAs and complex diseases: towards systematic evaluation of computational models[J]. Brief. Bioinform. 23:bbac407. doi: 10.1093/bib/bbac407

PubMed Abstract | CrossRef Full Text | Google Scholar

Lan, W., Lai, D., Chen, Q., Wu, X., Chen, B., Liu, J., et al. (2022). LDICDL: LncRNA-disease association identification based on collaborative deep learning[J]. IEEE/ACM Trans. Comput. Biol. Bioinform. 19, 1715–1723. doi: 10.1109/TCBB.2020.3034910

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, J., Zhao, H., Xuan, Z., Yu, J., Feng, X., Liao, B., et al. (2021). A novel approach for potential human LncRNA-disease association prediction based on local random walk[J]. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 1049–1059. doi: 10.1109/TCBB.2019.2934958

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, X., and Zheng, H. (2020). LncRNA SNHG1 influences cell proliferation, migration, invasion, and apoptosis of non-small cell lung cancer cells via the miR-361-3p/FRAT1 axis[J]. Thoracic Cancer. 11, 295–304. doi: 10.1111/1759-7714.13256

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, F., Chen, N., Gong, Y., Xiao, R., Wang, W., and Pan, Z. (2017). The long non-coding RNA NEAT1 enhances epithelial-to-mesenchymal transition and chemoresistance via the miR-34a/c-met axis in renal cell carcinoma[J]. Oncotarget 8, 62927–62938. doi: 10.18632/oncotarget.17757

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, J. X., Gao, M. M., Cui, Z., Gao, Y. L., and Li, F. (2021). DSCMF: prediction of LncRNA-disease associations based on dual sparse collaborative matrix factorization[J]. BMC bioinformatics. 22:241. doi: 10.1186/s12859-020-03868-w

PubMed Abstract | CrossRef Full Text | Google Scholar

Lu, C., Yang, M., Li, M., Li, Y., Wu, F. X., and Wang, J. (2020). Predicting human lncRNA-disease associations based on geometric matrix completion[J]. IEEE J. Biomed. Health Inform. 24, 2420–2429. doi: 10.1109/JBHI.2019.2958389

PubMed Abstract | CrossRef Full Text | Google Scholar

Ma, F., Lei, Y. Y., Ding, M. G., Luo, L. H., Xie, Y. C., and Liu, X. L. (2020). LncRNA NEAT1 interacted with DNMT1 to regulate malignant phenotype of cancer cell and cytotoxic T cell infiltration via epigenetic inhibition of p53, cGAS, and STING in lung cancer[J]. Front. Genet. 11:250. doi: 10.3389/fgene.2020.00250

PubMed Abstract | CrossRef Full Text | Google Scholar

Pan, Y., Liu, L., Cheng, Y., Yu, J., and Feng, Y. (2020). Amplified LncRNA PVT1 promotes lung cancer proliferation and metastasis by facilitating VEGFC expression[J]. Biochem. Cell Biol. 98, 676–682. doi: 10.1139/bcb-2019-0435

PubMed Abstract | CrossRef Full Text | Google Scholar

Pullen, R. L. Jr. (2021). Renal cell carcinoma, part 1[J]. Nursing 51, 34–40. doi: 10.1097/01.NURSE.0000753972

CrossRef Full Text | Google Scholar

Qu, Y., Xiao, H., Xiao, W., Xiong, Z., Hu, W., Gao, Y., et al. (2018). Upregulation of MIAT regulates LOXL2 expression by competitively binding MiR-29c in clear cell renal cell carcinoma[J]. Cell. Physiol. Biochem. 48, 1075–1087. doi: 10.1159/000491974

PubMed Abstract | CrossRef Full Text | Google Scholar

Salama, E. A., Adbeltawab, R. E., and El Tayebi, H. M. (2020). XIST and TSIX: novel cancer immune biomarkers in PD-L1-overexpressing breast cancer patients[J]. Front. Oncol. 9:1459. doi: 10.3389/fonc.2019.01459

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, J., Shi, H., Wang, Z., Zhang, C., Liu, L., Wang, L., et al. (2014). Inferring novel lncRNA–disease associations based on a random walk model of a lncRNA functional similarity network[J]. Mol. BioSyst. 10, 2074–2081. doi: 10.1039/c3mb70608g

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, F., Sun, J., and Zhao, Q. (2022). A deep learning method for predicting metabolite–disease associations via graph neural network[J]. Brief. Bioinform. 23:bbac266. doi: 10.1093/bib/bbac266

PubMed Abstract | CrossRef Full Text | Google Scholar

Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., and Sharan, R. (2010). Associating genes and protein complexes with disease via network propagation[J]. PLoS Comput. Biol. 6:e1000641. doi: 10.1371/journal.pcbi.1000641

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, C. C., Han, C. D., Zhao, Q., and Chen, X. (2021a). Circular RNAs and complex diseases: from experimental results to computational models[J]. Brief. Bioinform. 22:bbab286. doi: 10.1093/bib/bbab286

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, M., Huang, T., Luo, G., Huang, C., Xiao, X. Y., Wang, L., et al. (2015). Long non-coding RNA MEG3 induces renal cell carcinoma cells apoptosis by activating the mitochondrial pathway[J]. J. Huazhong Univ. Sci. Technolog. Med. Sci. 35, 541–545. doi: 10.1007/s11596-015-1467-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, J., Su, Z., Lu, S., Fu, W., Liu, Z., Jiang, X., et al. (2018a). LncRNA HOXA-AS2 and its molecular mechanisms in human cancer[J]. Clin. Chim. Acta 485, 229–233. doi: 10.1016/j.cca.2018.07.004

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, L., Wang, R., Ye, Z., Wang, Y., Li, X., Chen, W., et al. (2018b). PVT1 affects EMT and cell proliferation and migration via regulating p21 in triple-negative breast cancer cells cultured with mature adipogenic medium[J]. Acta Biochim. Biophys. Sin. 50, 1211–1218. doi: 10.1093/abbs/gmy129

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, G., Xu, G., and Wang, W. (2020). Long noncoding RNA CDKN2B-AS1 facilitates lung cancer development through regulating miR-378b/NR2C2[J]. Onco. Targets. Ther. 13, 10641–10649. doi: 10.2147/OTT.S261973

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, M. N., You, Z. H., Wang, L., Li, L. P., and Zheng, K. (2021b). LDGRNMF: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization[J]. Neurocomputing 424, 236–245. doi: 10.1016/j.neucom.2020.02.062

CrossRef Full Text | Google Scholar

Wang, W., Zhang, L., Sun, J., Zhao, Q., and Shuai, J. (2022). Predicting the potential human lncRNA–miRNA interactions based on graph convolution network with conditional random field[J]. Brief. Bioinform. 23:bbac463. doi: 10.1093/bib/bbac463

PubMed Abstract | CrossRef Full Text | Google Scholar

Xie, G., Chen, H., Sun, Y., Gu, G., Lin, Z., Wang, W., et al. (2021). Predicting circRNA-disease associations based on deep matrix factorization with multi-source fusion[J]. Interdisciplinary Sciences: Computational Life Sciences. 13, 582–594. doi: 10.1007/s12539-021-00455-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Xing, C., Sun, S., Yue, Z. Q., and Bai, F. (2021). Role of lncRNA LUCAT1 in cancer[J]. Biomed. Pharmacother. 134:111158. doi: 10.1016/j.biopha.2020.111158

PubMed Abstract | CrossRef Full Text | Google Scholar

Xuan, Z., Li, J., Yu, J., Feng, X., Zhao, B., and Wang, L. (2019). A probabilistic matrix factorization method for identifying lncRNA-disease associations[J]. Genes. 10:126. doi: 10.3390/genes10020126

PubMed Abstract | CrossRef Full Text | Google Scholar

Yu, G., Fu, G., Lu, C., Ren, Y., and Wang, J. (2017). BRWLDA: bi-random walks for predicting lncRNA-disease associations[J]. Oncotarget 8, 60429–60446. doi: 10.18632/oncotarget.19588

PubMed Abstract | CrossRef Full Text | Google Scholar

Yuan, H., Qin, Y., Zeng, B., Feng, Y., Li, Y., Xiang, T., et al. (2019). Long noncoding RNA LINC01089 predicts clinical prognosis and inhibits cell proliferation and invasion through the Wnt/β-catenin signaling pathway in breast cancer[J]. Onco. Targets. Ther. 12, 4883–4895. doi: 10.2147/OTT.S208830

PubMed Abstract | CrossRef Full Text | Google Scholar

Zaimy, M. A., Saffarzadeh, N., Mohammadi, A., Pourghadamyari, H., Izadi, P., Sarli, A., et al. (2017). New methods in the diagnosis of cancer and gene therapy of cancer based on nanoparticles[J]. Cancer Gene Ther. 24, 233–243. doi: 10.1038/cgt.2017.16

CrossRef Full Text | Google Scholar

Zhang, Y., Chen, M., Li, A., Cheng, X., Jin, H., and Liu, Y. (2020a). LDAI-ISPS: LncRNA–disease associations inference based on integrated space projection scores[J]. Int. J. Mol. Sci. 21:1508. doi: 10.3390/ijms21041508

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, H., Liang, Y., Han, S., Peng, C., and Li, Y. (2019a). Long noncoding RNA and protein interactions: from experimental results to computational models based on network methods[J]. Int. J. Mol. Sci. 20:1284. doi: 10.3390/ijms20061284

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, H., Liang, Y., Peng, C., Han, S., du, W., and Li, Y. (2019b). Predicting lncRNA-disease associations using network topological similarity based on deep mining heterogeneous networks[J]. Math. Biosci. 315:108229. doi: 10.1016/j.mbs.2019.108229

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, L., Liu, T., Chen, H., Zhao, Q., and Liu, H. (2021a). Predicting lncRNA–miRNA interactions based on interactome network and graphlet interaction[J]. Genomics 113, 874–880. doi: 10.1016/j.ygeno.2021.02.002

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, P., Meng, J., Luan, Y., and Liu, C. (2020b). Plant miRNA–lncRNA interaction prediction with the ensemble of CNN and IndRNN[J]. Interdisciplinary Sciences: Computational Life Sciences. 12, 82–89. doi: 10.1007/s12539-019-00351-w

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, L., Yang, P., Feng, H., Zhao, Q., and Liu, H. (2021b). Using network distance analysis to predict lncRNA–miRNA interactions[J]. Interdisciplinary Sciences: Computational Life Sciences. 13, 535–545. doi: 10.1007/s12539-021-00458-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, Y., Ye, F., Xiong, D., and Gao, X. (2020c). LDNFSGB: prediction of long non-coding rna and disease association using network feature similarity and gradient boosting[J]. BMC bioinformatics. 21:377. doi: 10.1186/s12859-020-03721-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhao, J. X., Sun, J. Q., et al. (2022). Predicting potential interactions between lncRNAs and proteins via combined graph auto-encoder methods. Brief. Bioinform. doi: 10.1093/bib/bbac527

CrossRef Full Text | Google Scholar

Zhao, T., Xu, J., Liu, L., Bai, J., Xu, C., Xiao, Y., et al. (2015). Identification of cancer-related lncRNAs through integrating genome, regulome and transcriptome features[J]. Mol. BioSyst. 11, 126–136. doi: 10.1039/c4mb00478g

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, J. R., You, Z. H., Cheng, L., and Ji, B. Y. (2021). Prediction of lncRNA-disease associations via an embedding learning HOPE in heterogeneous information networks[J]. Molecular Therapy-Nucleic Acids. 23, 277–285. doi: 10.1016/j.omtn.2020.10.040

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, M., Lv, Q., Huang, H., Sun, C., Pang, D., and Wu, J. (2020). Identification of a four-long non-coding RNA signature in predicting breast cancer survival[J]. Oncol. Lett. 19, 221–228. doi: 10.3892/ol.2019.11063

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, R., Wang, Y., Liu, J. X., and Dai, L. Y. (2021). IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier[J]. BMC bioinformatics. 22:175. doi: 10.1186/s12859-021-04104-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: lncRNA-long noncoding RNA, disease, similarity correction, constraint probability matrix decomposition, associations prediction

Citation: Lin L, Chen R, Zhu Y, Xie W, Jing H, Chen L and Zou M (2023) SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA–disease associations. Front. Microbiol. 13:1093615. doi: 10.3389/fmicb.2022.1093615

Received: 09 November 2022; Accepted: 30 November 2022;
Published: 11 January 2023.

Edited by:

Qi Zhao, University of Science and Technology Liaoning, China

Reviewed by:

Chun-Chun Wang, China University of Mining and Technology, China
Li Zhang, Liaoning University, China

Copyright © 2023 Lin, Chen, Zhu, Xie, Jing, Chen and Zou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Huaiguo Jing, ✉ amluZ2hnQGdkdXQuZWR1LmNu; Langcheng Chen, ✉ Y2xjQGdkdXQuZWR1LmNu

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.