- 1School of Information Science and Engineering, Shandong Normal University, Jinan, China
- 2School of Information Engineering, East China Jiaotong University, Nanchang, China
- 3College of Information Science and Engineering, Hunan Normal University, Changsha, China
- 4School of Computer Science, University of South China, Hengyang, China
- 5College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
Long noncoding RNAs (lncRNAs) are a class of noncoding RNA molecules longer than 200 nucleotides. Recent studies have uncovered their functional roles in diverse cellular processes and tumorigenesis. Therefore, identifying novel disease-related lncRNAs might deepen our understanding of disease etiology. However, due to the relatively small number of verified associations between lncRNAs and diseases, it remains a challenging task to reliably and effectively predict the associated lncRNAs for given diseases. In this paper, we propose a novel multiview consensus graph learning method to infer potential disease-related lncRNAs. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases by taking advantage of the known associations. We then iteratively learn a consensus graph from the multiple input matrices and simultaneously optimize the predicted association probability based on a multi-label learning framework. To convey the utility of our method, three state-of-the-art methods are compared with our method on three widely used datasets. The experiment results illustrate that our method could obtain the best prediction performance under different cross validation schemes. The case study analysis implemented for uterine cervical neoplasms further confirmed the utility of our method in identifying lncRNAs as potential prognostic biomarkers in practice.
Introduction
With the completion of ENCODE project, researchers have found that only 2% of genes in the human genome encode proteins, while approximately 75% of the human genome is involved in the process of primary transcripts (Djebali et al., 2012; Li and Chang, 2014; Zhang et al., 2018b). The discovery of extensive transcription of large RNA transcripts which do not code for proteins, termed long noncoding RNAs (lncRNAs), provides a new perspective in understanding the centrality of RNA in gene regulation (Rinn and Chang, 2012). Evidences have shown that lncRNAs are key regulators for many cellular functions, including splicing, gene regulation, and hormone-like activity (Gao et al., 2019a; Mongelli et al., 2019). Moreover, the dysregulation of lncRNAs has been proved to be closely related with various human diseases, such as types of cancer, neurological as well as cardiovascular diseases (Feng et al., 2018; Zhang et al., 2019b). Consequently, identifying potential disease-related lncRNAs is of great importance and might shed new light on the understanding of the pathogenesis of complex diseases.
As a powerful complementary tool for biological and clinical experiments, many computational approaches have been developed to effectively predict the lncRNA-disease associations (Zou et al., 2016; Chen et al., 2017; Zhang et al., 2018c; Gong et al., 2019; Yue et al., 2019). Under the assumption that similar diseases are more likely to be associated with functionally similar lncRNAs, Chen et al. proposed Laplacian regularized least squares for lncRNA-disease association in terms of a semi-supervised learning framework(Chen and Yan, 2013). Liu et al. combined the gene expression profiles, lncRNA expression profiles and disease-associated genes to infer the potential associated diseases for human lncRNAs globally (Liu et al., 2014). In addition to the aforementioned datasets, Chen also incorporated the Gaussian interaction profile kernel similarity into their model and adopted the KATZ measure for lncRNA–disease association (Chen, 2015). Zhou et al. first constructed a heterogeneous network in terms of three sub-networks and then ranked the relevant lncRNAs for a given disease by applying the random walk with restart on the constructed network (Zhou et al., 2015). Chen et al. further improved the random walk with restart framework by initializing the probability vector according to the integration of lncRNA expression similarity and disease semantic similarity (Chen et al., 2016). Fu et al. decomposed the data matrices of heterogeneous data sources into low-rank matrices via matrix tri-factorization to explore the intrinsic as well as the shared structure, and then used the optimized low-rank matrices to obtain the potential associations (Fu et al., 2018). Lu et al. extracted a set of primary feature vectors and used the inductive matrix completion framework to infer the lncRNA-disease association (Lu et al., 2018). Lan et al. constructed a web server for lncRNA-disease association prediction by integrating multiple biological data resources (Lan et al., 2017). Xiao et al. obtained the association probability for a given lncRNA-disease association according to the lengths of the paths linking them in the constructed heterogeneous network (Xiao et al., 2018). Hu et al. adopted the bi-random walk algorithm to construct a linear model for the lncRNA-disease association prediction (Hu et al., 2019). Yu et al. applied a collaborative filtering model together with the Naive Bayesian Classifier on a constructed lncRNA-miRNA-disease tripartite network to effectively predict novel lncRNA-disease associations (Yu et al., 2019). Both Xie et al. and Chen et al. first fused different similarity matrices for lncRNAs and diseases based on a similarity kernel fusion model and then applied different classification frameworks to predict potential associations (Chen et al., 2019; Xie et al., 2019). Cui et al. developed a novel computational framework based on bipartite local model with nearest profile-based association inferring for prediction (Cui et al., 2019). Recently, Guo et al. employed the autoencoder to obtain the optimal feature space from the original feature set which was constructed from different types of similarities (Guo et al., 2019). The newly constructed features were then fed to a rotating forest to classify the lncRNA–disease associations and achieved remarkable performance.
Although the methods mentioned above have made great contributions to discover potential disease-related lncRNAs, the prediction accuracy is still limited in several ways. For example, in spite of the multiple biological data sources used in existing methods, the integration of the similarity matrices constructed from these data sources was simply performed by averaging them, which might be suboptimal. Furthermore, since the lncRNA–disease association data was relatively sparse, how to fully take advantage of the existing information during the prediction process remains challenging. To solve these issues, we here propose a multiview consensus graph learning method for disease-related lncRNAs prediction. Concretely, a set of similarity matrices for lncRNAs and diseases are first constructed by leveraging the known lncRNA–disease associations, respectively. We then iteratively learn a consensus graph from the multiple similarity matrices and obtain the final association probabilities between lncRNAs and diseases using a multi-label learning framework. To confirm the utility of our method, we compare the proposed method with several state-of-the-art methods on three widely used datasets under different evaluation metrics. The experimental results of various cross validation schemes clearly indicate that our method could achieve better prediction performance compared to the other three methods. Furthermore, we illustrate the potential of our method in identifying prognostic biomarkers for uterine cervical neoplasms in a case-study analysis.
Materials and Methods
Human lncRNA–Disease Associations
The LncRNADisease database is used as the data of known lncRNA–disease associations (Chen et al., 2013; Bao et al., 2019). We used three versions of LncRNADisease, June-2012 Version (marked as Dataset1), January-2014 Version (marked as Dataset2), and June-2015 Version (marked as Dataset3) in our experiments (Li et al., 2019). After filtering the lncRNA-disease associations with irregular disease names or lncRNA names and merging duplicate items, we obtained 276 interactions between 150 diseases and 112 lncRNAs for dataset1, 319 interactions between 169 diseases and 131 lncRNAs for dataset2, and 621 interactions between 226 diseases and 285 lncRNAs for dataset3, respectively (Table 1). For convenience, we use Y ∈ ℝp × q to represent the known lncRNA–disease association matrix, where p and q denote the number of lncRNAs and diseases, respectively. If disease j has an association with lncRNA i, then Yij = 1, otherwise Yij = 0.
Disease Semantic Similarity
To calculate the disease semantic similarity, we followed the same approach as described in previous work (Wang et al., 2010). Specifically, each disease d can be described by a Directed Acyclic Graphs (DAGs) that consists of three items DAG = (d, T(d), E(d)), where T(d) and E(d) are all the parent nodes of d including itself and all links from the ancestor nodes to child nodes, respectively. The contribution of disease t to the semantic value of disease d is defined as:
The overall semantic value of a given disease d can then be calculated as:
As a result, given a pair of diseases i and j, their semantic similarity is defined as:
We use AD(1) ∈ ℝq × q to denote the obtained disease semantic similarity matrix and stands for the semantic similarity for a disease pair i and j.
lncRNA Functional Similarity
Similarly, the lncRNA functional similarity was also calculated according to previous studies (Wang et al., 2010; Liang et al., 2019). For each lncRNA pair, we measured their similarity as follows:
where m and n are the number of diseases related to lncRNA li and lj, and D(l) represents the disease set related to lncRNA l. We use AL(1) ∈ ℝp × p to denote the obtained lncRNA functional similarity matrix and stands for the functional similarity for a pair of lncRNAs i and j.
Gaussian Interaction Profile Kernel Similarity
Gaussian interaction profile kernel similarity is widely used in various semi-supervised prediction tasks for measuring similarities (Zou et al., 2016; Zhang et al., 2017; Zhu et al., 2018; Pan et al., 2019; Yin et al., 2019). Here we also adopted this similarity measure to construct the similarity matrices for lncRNAs and diseases, respectively. Concretely, given two lncRNAs li and lj, their Gaussian interaction profile kernel similarity is defined as:
where IP(li) is in essence the i-th row of matrix Y, is a parameter controlling the kernel bandwidth and p is the number of lncRNAs. Similarly, for a pair of diseases di and dj, we have:
where IP(di) is in essence the i-th column of matrix Y, controls the kernel bandwidth and q is the number of diseases. Finally, we use AD(2) ∈ ℝq × q and AL(2) ∈ ℝp × p to denote the kernel similarity matrices for diseases and lncRNAs, respectively.
Cosine Similarity
Cosine similarity is another effective method for measuring similarities and is widely used in recommender systems (Gao et al., 2019b; Zhang et al., 2019a). Therefore, we also adopted cosine similarity to build the similarity matrices for lncRNAs and diseases. The cosine similarity for a pair of lncRNAs or diseases is calculated as:
where the definition of IP(·)is the same as that in the previous section. As a result, we use AD(3) ∈ ℝ q × q and AL(3) ∈ ℝp × p to record the cosine similarities for disease pairs and lncRNA pairs, respectively.
Methods
Notations
We first briefly introduce the notations used throughout the paper. All the matrices are denoted by italic uppercase letters while vectors are expressed in bold lowercase letters. The transpose, the trace and the Frobenius norm of a given matrix M are denoted by MT, Tr(M) and ||M||F, respectively. Mij represents the element at the i-th row and j-th column of M. 1 is a column vector with all elements equal to 1. For a given similarity matrix S, its degree matrix DS is a diagonal matrix whose main diagonal entry is ∑j (Sij + Sji)/2, and its Laplacian matrix LS is defined as LS = DS − (ST + S)/2.
Multiview Consensus Graph Learning for LncRNA–Disease Association Prediction
Given a set of similarity matrices for both lncRNAs and diseases, our aim is to find an optimal consensus graph based on these similarity matrices for subsequent prediction. Specifically, suppose we have n similarity matrices AD(1), AD(2),….,AD(n) ∈ ℝq × q constructed for diseases, and m similarity matrices AL(1), AL(2),….,AL(m) ∈ ℝp × p for lncRNAs, we propose to learn a consensus graph for the disease space and lncRNA space from multiple views by the following objective function respectively (Han et al., 2018; Wang et al., 2020):
The weight parameters and added for each view guarantee that the objective functions in Eq. (12) and (13) adaptively learn an optimal consensus graph in terms of the importance of each view (Liu et al., 2018b). Finally, we integrate the optimization process from two spaces into one framework with graph-based multi-label learning and obtain the final objective function as follows:
where LSD and LSL are the Laplacian matrices for the similarity matrices SD and SL, SDij and SLij denote the (i,j)-th elements in SD and SL, respectively. The constraints imposed on both SD and SL ensures that the learned similarities have explicit meanings. Y is the known binary lncRNA-disease association matrix defined above. Specifically, the objective proposed in Eq. (14) has two advantages in predicting lncRNA–disease associations. First of all, it incorporates multiple data resources to learn a reliable similarity matrix and could be well adapted to arbitrary number of input similarity matrices. Moreover, the predicted label matrix F and the learned consensus graph can collaboratively guide the learning process of each other and thus lead to better results (Zhang et al., 2018a). We propose an efficient method to solve Eq. (14) in the following subsection.
Optimization
In this section, we derive an efficient algorithm to solve the objective function in Eq. (14) in an iterative manner.
i) Updating SD and SL. For clarity, we only give the derivation for solving SD and the optimization for SL can be obtained similarly. By fixing the other variables in the objective function, Eq. (14) degenerates to Eq. (12). It can be rewritten in the following form:
Since different rows of SD are independent, we can then solve each row separately:
Denoting hi as a vector whose j-th element is , Eq. (16) can then be converted to:
Eq. (17) could be solved by an efficient iterative algorithm proposed in (Huang et al., 2015).
ii) Updating wD and wL. When SD, SL, F and wL are fixed, Eq. (14) becomes:
To solve Eq. (18), we first convert the target graph SD into a column vector by stacking its columns together. Similarly, we convert the multiple input similarity matrices AD(v)(v = 1,2…,n) into a set of vectors and denote a matrix G as . Then Eq. (18) can be transformed into:
Eq. (19) can also be solved by the algorithm proposed in (Huang et al., 2015; Liu et al., 2018a). The optimization for wL could be derived in a similar way.
iii) Update F. By fixing the other variables, Eq. (14) is reduced to the following problem:
Taking the derivative of Eq. (20) with respect to F and setting it to zero, we have:
Eq. (21) could be solved easily as a Sylvester equation (Zha et al., 2009; Shi et al., 2018).
The whole optimization process is summarized in Algorithm 1 and Figure 1 illustrates the overall workflow of our method. Moreover, the source code of our method can be freely downloaded at: https://github.com/hjtan516/MCGLLDA.
Results
Performance Evaluation
In this section, we compared the proposed method with three state-of-the-art methods i.e. BiwalkLDA (Hu et al., 2019), SIMCLDA (Lu et al., 2018) and KATZLDA (Chen, 2015) on the aforementioned three datasets. Firstly, two evaluation metrics Leave-One-Out Cross Valuation (LOOCV) and five-fold Cross Validation (CV) were conducted to systematically evaluate the prediction performance of each method. Both LOOCV and five-fold CV take part of the known lncRNA–disease associations as test samples and use the remaining as the training samples. However, LOOCV only takes one association at a time as the test sample while in five-fold CV all the known associations are randomly divided into five parts and one part was used as the test set each time. The Receiver Operating Characteristic (ROC) Curve was plotted in terms of the cross validation results and the Area Under the ROC Curve (AUC) was calculated to measure the prediction accuracy. As shown in Figures 2 and 3, our method reached the highest AUCs on all three datasets in both LOOCV and five-fold CV.
Figure 2 The comparison results between our method and the other three methods in terms of LOOCV using (A) Dataset1; (B) Dataset2; (C) Dataset3.
Figure 3 The comparison results between our method and the other three methods in terms of five-fold CV using (A) Dataset1; (B) Dataset2; (C) Dataset3.
Next, we adopted Leave-One-Disease-Out Cross Validation (LODOCV) to test the ability of all methods in predicting the potential related lncRNAs for diseases without known associations. Specifically, for each disease, we removed all its associated lncRNAs and made predictions by leveraging the information from other diseases and lncRNAs. As a result, we could obtain a list of AUC values for each method and we used density plots to demonstrate the comparison results. As shown in Figure 4, compared with the other methods, our method obtained the highest numbers of AUC values greater than 0.9 on all three datasets. The Wilcoxon signed rank test also validated the significance of our method over the other three methods in terms of LODOCV (Table 2). In summary, these results clearly indicated that our method outperformed the other three methods in predicting reliable lncRNA–disease associations.
Figure 4 The comparison results between our method and the other three methods in terms of LODOCV using (A) Dataset1; (B) Dataset2; (C) Dataset3.
Parameter Analysis
In Eq. (14), we used two parameters α and β to balance the importance between the similarity graph learning and the predicted association matrix learning. We investigated the impacts of α and β on the prediction performance of our method. Specifically, α was tested in the range from 0.0001 to 1 and β was tested from 0.0001 to 10. To determine the best combination of α and β, five-fold cross validation was carried out on Dataset3. As a result, when both α and β were set to 0.0001, our method achieved the best performance (Figure 5).
Figure 5 The influence of the two parameters α and β on the prediction accuracy of five-fold cross-validation.
Convergence Analysis
We also studied the practical convergence speed of our method. Specifically, Figure 6 illustrated the value variations of Eq. (14) with the number of iterations on Dataset3. As can be seen from the figure, the objective function value of Eq. (14) became stable in 5 iterations, indicating that our method converges rapidly and can be used in practice.
Case Study
To demonstrate the potential of our method in identifying lncRNAs as meaningful biomarkers for a given disease, we carried out a case-study analysis on Uterine CErvical Neoplasms (UCEC). Uterine Cervical Neoplasms is one of the most frequent causes of death in women and its early detection can significantly decrease its death rate (Jeong et al., 2003). To make reliable predictions, we applied our method on a newer version (July-2017) of lncRNA–disease associations from LncRNADisease database. In particular, associations with lncRNAs that were not recorded in BioMart and diseases that were not included in the MeSH Category C for diseases were excluded during the implementation. The predicted associations were then validated by another two widely used databases recording disease-related lncRNAs, i.e. Lnc2Cancer (Ning et al., 2016) and MNDR (Wang et al., 2013). As expected, the two databases confirmed that 9 out of the top 10 predicted lncRNAs were verified to be related with UCEC (Table 3). The only unconfirmed lncRNA is MIR7-3HG. To evaluate whether this lncRNA might be involved in UCEC, we further downloaded the lncRNA expression profile of 316 UCEC samples from TANRIC (Li et al., 2015) and performed the Kaplan–Meier survival analysis by using MIR7-3HG as the biomarker accordingly (Figure 7). The statistical significance in the survival analysis was calculated using the log rank test (Bewick et al., 2004). Notably, the results demonstrated that the higher expression level of MIR7-3HG was related with significantly decreased survival rates of UCEC patients, indicating that MIR7-3HG might play an important role in the pathogenesis of UCEC.
Table 3 The top 10 predicted lncRNAs to be associated with cervical uterine neoplasms by our method.
Figure 7 Kaplan–Meier survival analysis using MIR7-3HG as a prognostic biomarker in uterine cervical neoplasms. Patients are divided into “high” and “low” groups according to their expression level of MIR7-3HG against the mean expression level across all patients.
Conclusion
Increasing evidences have shown that lncRNAs accomplish a remarkable variety of biological functions and thus the aberrant expression or dysfunction of lncRNA might lead to various diseases. As a result, discovering newly disease-related lncRNAs might deepen our understanding of the biological roles of lncRNAs in carcinogenesis. In this work, a novel multiview consensus graph learning method for predicting lncRNA–disease associations was proposed. We first constructed a set of similarity matrices for lncRNAs and diseases by leveraging the known lncRNA–disease associations. We then learned a consensus graph for lncRNAs and diseases from the multiple similarity matrices and predicted the association probability between lncRNAs and diseases based on a multi-label learning framework. The results of LOOCV, five-fold CV as well as LODOCV on three widely used datasets all confirmed the superiority of our method. Moreover, the convergence analysis indicates that our method has a fast convergence rate and could be well adapted in practice. Lastly, the case study conducted for UCEC indicated that the expression level of MIR7-3HG was significantly related with the survival rate of patients and thus it might play important roles in the pathogenesis of UCEC. In summary, our method could reliably predict potential lncRNA–disease associations and could be easily extended to incorporate more data sources.
The success of our method is mainly two-fold. First, the known lncRNA–disease associations were leveraged to construct multiple kernel similarity matrices to better characterize the lncRNA similarities as well as disease similarities. Second, the view weights imposed for each view during the learning process guaranteed that more reliable similarity matrices have higher impacts on the final consensus graph. Despite the commendable results obtained, our method could still be improved in several ways. For example, the optimal values of the two parameters α and β might be searched by dynamic objective genetic algorithms. Besides, the integration of lncRNA expression data in our model should also be considered in the future.
Data Availability Statement
The datasets generated for this study are available on request to the corresponding author.
Author Contributions
CL conceived the study. HT and QS developed the algorithm and analyzed the results. CL and HT wrote the paper. CL and JL supervised the study. All authors have read and approved the final manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61602283, 61873089, U1836216) and the Major Fundamental Research Project of Shandong Province (Grant No. ZR2019ZD03).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Bao, Z., Yang, Z., Huang, Z., Zhou, Y., Cui, Q., Dong, D. (2019). LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 47, D1034–D1037. doi: 10.1093/nar/gky905
Bewick, V., Cheek, L., Ball, J. (2004). Statistics review 12: survival analysis. Crit. Care 8, 389–394. doi: 10.1186/cc2955
Chen, X., Yan, G. Y. (2013). Novel human lncRNA–disease association inference based on lncRNA expression profiles. Bioinformatics 29, 2617–2624. doi: 10.1093/bioinformatics/btt426
Chen, G., Wang, Z., Wang, D., Qiu, C., Liu, M., Chen, X., et al. (2013). LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 41, D983–D986. doi: 10.1093/nar/gks1099
Chen, X., You, Z. H., Yan, G. Y., Gong, D. W. (2016). IRWRLDA: improved random walk with restart for lncRNA–disease association prediction. Oncotarget 7, 57919–57931. doi: 10.18632/oncotarget.11141
Chen, X., Yan, C. C., Zhang, X., You, Z. H. (2017). Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinform. 18, 558–576. doi: 10.1093/bib/bbw060
Chen, Q., Lai, D., Lan, W., Wu, X., Chen, B., Chen, Y. P., et al. (2019). ILDMSF: inferring associations between long non-coding RNA and disease based on multi-similarity fusion. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/TCBB.2019.2936476
Chen, X. (2015). KATZLDA: KATZ measure for the lncRNA–disease association prediction. Sci. Rep. 5, 16840. doi: 10.1038/srep16840
Cui, Z., Liu, J. X., Gao, Y. L., Zhu, R., Yuan, S. S. (2019). LncRNA–disease associations prediction using bipartite local model with nearest profile-based association inferring. IEEE J. BioMed. Health. doi: 10.1109/JBHI.2019.2937827
Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., et al. (2012). Landscape of transcription in human cells. Nature 489, 101–108. doi: 10.1038/nature11233
Feng, Y. M., Chen, S., Xu, J. R., Zhu, Q., Ye, X. L., Ding, D. F., et al. (2018). Dysregulation of lncRNAs GM5524 and GM15645 involved in high-glucose-induced podocyte apoptosis and autophagy in diabetic nephropathy. Mol. Med. Rep. 18, 3657–3664. doi: 10.3892/mmr.2018.9412
Fu, G. Y., Wang, J., Domeniconi, C., Yu, G. X. (2018). Matrix factorization-based data fusion for the prediction of lncRNA–disease associations. Bioinformatics 34, 1529–1537. doi: 10.1093/bioinformatics/btx794
Gao, M. M., Cui, Z., Gao, Y. L., Liu, J. X., Zheng, C. H. (2019a). Dual-network sparse graph regularized matrix factorization for predicting miRNA–disease associations. Mol. Omics 15, 130–137. doi: 10.1039/C8MO00244D
Gao, Y. L., Cui, Z., Liu, J. X., Wang, J., Zheng, C. H. (2019b). NPCMF: Nearest Profile-based Collaborative Matrix Factorization method for predicting miRNA–disease associations. BMC Bioinf. 20, 353. doi: 10.1186/s12859-019-2956-5
Gong, Y., Niu, Y., Zhang, W., Li, X. (2019). A network embedding-based multiple information integration method for the MiRNA–disease association prediction. BMC Bioinf. 20, 468. doi: 10.1186/s12859-019-3063-3
Guo, Z. H., You, Z. H., Wang, Y. B., Yi, H. C., Chen, Z. H. (2019). A learning-based method for lncRNA–disease association identification combing similarity information and rotation forest. iScience 19, 786–795. doi: 10.1016/j.isci.2019.08.030
Han, Y., Zhu, L., Cheng, Z., Li, J., Liu, X. (2018). Discrete optimal graph clustering. IEEE Trans. Cybern. doi: 10.1109/TCYB.2018.2881539
Hu, J., Gao, Y., Li, J., Zheng, Y., Wang, J., Shang, X. (2019). A novel algorithm based on bi-random walks to identify disease-related lncRNAs. BMC Bioinf. 20, 569. doi: 10.1186/s12859-019-3128-3
Huang, J., Nie, F. P., Huang, H. (2015). “A new simplex sparse learning model to measure data similarity for clustering,” in The twenty-fourth International Conference on Aritificial Intelligence(Buenos Aires, Argentina), 3569–3575.
Jeong, Y. Y., Kang, H. K., Chung, T. W., Seo, J. J., Park, J. G. (2003). Uterine cervical carcinoma after therapy: CT and MR imaging findings. Radiographics 23, 969–981. discussion 981. doi: 10.1148/rg.234035001
Lan, W., Li, M., Zhao, K., Liu, J., Wu, F. X., Pan, Y., et al. (2017). LDAP: a web server for lncRNA–disease association prediction. Bioinformatics 33, 458–460. doi: 10.1093/bioinformatics/btw639
Li, L. J., Chang, H. Y. (2014). Physiological roles of long noncoding RNAs: insight from knockout mice. Trends Cell Biol. 24, 594–602. doi: 10.1016/j.tcb.2014.06.003
Li, J., Han, L., Roebuck, P., Diao, L., Liu, L., Yuan, Y., et al. (2015). TANRIC: an interactive open platform to explore the function of lncRNAs in cancer. Cancer Res. 75, 3728–3737. doi: 10.1158/0008-5472.CAN-15-0273
Li, G. H., Luo, J. W., Liang, C., Xiao, Q., Ding, P. J., Zhang, Y. J. (2019). Prediction of lncRNA–disease associations based on network consistency projection. IEEE Access 7, 58849–58856. doi: 10.1109/ACCESS.2019.2914533
Liang, C., Yu, S., Luo, J. (2019). Adaptive multi-view multi-label learning for identifying disease-associated candidate miRNAs. PloS Comput. Biol. 15, e1006931. doi: 10.1371/journal.pcbi.1006931
Liu, M. X., Chen, X., Chen, G., Cui, Q. H., Yan, G. Y. (2014). A computational framework to infer human disease-associated long noncoding RNAs. PloS One 9, e84408. doi: 10.1371/journal.pone.0084408
Liu, H., Xu, B., Lu, D. J., Zhang, G. J. (2018a). A path planning approach for crowd evacuation in buildings based on improved artificial bee colony algorithm. Appl. Soft Comput. 68, 360–376. doi: 10.1016/j.asoc.2018.04.015
Liu, R., Wang, H., Yu, X. (2018b). Shared-nearest-neighbor-based clustering by fast search and find of density peaksR. Inform Sci. 450, 200–225. doi: 10.1016/j.ins.2018.03.031
Lu, C. Q., Yang, M. Y., Luo, F., Wu, F. X., Li, M., Pan, Y., et al. (2018). Prediction of lncRNA–disease associations based on inductive matrix completion. Bioinformatics 34, 3357–3364. doi: 10.1093/bioinformatics/bty327
Mongelli, A., Martelli, F., Farsetti, A., Gaetano, C. (2019). The dark that matters: long non-coding RNAs as master regulators of cellular metabolism in non-communicable diseases. Front. In Physiol. 10, 369. doi: 10.3389/Fphys.2019.00369
Ning, S., Zhang, J., Wang, P., Zhi, H., Wang, J., Liu, Y., et al. (2016). Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 44, D980–D985. doi: 10.1093/nar/gkv1094
Pan, Z., Zhang, H., Liang, C., Li, G., Xiao, Q., Ding, P., et al. (2019). Self-weighted multi-kernel multi-label learning for potential miRNA–disease association prediction. Mol. Ther. Nucleic Acids 17, 414–423. doi: 10.1016/j.omtn.2019.06.014
Rinn, J. L., Chang, H. Y. (2012). Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145–166. doi: 10.1146/annurev-biochem-051410-092902
Shi, D., Zhu, L., Cheng, Z., Li, Z., Zhang, H. (2018). Unsupervised multi-view feature extraction with dynamic graph learning. J. Visual Commun. Image Representation 56, 256–264. doi: 10.1016/j.jvcir.2018.09.019
Wang, D., Wang, J., Lu, M., Song, F., Cui, Q. (2010). Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 26, 1644–1650. doi: 10.1093/bioinformatics/btq241
Wang, Y., Chen, L., Chen, B., Li, X., Kang, J., Fan, K., et al. (2013). Mammalian ncRNA-disease repository: a global view of ncRNA-mediated disease network. Cell Death Dis. 4, e765. doi: 10.1038/cddis.2013.292
Wang, Q., Chen, M., Nie, F., Li, X. (2020). Detecting coherent groups in crowd scenes by multiview clustering. IEEE Trans. Pattern Anal. Mach. Intell. 42, 46–58. doi: 10.1109/TPAMI.2018.2875002
Xiao, X., Zhu, W., Liao, B., Xu, J., Gu, C., Ji, B., et al. (2018). BPLLDA: predicting lncRNA–disease associations based on simple paths with limited lengths in a heterogeneous network. Front. In Genet. 9, 411. doi: 10.3389/fgene.2018.00411
Xie, G., Meng, T., Luo, Y., Liu, Z. (2019). SKF-LDA: similarity kernel fusion for predicting lncRNA–disease association. Mol. Ther. Nucleic Acids 18, 45–55. doi: 10.1016/j.omtn.2019.07.022
Yin, M. M., Cui, Z., Gao, M. M., Liu, J. X., Gao, Y. L. (2019). LWPCMF: logistic weighted profile-based collaborative matrix factorization for predicting miRNA–disease associations. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/TCBB.2019.2937774
Yu, J., Xuan, Z., Feng, X., Zou, Q., Wang, L. (2019). A novel collaborative filtering model for lncRNA–disease association prediction based on the naive Bayesian classifier. BMC Bioinf. 20, 396. doi: 10.1186/s12859-019-2985-0
Yue, X., Wang, Z., Huang, J., Parthasarathy, S., Moosavinasab, S., Huang, Y., et al. (2019). Graph embedding on biomedical networks: methods, applications, and evaluations. Bioinformatics. doi: 10.1093/bioinformatics/btz718
Zha, Z. J., Mei, T., Wang, J., Wang, Z., Hua, X. S. (2009). Graph-based semi-supervised learning with multiple labels. J. Vis. Commun. Image Represent. 20, 97–103. doi: 10.1016/j.jvcir.2008.11.009
Zhang, W., Zhu, X., Fu, Y., Tsuji, J., Weng, Z. (2017). Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods. BMC Bioinf. 18, 464. doi: 10.1186/s12859-017-1875-6
Zhang, B., Lei, Z., Sun, J., Zhang, H. (2018a). Cross-media retrieval with collective deep semantic learning. Multimedia Tools Appl. 7, 1–20.
Zhang, W., Qu, Q. L., Zhang, Y. Q., Wang, W. (2018b). The linear neighborhood propagation method for predicting long non-coding RNA–protein interactions. Neurocomputing 273, 526–534. doi: 10.1016/j.neucom.2017.07.065
Zhang, W., Yue, X., Tang, G., Wu, W., Huang, F., Zhang, X. (2018c). SFPEL-LPI: sequence-based feature projection ensemble learning for predicting lncRNA–protein interactions. PloS Comput. Biol. 14, e1006616. doi: 10.1371/journal.pcbi.1006616
Zhang, W., Jing, K., Huang, F., Chen, Y., Li, B., Li, J., et al. (2019a). SFLLN: a sparse feature learning ensemble method with linear neighborhood regularization for predicting drug–drug interactions. Inform Sci. 497, 189–201. doi: 10.1016/j.ins.2019.05.017
Zhang, W., Li, Z. S., Guo, W. Z., Yang, W. T., Huang, F. (2019b). A fast linear neighborhood similarity-based network link inference method to predict microRNA–disease associations. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/TCBB.2019.2931546
Zhou, M., Wang, X. J., Li, J. W., Hao, D. P., Wang, Z. Z., Shi, H. B., et al. (2015). Prioritizing candidate disease-related long non-coding RNAs by walking on the heterogeneous lncRNA and disease network, Mol. Biosyst 11, 760–769. doi: 10.1039/C4MB00511B
Zhu, L., Huang, Z., Li, Z., Xie, L., Shen, H. T. (2018). Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Trans. Neural Netw. Learn. Syst. 29, 5264–5276. doi: 10.1109/TNNLS.2018.2797248
Keywords: lncRNA–disease association, multiple similarity matrices, consensus graph learning, multi-label learning, survival analysis
Citation: Tan H, Sun Q, Li G, Xiao Q, Ding P, Luo J and Liang C (2020) Multiview Consensus Graph Learning for lncRNA–Disease Association Prediction. Front. Genet. 11:89. doi: 10.3389/fgene.2020.00089
Received: 15 December 2019; Accepted: 27 January 2020;
Published: 21 February 2020.
Edited by:
Wen Zhang, Huazhong Agricultural University, ChinaReviewed by:
Wei Lan, Guangxi University, ChinaYang Li, The Ohio State University, United States
Jin-Xing Liu, Qufu Normal University, China
Copyright © 2020 Tan, Sun, Li, Xiao, Ding, Luo and Liang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Cheng Liang, YWxjczQxN0BzZG51LmVkdS5jbg==