Skip to main content

METHODS article

Front. Genet., 10 December 2018
Sec. Computational Genomics
This article is part of the Research Topic Non-coding RNAs and Genomic Variants in Human Diseases: Approaches, Resources and Progresses View all 5 articles

MDA-SKF: Similarity Kernel Fusion for Accurately Discovering miRNA-Disease Association

\r\nLimin JiangLimin Jiang1Yijie DingYijie Ding2Jijun Tang,*Jijun Tang1,3*Fei Guo*Fei Guo1*
  • 1School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  • 2School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
  • 3Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States

Identifying accurate associations between miRNAs and diseases is beneficial for diagnosis and treatment of human diseases. It is especially important to develop an efficient method to detect the association between miRNA and disease. Traditional experimental method has high precision, but its process is complicated and time-consuming. Various computational methods have been developed to uncover potential associations based on an assumption that similar miRNAs are always related to similar diseases. In this paper, we propose an accurate method, MDA-SKF, to uncover potential miRNA-disease associations. We first extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, Hamming profile similarity for miRNA) and three disease similarity kernels (disease semantic similarity, disease functional similarity, Hamming profile similarity for disease) in two subspaces, respectively. Then, due to limitations that some initial information may be lost in the process and some noises may be exist in integrated similarity kernel, we propose a novel Similarity Kernel Fusion (SKF) method to integrate multiple similarity kernels. Finally, we utilize the Laplacian Regularized Least Squares (LapRLS) method on the integrated kernel to find potential associations. MDA-SKF is evaluated by three evaluation methods, including global leave-one-out cross validation (LOOCV) and local LOOCV and 5-fold cross validation (CV), and achieves AUCs of 0.9576, 0.8356, and 0.9557, respectively. Compared with existing seven methods, MDA-SKF has outstanding performance on global LOOCV and 5-fold. We also test case studies to further analyze the performance of MDA-SKF on 32 diseases. Furthermore, 3200 candidate associations are obtained and a majority of them can be confirmed. It demonstrates that MDA-SKF is an accurate and efficient computational tool for guiding traditional experiments.

1. Introduction

MicroRNAs (miRNAs) are a set of small non-coding RNAs (about 20−25 nucleotides) that can normally function as negative regulators of target messenger RNA (mRNA) expression in the process of post-transcription (Jiang et al., 2010b). They restrain target mRNA via base pairing, and influence gene translation. And, it has been verified that miRNA also function as positive regulators (Lu et al., 2008). In recent years, some existing works demonstrate that miRNAs are involved in many significant biologic processes, including cell differentiation, development, proliferation, and signal transduction (Carthew and Sontheimer, 2009). In addition, some previous studies prove that miRNAs are related to various diseases, including cancers (Iorio et al., 2005), Alzheimer (Cogswell et al., 2008), Diabetes (Caporali et al., 2011), and Lymphoma (Roehle et al., 2008). For example, the expression level of hsa-mir-21 is related to more than 125 diseases (Li et al., 2014). Therefore, identifying more associations between miRNAs and diseases is beneficial for diagnosis and treatment of human complex diseases.

Traditional experimental method has high precision for discovering potential associations, but its process is complicated and time-consuming. It is especially important to develop an efficient and convenient method to detect the association between miRNA and disease. Up to now, massive associations are obtained via traditional experiments and stored in some public database. The dbDEMC (Yang et al., 2010) collects 20037 associations including 2,224 miRNAs and 36 cancer types. The HMDD (Li et al., 2014) stores 10,368 miRNA-disease associations including 572 miRNAs and 378 diseases. The miR2Disease (Jiang et al., 2009) stores 3,273 miRNA-disease associations including 349 miRNAs and 163 diseases. Based on known associations, various computational methods have been developed to uncover potential associations.

In the past few years, computational methods achieve outstanding performance for discovering the novel associations between miRNAs and diseases (Lan et al., 2016; Zeng et al., 2016b; Zou et al., 2016; Chen et al., 2017a; Li et al., 2017b). Most of existing computational methods are based on an assumption that miRNAs with high similarity tend to be related with same diseases and vice versa (Liu et al., 2016). The method proposed by Jiang et al. (2010a) uses a discrete hyper-geometric probability distribution to calculate the strength of miRNA-disease associations. The HDMP (Xuan et al., 2013) calculates the miRNAs functional similarity that be assigned different weights on the basis of miRNA family and cluster. Then, all the unlabeled miRNAs are ranked by their final scores. The RWRMDA (Chen et al., 2012) uses miRNAs functional similarity network and the model of Random Walk to calculate the probability of candidate miRNAs for a special disease. The MIDP (Xuan et al., 2015) employs an improved Random Walk to set scores for candidate miRNAs, so the miRNA with larger score has higher possibility associated with the special disease.

Above methods have significant performances at the aspect of finding novel associations, but can not work for a new disease without known related miRNAs. The WBSMDA (Chen et al., 2016) uses miRNA functional similarity matrix and disease semantic similarity matrix and Gaussian interaction profile kernel similarity matrix to reconstruct miRNA and disease similarity matrix. Then, an probability value for the miRNA-disease association can be calculate by using Within-Scores and Between-Scores. The WBSMDA solves the limitation of previous computational models, that is to say, it could work for diseases without any known related miRNAs and miRNAs without any known associated diseases. The NCPMDA (Gu et al., 2016) reconstructs miRNA similarity matrix by using miRNA functional similarities, miRNA family information and known associations, and constructs disease similarity matrix by integrating disease semantic similarity matrix and known associations. Then, the network consistency projection is employed to calculate final score of miRNA-disease pair. This method gets outstanding performance when handling a disease without any known related miRNAs.

Recently, machine learning algorithms are popular methods for identifying miRNA-disease associations (Luo and Xiao, 2017; Xiao et al., 2017; Luo et al., 2018). RLSMDA (Chen and Yan, 2014) constructs miRNA functional similarity and disease semantic similarity in two different subspaces. Then, two cost functions are constructed by Regularized Least Squares respectively. Finally, all predicted associations between two subspaces are combined to denote as the final results. This method has excellent performance at the aspect of uncovering potential associations between miRNAs and diseases. The PBMDA (You et al., 2017) uses miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity and known associations to construct a heterogeneous graph. A specific depth-first search algorithm is employed to traverse all pathes in the graph. Finally, the miRNA-disease score can be obtained to represent association probability. The LRSSLMDA (Chen and Huang, 2017) extracts miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and applies the Laplacian Regularized Sparse Subspace Learning to discover potential associations between miRNAs and diseases. The method proposed by Zeng et al. (2018) constructs a bilayer network by integrating miRNA and disease similarity networks and adjacency network. Then, this bilayer network and structural perturbation method (SPM) are employed to uncover potential associations.

Although all the mentioned methods have achieved outstanding performance for uncovering potential associations, most of them have suffered from different limitations or restrictions (Chen et al., 2017c; Peng et al., 2018). For example, how better to integrate these multiple kernels when extracting various similarity kernels for miRNAs and diseases. Most of models employ the linear weighting method to integrate multiple kernels into one kernel (Chen et al., 2017b; Lan et al., 2017). We believe that some information may be lost in the process and noises may exist in the final similarity kernel for Similarity Network Fusion (SNF) (Wang et al., 2014). Therefore, we propose the method of Similarity Kernel Fusion (SKF) in this paper. We retain the initial information of each kernel when integrating multiple kernels, and use a weight matrix to eliminate noises in the integrated similarity kernel.

In this paper, we introduce the method of MDA-SKF to uncovering potential associations between miRNAs and diseases. First, we construct similarity kernels from two subspaces, including miRNA subspace and disease subspace. In miRNA subspace, we extract miRNA functional similarity kernel and miRNA sequence similarity kernel. And we first propose miRNA Hamming profile similarity kernel using the miRNA-disease associations. These similarity kernels are used to represent miRNA similarity. In disease subspace, we extract disease semantic similarity kernel and disease functional similarity kernel. And we first propose disease Hamming profile similarity kernel by using disease-miRNA associations. These similarity kernels are employed to represent disease similarity. Second, we respectively integrate three kernels into one kernel by using SKF in each subspace. Then, we use the Laplacian Regularized Least Squares (LapRLS) (Xia et al., 2010) and integrated kernel to uncover potential associations in two subspaces. Finally, we average two predicted association matrices as the final predicted associations.

Three evaluation methods are used to verify the performance of MDA-SKF, including global Leave-One-Out Cross Validation (global LOOCV), local Leave-One-Out Cross Validation (local LOOCV), and 5-fold cross validation (5-fold CV). Compared with existing seven methods, MDA-SKF has the outstanding performance for uncovering potential miRNA-disease associations. For further verification, we use global validation and local validation to analyze 32 diseases associations. The experimental results show that our method have reliable performance on detecting novel associations. Meanwhile, we find that some special associations and corresponding miRNAs require more attention. These associations can be used to guide the traditional experience.

2. Materials and Methods

In this paper, we respectively establish three miRNA similarity kernels and three disease similarity kernels to predict association between miRNA and disease. Firstly, we integrate these kernels into one miRNA kernel and one disease kernel using the method of Similarity Kernel Fusion (SKF). Then, we employ Laplacian Regularized Least Squares on the integrated kernels to uncover potential association. Finally, we combine two predicted adjacency matrices from miRNA and disease subspaces to analyze potential associations. The flow chart of SKFMDA is shown in Figure 1.

FIGURE 1
www.frontiersin.org

Figure 1. The flowchart of MDA-SKF for uncovering miRNA-disease associations.

2.1. Human miRNA-Disease Association Dataset

We get 5,430 miRNA-disease associations including 495 miRNAs and 383 diseases, which are downloaded from HMDD (Li et al., 2014) database. The set of miRNAs is denoted by M={mi}i=1p and the set of diseases is denoted by D={dj}j=1q. The association matrix is represented by YRp×q, where Y(i, j)∈{0, 1}. When the miRNA mi is association with the disease dj, Y(i, j) is set to 1; otherwise, Y(i, j) is set to 0.

2.2. Similarity Kernels for Diseases and miRNAs

Our method is based on the assumption that miRNAs with high similarity apt to be related with the same diseases and diseases with high similarity apt to be related with the same miRNA. Therefore, we respectively establish three miRNA similarity kernels and three disease similarity kernels to uncover potential association between miRNA and disease.

2.2.1. Disease Semantic Similarity

In the MeSH (Lowe and Barnett, 1994) database, the disease di can be marked as a node in Directed Acyclic Graph (DAG). We denote a subnetwork as Gdi = (di, Tdi, Edi), where Tdi is the set of all ancestor nodes of di including itself and Edi is the set of corresponding links. A semantic score of each disease can be calculated by Equation (1) (Wang et al., 2010).

Ddi(t)={1ift=dimax{Δ*Ddi(t)|t children oft}iftdi    (1)

where the disease tTdi; Δ is the semantic contribution factor and Δ = 0.5.

Also, we denote the semantic score of the disease di by Equation (2).

DV(di)=tTdiDdi(t)    (2)

Then, we calculate the disease semantic similarity value between di and dj by Equation (3).

Kd,1(di,dj)=tTdiTdj(Ddi(t)+Ddj(t))DV(di)+DV(dj)    (3)

Finally, we obtain the disease semantic similarity Kd,1Rq×q.

2.2.2. Disease Functional Similarity

In the previous works (Luo et al., 2017), the associations between diseases and genes are used to calculate disease functional similarity. We download the Log Likehood Score (LLS) that is the probability of a functional linkage between genes in the HumanNet (Lee et al., 2011) database. We normalize the LLS by Equation (4).

LLS*(gk,gs)=LLS(gk,gs)-LLSminLLSmax-LLSmin    (4)

where LLS(gk, gs) is the LLS between k-th and s-th genes; LLS*(gk,gs) is the normalized LLS score; LLSmin and LLSmax represent the minimum and maximum LLS scores in HumanNet, respectively.

We define the functional similarity score between genes by Equation (5).

FS(gk,gs)={1                               ifk=sLLS*(gk,gs)ifkse(k,s)SHumanNET0ifkse(k,s)SHumanNET    (5)

where SHumanNET is the set of all links between genes in the HumanNet database; e(k, s) is the link between k-th and s-th genes.

Then, we define the functional similarity score between a gene g and a set of genes G as Equation (6).

FG(g)=maxgsGFS(g,gs)    (6)

The associations between diseases and genes are downloaded from SIDD (Liang et al., 2013). We define the functional similarity score between diseases by Equation (7).

Kd,2(di,dj)=gkGjFGi(gk)+gsGiFGj(gs)|Gi|+|Gj|    (7)

where gkGj and gsGi; Gi and Gj represent sets of genes which are related to diseases di and dj, respectively.

Finally, we obtain the disease functional similarity Kd,2Rq×q.

2.2.3. MiRNA Functional Similarity

We construct miRNA functional similarity kernel Km,1Rp×p, according to MISIM (Wang et al., 2010) proposed by Wang et al. This method used the disease semantic similarity and the known associations between miRNAs and diseases to structure miRNA functional similarity kernel. Here, Km, 1(mi, mj) is the functional similarity score between miRNAs mi and mj.

2.2.4. MiRNA Sequence Similarity

We obtain 495 miRNA sequences from miRBase database(Kozomara and Griffithsjones, 2014), and calculate sequence similarity of miRNAs by using the Needleman–Wunsch Algorithm. Then, we obtain miRNA sequence similarity kernel Km,2Rp×p, where Km, 2(mi, mj) is the sequence similarity score between miRNAs mi and mj.

2.2.5. Hamming Profile Similarity

The assumption that similar diseases are always related to similar miRNAs, is employed to uncover miRNA-disease associations. For a pair of vectors whose lengths are same, Hamming profile is the number of elements of which corresponding values are different. Higher Hamming profile value indicates lower similarity for two vectors. Therefore, we use Hamming profile and the topologic information of all known associations to measure disease similarity. Here, Hamming profile similarity kernel for diseases is defined as Equation (8).

Kd,3(di,dj)=1-|IP(di)!=IP(dj)||IP(di)|    (8)

where Kd,3Rq×q is the Hamming profile similarity for diseases; IP(di){0,1}p×1 is the i-th column of the association matrix Y.

Similarly, we calculate Hamming profile similarity kernel for miRNAs as Equation (9).

Km,3(mi,mj)=1-|IP(mi)!=IP(mj)||IP(mi)|    (9)

where Km,3Rp×p is the Hamming profile similarity for miRNAs; IP(mi){0,1}1×q denotes the i-th row of the associations matrix Y.

2.3. Similarity Kernel Fusion

We extract three miRNA similarity kernels (miRNA functional similarity, miRNA sequence similarity, Hamming profile similarity for miRNA) and three disease similarity kernels (disease semantic similarity, disease functional similarity, Hamming profile similarity for disease) in the above section.

In the following, we use similarity kernel fusion (SKF) to integrate three miRNA similarity kernels Km, l, l = 1, 2, 3. Therefore, we get the integrated similarity kernel Km*Rp×p.

Firstly, we normalize each original kernel by Equation (10).

Pm,l(mi,mj)=Km,l(mi,mj)mkMKm,l(mk,mj)    (10)

where Pm, l represents a normalized kernel and satisfies mkMPm,l(mk,mj)=1.

Secondly, we construct a sparse kernel for each original kernel by Equation (11).

Sm,l(mi,mj)={0 ifmjNiKm,l(mi,mj)mkNiKm,l(mi,mk) ifmjNi    (11)

where Sm, l represents a sparse kernel and satisfies mjMSm,l(mi,mj)=1; Ni represents a set of all neighbors of mi including itself.

Thirdly, we integrate three miRNA kernels by Equation (12).

Pm,lt+1=α(Sm,l×rlPm,rt2×Sm,lT)+(1-α)(rlPm,r02)    (12)

where Pm,r0 represents the initial status of Pm, r; Pm,lt+1 is the status of l-th kernel after t+1 iterations; α∈(0, 1).

After t+1 iterations, the overall kernel can be computed as Equation (13).

Km=13l=13Pm,lt+1    (13)

Finally, a weight matrix is established to further eliminate noise in the overall kernel as Equation (14).

wm(mi,mj)={1 ifmjNimiNj0 ifmjNimiNj0.5               otherwise     (14)

The integrated miRNA similarity kernel can be obtained as Equation (15).

Km*=wmKm    (15)

Similarity, we calculate the integrated disease similarity kernel as Kd*Rq×q.

2.4. Laplacian Regularized Least Squares

In this paper, we use Laplacian Regularized Least Squares (LapRLS) to uncover potential miRNA-disease associations. For the miRNA subspace, The objective function of LapRLS is defined as Equation (16).

minFm ||Y-Fm||F2+βm||FmTLmFm||F2    (16)

where Y is the known association matrix; βm is the regularization coefficient of LapRLS. FmRp×q represents the predicted association matrix in the miRNA subspace; Lm=Dm-12(Dm-Km*)Dm-12, in which Dm is a diagonal matrix whose diagonal element is the sum of the row elements of Km*.

The derivation of optimization algorithm were presented in Xia et al. (2010). We calculate the predicted association matrix FmRp×q in the miRNA subspace as Equation (17).

Fm=Km*(Km*+βmLmKm*)-1Y    (17)

Similarity, we can calculate the predicted association matrix FdRq×p in the disease subspace as Equation (18).

Fd=Kd*(Kd*+βdLdKd*)-1YT    (18)

The predicted matrices in miRNA and disease subspaces are Fm and Fd, respectively. Then, we define the final predicted association matrix as Equation (19).

F*=Fm+FdT2    (19)

where F*∈Rp×q.

3. Results

In this section, we analyze the performance of MDA-SKF from many aspects. First, we introduce three evaluation methods (global LOOCV, local LOOCV, and 5-fold CV) and two validation methods (global verification and local verification) to analyze the performance of MDA-SKF. Second, we discuss about the convergence and the parameter selection of SKF. Third, we compare the performance of SKF with SNF and average kernel. Fourth, we compare the performance of MDA-SKF with other excellent methods for uncovering potential associations between miRNAs and diseases. Fifth, we use case studies to further evaluate the reliability of MDA-SKF.

3.1. Evaluation Criteria and Verification Methods

In this paper, we use two evaluation criteria including Area Under the Curve (AUC) and Area Under the Precision-Recall curve (AUPR) to evaluate the performance of models. AUC is the area under the receiver operating characteristic (ROC) curve, which is created by plotting true positive rate against false positive rate at various threshold settings. AUPR is the area under the curve that is created by plotting precision against recall at various threshold settings.

In the process of experiments, global LOOCV, local LOOCV, and 5-fold CV are applied to evaluate the model's performance. In the global LOOCV, one of 5,430 known associations is left out in turn as the test set, and other associations are remained as the training set. In the local LOOCV, the known associations between a special disease and all miRNAs are left out as the test set, and other associations are regarded as training set. In the 5-fold, all known associations are randomly divided into five non-overlapping sets. each set is employed in turn to as test set and other sets are employed to as training set. In the process of experiments, the known associations in test set are reset to unknown, that is to say, some 1 are replaced by 0 in the association matrix Y.

Massive associations between miRNAs and diseases are obtained via the traditional experiment and stored in several databases, which provide a good condition for evaluating the performance of MDA-SKF. We use two methods including global validation and local validation to further analyze the reliability of MDA-SKF. In the global validation, we regard 5,430 known associations as training set that is used to uncover potential associations. These candidate associations are confirmed by the miR2Disease and dbDEMC databases. In the local validation, all known associations that are related to a special disease are reset to unknown ones. We use the rest of association as training set to uncover potential associations for this special disease. These candidate associations are confirmed by the HMDD, miR2Disease, and dbDEMC databases.

3.2. Convergence Performance

Since the convergence is very important for an iterative algorithm, we analyze the number of iterations of SKF. We define the relative error as Et=||Pt+1-Pt||||Pt|| in the process of iterations. We turn the number of iterations from 1 to 30 with step 1 to calculate the E after each iteration. The convergence processes of three miRNA kernels and three disease kernels are calculated in our experiments and the results of E are shown in Figure 2. It can be clearly seen that the process of convergence is very fast and the value of E achieves to 10−7 after 5 iterations. This phenomenon demonstrates that SKF model have excellent convergence performance in the process of integrating multiple kernels. In this paper, we set the number of iterations as 10 to ensure that it is enough to converge.

FIGURE 2
www.frontiersin.org

Figure 2. The relative errors of SKF model with different number of iterations.

3.3. Parameter Selection

In this section, we discuss about the parameter selection of SKF. There are two parameters α and the size of neighbors denoted as k. For selecting parameter α ,we use 5-fold CV and local LOOCV to analyze the values of α. We take α from 0.1 to 1 with step 0.1 in order to calculate AUC, shown in Figure 3. It can be found that AUC keep little fluctuation in the range between 0.1 and 0.9. As we can see, the value of AUC decreases by at least 0.1 when α = 1 (removing the original kernel information). It demonstrates that retaining the original information of each kernel is significant for integrating multiple kernels. In this paper, the value of α is set to 0.1.

FIGURE 3
www.frontiersin.org

Figure 3. The AUC values of SKF model with different values of α.

Meanwhile, the number of neighbors is an important parameter in this paper. It is related to the amount of important information and the noise reduction. In the 5-fold, k is taken from 30 to 100 with step 3 to find the optimal value. In the local LOOCV, the k is gradually varying from 30 to 350 with step 3 to find the best value. In Figure 4, we select the optimal k by the highest AUC value, and find that 36 and 192 are the best parameters of k for 5-fold and local LOOCV, respectively. Since both global LOOCV and 5-fold are similar, k is set to 36 in the global LOOCV. It's obvious that the value of k in the local LOOCV is bigger than that in the 5-fold. In the local LOOCV, our method produces the novel disease without known miRNA-based associations, so needs much more information about miRNA and disease similarity kernels.

FIGURE 4
www.frontiersin.org

Figure 4. The AUC values of SKF model with different numbers of neighbors.

The regularization coefficients of LapRLS, βm and βd, are closely related to the performance of LapRLS. We make βm equal to βd in this paper. To get obtain the optimal β, we take β from 2−20 to 210 and use 5-fold CV and local LOOCV to analyze the performance of LapRLS with different values of β. The results are shown in Figure 5. As seen in Figure 5, the AUC decreases when β increases from 20 to 210 and keeps slight change when β less than 2−3 and 20 for 5-fold CV and local LOOCV, respectively. In the 5-fold CV, the best AUC is 0.9553 when β are 2−5. In the local LOOCV, the best AUC is 0.8356 when β is 2−1. Therefore, we select the optimal β as 2−5 and 2−1 for 5-fold CV and local LOOCV, respectively.

FIGURE 5
www.frontiersin.org

Figure 5. The AUC values of SKF model with different values of β.

3.4. Comparison With Other Fusion Strategies

In this section, we compare the performance of Similarity Kernel Fusion (SKF) with Similarity Network Fusion (SNF) and average kernel fusion (AVG). The results demonstrate that SKF have significant performance in integrating multiple kernels. We use 5-fold CV to evaluate the performance of three fusion strategies. The results are shown in Figure 6. It can be observed that the best AUC of 0.9520 and the best AUPR of 0.5689 are obtained by SKF. Comparing with SNF, SKF achieves AUC improvement of 0.037 (0.9520 over 0.9150) and AUPR improvement of 0.2247 (0.5689 over 0.3442). Comparing with AVG, SKF achieves AUC improvement of 0.0268 (0.9520 over 0.9252) and AUPR improvement of 0.1458 (0.5689 over 0.4231). It shows that SKF is more excellent than SNF at the aspect of uncovering associations between miRNAs and diseases.

FIGURE 6
www.frontiersin.org

Figure 6. The AUC and AUPR of three fusion strategies in the 5-fold CV.

3.5. Comparison With Other Existing Methods

In this section, we compare the prediction performance of MDA-SKF with other seven existing methods [i.e., PBMDA (You et al., 2017), MCMDA (Li et al., 2017a), NCPMDA (Gu et al., 2016), WBSMDA (Chen et al., 2016), HDMP (Xuan et al., 2013), RLSMDA (Chen and Yan, 2014), and LRSSLMDA (Chen and Huang, 2017)] in global LOOCV, local LOOCV and 5-fold CV. Because other existing methods employ 5-fold CV in their paper, we choose 5-fold CV rather than 5-fold CV in this section. In Table 1, MDA-SKF obtains the highest AUCs in 5-fold CV (0.9501) and global LOOCV (0.9536), but NCPMDA obtains the best AUC (0.8584) in local LOOCV. Comparing with other existing methods, MDA-SKF achieves AUC improvement of at least 0.0358 and 0.0316 in global LOOCV and 5-fold CV, respectively.

TABLE 1
www.frontiersin.org

Table 1. The comparison results between SKFMDA and other seven computational methods.

3.6. Case Studies

In this section, we employ global validation and local validation on multiple important human diseases to further evaluate the reliability of MDA-SKF. To evaluate the performance of MDA-SKF, we select 32 diseases associated with more miRNAs. In the global validation, 5,430 associations are used to uncover potential associations. In the local validation, for a special disease, all known associations related to this special disease are reset as unknown associations. Then, other known associations are implemented to uncover potential associations. We extract top 50 candidate associations for each special disease. All predicted candidate associations are found in Supplementary Table 1. The statistical results are shown in Table 2. GV and LV are the numbers of confirmed associations in the top 50 by using global validation and local validation, respectively. P1 and P3 are the proportion of confirmed associations in the top 50 by using global validation and local validation, respectively. D1 is the number of miRNAs, and those miRNAs are associated with special disease and belonging to 498 miRNAs. The associations between those miRNAs and special disease can be verified from databases, like dbDEMC or miR2Disease. P2 is the proportion of D1 in the 495 miRNAs. D2 is the number of miRNAs, and those miRNAs are associated with special disease and belonging to 498 miRNAs. The associations between those miRNA and special disease can be verified from databases, like dbDEMC or miR2Disease or HMDD. P4 is the proportion of D2 in the 495 miRNAs. In Table 2, we find that P1 and P3 are significantly greater than P2 and P4 for the majority of diseases, respectively, excepting Biliary Tract Neoplasms and Skin Neoplasms. We also find that all candidate associations related with five diseases (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, and Lung Neoplasms) are confirmed for local validation. It demonstrates that MDA-SKF has excellent reliability for uncovering the associations between miRNAs and diseases.

TABLE 2
www.frontiersin.org

Table 2. The results of global validation and local validation.

To find some important miRNAs and potential associations, we analyze candidate associations relating with eight important human diseases (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, Lung Neoplasms, Colon Neoplasms, kidney neoplasms, lymphoma). Among them, six disease (Breast Neoplasms, Colorectal Carcinoma, Gastric Neoplasms, Pancreatic Neoplasms, Lung Neoplasms, Colon Neoplasms) are the top six diseases that are related to more miRNAs in the dbDEMC and miR2Disease database, and kidney neoplasms and lymphoma are used as case studies in many previous paper.

In the global validation, we gain a total of 400 candidate associations for eight diseases. The confirmed results are shown in Figure 7. In Figure 7, the red line represents unconfirmed and the green line represents confirmed. It can be find that most of candidate associations are confirmed by the miR2Disease and dbDEMC databases. It is obvious that five diseases are related to the same set of miRNAs, including hsa-let-7g, hsa-mir-1, hsa-mir-106b, hsa-mir-142, hsa-mir-15b, hsa-mir-223, and hsa-mir-29a.

FIGURE 7
www.frontiersin.org

Figure 7. The case study in the global verification. The red line represents unconfirmed; the green line represents confirmed.

In the local validation, we also gain a total of 400 candidate associations for eight diseases. The confirmed results are shown as Figure 8. In Figure 8, we find that most of 400 candidate associations are confirmed by the HMDD, miR2Disease and dbDEMC databases. It is obvious that eight diseases are related to the same set of miRNAs, including hsa-let-7a, hsa-let-7b, hsa-mir-1, and so on. It is worth noting that three associations, hsa-mir-34c and kidney neoplasms, hsa-mir-34c and lymphoma, hsa-mir-34c and colon neoplasms, are unconfirmed in the current databases. Meanwhile, hsa-mir-34c is related to other five diseases in the database. Therefore, we believe that these three novel associations have a high probability of linkage between miRNAs and diseases, and they need more attention in subsequent traditional experiments.

FIGURE 8
www.frontiersin.org

Figure 8. The case study in the local verification. The red line represents unconfirmed; the green line represents confirmed.

4. Conclusions

We propose MDA-SKF to uncover potential miRNA-disease associations in the paper. First, we extract three miRNA kernels (miRNA functional similarity, miRNA sequence similarity, miRNA Hamming profile similarity kernel) and three disease kernels (disease semantic similarity, disease functional similarity, disease Hamming profile similarity kernel) to embody the similarity of miRNAs and diseases, respectively. Then, we propose Similarity Kernel Fusion (SKF) model by using original information of each kernel and the newly designed noise-reduction methods to better integrate multiple kernels. Then, Laplacian Regularized Least Squares (LapRLS) is employed on integrated kernels to uncover potential miRNA-disease associations.

Many experiments show that compared with other seven outstanding models, MDA-SKF has better precision on the three evaluation methods (global LOOCV, local LOOCV, and 5-fold CV). In order to further evaluate the reliable of MDA-SKF, two validation methods (global validation and local validation) are used to execute case studies of 32 diseases. A large number of candidate associations are confirmed by the HMDD, dbDEMC and miR2Disease databases. In addition, three associations (hsa-mir-34c and kidney neoplasms, hsa-mir-34c and lymphoma, hsa-mir-34c and colon neoplasms) and some special miRNAs (hsa-let-7g, hsa-mir-1, hsa-mir-106b, etc) need more attention. The future work may further take more machine learning methods and more similarity kernels into account to accurately uncover associations between miRNAs and diseases. Also, similar strategy can be applied in the other link prediction problems, such as circular RNA detection (Zeng et al., 2017b), disease gene prediction (Zeng et al., 2016a, 2017a) and sequence analysis (Zou et al., 2018).

Data Availability Statement

The datasets and codes for this study can be found in the https://github.com/guofei-tju/MDA-SKF.

Author Contributions

FG, YD, and LJ conceived and designed the experiments. LJ and YD performed the experiments and analyzed the data. FG and LJ wrote the paper. FG and JT supervised the experiments and reviewed the manuscript.

Funding

This work is supported by a grant from the National Science Foundation of China (NSFC 61772362) and the Tianjin Research Program of Application Foundation and Advanced Technology (16JCQNJC00200).

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00618/full#supplementary-material

References

Caporali, A., Meloni, M., Völlenkle, C., Bonci, D., Sala-Newby, G. B., Addis, R., et al. (2011). Deregulation of microRNA-503 contributes to diabetes mellitus-induced impairment of endothelial function and reparative angiogenesis after limb ischemia. Circulation 123, 282–291. doi: 10.1161/CIRCULATIONAHA.110.95232

PubMed Abstract | CrossRef Full Text

Carthew, R. W., and Sontheimer, E. J. (2009). Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655. doi: 10.1016/j.cell.2009.01.035

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., and Huang, L. (2017). LRSSLMDA: Laplacian regularized sparse subspace learning for mirna-disease association prediction. PLoS Comput. Biol. 13:e1005912. doi: 10.1371/journal.pcbi.1005912

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Jiang, Z., Xie, D., Huang, D. S., Zhao, Q., Yan, G. Y., et al. (2017a). A novel computational model based on super-disease and miRNA for potential miRNA-disease association prediction. Mol. BioSyst. 13, 1202–1212. doi: 10.1039/C6MB00853D

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Liu, M. X., and Yan, G. Y. (2012). RWRMDA: predicting novel human microRNA-disease associations. Mol. Biosyst. 8, 2792–2798. doi: 10.1039/c2mb25180a

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Wu, Q. F., and Yan, G. Y. (2017b). RKNNMDA: Ranking-based KNN for miRNA-disease association prediction. RNA Biol. 14, 952–962. doi: 10.1080/15476286.2017.1312226

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Xie, D., Zhao, Q., and You, Z. H. (2017c). MicroRNAs and complex diseases: from experimental results to computational models. Briefings Bioinform. 18:558. doi: 10.1093/bib/bbx130

CrossRef Full Text | Google Scholar

Chen, X., Yan, C. C., Zhang, X., You, Z. H., Deng, L., Liu, Y., et al. (2016). WBSMDA: Within and between score for MiRNA-disease association prediction. Sci. Rep. 6:21106. doi: 10.1038/srep21106

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., and Yan, G. Y. (2014). Semi-supervised learning for potential human microrna-disease associations inference. Sci. Rep. 4:5501. doi: 10.1038/srep05501

PubMed Abstract | CrossRef Full Text | Google Scholar

Cogswell, J. P., Ward, J., Taylor, I. A., Waters, M., Shi, Y., Cannon, B., et al. (2008). Identification of miRNA changes in alzheimer's disease brain and CSF yields putative biomarkers and insights into disease pathways. J. Alzheimers Dis. 14, 27–41. doi: 10.3233/JAD-2008-14103

PubMed Abstract | CrossRef Full Text | Google Scholar

Gu, C., Bo, L., Li, X., and Li, K. (2016). Network consistency projection for human miRNA-disease associations inference. Sci. Rep. 6:36054. doi: 10.1038%2Fsrep36054

PubMed Abstract | CrossRef Full Text | Google Scholar

Iorio, M. V., Ferracin, M., Liu, C. G., Veronese, A., Spizzo, R., Sabbioni, S., et al. (2005). MicroRNA gene expression deregulation in human breast cancer. Cancer Res. 65, 7065–7070. doi: 10.1158/0008-5472.CAN-05-1783

PubMed Abstract | CrossRef Full Text | Google Scholar

Jiang, Q., Hao, Y., Wang, G., Juan, L., Zhang, T., Teng, M., et al. (2010a). Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst. Biol. 4(Suppl. 1):S2. doi: 10.1186%2F1752-0509-4-S1-S2

PubMed Abstract | CrossRef Full Text | Google Scholar

Jiang, Q., Hao, Y., Wang, G., Zhang, T., and Wang, Y. (2010b). “Weighted network-based inference of human microRNA-disease associations,” in Fifth International Conference on Frontier of Computer Science and Technology, FCST 2010 (Changchun), 431–435.

Google Scholar

Jiang, Q., Wang, Y., Hao, Y., Juan, L., Teng, M., Zhang, X., et al. (2009). miR2disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 37, D98–D104. doi: 10.1093/nar/gkn714

PubMed Abstract | CrossRef Full Text | Google Scholar

Kozomara, A., and Griffiths-jones, S. (2014). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73. doi: 10.1093/nar/gkt1181

PubMed Abstract | CrossRef Full Text | Google Scholar

Lan, W., Li, M., Zhao, K., Liu, J., Wu, F. X., Pan, Y., et al. (2017). LDAP: a web server for lncRNA-disease association prediction. Bioinformatics 33, 458–460. doi: 10.1093/bioinformatics/btw639

PubMed Abstract | CrossRef Full Text | Google Scholar

Lan, W., Wang, J., Li, M., Liu, J., Wu, F. X., and Pan, Y. (2016). Predicting microRNA-disease associations based on improved microRNA and disease similarities. IEEE/ACM Trans. Comput. Biol. Bioinform. doi: 10.1109/TCBB.2016.2586190 [Epub ahead of print].

PubMed Abstract | CrossRef Full Text | Google Scholar

Lee, I., Blom, U. M., Wang, P. I., Shim, J. E., and Marcotte, E. M. (2011). Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 21, 1109–1121. doi: 10.1101/gr.118992.110

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, J. Q., Rong, Z. H., Chen, X., Yan, G. Y., and You, Z. H. (2017a). MCMDA: Matrix completion for miRNA-disease association prediction. Oncotarget 8, 21187–21199. doi: 10.18632/oncotarget.15061

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, P., Peng, M., Bo, L., Huang, G., Wei, L., and Li, K. (2017b). Improved low-rank matrix recovery method for predicting miRNA-disease association. Sci. Rep. 7:6007. doi: 10.1038/s41598-017-06201-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Y., Qiu, C., Tu, J., Geng, B., Yang, J., Jiang, T., et al. (2014). HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 42(Database issue), D1070–D1074. doi: 10.1093/nar/gkt1023

PubMed Abstract | CrossRef Full Text | Google Scholar

Liang, C., Wang, G., Li, J., Zhang, T., Xu, P., and Wang, Y. (2013). SIDD: a semantically integrated database towards a global view of human disease. PLoS ONE 8:e75504. doi: 10.1371/journal.pone.0075504

CrossRef Full Text | Google Scholar

Liu, Y., Zeng, X., He, Z., and Quan, Z. (2016). Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 905–915. doi: 10.1109/TCBB.2016.2550432

PubMed Abstract | CrossRef Full Text | Google Scholar

Lowe, H. J., and Barnett, G. O. (1994). Understanding and using the medical subject headings (meSH) vocabulary to perform literature searches. JAMA 271, 1103–1108.

PubMed Abstract | Google Scholar

Lu, M., Zhang, Q., Deng, M., Miao, J., Guo, Y., Gao, W., et al. (2008). An analysis of human microRNA and disease associations. PLoS ONE 3:e3420. doi: 10.1371%2Fjournal.pone.0003420

PubMed Abstract | CrossRef Full Text | Google Scholar

Luo, J., Ding, P., Liang, C., and Chen, X. (2018). Semi-supervised prediction of human miRNA-disease association based on graph regularization framework in heterogeneous networks. Neurocomputing 294, 29–38. doi: 10.1016/j.neucom.2018.03.003

CrossRef Full Text | Google Scholar

Luo, J., and Xiao, Q. (2017). A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J. Biomed. Inform. 66, 194–203. doi: 10.1016/j.jbi.2017.01.008

PubMed Abstract | CrossRef Full Text | Google Scholar

Luo, J., Xiao, Q., Liang, C., and Ding, P. (2017). Predicting microRNA-disease associations using kronecker regularized least squares based on heterogeneous omics data. IEEE Access 5, 2503–2513.

Google Scholar

Peng, J., Hui, W., Li, Q., Chen, B., Wei, Z., and Shang, X. (2018). A learning-based framework for miRNA-disease association prediction using neural networks. bioRxiv[Preprint]. 276048. doi: 10.1101/276048

CrossRef Full Text | Google Scholar

Roehle, A., Hoefig, K. P., Repsilber, D., Thorns, C., Ziepert, M., Wesche, K. O., et al. (2008). MicroRNA signatures characterize diffuse large B-cell lymphomas and follicular lymphomas. Br. J. Haematol. 142, 732–744. doi: 10.1111/j.1365-2141.2008.07237.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., et al. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333–337. doi: 10.1038/nmeth.2810

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, D., Wang, J., Lu, M., Song, F., and Cui, Q. (2010). Inferring the human microRNA functional similarity and functional network based on microrna-associated diseases. Bioinformatics 26, 1644–1650. doi: 10.1093/bioinformatics/btq241

PubMed Abstract | CrossRef Full Text | Google Scholar

Xia, Z., Wu, L. Y., Zhou, X., and Wong, S. T. (2010). Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst. Biol. 4(Suppl. 2):S6. doi: 10.1186/1752-0509-4-S2-S6

PubMed Abstract | CrossRef Full Text | Google Scholar

Xiao, Q., Luo, J., Liang, C., Cai, J., and Ding, P. (2017). A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 34, 239–248. doi: 10.1093/bioinformatics/btx545

CrossRef Full Text | Google Scholar

Xuan, P., Han, K., Guo, M., Guo, Y., Li, J., Ding, J., et al. (2013). Correction: prediction of micrornas associated with human diseases based on weighted k most similar neighbors. PLoS ONE 8:e70204. doi: 10.1371/annotation/a076115e-dd8c-4da7-989d-c1174a8cd31e

PubMed Abstract | CrossRef Full Text | Google Scholar

Xuan, P., Han, K., Guo, Y., Li, J., Li, X., Zhong, Y., et al. (2015). Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 31:1805. doi: 10.1093/bioinformatics/btv039

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, Z., Ren, F., Liu, C., He, S., Sun, G., Gao, Q., et al. (2010). DBDEMC: a database of differentially expressed miRNAs in human cancers. BMC Genomics 11(Suppl. 4):S5. doi: 10.1186%2F1471-2164-11-S4-S5

PubMed Abstract | CrossRef Full Text | Google Scholar

You, Z. H., Huang, Z. A., Zhu, Z., Yan, G. Y., Li, Z. W., Wen, Z., et al. (2017). PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput. Biol. 13:e1005455. doi: 10.1371/journal.pcbi.1005455

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, X., Ding, N., Rodríguezpatón, A., and Quan, Z. (2017a). Probability-based collaborative filtering model for predicting gene-disease associations. BMC Med. Genomics 10:76. doi: 10.1186/s12920-017-0313-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, X., Liao, Y., Liu, Y., and Zou, Q. (2016a). Prediction and validation of disease genes using hetesim scores. IEEE/ACM Trans. Comput. Biol. Bioinform. 14, 687–695. doi: 10.1109/TCBB.2016.2520947

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, X., Lin, W., Guo, M., and Zou, Q. (2017b). A comprehensive overview and evaluation of circular RNA detection tools. PLoS Comput. Biol. 13:e1005420. doi: 10.1371/journal.pcbi.1005420

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, X., Liu, L., Lu, L., and Zou, Q. (2018). Prediction of potential disease-associated microRNAs using structural perturbation method. Bioinformatics 34, 2425–2432. doi: 10.1093/bioinformatics/bty112

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeng, X., Zhang, X., and Zou, Q. (2016b). Integrative approaches for predicting microrna function and prioritizing disease-related microRNA using biological interaction networks. Brief. Bioinform. 17, 193–203. doi: 10.1093/bib/bbv033

PubMed Abstract | CrossRef Full Text | Google Scholar

Zou, Q., Li, J., Song, L., Zeng, X., and Wang, G. (2016). Similarity computation strategies in the microRNA-disease network: a survey. Brief. Funct. Genomics 15:55–64. doi: 10.1093/bfgp/elv024

PubMed Abstract | CrossRef Full Text | Google Scholar

Zou, Q., Lin, G., Jiang, X., Liu, X., and Zeng, X. (2018). Sequence clustering in bioinformatics: an empirical study. Brief. Bioinform. bby090. doi: 10.1093/bib/bby090

CrossRef Full Text | Google Scholar

Keywords: Laplacian Regularized Least Squares, disease similarity, miRNA similarity, miRNA-disease association, Similarity Kernel Fusion

Citation: Jiang L, Ding Y, Tang J and Guo F (2018) MDA-SKF: Similarity Kernel Fusion for Accurately Discovering miRNA-Disease Association. Front. Genet. 9:618. doi: 10.3389/fgene.2018.00618

Received: 08 August 2018; Accepted: 23 November 2018;
Published: 10 December 2018.

Edited by:

Zhifu Sun, Mayo Clinic, United States

Reviewed by:

Xiangxiang Zeng, Xiamen University, China
Yan Guo, Vanderbilt University, United States

Copyright © 2018 Jiang, Ding, Tang and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Fei Guo, fguo@tju.edu.cn
Jijun Tang, tangjijun@tju.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.