Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization

Li, Shijun; Chang, Miaomiao; Tong, Ling; Wang, Yuehua; Wang, Meng; Wang, Fang

doi:10.3389/fgene.2022.1023615

ORIGINAL RESEARCH article

Front. Genet., 20 January 2023

Sec. RNA

Volume 13 - 2022 | https://doi.org/10.3389/fgene.2022.1023615

This article is part of the Research TopicMachine Learning-Based Methods for RNA Data Analysis - Volume IIIView all 11 articles

Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization

Shijun Li*^†

Miaomiao Chang^†

Ling Tong

Yuehua Wang

Meng Wang

Fang Wang

Department of Pathology, Chifeng Municipal Hospital, Chifeng, China

Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide. They cause the leading causes of cancer mortality. Many researches have demonstrated that long noncoding RNAs (lncRNAs) have close linkages with the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) by integrating random walk with restart and Logistic Matrix Factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. We first fuse disease semantic and Gaussian association profile similarities and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative LncRNA-Disease Associations (LDA) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross validation on the MNDR dataset show that LDA-RWLMF computes the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, we rank all lncRNA biomarkers for the two cancers after determining the performance of LDA-RWLMF, respectively. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer among all lncRNAs known to associate with them on the MNDR dataset, respectively. We predict that lncRNAs HULC and HAR1A could be separately potential biomarkers for breast cancer and colorectal cancer and need to biomedical experimental validation.

1 Introduction

Breast cancer is the second leading cause of cancer-related death in women worldwide and the most common malignant tumor among US woman (Sun et al., 2017; DeSantis et al., 2019; Yang et al., 2013; Waks and Winer, 2019). During the past 25 years, breast cancer mortality rate showed a substantial increase in the world (Garrido-Castro et al., 2019). This increasing rate is one threaten to health for women in the world, in particular women from developing and low-income regions. More than 1.5 million women were diagnosed to breast cancer every year, which accounts for 25% among all women with cancers (Sun et al., 2017). In 2018, breast cancer accounts for approximately 24% of new cancer cases and approximately 15% of cancer deaths in women (Heer et al., 2020). In 2019, it is estimated that about 268,600 new patients suffer from invasive breast cancer and 48,100 patients suffer from ductal carcinoma in situ among US women. Moreover, 41,760 women may die from breast cancer in the same year (DeSantis et al., 2019). About 13% of women may suffer from invasive breast cancer in lifetime (DeSantis et al., 2019). The incident rate of breast cancer will increase by more than 46% by 2040 (Heer et al., 2020). Consequently, breast cancer has been one essential problem to be solved around the world.

However, the precise mechanisms of breast cancer remain unclear (Barzaman et al., 2020). Systemic treatment of breast cancer patients mainly consists of chemotherapy, endocrine treatment, and targeted therapy (Campos-Parra et al., 2018). In spite of rapid progress in different treatment strategies, accumulating patients show recurrence of the disease and decreased survival because of therapy resistance, which increases metastasis rates (Sledge et al., 2014). Once the metastasis occurs, the 5-year overall survival rate may be below 25% (Siegel et al., 2013).

Colorectal cancer is the third most frequent cancer and the second most death-caused cancer. It is estimated that there are about 1.9 million new cases and 0.9 million death cases worldwide in 2020 (Xi and Xu, 2021). Of new diagnose cases, 20% of patients have metastases and another 25% with localized disease may later develop metastases (Biller and Schrag, 2021). Its incidence is high in developed countries and is increasing in low- and middle-income countries, which poses a challenge to global public health (Biller and Schrag, 2021; Xi and Xu, 2021).

In this situation, it is essential to discover novel molecular biomarkers that can characterize therapy response for breast cancer and colorectal cancer. We can extend the overall survival rates of patients and delay or prevent the two cancers from metastases based on molecular biomarkers (Campos-Parra et al., 2018). Consequently, screening reliable biomarker is a research hotspot on the diagnosis and treatment of cancer including breast cancer and colorectal cancer (Huang et al., 2019; Yang et al., 2020; Peng et al., 2022a).

A substantial number of evidence suggest that over 80% of the human genome can be transcribed into non-coding RNAs, such as microRNAs (Peng et al., 2017; Peng et al., 2018; Chen et al., 2019; Huang et al., 2021), circle RNAs (Zhao et al., 2019; Lan et al., 2022), and long non-coding RNAs (lncRNAs) (Zhang et al., 2021a; Peng et al., 2021a; Peng et al., 2022b; Zhou et al., 2021a; Zhou et al., 2021b). In particular, lncRNAs obtain emerging interest as diagnostic biomarkers and therapeutic targets (Chandra Gupta and Nandan Tripathi, 2017; Guo et al., 2022). Differential expression of lncRNAs forms specific patterns to various complex diseases including cancer (Wahlestedt C, 2013). Once the regulation effects of lncRNAs are detected, they are promising therapeutic targets.

LncRNAs are closely related to breast cancer and colorectal cancer. For example, lncRNA BCRT1, MaTAR25, DSCAM-AS1, and CDC6 can promote breast cancer progression (Niknafs et al., 2016; Kong et al., 2019a; Chang et al., 2020; Liang et al., 2020), BCRT4 can induce signaling transduction in breast cancer (Xing et al., 2015), LINC00673 can promote cell proliferation of breast cancer (Qiao et al., 2019), and BORG can cause breast cancer metastasis and disease recurrence (Gooding et al., 2017). SNHG11, FEZF1-AS1, RP11, and DLEU1 have been reported to novel biomarkers of colorectal cancer (Bian et al., 2018; Liu et al., 2018; Wu et al., 2019; Xu et al., 2020). Thus, many computational models have been developed to discover lncRNA biomarkers for cancers (Peng et al., 2020a; Shen et al., 2022; Sun et al., 2022), for instance, rotation forest (Guo et al., 2019), KATZ measure (Chen, 2015), collaborative deep learning (Lan et al., 2020), matrix factorization (Fu et al., 2018; Wang et al., 2021a), network consistency projection (Li et al., 2019), and graph autoencoder (Shi et al., 2021).

In this manuscript, inspired by the association prediction method provided by Peng et al. (2020b), we develop a computational method, LDA-RWLMF, to predict LncRNA-Disease Associations (LDAs). LDA-RWLMF integrates random walk and Logistic Matrix Factorization to discover the roles of lncRNA biomarkers in the prognosis and diagnosis for breast cancer and colorectal cancer. First, we compute disease similarity and lncRNA similarity. Second, we first use random walk to extract negative LDAs. Third, we explored a logistic matrix factorization model to predict possible LDAs. The results from 5-fold cross validation show that LDA-RWLMF computes the best AUC value of 0.9312 on the MNDR dataset. Finally, we rank all lncRNA biomarkers for breast cancer and colorectal cancer after determining the performance of LDA-RWLMF.

2 Datasets

2.1 LncRNA-disease associations

Human LDA dataset was collected from the MNDR database (Cui et al., 2018; Fan et al., 2020) (http://www.rna-society.org/mndr/index.html). There are 1,529 LDAs between 89 diseases and 190 lncRNAs after preprocessing. For an LDA matrix between $n$ lncRNAs and $m$ diseases, we use $Y ϵ R^{n \times m}$ to describe the association information by Eq. 1:

Y_{i j} = {\begin{cases} 1 I f l n c R N A l_{i} & a s s o c i a t e s w i t h d_{j} \\ 0 & o t h e r w i s e \end{cases} (1)

2.2 Disease semantic similarity

We use the method provided by Fan et al. (2020) to compute disease semantic similarity based on the MeSH descriptors. Disease semantic similarity method provided by Fan et al. (2020) was based on LNCSIM1 and LNCSIM2 provided by Chen (2015). For a disease $A$ , suppose that $T_{A}$ represents its ancestor node set, $E_{A}$ denotes all edge set, its Directed Acyclic Graph (DAG) is represented as ${D A G}_{A} = {T_{A}, E_{A}}$ . For a disease term $t \in T_{A}$ in ${D A G}_{A}$ , its semantic contribution to $A$ is calculated by Eq. 2 (Chen, 2015):

{S V}_{A}^{1} (t) = {\begin{cases} 1 & t = A \\ \max (∆ \times {S V}_{A}^{1} (t^{'}) | t^{'} \in C (t) & t \neq A \end{cases} (2)

where $C (t)$ indicates the children of $t$ , $∆$ indicates the sematic contribution factor related to edges that link $t^{'}$ to $t$ , and $∆$ was usually set as 0.5 (Wang et al., 2010).

The above equation demonstrates that terms at the same layer from ${D A G}_{A}$ have the same semantic contribution to $A$ . But if two terms $t_{1}$ and $t_{2}$ are in the same layer of ${D A G}_{A}$ and $t_{1}$ appears in less in ${D A G}_{A}$ than $t_{2}$ , the conclusion from $t_{1}$ will be more specific than one from $t_{2}$ , thus, ${S V}_{A}^{1} (t_{1})$ is higher than ${S V}_{A}^{1} (t_{2}) .$

In this case, we compute the second semantic contribution of term $t \in T_{A}$ to disease $A$ by Eq. 3:

{S V}_{A}^{2} (t) = - \log \frac{D a g s (t)}{D} (3)

where $D$ indicates the number of diseases in MeSH, $D a g s (t)$ indicates the number of DAGs that contain the disease term $t$ . And the semantic contribution of $t$ in ${D A G}_{A}$ can be defined by Eq. 4:

{S V}_{A}^{3} (t) = {\begin{array}{c} 1 & t = A \\ \max ((∆ + \nabla) {S V}_{A}^{3} (t^{'}) | t^{'} \in C (t) & t \neq A \end{array} (4)

where $\nabla$ indicates the contribution factor related to information content, and is computed by Eq. 5:

\nabla = \frac{\max_{k \in K} (D a g s (k)) - d a g s (t)}{D} (5)

where $K$ indicates the disease set in MeSH.

Furthermore, the contribution of all terms in ${D A G}_{A}$ to the disease $A$ is computed by Eq. 6:

S V (A) = \sum_{t \in T_{A}} {S V}_{A}^{3} (t) (6)

Finally, the semantic similarity between two diseases (A and B) can be computed by Eq. 7:

S_{d}^{s} (A, B) = \frac{\sum_{t \in T_{A} \cap T_{B}} (S V_{A}^{3} (t) + S V_{B}^{3} (t))}{S V (A) + S V (B)} (7)

2.3 LncRNA functional similarity

We use the method provided by Fan et al. (Fan et al., 2020) to compute lncRNA functional similarity. Let that DG(u) [or DG(v)] indicate diseases linking to lncRNA $u$ (or $v$ ) on LDA matrix, the similarity between two lncRNAs $u$ and $v$ is obtained through disease semantic similarity in DG (u) and DG (v). A disease semantic similarity sub-matrix is first constructed. In the constructed matrix, rows and columns are diseases in DG (u) ∪DG (v), and each element indicates the semantic similarity between diseases. Suppose that $d_{u}$ indicate a disease in DG (u), the similarity between $d_{u}$ and DG(v) is computed by Eq. 8:

S (d_{u}, D G (v)) = \max_{d \in D G (v)} (S_{d} (d_{u}, d)) (8)

Similarly, the similarity between $d_{v}$ and DG (u) is computed by Eq. 9:

S (d_{v}, D G (u)) = \max_{d \in D G (u)} (S_{d} (d_{v}, d)) (9)

And the similarity of $D G (u) \to D G (v)$ is computed by Eq. 10:

S_{u \to v} = \sum_{d \in D G (u)} S (d, D G (v)) (10)

And similarity of $D G (v) \to D G (u)$ is computed by Eq. 11:

S_{v \to u} = \sum_{d \in D G (v)} S (d, D G (u)) (11)

The similarity between lncRNAs $u$ and $v$ is measured based on the disease semantic similarity by Eq. 12:

S_{l}^{f} (u, v) = \frac{S_{u \to v} + S_{v \to u}}{| D G (u) | + | D G (v) |} (12)

where $| D G (u) | a n d | D G (v) |$ are the number of diseases in $D G (u) a n d D G (v)$ .

3 Methods

We want to compute association probability for each lncRNA-disease pair based on disease semantic similarity and lncRNA functional similarity. The pipeline is shown in Figure 1.

FIGURE 1

FIGURE 1. The pipeline of LDA-RWLMF.

3.1 Gaussian association profile similarity and similarity fusion

In this section, we use Gaussian Association Profile (GAP) to compute the GAP similarity of diseases and lncRNAs. For a lncRNA $l_{i}$ , its GAP $A P (l_{i})$ is denoted using the $i$ th row of Y. The GAP similarity of lncRNAs $l_{i}$ and $l_{j}$ is defined by Eq. 13:

S_{l}^{g} (l_{i}, l_{j}) = \exp (- γ_{l} {‖ A P (l_{i}) - A P (l_{j}) ‖}^{2}) (13)

where $γ_{l} = γ_{l}^{'} / (\frac{1}{n} \sum_{k = 1}^{n} ∥ A P (l_{i}) ∥^{2})$ is the normalized kernel bandwidth with parameter $γ_{l}^{'}$ . Thus, the lncRNA similarity matrix $S_{l}$ is computed by Eq. 14:

S_{l} = \frac{1}{2} (S_{l}^{f} + S_{l}^{g}) (14)

Similarly, the disease GAP similarity $S_{d}$ can be computed.

3.2 Screening negative LDAs

There are not negative LDAs in the MNDR dataset. Credible negative LDAs help improve LDA prediction performance and further more effectively find potential lncRNA biomarkers for breast cancer and colorectal cancer. Peng et al. (2021b) developed a random walk with restart-based virus-drug association prediction method and obtained better performance. Inspired by the method provided by Peng et al. (2021b), we first compute association probability for each lncRNA-disease pair through random walk with restart and then screen credible negative LDAs.

We first constructed a heterogeneous network composed of lncRNA similarity network, disease similarity network, and LDA network. lncRNA similarity matrix $S_{l}$ , disease similarity matrix $S_{d}$ , and LDA matrix $Y$ are used as the adjacency matrices related to the heterogeneous network. The adjacency matrix related to the heterogeneous network is represented as Eq. 15:

H = [\begin{array}{c} S_{l} & Y \\ Y^{T} & S_{d} \end{array}] (15)

where $Y^{T}$ denotes the transpose of $Y$ .

We then compute transition probability on the heterogeneous graph. Suppose that $H = [\begin{array}{c} H_{l l} & H_{l d} \\ H_{d l} & H_{d d} \end{array}]$ indicate transition probability matrix, where $H_{l l}$ and $H_{d d}$ indicate the walks within lncRNA similarity network and disease similarity network, respectively, $H_{l d}$ and $H_{d l}$ indicate the jumps between networks. For an lncRNA/disease, when there is an association between the lncRNA/disease and diseases/lncRNAs, the node will either continue to walk in the current network based on a transition probability $λ \in [0,1]$ or jump between the above four networks.

The $i$ -th lncRNA will walk to the $j$ -th lncRNA through the transition probability $H_{l l} (i, j)$ by Eq. 16:

H_{l l} (i, j) = {\begin{array}{c} \frac{S_{l} (i, j)}{\sum_{k = 1}^{n} S_{l} (i, k)}, & i f \sum_{k = 1}^{m} Y (i, k) = 0 \\ \frac{(1 - λ) S_{l} (i, j)}{\sum_{k = 1}^{n} S_{l} (i, k)}, & o t h e r w i s e \end{array} (16)

or jump to a disease $d_{j}$ through the transition probability $H_{l d} (i, j)$ by Eq. 17:

H_{l d} (i, j) = {\begin{cases} \frac{λ Y (i, j)}{\sum_{k = 1}^{m} Y (i, k)}, i f \sum_{k = 1}^{m} Y (i, k) \neq 0 \\ 0, o t h e r w i s e \end{cases} (17)

Similarly, the $i$ -th disease $d_{i}$ will walk to the $j$ -th disease $d_{j}$ through the transition probability $H_{d d} (i, j)$ by Eq. 18:

H_{d d} (i, j) = {\begin{cases} \frac{S_{d} (i, j)}{\sum_{k = 1}^{m} S_{d} (i, k)}, & i f \sum_{k = 1}^{n} Y (k, i) = 0 \\ \frac{(1 - λ) S_{d} (i, j)}{\sum_{k = 1}^{m} S_{d} (i, k)}, & o t h e r w i s e \end{cases} (18)

or jump to an lncRNA $l_{j}$ through the transition probability $H_{d l} (i, j)$ by Eq. 19:

H_{d l} (i, j) = {\begin{cases} \frac{λ Y (i, j)}{\sum_{k = 1}^{n} Y (k, i)}, i f \sum_{k = 1}^{m} Y (k, i) \neq 0 \\ 0, o t h e r w i s e \end{cases} (19)

At the $t -$ th step, the association probability matrix between all lncRNA-disease pairs on the heterogeneous network is computed by Eq. 20:

P (t + 1) = (1 - θ) H^{T} * P (t) + θ P (0) (20)

where $H^{T}$ indicates the transpose of $H$ , and $θ$ is the restarting probability. $P (0)$ indicates the initial probability with $p_{i} (0) = [\begin{array}{c} (1 - η) v_{i} \\ η s_{i} \end{array}]$ , where $v_{i}$ and $s_{j}$ indicate the initial probability distributions on disease similarity network and lncRNA similarity network, respectively. And $η \in [0,1]$ is used to control the restarting probability in these two similarity networks. If $η < 0.5$ , the particle will more tend to restart from one of the seed microbes than from one of the seed diseases.

In the second step, we consider known LDAs as positive sample set $P$ , unknown lncRNA-disease pairs as unlabeled set $U$ and propose a PU learning approach to screen credible negative LDA sample set $R N$ . The method contains the following six steps:

Step 1. Randomly screening positive sample subset $D$ from $P$

Step 2. Adding $D$ into $U$ ;

Step 3. Considering $P - D$ as positive samples, $U + D$ as negative samples;

Step 4. Obtaining LDA score matrix $S^{N e g}$ using random walk with restart;

Step 5. Ranking lncRNA-disease pairs in $D$ based on $S_{\min}^{N e g}$ and obtaining the minimum score $S_{\min}^{N e g}$ in $D$ ;

Step 6. For every lncRNA-disease pair $x$ in $U$ :If $S^{N e g} (x) < S_{\min}^{N e g}$ then $R N = R N \cup x$ .

3.3 LDA prediction based on logistic matrix factorization

Logistic matrix factorization has been applied to multiple areas (Liu et al., 2020; Tang et al., 2021; Tian et al., 2022). Inspired by the approaches, we develop a logistic matrix factorization-based LDA prediction method, LDA-RWLMF.

Assume that both lncRNAs and diseases are mapped to $r$ -dimensional shared latent spaces ( $r ≪ n, m$ ), thus an lncRNA $l_{i}$ or disease $d_{i}$ can be represented as a latent vector $a_{i} \in R^{1 \times t} o r b_{i} \in R^{1 \times t} .$ The association probability $p_{i j}$ between $l_{i}$ and $d_{i}$ is calculated by Eq. 12:

p_{i j} = \frac{\exp (a_{i} b_{j}^{T})}{1 + \exp (a_{i} b_{j}^{T})} (21)

The latent vector matrix of all lncRNAs or diseases can be represented as $A \in R^{n \times r} o r B \in R^{m \times r}$ where $a_{i} o r b_{i}$ indicates the $i$ th or $j$ th row in $A o r B$ . In addition, known LDAs are more credible than unknown lncRNA-disease pairs. Thus, we assign higher confidence values to known LDAs than unknown lncRNA-disease pairs. Similar to Peng et al. (2020b), we use a constant $c$ to assess the importance of known LDAs and construct a prediction model by Eq. 22:

p (Y ∣ A, B) = \begin{array}{c} (\prod_{1 \leq i \leq n, 1 \leq j \leq m, y_{i j} = 1} {[p_{i j}^{y_{i j}} {(1 - p_{i j})}^{(1 - y_{i j})}]}^{c}) \\ \times (\prod_{1 \leq i \leq n, 1 \leq j \leq m, y_{i j} = 0} [p_{i j}^{y_{i j}} {(1 - p_{i j})}^{(1 - y_{i j})}]) \\ = \prod_{i = 1}^{n} \prod_{j = 1}^{m} p_{i j}^{c y_{i j}} {(1 - p_{i j})}^{(1 - y_{i j})} \end{array} (22)

Model (21) can be optimized based on the Bayesian distribution by Eq. 23:

\min_{A, B} \sum_{i = 1}^{m} \sum_{j = 1}^{n} (1 + c y_{i j} - y_{i j}) \log [1 + \exp (a_{i} b_{j}^{T})] - c y_{i j} a_{i} b_{j}^{T} + \frac{λ_{l}}{2} ∥ A ∥_{F}^{2} + \frac{λ_{d}}{2} ∥ B ∥_{F}^{2} (23)

where $λ_{l}$ and $λ_{d}$ are two parameters, ${‖ A ‖}_{F}$ indicates the Frobenius norm of $A$ . (Zhang et al. 2019a; Zhang et al. 2019b) integrated linear neighborhood information to model (22) to predict various associations. Similarly, we fuse neighborhood information to Eq. 23 by Eq. 24:

\begin{array}{c} \min_{A, B} \sum_{i = 1}^{m} \sum_{j = 1}^{n} (1 + c y_{i j} - y_{i j}) \ln [1 + \exp (a_{i} b_{j}^{T})] - c y_{i j} a_{i} b_{j}^{T} \\ + \frac{1}{2} t r [A^{T} (λ_{l} I + α L_{l}) A + \frac{1}{2} t r [B^{T} (λ_{d} I + α L_{d}) B \end{array} (24)

where $t r (\cdot)$ indicates the trace of the matrix. $L_{l}$ and $L_{d}$ indicate the corresponding Laplacian matrix of $A$ and $B$ . $L_{l} = (D_{l} + {\tilde{D}}_{l}) - (A + A^{T})$ where $D_{l}$ and ${\tilde{D}}_{l}$ are two diagonal matrices and $D_{l} (i, i) = \sum_{j = 1}^{m} a_{i j}$ and ${\tilde{D}}_{l} (i, i) = \sum_{i = 1}^{m} a_{i j}$ . Similarly, $L_{d}$ can be computed.

We compute $A$ and $B$ by solving Eq. 24 through an alternating gradient ascent approach.

Finally, lncRNA-disease association score $Y_{f i n} (i, j)$ for each lncRNA-disease pair can be computed by Eq. 25:

Y_{f i n} = A B^{T} (25)

4 Results

4.1 Experimental settings

We conduct 5-fold cross validation for 10 times to investigate the performance of LDA-RWLMF. AUC is used to evaluate the prediction accuracy of LDA identification models. AUC is the area under the true positive rate (TPR)-false positive rate (FPR) curve, where TPR and FPR are defined by Eqs 26, 27:

T P R = \frac{T P}{T P + F N} (26)

F P R = \frac{F P}{T N + F P} (27)

where TP, FP, TN, FN represent the number of true positives, false positives, true negatives, false negatives, respectively. Higher AUC is, better the prediction performance is. In addition, parameters in LDA-RWLMF are set to defaults provided by Peng et al. (2020b). And parameters in the other four comparison LDA prediction methods (LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM) are set to the same values provided by corresponding methods.

4.2 Performance comparison with other methods

To measure the performance of the proposed LDA-RWLMF method, we compare it with four other representative LDA inference approaches on the MNDR dataset. That is, LNCSIM1 (Chen, 2015), LNCSIM2 (Chen, 2015), ILNCSIM (Huang et al., 2016), and IDSSIM (Fan et al., 2020). LNCSIM1 and LNCSIM2 used Laplacian regularized least squares to predict possible LDAs based on disease DAGs and the information content, respectively. ILNCSIM first combined the hierarchical structure of disease DAG and the information content to compute disease similarity and then used Laplacian regularized least squares to infer new LDAs. IDSSIM designed a weighted K nearest neighbor approach to identify potential associations between lncRNAs and diseases by integrating disease semantic similarity and lncRNA functional similarity. Table 1 gives the AUC values of the four LDA identification methods and our proposed LDA-RWLMF on the MNDR dataset.

TABLE 1

TABLE 1. AUCs of LDA identification approaches on the MNDR dataset.

The results from Table 1 demonstrate that LDA-RWLMF computes the highest AUC compared to LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM on the MNDR dataset. Figure 2 gives the results of LDA-RWLMF from 10 time cross validation. From Figure 2, we can find that AUC obtain by LDA-RWLMF is relatively steady during 10 time cross validation.

FIGURE 2

FIGURE 2. The AUC of LDA-RWLMF from 10 time cross validation (t = 1, 2, 3, … , 10).

4.3 Case study

4.3.1 lncRNA biomarker identification for breast cancer

Breast cancer is the commonest life-threatening cancer in women (Key et al., 2001; Sharma, et al., 2010). lncRNAs play important roles in epigenetic regulation, transcriptional regulation and post-transcriptional regulation and have been potential biomarkers of many diseases. Substantial publications have reported that lncRNAs affect proliferation and apoptosis, invasion and metastasis, and cancer stemness of breast cancer. For example, LSINCT5 and Zfas one can promote the proliferation of breast cancer, HOTAIR suppresses invasion and migration of breast cancer, SOX2OT induces SOX2 expression in breast cancer, and SRA is the expression activator of breast cancer (Sun et al., 2017). We want to conduct case analyses to find possible lncRNA biomarkers for breast cancer based on the proposed LDA-RWLMF model.

In the MNDR dataset, there are 89 lncRNAs that may associate with breast cancer, where 54 lncRNAs have been experimentally validated to associate with the cancer and 35 lncRNAs have unknown associations with it. We use the proposed LDA-RWLMF method to rank the 89 lncRNAs for breast cancer. The results are shown in Tables 2, 3. Table 2 demonstrates the ranking results of the predicted top 48 lncRNAs according to the computed association score with breast cancer on the MNDR dataset. These 48 lncRNAs are known to link to breast cancer on the MNDR dataset and are ranked as top 48.

TABLE 2

TABLE 2. The rankings of the predicted top 48 lncRNAs according to association with breast cancer on the MNDR dataset.

TABLE 3

TABLE 3. The rankings of the remaining 41 lncRNAs according to association with breast cancer on the MNDR dataset.

Table 3 gives the rankings of the remaining 41 lncRNAs according to the association scores with breast cancer on the MNDR dataset. Among all lncRNAs unknown to associate with breast cancer on the MNDR dataset, lncRNA HULC is predicted to link to breast cancer with the highest association scores. Shi et al. (2016) observed that HULC can act as an oncogene biomarker in triple-negative breast cancer and as an independent possible poor prognostic factor in patients suffered from triple-negative breast cancer. Wang et al. (2019) found that HULC can promote the development of breast cancer through regulating the expression of LYPD1. Gavgani et al. (2020) investigated that the HULC knockdown can induce apoptosis and suppress cellular migration in breast cancer cells.

PCAT1 may link to breast cancer with the ranking of three among all lncRNAs unknown to associate with breast cancer on the MNDR dataset. Several studies have reported that PCAT1 can associate with breast cancer although its association with the cancer on the MNDR dataset is unobserved. Abdollahzadeh et al. (2020) reported that the altered regulation of PCAT1 may play crucial roles in the development and pathogenesis of breast cancer. Sarrafzadeh et al. (2017) assessed the expression of PCAT-1 through real-time reverse transcription polymerase chain reaction in breast tumor samples from 47 breast cancer patients and found that PCAT-1 may involve in the pathogenesis of breast cancers. Wang et al. (2021a) observed that PCAT-1 can facilitate breast cancer progression by binding to RACK1 and thus boosting oxygen-independent stability of HIF-1α. Tang et al. (2022) detect that PCAT1 can regulate the expression of PITX2 in breast cancer.

In addition, we predict that nephronectin intronic transcript 1 (NPTN-IT1, also known as lncRNA-LET) may have relationship with breast cancer. NPTN-IT1 has been reported to associate with bladder cancer through attenuating the expression of the target of miR-145 and ILF3 in bladder cancer (Zhang et al., 2021b). It was significantly down-regulated in multiple tumor tissues of colorectal cancer. It also has a regulation role in hypoxia signaling of hepatocellular carcinoma (Sun et al., 2013) and was highly expressed in HepG2 cells (Kong et al., 2019b). We hope that association between three lncRNAs (HULC, NPTN-IT1, and PCAT1) and breast cancer can be validated through wet experiments. Figure 3 shows the associations between the 41 lncRNAs that are ranked as the last 41 and breast cancer. Black solid lines represent known LDAs in the MNDR database. Green solid lines represent LDAs that can be observed in the lncRNA disease database. Red dots lines represent LDAs that are predicted to be potential lncRNA biomarkers of breast cancer and can be confirmed by related publications. Blue equal dash lines represent unknown LDAs.

FIGURE 3

FIGURE 3. The associations between the remaining 41 lncRNAs and breast cancer.

4.3.2 lncRNA biomarker identification for colorectal cancer

Colorectal cancer is a heterogeneous disease. It has high morbidity and mortality. lncRNAs demonstrate dense associations with colorectal cancer. In this study, we conduct case analyses to identify possible lncRNA biomarkers for colorectal cancer based on LDA-RWLMF. In the MNDR dataset, 89 lncRNAs possibly associate with colorectal cancer, where 55 lncRNAs have been validated to be the biomarkers of the cancer and remaining 34 lncRNAs have not been validated. We use LDA-RWLMF to compute the association scores between all 89 lncRNAs and colorectal cancer and rank the 89 lncRNAs for colorectal cancer. The results are shown in Tables 4, 5. Table 4 shows the rankings of the identified top 50 lncRNAs according to the computed association score with colorectal cancer on the MNDR dataset. The 50 lncRNAs are known to associate with colorectal cancer on the MNDR dataset and are ranked as top 50.

TABLE 4

TABLE 4. The rankings of the identified top 50 lncRNAs associated with colorectal cancer on the MNDR dataset.

TABLE 5

TABLE 5. The rankings of the remaining 41 lncRNAs according to association with breast cancer on the MNDR dataset.

Table 5 gives the rankings of the remaining 39 lncRNAs according to the association scores with colorectal cancer on the MNDR dataset. Among all lncRNAs unknown association with colorectal cancer on the MNDR dataset, lncRNA HAR1A is inferred to link to colorectal cancer with the highest association scores. HAR1A is a favorable prognostic biomarker for patients. Shi et al. (2019) analyzed the expression profiles of HAR1A using RT-qPCR and found its expression level was significantly lower in hepatocullular cancer. Chen et al. (2020) have still reported that the HAR1A expression levels were reduced in hepatocellular carcinoma tissues.

Figure 4 gives the associations between the remaining 39 lncRNAs and colorectal cancer. Black solid lines represent known LDAs in the MNDR database. Red dots lines represent LDAs that are predicted to be potential lncRNA biomarkers of breast cancer and can be confirmed by related publications. Blue equal dash lines represent unknown LDAs.

FIGURE 4

FIGURE 4. The associations between the remaining 39 lncRNAs and colorectal cancer.

5 Discussion and conclusion

Breast cancer and colorectal cancer are the most frequent cancers with high mortality rates. They demonstrate very high heterogeneity at molecular and clinical levels. With the fast development of next generation sequencing technologies, we can more accurately characterize the human genome. lncRNAs act mainly as gene expression regulators. The dysregulation of lncRNAs may destroy the normal transcriptional landscape and thus cause malignant transformation. In addition, their highly specific expression and functional tertiary structure force them to be as promising diagnostic biomarkers and potential targets for various diseases including breast cancer and colorectal cancer.

In this study, we proposed a computational lncRNA-disease association method (LDA-RWLMF) to identify potential biomarkers for breast cancer and colorectal cancer. First, a random walk with restart method was designed to extract negative LDAs. Second, a logistic matrix factorization model was explored to infer possible associations between lncRNAs and diseases. Finally, all lncRNAs are ranked according to association scores with breast cancer and colorectal cancer on the MNDR dataset.

We conduct 5-fold cross validation for 10 times to compare LDA-RWLMF with state-of-the-art LDA prediction models on the MNDR dataset, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results show that LDA-RWLMF computes the best AUC values of 0.9312. We predict that lncRNAs (HULC, NPTN-IT1, and PCAT1) may be possible biomarkers of breast cancer and colorectal cancer.

Our proposed LDA-RWLMF method has two disadvantages. First, it extracted credible negative LDA samples. In the area of association prediction, there are no negative association samples because of the limitation of biomedical experiments, which causes relatively poor performance. Thus, we designed a negative LDA extraction method based on PU learning. Second, the logistic matrix factorization model can effectively discover possible associations between two biological entities. Thus, we used the model to identify new LDAs. In addition, diseases and lncRNAs exhibit abundant biological features. In this study, we failed to consider these diverse features. In the future, we will further integrate more biological information to improve LDA prediction.

In the future, we will further design more effective negative sample screening method based on positive-unlabeled learning. In addition, we will also develop deep learning model for LDA prediction. We anticipate that the proposed LDA-RWLMF method can help design therapeutic regimens for personalized treatment of breast cancer and colorectal cancer and thus opportunely inhibit its recurrence.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

Conceptualization: SL, MC, and LT; Methodology: SL, MC, YW, MW, and FW; Project administration: SL; Software: SL and MC; Writing-original draft: SL; Writing-review and editing: SL, MC.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abdollahzadeh, R., Mansoori, Y., Azarnezhad, A., Daraei, A., Paknahad, S., Mehrabi, S., et al. (2020). Expression and clinicopathological significance of AOC4P, PRNCR1, and PCAT1 lncRNAs in breast cancer. Pathol. Res. Pract. 216 (10), 153131. doi:10.1016/j.prp.2020.153131

PubMed Abstract | CrossRef Full Text | Google Scholar

Barzaman, K., Karami, J., Zarei, Z., Hosseinzadeh, A., Kazemi, M. H., Moradi-Kalbolandi, S., et al. (2020). Breast cancer: Biology, biomarkers, and treatments. Int. Immunopharmacol. 84, 106535. doi:10.1016/j.intimp.2020.106535

PubMed Abstract | CrossRef Full Text | Google Scholar

Bian, Z., Zhang, J., Min, L., Feng, Y., Xue, W., Jia, Z., et al. (2018). LncRNA-FEZF1-AS1 promotes tumor proliferation and metastasis in colorectal cancer by regulating PKM2 signaling. Clin. Cancer Res. 24 (19), 4808–4819. doi:10.1158/1078-0432.CCR-17-2967

PubMed Abstract | CrossRef Full Text | Google Scholar

Biller, L. H., and Schrag, D. (2021). Diagnosis and treatment of metastatic colorectal cancer: A review. Jama 325 (7), 669–685. doi:10.1001/jama.2021.0106

PubMed Abstract | CrossRef Full Text | Google Scholar

Campos-Parra, A. D., López-Urrutia, E., Orozco Moreno, L. T., Lopez-Camarillo, C., Meza-Menchaca, T., Figueroa Gonzalez, G., et al. (2018). Long non-coding RNAs as new master regulators of resistance to systemic treatments in breast cancer. Int. J. Mol. Sci. 19 (9), 2711. doi:10.3390/ijms19092711

PubMed Abstract | CrossRef Full Text | Google Scholar

Chandra Gupta, S., and Nandan Tripathi, Y. (2017). Potential of long non-coding RNAs in cancer patients: From biomarkers to therapeutic targets. Int. J. Cancer 140 (9), 1955–1967. doi:10.1002/ijc.30546

PubMed Abstract | CrossRef Full Text | Google Scholar

Chang, K. C., Diermeier, S. D., Yu, A. T., Brine, L. D., and Spector, D. L. (2020). MaTAR25 lncRNA regulates the Tensin1 gene to impact breast cancer progression[J]. Nat. Commun. 11 (1), 1–19.

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, G., Wang, Z., Wang, D., Qiu, C., Liu, M., Chen, X., et al. (2012). LncRNADisease: A database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 41 (D1), D983–D986. doi:10.1093/nar/gks1099

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X. (2015). Katzlda: KATZ measure for the lncRNA-disease association prediction. Sci. Rep. 5 (1), 16840–16911. doi:10.1038/srep16840

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, X., Xie, D., Zhao, Q., and You, Z. H. (2019). MicroRNAs and complex diseases: From experimental results to computational models. Brief. Bioinform. 20 (2), 515–539. doi:10.1093/bib/bbx130

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, Y., Guo, Y., Chen, H., and Ma, F. (2020). Long non-coding RNA expression profiling identifies a four-long non-coding RNA prognostic signature for isocitrate dehydrogenase mutant glioma. Front. Neurol. 11, 573264. doi:10.3389/fneur.2020.573264

PubMed Abstract | CrossRef Full Text | Google Scholar

Cui, T., Zhang, L., Huang, Y., Yi, Y., Tan, P., Zhao, Y., et al. (2018). MNDR v2.0: An updated resource of ncRNA-disease associations in mammals Nucleic Acids Res. 46 (D1), D371–D374. doi:10.1093/nar/gkx1025

PubMed Abstract | CrossRef Full Text | Google Scholar

DeSantis, C. E., Ma, J., Gaudet, M. M., Newman, L. A., Miller, K. D., Goding Sauer, A., et al. (2019). Breast cancer statistics. Ca. Cancer J. Clin. 69 (6), 438–451. doi:10.3322/caac.21583

PubMed Abstract | CrossRef Full Text | Google Scholar

Duffy, M. J., Synnott, N. C., and Crown, J. (2018). Mutant p53 in breast cancer: Potential as a therapeutic target and biomarker. Breast Cancer Res. Treat. 170 (2), 213–219. doi:10.1007/s10549-018-4753-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Fan, W., Shang, J., Li, F., Sun,, Y., and Liu, J. X. (2020). Idssim: An lncRNA functional similarity calculation model based on an improved disease semantic similarity method[J]. BMC Bioinforma. 21 (1), 1–14.

PubMed Abstract | CrossRef Full Text | Google Scholar

Fu, G., Wang, J., Domeniconi, C., and Yu, G. (2018). Matrix factorization-based data fusion for the prediction of lncRNA-disease associations. Bioinformatics 34 (9), 1529–1537. doi:10.1093/bioinformatics/btx794

PubMed Abstract | CrossRef Full Text | Google Scholar

Garrido-Castro, A. C., Lin, N. U., and Polyak, K. (2019). Insights into molecular classifications of triple-negative breast cancer: Improving patient selection for treatment. Cancer Discov. 9 (2), 176–198. doi:10.1158/2159-8290.CD-18-1177

PubMed Abstract | CrossRef Full Text | Google Scholar

Gavgani, R. R., Babaei, E., Hosseinpourfeizi, M. A., Fakhrjou, A., and Montazeri, V. (2020). Study of long non-coding RNA highly upregulated in liver cancer (HULC) in breast cancer: A clinical & in vitro investigation. Indian J. Med. Res. 152 (3), 244–253. doi:10.4103/ijmr.IJMR_1823_18

PubMed Abstract | CrossRef Full Text | Google Scholar

Gooding, A. J., Zhang, B., Jahanbani, F. K., Gilmore, H. L., Chang, J. C., Valadkhan, S., et al. (2017). The lncRNA BORG drives breast cancer metastasis and disease recurrence. Sci. Rep. 7 (1), 1–18. doi:10.1038/s41598-017-12716-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Guo, Z. H., You, Z. H., Wang, Y. B., Yi, H. C., and Chen, Z. H. (2019). A learning-based method for LncRNA-disease association identification combing similarity information and rotation forest. IScience 19, 786–795. doi:10.1016/j.isci.2019.08.030

PubMed Abstract | CrossRef Full Text | Google Scholar

Guo, Z., Hui, Y., Kong, F., and Lin, X. (2022). Finding lung-cancer-related lncRNAs based on laplacian regularized least squares with unbalanced Bi-random walk. Front. Genet. 13, 933009. doi:10.3389/fgene.2022.933009

PubMed Abstract | CrossRef Full Text | Google Scholar

Heer, E., Harper, A., Escandor, N., Sung, H., McCormack, V., and Fidler-Benaoudia, M. M. (2020). Global burden and trends in premenopausal and postmenopausal breast cancer: A population-based study. Lancet. Glob. Health 8 (8), e1027–e1037. doi:10.1016/S2214-109X(20)30215-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, F., Yue, X., Xiong, Z., Yu, Z., Liu, S., and Zhang, W. (2021). Tensor decomposition with relational constraints for predicting multiple types of microRNA-disease associations. Brief. Bioinform. 22 (3), bbaa140. doi:10.1093/bib/bbaa140

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, Y. A., Chen, X., You, Z. H., Huang, D. S., and Chan, K. C. C. (2016). Ilncsim: Improved lncRNA functional similarity calculation model. Oncotarget 7 (18), 25902–25914. doi:10.18632/oncotarget.8296

PubMed Abstract | CrossRef Full Text | Google Scholar

Huang, Z., Shi, J., Gao, Y., Cui, C., Zhang, S., Li, J., et al. (2019). HMDD v3.0: A database for experimentally supported human microRNA-disease associations. Nucleic Acids Res. 47 (D1), D1013–D1017. doi:10.1093/nar/gky1010

PubMed Abstract | CrossRef Full Text | Google Scholar

Key, T. J., Verkasalo, P. K., and Banks, E. (2001). Epidemiology of breast cancer. Lancet. Oncol. 2 (3), 133–140. doi:10.1016/S1470-2045(00)00254-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Kong, J., Qiu, Y., Li, Y., Zhang, H., and Wang, W. (2019b). TGF-β1 elevates P-gp and BCRP in hepatocellular carcinoma through HOTAIR/miR-145 axis. Biopharm. Drug Dispos. 40 (2), 70–80. doi:10.1002/bdd.2172

PubMed Abstract | CrossRef Full Text | Google Scholar

Kong, X., Duan, Y., Sang, Y., Li, Y., Zhang, H., Liang, Y., et al. (2019a). LncRNA-CDC6 promotes breast cancer progression and function as ceRNA to target CDC6 by sponging microRNA-215. J. Cell. Physiol. 234 (6), 9105–9117. doi:10.1002/jcp.27587

PubMed Abstract | CrossRef Full Text | Google Scholar

Lan, W., Dong, Y., Chen, Q., Zheng, R., Liu, J., Pan, Y., et al. (2022). Kgancda: Predicting circRNA-disease associations based on knowledge graph attention network. Brief. Bioinform. 23 (1), bbab494. doi:10.1093/bib/bbab494

PubMed Abstract | CrossRef Full Text | Google Scholar

Lan, W., Lai, D., and Chen, Q. (2020). Ldicdl: LncRNA-disease association identification based on collaborative deep learning[J]. IEEE/ACM Trans. Comput. Biol. Bioinforma. 19, 1715–1723. doi:10.1109/TCBB.2020.3034910

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, G., Luo, J., Liang, C., Xiao, Q., Ding, P., and Zhang, Y. (2019). Prediction of LncRNA-disease associations based on network consistency projection. Ieee Access 7, 58849–58856. doi:10.1109/access.2019.2914533

CrossRef Full Text | Google Scholar

Liang, Y., Song, X., Li, Y., Chen, B., Zhao, W., Wang, L., et al. (2020). Retraction note to: LncRNA BCRT1 promotes breast cancer progression by targeting miR-1303/PTBP3 axis. Mol. Cancer 19 (1), 131–220. doi:10.1186/s12943-022-01576-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Liang, Y., Wu, Y., and Zhang, Z. (2022a). Hyb4mC: A hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction[J]. BMC Bioinforma. 23 (1), 1–18. doi:10.1186/s12859-022-04789-6

CrossRef Full Text | Google Scholar

Liang, Y., Zhang, Z. Q., Liu, N. N., Wu, Y. N., Gu, C. L., and Wang, Y. L. (2022b). Magcnse: Predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinforma. 23 (1), 1–22. doi:10.1186/s12859-022-04715-w

CrossRef Full Text | Google Scholar

Liu, C., Wei, D., Xiang, J., Ren, F., Huang, L., Lang, J., et al. (2020). An improved anticancer drug-response prediction based on an ensemble method integrating matrix completion and ridge regression. Mol. Ther. Nucleic Acids 21, 676–686. doi:10.1016/j.omtn.2020.07.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, T., Han, Z., Li, H., Zhu, Y., Sun, Z., and Zhu, A. (2018). LncRNA DLEU1 contributes to colorectal cancer progression via activation of KPNA3. Mol. Cancer 17 (1), 1–13. doi:10.1186/s12943-018-0873-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Niknafs, Y. S., Han, S., Ma, T., Speers, C., Zhang, C., Wilder-Romans, K., et al. (2016). The lncRNA landscape of breast cancer reveals a role for DSCAM-AS1 in breast cancer progression. Nat. Commun. 7 (1), 12791–12813. doi:10.1038/ncomms12791

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L. H., Chen, Y. Q., Ma, N., and Chen, X. (2017). Narrmda: Negative-aware and rating-based recommendation algorithm for miRNA-disease association prediction. Mol. Biosyst. 13 (12), 2650–2659. doi:10.1039/c7mb00499k

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L. H., Sun, C. N., Guan, N. N., Qiang, J., and Chen, X. (2018). Hnmda: Heterogeneous network-based miRNA-disease association prediction. Mol. Genet. Genomics 293 (4), 983–995. doi:10.1007/s00438-018-1438-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L. H., Tian, X. F., Shen, L., Kuang, M., Li, T. B., Tian, G., et al. (2020a). Identifying effective antiviral drugs against SARS-CoV-2 by drug repositioning through virus-drug association prediction. Front. Genet. 11, 577387. doi:10.3389/fgene.2020.577387

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L., Shen, L., Liao, L., Liu, G., and Zhou, L. (2020b). Rnmfmda: A microbe-disease association identification method based on reliable negative sample selection and logistic matrix factorization with neighborhood regularization. Front. Microbiol. 11, 592430. doi:10.3389/fmicb.2020.592430

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L., Wang, C., Tian, X., Zhou, L., and Li, K. (2021). Finding lncRNA-protein interactions based on deep learning with dual-net neural architecture[J]. IEEE/ACM Trans. Comput. Biol. Bioinforma. doi:10.1109/TCBB.2021.3116232

CrossRef Full Text | Google Scholar

Peng, L. H., Shen, L., Xu, J. L., Tian, X. F., Liu, F. X., Wang, J. J., et al. (2021b). Prioritizing antiviral drugs against SARS-CoV-2 by integrating viral complete genome sequences and drug chemical structures[J]. Sci. Rep. 11 (1), 1–11.

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L., Wang, F., Wang, Z., Tan, J., Huang, L., Tian, X., et al. (2022a). Cell-cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: Data resources and computational strategies. Brief. Bioinform. 23 (4), bbac234. doi:10.1093/bib/bbac234

PubMed Abstract | CrossRef Full Text | Google Scholar

Peng, L. H., Tan, J. W., Tian, X. F., and Zhou, L. Q. (2022b). EnANNDeep: An ensemble-based lncRNA–protein interaction prediction framework with adaptive k-nearest neighbor classifier and deep models[J]. Interdiscip. Sci. Comput. Life Sci., 1–24. doi:10.1007/s12539-021-00483-y

CrossRef Full Text | Google Scholar

Qiao, K., Ning, S., Wan, L., Wu, H., Wang, Q., Zhang, X., et al. (2019). LINC00673 is activated by YY1 and promotes the proliferation of breast cancer cells via the miR-515-5p/MARK4/Hippo signaling pathway. J. Exp. Clin. Cancer Res. 38 (1), 418–515. doi:10.1186/s13046-019-1421-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Sarrafzadeh, S., Geranpayeh, L., and Ghafouri-Fard, S. (2017). Expression analysis of long non-coding PCAT-1in breast cancer. Int. J. Hematol. Oncol. Stem Cell Res. 11 (3), 185–191.

PubMed Abstract | Google Scholar

Sharma, G. N., Dave, R., Sanadya, J., Sharma, P., and Sharma, K. K. (2010). Various types and management of breast cancer: An overview. J. Adv. Pharm. Technol. Res. 1 (2), 109–126.

PubMed Abstract | Google Scholar

Shen, L., Liu, F. X., Huang, L., Liu, G. Y., Zhou, L. Q., and Peng, L. H. (2022). VDA-RWLRLS: An anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares. Comput. Biol. Med. 140, 105119. doi:10.1016/j.compbiomed.2021.105119

CrossRef Full Text | Google Scholar

Shi, F., Xiao, F., Ding, P., Qin, H., and Huang, R. (2016). Long noncoding RNA highly up-regulated in liver cancer predicts unfavorable outcome and regulates metastasis by MMPs in triple-negative breast cancer. Arch. Med. Res. 47 (6), 446–453. doi:10.1016/j.arcmed.2016.11.001

PubMed Abstract | CrossRef Full Text | Google Scholar

Shi, Z., Luo, Y., Zhu, M., Zhou, Y., Zheng, B., Wu, D., et al. (2019). Expression analysis of long non-coding RNA HAR1A and HAR1B in HBV-induced hepatocullular carcinoma in Chinese patients. Lab. Med. 50 (2), 150–157. doi:10.1093/labmed/lmy055

PubMed Abstract | CrossRef Full Text | Google Scholar

Shi, Z., Zhang, H., Jin, C., Quan, X., and Yin, Y. (2021). A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinforma. 22 (1), 136–220. doi:10.1186/s12859-021-04073-z

CrossRef Full Text | Google Scholar

Siegel, R., and Naishadhamjemal, D. A. (2013). Cancer statistics, 2013. Ca. Cancer J. Clin. 63, 11–30. doi:10.3322/caac.21166

PubMed Abstract | CrossRef Full Text | Google Scholar

Sledge, G. W., Mamounas, E. P., Hortobagyi, G. N., Burstein, H. J., Goodwin, P. J., and Wolff, A. C. (2014). Past, present, and future challenges in breast cancer treatment. J. Clin. Oncol. 32 (19), 1979–1986. doi:10.1200/JCO.2014.55.4139

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, F., Sun, J., and Zhao, Q. (2022). A deep learning method for predicting metabolite-disease associations via graph neural network. Brief. Bioinform. 23 (4), bbac266. doi:10.1093/bib/bbac266

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, W., Wu, Y., Yu, X., Liu, Y., Song, H., Xia, T., et al. (2013). Decreased expression of long noncoding RNA AC096655.1-002 in gastric cancer and its clinical significance. Tumour Biol. 34 (5), 2697–2701. doi:10.1007/s13277-013-0821-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, Y. S., Zhao, Z., Yang, Z. N., Xu, F., Lu, H. J., Zhu, Z. Y., et al. (2017). Risk factors and preventions of breast cancer. Int. J. Biol. Sci. 13 (11), 1387–1397. doi:10.7150/ijbs.21635

PubMed Abstract | CrossRef Full Text | Google Scholar

Tang, W., Lu, G., and Ji, Y. (2022). Long non-coding RNA PCAT1 sponges miR-134-3p to regulate PITX2 expression in breast cancer[J]. Mol. Med. Rep. 25 (3), 1–10.

PubMed Abstract | CrossRef Full Text | Google Scholar

Tang, X., Cai, L., Meng, Y., Xu, J., Lu, C., and Yang, J. (2021). Indicator regularized non-negative matrix factorization method-based drug repurposing for COVID-19. Front. Immunol. 11, 3824. doi:10.3389/fimmu.2020.603615

CrossRef Full Text | Google Scholar

Tian, X., Shen, L., Gao, P., Huang, L., Liu, G., Zhou, L., et al. (2022). Discovery of potential therapeutic drugs for COVID-19 through logistic matrix factorization with kernel diffusion. Front. Microbiol. 13, 13. doi:10.3389/fmicb.2022.740382

PubMed Abstract | CrossRef Full Text | Google Scholar

Wahlestedt, C. (2013). Targeting long non-coding RNA to therapeutically upregulate gene expression. Nat. Rev. Drug Discov. 12 (6), 433–446. doi:10.1038/nrd4018

PubMed Abstract | CrossRef Full Text | Google Scholar

Waks, A. G., and Winer, E. P. (2019). Breast cancer treatment: A review. Jama 321 (3), 288–300. doi:10.1001/jama.2018.19323

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, D., Wang, J., Lu, M., Song, F., and Cui, Q. (2010). Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 26 (13), 1644–1650. doi:10.1093/bioinformatics/btq241

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, J., Chen, X., Hu, H., Yao, M., Song, Y., Yang, A., et al. (2021b). PCAT-1 facilitates breast cancer progression via binding to RACK1 and enhancing oxygen-independent stability of HIF-1α. Mol. Ther. - Nucleic Acids 24, 310–324. doi:10.1016/j.omtn.2021.02.034

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, M. N., You, Z. H., Wang, L., Li, L. P., and Zheng, K. (2021a). Ldgrnmf: LncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 424, 236–245. doi:10.1016/j.neucom.2020.02.062

CrossRef Full Text | Google Scholar

Wang, N., Zhong, C., Fu, M., Li, L., Wang, F., Lv, P., et al. (2019). Long non-coding RNA HULC promotes the development of breast cancer through regulating LYPD1 expression by sponging miR-6754-5p. Onco. Targets. Ther. 12, 10671–10679. doi:10.2147/OTT.S226040

PubMed Abstract | CrossRef Full Text | Google Scholar

Wu, Y., Yang, X., and Chen, Z. (2019). m6A-induced lncRNA RP11 triggers the dissemination of colorectal cancer cells via upregulation of Zeb1[J]. Mol. cancer 18 (1), 1–16.

PubMed Abstract | CrossRef Full Text | Google Scholar

Xi, Y., and Xu, P. (2021). Global colorectal cancer burden in 2020 and projections to 2040. Transl. Oncol. 14 (10), 101174. doi:10.1016/j.tranon.2021.101174

PubMed Abstract | CrossRef Full Text | Google Scholar

Xing, Z., Park, P. K., Lin, C., and Yang, L. (2015). LncRNA BCAR4 wires up signaling transduction in breast cancer. RNA Biol. 12 (7), 681–689. doi:10.1080/15476286.2015.1053687

PubMed Abstract | CrossRef Full Text | Google Scholar

Xu, W., Zhou, G., Wang, H., Liu, Y., Chen, B., Chen, W., et al. (2020). Circulating lncRNA SNHG11 as a novel biomarker for early diagnosis and prognosis of colorectal cancer. Int. J. Cancer 146 (10), 2901–2912. doi:10.1002/ijc.32747

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, J., Grünewald, S., and Wan, X. F. (2013). Quartet-net: A quartet-based method to reconstruct phylogenetic networks[J]. Mol. Biol. 30 (5), 1206–1217.

PubMed Abstract | CrossRef Full Text | Google Scholar

Yang, J., Peng, S., and Zhang, B. (2020). Human geroprotector discovery by targeting the converging subnetworks of aging and age-related diseases[J]. Geroscience 42 (1), 353–372.

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, H., Jiang, L., Zhong, S., Li, J., Sun, D., Hou, J., et al. (2021b). The role of long non-coding RNAs in drug resistance of cancer. Clin. Genet. 99 (1), 84–92. doi:10.1111/cge.13800

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, L., Yang, P., Feng, H., Zhao, Q., and Liu, H. (2021a). Using network distance analysis to predict lncRNA–miRNA interactions. Interdiscip. Sci. Comput. Life Sci. 13 (3), 535–545. doi:10.1007/s12539-021-00458-z

CrossRef Full Text | Google Scholar

Zhang, W., Li, Z., Guo, W., Yang, W., and Huang, F. (2019b). A fast linear neighborhood similarity-based network link inference method to predict MicroRNA-disease associations. IEEE/ACM Trans. Comput. Biol. Bioinform. 18 (2), 405–415. doi:10.1109/tcbb.2019.2931546

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, W., Jing, K., Huang, F., Chen, Y., Li, B., Li, J., et al. (2019a). Sflln: A sparse feature learning ensemble method with linear neighborhood regularization for predicting drug-drug interactions. Inf. Sci. (N. Y). 497, 189–201. doi:10.1016/j.ins.2019.05.017

CrossRef Full Text | Google Scholar

Zhao, Q., Yang, Y., Ren, G., and Fan, C. (2019). Integrating bipartite network projection and KATZ measure to identify novel CircRNA-disease associations. IEEE Trans. Nanobioscience 18 (4), 578–584. doi:10.1109/TNB.2019.2922214

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, L., Duan, Q., Tian, X., Tang, J., and Peng, L. H. (2021a). LPI-HyADBS: A hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification[J]. BMC Bioinforma. 22 (1), 1–31.

CrossRef Full Text | Google Scholar

Zhou, L., Wang, Z., Tian, X., and Peng, L. (2021b). LPI-deepGBDT: A multiple-layer deep framework based on gradient boosting decision trees for lncRNA-protein interaction identification. BMC Bioinforma. 22, 479. doi:10.1186/s12859-021-04399-8

CrossRef Full Text | Google Scholar

Keywords: breast cancer, colorectal cancer, lncRNA, biomarker, lncRNA-disease association, random walk, logistic matrix factorization

Citation: Li S, Chang M, Tong L, Wang Y, Wang M and Wang F (2023) Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front. Genet. 13:1023615. doi: 10.3389/fgene.2022.1023615

Received: 20 August 2022; Accepted: 10 October 2022;
Published: 20 January 2023.

Edited by:

Lihong Peng, Hunan University of Technology, China

Reviewed by:

Guanghui Li, East China Jiaotong University, China
Li Zejun, Professional Services Review, Australia

Copyright © 2023 Li, Chang, Tong, Wang, Wang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shijun Li, Y2ZsaXNoaWp1bjY1ODhAc2luYS5jb20=

^†These authors have contributed equally to this work and share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.