Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM

Su, Zhenguo; Lu, Huihui; Wu, Yan; Li, Zejun; Duan, Lian

doi:10.3389/fgene.2023.1238095

ORIGINAL RESEARCH article

Front. Genet., 16 August 2023

Sec. RNA

Volume 14 - 2023 | https://doi.org/10.3389/fgene.2023.1238095

This article is part of the Research Topic: Applications of RNA-Seq in Cancer and Tumor Research

Zhenguo Su¹†

Huihui Lu²†

Yan Wu³

Zejun Li⁴*

Lian Duan⁵,⁶,⁷,⁸*

¹Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
²Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
³Geneis (Beijing) Co., Ltd., Beijing, China
⁴School of Computer Science, Hunan Institute of Technology, Hengyang, China
⁵Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
⁶Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
⁷National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
⁸Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China

Introduction: Lung cancer is one of the most frequent neoplasms worldwide, with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identifying new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases.

Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA–disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma.

Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively. Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma.

Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.

1 Introduction

Long non-coding RNAs (lncRNAs) are non-coding RNAs with more than 200 nucleotides (Bertone et al., 2004; Peng et al., 2022a; Peng et al., 2022b). LncRNAs play an important role in the development and progression of various diseases (Lanjanian et al., 2021; Meng et al., 2021; Yang and Li 2021; Peng et al., 2022c). LncRNAs are closely associated with many diseases, for example, lung cancer, colorectal cancer, prostate cancer, and Alzheimer’s disease (Klattenhoff et al., 2013; Tan et al., 2013; Chakravarty et al., 2014; He et al., 2014; Zhang et al., 2014). LncRNA H19 is associated with the under-regulation of renal carcinoma cells (Wang et al., 2015). The expression of EGOT in breast cancer is much lower than that in adjacent noncancerous tissues (Broadbent et al., 2008). NEAT1 is overexpressed in prostate cancer cells (Pasmant et al., 2011). The identification of lncRNA-disease associations (LDAs) helps us to further understand the biological processes and molecular mechanisms of various complex diseases. However, the number of known, experimentally validated LDAs is very small, so it is important to identify potential LDAs. Determining LDAs through in vivo experiments is costly and time-consuming; therefore, it is necessary to design efficient computational approaches for identifying potential LDAs (Meng et al., 2021; Peng et al., 2022d). Computational LDA prediction methods are categorized as biological network-based methods and machine learning-based methods.

Biological network-based methods use network algorithms for association prediction (Liu et al., 2023a). This type of method first constructs heterogeneous networks of lncRNAs and diseases and then identifies LDAs via matrix decomposition, random walk, and so on. To predict potential LDAs, LRWRHLDA combined Laplace normalized random walk with restart (Wang et al., 2022), LDGRNMF used graph regularized nonnegative matrix factorization (Wang et al., 2021), DSCMF developed a dual sparse collaborative matrix factorization approach (Liu et al., 2021a), RWSF-BLP added random walk-based multi-similarity fusion to bidirectional label propagation (Xie et al., 2021), HBRWRLDA utilized bi-random walk on hypergraphs (Xie et al., 2022), and MHRWRLDA exploited a random walk model with restart through multiplex and heterogeneous networks (Yao et al., 2021).

With the rapid advance of RNA sequencing technologies, artificial intelligence has found wide application in biomedical data analysis (Peng et al., 2023a; Peng et al., 2023b; Xu et al., 2023). Notably, artificial intelligence technologies, especially machine learning methods, have been widely applied to predict miRNA-disease associations (Liu et al., 2022) and circRNA-disease associations (Liu et al., 2023b). To find new LDAs, HGATLDA developed a novel heterogeneous graph attention network model (Zhao et al., 2022), DeepMNE extracted multi-omics data and designed a deep multi-network embedding model (Ma, 2022), iLncDA-LTR adopted a rank-based approach (Wu et al., 2022), MAGCNSE utilized a graph convolutional network (Liang et al., 2022), LDAformer extracted topological features and used a transformer encoder for LDA classification (Zhou et al., 2022), BiGAN explored a bidirectional generative adversarial network (Yang et al., 2021), and SVDNVLDA extracted linear and non-linear features and used XGBoost for LDA prediction (Li et al., 2021).

Computational methods have found many potential LDAs. However, network-based methods tend to favor well-investigated lncRNAs or diseases and cannot predict LDAs for new lncRNAs or new diseases, while machine learning-based methods fail to effectively integrate different kernels from multiple data sources. Thus, in this study, we developed a machine learning-based method named LDAenDL to detect potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM.

2 Materials and methods

As shown in Figure 1, LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network (GCN) (Kipf and Welling, 2016), graph attention network (GAT) (Veličković et al., 2017), and convolutional neural network (CNN) (Gu et al., 2018) to learn the biological features of lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network (DNN) and LightGBM to find new LDAs. Finally, LDAenDL was applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma.

FIGURE 1. The pipeline of LDAenDL.

2.1 Data preparation

We used two human LDA datasets that were provided by Chen et al. (2012) and Cui et al. (2018). Dataset 1 contains 605 LDAs between 157 diseases and 82 lncRNAs. Dataset 2 contains 1,529 LDAs between 190 diseases and 89 lncRNAs. An LDA network can be denoted as $Y \in R^{n \times m}$, where $y_{ij} = 1$ if lncRNA $l_{i}$ interacts with disease $d_{j}$, and $y_{ij} = 0$ otherwise.

2.2 Similarity computation

Inspired by the LDA-DLPU method (Peng et al., 2022a), we computed the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases. Based on the computed lncRNA similarity and disease similarity matrices, we learned the features of lncRNAs and diseases by combining a GCN, GAT, and CNN.
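The text does not spell out the Gaussian kernel formula, so the following is only a minimal sketch of the commonly used Gaussian interaction profile kernel, in which the similarity of two lncRNAs (or two diseases) is $\exp(-γ \|Y_{i} - Y_{j}\|^{2})$ and $γ$ is normalised by the mean squared norm of the profiles. The function name, the `bandwidth` parameter (set to 1 here), and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_similarity(profiles, bandwidth=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Assumes at least one row is non-zero, otherwise gamma is undefined.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth normalised by the mean squared norm of the profiles.
    gamma = bandwidth / sq_norms.mean()
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-gamma * sq_dists)

# Toy association matrix (rows: lncRNAs, columns: diseases); values are made up.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
lnc_sim = gaussian_kernel_similarity(Y)    # rows of Y are lncRNA profiles
dis_sim = gaussian_kernel_similarity(Y.T)  # columns of Y are disease profiles
```

Each similarity matrix is symmetric with ones on the diagonal, so it can serve directly as the weighted adjacency matrix of a similarity network.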

2.3 Feature learning

Dai et al. (2022) designed a hybrid graph representation learning model (GraphCDA) to represent the features of circRNAs and diseases and obtained better circRNA-disease association prediction performance. Inspired by GraphCDA, we built a GraphCDA-based feature learning model for LDAs.

2.3.1 Graph convolutional network

A GCN was applied to obtain the feature representations of lncRNAs and diseases based on their similarity networks. A similarity network $G$ with $N$ nodes is described by an adjacency matrix $S \in R^{N \times N}$ and a node feature matrix $H$ in which each node is an $F$-dimensional vector. The GCN outputs the node representation matrix $H^{new}$ as in Eqs 1, 2:

$H^{new} = \mathrm{GCN}(S, H) \quad (1)$

$\mathrm{GCN}(S, H) = σ(A^{-\frac{1}{2}} S^{'} A^{-\frac{1}{2}} H Q) \quad (2)$

where $S^{'} = I + S$ is the adjacency matrix with self-loops, $A$ is the diagonal degree matrix with $A_{ii} = \sum_{j} S_{ij}^{'}$, $Q \in R^{F \times F}$ is a trainable weight matrix, and $σ(\cdot)$ denotes the ReLU activation function.
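The propagation rule in Eq. 2 can be sketched in a few lines of NumPy. This is an illustrative forward pass only: the weight matrix is random rather than trained, and the function name and toy similarity values are assumptions.

```python
import numpy as np

def gcn_layer(S, H, Q):
    """One GCN propagation step: ReLU(A^{-1/2} (I + S) A^{-1/2} H Q)."""
    S_prime = S + np.eye(S.shape[0])   # add self-loops
    degree = S_prime.sum(axis=1)       # diagonal entries of the degree matrix A
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalisation, applied without forming A explicitly.
    S_norm = d_inv_sqrt[:, None] * S_prime * d_inv_sqrt[None, :]
    return np.maximum(S_norm @ H @ Q, 0.0)  # ReLU

rng = np.random.default_rng(0)
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])        # toy similarity (adjacency) matrix
H = rng.normal(size=(3, 4))            # N = 3 nodes, F = 4 features
Q = rng.normal(size=(4, 4))            # weight matrix, random here instead of trained
H_new = gcn_layer(S, H, Q)
```

Because the self-loops make every degree strictly positive for non-negative similarities, the normalisation never divides by zero.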

2.3.2 Graph attention network

A GAT (Veličković et al., 2017) uses multi-head attention to set weights for all adjacent nodes based on their importance. LDAenDL introduces a GAT layer between two GCN layers to help the GCN to extract high-level features of lncRNAs and diseases.

For the graph $G$, a GAT layer outputs the node representations $H^{new}$ as in Eq. 3:

$H^{new} = \mathrm{GAT}(S, H) \quad (3)$

Let $K$ be the number of attention mechanisms in the multi-head attention, $W_{k}$ the weight matrix of the $k$-th mechanism, and $\vec{H}_{i}$ the input feature vector of the $i$-th lncRNA. Its feature representation $\vec{H}_{i}^{new}$ in $H^{new}$ can be denoted as Eq. 4:

$\vec{H}_{i}^{new} = σ\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \neq i}^{n} ϕ_{ij}^{k} W_{k} \vec{H}_{j}\right) \quad (4)$

where $ϕ_{ij}^{k}$ denotes the $k$-th attention coefficient between two lncRNA nodes $i$ and $j$:

$ϕ_{ij}^{k} = \frac{\exp(f(a_{k}^{T}[W_{k}\vec{H}_{i} ∥ W_{k}\vec{H}_{j} ∥ B_{k}S_{ij}]))}{\sum_{t \neq i} \exp(f(a_{k}^{T}[W_{k}\vec{H}_{i} ∥ W_{k}\vec{H}_{t} ∥ B_{k}S_{it}]))} \quad (5)$

where $∥$ denotes the concatenation operation, $f$ denotes the LeakyReLU activation function, $a_{k} \in R^{2F+1}$ denotes the weight vector of the $k$-th attention mechanism, and $B_{k}$ denotes the weight of the edge $S_{ij}$.
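As a rough illustration of Eq. 5, the sketch below computes the attention coefficients of a single head. Since $a_{k}$ has $2F + 1$ entries, $B_{k}$ is treated as a scalar edge weight here. The LeakyReLU slope of 0.2, the random parameters, and the function names are assumptions, not values taken from the text.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # The slope 0.2 is an assumed value; the text does not state it.
    return np.where(x > 0, x, slope * x)

def attention_coefficients(S, H, W, a, B):
    """Single-head coefficients phi[i, j] of Eq. 5, normalised over j != i."""
    N = H.shape[0]
    Z = H @ W.T                            # row i holds W H_i
    scores = np.full((N, N), -np.inf)      # -inf removes j == i from the softmax
    for i in range(N):
        for j in range(N):
            if i != j:
                concat = np.concatenate([Z[i], Z[j], [B * S[i, j]]])
                scores[i, j] = leaky_relu(a @ concat)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])            # toy similarity matrix
H = rng.normal(size=(3, 4))                # N = 3 nodes, F = 4 features
W = rng.normal(size=(4, 4))                # head weight matrix (random, untrained)
a = rng.normal(size=2 * 4 + 1)             # attention vector of length 2F + 1
phi = attention_coefficients(S, H, W, a, B=1.0)
```

Each row of `phi` sums to one and has a zero on the diagonal, reflecting the normalisation over all nodes other than $i$ in the denominator of Eq. 5.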

2.3.3 Feature representation of lncRNAs and diseases

For a lncRNA similarity network $G_{c}$ with adjacency matrix $C$ and node feature matrix $H_{c}^{(0)} \in R^{N_{c} \times F_{c}}$, we alternately use GCN and GAT layers to obtain the graph feature representation of lncRNAs at different levels in Eq. 6:

$\left\{\begin{array}{l} H_{c}^{(1)} = \mathrm{GCN}(C, H_{c}^{(0)}) \\ H_{c}^{(2)} = \mathrm{GAT}(C, H_{c}^{(1)}) \\ H_{c}^{(3)} = \mathrm{GCN}(C, H_{c}^{(2)}) \end{array}\right. \quad (6)$

Then, a 1D CNN is used to produce the lncRNA feature representation matrix $X_{c}$ by combining the output features $H_{c}^{(1)}$ and $H_{c}^{(3)}$ of the two GCN layers.

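Similarly, for the disease similarity network with adjacency matrix $D$ and node feature matrix $H_{d}^{(0)}$, the graph feature representations of diseases at different levels are denoted by Eq. 7: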

$\left\{\begin{array}{l} H_{d}^{(1)} = \mathrm{GCN}(D, H_{d}^{(0)}) \\ H_{d}^{(2)} = \mathrm{GAT}(D, H_{d}^{(1)}) \\ H_{d}^{(3)} = \mathrm{GCN}(D, H_{d}^{(2)}) \end{array}\right. \quad (7)$

A 1D CNN is used to produce the disease feature representation matrix $X_{d}$ by combining the output features $H_{d}^{(1)}$ and $H_{d}^{(3)}$ of the two GCN layers.

2.3.4 Preference matrix construction

The preference matrix $U$ that describes all lncRNA-disease pairs can be represented as Eq. 8 based on $X_{c}$ and $X_{d}$ :

$U = X_{c}^{T} X_{d} \quad (8)$

We used binary cross-entropy as the loss function to evaluate the difference between the preference matrix $U$ and the known association matrix $Y$. By minimizing this loss function on the two LDA datasets, the feature representation matrices $X_{c}$ and $X_{d}$ of lncRNAs and diseases are learned.
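The following sketch shows one way the preference matrix of Eq. 8 and a binary cross-entropy comparison against the known associations could be computed. Two points are assumptions rather than statements from the text: the feature matrices are stored with one column per lncRNA or disease, so that $X_{c}^{T} X_{d}$ has one row per lncRNA and one column per disease, and a sigmoid is applied to the scores before the cross-entropy is evaluated.

```python
import numpy as np

def preference_matrix(X_c, X_d):
    """Eq. 8, assuming X_c is (F, n_lncRNAs) and X_d is (F, n_diseases)."""
    return X_c.T @ X_d

def binary_cross_entropy(U, known, eps=1e-12):
    """Mean binary cross-entropy between sigmoid(U) and the 0/1 matrix `known`."""
    P = 1.0 / (1.0 + np.exp(-U))       # assumed: squash scores into (0, 1)
    P = np.clip(P, eps, 1.0 - eps)     # avoid log(0)
    known = np.asarray(known, dtype=float)
    return float(-np.mean(known * np.log(P) + (1.0 - known) * np.log(1.0 - P)))

rng = np.random.default_rng(0)
X_c = rng.normal(size=(4, 3))          # F = 4 features, 3 lncRNAs (toy values)
X_d = rng.normal(size=(4, 2))          # F = 4 features, 2 diseases (toy values)
known = np.array([[1, 0],
                  [0, 0],
                  [1, 1]])             # made-up known associations
U = preference_matrix(X_c, X_d)
loss = binary_cross_entropy(U, known)
```

In training, this loss would be minimised with respect to the parameters of the GCN, GAT, and CNN layers that produce the two feature matrices.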

2.4 LDA prediction

2.4.1 DNN

We built a DNN to predict new LDAs based on known LDAs and the learned LDA features. The DNN contains an input layer, an output layer, and multiple hidden layers. The number of neurons in the input layer equals the number of LDA features.

Given an LDA sample $x$ , the input layer with $k$ inputs is represented by Eq. 9:

$x = [x_{1}, x_{2}, \dots, x_{k}] \quad (9)$

where $x_{i}$ denotes the $i$ -th feature in a sample $x$ .

The hidden layer is represented by Eq. 10:

$h_{j} = \sum_{i=1}^{k} w_{i} x_{i} + b_{j} \quad (10)$

where $w_{i}$ and $b_{j}$ denote the weight of $x_{i}$ and the bias in the $j$ -th hidden layer, respectively.

The output in the $j$ -th hidden layer is denoted by Eq. 11:

$h = f(h_{j}) \quad (11)$

where $f$ denotes a ReLU activation function. Finally, the output layer with the sigmoid function outputs the LDA prediction results in Eq. 12:

$σ(h) = \frac{1}{1 + e^{-h}} \quad (12)$
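Eqs 9 to 12 describe a standard feed-forward pass, which the sketch below reproduces with NumPy. It uses one weight matrix and bias vector per hidden layer, the usual vectorised form of Eq. 10. The layer sizes and weights are toy values chosen for illustration, not the trained network.

```python
import numpy as np

def dnn_forward(x, hidden_layers, output_layer):
    """Forward pass: ReLU hidden layers followed by a sigmoid output neuron."""
    h = np.asarray(x, dtype=float)
    for W, b in hidden_layers:
        h = np.maximum(W @ h + b, 0.0)        # Eqs 10 and 11: affine map, then ReLU
    w_out, b_out = output_layer
    z = w_out @ h + b_out
    return float(1.0 / (1.0 + np.exp(-z)))    # Eq. 12: sigmoid

# One hidden layer with two neurons; all weights are hypothetical.
hidden_layers = [(np.array([[1.0, 0.0],
                            [0.0, -1.0]]), np.array([0.0, 0.0]))]
output_layer = (np.array([1.0, 1.0]), 0.0)
p = dnn_forward([1.0, 2.0], hidden_layers, output_layer)  # sigmoid(1.0), about 0.731
```

The returned value lies in (0, 1) and can be read as the predicted association probability of the lncRNA-disease pair described by `x`.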

2.4.2 LightGBM

In this section, we built a LightGBM (Ke et al., 2017) model to identify new LDAs. For a training set $X = \{(x_{i}, y_{i})\}_{i=1}^{n}$ with $n$ lncRNA-disease pairs, LightGBM builds an approximation $\hat{f}$ of a certain function $f(x)$ by minimizing the expected value of the loss function $L(y, f(x))$, as in Eq. 13:

$\hat{f} = \arg\min_{f} E_{x, y}[L(y, f(x))] \quad (13)$

LightGBM integrates $T$ regression trees $\sum_{t = 1}^{T} f_{t} (X)$ to approximate the final model by Eq. 14:

$f_{T}(X) = \sum_{t=1}^{T} f_{t}(X) \quad (14)$

The regression trees are expressed as $w_{q (x)}, q \in \{1,2, \dots, J\}$ , where $J$ , $q$ , and $w$ denote the number of leaves, the decision rules of the tree, and the sample weight of leaf nodes, respectively.

At step $t$ , LightGBM is trained in an additive form:

$Γ_{t} = \sum_{i=1}^{n} L(y_{i}, F_{t-1}(x_{i}) + f_{t}(x_{i})) \quad (15)$

The objective function (15) is rapidly approximated with Newton’s method (Sun et al., 2020).

To solve the objective function of LightGBM, we removed the constant term for simplicity, and model (15) can be represented as Eq. 16:

$Γ_{t} ≅ \sum_{i=1}^{n} \left(g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} f_{t}^{2}(x_{i})\right) \quad (16)$

where $g_{i}$ and $h_{i}$ are the first-order and second-order gradients related to the loss function. Given the sample set $I_{j}$ related to leaf $j$ , Eq. 16 is transformed to Eq. 17:

$Γ_{t} = \sum_{j=1}^{J} \left(\left(\sum_{i \in I_{j}} g_{i}\right) w_{j} + \frac{1}{2} \left(\sum_{i \in I_{j}} h_{i} + λ\right) w_{j}^{2}\right) \quad (17)$

In Eq. 17, $λ$ is the regularization coefficient on the leaf weights. Given a certain tree structure $q(x)$, for each leaf $j$, the optimal leaf weight $w_{j}^{*}$ and the extreme value $Γ_{t}^{*}$ of $Γ_{t}$ can be computed by Eq. 18:

$\begin{array}{c} w_{j}^{*} = - \frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + λ} \\ Γ_{t}^{*} = - \frac{1}{2} \sum_{j=1}^{J} \frac{(\sum_{i \in I_{j}} g_{i})^{2}}{\sum_{i \in I_{j}} h_{i} + λ} \end{array} \quad (18)$

where $Γ_{t}^{*}$ is a scoring function used to evaluate the quality of a tree structure $q$. Finally, the gain $G$ of splitting a node with sample set $I$ into a left and a right child is given by Eq. 19:

$G = \frac{1}{2} \left(\frac{(\sum_{i \in I_{L}} g_{i})^{2}}{\sum_{i \in I_{L}} h_{i} + λ} + \frac{(\sum_{i \in I_{R}} g_{i})^{2}}{\sum_{i \in I_{R}} h_{i} + λ} - \frac{(\sum_{i \in I} g_{i})^{2}}{\sum_{i \in I} h_{i} + λ}\right) \quad (19)$

where $I_{L}$ and $I_{R}$ denote the sample sets in the left and right child nodes after the split, respectively, and $I = I_{L} \cup I_{R}$.
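A small worked example may make Eqs 18 and 19 more concrete. The sketch below evaluates the optimal leaf weight and the gain of one candidate split from lists of first-order and second-order gradients. The gradient values and $λ = 1$ are hypothetical, and the helper names are introduced only for this illustration.

```python
def leaf_weight(g, h, lam):
    """Optimal leaf weight of Eq. 18 for the samples falling into one leaf."""
    return -sum(g) / (sum(h) + lam)

def leaf_score(g, h, lam):
    """The term (sum of g)^2 / (sum of h + lambda) shared by Eqs 18 and 19."""
    return sum(g) ** 2 / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam):
    """Gain of Eq. 19: children scores minus the score of the unsplit node."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent)

# Hypothetical gradients for four samples, two on each side of a candidate split.
g_left, h_left = [-1.0, -1.0], [1.0, 1.0]
g_right, h_right = [1.0, 1.0], [1.0, 1.0]
lam = 1.0

gain = split_gain(g_left, h_left, g_right, h_right, lam)  # 0.5 * (4/3 + 4/3 - 0) = 4/3
w_left = leaf_weight(g_left, h_left, lam)                 # -(-2) / (2 + 1) = 2/3
```

Here the gradients of the two sides have opposite signs, so the unsplit node scores zero and the split yields a positive gain, which is the situation a tree learner looks for when choosing a split.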

2.4.3 Ensemble learning

Models (12) and (15) allow us to identify potential LDAs with a DNN and LightGBM, respectively. Ensemble learning often achieves better prediction accuracy than a single model. To further improve LDA prediction accuracy, we combined the DNN and LightGBM into an ensemble model for LDA identification through soft voting, as in Eq. 20:

$\mathrm{Score} = α C_{\mathrm{DNN}} + β C_{\mathrm{LightGBM}} \quad (20)$

where $C_{\mathrm{DNN}}$ and $C_{\mathrm{LightGBM}}$ denote the LDA prediction results from the DNN and LightGBM, respectively, and $α$ and $β$ are their weights, with values of 0.4 and 0.6, respectively. In particular, a lncRNA–disease pair is taken as an LDA if its association probability is greater than 0.5; otherwise, the pair is taken as a negative LDA.
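The soft-voting rule and the 0.5 decision threshold can be written as a short sketch. The weights 0.4 and 0.6 come from the text, whereas the function names and the example probabilities are made up for illustration.

```python
def soft_vote(p_dnn, p_lightgbm, alpha=0.4, beta=0.6):
    """Weighted combination of the two predicted association probabilities."""
    return alpha * p_dnn + beta * p_lightgbm

def is_association(score, threshold=0.5):
    """A pair is called an LDA only if its score exceeds the threshold."""
    return score > threshold

# The two models disagree; LightGBM carries the larger weight.
score_a = soft_vote(p_dnn=0.3, p_lightgbm=0.7)  # 0.12 + 0.42 = 0.54
score_b = soft_vote(p_dnn=0.7, p_lightgbm=0.3)  # 0.28 + 0.18 = 0.46
labels = [is_association(score_a), is_association(score_b)]
```

With these weights, a disagreement between the two models is resolved in favour of LightGBM whenever both are equally far from 0.5, as the two example scores show.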

3 Results

3.1 Evaluation metrics

In this article, we compared our proposed LDAenDL method with four LDA prediction methods: SDLDA, LDNFSGB, IPCAF, and LDASR. Precision, recall, accuracy, F1-score, AUC, and AUPR were used to compare the performance of LDAenDL with that of the four methods. The six metrics are defined in Peng et al. (2022b) and Shen et al. (2022).

3.2 Comparison of LDAenDL with the other four methods

To implement the performance evaluation, inspired by the three cross-validations proposed by Zhou et al. (2021), we conducted cross-validations on lncRNAs (CV1), diseases (CV2), and lncRNA-disease pairs (CV3). Tables 1–3 give the precision, recall, accuracy, F1-score, AUC, and AUPR under CV1, CV2, and CV3 on two LDA datasets. In Tables 1–6, the bold font in each row denotes the best performance.

TABLE 1. Comparison of LDAenDL with the other four methods under CV1.

TABLE 2. Comparison of LDAenDL with the other four methods under CV2.

TABLE 3. Comparison of LDAenDL with the other four methods under CV3.

TABLE 4. Comparison of LDAenDL with individual models under CV1.

TABLE 5. Comparison of LDAenDL with individual models under CV2.

TABLE 6. Comparison of LDAenDL with individual models under CV3.

Under CV1, LDAenDL randomly took 80% of lncRNAs as training samples, and the rest were taken as test samples to investigate the LDA prediction ability for new lncRNAs. The results from Table 1 show that our proposed LDAenDL approach obtained the best precision, recall, accuracy, F1-score, AUC, and AUPR on the two datasets under CV1, except that it computed a slightly lower precision on Dataset 2 (0.9391 vs. 0.9399). It computed the highest AUPRs of 0.8903 and 0.9582, exceeding the AUPR values computed by SDLDA (0.8461 and 0.9533).

Figure 2 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV1. The results demonstrated that LDAenDL can discover possible diseases associated with a new lncRNA.

FIGURE 2. The AUC and AUPR values of five LDA prediction methods under CV1.

Under CV2, LDAenDL randomly took 80% of diseases as training samples, and the rest were taken as test samples to investigate the LDA prediction ability for new diseases. The results from Table 2 show that our proposed LDAenDL approach obtained better precision, AUC, and AUPR on two datasets under CV2. However, SDLDA computed higher recall, accuracy, and F1-score than LDAenDL, which may be caused by smaller disease samples.

Figure 3 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV2. The results show that LDAenDL can be applied to screen possible lncRNAs associated with a new disease.

FIGURE 3. The AUC and AUPR values of five LDA prediction methods under CV2.

Under CV3, LDAenDL randomly took 80% of lncRNA-disease pairs as training samples, and the rest were taken as test samples to investigate the LDA prediction ability. The results from Table 3 show that our proposed LDAenDL approach obtained the best precision, recall, accuracy, F1-score, AUC, and AUPR on the two datasets under CV3. It computed the highest AUCs of 0.9110 and 0.9708, exceeding those computed by SDLDA (0.8774 and 0.9560). Furthermore, our LDAenDL approach computed the highest AUPRs of 0.9166 and 0.9743, exceeding those computed by SDLDA (0.8952 and 0.9639).

Figure 4 shows the AUC and AUPR values computed by LDAenDL and the other four methods on two datasets under CV3. The results demonstrated that LDAenDL could find potential LDAs based on known LDAs.

FIGURE 4. The AUC and AUPR values of five LDA prediction methods under CV3.
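The three splitting schemes differ only in what is held out: whole lncRNAs (CV1), whole diseases (CV2), or individual lncRNA-disease pairs (CV3). The sketch below illustrates a single 80%/20% split for each scheme on index pairs. The number of repetitions and the way negative samples are chosen are not described above, so they are left out, and the function names are assumptions.

```python
import random

def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle range(n_items) and return the indices chosen for training."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * n_items))
    return set(idx[:cut])

def cv_split(n_lncrnas, n_diseases, scheme, seed=0):
    """Return (train_pairs, test_pairs) as lists of (lncRNA, disease) index pairs."""
    all_pairs = [(i, j) for i in range(n_lncrnas) for j in range(n_diseases)]
    if scheme == "CV1":    # hold out whole lncRNAs (rows of Y)
        keep = split_indices(n_lncrnas, seed=seed)
        train = [p for p in all_pairs if p[0] in keep]
    elif scheme == "CV2":  # hold out whole diseases (columns of Y)
        keep = split_indices(n_diseases, seed=seed)
        train = [p for p in all_pairs if p[1] in keep]
    elif scheme == "CV3":  # hold out individual lncRNA-disease pairs
        keep = split_indices(len(all_pairs), seed=seed)
        train = [p for k, p in enumerate(all_pairs) if k in keep]
    else:
        raise ValueError("scheme must be 'CV1', 'CV2', or 'CV3'")
    train_set = set(train)
    test = [p for p in all_pairs if p not in train_set]
    return train, test

train1, test1 = cv_split(10, 5, "CV1")
train2, test2 = cv_split(10, 5, "CV2")
train3, test3 = cv_split(10, 5, "CV3")
```

Under CV1 no test lncRNA appears in the training pairs, and under CV2 no test disease does, which is what makes these two schemes a test of prediction for new lncRNAs and new diseases.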

3.3 Comparison of LDAenDL with individual models

To measure the effect of the ensemble algorithm on LDA prediction performance, we compared LDAenDL with its two individual models, the DNN and LightGBM. Tables 4–6 show the precision, recall, accuracy, F1-score, AUC, and AUPR of the DNN, LightGBM, and LDAenDL under CV1, CV2, and CV3, respectively.

Under CV1, as shown in Table 4, LDAenDL outperformed the DNN and LightGBM on the two LDA datasets under the majority of conditions. LDAenDL computed the best accuracy and F1-score on the two datasets. Although LDAenDL computed a slightly lower AUC than the DNN on dataset 1 and a slightly lower AUC than LightGBM on dataset 2, the differences were very small. For example, the DNN computed an AUC of 0.8712 while LDAenDL computed 0.8701 on dataset 1, and the DNN calculated an AUC of 0.9497 while LDAenDL calculated 0.9490 on dataset 2. LDAenDL obtained the best AUPR on dataset 1, whereas on dataset 2 LightGBM obtained an AUPR of 0.9586 while LDAenDL obtained an AUPR of 0.9582.

Under CV2, as shown in Table 5, LDAenDL outperformed the DNN under all conditions on the two LDA datasets. The recall, accuracy, and F1-score computed by LightGBM were slightly better than those of LDAenDL on the two datasets. However, LDAenDL calculated the best AUC and AUPR on dataset 1.

Under CV3, as shown in Table 6, LDAenDL computed the highest precision, recall, accuracy, F1-score, AUC, and AUPR on the two LDA datasets except that it computed a slightly lower recall on dataset 1. The results demonstrate that LDAenDL is appropriate to predict possible LDAs from unknown lncRNA-disease pairs.

3.4 Case study

3.4.1 Identifying possible lncRNA biomarkers for lung cancer

Lung cancer is one of the most prevalent causes of mortality globally. It mainly comprises small cell lung cancer and non-small cell lung cancer. Targeted drug therapy is one of its therapeutic options (Lahiri et al., 2023). We used the proposed LDAenDL method to predict possible lncRNA biomarkers for lung cancer. Table 7 shows the predicted top 20 lncRNA biomarkers for lung cancer. The 20 lncRNA biomarkers associated with lung cancer have no known association information with lung cancer in the two datasets.

TABLE 7. The predicted top 20 lncRNA biomarkers for lung cancer in each of the two datasets.

In dataset 1, LDAenDL predicted that CCDC26 could be associated with lung cancer. CCDC26 can enhance thyroid cancer malignant progression (Ma et al., 2021). It promotes imatinib resistance in human gastrointestinal stromal tumors (Yan et al., 2019). Its inhibition could increase the sensitivity of doxorubicin in MDR-CML cells (Liu et al., 2021b). In this study, we predicted that CCDC26 could be associated with lung cancer in dataset 1.

In dataset 2, LDAenDL predicted that IFNG-AS1 could be associated with lung cancer. IFNG-AS1 has been reported in long-lasting memory T cells (Castellucci et al., 2021). It can boost interferon gamma generation in human natural killer cells (Stein et al., 2019). We identified that IFNG-AS1 could be associated with lung cancer in Dataset 2.

Figure 5 shows the top 20 predicted lncRNAs associated with lung cancer in each of the two datasets. Yellow solid lines and blue solid lines denote lncRNA-lung cancer associations confirmed by the literature among the predicted top 20 associations on datasets 1 and 2, respectively. Grey solid lines denote the predicted and co-occurring lncRNA-lung cancer associations that can be confirmed by the literature in the two datasets, and grey dashed lines denote the predicted and unconfirmed lncRNA-lung cancer associations in the two datasets. The repeated lncRNAs in the two datasets have been removed.

FIGURE 5. The top 20 predicted lncRNA biomarkers for lung cancer in each of the two datasets (The repeated lncRNAs in the two datasets have been removed). This figure was drawn using Cytoscape (Shannon et al., 2003).

3.4.2 Identifying possible lncRNAs associated with PDL1 for lung cancer

Recent advances in lung cancer treatment have demonstrated significant responses in patients when they were treated with programmed death-1/programmed death-ligand 1 (PD-1/PD-L1) checkpoint blockade immunotherapies (Lahiri et al., 2023). To find possible lncRNAs associated with PDL1 for lung cancer, inspired by LPI-DLDN proposed by Peng et al. (2022a), we first downloaded the sequence of PDL1 from the UniProt database. Next, we extracted the biological features of PDL1 and depicted PDL1 as a 10,029-dimensional vector using BioTriangle. Finally, we used cosine similarity to compute the similarities between PDL1 and the other proteins in a lncRNA-protein interaction dataset (Li et al., 2015) and found the top 3 proteins with the highest interaction probabilities with PDL1. The results show that SNHG3 has a higher interaction probability with PDL1 and has been reported to be associated with lung cancer.
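The ranking step described above relies on cosine similarity between feature vectors. The sketch below shows that computation and a top-3 selection on short toy vectors. The protein names and values are placeholders, not the 10,029-dimensional BioTriangle features, and the function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_similar(query, candidates, k=3):
    """Return the k (name, similarity) pairs most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Placeholder feature vectors; real descriptors would be far longer.
query_protein = [1.0, 0.0, 1.0]
candidates = {
    "protein_a": [1.0, 0.0, 1.0],
    "protein_b": [1.0, 1.0, 0.0],
    "protein_c": [0.0, 1.0, 0.0],
    "protein_d": [2.0, 0.0, 1.0],
}
top3 = top_k_similar(query_protein, candidates, k=3)
```

The lncRNAs known to interact with the selected proteins can then be examined as candidates for an association with the query protein.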

3.4.3 Identifying possible lncRNA biomarkers for neuroblastoma

Neuroblastoma is the most frequent pediatric solid tumor and accounts for approximately 15% of childhood cancer-related mortality (Zafar et al., 2021). We used the proposed LDAenDL method to identify possible lncRNA biomarkers for neuroblastoma. Table 8 shows the top 20 predicted lncRNA biomarkers for neuroblastoma in each of the two datasets. The repeated lncRNAs in the two datasets have been removed.

TABLE 8. The top 20 predicted lncRNA biomarkers for neuroblastoma in each of the two datasets.

In dataset 1, we predicted that HOTAIR could be associated with neuroblastoma with the highest probability. HOTAIR is a novel oncogenic biomarker in human cancer (Rajagopal et al., 2020). Its knockdown can promote radiosensitivity in colorectal cancer (Liu et al., 2020). It can also enhance the carcinogenesis of gastric cancer (Zhang et al., 2020). We identified that HOTAIR may be one biomarker of neuroblastoma in dataset 1.

In dataset 2, we predicted that BDNF-AS could be associated with neuroblastoma with the highest probability. PABPC1-induced stabilization of BDNF-AS helps the inhibition of malignant progression in glioblastoma cells (Su et al., 2020). It can regulate the miR-9-5p/BACE1 pathway that affects neurotoxicity in Alzheimer’s disease (Ding et al., 2022). We identified that BDNF-AS is a possible biomarker of neuroblastoma in dataset 2.

Figure 6 shows the top 20 predicted lncRNAs associated with neuroblastoma in each of the two datasets. Yellow solid lines and blue solid lines denote lncRNA-neuroblastoma associations confirmed by the literature among the predicted top 20 associations on datasets 1 and 2, respectively. Grey solid lines denote the predicted and co-occurring lncRNA-neuroblastoma associations that can be confirmed by the literature in the two datasets, and grey dashed lines denote the predicted and unconfirmed lncRNA-neuroblastoma associations in the two datasets. The repeated lncRNAs in the two datasets have been removed.

FIGURE 6. The top 20 predicted lncRNA biomarkers for neuroblastoma in each of the two datasets. (The repeated lncRNAs in the two datasets have been removed). This figure was drawn using Cytoscape (Shannon et al., 2003).

4 Conclusion

Lung cancer and neuroblastoma are two human diseases that severely affect the human body. Detecting new biomarkers for them contributes to their diagnosis and therapy. Experimental biomarker identification methods are costly and laborious. Thus, we developed a machine learning-based method named LDAenDL to predict possible lncRNA biomarkers for the two diseases based on an ensemble of a deep neural network and LightGBM. LDAenDL first computed lncRNA similarity and disease similarity and then combined a GCN, GAT, and CNN to learn the biological features of lncRNAs and diseases. Finally, these features were fed to a DNN and LightGBM to find new LDAs.

LDAenDL was compared with the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). The results showed that LDAenDL computed the best AUCs and AUPRs under three cross-validations on two LDA datasets, demonstrating the optimal LDA prediction performance of LDAenDL. We further identified possible lncRNA biomarkers for lung cancer and neuroblastoma. The results demonstrated that CCDC26 and IFNG-AS1 may be new biomarkers for lung cancer, SNHG3 may be associated with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers for neuroblastoma.

In the future, we will combine data from multiple sources, for example, miRNA, circRNA, and drugs, to improve LDA identification performance. We will also design a new deep-learning model to efficiently extract the biological features of lncRNAs and diseases for LDA prediction. We hope that the proposed LDAenDL can help the development of targeted therapies for these two diseases.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

Author contributions

Conceptualization: ZS, HL, ZL, and LD; Investigation: ZS and HL; Methodology: ZS, HL, ZL, and LD; Project administration: YW and LD; Software: ZS and ZL; Writing-original draft: ZS and HL; Writing-review and editing: ZS, HL, ZL, and LD. All authors contributed to the article and approved the submitted version.

Conflict of interest

Author YW was employed by Geneis (Beijing) Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Bertone, P., Stolc, V., Royce, T. E., Rozowsky, J. S., Urban, A. E., Zhu, X., et al. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science 306 (5705), 2242–2246. doi:10.1126/science.1103388

Broadbent, H. M., Peden, J. F., Lorkowski, S., Goel, A., Ongen, H., Green, F., et al. (2008). Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p. Hum. Mol. Genet. 17 (6), 806–814. doi:10.1093/hmg/ddm352

Castellucci, L. C., Almeida, L., Cherlin, S., Fakiola, M., Francis, R. W., Carvalho, E. M., et al. (2021). A genome-wide association study identifies SERPINB10, CRLF3, STX7, LAMP3, IFNG-AS1, and KRT80 as risk loci contributing to cutaneous leishmaniasis in Brazil. Clin. Infect. Dis. 72 (10), e515–e525. doi:10.1093/cid/ciaa1230

Chakravarty, D., Sboner, A., Nair, S. S., Giannopoulou, E., Li, R., Hennig, S., et al. (2014). The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat. Commun. 5 (1), 5383. doi:10.1038/ncomms6383

Chen, G., Wang, Z., Wang, D., Qiu, C., Liu, M., Chen, X., et al. (2012). LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 41 (D1), D983–D986. doi:10.1093/nar/gks1099

Cui, T., Zhang, L., Huang, Y., Yi, Y., Tan, P., Zhao, Y., et al. (2018). MNDR v2.0: an updated resource of ncRNA–disease associations in mammals. Nucleic Acids Res. 46 (D1), D371–D374. doi:10.1093/nar/gkx1025

Dai, Q., Liu, Z., Wang, Z., Duan, X., and Guo, M. (2022). GraphCDA: a hybrid graph representation learning framework based on GCN and GAT for predicting disease associated circRNAs. Briefings Bioinforma. 23 (5), bbac379. doi:10.1093/bib/bbac379

Ding, Y., Luan, W., Wang, Z., and Cao, Y. (2022). LncRNA BDNF-AS as ceRNA regulates the miR-9-5p/BACE1 pathway affecting neurotoxicity in Alzheimer's disease. Archives Gerontology Geriatrics 99, 104614. doi:10.1016/j.archger.2021.104614

Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377. doi:10.1016/j.patcog.2017.10.013

He, X., Tan, X., Wang, X., Jin, H., Liu, L., Ma, L., et al. (2014). C-Myc-activated long noncoding RNA CCAT1 promotes colon cancer cell proliferation and invasion. Tumor Biol. 35, 12181–12188. doi:10.1007/s13277-014-2526-4

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30. doi:10.5555/3294996.3295074

Kipf, T. N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

Klattenhoff, C. A., Scheuermann, J. C., Surface, L. E., Bradley, R. K., Fields, P. A., Steinhauser, M. L., et al. (2013). Braveheart, a long noncoding RNA required for cardiovascular lineage commitment. Cell 152 (3), 570–583. doi:10.1016/j.cell.2013.01.003

Lahiri, A., Maji, A., Potdar, P. D., Singh, N., Parikh, P., Bisht, B., et al. (2023). Lung cancer immunotherapy: progress, pitfalls, and promises. Mol. Cancer 22 (1), 40. doi:10.1186/s12943-023-01740-y

Lanjanian, H., Nematzadeh, S., Hosseini, S., Torkamanian-Afshar, M., Kiani, F., Moazzam-Jazi, M., et al. (2021). High-throughput analysis of the interactions between viral proteins and host cell RNAs. Comput. Biol. Med. 135, 104611. doi:10.1016/j.compbiomed.2021.104611

Li, A., Ge, M., Zhang, Y., Peng, C., and Wang, M. (2015). Predicting long noncoding RNA and protein interactions using heterogeneous network model. BioMed Res. Int. 2015, 671950. doi:10.1155/2015/671950

Li, J., Li, J., Kong, M., Wang, D., Fu, K., and Shi, J. (2021). SVDNVLDA: predicting lncRNA-disease associations by singular value decomposition and node2vec. BMC Bioinforma. 22, 538. doi:10.1186/s12859-021-04457-1

Liang, Y., Zhang, Z. Q., Liu, N. N., Wu, Y. N., Gu, C. L., and Wang, Y. L. (2022). MAGCNSE: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinforma. 23 (1), 189. doi:10.1186/s12859-022-04715-w

Liu, Y., Chen, X., Chen, X., Liu, J., Gu, H., Fan, R., et al. (2020). Long non-coding RNA HOTAIR knockdown enhances radiosensitivity through regulating microRNA-93/ATG12 axis in colorectal cancer. Cell Death Dis. 11 (3), 175. doi:10.1038/s41419-020-2268-8

Liu, J. X., Gao, M. M., Cui, Z., Gao, Y. L., and Li, F. (2021a). DSCMF: prediction of LncRNA-disease associations based on dual sparse collaborative matrix factorization. BMC Bioinforma. 22 (3), 241. doi:10.1186/s12859-020-03868-w

Liu, Z., Wang, Y., Xu, Z., Yuan, S., Ou, Y., Luo, Z., et al. (2021b). Analysis of ceRNA networks and identification of potential drug targets for drug-resistant leukemia cell K562/ADR. PeerJ 9, e11429. doi:10.7717/peerj.11429

Liu, W., Lin, H., Huang, L., Peng, L., Tang, T., Zhao, Q., et al. (2022). Identification of miRNA–disease associations via deep forest ensemble learning based on autoencoder. Briefings Bioinforma. 23 (3), bbac104. doi:10.1093/bib/bbac104

Liu, W., Yang, Y., Lu, X., Fu, X., Sun, R., Yang, L., et al. (2023a). NSRGRN: a network structure refinement method for gene regulatory network inference. Briefings Bioinforma. 24 (3), bbad129. doi:10.1093/bib/bbad129

Liu, W., Tang, T., Lu, X., Fu, X., Yang, Y., and Peng, L. (2023b). MPCLCDA: predicting circRNA–disease associations by using automatically selected meta-path and contrastive learning. Briefings Bioinforma. 24, bbad227. doi:10.1093/bib/bbad227

Ma, X., Li, Y., Song, Y., and Xu, G. (2021). Long noncoding RNA CCDC26 promotes thyroid cancer malignant progression via miR-422a/EZH2/Sirt6 axis. OncoTargets Ther. 14, 3083–3094. doi:10.2147/OTT.S282011

Ma, Y. (2022). DeepMNE: deep multi-network embedding for lncRNA-disease association prediction. IEEE J. Biomed. Health Inf. 26 (7), 3539–3549. doi:10.1109/JBHI.2022.3152619

Meng, J., Kang, Q., Chang, Z., and Luan, Y. (2021). PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles. BMC Bioinforma. 22 (3), 242. doi:10.1186/s12859-020-03870-2

Pasmant, E., Sabbagh, A., Vidaud, M., and Bièche, I. (2011). ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25 (2), 444–448. doi:10.1096/fj.10-172452

Peng, L., Huang, L., Lu, Y., Liu, G., Chen, M., and Han, G. (2022a). “Identifying possible lncRNA-disease associations based on deep learning and positive-unlabeled learning,” in 2022 IEEE international conference on bioinformatics and biomedicine (BIBM) (IEEE), 168–173.

Peng, L., Tan, J., Tian, X., and Zhou, L. (2022b). EnANNDeep: an ensemble-based lncRNA–protein interaction prediction framework with adaptive k-nearest neighbor classifier and deep models. Interdiscip. Sci. Comput. Life Sci. 14 (1), 209–232. doi:10.1007/s12539-021-00483-y

Peng, L., Wang, C., Tian, X., Zhou, L., and Li, K. (2022c). Finding lncRNA-protein interactions based on deep learning with dual-net neural architecture. IEEE/ACM Trans. Comput. Biol. Bioinforma. 19 (6), 3456–3468. doi:10.1109/TCBB.2021.3116232

Peng, L., Wang, F., Wang, Z., Tan, J., Huang, L., Tian, X., et al. (2022d). Cell–cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: data resources and computational strategies. Briefings Bioinforma. 23 (4), bbac234. doi:10.1093/bib/bbac234

Peng, L., Tan, J., Xiong, W., Zhang, L., Wang, Z., Yuan, R., et al. (2023a). Deciphering ligand–receptor-mediated intercellular communication based on ensemble deep learning and the joint scoring strategy from single-cell transcriptomic data. Comput. Biol. Med. 16 (2023), 107137. doi:10.1016/j.compbiomed.2023.107137

Peng, L., Yuan, R., Han, C., Han, G., Tan, J., Wang, Z., et al. (2023b). CellEnBoost: a boosting-based ligand-receptor interaction identification model for cell-to-cell communication inference. IEEE Trans. NanoBioscience, 1–11. doi:10.1109/TNB.2023.3278685

Rajagopal, T., Talluri, S., Akshaya, R. L., and Dunna, N. R. (2020). HOTAIR LncRNA: a novel oncogenic propellant in human cancer. Clin. Chim. Acta 503, 1–18. doi:10.1016/j.cca.2019.12.028

Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13 (11), 2498–2504. doi:10.1101/gr.1239303

Shen, L., Liu, F., Huang, L., Liu, G., Zhou, L., and Peng, L. (2022). VDA-RWLRLS: an anti-SARS-CoV-2 drug prioritizing framework combining an unbalanced bi-random walk and Laplacian regularized least squares. Comput. Biol. Med. 140, 105119. doi:10.1016/j.compbiomed.2021.105119

Stein, N., Berhani, O., Schmiedel, D., Duev-Cohen, A., Seidel, E., Kol, I., et al. (2019). IFNG-AS1 enhances interferon gamma production in human natural killer cells. iScience 11, 466–473. doi:10.1016/j.isci.2018.12.034

Su, R., Ma, J., Zheng, J., Liu, X., Liu, Y., Ruan, X., et al. (2020). PABPC1-induced stabilization of BDNF-AS inhibits malignant progression of glioblastoma cells through STAU1-mediated decay. Cell Death Dis. 11 (2), 81. doi:10.1038/s41419-020-2267-9

Sun, X., Liu, M., and Sima, Z. (2020). A novel cryptocurrency price trend forecasting model based on LightGBM. Finance Res. Lett. 32, 101084. doi:10.1016/j.frl.2018.12.032

Tan, L., Yu, J. T., Hu, N., and Tan, L. (2013). Non-coding RNAs in Alzheimer's disease. Mol. Neurobiol. 47, 382–393. doi:10.1007/s12035-012-8359-5

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.

Wang, L., Cai, Y., Zhao, X., Jia, X., Zhang, J., Liu, J., et al. (2015). Down-regulated long non-coding RNA H19 inhibits carcinogenesis of renal cell carcinoma. Neoplasma 62 (3), 412–418. doi:10.4149/neo_2015_049

Wang, M. N., You, Z. H., Wang, L., Li, L. P., and Zheng, K. (2021). LDGRNMF: lncRNA-disease associations prediction based on graph regularized non-negative matrix factorization. Neurocomputing 424, 236–245. doi:10.1016/j.neucom.2020.02.062

Wang, L., Shang, M., Dai, Q., and He, P. A. (2022). Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinforma. 23 (1), 5–20. doi:10.1186/s12859-021-04538-1

Wu, H., Liang, Q., Zhang, W., Zou, Q., Hesham, A. E. L., and Liu, B. (2022). iLncDA-LTR: identification of lncRNA-disease associations by learning to rank. Comput. Biol. Med. 146, 105605. doi:10.1016/j.compbiomed.2022.105605

Xie, G., Huang, B., Sun, Y., Wu, C., and Han, Y. (2021). RWSF-BLP: a novel lncRNA-disease association prediction model using random walk-based multi-similarity fusion and bidirectional label propagation. Mol. Genet. Genomics 296, 473–483. doi:10.1007/s00438-021-01764-3

Xie, G., Zhu, Y., Lin, Z., Sun, Y., Gu, G., Li, J., et al. (2022). HBRWRLDA: predicting potential lncRNA–disease associations based on hypergraph bi-random walk with restart. Mol. Genet. Genomics 297 (5), 1215–1228. doi:10.1007/s00438-022-01909-y

Xu, J., Xu, J., Meng, Y., Lu, C., Cai, L., Zeng, X., et al. (2023). Graph embedding and Gaussian mixture variational autoencoder network for end-to-end analysis of single-cell RNA sequencing data. Cell Rep. Methods 3, 100382. doi:10.1016/j.crmeth.2022.100382

Yan, J., Chen, D., Chen, X., Sun, X., Dong, Q., Hu, C., et al. (2019). Downregulation of lncRNA CCDC26 contributes to imatinib resistance in human gastrointestinal stromal tumors through IGF-1R upregulation. Braz. J. Med. Biol. Res. 52, e8399. doi:10.1590/1414-431x20198399

Yang, Q., and Li, X. (2021). BiGAN: lncRNA-disease association prediction based on bidirectional generative adversarial network. BMC Bioinforma. 22, 357. doi:10.1186/s12859-021-04273-7

Yang, M., Zhao, L., Hu, X., Feng, H., and Kang, X. (2021). Identification of key mRNAs and lncRNAs associated with the effects of anti-TWEAK on osteosarcoma. Curr. Bioinforma. 16 (1), 154–161. doi:10.2174/1574893615999200626191405

Yao, Y., Ji, B., Lv, Y., Li, L., Xiang, J., Liao, B., et al. (2021). Predicting LncRNA–disease association by a random walk with restart on multiplex and heterogeneous networks. Front. Genet. 12, 712170. doi:10.3389/fgene.2021.712170

Zafar, A., Wang, W., Liu, G., Wang, X., Xian, W., McKeon, F., et al. (2021). Molecular targeting therapies for neuroblastoma: progress and challenges. Med. Res. Rev. 41 (2), 961–1021. doi:10.1002/med.21750

Zhang, E. B., Yin, D. D., Sun, M., Kong, R., Liu, X. H., You, L. H., et al. (2014). P53-regulated long non-coding RNA TUG1 affects cell proliferation in human non-small cell lung cancer, partly through epigenetically regulating HOXB7 expression. Cell Death Dis. 5 (5), e1243. doi:10.1038/cddis.2014.201

Zhang, J., Qiu, W. Q., Zhu, H., Liu, H., Sun, J. H., Chen, Y., et al. (2020). HOTAIR contributes to the carcinogenesis of gastric cancer via modulating cellular and exosomal miRNAs level. Cell Death Dis. 11 (9), 780. doi:10.1038/s41419-020-02946-4

Zhao, X., Zhao, X., and Yin, M. (2022). Heterogeneous graph attention network based on meta-paths for lncRNA–disease association prediction. Briefings Bioinforma. 23 (1), bbab407. doi:10.1093/bib/bbab407

Zhou, L., Wang, Z., Tian, X., and Peng, L. (2021). LPI-deepGBDT: a multiple-layer deep framework based on gradient boosting decision trees for lncRNA–protein interaction identification. BMC Bioinforma. 22 (1), 479. doi:10.1186/s12859-021-04399-8

Zhou, Y., Wang, X., Yao, L., and Zhu, M. (2022). LDAformer: predicting lncRNA-disease associations based on topological feature extraction and transformer encoder. Briefings Bioinforma. 23 (6), bbac370. doi:10.1093/bib/bbac370

Keywords: lncRNA, biomarker, lung cancer, neuroblastoma, deep neural network, LightGBM

Citation: Su Z, Lu H, Wu Y, Li Z and Duan L (2023) Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front. Genet. 14:1238095. doi: 10.3389/fgene.2023.1238095

Received: 10 June 2023; Accepted: 19 July 2023; Published: 16 August 2023.

Edited by:

Junlin Xu, Hunan University, China

Reviewed by:

XianFang Tang, Wuhan Textile University, China
Wenyan Wang, Anhui University of Technology, China

Copyright © 2023 Su, Lu, Wu, Li and Duan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zejun Li, lzjfox@hnit.edu.cn; Lian Duan, duanlian301@163.com

^†These authors have contributed equally to this work and share first authorship.
