Graph Embedding Methods for Multiple-Omics Data Analysis

Editorial

20 September 2021

Editorial: Graph Embedding Methods for Multiple-Omics Data Analysis

Wei Lan

Qingfeng Chen

Yi-Ping Phoebe Chen

and

Wilson Wen Bin Goh

1,841 views

0 citations

Editors

Nanyang Technological University

Bootstrapped R² scores for genes involved in the renin-angiotensin system for lung, heart (left ventricle), kidney (cortex), and pancreas. The input variables are the expressions of genes in whole blood belonging to the chemokine, TNF, and TGF-β pathways.

Original Research

16 September 2021

Graph Representation Forecasting of Patient's Medical Conditions: Toward a Digital Twin

Pietro Barbiero

, 1 more and

Pietro Lió

Objective: Modern medicine needs to shift from a wait-and-react, curative discipline to a preventative, interdisciplinary science aiming at providing personalized, systemic, and precise treatment plans to patients. To this end, we propose a “digital twin” of patients that models the human body as a whole and provides a panoramic view of individuals' conditions.

Methods: We propose a general framework that composes advanced artificial intelligence (AI) approaches and integrates mathematical modeling in order to provide a panoramic view over current and future pathophysiological conditions. Our modular architecture is based on a graph neural network (GNN) forecasting clinically relevant endpoints (such as blood pressure) and a generative adversarial network (GAN) providing a proof of concept of transcriptomic integrability.

Results: We tested our digital twin model on two simulated clinical case studies combining information at the organ, tissue, and cellular levels. Using the GNN model, we provided a panoramic overview of the patient's current and future conditions by monitoring and forecasting clinically relevant endpoints that represent the evolution of the patient's vital parameters. We showed how to use the GAN to generate multi-tissue expression data for blood and lung to find associations between cytokines conditioned on the expression of genes in the renin–angiotensin pathway. Our approach was to detect inflammatory cytokines, which are known to have effects on blood pressure and have previously been associated with SARS-CoV-2 infection (e.g., CXCR6, XCL1, and others).

Significance: The graph representation of a computational patient has the potential to solve important technological challenges in integrating multiscale computational modeling with AI. We believe that this work represents a step forward toward next-generation devices for precision and predictive medicine.

15,614 views

45 citations

An example of a bipartite graph for drug–target interactions. (A) Part of the visualization of the Enzyme benchmark data set, in which the red nodes are drugs and the green nodes are targets. (B) The bipartite graph model of a part of Figure 1A.

Original Research

24 August 2021

Cascade Deep Forest With Heterogeneous Similarity Measures for Drug–Target Interaction Prediction

Ying Zheng

and

Zheng Wu

2,297 views

2 citations

Potential lipid biomarkers of basal and luminal MIBC subtypes. (A) VIP score of altered lipid elements. (B) Heatmap of the top 25 altered lipid elements in basal and luminal MIBC subtypes. (C) The levels of the top 10 significantly differential lipid constituents in basal and luminal MIBC subtypes. (D,E) FFA and SL levels and AUC values. * indicates p < 0.05; ** indicates p < 0.01; and *** indicates p < 0.001.

Original Research

16 August 2021

Integrative Transcriptomic, Lipidomic, and Metabolomic Analysis Reveals Potential Biomarkers of Basal and Luminal Muscle Invasive Bladder Cancer Subtypes

Chao Feng

, 9 more and

Tianyu Li

2,869 views

5 citations

Original Research

27 July 2021

RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest

Yuan Zhao

, 4 more and

Hong-Dong Li

3,568 views

3 citations

The workflow of MI_DenseNetCAM. (A) Cancer classification and prediction through combined MI and deep learning analysis of pan-cancer datasets. (B) Principle diagram of the Guided Grad-CAM algorithm.

Original Research

03 June 2021

MI_DenseNetCAM: A Novel Pan-Cancer Classification and Prediction Method Based on Mutual Information and Deep Learning Model

Jianlin Wang

, 4 more and

Junwei Luo

3,426 views

10 citations

Original Research

28 June 2021

Investigating Different DNA Methylation Patterns at the Resolution of Methylation Haplotypes

Xiaoqing Peng

, 3 more and

Xiaojun Ding

3,209 views

6 citations

Original Research

02 July 2021

Transcriptomic Analysis of Gene Networks Regulated by U11 Small Nuclear RNA in Bladder Cancer

Zhenxing Wang

, 9 more and

Zhong Tang

2,459 views

5 citations

Methods

28 June 2021

A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data

Guojun Liu

and

Junying Zhang

2,665 views

0 citations

Original Research

27 May 2021

m6AGE: A Predictor for N6-Methyladenosine Sites Identification Utilizing Sequence Characteristics and Graph Embedding-Based Geometrical Information

Yan Wang

, 4 more and

Kai He

2,597 views

15 citations

Original Research

03 May 2021

Predicting Drug-Disease Association Based on Ensemble Strategy

Jianlin Wang

, 3 more and

Ge Zhang

3,462 views

17 citations

Original Research

13 April 2021

Predicting Metabolite–Disease Associations Based on LightGBM Model

Cheng Zhang

, 1 more and

Lian Liu

3,346 views

24 citations

Classification accuracy for six gene expression data sets with different values of σ.

Original Research

30 March 2021

Feature Selection Using Approximate Conditional Entropy Based on Fuzzy Information Granule for Gene Expression Data Classification

Hengyi Zhang

2,069 views

11 citations

Accuracy and timing experiments on two benchmark datasets. (A) Model performance with respect to the coarsening level on the PPI network dataset. (B) Model performance with respect to the coarsening level on the GraphSAGE-PPI dataset. (C) Model performance with respect to the fusion parameter on the PPI network dataset. (D) Model performance with respect to the fusion parameter on the GraphSAGE-PPI dataset.

Original Research

26 February 2021

An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding

Xiao-Rui Su

, 4 more and

Hai-Cheng Yi

3,028 views

10 citations

Model architecture of DeCban. The network consists of three convolutional layers and three branches (shown in green, orange, and red, respectively). An attention layer (shown in light blue) is used to integrate the outputs of the three branches. Then, the feature embeddings learned by the three layers are concatenated and fed to a fully connected layer to yield the final output.

Methods

22 January 2021

DeCban: Prediction of circRNA-RBP Interaction Sites by Using Double Embeddings and Cross-Branch Attention Networks

Liangliang Yuan

and

Yang Yang

5,530 views

16 citations

Visualization of one-hot encoded functional features and the learned embeddings. (A) One-hot encoded representation of functional data (KEGG/GO terms); (B) the learned network embeddings from a protein–protein network; (C) the learned embeddings of functional data using word2vec; (D) the combined network and functional embeddings.

Original Research

20 January 2021

Identification of Protein Subcellular Localization With Network and Functional Embeddings

Xiaoyong Pan

, 5 more and

Yu-Dong Cai

The functions of proteins are mainly determined by their subcellular localizations in cells. Many computational methods for predicting the subcellular localization of proteins have been proposed. However, these methods require further improvement, especially in how proteins are represented. In this study, we present an embedding-based method for predicting the subcellular localization of proteins. We first learn the functional embeddings of KEGG/GO terms, which are then used to represent proteins. Then, we characterize the network embeddings of proteins on a protein–protein network. The functional and network embeddings are combined as novel representations of proteins for the construction of the final classification model. On our collected benchmark dataset with 4,861 proteins from 16 locations, the best model shows a Matthews correlation coefficient of 0.872 and is thus superior to multiple conventional methods.

11,178 views

56 citations

Schematic illustration of the framework for predicting microbial functions using HOPE. (A) 16S rRNA sequencing reads from a microbial community are adopted for network construction. (B) Pipeline for constructing microbial networks. OTUs are binned by clustering reads from the same source population. Then, the abundance matrix that describes the relative abundance of OTUs in every microbiota sample is calculated. Pairwise scores between OTUs are then computed to obtain the correlation matrix, and OTU pairs with a correlation score over the threshold are connected by an edge. Gray areas in the correlation matrix indicate similarity of OTUs. Finally, the whole microbial community is visualized as a network wherein nodes represent OTUs, and edges represent the correlation between them. (C) Embedding representations of each OTU via the HOPE algorithm. (D) Function prediction matrix of OTUs. Different colors indicate different KO functions.
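The network construction step in panel (B) of the caption above can be illustrated with a minimal sketch. It assumes Pearson correlation as the pairwise score (the caption does not name the measure), and the function names, toy abundance values, and threshold are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_otu_network(abundance, threshold):
    """Connect OTU pairs whose correlation across samples exceeds the threshold.

    abundance: dict mapping OTU id -> list of relative abundances, one per sample.
    Returns a list of (otu_a, otu_b, score) edges, ordered by OTU id.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        score = pearson(abundance[a], abundance[b])
        if score > threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical toy abundance matrix: three OTUs measured in four samples.
toy_abundance = {
    "OTU1": [0.10, 0.20, 0.30, 0.40],
    "OTU2": [0.20, 0.40, 0.60, 0.80],  # rises with OTU1
    "OTU3": [0.40, 0.30, 0.20, 0.10],  # falls as OTU1 rises
}
toy_edges = build_otu_network(toy_abundance, threshold=0.8)
```

In this toy case only the OTU1/OTU2 pair clears the threshold, so the resulting network has a single edge. The edge list could then be passed to a graph embedding method such as HOPE, which this sketch does not implement.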

Original Research

18 January 2021

Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding

Yujie Hou

, 3 more and

Ying Wang

2,674 views

6 citations

Workflow of the hybrid combination of the MKL model with the mRMR feature selection method to integrate five types of molecular data for the prognosis of breast cancer. (1) The N most informative features were selected separately for each type of data in the learning dataset using the mRMR method; (2) SimpleMKL with 10-fold cross-validation was deployed on the learning dataset for breast cancer prognosis to train an optimal model; and (3) the prediction model was evaluated on the learning dataset and the validation dataset.

Methods

18 January 2021

Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods

Zongzhen He

, 2 more and

Yuanyuan Zhang

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods to increase the accuracy of breast cancer survival predictive models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with current molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that present high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved remarkably when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and harnessing the power of proper feature selection methods and effective integration frameworks, the predictive accuracy of breast cancer survival can be further increased, thereby providing better clinical diagnosis and more effective treatment for breast cancer patients.
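The core idea behind kernel-based integration of several data types can be sketched as follows: one kernel (similarity) matrix is computed per data type, and the matrices are merged into a single combined kernel. This is only an illustration under stated assumptions. It uses an RBF kernel and fixed, hand-picked weights, whereas SimpleMKL learns the weights during training; the function names, toy feature values, and `gamma` are hypothetical.

```python
from math import exp


def rbf_kernel(samples, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for a list of feature vectors."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [[exp(-gamma * sq_dist(u, v)) for v in samples] for u in samples]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized Gram matrices.

    Weights must be non-negative and sum to 1, so the result is a convex
    combination of the input kernels. (Assumption: weights are fixed here;
    an MKL solver such as SimpleMKL would learn them instead.)
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three patients described by two data types.
expression = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0]]   # e.g. selected expression features
mutation = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]     # e.g. binary mutation indicators

k_expression = rbf_kernel(expression, gamma=0.1)
k_mutation = rbf_kernel(mutation, gamma=0.1)
k_combined = combine_kernels([k_expression, k_mutation], weights=[0.5, 0.5])
```

The combined matrix can be supplied to any classifier that accepts a precomputed kernel. Keeping one kernel per data type lets each omics layer use features selected specifically for it before integration.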

5,387 views

23 citations

Original Research

15 January 2021

Identification of Common Genes and Pathways in Eight Fibrosis Diseases

Chang Gu

, 6 more and

Tao Huang

The number of overlapped genes among the eight fibrotic diseases. A circular plot illustrating all possible intersections and the corresponding statistics. The eight circles from inside to outside represent the eight fibrotic diseases (1, eye fibrosis; 2, heart fibrosis; 3, hepatic fibrosis; 4, intestinal fibrosis; 5, lung fibrosis; 6, pancreas fibrosis; 7, renal fibrosis; and 8, skin fibrosis), respectively. The height of the bars in the outer layer is proportional to the intersection sizes, as indicated by the numbers on the top of the bars. The color intensity of the bars represents the P-value significance of the intersections.

Acute and chronic inflammation often leads to fibrosis, which is also the common and final pathological outcome of chronic inflammatory diseases. To explore the common genes and pathogenic pathways among different fibrotic diseases, we collected all the reported genes of eight fibrotic diseases: eye fibrosis, heart fibrosis, hepatic fibrosis, intestinal fibrosis, lung fibrosis, pancreas fibrosis, renal fibrosis, and skin fibrosis. We calculated the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment scores of all fibrotic disease genes. Each gene was encoded using KEGG and GO enrichment scores, which reflect how strongly a gene can affect the corresponding function. For each fibrotic disease, the key KEGG and GO features were identified by comparing the KEGG and GO enrichment scores between reported disease genes and other genes using the Monte Carlo feature selection (MCFS) method. We compared the gene overlaps among the eight fibrotic diseases, and connective tissue growth factor (CTGF) was identified as the common key molecule. The key KEGG and GO features of the eight fibrotic diseases were all screened by the MCFS method. Moreover, we found overlaps of pathways between renal fibrosis and skin fibrosis, such as GO:1901890-positive regulation of cell junction assembly, as well as common regulatory genes, such as CTGF, which is the key molecule regulating fibrogenesis. We hope to offer new insight into the cellular and molecular mechanisms underlying fibrosis and thereby help lead to the development of new drugs that specifically delay or even improve the symptoms of fibrosis.
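The gene-overlap comparison mentioned in the abstract above amounts to set intersections over per-disease gene lists. The sketch below shows that step only, not the MCFS analysis. The disease labels and gene identifiers are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def common_genes(disease_genes):
    """Genes reported for every disease in the mapping."""
    gene_sets = list(disease_genes.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def pairwise_overlaps(disease_genes):
    """Map each unordered disease pair to the genes the two diseases share."""
    return {
        (a, b): disease_genes[a] & disease_genes[b]
        for a, b in combinations(sorted(disease_genes), 2)
    }


# Hypothetical placeholder gene lists for three diseases.
toy_disease_genes = {
    "disease_1": {"GENE_A", "GENE_B", "GENE_C"},
    "disease_2": {"GENE_A", "GENE_C", "GENE_D"},
    "disease_3": {"GENE_A", "GENE_E"},
}
shared_by_all = common_genes(toy_disease_genes)
overlaps = pairwise_overlaps(toy_disease_genes)
```

With these placeholder lists, `shared_by_all` contains only the gene present in all three sets, and `overlaps` holds one entry per disease pair. The same two functions apply unchanged to eight diseases.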

13,586 views

38 citations