AUTHOR=Zhu Siyi , Bing Jiaxin , Min Xiaoping , Lin Chen , Zeng Xiangxiang TITLE=Prediction of Drug–Gene Interaction by Using Metapath2vec JOURNAL=Frontiers in Genetics VOLUME=9 YEAR=2018 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2018.00248 DOI=10.3389/fgene.2018.00248 ISSN=1664-8021 ABSTRACT=

Heterogeneous information networks (HINs) currently play an important role in daily life. HINs are applied in many fields, such as science research, e-commerce, recommendation systems, and bioinformatics. Particularly, HINs have been used in biomedical research. Algorithms have been proposed to calculate the correlations between drugs and targets and between diseases and genes. Recently, the interaction between drugs and human genes has become an important subject in the research on drug efficacy and human genomics. In previous studies, numerous prediction methods using machine learning and statistical prediction models were proposed to explore this interaction on the biological network. In the current work, we introduce a representation learning method into the biological heterogeneous network and use the representation learning models metapath2vec and metapath2vec++ on our dataset. We combine the adverse drug reaction (ADR) data in the drug–gene network with causal relationship between drugs and ADRs. This article first presents an analysis of the importance of predicting drug–gene relationships and discusses the existing prediction methods. Second, the skip-gram model commonly used in representation learning for natural language processing tasks is explained. Third, the metapath2vec and metapath2vec++ models for the example of drug–gene-ADR network are described. Next, the kernelized Bayesian matrix factorization algorithm is used to complete the prediction. Finally, the experimental results of both models are compared with Katz, CATAPULT, and matrix factorization, the prediction visualized using the receiver operating characteristic curves are presented, and the area under the receiver operating characteristic values for three varying algorithm parameters are calculated.