AUTHOR=Qiu Wangren , Lv Zhe , Hong Yaoqiu , Jia Jianhua , Xiao Xuan TITLE=BOW-GBDT: A GBDT Classifier Combining With Artificial Neural Network for Identifying GPCR–Drug Interaction Based on Wordbook Learning From Sequences JOURNAL=Frontiers in Cell and Developmental Biology VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/cell-and-developmental-biology/articles/10.3389/fcell.2020.623858 DOI=10.3389/fcell.2020.623858 ISSN=2296-634X ABSTRACT=

Background: As a class of membrane protein receptors, G protein-coupled receptors (GPCRs) are very important for cells to complete normal life function and have been proven to be a major drug target for widespread clinical application. Hence, it is of great significance to find GPCR targets that interact with drugs in the process of drug development. However, identifying the interaction of the GPCR–drug pairs by experimental methods is very expensive and time-consuming on a large scale. As more and more database about GPCR–drug pairs are opened, it is viable to develop machine learning models to accurately predict whether there is an interaction existing in a GPCR–drug pair.

Methods: In this paper, the proposed model aims to improve the accuracy of predicting the interactions of GPCR–drug pairs. For GPCRs, the work extracts protein sequence features based on a novel bag-of-words (BOW) model improved with weighted Silhouette Coefficient and has been confirmed that it can extract more pattern information and limit the dimension of feature. For drug molecules, discrete wavelet transform (DWT) is used to extract features from the original molecular fingerprints. Subsequently, the above-mentioned two types of features are contacted, and SMOTE algorithm is selected to balance the training dataset. Then, artificial neural network is used to extract features further. Finally, a gradient boosting decision tree (GBDT) model is trained with the selected features. In this paper, the proposed model is named as BOW-GBDT.

Results: D92M and Check390 are selected for testing BOW-GBDT. D92M is used for a cross-validation dataset which contains 635 interactive GPCR–drug pairs and 1,225 non-interactive pairs. Check390 is used for an independent test dataset which consists of 130 interactive GPCR–drug pairs and 260 non-interactive GPCR–drug pairs, and each element in Check390 cannot be found in D92M. According to the results, the proposed model has a better performance in generation ability compared with the existing machine learning models.

Conclusion: The proposed predictor improves the accuracy of the interactions of GPCR–drug pairs. In order to facilitate more researchers to use the BOW-GBDT, the predictor has been settled into a brand-new server, which is available at http://www.jci-bioinfo.cn/bowgbdt.