AUTHOR=Ikeda Kazuyoshi , Doi Takuo , Ikeda Masami , Tomii Kentaro TITLE=PreBINDS: An Interactive Web Tool to Create Appropriate Datasets for Predicting Compound–Protein Interactions JOURNAL=Frontiers in Molecular Biosciences VOLUME=8 YEAR=2021 URL=https://www.frontiersin.org/journals/molecular-biosciences/articles/10.3389/fmolb.2021.758480 DOI=10.3389/fmolb.2021.758480 ISSN=2296-889X ABSTRACT=

Given the abundant computational resources and the huge amount of data of compound–protein interactions (CPIs), constructing appropriate datasets for learning and evaluating prediction models for CPIs is not always easy. For this study, we have developed a web server to facilitate the development and evaluation of prediction models by providing an appropriate dataset according to the task. Our web server provides an environment and dataset that aid model developers and evaluators in obtaining a suitable dataset for both proteins and compounds, in addition to attributes necessary for deep learning. With the web server interface, users can customize the CPI dataset derived from ChEMBL by setting positive and negative thresholds to be adjusted according to the user’s definitions. We have also implemented a function for graphic display of the distribution of activity values in the dataset as a histogram to set appropriate thresholds for positive and negative examples. These functions enable effective development and evaluation of models. Furthermore, users can prepare their task-specific datasets by selecting a set of target proteins based on various criteria such as Pfam families, ChEMBL’s classification, and sequence similarities. The accuracy and efficiency of in silico screening and drug design using machine learning including deep learning can therefore be improved by facilitating access to an appropriate dataset prepared using our web server (https://binds.lifematics.work/).