AUTHOR=Pogodin Pavel V. , Lagunin Alexey A. , Rudik Anastasia V. , Filimonov Dmitry A. , Druzhilovskiy Dmitry S. , Nicklaus Mark C. , Poroikov Vladimir V. TITLE=How to Achieve Better Results Using PASS-Based Virtual Screening: Case Study for Kinase Inhibitors JOURNAL=Frontiers in Chemistry VOLUME=6 YEAR=2018 URL=https://www.frontiersin.org/journals/chemistry/articles/10.3389/fchem.2018.00133 DOI=10.3389/fchem.2018.00133 ISSN=2296-2646 ABSTRACT=

Discovery of new pharmaceutical substances is currently boosted by the availability of the Synthetically Accessible Virtual Inventory (SAVI) library, which includes about 283 million molecules, each annotated with a proposed one-step synthetic route from commercially available starting materials. The SAVI database is well suited for ligand-based virtual screening methods to select molecules for experimental testing. In this study, we compare the performance of three approaches for the analysis of structure-activity relationships that differ in their criteria for selecting the “active” and “inactive” compounds included in the training sets. PASS (Prediction of Activity Spectra for Substances), which is based on a modified Naïve Bayes algorithm, was applied because it has been shown to be robust and to provide good predictions of many biological activities based on just the structural formula of a compound, even if the information in the training set is incomplete. We used different subsets of kinase inhibitors for this case study because a large amount of data is currently available on this important class of drug-like molecules. We trained PASS on subsets of kinase inhibitors extracted from the ChEMBL 20 database and then applied the model to ChEMBL 23 compounds not yet present in ChEMBL 20 to identify novel kinase inhibitors. As one may expect, the best prediction accuracy was obtained when only the experimentally confirmed active and inactive compounds for distinct kinases were used in the training procedure. However, for some kinases, reasonable results were obtained even when we used merged training sets, in which compounds not tested against the particular kinase were designated as inactives. Thus, depending on the availability of data for a particular biological activity, one may choose the first or the second approach for creating ligand-based computational tools to achieve the best possible results in virtual screening.
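
The two labeling strategies named in the abstract, and the selection of compounds that appear only in the newer database release, can be illustrated with a minimal Python sketch. This is not the PASS software or the authors' pipeline. The `Record` type, the function names, and the toy compound and kinase identifiers are all hypothetical and chosen only for illustration. Only the two strategies the abstract describes explicitly are shown.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """One hypothetical assay outcome: a compound tested against a kinase."""
    compound: str
    kinase: str
    active: bool


def confirmed_only_labels(records: Iterable[Record], kinase: str) -> dict:
    """Strategy 1: keep only compounds actually tested against `kinase`.

    If a compound has several records for the same kinase, the last one wins
    (a simplification made for this sketch).
    """
    return {r.compound: r.active for r in records if r.kinase == kinase}


def merged_labels(records: Iterable[Record], kinase: str) -> dict:
    """Strategy 2: use every compound in the merged set.

    Compounds never tested against `kinase` are designated inactive.
    """
    records = list(records)
    labels = {r.compound: False for r in records}  # default: inactive
    labels.update(confirmed_only_labels(records, kinase))
    return labels


def novel_compounds(old_release: Iterable[str], new_release: Iterable[str]) -> set:
    """Compounds present in the newer release but absent from the older one."""
    return set(new_release) - set(old_release)


# Toy data (invented identifiers, not taken from ChEMBL).
toy_records = [
    Record("C1", "KINASE_A", True),
    Record("C2", "KINASE_A", False),
    Record("C3", "KINASE_B", True),
    Record("C1", "KINASE_B", False),
]

strict = confirmed_only_labels(toy_records, "KINASE_A")
merged = merged_labels(toy_records, "KINASE_A")
held_out = novel_compounds({"C1", "C2", "C3"}, {"C2", "C3", "C4"})
```

In this toy case the merged strategy yields a larger training set for `KINASE_A` (compound `C3` is added as an assumed inactive), at the cost of possibly mislabeling compounds that were never tested against that kinase.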