Drug-induced cardiotoxicity is a common side effect of drugs in clinical use or under postmarket surveillance and is frequently caused by off-target interactions with the cardiac human ether-à-go-go-related gene (hERG) potassium channel. Therefore, prioritizing drug candidates based on their hERG-blocking potential is a mandatory step in the early preclinical stage of a drug discovery program. Herein, we trained and validated 30 ligand-based classifiers of hERG-related cardiotoxicity based on 7,963 curated compounds extracted from the freely accessible repository ChEMBL (version 25). Six machine learning algorithms were tested: random forest, K-nearest neighbors, gradient boosting, extreme gradient boosting, multilayer perceptron, and support vector machine. The application of 1) best practices for data curation, 2) the feature selection method VSURF, and 3) the synthetic minority oversampling technique (SMOTE) to handle the unbalanced data allowed for the development of highly predictive models (BA_MAX = 0.91, AUC_MAX = 0.95). Remarkably, the temporal validation approach not only supported the predictivity of the classifiers presented herein but also suggested their ability to outperform models commonly used in the literature. From a methodological point of view, the study puts forward a new computational workflow, freely available in the GitHub repository (https://github.com/PDelre93/hERG-QSAR), for building highly predictive models of hERG-mediated cardiotoxicity.
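To illustrate the oversampling idea behind SMOTE mentioned above, the following is a minimal pure-Python sketch, not the workflow from the cited repository. It creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy descriptor values are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper: each synthetic point lies on the segment joining
    a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-descriptor vectors for the minority class (made-up values,
# not data from the study).
blockers = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote(blockers, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class. In practice an established implementation, such as the one in the imbalanced-learn package, would likely be preferable to a hand-written version.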
The early prediction of adverse drug effects is of great interest to pharmaceutical research, as toxicity is one of the leading reasons for drug attrition. Understanding the cell signaling and regulatory pathways affected by a drug candidate is crucial to the study of drug toxicity. In this study, we present a computational technique that employs the propagation of drug-protein interactions to connect compounds to biological pathways. Target profiles for drugs were built by retrieving drug target proteins from public repositories such as ChEMBL, DrugBank, IUPHAR, PharmGKB, and TTD. A subsequent enrichment test of the protein pool using Reactome revealed potential pathways affected by the drugs. Furthermore, an optional tissue filter utilizing the Human Protein Atlas was applied to identify tissue-specific pathways. The analysis pipeline was implemented in an open-source KNIME workflow called Path4Drug to allow automated data retrieval and reconstruction for any given drug present in ChEMBL. The pipeline was applied to withdrawn drugs and to cardio- and hepatotoxic drugs with black box warnings to identify the biochemical pathways they affect and to find pathways that can potentially be connected to the toxic events. To complement this approach, drugs used in cardiac therapy without any record of toxicity were also analyzed. The results recover already known associations and reveal a large number of additional potential connections. Consequently, our approach can link drugs to biological pathways by leveraging big data available in public resources. The developed tool is openly available and modifiable to support other systems biology analyses.
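The enrichment step described above can be sketched as an over-representation test. The abstract does not state which statistic is used, so the example below assumes a one-sided hypergeometric test, a common choice for this kind of analysis. The function name `enrichment_p_value` and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway_size, targets, overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    universe     -- number of proteins in the background set
    pathway_size -- number of background proteins annotated to the pathway
    targets      -- number of proteins in the drug's target profile
    overlap      -- number of target proteins that fall in the pathway

    Returns P(X >= overlap) when `targets` proteins are drawn at random
    without replacement from the background.
    """
    upper = min(pathway_size, targets)
    total = comb(universe, targets)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (made up): 10 background proteins, 4 in the pathway,
# 3 drug targets, 2 of which belong to the pathway.
p = enrichment_p_value(universe=10, pathway_size=4, targets=3, overlap=2)
```

A small p-value suggests that the drug's targets fall in the pathway more often than expected by chance. When many pathways are tested for one drug, a multiple-testing correction would normally be applied to the resulting p-values.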
Random forest, support vector machine, logistic regression, neural network, and k-nearest neighbor (lazar) algorithms were applied to a new Salmonella mutagenicity dataset with 8,290 unique chemical structures, utilizing MolPrint2D and Chemistry Development Kit (CDK) descriptors. Cross-validation accuracies of all investigated models ranged from 80 to 85%, which is comparable with the interlaboratory variability of the Salmonella mutagenicity assay. Pyrrolizidine alkaloid predictions showed a clear distinction between chemical groups, with otonecines having the highest proportion of positive mutagenicity predictions and monoesters the lowest.
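As a rough illustration of a similarity-based k-nearest neighbor classifier over fingerprint-style features, the sketch below uses Tanimoto similarity and a similarity-weighted vote. It is not the lazar implementation. The function names, the integer feature identifiers, and the labels are assumptions made up for this example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, training, k=3):
    """Predict a binary label from the k most similar training compounds.

    `training` is a list of (feature_set, label) pairs. Each neighbour
    votes with a weight equal to its similarity to the query; mutagenic
    neighbours add to the score and non-mutagenic ones subtract from it.
    """
    ranked = sorted(
        training, key=lambda item: tanimoto(query, item[0]), reverse=True
    )[:k]
    score = sum(
        tanimoto(query, features) * (1 if label else -1)
        for features, label in ranked
    )
    return score > 0


# Toy training set: integer ids stand in for fingerprint features,
# True marks a mutagenic compound (all values are made up).
training = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({5, 6, 7}, False),
    ({5, 6, 8}, False),
]
pred_a = knn_predict({1, 2, 3, 4}, training)
pred_b = knn_predict({5, 6}, training)
```

Weighting votes by similarity means that neighbours sharing no features with the query contribute nothing to the score, so a few close analogues can dominate the prediction.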